Tohoku University Researcher Directory

Researcher Details

タキザワ　ヒロユキ

滝沢　寛之

Hiroyuki Takizawa

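Affiliation

Cyberscience Center, Research and Development Division, Supercomputing Research Division

Title

Professor

Degree

Ph.D. in Information Sciences (Tohoku University)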

researchmap

https://researchmap.jp/h_takizawa

J-GLOBAL ID

200901079984691878

e-Rad Researcher Number

70323996

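Career 7

April 2024 - present

Tohoku University, Special Advisor to the President
April 2019 - present

Tohoku University, Cyberscience Center, Deputy Director
January 2017 - present

Tohoku University, Cyberscience Center, Professor
January 2009 - December 2016

Tohoku University, Graduate School of Information Sciences, Associate Professor
April 2004 - December 2008

Tohoku University, Graduate School of Information Sciences, Lecturer
March 2003 - March 2004

Tohoku University, Information Synergy Center, Research Associate
October 1999 - February 2003

Niigata University, Integrated Information Processing Center, Research Associate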

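Committee Memberships 43

HPCI Consortium, Director

July 2024 - present
Information Processing Society of Japan (IPSJ) SIG on High Performance Computing (HPC) Steering Committee, Secretary (Vice Chair)

April 2021 - present
HPCI Collaborative Service Committee, Member

March 2021 - present
International Workshop on Automatic Performance Tuning Program Committee, Program Committee Member

2009 - present
COOL Chips Conference Program Committee, Program Committee Member

2007 - present
HPC Asia 2026, General Chair

2025 - 2026
HPCI Collaborative Service Operations Working Group, Working Group Chair

April 2019 - March 2021
IPSJ SIG on High Performance Computing (HPC) Steering Committee, Steering Committee Member

April 2015 - March 2019
HPC Asia 2019, Program Committee Track Co-chair

2018 - 2019
ACM/IEEE Supercomputing Conference, Tutorials Committee, Committee Member

2017 - 2019
Legacy HPC Application Migration (LHAM), Organizing Committee Member

2013 - 2018
Auto-Tuning for Multicore and GPU (ATMG), Program Committee Member

2012 - 2018
IPSJ SIG on System Architecture Steering Committee, Steering Committee Member

April 2013 - March 2017
IPSJ Symposium on High Performance Computing and Computational Science (HPCS) Program Committee, Member

October 2008 - March 2017
IPSJ Tohoku Branch Steering Committee, Steering Committee Member

April 2014 - March 2016
IPSJ Annual Meeting on Advanced Computing System and Infrastructure (ACSI) Program Committee, Member

April 2014 - March 2015
IPSJ Transactions on Advanced Computing Systems (ACS) Editorial Committee, ACS Editorial Committee Member

April 2011 - March 2015
IPSJ Tohoku Branch, General Affairs Secretary

April 2012 - March 2014
IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS) Program Committee, Member

April 2012 - March 2014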
IPSJ Tohoku Branch, Public Relations Secretary

April 2010 - March 2012
Scientific System Research Association (サイエンティフィック・システム研究会), Accelerator Technology Working Group Member

September 2009 - March 2012
IPSJ SIG on High Performance Computing (HPC) Steering Committee, Steering Committee Member

April 2007 - March 2011
IEICE Technical Committee on Computer Systems, Member

April 2005 - March 2011
International Workshop on Automatic Performance Tuning (iWAPT), Program Chair

From 2025
HPCI Collaborative Service Operations Working Group, Member

From October 2024
ICPP2021 Program Committee, Member

From 2021
ACM/IEEE Supercomputing Conference 2020 (SC20), Technical Program Committee Member

From November 2020
International Workshop on Large-scale HPC Application Modernization (LHAM), Program Committee Chair

From 2018
HPC Asia 2018, Program Committee Member

From 2018
HPC Asia 2018, Poster Chair

From 2018
IPSJ Symposium on High Performance Computing and Computational Science (HPCS) Program Committee, Program Committee Chair

From June 2016
International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), Program Committee Member

From April 2015
International Workshop on Software Engineering for Parallel Systems (SEPS), Program Committee Member

From 2015
International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), Program Committee Member

From 2015
2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS), Program Committee Member

From June 2014
International Workshop on Hardware-Software Co-Design for High Performance Computing (Co-HPC), Program Committee Member

From 2014
ACM/IEEE Supercomputing Conference 2013 (SC13), Technical Program Committee Member

From November 2013
Auto-Tuning for Multicore and GPU (ATMG), Organizing Committee Chair

From September 2013
Legacy HPC Application Migration (LHAM), Organizing Committee Chair

From 2013
International Workshop on Automatic Performance Tuning, Organizing Committee Chair

From 2012
International Workshop on Peer-to-Peer Networking (P2PNet'10), Program Committee Member

From December 2010

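Professional Memberships 4

The Institute of Electronics, Information and Communication Engineers (IEICE)
Association for Computing Machinery (ACM)
The Institute of Electrical and Electronics Engineers (IEEE)
Information Processing Society of Japan (IPSJ)

Research Keywords 3

Parallel and distributed processing
Computer architecture
High-performance computing

Research Areas 5

Informatics / High-performance computing
Informatics / Intelligent informatics
Informatics / Information networks
Informatics / Computer systems
Informatics / Software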

Awards 22

Best Student Oral Presentation Award

August 2025, xSIG2025, Practicality of Legacy Code Modernization Using Large Language Models
Outstanding Effort Award

August 2025, xSIG2025, A Study of Maximum Vector Length and Cache Management Methods for the RISC-V Vector Extension
Outstanding Student Award

August 2025, xSIG2025, A Study of the Practicality of LLMs in Legacy Code Modernization
SCA'25 Best Paper Award

February 2025, Supercomputing Asia 2025, Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning
Best Undergraduate Student Award

August 2024, Workflow Batch Job Scheduling Considering Dependencies Between Tasks
IEEE Computer Society Japan Chapter xSIG Young Researcher Award

August 2024, Analysis of Program Performance Models Using Explainable AI Techniques
Best paper award at the 26th Workshop on Advances in Parallel and Distributed Computational Models

May 2024, Combining lossy compression with multi-level caching for data staging over network
Outstanding Effort Award

July 2023, Research on Statistical Machine Learning Using Vector Processors
Best Workshop Paper Award

November 2020, International Symposium on Computing and Networking (CANDAR20), Improving the Accuracy in SpMV Implementation Selection with Machine Learning
IEEE Computer Society Japan Chapter xSIG Young Researcher Award

July 2020, The 4th cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming, Automatic Load Balance Tuning of a Flood Simulation Code Using Bayesian Optimization
HPC in Asia Poster Award

June 2020, ISC High Performance Computing 2020, Challenges in Solving Scheduling Problems with the D-Wave Quantum Annealer
Best Poster Award at COOL Chips 22

April 2019, IEEE Symposium on Low-Power and High-Speed Chips, An Energy Optimization Method for Hybrid In-Memory Checkpointing
Best Paper Award

December 2018, The Second International Workshop on Automation in Machine Learning and Big Data (AutoML 2018)
Best Workshop Paper Award at CANDAR'18

November 2018, International Symposium on Computing and Networking (CANDAR)
Best Workshop Paper Award at CANDAR'15

December 10, 2015, International Symposium on Computing and Networking (CANDAR)
Best Poster Award at COOL Chips XV

April 2012, IEEE Symposium on Low-Power and High-Speed Chips
Best Poster Award of HiPEAC '12

January 2012
The Poster Award

January 2011, Next-Generation Supercomputing Symposium 2008
2009 (Heisei 21) Ishida Minoru Memorial Foundation Research Encouragement Award

October 30, 2009, Ishida Memorial Foundation
Noguchi Research Encouragement Award

May 28, 2008, IPSJ Tohoku Branch
Funai Information Science Encouragement Award

April 22, 2006, Funai Foundation for Information Technology
ISPA'04 Best Paper Award

December 14, 2004, ISPA2004

Papers 316

Workflow Batch Job Scheduling with Considering Task Dependencies (peer-reviewed)

Kaito Yanai, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Lecture Notes in Computer Science, 123-144, January 2, 2026
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-032-10507-3_7

ISSN: 0302-9743

eISSN: 1611-3349
CityScaleCast: Spatiotemporal GNN for City-Scale Weather Prediction with GraphCast-Guided Parallel Modeling and Multi-Step Forecasting in Sendai (peer-reviewed)

Xuanwen Pan, Yoichi Shimomura, Sichen Tao, Masatoshi Kawai, Keichi Takahashi, Hiroyuki Takizawa

Workshop on Multi-scale, Multi-physics, Coupled Problems and AI enhanced simulations on HPC (MMCP'26), January 2026
Explainable AI-Guided Genetic Algorithms for Efficient Software Automatic Tuning (peer-reviewed)

Toshinobu Katayama, Masatoshi Kawai, Yoichi Shimomura, Keichi Takahashi, Hiroyuki Takizawa

Workshop on Multi-scale, Multi-physics, Coupled Problems and AI enhanced simulations on HPC (MMCP'26), January 2026
Semantic Equivalence Verification of HPC Codes Using LLMs (peer-reviewed)

Yuta Tanizawa, Masatoshi Kawai, Keichi Takahashi, Hiroyuki Takizawa

International Workshop on Foundational Large Language Models Advances for HPC in Asia, January 2026
Co-Design of a Power State-Aware Scheduler and an Intelligent Power Manager for Energy-Efficient HPC Systems (peer-reviewed)

Raka Satya Prasasta, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Muhammad Alfian Amrizal, Reza Pulungan, Hiroyuki Takizawa

Energy Efficient HPC State of the Practice Workshop 2026, January 2026
Deep Learning-Integrated Pairwise-Qubit Subsystems for Highly Efficient Quantum Circuit Simulation (peer-reviewed)

Santana Yuda Pradata, Muhammad Alfian Amrizal, Wiwit Suryanto, Ahmad Ridwan, Tresna Nugraha, Hiroyuki Takizawa

Supercomputing Asia/HPC Asia 2026, January 2026
TRIOS: Reducing File-System Contention through Predictive Time-Resolved I/O Simulation in Job Scheduling (peer-reviewed)

YuTsen Tseng, Masatoshi Kawai, Keichi Takahashi, Hiroyuki Takizawa

Supercomputing Asia/HPC Asia 2026, January 2026
Climate Change Effects on Probable Maximum Precipitation (PMP) of Mesoscale Convective Systems: Model-based Estimation and Large Ensemble-based Frequency Analysis (peer-reviewed)

Yusuke Hiraga, Satoshi Watanabe, Takeshi Yamashita, Hiroyuki Takizawa

Journal of Hydrology, 661, 133724-133724, November 2025
Publisher: Elsevier BV
DOI: 10.1016/j.jhydrol.2025.133724

ISSN: 0022-1694
Developing an End-to-End 3D X-Ray Ptychography Workflow Using Surrogate Models (peer-reviewed)

Ryota Koda, Keichi Takahashi, Hiroyuki Takizawa, Nozomu Ishiguro, Yukio Takahashi

Concurrency and Computation: Practice and Experience, 37 (25-26), October 2, 2025
Publisher: Wiley
DOI: 10.1002/cpe.70308

ISSN: 1532-0626

eISSN: 1532-0634

ABSTRACT Recently, X‐ray ptychography has attracted significant attention as a non‐destructive imaging technique with high spatial resolution. However, its application to real‐time imaging is limited by the long execution time required for iterative phase retrieval, which reconstructs sample images from diffraction patterns. To address this issue, deep learning‐based surrogate models have been proposed to accelerate iterative phase retrieval by directly predicting sample images. While these surrogate models achieve significant speed‐ups, they typically ignore the time needed for model training and dataset preparation, which can diminish their benefits. Consequently, conventional iterative phase retrieval may outperform surrogate‐based approaches in end‐to‐end performance. This study aims to implement real‐time X‐ray ptychography using surrogate models that explicitly incorporate model training and dataset preparation into the workflow. Specifically, we propose a method that constructs a sample‐specific surrogate model on‐the‐fly using a small subset of observed diffraction patterns and uses its predictions as initial estimates for iterative phase retrieval. The proposed method is up to 2.72 times faster than conventional iterative phase retrieval, even when including training and dataset preparation times. Moreover, the proposed method ensures that the reconstructed images satisfy physical constraints. Comprehensive performance evaluations further demonstrate that the trade‐off between model accuracy and preparation time is critical for optimizing the total execution time in the X‐ray ptychography workflow.
Power absorption and temperature rise in deep learning based head models for local radiofrequency exposures (peer-reviewed)

Sachiko Kodera, Reina Yoshida, Essam A Rashed, Yinliang Diao, Hiroyuki Takizawa, Akimasa Hirata

Physics in Medicine & Biology, March 16, 2025

DOI: 10.1088/1361-6560/adb935
Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning (peer-reviewed)

Thomas Budiarjo, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Muhammad Alfian Amrizal, Reza Pulungan, Hiroyuki Takizawa

Proceedings of the 2025 Supercomputing Asia Conference, 1-13, March 10, 2025
Publisher: ACM
DOI: 10.1145/3718350.3718359
Performance evaluation of the LBM simulations in fluid dynamics on SX-Aurora TSUBASA vector engine (peer-reviewed)

Xiangcheng Sun, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa, Xian Wang

Computer Physics Communications, 307, 109411-109411, February 2025
Publisher: Elsevier BV
DOI: 10.1016/j.cpc.2024.109411

ISSN: 0010-4655
Clustering Based Job Runtime Prediction for Backfilling Using Classification (peer-reviewed)

Hang Cui, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Lecture Notes in Computer Science, 40-59, December 21, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-74430-3_3

ISSN: 0302-9743

eISSN: 1611-3349
Maximizing Energy Budget Utilization Using Dynamic Power Cap Control (peer-reviewed)

Sho Ishii, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Lecture Notes in Computer Science, 161-180, December 21, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-74430-3_9

ISSN: 0302-9743

eISSN: 1611-3349
A Node Selection Method for on-Demand Job Execution with Considering Deadline Constraints (peer-reviewed)

Daiki Nakai, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Lecture Notes in Computer Science, 141-160, December 21, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-74430-3_8

ISSN: 0302-9743

eISSN: 1611-3349
Leveraging Hardware Performance Counters for Predicting Workload Interference in Vector Supercomputers (peer-reviewed)

Shubham, Keichi Takahashi, Hiroyuki Takizawa

International Conference on Parallel and Distributed Computing: Applications and Technologies (PDCAT), December 2024

DOI: 10.48550/arXiv.2410.18126
DRAS-OD: A Reinforcement Learning based Job Scheduler for On-Demand Job Scheduling in High-Performance Computing Systems (peer-reviewed)

Hang Cui, Keichi Takahashi, Hiroyuki Takizawa

2024 Twelfth International Symposium on Computing and Networking (CANDAR), 21-29, November 26, 2024
Publisher: IEEE
DOI: 10.1109/candar64496.2024.00011
Real-Time Phase Retrieval Using On-the-Fly Training of Sample-Specific Surrogate Models (peer-reviewed)

Ryota Koda, Keichi Takahashi, Hiroyuki Takizawa, Nozomu Ishiguro, Yukio Takahashi

2024 Twelfth International Symposium on Computing and Networking (CANDAR), 59-66, November 26, 2024
Publisher: IEEE
DOI: 10.1109/candar64496.2024.00015
A QA-Assisted Job Scheduler for Minimizing the Impact of Urgent Computing on HPC System Operation (peer-reviewed)

Tatsuyoshi Ohmura, Keichi Takahashi, Ryusuke Egawa, Hiroyuki Takizawa

2024 Twelfth International Symposium on Computing and Networking Workshops (CANDARW), 197-203, November 26, 2024
Publisher: IEEE
DOI: 10.1109/candarw64572.2024.00039
Modernizing an Operational Real-Time Tsunami Simulator to Support Diverse Hardware Platforms (peer-reviewed)

Keichi Takahashi, Takashi Abe, Akihiro Musa, Yoshihiko Sato, Yoichi Shimomura, Hiroyuki Takizawa, Shunichi Koshimura

2024 IEEE International Conference on Cluster Computing (CLUSTER), 414-425, September 24, 2024
Publisher: IEEE
DOI: 10.1109/cluster59578.2024.00043
XAI-Based Feature Importance Analysis on Loop Optimization (peer-reviewed)

Toshinobu Katayama, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 38, 782-791, May 27, 2024
Publisher: IEEE
DOI: 10.1109/ipdpsw63119.2024.00142
Combining Lossy Compression with Multi-Level Caching for Data Staging over Network (peer-reviewed)

Rei Aoyagi, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 41, 212-221, May 27, 2024
Publisher: IEEE
DOI: 10.1109/ipdpsw63119.2024.00059
Towards sub-10 nm spatial resolution by tender X-ray ptychographic coherent diffraction imaging (peer-reviewed)

Nozomu Ishiguro, Fusae Kaneko, Masaki Abe, Yuki Takayama, Junya Yoshida, Taiki Hoshino, Shuntaro Takazawa, Hideshi Uematsu, Yuhei Sasaki, Naru Okawa, Keichi Takahashi, Hiroyuki Takizawa, Hiroyuki Kishimoto, Yukio Takahashi

Applied Physics Express, 17 (5), May 1, 2024

DOI: 10.35848/1882-0786/ad4846

ISSN: 1882-0778

eISSN: 1882-0786
AOBA: The Most Powerful Vector Supercomputer in the World (invited)

Hiroyuki Takizawa, Keichi Takahashi, Yoichi Shimomura, Ryusuke Egawa, Kenji Oizumi, Satoshi Ono, Takeshi Yamashita, Atsuko Saito

Sustained Simulation Performance 2022, 71-81, March 15, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-41073-4_6
Reuse distance-based shared LLC management mechanism for heterogeneous CPU-GPU systems (peer-reviewed)

Jiaheng Liu, Ryusuke Egawa, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

IEICE Electronics Express, 21 (4), 20230520-20230520, February 25, 2024
Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)
DOI: 10.1587/elex.21.20230520

eISSN: 1349-2543
Current Status and Future of the ABINIT-MP Program

Yuji Mochizuki, Tatsuya Nakano, Kota Sakakura, Hideo Doi, Koji Okuwaki, Toshihiro Kato, Hiroyuki Takizawa, Satoshi Ohshima, Tetsuya Hoshino, Takahiro Katagiri

Journal of Computer Chemistry, Japan, 2024
Publisher: Society of Computer Chemistry Japan
DOI: 10.2477/jccj.2024-0022

ISSN: 1347-1767

eISSN: 1347-3824
Development Status of the FMO Program ABINIT-MP in 2023 (peer-reviewed)

Yuji Mochizuki, Tatsuya Nakano, Kota Sakakura, Koji Okuwaki, Hideo Doi, Toshihiro Kato, Hiroyuki Takizawa, Akira Naruse, Satoshi Ohshima, Tetsuya Hoshino, Takahiro Katagiri

23 (1), 4-8, 2024
DOI: 10.2477/jccj.2024-0001

ISSN: 1347-1767

eISSN: 1347-3824
Association of nuclear cataract prevalence with UV radiation and heat load in lens of older people -five city study- (peer-reviewed)

Kotaro Kinoshita, Sachiko Kodera, Natsuko Hatsusaka, Ryusuke Egawa, Hiroyuki Takizawa, Eri Kubo, Hiroshi Sasaki, Akimasa Hirata

Environmental Science and Pollution Research, 30 (59), 123832-123842, November 22, 2023
Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s11356-023-31079-2

eISSN: 1614-7499
Prototype of a Batched Quantum Circuit Simulator for the Vector Engine (peer-reviewed)

Keichi Takahashi, Toshio Mori, Hiroyuki Takizawa

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 1499-1505, November 12, 2023
Publisher: ACM
DOI: 10.1145/3624062.3624226
Conflict-aware workload co-execution on SX-Aurora TSUBASA (peer-reviewed)

Riku Nunokawa, Yoichi Shimomura, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

CCF Transactions on High Performance Computing, 4 (6), 425-438, October 5, 2023
Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s42514-023-00171-x

ISSN: 2524-4922

eISSN: 2524-4930

Abstract NEC SX-Aurora TSUBASA (SX-AT) is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The goal of this work is to simultaneously use both VHs and VEs to increase the resource utilization and improve the system throughput by co-executing more workloads. One difficulty is that performance interferences among VH and VE workloads could occur because they share some computing resources and potentially compete to use the same resource at the same time, so-called resource conflicts. To achieve efficient workload co-execution, first, this paper experimentally investigates the performance interference between a VH and a VE, when each of the two processors executes a different workload. It is empirically shown that the frequency of system calls from the VE workload could be a good indicator to predict if the co-execution could cause severe performance interference, even though monitoring system calls requires a huge runtime overhead and it is impractical to simply use it for decision making of co-execution. Then, this paper proposes a workload co-execution strategy based on a practical approach to identifying a pair of VE and VH workloads that could cause severe performance interferences. Our evaluation results clearly demonstrate that the system call frequency can be used to predict if the workload can affect the performance of another co-executing workload, and VH’s CPU load can be a good approximation of the system call frequency. The proposed approach based on the CPU loads could accurately identify a pair of workloads causing frequent resource conflicts, and thus reduce the risk of severe performance interferences between co-executing workloads on an SX-AT system, resulting in shorter makespan without significantly increasing the turn-around time.
Balancing exploitation and exploration in parallel Bayesian optimization under computing resource constraint (peer-reviewed)

Moto Satake, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 706-713, May 2023
Publisher: IEEE
DOI: 10.1109/ipdpsw59300.2023.00122
An Advantage Actor-Critic Deep Reinforcement Learning Method for Power Management in HPC Systems (peer-reviewed)

Fitra Rahmani Khasyah, Kadek Gemilang Santiyuda, Gabriel Kaunang, Faizal Makhrus, Muhammad Alfian Amrizal, Hiroyuki Takizawa

Lecture Notes in Computer Science, 94-107, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_8

ISSN: 0302-9743

eISSN: 1611-3349
Equivalence Checking of Code Transformation by Numerical and Symbolic Approaches (peer-reviewed)

Shunpei Sugawara, Keichi Takahashi, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

Parallel and Distributed Computing, Applications and Technologies, 373-386, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_29

ISSN: 0302-9743

eISSN: 1611-3349
Towards Priority-Flexible Task Mapping for Heterogeneous Multi-core NUMA Systems (peer-reviewed)

Yifan Jin, Mulya Agung, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Parallel and Distributed Computing, Applications and Technologies, 3-15, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_1

ISSN: 0302-9743

eISSN: 1611-3349
A Task-Parallel Runtime for Heterogeneous Multi-node Vector Systems (peer-reviewed)

Kazuki Ide, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

Parallel and Distributed Computing, Applications and Technologies, 331-343, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_26

ISSN: 0302-9743

eISSN: 1611-3349
Xevolver for Performance Tuning of C Programs (invited)

Hiroyuki Takizawa, Shunpei Sugawara, Yoichi Shimomura, Keichi Takahashi, Ryusuke Egawa

Sustained Simulation Performance 2021, 85-93, February 18, 2023
Publisher: Springer International Publishing
DOI: 10.1007/978-3-031-18046-0_6
Estimation of the number of heat illness patients in eight metropolitan prefectures of Japan: Correlation with ambient temperature and computed thermophysiological responses (peer-reviewed)

Akito Takada, Sachiko Kodera, Koji Suzuki, Mio Nemoto, Ryusuke Egawa, Hiroyuki Takizawa, Akimasa Hirata

Frontiers in Public Health, 11, February 17, 2023
Publisher: Frontiers Media SA
DOI: 10.3389/fpubh.2023.1061135

eISSN: 2296-2565

The number of patients with heat illness transported by ambulance has been gradually increasing due to global warming. In intense heat waves, it is crucial to accurately estimate the number of cases with heat illness for management of medical resources. Ambient temperature is an essential factor with respect to the number of patients with heat illness, although thermophysiological response is a more relevant factor with respect to causing symptoms. In this study, we computed daily maximum core temperature increase and daily total amount of sweating in a test subject using a large-scale, integrated computational method considering the time course of actual ambient conditions as input. The correlation between the number of transported people and their thermophysiological temperature is evaluated in addition to conventional ambient temperature. With the exception of one prefecture, which features a different Köppen climate classification, the number of transported people in the remaining prefectures, with a Köppen climate classification of Cfa, are well estimated using either ambient temperature or computed core temperature increase and daily amount of sweating. For estimation using ambient temperature, an additional two parameters were needed to obtain comparable accuracy. Even using ambient temperature, the number of transported people can be estimated if the parameters are carefully chosen. This finding is practically useful for the management of ambulance allocation on hot days as well as public enlightenment.
Toward Building a Digital Twin of Job Scheduling and Power Management on an HPC System (peer-reviewed)

Tatsuyoshi Ohmura, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

Lecture Notes in Computer Science, 47-67, January 12, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-22698-4_3

ISSN: 0302-9743

eISSN: 1611-3349
Efficient Pause Location Prediction Using Quantum Annealing Simulations and Machine Learning (peer-reviewed)

Michael R. Zielewski, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

IEEE Access, 11, 104285-104294, 2023

DOI: 10.1109/ACCESS.2023.3317698
Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer (peer-reviewed)

Keichi Takahashi, Soya Fujimoto, Satoru Nagase, Yoko Isobe, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

ISC High Performance, 359-378, 2023

DOI: 10.1007/978-3-031-32041-5_19
A Real-time Flood Inundation Prediction on SX-Aurora TSUBASA (peer-reviewed)

Yoichi Shimomura, Akihiro Musa, Yoshihiko Sato, Atsuhiko Konja, Guoqing Cui, Rei Aoyagi, Keichi Takahashi, Hiroyuki Takizawa

2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 27, 192-197, December 2022
Publisher: IEEE
DOI: 10.1109/hipc56025.2022.00035
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations (peer-reviewed)

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi

2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), September 12, 2022
Publisher: IEEE
DOI: 10.1109/dasc/picom/cbdcom/cy55231.2022.9927975
A SYCL-based high-level programming framework for HPC programmers to use remote FPGA clusters (peer-reviewed)

Satoshi Kaneko, Hiroyuki Takizawa, Kentaro Sano

International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 92-94, June 9, 2022
Publisher: ACM
DOI: 10.1145/3535044.3535058
A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy (peer-reviewed)

Jiaheng Liu, Ryusuke Egawa, Hiroyuki Takizawa

IEICE Transactions on Information and Systems, E105.D (6), 1150-1163, June 1, 2022
Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)
DOI: 10.1587/transinf.2021edp7201

ISSN: 0916-8532

eISSN: 1745-1361
Towards Conflict-Aware Workload Co-execution on SX-Aurora TSUBASA (peer-reviewed)

Riku Nunokawa, Yoichi Shimomura, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

Lecture Notes in Computer Science, 163-174, March 16, 2022
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-96772-7_16

ISSN: 0302-9743

eISSN: 1611-3349
Evaluating the Performance and Conformance of a SYCL Implementation for SX-Aurora TSUBASA (peer-reviewed)

Jiahao Li, Mulya Agung, Hiroyuki Takizawa

Lecture Notes in Computer Science, 36-47, March 16, 2022
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-96772-7_4

ISSN: 0302-9743

eISSN: 1611-3349
A Method for Reducing Time-to-Solution in Quantum Annealing Through Pausing (peer-reviewed)

Michael Ryan Zielewski, Hiroyuki Takizawa

International Conference on High Performance Computing in Asia-Pacific Region, 7, 137-145, January 7, 2022
Publisher: ACM
DOI: 10.1145/3492805.3492815
A Cost Model for Compilers Based on Transfer Learning (peer-reviewed)

Yuta Sasaki, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

IPDPS Workshops, 942-951, 2022

DOI: 10.1109/IPDPSW55747.2022.00152
Automated selection of build configuration based on machine learning (peer-reviewed)

Reo Furuhata, Minglu Zhao, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa

IPDPS Workshops, 934-941, 2022

DOI: 10.1109/IPDPSW55747.2022.00151
Spatiotemporal Anomaly Detection for Large-Scale Sensor Data (peer-reviewed)

Minglu Zhao, Hiroyuki Takizawa, Tomoya Soma

2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), December 10, 2021
Publisher: IEEE
DOI: 10.1109/paap54281.2021.9720310
Portability of Vectorization-aware Performance Tuning Expertise across System Generations (peer-reviewed)

Shunpei Sugawara, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 30, 242-248, December 2021
Publisher: IEEE
DOI: 10.1109/mcsoc51149.2021.00043
A memory bank conflict prevention mechanism for SYCL on SX-Aurora TSUBASA (peer-reviewed)

Wenbin Wang, Jiahao Li, Yohichi Shimomura, Hiroyuki Takizawa

2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), 2, 217-222, November 2021
Publisher: IEEE
DOI: 10.1109/candarw53999.2021.00043
Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA (peer-reviewed)

Yuta Sasaki, Ayumu Ishizuka, Mulya Agung, Hiroyuki Takizawa

2021 IEEE International Parallel & Distributed Processing Symposium Workshops, May 2021
OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA (peer-reviewed)

Hiroyuki Takizawa, Shinji Shiotsuki, Naoki Ebata, Ryusuke Egawa

Parallel Computing, 102, 102754-102754, May 2021
Publisher: Elsevier BV
DOI: 10.1016/j.parco.2021.102754

ISSN: 0167-8191
Evaluation of flood damage reduction throughout Japan from adaptation measures taken under a range of emissions mitigation scenarios (peer-reviewed)

Tao Yamamoto, So Kazama, Yoshiya Touge, Hayata Yanagihara, Tsuyoshi Tada, Takeshi Yamashita, Hiroyuki Takizawa

Climatic Change, 165 (60), April 2021
Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s10584-021-03081-5

ISSN: 0165-0009

eISSN: 1573-1480
Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing (peer-reviewed)

Mulya Agung, Yuta Watanabe, Henning Weber, Ryusuke Egawa, Hiroyuki Takizawa

IEEE Access, 9, 17557-17571, 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/ACCESS.2021.3053162

eISSN: 2169-3536
neoSYCL: a SYCL implementation for SX-Aurora TSUBASA (peer-reviewed)

Yinan Ke, Mulya Agung, Hiroyuki Takizawa

International Conference on High Performance Computing in Asia-Pacific Region, January 2021
Improving Quantum Annealing Performance on Embedded Problems (invited, peer-reviewed)

Zielewski, M.R., Agung, M., Egawa, R., Takizawa, H.

Supercomputing Frontiers and Innovations, 7 (4), December 2020

DOI: 10.14529/jsfi200403

ISSN: 2313-8734 2409-6008
Failure Prediction in Datacenters Using Unsupervised Multimodal Anomaly Detection (peer-reviewed)

Minglu Zhao, Reo Furuhata, Mulya Agung, Hiroyuki Takizawa, Tomoya Soma

The IEEE BigData 2020, the third international conference on the Internet of Things Data Analytics (IoTDA), December 2020
A Conflict-Aware Capacity Control Mechanism for Last-Level Cache (peer-reviewed)

Jiaheng Liu, Ryusuke Egawa, Mulya Agung, Hiroyuki Takizawa

Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020, 416-420, November 1, 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/CANDARW51189.2020.00085
Exploiting the Potentials of the Second Generation SX-Aurora TSUBASA (peer-reviewed)

Ryusuke Egawa, Souya Fujimoto, Tsuyoshi Yamashita, Daisuke Sasaki, Yoko Isobe, Yoichi Shimomura, Hiroyuki Takizawa

The 11th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS'20), November 2020
Improving the accuracy in SpMV implementation selection with machine learning (peer-reviewed)

Reo Furuhata, Minglu Zhao, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

The Eighth International Conference on Computing and Networking Workshops (CANDARW), November 2020
Polymorphic Data Layout for SX-Aurora TSUBASA Vector Engines (peer-reviewed)

Naoki Ebata, Yoko Isobe, Ryusuke Egawa, Hiroyuki Takizawa

The Eighth International Conference on Computing and Networking (CANDAR), November 2020
Automatic Load Balance Tuning of a Flood Simulation Code Using Bayesian Optimization (peer-reviewed)

Ayumu Ishizuka, Takeshi Yamashita, Ryusuke Egawa, Hiroyuki Takizawa, Tao Yamamoto, So Kazama

The 4th cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG2020), July 2020
Quantum Compiler: Automatic Vectorization Assisted by Quantum Annealer (peer-reviewed)

Yuta Sasaki, Michael Ryan Zielewski, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

The ISC High Performance 2020 (poster), June 2020
Challenges in Solving Scheduling Problems with the D-Wave Quantum Annealer (peer-reviewed)

Michael Ryan Zielewski, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

The ISC High Performance 2020 (poster), June 2020
Automatically Avoiding Memory Access Conflicts on SX-Aurora TSUBASA (peer-reviewed)

Naoki Ebata, Ryusuke Egawa, Yoko Isobe, Ryoji Takaki, Hiroyuki Takizawa

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00139
Task Priority Control for the HPX Runtime System (peer-reviewed)

Suhang Jiang, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00137
Comparison of Direct and Indirect Networks for High-Performance FPGA Clusters (peer-reviewed)

Antoniette Mondigo, Tomohiro Ueno, Kentaro Sano, Hiroyuki Takizawa

Applied Reconfigurable Computing. Architectures, Tools, and Applications, 314-329, April 2020
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-44534-8_24

ISSN: 0302-9743

eISSN: 1611-3349
Xevolver: A code transformation framework for separation of system-awareness from application codes (peer-reviewed)

Kazuhiko Komatsu, Ayumu Gomi, Ryusuke Egawa, Daisuke Takahashi, Reiji Suda, Hiroyuki Takizawa

Concurrency and Computation: Practice and Experience, 32 (7), April 2020

DOI: 10.1002/cpe.5577

ISSN: 1532-0626

eISSN: 1532-0634
Online MPI process mapping for coordinating locality and memory congestion on NUMA systems (peer-reviewed)

Agung, M., Amrizal, M.A., Egawa, R., Takizawa, H.

Supercomputing Frontiers and Innovations, 7 (1), 71-90, March 2020
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi200104

ISSN: 2313-8734 2409-6008
Exafsa: Parallel fluid-structure-acoustic simulation

Florian Lindner, Amin Totounferoush, Miriam Mehl, Benjamin Uekermann, Neda Ebrahimi Pour, Verena Krupp, Sabine Roller, Thorsten Reimann, Dörte C. Sternel, Ryusuke Egawa, Hiroyuki Takizawa, Frédéric Simonis

Lecture Notes in Computational Science and Engineering, 136, 271-300, 2020
Publisher: Springer
DOI: 10.1007/978-3-030-47956-5_10

ISSN: 2197-7100 1439-7358
Preliminary Evaluation towards Task Priority Control in HPX (peer-reviewed)

International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2020) (poster), January 2020
Acceleration of Hyper-Parameter Auto-Tuning with Parallelization and Time Constraints (peer-reviewed)

International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2020) (poster), January 2020
An Optimization Technology of Software Auto-Tuning Applied to Machine Learning Software 査読有り

Toshiki Tabeta, Naoto Seki, Akihiro Fujii, Teruo Tanaka, Hiroyuki Takizawa

International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2020) (poster)　2020年1月
DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems 査読有り

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

IEEE Access　8　6937-6953　2020年
出版者・発行元： Institute of Electrical and Electronics Engineers (IEEE)
DOI： 10.1109/ACCESS.2019.2963726 　
An OpenCL-like Offload Programming Framework for SX-Aurora TSUBASA 査読有り

Hiroyuki Takizawa, Shinji Shiotsuki, Naoki Ebata, Ryusuke Egawa

The 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2019)　285-291　2019年12月
Peachy Parallel Assignments (EduHPC 2019)

Mulya Agung, Allen Malony, Hiroyuki Takizawa, David P. Bunde, Muhammad A. Amrizal, Steven Bogaerts, Ryusuke Egawa, Daniel A. Ellsworth, Jorge Fernandez-Fabeiro, Arturo Gonzalez-Escribano, Sukhamay Kundu, Alina Lazar

Proceedings of EduHPC 2019: Workshop on Education for High Performance Computing - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis　75-83　2019年11月1日
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/EduHPC49559.2019.00015 　
An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems 査読有り

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　17-24　2019年9月
Optimization of a gas-particle flow solver on vector supercomputers 査読有り

Yoichi Shimomura, Midori Kano, Takashi Soga, Kenta Yamaguchi, Akihiro Musa, Yusuke Mizuno, Shun Takahashi, Ryusuke Egawa, Hiroyuki Takizawa

The 31st International Conference on Parallel Computational Fluid Dynamics (ParCFD’2019)　1-4　2019年6月
Memory First : A Performance Tuning Strategy Focusing on Memory Access Patterns 査読有り

Naoki Ebata, Ryusuke Egawa, Yoko Isobe, Ryoji Takaki, Hiroyuki Takizawa

The ISC High Performance conference 2019 (poster)　2019年6月
Scaling performance for n-body stream computation with a ring of FPGAs 査読有り

Jens Huthmann, Abiko Shin, Artur Podobas, Kentaro Sano, Hiroyuki Takizawa

The International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART2019)　1-6　2019年6月
Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs 査読有り

Antoniette Mondigo, Tomohiro Ueno, Kentaro Sano, Hiroyuki Takizawa

IEICE Transactions on Information and Systems　E102-D　(5)　1029-1036　2019年5月
DOI： 10.1587/transinf.2018RCP0007 　

ISSN：0916-8532

eISSN：1745-1361
An Energy Optimization Method for Hybrid In-Memory Checkpointing 査読有り

Muhammad Alfian Amrizal, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)(poster)　2019年4月
The Impacts of Locality and Memory Congestion-aware Thread Mapping on Energy Consumption of Modern NUMA Systems 査読有り

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)　2019年4月
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines 査読有り

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　6　(1)　36-47　2019年3月

DOI： 10.14529/jsfi190106 　
Xevolver: A user-defined code transformation approach to streamlining legacy code migration

Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, Ryusuke Egawa

Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project　163-181　2018年12月6日
出版者・発行元： Springer Singapore
DOI： 10.1007/978-981-13-1924-2_9 　
Enhancing memory bandwidth in a single stream computation with multiple FPGAs 査読有り

Antoniette Mondigo, Kentaro Sano, Hiroyuki Takizawa

The 2018 International Conference on Field-Programmable Technology (FPT’18)　2018年12月
Automatic hyperparameter tuning of machine learning models under time constraints 査読有り

Zhen Wang, Agung Mulya, Ryusuke Egawa, Reiji Suda, Hiroyuki Takizawa

IEEE Big Data 2018 Workshop　2018年12月
A Locality and Memory Congestion-aware Thread Mapping Method for Modern NUMA Systems 査読有り

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

ACM/IEEE Supercomputing Conference 2018 (SC18) (poster)　2018年11月
Preconditioner auto-tuning with deep learning for sparse iterative algorithms 査読有り

Kenya Yamada, Takahiro Katagiri, Hiroyuki Takizawa, Kazuo Minami, Mitsuo Yokokawa, Toru Nagai, Masao Ogino

The Sixth International Symposium on Computing and Networking Workshops (CANDARW 2018), LHAM workshop　2018年11月
Investigating the Effects of Dynamic Thread Team Size Adjustment for Irregular Applications 査読有り

Xiong Xiao, Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

The Sixth International Symposium on Computing and Networking (CANDAR 2018)　2018年11月
A Failure Prediction-based Adaptive Checkpointing Method with Less Reliance on Temperature Monitoring for HPC Applications 査読有り

Muhammad Alfian Amrizal, Pei Li, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa

2018 IEEE International Conference on Cluster Computing, FTS workshop　483-491　2018年9月
A machine learning-based approach for selecting SpMV kernels and matrix storage formats 査読有り

Cui, H., Hirasawa, S., Kobayashi, H., Takizawa, H.

IEICE Transactions on Information and Systems　E101D　(9)　2307-2314　2018年9月

DOI： 10.1587/transinf.2017EDP7176 　

ISSN：1745-1361 0916-8532
Expressing the Differences in Code Optimizations between Intel Knights Landing and NEC SX-ACE Processors

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

The 13th World Congress on Computational Mechanics/2nd Pan American Congress on Computational Mechanics　2018年7月
Performance Estimation of Deeply Pipelined Fluid Simulation on Multiple FPGAs with High-speed Communication Subsystem 査読有り

Antoniette Mondigo, Kentaro Sano, Hiroyuki Takizawa

The 29th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2018)　10-12　2018年7月
MIGRATING AN OLD VECTOR CODE TO MODERN VECTOR MACHINES 査読有り

Hiroyuki Takizawa, Kenta Yamaguchi, Takashi Soga, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

30th International Conference on Parallel Computational Fluid Dynamics　2018年4月
反応・相変化を伴う多分散系混相流シミュレーションコードの最適化

佐々木大輔, 加藤季広, 磯部洋子, 笠原弘貴, 渡部広吾輝, 志村啓, 奥野航平, 松尾亜紀子, 江川隆輔, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　51　(1)　47-51　2018年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419

紀要類（bulletin）
Use of Code Structural Features for Machine Learning to Predict Effective Optimizations. 査読有り

Yuki Kawarabatake, Mulya Agung, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa

33rd IEEE International Parallel & Distributed Processing Symposium Workshops(IPDPSW), International Workshop on Automatic Performance Tuning　1049-1055　2018年
出版者・発行元： IEEE Computer Society
DOI： 10.1109/IPDPSW.2018.00163 　
Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems 査読有り

Muhammad Alfian Amrizal, Atsuya Uno, Yukinori Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E100D　(12)　2749-2760　2017年12月

DOI： 10.1587/transinf.2017PAP0002 　

ISSN：1745-1361
Optimizing Energy Consumption on HPC Systems with a Multi-Level Checkpointing Mechanism 査読有り

Muhammad Alfian Amrizal, Hiroyuki Takizawa

2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Proceedings　2017年9月6日
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/NAS.2017.8026868 　
Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE 査読有り

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

JOURNAL OF SUPERCOMPUTING　73　(9)　3948-3976　2017年9月

DOI： 10.1007/s11227-017-1993-y 　

ISSN：0920-8542

eISSN：1573-0484
A customizable auto-tuning scenario with user-defined code transformations 査読有り

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Daisuke Takahashi

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017　1372-1378　2017年6月30日
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/IPDPSW.2017.79 　
機械学習によるコード最適化の可能性

滝沢寛之, 崔航, 平澤将一

計算工学講演会論文集　22　2017年6月
データレイアウト最適化のためのコード変換規則の自動生成

山田剛史, 平澤将一, 須田礼仁, 滝沢寛之

研究報告ハイパフォーマンスコンピューティング（HPC）　2017-HPC-158　(28)　1-8　2017年3月
シナリオテンプレートを用いた自動チューニングに関する研究

佐藤大智, 平澤将一, 滝沢寛之, 小林広明

第79回全国大会講演論文集　2017　(1)　45-46　2017年3月
Toward Dynamic Load Balancing across OpenMP Thread Teams for Irregular Workloads 査読有り

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing　7　(2)　387-404　2017年
出版者・発行元： IJNC編集委員会
DOI： 10.15803/ijnc.7.2_387 　

ISSN：2185-2839

In the field of high performance computing, massively parallel many-core processors such as Intel Xeon Phi coprocessors are becoming popular because they can significantly accelerate various applications. To parallelize applications efficiently for such many-core processors, several high-level programming models have been proposed. The de facto standard programming model, mainly for shared-memory parallel processing, is OpenMP. For hierarchical parallel processing, OpenMP version 4.0 or later allows programmers to create multiple thread teams, each of which contains a set of newly created threads that can synchronize with one another. When multiple thread teams are used to execute an application, dynamic load balancing across thread teams is important, because static load balancing easily leads to load imbalance across teams and thus degrades performance. In this paper, we first motivate our work by clarifying the benefit of using multiple thread teams to execute an irregular workload on a many-core processor. Then, we demonstrate that dynamic load balancing across those thread teams has the potential to significantly improve the performance of irregular workloads on a many-core processor, even when the scheduling overhead is taken into account. Although the current OpenMP specification does not provide such a dynamic load balancing mechanism, its benefits are discussed through experiments using the Intel Xeon Phi coprocessor. We evaluate the performance gain of dynamic load balancing across thread teams using a ray tracing code. The results show that such a mechanism can improve performance by up to 14% compared to static load balancing across teams, with the scheduling overhead taken into account.
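
The contrast between static and dynamic load balancing across teams can be illustrated with a toy model. The sketch below is not the paper's mechanism and does not use OpenMP. It assumes each team processes one task at a time, that static balancing means a contiguous block partition, and that dynamic balancing hands the next task to whichever team becomes free first. The function names, the `overhead` parameter, and the task costs are all hypothetical.

```python
import heapq


def static_makespan(costs, num_teams):
    """Block-partition the tasks up front; return the finish time of the slowest team."""
    chunk = -(-len(costs) // num_teams)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, num_teams, overhead=0.0):
    """Give each task, in order, to the team that becomes free first.

    `overhead` is a hypothetical per-task scheduling cost added to every assignment.
    """
    heap = [(0.0, team) for team in range(num_teams)]
    heapq.heapify(heap)
    for cost in costs:
        finish, team = heapq.heappop(heap)  # earliest-free team
        heapq.heappush(heap, (finish + overhead + cost, team))
    return max(finish for finish, _ in heap)


# Irregular workload: the expensive tasks are clustered at the front.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(costs, 2))                 # one team gets all the heavy tasks
print(dynamic_makespan(costs, 2))                # work is spread as teams become free
print(dynamic_makespan(costs, 2, overhead=0.5))  # same, with a per-task scheduling cost
```

In this made-up workload the dynamic schedule still finishes earlier than the static one after a per-task overhead is charged, which is the kind of trade-off the abstract refers to. The size of the gain depends entirely on the assumed costs.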
A Directive Generation Approach to High Code-Maintainability for Various HPC Systems. 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Int. J. Netw. Comput.　7　(2)　405-418　2017年
Vectorization-aware Loop Optimization with User-defined Code Transformations 査読有り

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)　685-692　2017年

DOI： 10.1109/CLUSTER.2017.102 　

ISSN：1552-5244
Performance and Power Analysis of SX-ACE using HP-X Benchmark Programs 査読有り

Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi, Yoko Isobe, Toshihiro Kato, Souya Fujimoto

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)　693-700　2017年

DOI： 10.1109/CLUSTER.2017.65 　

ISSN：1552-5244
An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning 査読有り

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

2017 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　389-394　2017年

DOI： 10.1109/CANDAR.2017.96 　

ISSN：2379-1888
Designing an Open Database of System-aware Code Optimizations 査読有り

Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa

2017 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　369-374　2017年

DOI： 10.1109/CANDAR.2017.102 　

ISSN：2379-1888
A Memory Congestion-aware MPI Process Placement for Modern NUMA Systems 査読有り

Mulya Agung, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa

2017 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC)　152-161　2017年

DOI： 10.1109/HiPC.2017.00026 　

ISSN：1094-7256
Directive Translation for Various HPC Systems Using the Xevolver Framework 招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2016　109-117　2016年12月

DOI： 10.1007/978-3-319-46735-1_9 　
Making a Legacy Code Auto-tunable without Messing It Up 査読有り

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference 2016 (SC16)　2016年11月
A Power-Performance Tradeoff of HBM by Limiting Access Channels 査読有り

Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of IEEE Symposium on Low-Power and High-Speed Chips　2016年4月
アプリケーション適応型キャッシュリサイズのためのバイパス機構 査読有り

佐藤雅之, 高井拓実, 江川隆輔, 滝沢寛之, 小林広明

電子情報通信学会論文誌　J99-D　(3)　2016年3月
機械学習を用いたコード変換に関する研究

川原畑勇希, 平澤将一, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会講演論文集　2016　227-227　2016年
出版者・発行元：電気関係学会東北支部連合大会実行委員会
DOI： 10.11528/tsjc.2016.0_227 　
ディレクティブに基づくステンシル計算の性能パラメータ自動設定 査読有り

角川拓也, 平澤将一, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム(ACS)　2016年
A Cache Partitioning Mechanism to Protect Shared Data for CMPs 査読有り

Masayuki Sato, Shin Nishimura, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS XIX)　2016年

DOI： 10.1109/CoolChips.2016.7503674 　

ISSN：2473-4683
Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework. 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

Int. J. Netw. Comput.　6　(2)　167-180　2016年
A Code Selection Mechanism Using Deep Learning 査読有り

Hang Cui, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC)　385-392　2016年

DOI： 10.1109/MCSoC.2016.46 　
The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads 査読有り

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　529-535　2016年

DOI： 10.1109/CANDAR.2016.48 　

ISSN：2379-1888
A Directive Generation Approach Using User-defined Rules 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　515-521　2016年

DOI： 10.1109/CANDAR.2016.94 　

ISSN：2379-1888
A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation 査読有り

Yasuharu Hayashi, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　508-514　2016年

DOI： 10.1109/CANDAR.2016.35 　

ISSN：2379-1888
Xevdriver: A software system supporting XML-based source-to-source code transformations on Fortran programs 査読有り

Reiji Suda, Hiroyuki Takizawa

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　522-528　2016年

DOI： 10.1109/CANDAR.2016.113 　

ISSN：2379-1888
Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems 招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015　147-157　2015年12月

DOI： 10.1007/978-3-319-20340-9_12 　
A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning 査読有り

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E98D　(12)　2178-2186　2015年12月

DOI： 10.1587/transinf.2015PAP0028 　

ISSN：1745-1361
An approach to the highest efficiency of the HPCG benchmark on the SX-ACE supercomputer 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference 2015 (SC15)　1-2　2015年11月
Expressing system-awareness as code transformations for performance portability across diverse HPC systems 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

International Workshop on Portability Among HPC Architectures for Scientific Applications 2015　1-6　2015年11月
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON ELECTRONICS　E98C　(7)　550-558　2015年7月

DOI： 10.1587/transele.E98.C.550 　

ISSN：1745-1353
Xevolver による実アプリケーションの性能と保守性の両立

平澤将一, 滝沢寛之, 小林広明

計算工学講演会論文集　20　4p　2015年6月
出版者・発行元：日本計算工学会
Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2014　119-126　2015年
出版者・発行元： Springer International Publishing
DOI： 10.1007/978-3-319-10626-7_10 　
A Data Management Policy for Energy-Efficient Cache Mechanisms

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015　61-75　2015年

DOI： 10.1007/978-3-319-20340-9_6 　
Automatic Parameter Tuning of Hierarchical Incremental Checkpointing 査読有り

Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2014　8969　298-309　2015年

DOI： 10.1007/978-3-319-17353-5_25 　

ISSN：0302-9743
Optimized Data Transfers Based on the OpenCL Event Management Mechanism 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu

SCIENTIFIC PROGRAMMING　2015　(576498)　1-16　2015年

DOI： 10.1155/2015/576498 　

ISSN：1058-9244

eISSN：1875-919X
Combining code refactoring and auto-tuning to improve performance portability of high-performance computing applications 査読有り

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

The Sixth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2015)　20-26　2015年
Identification and elimination of platform-specific code smells in high performance computing applications 査読有り

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing　5　(1)　180-199　2015年
出版者・発行元： IJNC Editorial Committee
DOI： 10.15803/ijnc.5.1_180 　

ISSN：2185-2839

A code smell is a code pattern that might indicate a code or design problem that makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed mainly for object-oriented applications and are rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system that helps users find PSCSs in HPC applications and thereby improve performance portability across different platforms. This paper presents a PSCS alert system that is based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined formally using the AST information represented in XML, and XML Path Language (XPath) is used to describe those patterns. A database is built to store the transformation recipes, written as XSLT files, for eliminating detected PSCSs. The recall and precision results obtained with real applications show that the proposed system can detect potential PSCSs accurately. The evaluation of performance portability with real applications demonstrates that eliminating PSCSs leads to significant performance changes, and therefore the code portions with detected PSCSs have to be refactored to improve performance portability across multiple platforms.
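
The idea of describing a code pattern as an XPath expression over an XML-encoded AST can be sketched as follows. This is only an illustration under stated assumptions: the element and attribute names (`Program`, `Loop`, `Directive`, `vendor`, `line`) are invented for the example and are not the schema used by the paper's system, and the XSLT transformation step is not shown. It uses the XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of an AST; the schema is invented for this sketch.
AST_XML = """
<Program>
  <Loop line="10">
    <Directive vendor="sx" text="vector" line="9"/>
  </Loop>
  <Loop line="20"/>
  <Loop line="30">
    <Directive vendor="omp" text="parallel for" line="29"/>
  </Loop>
</Program>
"""

# A pattern written as an XPath expression: any directive tied to one vendor ("sx" here).
VENDOR_DIRECTIVE_PATTERN = ".//Directive[@vendor='sx']"


def find_pattern_lines(xml_text, xpath):
    """Return the source line numbers of all AST nodes matching the XPath pattern."""
    root = ET.fromstring(xml_text)
    return [int(node.get("line")) for node in root.findall(xpath)]


print(find_pattern_lines(AST_XML, VENDOR_DIRECTIVE_PATTERN))
```

Keeping each pattern as a plain XPath string means new patterns can be added as data, without changing the detection code.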
Xevolver を用いた自動チューニング

平澤将一, 肖熊, 滝沢寛之, 小林広明

計算工学会学会誌「計算工学」　20　(2)　14-17　2015年
An Energy-Efficient Dynamic Memory Address Mapping Mechanism 査読有り

Masayuki Sato, Chengguang Han, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2015 IEEE SYMPOSIUM ON LOW-POWER AND HIGH-SPEED CHIPS　1-3　2015年

DOI： 10.1109/CoolChips.2015.7158660 　
A Verification Framework for Streamlining Empirical Auto-tuning 査読有り

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　508-514　2015年

DOI： 10.1109/CANDAR.2015.115 　

ISSN：2379-1888
Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　515-520　2015年

DOI： 10.1109/CANDAR.2015.102 　

ISSN：2379-1888
Xevtgen: Fortran code transformer generator for high performance scientific codes 査読有り

Reiji Suda, Hiroyuki Takizawa, Shoichi Hirasawa

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　528-534　2015年

DOI： 10.1109/CANDAR.2015.63 　

ISSN：2379-1888
A Case Study of User-Defined Code Transformations for Data Layout Optimizations 査読有り

Takeshi Yamada, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　535-541　2015年

DOI： 10.1109/CANDAR.2015.96 　

ISSN：2379-1888
MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications 査読有り

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E97D　(11)　2835-2843　2014年11月

DOI： 10.1587/transinf.2014EDP7227 　

ISSN：1745-1361
Early evaluation of the SX-ACE processor 査読有り

Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference 2014 (SC14)　1-2　2014年11月
ベクトル型メディアプロセッサの低消費電力化に関する研究

宇野渉, 高也, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会予稿集　2014年8月
キャッシュメモリにおけるスレッド間共有データの管理に関する研究

西村秦, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会予稿集　2014年8月
Exploring system architectures for next-generation CFD simulations in the postpeta-scale era 査読有り

KOMATSU Kazuhiko, EGAWA Ryusuke, TAKIZAWA Hiroyuki, SOGA Takashi, MUSA Akihiro, KOBAYASHI Hiroaki

Journal of Fluid Science and Technology　9　(5)　JFST0073-JFST0073　2014年
出版者・発行元：一般社団法人日本機械学会
DOI： 10.1299/jfst.2014jfst0073 　

ISSN：1880-5558

CFD simulations with uniform grids have attracted attention as next-generation CFD simulations for large-scale supercomputing systems. The Building-Cube Method (BCM) is one such next-generation CFD method. Its basic idea is to balance the computational load among the processing elements of a supercomputing system by dividing the whole calculation into many parallel tasks with the same amount of computation. Thus, it is suitable for highly parallel computation on supercomputing systems. This paper first implements BCM on five supercomputing systems as an example of a next-generation CFD simulation in the upcoming post-peta-scale era. Then, through theoretical analyses and performance evaluations, this paper clarifies the requirements that a next-generation CFD simulation places on future supercomputing systems. The performance evaluations show that, as the number of processing elements increases, the imbalance of data exchanges among nodes becomes more serious than that of calculations, even in a next-generation CFD simulation. While the calculation time can ideally be reduced in proportion to the number of processing elements, the data transfer time becomes dominant in the total execution time. Unlike in a massively parallel system architecture, the number of nodes in a system should be as small as possible to reduce data transfers. The performance analyses also show that the memory bandwidth limits the performance of BCM and that the use of an on-chip memory is effective for improving performance. A memory subsystem that achieves a higher sustained memory bandwidth is required. Therefore, a supercomputing system that consists of a small number of high-performance nodes is essential to achieve high sustained performance for next-generation CFD in the upcoming post-peta-scale era, because it reduces the data transfers that eventually become a bottleneck in large-scale simulation.
A Platform-Specific Code Smell Alert System for High Performance Computing Applications 査読有り

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)　653-662　2014年

DOI： 10.1109/IPDPSW.2014.76 　
On-chip checkpointing with 3D-stacked memories 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 International 3D Systems Integration Conference, 3DIC 2014 - Proceedings　1-6　2014年
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/3DIC.2014.7152173 　
An Energy Optimization Method for Vector Processing Mechanisms 査読有り

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE COOL CHIPS XVII　1-3　2014年

DOI： 10.1109/CoolChips.2014.6842957 　

ISSN：2473-4683
A compiler-assisted OpenMP migration method based on automatic parallelizing information 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　8488　450-459　2014年
出版者・発行元： Springer Verlag
DOI： 10.1007/978-3-319-07518-1_30 　

ISSN：1611-3349 0302-9743
An Approach to Customization of Compiler Directives for Application-Specific Code Transformations 査読有り

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC)　99-106　2014年

DOI： 10.1109/MCSoC.2014.23 　
Xevolver: An XML-based Code Translation Framework for Supporting HPC Application Migration 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Yasuharu Hayashi, Ryusuke Egawa, Hiroaki Kobayashi

2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC)　1-11　2014年

DOI： 10.1109/HiPC.2014.7116902 　

ISSN：1094-7256
Xevolver: an XML-based programming framework for software evolution 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference 2013 (SC13)　1-2　2013年11月
ソフトウェア進化のための自動性能追跡システム 査読有り

平澤将一, 滝沢寛之, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　2013年10月
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E96D　(9)　2047-2054　2013年9月

DOI： 10.1587/transinf.E96.D.2047 　

ISSN：1745-1361
ブロックバイパス機構によるキャッシュのエネルギ効率化に関する研究

高井拓実, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関する「北九州」サマー・ワークショップ (SWoPP2013)　1-9　2013年7月
Performance Evaluation of a Next-Generation CFD on Various Supercomputing Systems

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2012　123-132　2013年
出版者・発行元： Springer Berlin Heidelberg
DOI： 10.1007/978-3-642-32454-3_11 　
Analysing the performance improvements of optimizations on modern HPC systems 査読有り

Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance　13-25　2013年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-319-01439-5_2 　
HPC refactoring with hierarchical abstractions to help software evolution 査読有り

Hiroyuki Takizawa, Ryusuke Egawa, Daisuke Takahashi, Reiji Suda

Sustained Simulation Performance 2012 - Proceedings of the Joint Workshop on High Performance Computing on Vector Systems, and Workshop on Sustained Simulation Performance　27-33　2013年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-32454-3_3 　
Performance evaluation of phase-based correspondence matching on GPUs 査読有り

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXVI　8856　2013年

DOI： 10.1117/12.2023550 　

ISSN：0277-786X

eISSN：1996-756X
複合システムにおけるチェックポイントリスタート 招待有り

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

日本信頼性学会誌 : 信頼性　35　(8)　515　2013年

DOI： 10.11348/reajshinrai.35.8_515 　
A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms 査読有り

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE COOL CHIPS XVI (COOL CHIPS)　1-3　2013年

DOI： 10.1109/CoolChips.2013.6547923 　

ISSN：2473-4683
ClMPI: An opencl extension for interoperation with the message passing interface 査読有り

Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, Wen-Mei W. Hwu

Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013　1138-1148　2013年
出版者・発行元： IEEE Computer Society
DOI： 10.1109/IPDPSW.2013.183 　
A comparison of performance tunabilities between OpenCL and OpenACC 査読有り

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013　147-152　2013年
出版者・発行元： IEEE Computer Society
DOI： 10.1109/MCSoC.2013.31 　
Design and Evaluation of a Media-oriented Vector Processor with a Multi-banked Cache Memory 査読有り

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE 11TH SYMPOSIUM ON EMBEDDED SYSTEMS FOR REAL-TIME MULTIMEDIA (ESTIMEDIA)　78-87　2013年

DOI： 10.1109/ESTIMedia.2013.6704506 　

ISSN：2325-1271
Performance evaluation of BCM on various supercomputing systems 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

The 24th International Conference on Parallel Computational Fluid Dynamics　1-2　2012年11月
Performance Evaluation of BCM on Various Supercomputing Systems 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

In 24th International Conference on Parallel Computational Fluid Dynamics　2012年5月21日
ウェイ適応型キャッシュの高エネルギ効率化のためのデッドブロック早期追い出しポリシ 査読有り

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

先進的計算基盤シンポジウムSACSIS2012　4-5　2012年5月
メタ情報拡散に基づくP2P 型自己組織化サービス資源検索機構 査読有り

稲葉勉, 村田善智, 滝沢寛之, 小林広明

電子情報通信学会論文誌. D　J95-D　(5)　1110-1122　2012年5月1日
出版者・発行元：一般社団法人電子情報通信学会
ISSN：1880-4535

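To realize a resource discovery mechanism for a service-resource sharing platform that targets a large, unspecified number of service resources such as PCs and game consoles, the authors have previously proposed a self-organizing service resource discovery mechanism (SORMS). SORMS rewires the logical links between service resources on the overlay network according to users' usage characteristics, which narrows down the destinations to which queries are forwarded and improves both the number of requested resources discovered and the search efficiency. However, the number of discovered resources and the search efficiency of conventional SORMS are not sufficient for practical use in large-scale computing environments. Conventional SORMS also causes rarely used service resources to become isolated from the network. To further improve the practicality of SORMS, this paper proposes an overlay network reconstruction method that efficiently diffuses the meta-information of service resources through the network based on users' usage characteristics and actively uses it for search, thereby avoiding resource isolation and improving search performance. Simulation-based performance evaluation shows that the proposed mechanism can increase the number of discovered service resources by a factor of about 3.9 and the search efficiency by a factor of about 3.6, avoids the isolation of service resources from the network, and works effectively for the mutual use of service resources.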
A bypass mechanism for way-adaptable caches 査読有り

Takumi Takai, Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XV　2012年4月
Performance and scalability analysis of a chip multi vector processor 査読有り

Yoshiei Sato, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　3-20　2012年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3_1
A prototype implementation of OpenCL for SX vector systems 査読有り

Hiroyuki Takizawa, Ryusuke Egawa, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　41-50　2012年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3_3
Exploring Design Space of a 3D Stacked Vector Cache 査読有り

Ryusuke Egawa, Yusuke Endo, Jubee Tada, Hiroyuki Takizawa, Hiroaki Kobayashi

2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC)　1477-1477　2012年
A capacity-efficient insertion policy for dynamic cache resizing mechanisms 査読有り

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　265-267　2012年

DOI： 10.1145/2212908.2212949 　
A media-oriented vector architectural extension with a high bandwidth cache system 査読有り

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Symposium on Low-Power and High-Speed Chips - Proceedings for 2012 IEEE COOL Chips XV　1-3　2012年

DOI： 10.1109/COOLChips.2012.6216588 　
An out-of-order vector processing mechanism for multimedia applications 査読有り

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　233-235　2012年

DOI： 10.1145/2212908.2212941 　
GPU IMPLEMENTATION OF PHASE-BASED STEREO CORRESPONDENCE AND ITS APPLICATION 査読有り

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012)　1697-1700　2012年

DOI： 10.1109/ICIP.2012.6467205 　

ISSN：1522-4880
Improving the Scalability of Transparent Checkpointing for GPU Computing Systems 査読有り

Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY　1-6　2012年

DOI： 10.1109/TENCON.2012.6412343 　

ISSN：2159-3442
Exploring Design Space of a 3D Stacked Vector Cache 査読有り

Ryusuke Egawa, Yusuke Endo, Hiroyuki Takizawa, Hiroaki Kobayashi, Jubee Tada

2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC)　1475-+　2012年
A Network Clustering Algorithm for Sybil-Attack Resisting 査読有り

Ling Xu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E94D　(12)　2345-2352　2011年12月

DOI： 10.1587/transinf.E94.D.2345 　

ISSN：0916-8532

eISSN：1745-1361
Performance of building cube method on various platforms 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011年11月
An automatic task assignment method for heterogeneous computing systems 査読有り

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011年11月
マイグレーションによる複合型計算システム向けジョブスケジューリング 査読有り

小山賢太郎, 佐藤功人, 小松一彦, 村田善智, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム（ACS）　4　(4)　203-213　2011年10月5日
出版者・発行元：情報処理学会
ISSN：1882-7829

A heterogeneous computing system of general-purpose processors and accelerators is a promising approach to improving system performance under severe power consumption limits. This paper proposes a job scheduling method that uses job migration and preemptive backfilling to reduce the turnaround time of job execution in a large-scale heterogeneous computing system. A prediction model is also proposed to predict the migration cost of a job when the job is submitted. The evaluation results indicate that the prediction model can accurately estimate the worst-case migration costs of most applications from their maximum memory usage. They also demonstrate that the proposed method can reduce the turnaround time of a job in situations where either job migration or backfilling works well, because it combines the advantages of both scheduling policies.
A patch-based bit mask filtering method for micropolygon rasterization 査読有り

Jiali Yao, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

High-Performance Graphics (HPG2011)　2011年8月
Performance of SOR methods on modern vector and scalar processors 査読有り

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　215-221　2011年6月

DOI： 10.1016/j.compfluid.2010.12.024 　

ISSN：0045-7930
Parallel processing of the Building-Cube Method on a GPU platform 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　122-128　2011年6月

DOI： 10.1016/j.compfluid.2010.12.019 　

ISSN：0045-7930
ルーフラインモデルに基づくベクトルプロセッサ向けプログラム最適化戦略 査読有り

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　4　(3)　77-87　2011年5月12日

ISSN：1882-7772
ウェイ適応型キャッシュのための低消費エネルギ指向挿入ポリシ 査読有り

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

先進的計算基盤シンポジウムSACSIS2011　2011　213-214　2011年5月
Power-aware insertion policy for the way-adaptable caches 査読有り

Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XIV　2011年4月
実アプリケーションを用いたチップマルチベクトルプロセッサの消費エネルギ評価

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

情報処理学会研究報告(CD-ROM)　2010　(5)　ROMBUNNO.ARC-192,NO.3　2011年2月15日

ISSN：2186-2583
動的負荷分散機能を持つ高性能ボランティアコンピューティングの実現 査読有り

村田善智, 石杜佑記, 滝沢寛之, 小林広明

情報処理学会論文誌　52　(2)　401-414　2011年2月15日
出版者・発行元：情報処理学会
ISSN：1882-7837
Performance Evaluation of Real-Time Stereo Correspondence on GPU

三浦衛, 札野欽也, 伊藤康一, 青木孝文, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会講演論文集　2011　31-31　2011年
出版者・発行元：電気関係学会東北支部連合大会実行委員会
DOI： 10.11528/tsjc.2011.0_31 　
A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments

Inaba Tsutomu, Takizawa Hiroyuki, Kobayashi Hiroaki

Information and Media Technologies　6　(2)　546-559　2011年
出版者・発行元： Information and Media Technologies 編集運営会議
DOI： 10.11185/imt.6.546 　

Cloud computing and NGN technologies are now driving a paradigm shift in which various services are provided to business users over the network. In conjunction with this movement, many studies are under way to realize a ubiquitous computing environment in which a huge number of individual users can share their computing resources on the Internet, such as personal computers (PCs), game consoles, and sensors. To realize an effective resource discovery mechanism for such an environment, this paper presents an adaptive overlay network that enables a self-organizing resource management system to efficiently adapt to a heterogeneous environment. The proposed mechanism is composed of two functions. One adjusts the number of logical links of a resource, which forward search queries, so that less-useful query flooding can be reduced. The other connects resources so as to decrease the communication latency on the physical network rather than the number of query hops on the overlay network. To further improve the discovery efficiency, this paper integrates these functions into a self-organizing resource management system, SORMS, which was proposed in our previous work. The simulation results indicate that the proposed mechanism can increase the number of discovered resources by 60% without decreasing the discovery efficiency, and can reduce the total communication traffic by 80% compared with the original SORMS. This performance improvement is obtained by efficient control of logical links in a large-scale network.
NVCR: A transparent checkpoint-restart library for NVIDIA CUDA 査読有り

Akira Nukada, Hiroyuki Takizawa, Satoshi Matsuoka

IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum　104-113　2011年

DOI： 10.1109/IPDPS.2011.131 　
Power-aware dynamic cache partitioning for CMPs 査読有り

Isao Kotera, Kenta Abe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　6590　135-153　2011年

DOI： 10.1007/978-3-642-19448-1_8 　

ISSN：0302-9743 1611-3349
OpenCLにおけるタスク並列化支援のための実行時依存関係解析手法 査読有り

佐藤功人, 小松一彦, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム(ACS)　5　(1)　53-67　2011年1月
OpenCLにおけるタスク並列化支援のための実行時依存関係解析手法 査読有り

佐藤功人, 小松一彦, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム(ACS)　4　(5)　2011年
A self-organized overlay network management mechanism for heterogeneous environments 査読有り

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

Journal of Information Processing　19　(0)　25-38　2011年
出版者・発行元： Information Processing Society of Japan
DOI： 10.2197/ipsjjip.19.25 　

ISSN：1882-6652 0387-5806
A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems 査読有り

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011　135-142　2011年

DOI： 10.1109/ISPA.2011.36 　
Effects of 3-D stacked vector cache on energy consumption 査読有り

Ryusuke Egawa, Yusuke Funaya, Ryuichi Nagaoka, Yusuke Endo, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

2011 IEEE International 3D Systems Integration Conference, 3DIC 2011　2011年

DOI： 10.1109/3DIC.2012.6263026 　
CheCL: Transparent checkpointing and process migration of OpenCL applications 査読有り

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011　864-876　2011年

DOI： 10.1109/IPDPS.2011.85 　
A performance tuning strategy under combining loop transforms for a vector processor with an on-chip cache 査読有り

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC10)　2010年11月
Evaluating Performance and Portability of OpenCL Programs 査読有り

Kazuhiko Komatsu, Katsuto Sato, Yusuke Arai, Kentaro Koyama, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 5th international Workshop on Automatic Performance Tuning　1-15　2010年6月
Automatic tuning of CUDA execution parameters for stencil processing 査読有り

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Software Automatic Tuning: From Concepts to State-of-the-Art Results　209-228　2010年
出版者・発行元： Springer New York
DOI： 10.1007/978-1-4419-6935-4_13 　
Lessons Learned from 1-Year Experience with SX-9 and Toward the Next Generation Vector Computing 査読有り

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoko Isobe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009　3-+　2010年

DOI： 10.1007/978-3-642-03913-3_1 　
Cache partitioning strategies for 3-D stacked vector processors 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　1-6　2010年

DOI： 10.1109/3DIC.2010.5751453 　
A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications 査読有り

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

13TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN: ARCHITECTURES, METHODS AND TOOLS　412-415　2010年

DOI： 10.1109/DSD.2010.93 　
Efficient data management for the building cube method using cartesian meshes on the GPU platform 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

International Supercomputing Conference (ISC10)　2010年
A majority-based control scheme for way-adaptable caches 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　6310　16-28　2010年

DOI： 10.1007/978-3-642-16233-6_5 　

ISSN：0302-9743 1611-3349
Resisting sybil attack by social network and network clustering 査読有り

Ling Xu, Satayapiwat Chainan, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010　15-21　2010年

DOI： 10.1109/SAINT.2010.32 　
A Voting-Based Working Set Assessment Scheme for Dynamic Cache Resizing Mechanisms 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN　98-105　2010年

DOI： 10.1109/ICCD.2010.5647599 　

ISSN：1063-6404
Design and early evaluation of a 3-D die stacked chip multi-vector processor 査読有り

Ryusuke Egawa, Yusuke Funaya, Ryu-Ichi Nagaoka, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　1-8　2010年

DOI： 10.1109/3DIC.2010.5751448 　
Performance of hemisphere algorithm for fast form factor calculation 査読有り

Noboru Yamada, Tomoaki Shinoda, Hiroyuki Takizawa

Heat Transfer - Asian Research　38　(7)　450-463　2009年11月

DOI： 10.1002/htj.20259 　

ISSN：1099-2871 1523-1496
キャッシュメモリを有するベクトルプロセッサのためのプログラム最適化手法

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会研究報告(CD-ROM)　2009　(3)　ROMBUNNO.ARC-184,6　2009年10月15日

ISSN：2186-2583
Working Sets based Thread Scheduling with Cache Partitioning 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Poster Abstracts of The Eighteenth International Conference on Parallel Architecture and Compilation Techniques　12　2009年9月
ワーキングセット評価に基づくスレッドスケジューリング

佐藤雅之, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関する「仙台」サマー・ワークショップ (SWoPP仙台2009)　1-10　2009年8月
Early evaluation of a memory-stacked vector processor 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XII　165　2009年4月
A Cache-Aware Thread Scheduling Policy for Multi-Core Processors 査読有り

Masayuki Sato, Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

The IASTED International Conference on Parallel and Distributed Computing and Networks　2009年2月
実アプリケーションによるSX‐9の性能評価

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会シンポジウム論文集　2009　(2)　57-64　2009年1月15日

ISSN：1344-0640
Evaluating Computational Performance of Backpropagation Learning on Graphics Hardware 査読有り

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

Electronic Notes in Theoretical Computer Science　225　(C)　379-389　2009年1月2日

DOI： 10.1016/j.entcs.2008.12.087 　

ISSN：1571-0661
3D On-Chip Memory for the Vector Architecture 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION　352-357　2009年

ISSN：2164-0157
へミスフィア法による形態係数の高速算出性能 査読有り

山田昇, 信田知暁, 滝沢寛之

日本機械学会論文集B　075　(749)　132-139　2009年1月
出版者・発行元：一般社団法人日本機械学会
DOI： 10.1299/kikaib.75.749_132 　

ISSN：0387-5016

Developing fast and accurate algorithms for radiative heat transfer simulation is important for efficient thermal design and simulation in diverse engineering areas. This paper describes the performance of the Hemisphere algorithm, which was originally developed as a fast form factor calculation method in the field of photorealistic three-dimensional computer graphics. We compared the performance of the Hemisphere algorithm with that of two conventional methods that are frequently used in radiative heat transfer simulation. The results show that the Hemisphere algorithm is significantly faster than the conventional methods if an absolute error of 1.0×10⁻⁵ is acceptable. In addition, the results indicate that the Hemisphere algorithm may be suitable for the trial-and-error process of large-scale model simulation owing to its tolerable form factor distribution.
Characteristics of an On-Chip Cache on NEC SX Vector Architecture 査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Interdisciplinary Information Sciences　15　(1)　51-66　2009年1月
出版者・発行元： The Editorial Committee of the Interdisciplinary Information Sciences
DOI： 10.4036/iis.2009.51 　

ISSN：1340-9050

Thanks to their high effective memory bandwidth, vector systems can achieve high computational efficiency for computation-intensive scientific applications. However, they have been encountering the memory wall problem, and the effective memory bandwidth rate has decreased, resulting in a decrease in the bytes per flop rates of recent vector systems from 4 (SX-7 and SX-8) to 2 (SX-8R) and 2.5 (SX-9). The situation is getting worse as many functional units and/or cores will be brought into a single chip, because the pin bandwidth is limited and does not scale. To solve the problem, we propose an on-chip cache, called the vector cache, to maintain the effective memory bandwidth rate of future vector supercomputers. The vector cache employs a bypass mechanism between the main memory and register files under software control. We evaluate the performance of the vector cache on the NEC SX vector processor architecture with bytes per flop rates of 2 B/FLOP and 1 B/FLOP to clarify the basic characteristics of the vector cache. For the evaluation, we use the NEC SX-7 simulator extended with the vector cache mechanism. The benchmark programs for performance evaluation are two DAXPY-like loops and five leading scientific applications. The results indicate that the vector cache boosts the computational efficiencies of the 2 B/FLOP and 1 B/FLOP systems up to the level of the 4 B/FLOP system. In particular, when cache hit rates exceed 50%, the 2 B/FLOP system can achieve performance comparable to the 4 B/FLOP system. The vector cache with the bypass mechanism can provide data from both the main memory and the cache simultaneously. In addition, from the viewpoint of cache design, we investigate the impact of cache associativity on the cache hit rate, and the relationship between cache latency and performance.
The results also suggest that the associativity hardly affects the cache hit rate, and that the effects of the cache latency depend on the vector loop length of applications. A shorter cache latency contributes to the performance improvement of applications with shorter loop lengths, even in the case of the 4 B/FLOP system. For longer loop lengths of 256 or more, the latency can be hidden effectively, and the performance is not sensitive to the cache latency. Finally, we discuss the effects of selective caching using the bypass mechanism and of loop unrolling on the vector cache performance for the scientific applications. Selective caching is effective for efficient use of the limited cache capacity. Loop unrolling is also effective for improving performance, resulting in a synergistic effect with caching. However, there are exceptional cases in which loop unrolling worsens the cache hit rate because of an increase in the working space needed to process the unrolled loops over the cache. In these cases, the increase in the cache miss rate cancels the gain obtained by unrolling.
Performance tuning and analysis of future vector processors based on the roofline model 査読有り

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM International Conference Proceeding Series　7-14　2009年

DOI： 10.1145/1621960.1621962 　
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications 査読有り

Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009)　408-+　2009年

DOI： 10.1109/PDCAT.2009.78 　
Performance Evaluation of NEC SX-9 using Real Science and Engineering Applications 査読有り

Takashi Soga, Akihiro Musa, Youichi Shimomura, Ken'ichi Itakura, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS　2009年

DOI： 10.1145/1654059.1654088 　
Auction-based Resource Allocation for Activating Incentives in Resource Trading in Grid Computing 査読有り

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of The 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications　252-260　2008年12月
Caching on a chip multi vector processor 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC08)　2008年11月
SPRAT:実行時自動チューニング機能を備えるストリーム処理記述用言語 査読有り

滝沢寛之, 白取寛貴, 佐藤功人, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　1　(2)　207-220　2008年8月
A Reliability Model for Result Checking in Volunteer Computing 査読有り

Ling Xu, Hiroyuki Takizawa, Hiroaki Kobayashi

SAINT2008　201-204　2008年7月

DOI： 10.1109/SAINT.2008.25 　
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

情報処理学会論文誌：コンピューティングシステム(ACS)　1　(1)　85-95　2008年4月
大規模計算環境における分散協調型負荷分散手法 査読有り

村田善智, 稲葉勉, 滝沢寛之, 小林広明

情報処理学会論文誌　49　(3)　1214-1228　2008年3月
A Parallel Image Generation Algorithm based on Photon Map Partitioning 査読有り

Masahide Tamura, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 10th IASTED International Conference on Computer Graphics and Imaging (CGIM 2008)　145-151　2008年2月
An Efficient Intersection Algorithm Design of Ray Tracing For Many-Core Graphics Processors 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 10th IASTED International Conference on Computer Graphics and Imaging (CGIM 2008)　165-171　2008年2月
First Experiences with NEC SX-9

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoichi Shimomura

High Performance Computing on Vector Systems　3-11　2008年
出版者・発行元： Springer
DOI： 10.1007/978-3-540-85869-0_1 　
Modeling of cache access behavior based on Zipf's law 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　9-15　2008年

DOI： 10.1145/1509084.1509086 　

ISSN：1089-795X
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination 査読有り

Komatsu Kazuhiko, Kaeriyama Yoshiyuki, Suzuki Kenichi, Takizawa Hiroyuki, Kobayashi Hiroaki

IPSJ Online Transactions　1　(1)　1-11　2008年
出版者・発行元：一般社団法人情報処理学会
DOI： 10.2197/ipsjtrans.1.1 　

ISSN：1882-6660

Although ray tracing is the best approach to high-quality image synthesis, much time is required to generate images due to its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time required for ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm exploits the fact that the rays in a bundle share the same origin and have massive coherence. By using precomputation and early termination to reduce the redundant calculations in the innermost intersection tests for the bundles, the proposed algorithm accelerates the intersection tests. Experimental results show that, by exploiting these features of ray bundles, the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm.
The potential of on-chip memory systems for future vector architectures 査読有り

Hiroaki Kobayashi, Akihiro Musa, Yoshiei Sato, Hiroyuki Takizawa, Koki Okabe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007　247-+　2008年
A Utility-based Double Auction Mechanism for Efficient Grid Resource Allocation 査読有り

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　252-260　2008年

DOI： 10.1109/ISPA.2008.103 　
SPRAT: Runtime Processor Selection for Energy-aware Computing 査読有り

Hiroyuki Takizawa, Katuto Sato, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING　386-393　2008年

ISSN：1552-5244
A Performance Study of Secure Data Mining on the Cell Processor 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS　633-+　2008年

DOI： 10.1109/CCGRID.2008.16 　
Implementation and Evaluation of a Distributed and Cooperative Load-Balancing Mechanism for Dependable Volunteer Computing 査読有り

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS & NETWORKS WITH FTCS & DCC　316-+　2008年

DOI： 10.1109/DSN.2008.4630100 　

ISSN：1530-0889
Consideration of resource access history for optimizing overlay networks in P2P-based resource discovery 査読有り

Tsutomu Inaba, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2008 International Symposium on Applications and the Internet, SAINT 2008　269-272　2008年

DOI： 10.1109/SAINT.2008.104 　
SPRAT: Runtime processor selection for energy-aware computing 査読有り

Hiroyuki Takizawa, Katuto Sato, Hiroaki Kobayashi

Proceedings - IEEE International Conference on Cluster Computing, ICCC　2008　386-393　2008年
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/CLUSTR.2008.4663799 　

ISSN：1552-5244
A shared cache for a chip multi vector processor 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　24-29　2008年

DOI： 10.1145/1509084.1509088 　

ISSN：1089-795X
Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　335-+　2008年

DOI： 10.1109/ISPA.2008.100 　
A Progressive 3-D Meshing Algorithm for Interactive Simulation of Soft Bodies 査読有り

SAOI Tomoyuki, TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

Journal of INFORMATION　10　(6)　761-776　2007年12月
A dependable Peer-to-Peer computing platform 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE　23　(8)　939-955　2007年11月

DOI： 10.1016/j.future.2007.03.004 　

ISSN：0167-739X

eISSN：1872-7115
Early evaluation of on-chip vector caching for the NEC SX vector architecture 査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC07)　2007年11月
An Efficient Control Mechanism for Self-Organizing Overlay Networks of Large-Scale P2P Systems 査読有り

Hiroaki Kobayashi, Hiroyuki Takizawa, Takuro Okawa, Tsutomu Inaba

Interdisciplinary Information Sciences　13　(2)　227-237　2007年9月18日
出版者・発行元：東北大学
DOI： 10.4036/iis.2007.227 　

ISSN：1340-9050

P2P (peer-to-peer) has great potential to handle highly distributed computing resources and is expected to be a key technology for realizing ubiquitous computing environments over the Internet. However, P2P systems tend to waste network bandwidth on resource acquisition because of their decentralized resource management. This paper presents an efficient control mechanism for self-organizing overlay networks of large-scale P2P systems and evaluates its performance in detail. The overlay network is configured by making local clusters reflect the current interests of individual peers and connecting them together based on their similarity. As a result, the overlay network provides the resource exploitation space for some specific interests. In addition, the overlay network can be dynamically reconfigured based on changes in the interests of individual peers over time so that peers that are more useful at that time can be reconnected closer to their client peers. Therefore, multicasting of resource-requesting messages can be carried out only over peers with similar interests that are dynamically connected through the overlay network, resulting in a remarkable decrease in both the number of messages for resource acquisition and the number of hops a resource-requesting query travels to reach the peer that satisfies the request. Experimental results indicate that the proposed mechanism can realize effective self-organization of the overlay network, in which useful peers are dynamically relocated around client peers. In addition, the adaptive allocation of links to peers according to their capability works well to maintain the high performance and fault tolerance of the self-organizing overlay network.
A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 8th MEDEA workshop　121-128　2007年9月16日
SMTプロセッサの実行時性能予測のためのハードウェアリソース競合解析査読有り

佐藤雅之, 船矢祐介, 小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　6　67-70　2007年9月5日
消費電力を考慮したウェイアロケーション型共有キャッシュ機構査読有り

小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　6　55-58　2007年9月5日
Partial distortion entropy maximization for online data clustering 査読有り

Hiroyuki Takizawa, Hiroaki Kobayashi

NEURAL NETWORKS　20　(7)　819-831　2007年9月

DOI： 10.1016/j.neunet.2007.04.029 　

ISSN：0893-6080
An Estimation-Based Redundant Task Dispatch Policy for Volunteer Computing Platforms 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Dependable Systems and Networks　348-349　2007年6月25日

Fast Abstract (Supplemental Volume)
A fair-sharing and power-aware L2 cache system for chip multiprocessors 査読有り

Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips X　2007年4月
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　113-120　2007年

DOI： 10.1145/1327171.1327185 　

ISSN：1089-795X
Preliminary evaluation for runtime auto-tuning of GPGPU applications 査読有り

Hiroyuki Takizawa, Hiroki Shiratori, Hiroaki Kobayashi

The 2nd International Workshop on Automatic Performance Tuning　37-37　2007年
Performance Evaluation of K-Means Clustering on the Cell Processor 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2007)　2007　(1)　161-168　2007年1月
A memory-efficient scheme for fast spectral photon mapping 査読有り

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE NINTH IASTED INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS AND IMAGING　75-80　2007年
An on-chip cache design for vector processors 査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　17-23　2007年

DOI： 10.1145/1327171.1327173 　

ISSN：1089-795X
An estimation-based redundant task dispatch policy for volunteer computing platforms 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

The International Conference on Dependable Systems and Networks　348-349　2007年
P2P型資源検索システムにおける動的論理リンク管理機構査読有り

大川拓郎, 滝沢寛之, 小林広明

情報技術レターズ　5　363-366　2006年9月
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
スレッド特徴量に基づくマルチコアプロセッサスケジューリング査読有り

船矢祐介, 小寺功, 滝沢寛之, 小林広明

情報技術レターズ　5　37-40　2006年9月
Towards Effective GPU Implementation of Neural Networks 査読有り

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

The 4th International Conference on Information-MFCSIT’06　408-411　2006年8月
Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing 査読有り

H Takizawa, H Kobayashi

JOURNAL OF SUPERCOMPUTING　36　(3)　219-234　2006年6月

DOI： 10.1007/s11227-006-8294-1 　

ISSN：0920-8542
Design and Implementation of an Efficient Search Mechanism based on the Hybrid P2P Model for Ubiquitous Computing Systems 査読有り

T Inaba, T Okawa, Y Murata, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET , PROCEEDINGS　45-+　2006年

DOI： 10.1109/SAINT.2006.23 　
A distributed and cooperative load balancing mechanism for large-scale P2P systems 査読有り

Y Murata, T Inaba, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS　126-129　2006年

DOI： 10.1109/SAINT-W.2006.2 　
Radiative heat transfer simulation using programmable graphics hardware 査読有り

Hiroyuki Takizawa, Noboru Yamada, Seigo Sakai, Hiroaki Kobayashi

Proceedings - 5th IEEE/ACIS Int. Conf. on Comput. and Info. Sci., ICIS 2006. In conjunction with 1st IEEE/ACIS, Int. Workshop Component-Based Software Eng., Softw. Archi. and Reuse, COMSAR 2006　2006　29-37　2006年

DOI： 10.1109/ICIS-COMSAR.2006.70 　
Implications of memory performance for highly efficient supercomputing of scientific applications 査読有り

Akihiro Musa, Hiroyuki Takizawa, Koki Okabe, Takashi Soga, Hiroaki Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　4330　845-+　2006年

ISSN：0302-9743
大規模P2Pシステムにおける計算資源探索のモデル化と性能評価査読有り

大川拓郎, 滝沢寛之, 小林広明

情報科学技術フォーラム(FIT2005)情報技術レターズ　4　21-24　2005年9月
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
HPC Challengeベンチマークを用いたSX-7システムの性能評価査読有り

滝沢寛之, 小久保達信, 片海健亮, 小林広明

情報処理学会論文誌　46　(SIG 12(ACS 11))　37-45　2005年8月

Also presented at SACSIS2005 (May 2005)
An Incremental Photon-Mapping Algorithm for Fast Walk-Through Animations 査読有り

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

Computer Graphics and Imaging (CGIM 2005)　2005年8月
Locality Analysis to Control Dynamically Way-Adaptable Caches 査読有り

KOBAYASHI Hiroaki, KOTERA Isao, TAKIZAWA Hiroyuki

ACM SIGARCH Computer Architecture News　33　(3)　25-32　2005年6月

DOI： 10.1145/1101868.1101874 　
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET 査読有り

TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

Journal of INFORMATION　8　(3)　383-390　2005年5月
HPC Challengeベンチマークを用いたSX-7システムの性能評価査読有り

滝沢寛之, 小久保達信, 片海健亮, 小林広明

先進的計算基盤システムシンポジウム(SACSIS2005)　2005　(5)　25-33　2005年5月
P2Pコンピューティングのための分散協調スケジューリング機構

村田善智, 稲葉努, 滝沢寛之, 小林広明

先端的ネットワーク＆コンピューティングテクノロジワークショップ　(33)　23-30　2005年1月24日
A self-organizing overlay network to exploit the locality of interests for effective resource discovery in P2P systems 査読有り

H Kobayashi, H Takizawa, T Inaba, Y Takizawa

2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS　246-255　2005年
A P2P Semantic Information Search Mechanism for Ubiquitous Grid Computing Systems

Tsutomu Inaba, Takuro Okawa, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク＆コンピューティングテクノロジワークショップ　(33)　45-52　2005年1月
A workflow management mechanism for peer-to-peer computing platforms 査読有り

H Wang, H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　3758　827-832　2005年

ISSN：0302-9743
Efficient parallel processing of competitive learning algorithms 査読有り

K Sano, S Momose, H Takizawa, H Kobayashi, T Nakamura

PARALLEL COMPUTING　30　(12)　1361-1383　2004年12月

DOI： 10.1016/j.parco.2004.10.001 　

ISSN：0167-8191

eISSN：1872-7336
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET 査読有り

TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

The 3rd International Conference on Information (INFO'2004)　3　2004年11月
スーパーSINETを介した大規模遠隔対話的可視化の評価実験

滝沢寛之, 小林広明

全国共同利用情報基盤センター研究開発論文集　26　24-29　2004年11月
An Effective Control Mechanism for Way-Adaptable Caches

KOTERA Isao, TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

電気関係学会東北支部連合大会　2004年8月
スーパーSINETを利用した大規模遠隔可視化処理の評価

滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　90-96　2004年6月
出版者・発行元：東北大学情報シナジーセンター
グリッドミドルウェアGlobusの資源探索と通信に関するオーバヘッドの定量的評価

村田善智, 稲葉勉, 滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　115-123　2004年6月
出版者・発行元：東北大学情報シナジーセンター
An Effective Implementation of Vector Quantization Encoder on Commodity Graphics Hardware 査読有り

Hiroyuki TAKIZAWA, Hiroaki KOBAYASHI

Proceedings of the 2nd International Conference on Information Technology and Applications(ICITA2004)　2004年1月
A fast computation scheme of partial distortion entropy updating 査読有り

H Takizawa, F Kobayashi

ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS　1　736-741　2004年

DOI： 10.1109/ITCC.2004.1286555 　
Multi-grain parallel processing of data-clustering on programmable graphics hardware 査読有り

H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS　3358　(3358)　16-27　2004年

ISSN：0302-9743
グリッド用動的資源管理のための自己組織化P2Pネットワークに関する一検討

瀧澤泰明, 滝沢寛之, 佐野健太郎, 小林広明, 中村維男

情報処理学会東北支部研究会　2003年11月
画像のエッジ劣化を抑制するベクトル量子化符号帳設計査読有り

滝沢寛之, 三浦健, 小林広明, 中村維男

FIT2003 情報科学技術フォーラム情報技術レターズ　2　(2)　243-244　2003年9月
Vector quantization codebook design using the law-of-the-jungle algorithm 査読有り

H Takizawa, T Nakajima, K Sano, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E86D　(6)　1068-1077　2003年6月

ISSN：0916-8532
A Comparison Study Of Vector Quantization Codebook Design Algorithms Based On The Equidistortion Principle 査読有り

Hiroyuki Takizawa, Taira Nakajima, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the 21st IASTED International Multi-Conference on Applied Informatics(AI2003)　255-261　2003年3月
A Decision Criterion to Relocate Codewords for Adaptive Vector Quantization 査読有り

H. Takizawa

Proceedings of the 21st IASTED International Multi-Conference on Applied Informatics(AI2003)　262-268　2003年2月
Parallel Algorithm for the Law-of-the-Jungle Learning to the Fast Design of Optimal Codebooks 査読有り

Kentaro Sano, Shintaro Momose, Hiroyuki Takizawa, Taira Nakajima, Clecio Donizete Lima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the 14th IASTED International Conference on Parallel and Distributed Computing and Systems(PDCS2002)　723-728　2002年11月
Practical Volume Compression based on Vector Quantization using the Law-of-the-Jungle Algorithm 査読有り

Kentaro Sano, Hiroyuki Takizawa, Taira Nakajima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the 2nd IASTED International Conference on Visualization, Imaging and Image Processing(VIIP2002)　519-526　2002年9月
視覚的画質劣化を抑制するベクトル量子化手法査読有り

三浦健, 滝沢寛之, 佐野健太郎, 中島平, 小林広明, 中村維男

情報科学技術フォーラム(FIT) Information Technology Letters　185-186　2002年9月
ベクトル量子化のためのコードブック生成並列処理に関する研究

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男

並列/協調/分散処理に関する「湯布院」サマーワークショップ資料　2002年8月
新潟大学総合情報処理センターコンピュータシステムの更新

滝沢寛之

新潟大学総合情報処理センター年報　(13)　21-27　2002年3月
PC-UNIX導入時の不正アクセス対策

滝沢寛之

新潟大学総合情報処理センター年報NIICE　12　(12)　13-19　2001年3月
出版者・発行元：新潟大学
An active learning algorithm based on existing training data 査読有り

H Takizawa, T Nakajima, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E83D　(1)　90-99　2000年1月

ISSN：0916-8532
A topology preserving neural network for nonstationary distributions 査読有り

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(7)　1131-1135　1999年7月

ISSN：0916-8532
A self-organizing network system forming memory from nonstationary probability distributions 査読有り

T. Nakajima, H. Takizawa, H. Kobayashi, T. Nakamura

Proceedings of IJCNN99　1999年7月
Acceleration techniques for the network inversion algorithm 査読有り

H Takizawa, T Nakajima, M Nishi, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(2)　508-511　1999年2月

ISSN：0916-8532
ニューラルネットワーク(クロストークリンク付きBPD)のFSK復調への応用査読有り

西正明, 降谷順治, 滝沢寛之, 中村維男

日本産業技術教育学会誌　41　(1)　9-16　1999年1月
Kohonen learning with a mechanism, the law of the jungle, capable of dealing with nonstationary probability distribution functions 査読有り

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E81D　(6)　584-591　1998年6月

ISSN：0916-8532
ウエーブレット変換を用いた顔画像処理に関する一考察

飯村海児, 滝沢寛之, 中島平, 小林広明, 中村維男

電気関係学会東北支部連合大会　1998年
多層パーセプトロンの分類能力向上法に関する一検討査読有り

滝沢寛之, 中島平, 小林広明, 中村維男

電子情報通信学会論文誌　J80-D-II　(1)　390-393　1997年1月
Facial expression recognition using neural networks capable of recognizing at an infant level 査読有り

T. Nakajima, H. Takizawa, M. Simamura, H. Kobayashi, T. Nakamura

Proceedings of WAIMH 6th Congress　66-0　1996年7月
ニューラルネットワークの最適学習法に関する一考察

滝沢寛之, 中島平, 小林広明, 中村維男

情報処理学会東北支部連合大会　1996年
ニューラルネットワークを利用した自動感情認識システム

中島平, 滝沢寛之, 島村三重子, 小林広明, 中村維男

電子情報通信学会ソサイエティ大会　1995年
ニューラルネットワークを用いた顔画像認識について

滝沢寛之, 中島平, 島村三重子, 小林広明, 中村維男

電気関係学会東北支部連合大会　1995年

MISC 60

外気環境を考慮した1-100GHz帯の人体全身ばく露における深部温度上昇評価

億田龍太朗, 小寺紗千子, 滝沢寛之, 平田晃正

電子情報通信学会技術研究報告(Web)　124　(357(EST2024 94-122))　2025年

ISSN： 2432-6380
8都道府県における熱中症搬送人員数予測

高田旭登, 江川隆輔, 滝沢寛之, 平田晃正

電子情報通信学会大会講演論文集(CD-ROM)　2022　2022年

ISSN： 1349-144X
短期暑熱順化を考慮した高齢者の熱中症搬送人員数予測

西村卓, 小寺紗千子, 滝沢寛之, 江川隆輔, 平田晃正

電子情報通信学会大会講演論文集(CD-ROM)　2020　2020年

ISSN： 1349-144X
ベクトルプロセッサからFPGAへのタスクオフロードに関する一考察

土方康平, 上野知洋, 江川隆輔, 滝沢寛之, 佐野健太郎

電子情報通信学会技術研究報告　119　(371(VLD2019 54-93))　2020年

ISSN： 0913-5685
RDMAを用いた密結合FPGAクラスタのメモリ間通信性能

上野知洋, 佐野健太郎, 土方康平, 滝沢寛之

電子情報通信学会技術研究報告　119　(18(RECONF2019 1-19)(Web))　2019年

ISSN： 0913-5685
HPGMG-FVを用いたSX-ACEの性能評価

江川隆輔, 磯部洋子, 加藤季広, 小松一彦, 滝沢寛之, 小林広明, 撫佐昭裕

SENAC : 東北大学大型計算機センター広報　50　(3)　15-18　2017年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
Xevolverによる大気・海洋結合マルチスケールモデルMSSGの性能最適化コード管理の評価

板倉憲一, 小松一彦, 江川隆輔, 滝沢寛之

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集　(2017)　12-12　2017年5月29日
計算科学・計算機科学人材育成のためのスーパーコンピュータ無償提供利用報告情報科学研究科超高速情報処理論利用報告

滝沢寛之, 江川隆輔, 後藤英昭

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　50　(3)　23-27　2017年
SX-ACEにおけるHPCG ベンチマークの性能評価

小松一彦, 江川隆輔, 磯部洋子, 緒方隆盛, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　48　(3)　14-19　2015年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
東北大学サイバーサイエンスセンター高速化推進研究活動報告書（第6号）

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 小松一彦, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修他

2015年4月
Xevolverを用いた自動チューニング (特集エクサスケール時代に向けた数値計算処理の自動チューニングの進展)

平澤将一, 肖熊, 滝沢寛之

計算工学　20　(2)　3258-3261　2015年
出版者・発行元：日本計算工学会
ISSN： 1341-7622
Xevolverを用いた自動チューニング

平澤将一, 肖熊, 滝沢寛之, 小林広明

計算工学会学会誌「計算工学」　20　(2)　14-17　2015年
Heuristic Data Partitioning for Social Networking Service

Sugianto Angkasa, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(34)　1-8　2013年12月9日

Managing SNS data is expensive because SNS data grow explosively and are highly interconnected. Because of this high interconnectivity, every read/write activity of a user is associated with all of his/her friends. The response time for accessing SNS data generally increases if the data of users and their many connections (friends/followers) are widely distributed over the network. Most SNS providers are commercial companies and hence need a cost-effective solution for SNS data management. In this paper, we propose a heuristic data partitioning mechanism that stores all related data of a pair of users in the same place if they interact frequently. Moreover, our mechanism uses activity-based replication: more replicas are created for active users than for inactive users. In a performance evaluation against MySQL random partitioning using real Facebook and Twitter datasets, the proposed heuristic data partitioning and replication mechanism reduces the average response time of read and write accesses by 53% and 50%, respectively.
マルチプラットフォームにおける最適化手法の効果に関する一検討

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(24)　1-7　2013年7月24日
出版者・発行元：一般社団法人情報処理学会

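In recent years, HPC systems have become increasingly diverse, and there is strong demand for performance-portable HPC codes that can achieve high performance on multiple kinds of HPC systems with different characteristics. This study aims to analyze in detail the effects that optimization techniques for various HPC systems have on the performance of HPC codes, and to develop highly performance-portable HPC codes based on the findings. This report analyzes the performance portability of HPC codes when different manual optimizations are combined with one another or with automatic optimizations. For each HPC system, we evaluate the synergistic effects of these combinations and discuss optimizations that may degrade performance portability.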
チューニング対象の限定による効率の良い性能可搬性向上手法

平澤将一, 秋葉諒, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(19)　1-8　2013年5月22日
出版者・発行元：一般社団法人情報処理学会

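With the diversification of computing systems, existing scientific computing programs often need to be ported to new computing systems and have their performance optimized. However, porting and optimizing large-scale scientific computing programs requires enormous effort, which has become a problem. This study proposes a method that narrows down the parts of the source code that should be optimized with priority when the goal is to improve performance portability, thereby efficiently improving the performance portability of the whole application. Evaluation with benchmark programs and a real application shows that the proposed method can narrow down the parts of the source code that should be optimized in order to efficiently improve the performance portability of the whole application.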
Message from the chairs of iWAPT 2012

Hiroyuki Takizawa, Richard Vuduc, Takeshi Iwashita

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　7851　2013年

DOI： 10.1007/978-3-642-38718-0 　

ISSN： 0302-9743 1611-3349
複合システムにおけるチェックポイントリスタート

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

日本信頼性学誌　35　(7)　2013年

DOI： 10.11348/reajshinrai.35.8_515 　
統合開発環境と連携するポータブルなビルドシステム

平澤将一, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(28)　1-8　2012年9月26日

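In this study, we develop a portable build system as a step toward a framework for developing applications while maintaining performance portability. The configurations of current high-performance computing (HPC) systems are becoming increasingly complex, and it is difficult to predict the sustained performance of an application without running it. We therefore envision supporting the maintenance of performance portability by periodically running the application under development, implicitly collecting its performance profile, identifying the parts with low performance portability, and presenting them interactively to the programmer. Realizing such an application development support tool requires the ability to implicitly build and run the application under development on various systems. In this study, we develop a build system with such portability and discuss the functions needed in an application development support environment.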
ナノ粒子群形成アプリケーションのOpenACCによる実装と性能評価

菅原誠, 小松一彦, 平澤将一, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(10)　1-7　2012年9月26日

This paper presents an implementation of the plasma-assisted nanopowder growth simulation with OpenACC. OpenACC provides compiler directives that allow an existing application to use GPUs. On the other hand, OpenCL is a lower-level programming model. Since OpenACC and OpenCL offer programming models of different abstraction levels, they require different optimizations for a given application code. Therefore, in this paper, several versions of a practical application, the nanopowder growth simulation, are implemented using different optimizations. Then, the performance impact of each optimization is discussed through experimental results. The evaluation results show that OpenACC and OpenCL can achieve up to 1.9x and 5.6x performance improvements over CPU execution, respectively. It is also demonstrated that the current version of OpenACC requires low-level performance tuning such as OpenCL programming in order to achieve performance comparable with OpenCL.
大規模計算システムにおけるBCMの性能評価

小松一彦, 曽我隆, 江川隆輔, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　45　(3)　17-25　2012年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
プログラム自動生成技術に基づくGPUコンピューティングの性能評価

菅原誠, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2011　(18)　1-7　2011年7月20日

Recently, heterogeneous computing systems that achieve high-performance computing by using Graphics Processing Units (GPUs) as accelerators have drawn much attention in computational science. However, a problem in using GPUs is that an existing program must be ported to a program for GPUs. To reduce the porting effort, this paper focuses on the technology to automatically generate a GPU program by inserting directives into an existing sequential code and evaluates the sustained performance of the auto-generated program. In addition, we show the code optimizations achievable by using directives. A simple matrix multiplication program is used for the evaluation to demonstrate that the automatically generated code can achieve a high sustained performance.
ボランティアコンピューティングにおける締切時間を考慮したクライアントレベルスケジューリング手法

村田善智, 遠藤聡明, 江川隆輔, 滝沢寛之, 小林広明

先進的計算基盤システムシンポジウム論文集　2011　45-54　2011年5月18日
ルーフラインモデルに基づくベクトルプロセッサ向けプログラム最適化戦略

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会論文誌コンピューティングシステム（ACS）　4　(3)　77-87　2011年5月12日

ISSN： 1882-7829

Over the last decade, the ratio of memory bandwidth to computational performance (Bytes/Flop, B/F) of vector processors has decreased. To compensate for the insufficient B/F, modern vector processors are equipped with an on-chip vector cache. The purpose of this work is to establish a performance tuning strategy to exploit the potential of modern vector processors. When several tuning techniques are applied to an application, there is an explicit trade-off between individual tuning techniques. Therefore, a tuning strategy that finds a good trade-off between individual tuning techniques is required. In this paper, a tuning strategy based on the roofline model for modern vector processors is proposed. We focus on two important loop transformations: loop unrolling and cache blocking. To decide which of the two is performed first, the roofline model is employed to analyze the performance bottleneck of a target application. Then, the optimization that is effective in removing the bottleneck is applied to the application preferentially. To determine the number of loop unrolls and the cache blocking size, we employ a greedy search algorithm. The superiority of the strategy is evaluated with several applications. The evaluation results show that the strategy can improve the performance and also drastically reduce the energy consumption.
東北大学サイバーサイエンスセンター高速化推進研究活動報告書（第5号）

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 伊藤英一, 大泉健治, 小野敏, 小久保達信, 橋本ユキ子, 磯部洋子, 撫佐昭裕, 神山典, 金野浩伸

2011年4月
チップマルチベクトルプロセッサのためのプログラム最適化技術

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　44　(2)　29-36　2011年4月
A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments (特集分散処理とネットワークサービス) -- (P2Pネットワーク)

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

情報処理学会論文誌　52　(2)　320-333　2011年2月15日
出版者・発行元：情報処理学会
ISSN： 1882-7764

The technologies of Cloud Computing and NGN are now driving a paradigm shift in which various services are provided to business users over the network. In conjunction with this movement, many studies are under way to realize a ubiquitous computing environment in which a huge number of individual users can share their computing resources on the Internet, such as personal computers (PCs), game consoles, sensors and so on. To realize an effective resource discovery mechanism for such an environment, this paper presents an adaptive overlay network that enables a self-organizing resource management system to efficiently adapt to a heterogeneous environment. The proposed mechanism is composed of two functions. One is to adjust the number of logical links of a resource, which forward search queries, so that less-useful query flooding can be reduced. The other is to connect resources so as to decrease the communication latency on the physical network rather than the number of query hops on an overlay network. To further improve the discovery efficiency, this paper integrates these functions into a self-organizing resource management system, SORMS, which has been proposed in our previous work. The simulation results indicate that the proposed mechanism can increase the number of discovered resources by 60% without decreasing the discovery efficiency, and can reduce the total communication traffic by 80% compared with the original SORMS. This performance improvement is obtained by efficient control of logical links in a large-scale network.
実アプリケーションを用いたチップマルチベクトルプロセッサの消費エネルギ評価

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2010　(3)　1-8　2010年12月9日
出版者・発行元：情報処理学会
ISSN： 1884-0930

High-performance computing using vector supercomputers has been shown to be effective for scientific simulations. However, the memory system of a vector supercomputer consumes considerable energy to sustain high memory bandwidth. To achieve high sustained performance and low energy consumption, a chip multi-vector processor (CMVP) has been proposed. However, a CMVP has not been evaluated from the point of view of energy consumption. Therefore, we evaluate the energy consumption of a CMVP. First, we establish an energy consumption model of a CMVP to analyze the energy consumption. Then, we evaluate the energy consumption to compare several designs with varying hardware parameters.
An Out-of-order Vector Processing Mechanism for Multimedia Applications (計算機アーキテクチャ(ARC)) -- (プロセッサアーキテクチャ)

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

研究報告計算機アーキテクチャ（ARC）　2010　(24)　1-10　2010年7月27日
出版者・発行元：情報処理学会
ISSN： 0919-6072

Nowadays, multimedia applications (MMAs) form an important workload for general-purpose processors. Vector processing is considered the most promising approach for MMAs because of the abundant data-level parallelism in them. However, traditional vector architectures obey an in-order issue policy (IIP). The IIP blocks subsequent instructions from being issued, regardless of whether they are ready to be issued. This paper proposes a media-oriented vector architectural extension with an out-of-order vector processing mechanism (OVPM). The OVPM overcomes the inefficient utilization of the memory bandwidth and vector functional units. As a result, the proposed architecture achieves higher performance with lower hardware cost than the traditional one. This paper evaluates the proposed architecture with architectural design parameters and finds the most efficient size for the vector architecture when performing MMAs.
OpenCLによるGPUコンピューティングの性能評価

荒井勇亮, 佐藤功人, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2010　(11)　1-7　2010年2月15日
出版者・発行元：情報処理学会
ISSN： 0919-6072

Recently, a new open programming standard for GPGPU programming, OpenCL, has become available in addition to CUDA. In this paper, we quantitatively evaluate the performance of CUDA and OpenCL programs. First, we develop some CUDA and OpenCL programs that perform almost the same computations and compare their performance. Then, we investigate the main factor causing their performance differences. As a result, it is shown that the current OpenCL compiler does not support several compiler optimizations that are used in the CUDA compiler. Our evaluation results also show that OpenCL programs can achieve performance comparable with CUDA programs if the codes generated by the OpenCL compiler are manually optimized in the same way as by the CUDA compiler. Therefore, these results suggest that OpenCL codes simply translated from CUDA codes can achieve the same performance as the original CUDA codes if the OpenCL compiler supports those optimizations.
CUDAアプリケーション向けチェックポイント・リスタート機能の実装と評価

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

情報処理学会研究報告. [ハイパフォーマンスコンピューティング]　122　(7)　G1-G7　2009年10月9日
出版者・発行元：情報処理学会
ISSN： 0919-6072

In this paper, a tool named CheCUDA is designed to enable checkpoint/restart of CUDA applications. To allow an existing checkpoint/restart implementation to checkpoint CUDA applications, CheCUDA is developed as an add-on package that works at each CUDA API call to record the GPU status changes in the main memory. This paper demonstrates that our prototype implementation of CheCUDA can correctly checkpoint and restart some CUDA applications. It is also shown that CheCUDA can restart a CUDA process from a checkpoint file generated on another PC with a different environment. Accordingly, CheCUDA is useful not only to enhance the dependability of CUDA applications but also to attain task migration of CUDA applications. This paper also quantitatively evaluates the timing overhead of checkpointing.
RC-008 ボランティアコンピューティングの高効率化ためのクライアントレベルスケジューリング(ハードウェア・アーキテクチャ,査読付き論文)

村田善智, 遠藤聡明, 滝沢寛之, 小林広明

情報科学技術フォーラム講演論文集　8　(1)　165-172　2009年8月20日
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
C-024 An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer to Peer Environment

Satayapiwat Chainan, Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

情報科学技術フォーラム講演論文集　8　(1)　491-494　2009年8月20日
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会

Recently, many market-based approaches have been studied as promising alternatives for the resource allocation problem. In particular, auction-based approaches are widely chosen because of their distributed nature and relatively low complexity. However, employing an auction to allocate jobs is suitable only for homogeneous resource environments. This paper proposes an auction-based resource allocation mechanism that enables resource allocation in a heterogeneous environment while minimizing the user's inputs. Our preliminary results show that the mechanism improves the performance of important jobs under high load.
C-023 プロセッサ自動選択機能を有するBLASの実現に向けた性能評価(ハードウェア・アーキテクチャ,一般論文)

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

情報科学技術フォーラム講演論文集　8　(1)　485-490　2009年8月20日
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
キャッシュメモリを有するベクトルプロセッサのためのプログラム最適化手法

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

研究報告計算機アーキテクチャ（ARC）　2009　(6)　1-10　2009年7月28日
出版者・発行元：情報処理学会
ISSN： 0919-6072

In recent vector processors, the memory bandwidth relative to computational performance (B/F) has been decreasing, raising concerns that execution efficiency will degrade. To mitigate the effect of the lower B/F, equipping vector processors with a cache memory that has high memory bandwidth has been studied and shown to be effective. The purpose of this report is to establish program optimization techniques that further exploit the performance of this cache. To analyze the relationship between the cache and the sustained performance, we first build a performance model of vector processors with cache memory based on the roofline model. We then apply program optimizations to real applications and evaluate their effects using the performance model.
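As a reminder of the model this abstract builds on, the basic roofline bound caps attainable performance at the smaller of peak compute and bandwidth times operational intensity. The sketch below shows only that basic bound. The machine numbers are hypothetical, and the report's own cache-aware model is not reproduced here.

```python
def bytes_per_flop(bandwidth_gbs, peak_gflops):
    """B/F: memory bandwidth relative to peak compute performance."""
    return bandwidth_gbs / peak_gflops


def roofline(peak_gflops, bandwidth_gbs, flop_per_byte):
    """Attainable GFLOP/s under the basic roofline bound."""
    return min(peak_gflops, bandwidth_gbs * flop_per_byte)


# Hypothetical machine parameters, not taken from the report.
PEAK_GFLOPS = 100.0
MEMORY_GBS = 50.0   # main-memory bandwidth
CACHE_GBS = 200.0   # assumed higher bandwidth when data is served from cache

for intensity in (0.25, 0.5, 2.0):
    from_memory = roofline(PEAK_GFLOPS, MEMORY_GBS, intensity)
    from_cache = roofline(PEAK_GFLOPS, CACHE_GBS, intensity)
    print(intensity, from_memory, from_cache)
```

With these made-up numbers, a kernel at 0.5 flop/byte is held to 25 GFLOP/s by main memory but reaches the 100 GFLOP/s compute ceiling when its data is served at the assumed cache bandwidth.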
SX-9による大規模並列シミュレーション(3.2 第7回情報シナジー研究会, 3. 研究活動報告)

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明, 高橋俊, 中橋和博

年報　8　88-93　2009年7月
出版者・発行元：東北大学サイバーサイエンスセンター
科学技術計算におけるソフトウェア自動チューニング:<ソフトウェア自動チューニング技術の応用>10.GPUコンピューティングにおけるソフトウェア自動チューニング

滝沢寛之

情報処理　50　(6)　527-531　2009年6月15日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0447-8053
ソフトウェア自動チューニング技術の応用：GPUコンピューティングにおけるソフトウェア自動チューニング

滝沢寛之

情報処理学会誌　50　(6)　527-531　2009年6月15日
創造工学研修の実施報告 ― スパコンを使って計算科学・計算機科学のおもしろさを体験 ―

滝沢寛之, 江川隆輔, 笹尾泰洋, 佐野健太郎, 山本悟, 小林広明

東北大学サイバーサイエンスセンター大規模科学計算システム広報SENAC　42　(2)　87-90　2009年2月
624 消費エネルギを考慮したGPUコンピューティングの検討(OS3.GPGPUコンピューティング(3),オーガナイズドセッション)

滝沢寛之, 佐藤功人, 小林広明

計算力学講演会講演論文集　2008　(21)　558-559　2008年11月1日
出版者・発行元：一般社団法人日本機械学会
ISSN： 1348-026X
RC-006 ウェイアロケーション型共有キャッシュ機構のハードウェア設計に関する研究(ハードウェア・アーキテクチャ,査読付き論文)

阿部健太, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

情報科学技術フォーラム講演論文集　7　(1)　35-38　2008年8月20日
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
GPUを効率的に利用するための言語拡張と自動最適化手法

佐藤功人, 滝沢寛之, 小林広明

情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）　2008　(74)　199-204　2008年7月29日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

Although GPUs are high-performance graphics processors, they are also highly effective at accelerating general-purpose computation and have been used in various applications. Because of their unique hardware architecture, various constraints must be satisfied to obtain high computational performance from GPUs. We have proposed SPRAT, a programming language for hybrid systems equipped with heterogeneous processors such as CPUs and GPUs, but achieving both program portability and execution efficiency requires language extensions tailored to processor characteristics and their automatic optimization. This paper proposes techniques for automatically optimizing code for GPUs by exploiting shared memory and reducing the impact of misalignment, and shows that adjusting memory accesses improves the sustained performance of edge detection and LU decomposition.
次世代ベクトルプロセッサのためのキャッシュ機構に関する一考察(3.2 第6回情報シナジー研究会, 3. 研究活動報告)

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

年報　7　89-93　2008年7月
出版者・発行元：東北大学サイバーサイエンスセンター
GPUコンピューティングのためのストリーム処理記述言語

滝沢寛之, 佐藤功人, 小林広明

可視化情報学会誌. Suppl.　28　(1)　271-274　2008年7月1日
出版者・発行元：可視化情報学会
ISSN： 0916-4731
ベクトルプロセッサ用キャッシュメモリの性能評価

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会シンポジウム論文集　2008　(2)　55　2008年1月17日

ISSN： 1344-0640
SPRAT : 実行時自動チューニング機能を備えるストリーム処理記述用言語

滝沢寛之

先進的計算基盤システムシンポジウム(SACSIS2008)　139-148　2008年
I-004 フォトンマップ分割に基づく並列画像生成アルゴリズム(I分野:グラフィクス・画像)

田村壮秀, 滝沢寛之, 小林広明

情報科学技術フォーラム一般講演論文集　6　(3)　203-206　2007年8月22日
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
実行時性能予測に基づくCPUとGPUへの動的タスク割当の検討

白取寛貴, 滝沢寛之, 小林広明

電子情報通信学会技術研究報告. CPSY, コンピュータシステム　107　(175)　37-42　2007年8月2日
出版者・発行元：一般社団法人電子情報通信学会
ISSN： 0913-5685

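Recent research on using graphics processing units (GPUs) for general-purpose computation (GPGPU) has shown that a PC equipped with a high-performance CPU and GPU can serve as a heterogeneous parallel computing system. Programming such systems, however, is becoming more complex, and unified program descriptions that run on both CPUs and GPUs have been studied to use them efficiently. Even so, most current GPGPU application development tools require the processor that executes a program to be selected manually and statically. Because the appropriate choice depends on runtime information, predicting it dynamically at run time can improve efficiency further. This report examines the effectiveness of dynamically predicting the appropriate processor from the estimated execution times of a program on the CPU and the GPU and the cost of switching the executing processor. The experimental evaluation shows that the switching cost, apart from data transfer between the CPU and the GPU, is small, so dynamic switching can be expected to improve performance when the prediction error is sufficiently small relative to the execution time.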
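The selection rule outlined in this abstract, comparing estimated execution times while accounting for the cost of switching processors, might be sketched as follows. The function name, its parameters, and the example timings are illustrative assumptions. The report's actual estimator and cost model are not given in the abstract.

```python
def choose_processor(est_cpu_s, est_gpu_s, switch_cost_s, current="cpu"):
    """Pick the processor for the next run of a task.

    Switch away from the current processor only if the other one is
    estimated to be faster even after paying the switching cost.
    """
    estimate = {"cpu": est_cpu_s, "gpu": est_gpu_s}
    other = "gpu" if current == "cpu" else "cpu"
    stay_time = estimate[current]
    move_time = estimate[other] + switch_cost_s
    return other if move_time < stay_time else current


# Hypothetical timings in seconds.
print(choose_processor(1.0, 0.4, 0.1))                  # GPU clearly faster
print(choose_processor(1.0, 0.95, 0.1))                 # gain smaller than switch cost
print(choose_processor(1.0, 0.4, 0.1, current="gpu"))   # already on the faster one
```

The second call shows why prediction accuracy matters: when the estimated gain is of the same order as the switching cost or the estimation error, the rule can easily make the wrong choice.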
ウェイアロケーション型共有キャッシュ機構の性能評価

小寺功, 江川隆輔, 滝沢寛之, 小林広明

情報処理学会研究報告計算機アーキテクチャ（ARC）　2007　(79)　31-36　2007年8月1日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

We have proposed a way-allocatable shared cache mechanism for multicore processors that operates at low power while maintaining performance by combining cache partitioning with a power-saving technique that partially cuts off the power supply (power gating). The mechanism defines a metric of cache access locality and uses it as the index for both cache partitioning and power reduction. Based on this metric, the mechanism can be flexibly configured to be either performance-oriented or power-oriented. This paper evaluates the effectiveness of the mechanism using applications with different cache access behaviors. The results show that the mechanism can appropriately partition the shared cache for applications with high access locality. In addition, in the performance-oriented configuration, it reduces energy consumption by about 28% while improving performance by about 0.3% on average.
SC|06調査報告(3.2 第5回情報シナジー研究会, 3. 研究活動報告)

小野敏, 滝沢寛之, 小林広明

年報　6　83-87　2007年7月
出版者・発行元：東北大学情報シナジーセンター
SC|05調査報告(3.2 第4回情報シナジー研究会, 3. 研究活動)

大泉健治, 伊藤英一, 滝沢寛之, 小林広明

年報　5　71-74　2006年6月
出版者・発行元：東北大学情報シナジーセンター
A Runtime Optimization Method for Redundant Task Dispatch on P2P Computing Platforms.(3.2 第4回情報シナジー研究会, 3. 研究活動)

Wang Hong, Takizawa Hiroyuki, Kobayashi Hiroaki

年報　5　100-105　2006年6月
出版者・発行元：東北大学情報シナジーセンター
実シミュレーションコードによる大規模科学計算システムの性能評価(3.2 第4回情報シナジー研究会, 3. 研究活動)

滝沢寛之, 岡部公起, 伊藤英一, 撫佐昭裕, 曽我隆, 伊藤学, 小林広明

年報　5　78-83　2006年6月
出版者・発行元：東北大学情報シナジーセンター
HPCチャレンジでのSXシステムの性能評価(3.2 第3回情報シナジー研究会, 3. 研究活動)

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

年報　4　98-116　2005年5月
出版者・発行元：東北大学情報シナジーセンター
HPC チャレンジでのSX システムの性能評価

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　38　(1)　5-28　2005年1月
スーパーSINET を利用した大規模遠隔可視化処理の評価

滝沢寛之, 小林広明

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　37　(2)　5-10　2004年4月
ベクトル量子化用コードブック生成のための並列弱肉強食アルゴリズムの性能解析

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男

電子情報通信学会技術研究報告. NC, ニューロコンピューティング　103　(92)　25-30　2003年5月22日
出版者・発行元：一般社団法人電子情報通信学会
ISSN： 0913-5685

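Vector quantization is a highly efficient data compression technique and a core technology for data storage and transfer. Various methods have been proposed to generate optimal codebooks for low-error quantization. Among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single processor, there is a limit to how much algorithmic improvements can shorten the processing time, so this study has pursued further speedup through parallel processing. This paper analyzes and evaluates the performance of the previously proposed parallel LOJ algorithm through experiments on an IBM SP2, an NEC AzusA, and a PC cluster.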
ベクトル量子化のためのコードブック生成並列処理に関する研究

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, Clecio Donizete Lima, 小林広明, 中村維男

情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）　2002　(80)　67-72　2002年8月21日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

Vector quantization is a highly efficient technique for lossy data compression and a key technology for data storage and transfer. Various algorithms have been proposed to design optimal codebooks that minimize quantization error. In particular, the Law-of-the-Jungle (LOJ) learning algorithm has attracted attention because it shortens codebook design time through algorithmic improvements. However, when a large data set is processed on a single CPU, there is a limit to how much algorithmic improvements can shorten the processing time, and further speedup through parallel processing is required. This paper proposes a parallel LOJ algorithm suited to distributed-memory parallel computers with a message-passing mechanism. In parallel codebook design experiments on the IBM SP2 parallel computer with 32 computing nodes, the algorithm showed high scalability, with a 27.4-fold improvement.
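To put the reported figures in context, the 27.4-fold result on 32 computing nodes can be converted into a parallel efficiency, assuming the 27.4 figure is a speedup over single-node execution (the abstract does not state the baseline explicitly). The helper names below are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the serial run."""
    return t_serial / t_parallel


def parallel_efficiency(speedup_value, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup_value / workers


# Figures from the abstract: 27.4-fold on 32 nodes.
efficiency = parallel_efficiency(27.4, 32)
print(f"{efficiency:.1%}")
```

Under that reading, the run achieves roughly 86% of ideal linear scaling.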
ベクトル量子化のための並列コードブック生成アルゴリズムの性能評価(2.<特集>第1回情報シナジー研究会)

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男, Clecio Donizete Lima, 東北大学大学院情報科学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院工学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院情報科学研究科

年報　2　33-42　2002年7月1日

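Vector quantization is a highly efficient data compression technique and a core technology for data storage and transfer. Various methods have been proposed to generate optimal codebooks for low-error quantization. Among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single CPU, there is a limit to how much algorithmic improvements can shorten the processing time, and further speedup through parallel processing is required. This paper proposes a parallel LOJ algorithm suited to distributed-memory parallel computers. A performance evaluation of the parallel LOJ algorithm on an IBM SP2, an NEC AzusA, and a PC cluster showed a high speedup ratio relative to the number of processors on every system.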
新潟大学総合情報処理センターコンピュータシステムの更新

滝沢寛之

新潟大学総合情報処理センター年報NIICE　(13)　21-27　2002年3月
PC-UNIX 導入時の不正アクセス対策

滝沢寛之

新潟大学総合情報処理センター年報NIICE　(12)　13-19　2001年3月

Books and Other Publications 15

Sustained Simulation Performance 2022

Michael M. Resch, Johannes Gebert, Hiroaki Kobayashi, Hiroyuki Takizawa, Wolfgang Bez

Springer Cham　2024年3月

ISBN: 9783031410727
VLSI Design and Test for Systems Dependability

Hiroyuki Takizawa, Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

Springer Japan　2019年1月
Advanced Software Technologies for Post-Peta Scale Computing

Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, Ryusuke Egawa

Springer　2018年12月
Sustained Simulation Performance 2016

Hiroyuki Takizawa, Takeshi Yamada, Shoichi Hirasawa, Reiji Suda

Springer-Verlag　2016年
コンピュータ工学入門

鏡慎吾, 佐野健太郎, 滝沢寛之, 岡谷貴之, 小林広明

コロナ社　2015年4月
Sustained Simulation Performance 2015

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

Springer-Verlag　2015年
Sustained Simulation Performance 2014

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Springer-Verlag　2014年
High Performance Computing on Vector Systems 2012

Hiroyuki Takizawa, Ryusuke Egawa, Daisuke Takahashi, Reiji Suda

Springer-Verlag　2012年
High Performance Computing on Vector Systems, 2012

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Springer-Verlag　2012年
Software Automatic Tuning: From Concepts to State-of-the-Art Results

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Springer-Verlag　2010年
High Performance Computing on Vector Systems 2009

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoko Isobe

Springer-Verlag　2009年
High Performance Computing on Vector Systems 2007

Hiroaki Kobayashi, Akihiro Musa, Yoshiei Sato, Hiroyuki Takizawa, Koki Okabe

Springer-Verlag　2008年
High Performance Computing on Vector Systems 2008

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoichi Shimomura

Springer-Verlag　2008年
新情報機器操作入門

山本正信, 滝沢寛之,ほか

2003年4月
情報機器操作入門(第２版)

山崎一生, 長谷川誠, 滝沢寛之,ほか

2000年4月

Presentations and Talks 136

GPUコンピューティングの現状と展望 招待有り

滝沢寛之

放射光学会年会（JSR2026）　2026年1月9日
異種複数のスパコンの連携による津波シミュレーションの緊急実行 招待有り

滝沢寛之

NEC HPC Forum　2025年11月25日
Urgent Computing of Tsunami Damage Estimation on Geographically Distributed Computing Systems 招待有り

Hiroyuki Takizawa

SC25 NEC Forum　2025年11月17日
The Cyberscience Center not only for Cyberscience

Hiroyuki Takizawa

40th Workshop on Sustained Simulation Performance　2025年10月14日
Research and user support activities at Tohoku University Cyberscience Center 招待有り

Hiroyuki Takizawa

39th Workshop on Sustained Simulation Performance　2025年5月27日
Operational experience of the largest vector supercomputer, AOBA-S 招待有り

Hiroyuki Takizawa

NUG Society Meeting 36　2025年5月13日
Advanced resource management for urgent job execution in Connected Supercomputing 招待有り

Hiroyuki Takizawa

Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2025)　2025年3月21日
ワークフローエンジンとの連携に基づく臨機応変なジョブスケジューリングの実現

滝沢寛之

第16回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2024）　2024年12月26日
スパコンAOBA-Sの性能評価と将来計画 招待有り

滝沢寛之

太陽地球環境シミュレーション研究会　2024年12月24日
New Strategies at Tohoku University Cyberscience Center

Hiroyuki Takizawa

38th Workshop on Sustained Simulation Performance　2024年12月12日
ExpressHPC: towards "connected supercomputing" enabling on-demand job execution for disaster resilience.

Hiroyuki Takizawa, Tatsuyoshi Ohmura, Keichi Takahashi, Yoichi Shimomura, Ryusuke Egawa, Yoshihiko Sato, Junko Yoshino, Akihiro Musa, Shunichi Koshimura

4th Combined Workshop on Interactive and Urgent High-Performance Computing (WIUHPC)　2024年11月18日
Realizing Connected Supercomputing with dynamic and adaptive resource management 招待有り

Hiroyuki Takizawa

SC24 Nagoya University Booth Presentation　2024年11月18日
10年後の情報基盤センターは地球と人類にいかに貢献するか？ 招待有り

滝沢寛之

第50回ASE研究会　2024年11月8日
Connected Supercomputing with on-demand job execution for disaster mitigation and more… 招待有り

Hiroyuki Takizawa

Reality in Science, Art, and Humanities – paradigms of its media conditions　2024年10月21日
Operational experience of the latest-generation SX-Aurora TSUBASA system, AOBA-S 招待有り

Hiroyuki Takizawa

37th Workshop on Sustained Simulation Performance　2024年6月17日
Introduction of AOBA-S: The world’s largest SX-Aurora TSUBASA system operating at Tohoku University 招待有り

Hiroyuki Takizawa

NUG Society Meeting 35　2024年6月14日
ML-based Autotuning of Quantum Annealing Schedule 招待有り

Hiroyuki Takizawa, Michael Zielewski, Keichi Takahashi, Yoichi Shimomura

Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2024)　2024年3月22日
スパコンAOBAの運用開始と将来展望 招待有り

滝沢寛之

Supercomputing JAPAN! 2024　2024年3月12日
Automatic Parameter Tuning for Efficient Checkpointing 国際会議 招待有り

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

28th Workshop on Sustained Simulation Performance　2018年10月10日
SX-Aurora TSUBASAの基本性能および機能の初期評価 招待有り

滝沢寛之

SX-Aurora TSUBASA フォーラム　2018年7月27日
Automatic Parameter Tuning of Application-Level Incremental Checkpointing 国際会議 招待有り

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

2018 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing　2018年3月27日
Towards prediction of effective optimizations in performance engineering 国際会議

Hiroyuki Takizawa, Yuki Kawarabatake, Mulya Agung, Kazuhiko Komatsu, Ryusuke Egawa

27th Workshop on Sustained Simulation Performance　2018年3月22日
スパコンを使いこなす！～スパコン利用技術の重要性とその課題～ 招待有り

滝沢寛之

第72回国立大学共同利用・共同研究拠点知の拠点セミナー　2018年3月16日
User-Defined Code Transformation for Separation of Performance-Awareness from Application Codes 国際会議

Hiroyuki Takizawa

SIAM conference on parallel processing for scientific computing (mini-symposium)　2018年3月9日
Auto-tuning of Hyperparameters of Machine Learning Models 国際会議

Zhen Wang, Ryusuke Egawa, Reiji Suda, Hiroyuki Takizawa

HPC Asia 2018　2018年1月29日
Thermal-aware Dynamic Checkpoint Interval Tuning for High Performance Computing 国際会議

Pei Li, Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

HPC Asia 2018　2018年1月29日
A User-defined Code Transformation Approach to Separation of Performance Concerns 国際会議

Hiroyuki Takizawa

First Workshop on Software Challenges to Exascale Computing　2017年12月17日
大規模科学計算システムにおける利用者プログラムの特性分析

大泉健治, 山下毅, 穂苅寛光, 江川隆輔, 滝沢寛之, 小林広明

大学ICT推進協議会 2017年度年次大会 (AXIES2017)　2017年12月13日
反応・相変化を伴う多分散系混相流シミュレーションコードの最適化

佐々木大輔, 加藤季広, 磯部洋子, 笠原弘貴, 渡部広吾輝, 志村啓, 奥野航平, 松尾亜紀子, 江川隆輔, 滝沢寛之, 小林広明

大学ICT推進協議会 2017年度年次大会 (AXIES2017)　2017年12月13日
Expressing performance-awareness as user-defined code transformations 国際会議

Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, Ryusuke Egawa, Fumihiko Ino

International Symposium on Post Petascale System Software　2017年12月11日
An Evolutionary Approach to Construction of a Software Development Environment for Massively-Parallel Heterogeneous Systems 国際会議

Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, Ryusuke Egawa

International Symposium on Post Petascale System Software　2017年12月11日
Performance Engineering with User-defined Code Transformations 国際会議

Hiroyuki Takizawa

Joint Workshop on High-Performance Computing with NSCC-Wuxi and Tohoku University　2017年9月21日
ExaFSA - Exascale Simulation of Fluid-Structure-Acoustics Interactions 国際会議

Florian Lindner, Miriam Mehl, Thorsten Reimann, Sabine Roller, Dörte C. Sternel, Hiroyuki Takizawa, Sander van Zuijlen

ISC High Performance 2017　2017年7月18日
Xevolverプロジェクト -- 計算科学と計算機科学をつなぐ架け橋を目指して --

滝沢寛之

高度情報科学技術研究機構平成28年度高速化ワークショップ　2017年3月24日
Performance Tuning with Machine Learning 国際会議

Hiroyuki Takizawa, Cui Hang, Shoichi Hirasawa

The 25th Workshop on Sustained Simulation Performance　2017年3月13日
Combining Code Transformations and Autotuning 国際会議

Hiroyuki Takizawa

2017 Advanced Topics and Auto-Tuning in High-Performance Scientific Computing 2017　2017年3月11日
User-Defined Directive Translation for Automatic Tuning 国際会議

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2017 Advanced Topics and Auto-Tuning in High-Performance Scientific Computing 2017　2017年3月11日
User-Defined Directive Translation Using the Xevolver Framework 国際会議

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

SIAM Computational Science and Engineering　2017年3月2日
進化的アプローチによる超並列複合システム向け開発環境の創出

滝沢寛之

第8回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2016)　2016年12月26日
Xevolverプロジェクトの概要

滝沢寛之

ポストペタワークショップ　2016年12月14日
Autotuning meets Code Transformations 国際会議

Hiroyuki Takizawa

24th Workshop on Sustained Simulation Performance　2016年12月5日
Making a Legacy Code Auto-Tunable without Messing It Up 国際会議

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC16)　2016年11月13日
User-Defined Code Transformation for High Performance Portability 国際会議

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16)　2016年4月12日
Performance Engineering of HPC Applications Based on Pattern Matching 国際会議

Hiroyuki TAKIZAWA, Takeshi YAMADA, Takuya TSUNOGAWA, Shoichi HIRASAWA, Hiroaki KOBAYASHI

23rd Workshop on Sustained Simulation Performance　2016年3月16日
Data layout optimization using user-defined code transformations 国際会議

Hiroyuki Takizawa, Takeshi Yamada, Shoichi Hirasawa, Hiroaki Kobayashi

2016 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing　2016年2月19日
A Code Transformation Approach to Achieving High Performance Portability 国際会議

Hiroyuki TAKIZAWA, Daisuke TAKAHASHI, Reiji SUDA, Ryusuke EGAWA

SPPEXA Annual Plenary Meeting 2016　2016年1月25日
進化的アプローチによる超並列複合システム向け開発環境の創出

滝沢寛之, 高橋大介, 須田礼仁, 江川隆輔

第7回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2015)　2015年12月25日
Xevtgen: automatic generation of code transformation rules based on before-and-after codes 国際会議

Hiroyuki Takizawa, Shoichi Hirasawa, Reiji Suda

22nd Workshop on Sustained Simulation Performance　2015年12月17日
The Xevolver Project: Separation of Concerns for Supporting Legacy Application Migration

Hiroyuki Takizawa

ATRG Open Academic Session　2015年12月11日
機械工学分野におけるシミュレーション科学の新展開

滝沢寛之

学際大規模情報基盤共同利用・共同研究拠点第7回シンポジウム　2015年7月9日
Framework for Separation of Concerns Between Application Requirements and System Requirements 国際会議

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

SIAM Conference on Computational Science & Engineering 2015　2015年3月16日
Auto-Tuning with User-Defined Code Transformations 国際会議

Hiroyuki Takizawa

2015 Conference on Advanced Topics and Auto-Tuning in High-Performance Scientific Computing　2015年2月26日
What can we do to fight with system diversity? 国際会議

Hiroyuki Takizawa

21st Workshop on Sustained Simulation Performance　2015年2月18日
進化的アプローチによる超並列複合システム向け開発環境の創出

滝沢寛之, 須田礼仁, 高橋大介, 江川隆輔

第6回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2014)　2014年12月25日
Xevolver: an extensible framework for user-defined code transformation 国際会議

Hiroyuki Takizawa

20th Workshop on Sustained Simulation Performance　2014年12月15日
Xevolver Project 国際会議

Hiroyuki Takizawa, Daisuke Takahashi, Reiji Suda, Ryusuke Egawa

International Symposium on Post Petascale System Software (ISP2S2) 2014　2014年12月2日
Xevolver Project 国際会議

Hiroyuki Takizawa, Daisuke Takahashi, Reiji Suda, Ryusuke Egawa

Asian Technology Information Program (ATIP) Workshop at SC14　2014年11月17日
機械工学分野におけるシミュレーション科学の新展開

滝沢寛之

学際大規模情報基盤共同利用・共同研究拠点第6回シンポジウム　2014年7月11日
Evolutionary Adaptation of HPC Applications to Revolutionary System Changes 国際会議

Hiroyuki Takizawa

International Supercomputing Conference (ISC) 2014　2014年6月22日
Xevolver: an extensible programming framework for custom code transformation 国際会議

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing　2014年3月15日
はやいぃスパコンは作れる！？

滝沢寛之

JACORN2013 Winter - 次世代 RHW 創造研究会　2013年12月26日
進化的アプローチによる超並列複合システム向け開発環境の創出

滝沢寛之, 須田礼仁, 高橋大介, 江川隆輔

第5回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2013)　2013年12月25日
An XML-based Programming Framework for User-defined Code Transformations 国際会議

Hiroyuki Takizawa, Xiong Xiao, Shoichi Hirasawa, Hiroaki Kobayashi

4th AICS International Symposium　2013年12月2日
Xevolver : an XML-based Programming Framework for Software Evolution 国際会議

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC13)　2013年11月17日
XMLを用いたツール間連携に向けて

滝沢寛之

1st XcalableMP Workshop　2013年11月1日
Xevolver: towards an extensible programming environment for software evolution 国際会議

Hiroyuki Takizawa

International Symposium on Embedded Multicore/Many-core Systems-on-Chip　2013年9月26日
OpenACCにおける性能チューニングとその効果

滝沢寛之, 平澤将一, 小松一彦, 小林広明

日本応用数理学会年会　2013年9月9日
A Case Study of Performance Tuning with the POET Framework

肖熊, 平澤将一, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会　2013年8月23日
Code Refactoring for High Performance Computing Applications

Chunyan Wang, 平澤将一, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会　2013年8月23日
ブロックバイパス機構によるキャッシュのエネルギ効率化に関する研究

高井拓実, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2013年7月31日
マルチプラットフォームにおける最適化手法の効果に関する一検討

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2013年7月31日
Autotuning for Improving the Fault Tolerance of Large-scale Simulations 国際会議

Hiroyuki Takizawa, Alfian Amrizal, Shoichi Hirasawa, Hiroaki Kobayashi

Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing　2013年3月27日
ソフトウェア進化のための自動性能追跡システム

平澤将一, 滝沢寛之, 小林広明

ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2013)　2013年1月15日
プログラム自動生成技術に基づくGPUコンピューティングの性能評価

菅原誠, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2011年7月27日
マイグレーションによる複合型計算システム向けジョブスケジューリング

小山賢太郎, 佐藤功人, 小松一彦, 村田善智, 滝沢寛之, 小林広明

先進的計算基盤システムシンポジウム(SACSIS2011)　2011年5月25日
ルーフラインモデルに基づくベクトルプロセッサ向けプログラム最適化戦略

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2011)　2011年1月18日
実アプリケーションを用いたチップマルチベクトルプロセッサの消費エネルギ評価

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

ハイパフォーマンスコンピューティングとアーキテクチャの評価に関する北海道ワークショップ(HOKKE-18)　2010年12月16日
Cache Partitioning Strategies for 3-D Stacked Vector Processors 国際会議

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010　2010年11月16日
A Performance Tuning Strategy under Combining Loop Transforms for a Vector Processor with an On-Chip Cache 国際会議

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC10)　2010年11月13日
複合型計算システムにおける実行時自動チューニング

滝沢寛之

自動チューニング技術の現状と応用に関するシンポジウム　2010年11月
A Runtime Task Reallocation Library for Heterogeneous Computational Environments 国際会議

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

7th International Conference on Fluid Dynamics　2010年11月1日
A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications 国際会議

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Euromicro Conference on Digital System Design　2010年9月1日
An Out-of-order Vector Processing Mechanism for Multimedia Applications

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2010年8月3日
Efficient Data Management for the Building Cube Method using Cartesian Meshes on the GPU Platform 国際会議

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

International Supercomputing Conference (ISC10)　2010年5月30日
Parallel Processing of the Building-Cube Method on the GPU Platform 国際会議

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

22nd International Conference on Parallel Computational Fluid Dynamics　2010年5月17日
Performance of SOR Methods on Vector Processor SX-9 国際会議

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

22nd International Conference on Parallel Computational Fluid Dynamics　2010年5月17日
ハイブリッド型計算環境のためのプログラミングフレームワークSPRAT

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

先端的ネットワーク＆コンピューティングテクノロジワークショップ　2010年3月
A High-level Programming Framework for Efficient Hybrid-architecture Computing 国際会議

Kazuhiko Komatsu, Kentaro Koyama, Katsuto Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

14th SIAM Conference on Parallel Processing for Scientific Computing Minisymposium　2010年2月24日
OpenCL によるGPUコンピューティングの性能評価

荒井勇亮, 佐藤功人, 滝沢寛之, 小林広明

情報処理学会HPC研究会　2010年2月22日
GPUを手軽にちゃんと使える環境の実現に向けて

東京工業大学計算世界観GCOEセミナー　2009年12月9日
A High-level GPU Programming Framework for Fluid Dynamics Simulation 国際会議

Katsuto Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

6th International Conference on Fluid Dynamics　2009年11月4日
新アーキテクチャへのアプローチ

自動チューニング技術の現状と応用に関するシンポジウム　2009年10月22日
CUDAアプリケーション向けチェックポイント・リスタート機能の実装と評価

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

情報処理学会HPC研究会　2009年10月9日
実アプリケーションによるチップマルチベクトルプロセッサの性能評価

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

次世代スーパーコンピューティングシンポジウム　2009年10月7日
三次元積層技術による次世代ベクトルキャッシュの設計と評価

船矢祐介, 永岡龍一, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

次世代スーパーコンピューティングシンポジウム　2009年10月7日
3D On-Chip Memory for the Vector Architecture 国際会議

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2009　2009年9月28日
Cellによる高性能計算の可能性を探る

日本機械学会2009年度年次大会　2009年9月15日
Working Sets based Thread Scheduling with Cache Partitioning 国際会議

Masayuki Sato, Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architecture and Compilation Techniques (PACT)　2009年9月12日
次世代プログラミング環境～多様なプロセッサを使いこなす～

FIT2009　2009年9月3日
An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer-to-Peer Environment

Chainan Satayapiwat, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

FIT2009　2009年9月2日
ボランティアコンピューティングの高効率化のためのクライアントレベルスケジューリング

村田善智, 遠藤聡明, 滝沢寛之, 小林広明

FIT2009　2009年9月2日
プロセッサ自動選択機能を有するBLASの実現に向けた性能評価

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

FIT2009　2009年9月2日
キャッシュメモリを有するベクトルプロセッサのためのプログラム最適化手法

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2009年8月4日
ワーキングセット評価に基づくスレッドスケジューリング

佐藤雅之, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP)　2009年8月4日
メモリ積層型3次元ベクトルプロセッサの評価

船矢祐介, 江川隆輔, 滝沢寛之, 小林広明

先端的計算基盤システムシンポジウム(SACSIS 2009)　2009年6月28日
CPUとGPUを協調利用するソフトウェア開発環境

佐藤功人, 滝沢寛之, 小林広明

筑波大学計算科学研究センターGPGPU講習会/研究会　2009年6月24日
Hiding Programming Complexity for GPU Computing

Suda Laboratory, GPGPU special seminar　2009年6月11日
ストリーム処理記述言語のGPU向け自動最適化の検討

佐藤功人, 滝沢寛之, 小林広明

先端的計算基盤システムシンポジウム(SACSIS 2009)　2009年5月28日
Early Evaluation of a Memory-Stacked Vector Processor 国際会議

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

COOL Chips XII　2009年4月15日
GPU向け線形代数ライブラリの性能評価

小山賢太郎, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

計算工学講演会　2009年4月13日

計算工学講演会論文集 Vol. 14, No. 1, pp. 289-292, 2009
SX-9による大規模並列シミュレーション

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明, 高橋俊, 中橋和博

シナジー研究会　2009年2月13日
実アプリケーションによるSX-9の性能評価

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2009)　2009年1月12日
Caching on a Chip Multi Vector Processor 国際会議

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

SC08　2008年11月15日
ベクトルプロセッサ用キャッシュメモリにおけるMSHR の性能評価

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

次世代スーパーコンピューティング・シンポジウム2008　2008年9月16日
ウェイアロケーション型共有キャッシュ機構のハードウェア設計に関する研究

第7 回情報科学技術フォーラム(FIT2008)　2008年9月2日
GPU を効率的に利用するための言語拡張と自動最適化手法

佐藤功人, 滝沢寛之, 小林広明

並列/協調/分散処理に関するサマーワークショップ(SWoPP2008)　2008年8月5日
GPU コンピューティングのためのストリーム処理記述言語

第36 回可視化情報シンポジウム　2008年7月22日
SPRAT: 実行時自動チューニング機能を備えるストリーム処理記述用言語

滝沢寛之, 白取寛貴, 佐藤功人, 小林広明

情報処理学会先進的計算基盤システムシンポジウム(SACSIS2008)　2008年6月11日
分散協調型スケジューラを用いた大規模計算環境上での負荷分散手法の紹介

村田善智, 滝沢寛之, 小林広明

第２回InTrigger Community Workshop　2008年6月4日
Auction-based Resource Allocation for activating incentives in resource trading in Grid Computing

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク＆コンピューティングテクノロジワークショップ　2008年3月13日
Preliminary evaluation of a result checking mechanism for reliable volunteer computing

Ling Xu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク＆コンピューティングテクノロジワークショップ　2008年3月13日
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

2008 年ハイパフォーマンスコンピューティングと計算科学シンポジウム (HPCS2008)　2008年1月17日
ベクトルプロセッサ用キャッシュメモリの性能評価

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

2008 年ハイパフォーマンスコンピューティングと計算科学シンポジウム (HPCS2008)　2008年1月17日
Early Evaluation of On-Chip Vector Caching for the NEC SX Vector Architecture 国際会議

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

SC07　2007年11月14日
Preliminary Evaluation for Runtime Auto-tuning of GPGPU Applications 国際会議

Hiroyuki Takizawa, Hiroki Shiratori, Hiroaki Kobayashi

The Second international Workshop on Automatic Performance Tuning　2007年9月20日
フォトンマップ分割に基づく並列画像生成アルゴリズム

田村壮秀, 滝沢寛之, 小林広明

第6回情報科学技術フォーラム　2007年9月5日
実行時性能予測に基づくCPUとGPUへの動的タスク割当の検討

白取寛貴, 滝沢寛之, 小林広明

並列/分散/協調処理に関するサマー・ワークショップ　2007年8月1日
ウェイアロケーション型共有キャッシュ機構の性能評価

小寺功, 滝沢寛之, 小林広明

並列/分散/協調処理に関するサマー・ワークショップ　2007年8月1日
遊休計算資源を用いたパラメータスイープ型並列計算におけるタスクスケジューラの性能評価

村田善智, 小田川雅人, 滝沢寛之, 小林広明

先端的ネットワーク&コンピューティングテクノロジワークショップ　2007年3月
PS3を用いた分散コンピューティング環境の開発と評価

小田川雅人, 吉田向志, 村田善智, 滝沢寛之, 小林広明

先端的ネットワーク&コンピューティングテクノロジワークショップ　2007年3月
ゲームユーザーのユビキタスコンピューティングプラットフォームへの参加を促すインセンティブモデルの検討

中田武男, 大庭信之, 滝沢寛之, 小林広明

先端的ネットワーク&コンピューティングテクノロジワークショップ　2007年3月
描画用ハードウェアの活用によるふく射伝熱の対話的シミュレーションと可視化

滝沢寛之, 山田昇, 酒井清吾, 小林広明

第一回日本ヒートアイランド学会全国大会　2006年7月27日
Performance Evaluation of SX-7 Using Real Simulation Codes

Hiroyuki Takizawa, Akihiro Musa, Takashi Soga, Yoshiaki Matsumura, Manabu Ito, Hiroaki Kobayashi

ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2006)　2006年1月
A Distributed and Cooperative Load Balancing Mechanism for Large-scale P2P Systems

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク&コンピューティングテクノロジワークショップ　2005年10月
P2Pコンピューティングのための分散協調スケジューリング機構

村田善智, 稲葉勉, 滝沢寛之, 小林広明

先端的ネットワーク＆コンピューティングテクノロジワークショップ　2005年1月
A P2P Semantic Information Searching Mechanism for Ubiquitous Grid Computing Systems

Tsutomu Inaba, Takuro Ohkawa, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク&コンピューティングテクノロジワークショップ　2005年1月

共同研究・競争的資金等の研究課題 38

高性能計算のためのプログラミング環境 競争的資金

制度名：JST Basic Research Programs (Core Research for Evolutional Science and Technology :CREST)

2011年10月～継続中
高性能低消費電力プロセッサ 競争的資金

制度名：Grant-in-Aid for Scientific Research

2003年3月～継続中
ワークフローエンジンとの連携に基づく臨機応変なジョブスケジューリングの実現

滝沢寛之, 片桐孝洋, 佐野健太郎

2024年4月1日～ 2027年3月31日
宇宙初期における位相欠陥の一般相対論的シミュレーション

北嶋直弥, 神田行宏, 藤林翔, 滝沢寛之

2025年4月～ 2026年3月
スーパーコンピュータのデジタルツインによる運用状況の把握と自動制御

滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Research (Exploratory)

研究機関：Tohoku University

2022年6月30日～ 2025年3月31日

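This project is developing a digital twin that faithfully mimics the job scheduler of a production system. Many job-scheduling simulators already exist, but a preliminary study showed that their behavior does not match that of production systems closely enough to be called a digital twin. In fiscal year 2023 (Reiwa 5), AOBA, the production supercomputer at Tohoku University, was upgraded and its system configuration changed substantially. To clarify the production-system behavior that must be mimicked, we therefore evaluated in detail the performance of the AOBA-S subsystem newly added to AOBA, surveyed how it is used, and identified the challenges in mimicking that usage. Because the usage of a production system varies with many factors, we also carried out research and development on job scheduling that copes with such operational changes and constraints, and on mimicking it. System configurations in which heterogeneous processors, such as an accelerator and its host processor, cooperate to run an application have become common in supercomputers, including SX-Aurora TSUBASA, the core of AOBA. Some applications, however, use almost only one of the two, and in that case running a different application on each processor can raise the performance of the system as a whole. Because some computing resources are still shared among the applications, interference may degrade performance. We therefore studied how to predict combinations of applications with little performance interference, and the results were accepted as a journal paper. The work showed that accurately modeling performance interference when multiple jobs share some computing resources, such as between an accelerator and its host processor, is important for faithfully mimicking a production system.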
線状降水帯の気象場変化に対する応答の解明: WRFアンサンブル計算を用いて

平賀優介, 滝沢寛之

2024年4月～ 2025年3月
中・長期障害発生予測に基づくシステム高信頼化技術の開拓

江川隆輔, 滝沢寛之, 谷村勇輔, 滝澤真一朗

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tokyo Denki University

2021年4月1日～ 2024年3月31日
ポストムーア時代のスケーラブル計算機とそのシステムソフトウェアの創成

佐野健太郎, 柴田裕一郎, 滝沢寛之, 谷川一哉, 宮島敬明, 佐藤三久, 上野知洋, 小柴篤史, Lee Jinpil

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (A)

研究機関：Institute of Physical and Chemical Research

2020年4月1日～ 2024年3月31日

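Aiming to build a scalable dataflow (SDF) system suited to the post-Moore era, we surveyed and carried out the basic design of a hard-wired-logic dataflow computing model, a programming model, system software, and applications. Following a change in co-investigators' affiliations, the research was conducted by three groups: PROGSYS, which merges the programming model and system software; COMP, for computing mechanisms and models; and APP, for applications. COMP examined the "SDF computing model and its architecture" and set up an FPGA cluster as its prototyping and demonstration environment. In particular, it built system infrastructure including AFUShell, a system-on-chip that runs SDF computation on an FPGA, and an API class library for controlling it. PROGSYS studied the design and implementation of computing mechanisms, compilers, and related components for a single FPGA. In particular, it developed its own implementation of SYCL, which is drawing attention as a next-generation standard programming environment, for AFUShell, making it easy to write C++ code that runs the host CPU and the FPGA asynchronously based on dependencies. It also examined system-software issues for heterogeneous configurations other than FPGAs. APP built a dataflow-based FPGA hardware prototype of breadth-first graph search, which is also used in Graph500. It developed a simulator for the prototype and confirmed that memory access is the bottleneck. In addition, APP developed a new approximate convex hull algorithm based on hard-wired-logic dataflow and implemented a prototype. The results showed that it outperforms the de facto standard convex hull library and that the trade-off between hardware size and approximation accuracy can be adjusted as required. The approximation accuracy poses no practical problem, and the work opened prospects for other computational-geometry applications such as Delaunay triangulation.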
大規模シミュレーションによるマイクロデバイスを利用した輸送機器設計革新技術の産業利用拡大

藤井孝藏, 立川智章, 浅田健吾, 小川拓人, 滝沢寛之, 小林広明, 江川隆輔, 磯部洋子

提供機関：Tohoku University Cyber Science Center

制度名：JHPCN:Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures

研究機関：Tohoku University

2017年～ 2024年
プラズマ学際科学のためのリアル粒子シミュレーションの研究開発と応用

大谷寛明, 宇佐見俊介, 長谷川裕記, 森高外征雄, 沼波政倫, 樋田美栄子, 三浦英昭, 石黒静児, 堀内利得, 大野暢亮, 川原慎太郎, 臼井英之, 三宅洋平, 田光江, 小川智也, 深沢圭一郎, 片桐孝洋, 滝沢寛之

提供機関：Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)

制度名：Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures: Joint Research Projects (General Joint Research Projects)

2023年4月～
日本全土の洪水氾濫被害と適応策の検討

峠嘉哉, 滝沢寛之, 風間聡, 山本道, 柳原駿太, 池本敦哉, 岡本彩果

2022年4月～ 2023年3月
日本全土の洪水氾濫被害の将来展望

風間聡, 滝沢寛之, 峠嘉哉, 柳原駿太

2021年4月～ 2022年3月
高性能計算に革新をもたらす非ノイマン型FPGAオーバーレイアーキテクチャの創出

佐野健太郎, 柴田裕一郎, 滝沢寛之, 上野知洋, 宮島敬明, 小柴篤史

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

2017年4月1日～ 2020年3月31日

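To realize scalable next-generation high-performance computing systems in place of von Neumann architectures, whose performance gains are slowing, we created a non-von Neumann overlay architecture and its underlying technologies for making advanced use of FPGAs, which are reconfigurable semiconductor devices. We prototyped an FPGA cluster, built its hardware and software infrastructure, and developed a high-level synthesis compiler for implementing computational problems as dataflow circuits. For several computational problems, we applied pipelining and showed that performance improves with the number of FPGAs. This demonstrates that low-power FPGAs can deliver high-performance, scalable computation.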
機械学習技術の活用による職人的プログラミングの知能化

滝沢寛之, 片桐孝洋, 横川三津夫, 南一生, 小林広明, 須田礼仁, 岡谷貴之, 江川隆輔, 大島聡史

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2016年4月1日～ 2019年3月31日

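This project presented cases in which machine learning can be used effectively to support high-performance computing (HPC) programming. By converting them into problems where machine learning has already succeeded, various problems in code optimization may also be solvable with machine learning. The project also showed that problems for which a huge amount of training data can be prepared are rare in HPC programming, and that thorough analysis of the target problem is important for collecting data efficiently. Furthermore, as with HPC programming, the use of machine learning still relies on experts' experience and intuition; but because the task is tuning hyperparameters that are already numerical, it became clear that it can be recast as a problem of computational cost.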
自らを進化させ未知の計算環境に適応するソフトウェア自動チューニング機構方式の研究

須田礼仁, 滝沢寛之, 八杉昌宏, 片桐孝洋

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：The University of Tokyo

2015年4月1日～ 2018年3月31日

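Automatic tuning builds variability into software in advance and lets the software adjust that variability itself, aiming for good execution performance in a variety of computing environments. This project studied a mechanism that adds variability and tuning capability to existing programs after the fact, so that when new computing environments or new high-performance techniques appear, they can be incorporated into existing programs and auto-tuned. By using Xevolver, a code transformation system developed by team members, we clarified how to incorporate variability and an auto-tuning mechanism into programs that were not designed for auto-tuning. It also became clear, however, that the original program must be analyzed.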
ポストCMOSデバイスを用いたマイクロプロセッサの設計空間探索

江川隆輔, 小林広明, 滝沢寛之, 多田十兵衛, 佐藤雅之, 宇野渉, 豊嶋拓也, 坂井然太郎, 小笠原大輔

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：Tohoku University

2015年4月1日～ 2018年3月31日

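Aiming at energy-efficient microprocessors built with emerging device technologies expected to become practical around 2025, this project studied circuit design and memory subsystems using such devices. On circuit design, we worked on a design method for wave-pipelined circuits using CNFETs. On memory subsystems, focusing on 3D stacking technology and STT-RAM, we studied a cache bypass mechanism for future memory subsystems, a power-saving data placement method for multi-bank memory, and a low-power management mechanism for the last-level cache (LLC), and showed their effectiveness by simulation.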
5.5次元設計時代のグリーンマイクロアーキテクチャの創成

江川隆輔, 多田十兵衛, 小林広明, 滝沢寛之, 佐藤雅之, 宇野渉, 西村秦, 細川麿生, 豊嶋拓也

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2014年4月1日～ 2017年3月31日

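This project aims at microarchitectures that draw out the full potential of 2.5D and 3D integration technologies, which are expected to support processor design after the end of Moore's law, and that achieve power efficiency surpassing existing processors. Specifically, looking ahead to an "over-the-Moore" era that does not rely on scaling alone, we pursued research on elemental technologies for processor design that actively uses vertical interconnects. By examining the effectiveness of stacking at design granularities from fine to coarse, we showed that using TSVs where they fit best, while considering the trade-off among performance, power, and cost, can dramatically improve the power efficiency of processors and systems.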
ストレージ階層化時代のチェックポイント・リスタート技術の新展開

滝沢寛之, 宇野篤也, 小林広明, 江川隆輔, 佐藤幸紀

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：Tohoku University

2014年4月1日～ 2016年3月31日

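Assuming that an application's state is saved periodically during execution, we examined methods for automatically adjusting parameters such as how often the state is written to hierarchical storage. We also examined methods for shortening the time the writes take. Because speculatively writing data that is likely to be written in the future is effective for this purpose, we also considered how to make that prediction. Since the prediction requires examining the memory access patterns of the target application, we developed a memory analysis tool. We developed a job-scheduling simulator for large-scale systems and verified the effects of these methods.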
ディペンダブルプロセッシングコデザイン型3次元プロセッサアーキテクチャの創出

小林広明, 滝沢寛之, 江川隆輔

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：Tohoku University

2014年4月1日～ 2016年3月31日

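Processor development faces the manufacturing limits of conventional semiconductor technology as well as architectural design limits. The goal of this project is to establish new architecture design techniques that achieve both higher performance and higher reliability in processors by exploiting 3D integration technology, which has attracted attention in recent years. Because the memory subsystem constrains performance in many applications, we worked on designing a large, high-performance on-chip memory hierarchy using 3D integration, and an online checkpoint mechanism that uses this hierarchy not only for program execution but also to improve reliability.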
シナジー効果を加速するソフトウェアとハードウェアの協調設計基盤

滝沢寛之, 小林広明, 青木孝文, 佐野健太郎, 江川隆輔, 多田十兵衛, 伊藤康一

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2013年4月1日～ 2016年3月31日

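Assuming OpenCL as the standard programming environment, we pointed out features that are missing for using a wider variety of accelerator architectures and examined extensions to OpenCL. OpenCL has also come to be used for hardware description, but we took issue with the fact that the OpenCL C language is not necessarily efficient for writing the kernel part, and designed and implemented a high-productivity language for describing operations frequently used in image processing and high-performance computing. We also proposed a method that automatically sets parameters whose appropriate values differ from accelerator to accelerator, and implemented and evaluated it.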
デバイス・アーキテクチャコデザインによるスマートユニバーサルメモリの創出

小林広明, 滝沢寛之, 江川隆輔

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2013年4月1日～ 2016年3月31日

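The goal of this project is to establish the basic technology for a new memory architecture in which the memory subsystem manages data intelligently according to the behavior of the application program, thereby delivering the data supply capability the application requires with minimum energy consumption. To realize an intelligent hierarchical memory subsystem, we worked on designing a cache architecture that supplies data at high bandwidth with low power consumption, and clarified its effectiveness and the remaining issues.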
アプリケーション適応型動的超多階層メモリアーキテクチャの開発

小林広明, 滝沢寛之, 江川隆輔

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：Tohoku University

2012年4月1日～ 2014年3月31日

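The goal of this project is to rethink architecture design starting from the memory functions and performance that applications require, and to establish a multi-level, application-adaptive on-chip memory architecture together with techniques for using it. Toward higher performance and lower power consumption in microprocessors, we worked on efficient resource management that takes cache memory into account. Such resource management avoids inter-thread resource contention in the cache and enables efficient use of cache resources, improving microprocessor performance and reducing power consumption.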
ペタフロップス級計算機に向けた次世代ＣＦＤの研究開発

中橋和博, 山本悟, 大林茂, 小林広明, 山本一臣, 佐々木大輔, 鄭信圭, 滝沢寛之, 江川隆輔, 黒滝卓司, 榎本俊治, 今村太郎, 高橋俊

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (S)

2009年5月11日～ 2014年3月31日

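This project aimed to fundamentally resolve various problems of the current CFD used for the aerodynamic design of aircraft and other vehicles, such as the dependence of computed results on physical models and the growing workload for complex geometries. With further improvement of computer performance in mind, we proposed the Building-Cube Method and carried out various algorithmic studies for its practical use. As one result, the flow around an automobile was reproduced in a world-leading large-scale numerical computation on the K computer. Showing that this CFD approach can perform flow computation directly even from extremely complex and incomplete CAD data has the potential to revolutionize the aerodynamic design process for aircraft and automobiles, and is highly significant.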
ＨＰＣ向けアクセラレータアーキテクチャ自動生成・最適化フレームワークの研究

佐野健太郎, 滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Challenging Exploratory Research

研究機関：Tohoku University

2011年～ 2013年

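Focusing on the algorithm domains of stencil computation, a representative class of high-performance computation, and cellular-automaton-type computation, we studied a framework that automatically generates dedicated hardware accelerators for them. As outcomes of the project, we developed a stencil compiler for systolic arrays and a high-level synthesis compiler for stream-computing accelerators. These are important foundational technologies that improve the productivity of FPGA-based reconfigurable high-performance computing.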
ペタスケール時代の複合型計算システムを支える高効率化・高信頼化技術の確立

滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Young Scientists (B)

研究機関：Tohoku University

2011年～ 2012年

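We proposed a technique that makes multiple processors appear as a single virtual processor within an OpenCL program while assigning work to the processor best suited to it. We also proposed a new style of programming, via OpenCL, for systems composed of many diverse compute nodes. For higher reliability, we realized a transparent checkpoint-restart capability for OpenCL applications. Furthermore, we included OpenACC, which has spread rapidly in recent years, in the evaluation and clarified its practicality and current limitations.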
3次元集積化新世代ベクトルマイクロアーキテクチャの創出

小林広明, 滝沢寛之, 江川隆輔

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2010年～ 2012年

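To realize next-generation vector processors with low power consumption and high performance, this project worked on microarchitecture design using 3D integration technology, which has attracted attention as a new device technology. We provided design guidelines for using conventional 2D design and 3D design in a hybrid manner, and achieved effective replacement of 2D wiring with 3D TSVs (through-silicon vias) from the intra-unit wiring level, such as in arithmetic circuits and on-chip memory, up to the inter-unit wiring level. We then showed through performance evaluation the effectiveness of the processors obtained with 3D integration.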
グラフィックスハードウェアを用いた高性能計算 競争的資金

制度名：Grant-in-Aid for Scientific Research

2003年4月～ 2011年9月
メニーコア・超並列時代に向けた自動チューニング記述言語の方式開発

片桐孝洋, 今村俊幸, 須田礼仁, 黒田久泰, 伊藤祥司, 岩下武史, 滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：The University of Tokyo

2009年～ 2011年

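To achieve high performance in diverse computing environments, this project carried out the following research and development toward auto-tuning (AT): (1) extending the AT language ABCLibScript so that it applies to multicore and massively parallel environments; (2) verifying the effect of AT on multicore CPUs and GPUs; (3) applying the new ABCLibScript features to several application programs and verifying their effectiveness; (4) releasing the new ABCLibScript implementation on the Internet as free software.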
異種複数のプロセッサを適材適所で活用する高性能計算フレームワークの構築

滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Young Scientists (B)

研究機関：Tohoku University

2009年～ 2010年

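The goal of this project is to build a high-performance computing framework that keeps source-level portability high while effectively exploiting, as needed, the performance of each processor in a heterogeneous computing system. To that end, we examine a mechanism that auto-tunes programs written in high-level languages for various processors, a numerical library that can be used seamlessly from high-level languages, and scheduling methods for using diverse computing resources where each fits best.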
大規模データクラスタリング処理の高速化 競争的資金

1999年10月～ 2009年3月
ハードウェア・ソフトウェア協調型高効率マルチスレッドスケジューリングに関する研究

小林広明, 中村維男, 鈴木健一, 滝沢寛之, 江川隆輔, 佐藤幸紀, 小寺功, 船矢祐介, 佐藤雅之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2006年～ 2009年

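Aiming at low-power, high-performance processing through efficient use of on-chip computing resources in next-generation chip multiprocessors (CMPs), we researched and developed low-power-oriented, high-efficiency multithreading techniques. Specifically, we defined characteristic metrics for threads running on a CMP, established a high-efficiency thread scheduling method for multicore processors based on this definition, developed a dynamic cache partitioning mechanism that achieves both high performance and low power, and showed their effectiveness by simulation.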
遊休計算資源を用いた大規模分散計算 競争的資金

制度名：Grant-in-Aid for Scientific Research

2003年3月～ 2008年3月
ヘテロジニアス・マルチコア時代の統一的ソフトウェア開発手法に関する研究

滝沢寛之

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Young Scientists (B)

研究機関：Tohoku University

2007年～ 2008年

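Assuming a typical PC equipped with a CPU and a graphics processing unit (GPU), this project designed, implemented, and evaluated SPRAT (Stream Programming with Runtime Auto-Tuning), a high-level programming language system that keeps programs portable between the CPU and the GPU while using both effectively. We wrote LU decomposition and a 2D incompressible fluid simulation in the SPRAT language as example applications, and showed experimentally that the SPRAT runtime environment can automatically switch processors appropriately according to runtime parameters such as the performance difference between the PC's CPU and GPU and the problem size. We also tried switching processors to minimize execution time and switching to minimize the energy consumed by application execution, and showed that the SPRAT runtime environment can switch processors appropriately from each viewpoint. Furthermore, we implemented two kinds of automatic optimization in the language processor (the SPRAT compiler) that generates CUDA code for GPUs from high-level SPRAT code, and evaluated their effect on computational performance. The results showed that automatic optimization can greatly improve the performance of code generated by the SPRAT compiler.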
安全・安心なボランティアコンピューティングによる超大規模データマイニング

小林広明, 滝沢寛之

2007年～ 2008年

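This project aims to establish foundational technology for large-scale data mining through volunteer computing that exploits the capabilities and performance of home game consoles. In fiscal year 2008 (Heisei 20), we built a distributed data mining system that analyzes physical phenomena near a rocket exhaust nozzle, and conducted a demonstration experiment of large-scale data mining in a volunteer computing environment consisting of PLAYSTATION 3 consoles and InTrigger. The results showed that when conventional centralized task scheduling is used for dynamic load balancing, load balancing becomes less efficient as computing resources increase, and the performance expected of a large-scale volunteer computing environment cannot be achieved. In contrast, the distributed cooperative scheduling mechanism proposed in this project was shown to perform dynamic load balancing efficiently even as the number of computing resources grows. This evaluation showed that the proposed mechanism is effective for dynamic load balancing in large-scale volunteer computing environments. We also proposed a worker-side scheduling method so that volunteers participating in multiple projects do not waste idle computing capacity. As a mechanism for improving the reliability of volunteer computing, we also proposed a method for efficiently checking the validity of computation results. We showed by simulation that dishonest workers can be detected by quantifying each worker's reliability and updating it based on the validity assessment of computation results. Furthermore, noting that home game consoles have high graphics performance, we examined how to use that performance for data mining, and also studied a programming framework that makes such programming easy.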
安全・安心なボランティアコンピューティングによる超大規模データマイニング

小林広明, 滝沢寛之

2006年～ 2006年

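This fiscal year, we focused on data clustering (DC) and neural networks (NN), which demand especially high computational performance among representative data mining methods, and examined implementation methods for running them efficiently on home game consoles. Specifically, we studied how to use the Cell Broadband Engine (CBE), a high-performance processor found in home game consoles, and graphics processing units (GPUs) effectively for data mining, and carried out implementation and quantitative performance evaluation. As research on large-scale P2P computing, we studied a distributed computing-resource management mechanism for efficiently searching the vast number of idle computing resources spread across the network for ones that satisfy users' requests. We found that users' requests exhibit temporal and spatial locality, like that seen in a computer's memory access behavior, and that exploiting this locality can dramatically improve search efficiency. This fiscal year we paid particular attention to resource discovery in heterogeneous environments and examined a mechanism that automatically adjusts the number of P2P connections according to frequency of use. As a mechanism for coordinating a vast number of computers, we also advanced research on a fully distributed dynamic load balancing mechanism and designed its basic control scheme. In preparation for realizing, on a volunteer computing platform, a safe and secure distributed data mining system based on tamper-resistant computation, we set up a development environment this fiscal year. We also collected related materials and held discussions with the parties concerned.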
多次元時系列データマイニングのためのクラスタリング手法とその並列化

滝沢寛之

2003年～ 2004年

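Data clustering requires many distance computations between high-dimensional vectors to find the nearest cluster (nearest neighbor search), and this computational load is a major issue when it is applied to large-scale problems. In fiscal year 2003 (Heisei 15), noting the rapid progress of graphics hardware (GPUs) for personal computers (PCs), this project achieved fast nearest neighbor search by using a commodity GPU as a parallel processor (GPGPU). In fiscal year 2004 (Heisei 16), we applied those results to develop a method that performs data clustering quickly through cooperation between the GPU and the CPU. This method can effectively exploit the two kinds of parallelism present in the distance computation of nearest neighbor search, and the work was very highly regarded academically, receiving a best paper award at an international conference. We also proposed a method for effectively running competitive learning, which is applicable to data clustering, in parallel on a PC cluster, and the results were published in an international journal. We continued to study visualization, an important element of data mining, and demonstrated through a connection experiment over Super SINET between Hokkaido University and Tohoku University that a visualization server can be used interactively from a remote site. The experiment demonstrated that a physically remote compute server can be used for clustering and for subsequent visualization such as volume rendering, and is therefore usable for data mining. These results are to be published in a journal. The method of Chinrungueng et al. minimizes average distortion by evaluating cluster optimality with partial distortion entropy. However, it needs many iterations to form appropriate clusters and may not follow temporal changes in time-series data quickly. We newly proposed a method that appropriately relocates clusters based on partial distortion entropy, and confirmed that applying it to adaptive vector quantization of video achieves both tracking speed and distortion minimization.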
3次元グラフィックス用インテリジェントメモリアーキテクチャに関する研究

小林広明, 中村維男, 鈴木健一, 滝沢寛之, 佐野健太郎

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2002年～ 2004年

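The project produced the following results.
(1) High-performance graphics algorithms and their hardware implementation: We analyzed the parallelism and data-reference locality of rendering algorithms based on global illumination models and designed the basic architecture of a new rendering pipeline. We then designed and developed hardware algorithms for this pipeline, evaluated performance by software simulation, and showed the feasibility of real-time rendering. We also developed a fast rendering algorithm for walkthrough animation and showed its effectiveness through performance evaluation.
(2) Power-saving memory control: With the aim of embedding this graphics architecture in low-power information devices such as mobile terminals, we carried out the basic design of a dynamically reconfigurable memory system that maximizes computational efficiency per unit of power. We designed a dynamically reconfigurable intelligent memory mechanism that can activate and deactivate the system's arithmetic units and memory elements according to variation in computational load, quantitatively evaluated the amount of activated hardware and its effect on performance, and showed that the hardware can be optimally controlled according to temporal changes in an application's demand for computing resources.
(3) Data compression algorithms for graphics hardware: We studied high-efficiency, high-performance compression of graphics data. We applied vector quantization to volume data and achieved highly efficient compression with minimal information loss. We also developed a visualization algorithm that can be applied directly to compressed data and achieved fast volume rendering. To speed up data clustering, the main processing step in compression, we developed a parallel algorithm for computing distances between high-dimensional vectors that runs on graphics hardware.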
ニューラルネットワークの能動的学習の効率化 競争的資金

1995年4月～ 1999年10月

社会貢献活動 2

GPUコンピューティングセミナー@東北大学

2009年12月17日～

Gave a talk at a company-hosted seminar on the latest trends and future outlook of the related research field.
仙台高等専門学校専攻研究特別講義

2009年12月16日～

Gave a special lecture at the Hirose Campus of 仙台高等専門学校 (National Institute of Technology, Sendai College).

メディア報道 1

Young HPC Researchers Take Global Stage

HPCwire

2014年5月15日

メディア報道種別: その他

その他 4

ExaFSA

Developing numerical simulations of Fluid-Structure-Acoustic Interactions
ポストペタスケール高性能計算に資するシステムソフトウェア技術の創出

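Establishing a methodology for smoothly migrating the valuable software assets developed so far to the massively parallel heterogeneous systems of the post-petascale generation is an important task that must be accomplished within the next few years, and a development environment that supports this work is strongly desired. Considering affinity with existing software assets and continuity of software development, this project aims to realize a development environment for massively parallel heterogeneous systems through an evolutionary approach that creates a new environment on the basis of existing ones. That is, we develop new software development technologies for massively parallel heterogeneous systems at the levels of language processors, libraries, runtime environments, support tools, and applications, and realize a development environment based on them.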
対話的物理シミュレーションのラピッドプロトタイピング環境の構築

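The goal of this project is to realize a development environment in which the multiple processors found in today's common game consoles can easily be used, each where it fits best, in order to support the development of interactive physics simulations and the photorealistic image generation applications coupled to them. In recent years the graphics performance of game consoles has improved dramatically, and it is becoming possible to interactively render images that could be mistaken for the real thing. However, the more photorealistic the game screen, the more noticeable the unnaturalness of motion that does not obey physical laws becomes when even higher-quality photorealistic images are to be generated. Therefore, to give players a sense of virtual reality, the people and objects drawn on the game screen must move naturally from the standpoint of physical laws, and in games, where interactivity is required, interactive physics simulation and photorealistic image generation based on it will become increasingly important. For this reason, this project builds an environment for easily prototyping high-performance interactive physics simulations and experimenting by trial and error in the early stages of game development.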
ICTエコ社会を創造する安全・安心・安価なユビキタスコンピューティングプラットフォームの研究・開発

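Aiming to establish an ecology model in the information and communications field, we research and develop a safe, secure, and inexpensive volunteer computing platform for the ubiquitous era that puts to use the computing resources spread throughout society. In particular, we study ways to make volunteer computing more efficient and more reliable, together with incentive models that encourage participation, and establish foundational technology for inexpensively providing a large-scale computing platform that can be used even for highly confidential computation and at a scale difficult to achieve with conventional implementation technology.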