Tohoku University Researcher Profiles

コマツ　カズヒコ

小松　一彦

Kazuhiko Komatsu

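Affiliation

Cyberscience Center, Research and Development Division, Joint Research Division of High-Performance Computing (NEC)

Position

Associate Professor

Degrees

Ph.D. in Information Sciences (Tohoku University)
M.S. in Information Sciences (Tohoku University)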

researchmap

https://researchmap.jp/kazuhiko_komatsu

J-GLOBAL ID

201301078615753866

ORCID

https://orcid.org/0000-0003-4463-8359

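Career (7)

April 2024 to present

Tohoku University, Cyberscience Center, Specially Appointed Professor (Research)
January 2022 to present

Tohoku University, Institute of Multidisciplinary Research for Advanced Materials, Associate Professor (concurrent appointment)
October 2017 to March 2024

Tohoku University, Cyberscience Center, Associate Professor
April 2012 to September 2017

Tohoku University, Cyberscience Center, Assistant Professor
August 2015 to September 2015

University of Siegen (Germany), Computing Center, Visiting Researcher
April 2008 to March 2012

Tohoku University, Cyberscience Center, Postdoctoral Researcher
October 2010 to December 2010

University of Stuttgart (Germany), High Performance Computing Center, Visiting Researcher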

Committee memberships (51)

IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2024), Program Committee Member

January 2024 to present
xSIG 2024, Program Committee Member

January 2024 to present
IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid2024), Program Committee Member

September 2023 to present
19th International Workshop on Automatic Performance Tuning (iWAPT2024), Program Committee

August 2023 to present
11th International Workshop on Computer Systems and Architectures (CSA'23), Program Committee Member

May 2023 to present
7th International Workshop on GPU Computing and AI (GCA'23), Program Committee Member

April 2023 to present
Information Processing Society of Japan (IPSJ) Special Interest Group on High Performance Computing (HPC), Secretary

April 2023 to present
International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT’22), Registration and Finance Chair

May 2022 to present
IPSJ Tohoku Branch, Steering Committee Member

April 2021 to present
National Institute of Science and Technology Policy (MEXT), Science and Technology Foresight Center, Expert Investigator

April 2014 to present
IPSJ Transactions on Advanced Computing Systems (ACS), Editorial Board Member

April 2020 to March 2024
Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT) 2023, Program Committee Member

February 2023 to December 2023
IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023), Program Committee Member

February 2023 to December 2023
18th International Workshop on Automatic Performance Tuning (iWAPT2023), Program Committee

June 2022 to May 2023
IPSJ Special Interest Group on High Performance Computing (HPC), Steering Committee Member

April 2019 to March 2023
IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2022)

January 2022 to December 2022
Auto-Tuning for Multicore and GPU (ATMG2022), Program Committee Member

January 2022 to December 2022
6th International Workshop on GPU Computing and AI (GCA'22), Program Committee Member

December 2021 to November 2022
8th International Workshop on Large-scale HPC Application Modernization (LHAM2022), Program Committee Member

April 2022 to September 2022
17th International Workshop on Automatic Performance Tuning (iWAPT2022), Program Committee

June 2021 to May 2022
8th International Workshop on Large-scale HPC Application Modernization (LHAM2021), Program Committee Member

December 2020 to December 2021
Auto-Tuning for Multicore and GPU (ATMG2021), Program Committee Member

December 2020 to December 2021
IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2021), Program Committee Member

December 2019 to December 2021
6th International Workshop on GPU Computing and AI (GCA'21), Program Committee Member

December 2020 to November 2021
ISC High Performance (ISC'21), Research Poster Committee Member

July 2020 to June 2021
16th International Workshop on Automatic Performance Tuning (iWAPT2021), Program Committee Chair

June 2020 to May 2021
IPSJ Tohoku Branch, Steering Committee Member and Treasurer

April 2019 to March 2021
HPC Asia 2021, PC member

February 2020 to January 2021
7th International Workshop on Large-scale HPC Application Modernization (LHAM2020), Program Committee Member

December 2019 to November 2020
16th IEEE Asia Pacific Conference on Circuits and Systems (APCCAS2020), Program Committee Member

May 2020 to October 2020
FY2020 Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan, Executive Committee Member

September 2019 to August 2020
FY2020 Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan, Program Committee Member

September 2019 to August 2020
ISC High Performance (ISC'20), Research Poster Committee Member

July 2019 to June 2020
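15th International Workshop on Automatic Performance Tuning (iWAPT2020), Program Committee Vice Chair

June 2019 to May 2020
HPC Asia 2020, PC member

February 2019 to January 2020
Auto-Tuning for Multicore and GPU (ATMG2019), Program Committee

December 2018 to December 2019
IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019), Program Committee Member

October 2018 to December 2019
7th International Workshop on Computer Systems and Architectures (CSA 2019), PC member

April 2019 to November 2019
FY2019 Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan, Program Committee Member

April 2019 to August 2019
FY2019 Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan, Executive Committee Member

April 2019 to August 2019
LHAM2018, PC member

December 2017 to November 2018
ATMG2018 Program Committee, Program Committee Chair

November 2017 to September 2018
LHAM2017, PC member

December 2016 to November 2017
HPCS2017 Program Committee, Program Committee Member

December 2016 to March 2017
LHAM2016 Organizing Committee, Organizing Committee Member

April 2016 to March 2017
LHAM2016 Program Committee, Program Committee Member

April 2016 to March 2017
HPCS2016 Program Committee, Program Committee Member

November 2015 to March 2016
HPCS2016 Organizing Committee, Organizing Committee Member

June 2015 to March 2016
LHAM2015 Organizing Committee, Organizing Committee Member

April 2015 to March 2016
LHAM2015 Program Committee, Program Committee Member

April 2015 to March 2016
HP3C'14, Program Committee

August 2013 to December 2013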

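Professional memberships (2)

Information Processing Society of Japan (IPSJ)
IEEE

Research keywords (3)

Quantum computing
Data science
High performance computing

Research areas (2)

Informatics / Computer systems
Informatics / High-performance computing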

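Awards (14)

Student Encouragement Award, 86th National Convention of the Information Processing Society of Japan (IPSJ)

March 2024, IPSJ, "A Study on Fault Parameter Prediction Using Machine Learning Models" (機械学習モデルを用いた断層パラメータ予測に関する一検討)
Student Encouragement Award, 86th IPSJ National Convention

March 2024, IPSJ, "A Study on Block Partitioning Using Frame-Difference Images for Accelerating VVC" (VVCの高速化のためのフレーム差分画像を用いたブロック分割に関する一検討)
Student Encouragement Award, 85th IPSJ National Convention

March 2023, IPSJ, "A Study on Thread Parallelization Using Multiple Sources of Automatic Parallelization Information" (複数の自動並列化情報を用いたスレッド並列化に関する一検討)
Student Encouragement Award, 85th IPSJ National Convention

March 2023, IPSJ, "A Study on Video Partitioning for Parallel Processing of VVC Video Coding" (VVC映像符号化並列処理のための映像分割に関する一検討)
Best Paper Award

December 2022, 23rd International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT’22), "A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders"
Best Poster Award

April 2022, 2022 IEEE Symposium in Low-Power and High-Speed Chips, "A Shared Cache Architecture for VVC Coding"
Convention Excellence Award, 84th IPSJ National Convention

March 2022, IPSJ, "A Study on a Space Search Method for Anomaly Detection Using a Digital-Twin Turbine" (デジタルツインタービンを用いた異常検知のための空間探索手法に関する一検討)
Student Encouragement Award, 84th IPSJ National Convention

March 2022, IPSJ, "A Study on a Space Search Method for Anomaly Detection Using a Digital-Twin Turbine" (デジタルツインタービンを用いた異常検知のための空間探索手法に関する一検討)
Eighth International Symposium on Computing and Networking (CANDAR'20) Best Paper Award

November 2020, "Combinatorial Clustering based on an Externally-defined One-hot Constraint"
PaCT2019 (15th International Conference on Parallel Computing Technologies) Best Paper Award

August 2019, "Analysis of relationship between SIMD-processing features used in NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors"
International Supercomputing Conference (ISC2019) Best Poster Award

June 2019, "A Skewed Multi-Bank Cache for Vector Processors"
Technical Contribution Award

July 2018, NEC C&C Systems Users Association, "Basic Performance Evaluation of the New Vector Processor SX-Aurora TSUBASA" (新ベクトルプロセッサ SX-Aurora TSUBASAの基本性能評価)
International Symposium on Computing and Networking (CANDAR'15) Best Workshop Paper Award (International Workshop on Legacy HPC Application Migration (LHAM2015))

December 10, 2015, CANDAR'15
10th Tohoku Branch Noguchi Research Encouragement Award

June 17, 2015, IPSJ Tohoku Branch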

Papers (130)

File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-core Direct Numerical Simulation Code of Turbulence (peer-reviewed)

Yuto Hatanaka, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Akihiro Musa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi, Mitsuo Yokokawa

24th International Conference on Computational Science, July 2024
An Asymptotic Parallel Linear Solver and Its Application to Direct Numerical Simulation for Compressible Turbulence (peer-reviewed)

Mitsuo Yokokawa, Taiki Matsumoto, Ryo Takegami, Yukiya Sugiura, Naoki Watanabe, Yoshiki Sakurai, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

24th International Conference on Computational Science, July 2024
Quantum annealing-based algorithm for lattice gas automata (peer-reviewed)

Yuichi Kuya, Kazuhiko Komatsu, Kouki Yonaga, Hiroaki Kobayashi

Computers & Fluids, 106238-106238, March 2024
Publisher: Elsevier BV
DOI: 10.1016/j.compfluid.2024.106238

ISSN: 0045-7930
Appropriate Graph-Algorithm Selection for Edge Devices Using Machine Learning (peer-reviewed)

Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 544 (551), December 18, 2023
Publisher: IEEE
DOI: 10.1109/mcsoc60832.2023.00086
A Constraint Partition Method for Combinatorial Optimization Problems (peer-reviewed)

Makoto Onoda, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 600 (607), December 18, 2023
Publisher: IEEE
DOI: 10.1109/mcsoc60832.2023.00093
Multi-scale Loss based Electron Microscopic Image Pair Matching Method (peer-reviewed)

Chunting Duan, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 22nd IEEE International Conference on Machine Learning and Applications, 1957-1964, December 2023

DOI: 10.1109/ICMLA58977.2023.00295
Investigating the Characteristics of Ising Machines (peer-reviewed)

Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Hiroaki Kobayashi

Proceedings of IEEE International Conference on Quantum Computing and Engineering, 939-948, September 17, 2023
A Study on the Memory Performance Characteristics of Clustered Architectures (クラスタ型アーキテクチャのメモリ性能特性に関する一検討) (peer-reviewed)

Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

IPSJ Transactions on Advanced Computing Systems (ACS), 16(1), 1-13, July 2023
Performance evaluation of parallel direct numerical simulation code on supercomputer SX-Aurora TSUBASA (peer-reviewed)

Mitsuo Yokokawa, Yujiro Takenaka, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Computers & Fluids, 261, 105913-105913, July 2023
Publisher: Elsevier BV
DOI: 10.1016/j.compfluid.2023.105913

ISSN: 0045-7930
Performance Evaluation of Tsunami Evacuation Route Planning on Multiple Annealing Machines (peer-reviewed)

Yihui Liu, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the 20th ACM International Conference on Computing Frontiers, 185-188, May 9, 2023
Publisher: ACM
DOI: 10.1145/3587135.3592193
I/O Performance Evaluation of a Memory-Saving DNS Code on SX-Aurora TSUBASA (peer-reviewed)

Mitsuo Yokokawa, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Taiki Matsumoto, Akihiro Musa, Kazuhiko Komatsu, Takashi Ishihara, Hiroaki Kobayashi

2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 692-696, May 2023
Publisher: IEEE
DOI: 10.1109/ipdpsw59300.2023.00117
Ising-Based Kernel Clustering (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Algorithms, 16(4), 214-214, April 19, 2023
Publisher: MDPI AG
DOI: 10.3390/a16040214

eISSN: 1999-4893

Combinatorial clustering based on the Ising model is drawing attention as a high-quality clustering method. However, conventional Ising-based clustering methods using the Euclidean distance cannot handle irregular data. To overcome this problem, this paper proposes an Ising-based kernel clustering method. The kernel clustering method is designed based on two critical ideas. One is to perform clustering of irregular data by mapping the data onto a high-dimensional feature space by using a kernel trick. The other is the utilization of matrix–matrix calculations in the numerical libraries to accelerate the preprocessing for annealing. While the conventional Ising-based clustering is not designed to accept the data transformed by the kernel trick, this paper extends the availability of Ising-based clustering to process a distance matrix defined in high-dimensional data space. The proposed method can handle the Gram matrix determined by the kernel method as a high-dimensional distance matrix to handle irregular data. By comparing the proposed Ising-based kernel clustering method with the conventional Euclidean distance-based combinatorial clustering, it is clarified that the quality of the clustering results of the proposed method for irregular data is significantly better than that of the conventional method. Furthermore, the preprocessing for annealing in the proposed method, which uses numerical libraries, is up to 12.4 million times faster than the conventional naive Python implementation. Comparisons between Ising-based kernel clustering and kernel K-means reveal that the proposed method has the potential to obtain higher-quality clustering results than the kernel K-means as a representative of the state-of-the-art kernel clustering methods.
A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders (peer-reviewed)

Masayuki Sato, Yuya Omori, Ryusuke Egawa, Ken Nakamura, Daisuke Kobayashi, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, 288-300, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_23

ISSN: 0302-9743

eISSN: 1611-3349
Analysis of Precision Vectors for Ising-Based Linear Regression (peer-reviewed)

Kaho Aoyama, Kazuhiko Komatsu, Masahito Kumagai, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, 251-261, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_20

ISSN: 0302-9743

eISSN: 1611-3349
Page-Address Coalescing of Vector Gather Instructions for Efficient Address Translation (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of 2022 IEEE/ACM 12th Workshop on Irregular Applications: Architectures and Algorithms (IA3), 1-8, November 2022

DOI: 10.1109/IA356718.2022.00007
Squeezed-Quantum-Noise-Assisted Optimization for Quadratic Binary Problems by CIM-CAC

Masahito Kumagai, Yoshihisa Yamamoto, Yoshitaka Inui, Satoshi Kako, Kazuhiko Komatsu, Hiroaki Kobayashi

Coherent Network Computing 2022 (CNC2022), October 2022
A hierarchical wavefront method for LU-SGS (peer-reviewed)

Kazuhiko Komatsu, Yuta Hougi, Masayuki Sato, Hiroaki Kobayashi

Computers & Fluids, 245, 105572-105572, September 2022
Publisher: Elsevier BV
DOI: 10.1016/j.compfluid.2022.105572

ISSN: 0045-7930
A Metadata Prefetching Mechanism for Hybrid Memory Architectures (peer-reviewed)

Shunsuke TSUKADA, Hikaru TAKAYASHIKI, Masayuki SATO, Kazuhiko KOMATSU, Hiroaki KOBAYASHI

IEICE Transactions on Electronics, E105.C(6), 232-243, June 1, 2022
Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)
DOI: 10.1587/transele.2021lhp0004

ISSN: 0916-8524

eISSN: 1745-1353
High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA (peer-reviewed)

Ilya Afanasyev, Kazuhiko Komatsu, Dmitry Lichmanov, Vadim Voevodin, Hiroaki Kobayashi

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 221-229, May 2022
Publisher: IEEE
DOI: 10.1109/ipdpsw55747.2022.00050
Prediction of turbine blade condition using supervised machine learning trained by digital-twin simulation (peer-reviewed)

Issei Fukamizu, Kazuhiko Komatsu, Masahito Kumagai, Hironori Miyazawa, Takashi Furusawa, Satoru Yamamoto, Hiroaki Kobayashi

International Conference on Parallel Computational Fluid Dynamics 2022, May 2022
A Shared Cache Architecture for VVC Coding (peer-reviewed)

Yoshiaki Kondo, Masayuki Sato, Ken Nakamura, Yuya Omori, Daisuke Kobayashi, Hiroe Iwasaki, Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), April 2022
Detection of Machinery Failure Signs From Big Time-Series Data Obtained by Flow Simulation of Intermediate-Pressure Steam Turbines

Kazuhiko Komatsu, Hironori Miyazawa, Cheng Yiran, Masayuki Sato, Takashi Furusawa, Satoru Yamamoto, Hiroaki Kobayashi

Journal of Engineering for Gas Turbines and Power, 144(1), January 1, 2022
Publisher: ASME International
DOI: 10.1115/1.4052142

ISSN: 0742-4795

eISSN: 1528-8919

The periodic maintenance, repair, and overhaul (MRO) of turbine blades in thermal power plants are essential to maintain a stable power supply. During MRO, older and less-efficient power plants are put into operation, which results in wastage of additional fuels. Such a situation forces thermal power plants to work under off-design conditions. Moreover, such an operation accelerates blade deterioration, which may lead to sudden failure. Therefore, a method for avoiding unexpected failures needs to be developed. To detect the signs of machinery failures, the analysis of time-series data is required. However, data for various blade conditions must be collected from actual operating steam turbines. Further, obtaining abnormal or failure data is difficult. Thus, this paper proposes a classification approach to analyze big time-series data alternatively collected from numerical results. The time-series data from various normal and abnormal cases of actual intermediate-pressure steam-turbine operation were obtained through numerical simulation. Thereafter, useful features were extracted and classified using K-means clustering to judge whether the turbine is operating normally or abnormally. The experimental results indicate that the status of the blade can be appropriately classified. By checking data from real turbine blades using our classification results, the status of these blades can be estimated. Thus, this approach can help decide on the appropriate timing for MRO.
Optimizations of a Linear Matrix Solver in a Composite Simulation for a Vector Computer (peer-reviewed)

Zhilin He, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), 33-37, December 10, 2021
Publisher: IEEE
DOI: 10.1109/paap54281.2021.9720445
A dynamic parameter tuning method for SpMM parallel execution (peer-reviewed)

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Concurrency and Computation: Practice and Experience, e6755, December 9, 2021
Publisher: Wiley
DOI: 10.1002/cpe.6755

ISSN: 1532-0626

eISSN: 1532-0634
Ising-Based Combinatorial Clustering Using the Kernel Method (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 197-203, December 2021
Publisher: IEEE
DOI: 10.1109/mcsoc51149.2021.00037
An Externally-Constrained Ising Clustering Method for Material Informatics (peer-reviewed)

Kazuhiko Komatsu, Masahito Kumagai, Ji Qi, Masayuki Sato, Hiroaki Kobayashi

2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), 201-204, November 2021
Publisher: IEEE
DOI: 10.1109/candarw53999.2021.00040
Register Flush-free Runahead Execution for Modern Vector Processors (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 114-125, October 2021
Publisher: IEEE
DOI: 10.1109/sbac-pad53543.2021.00023
Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer (peer-reviewed)

Osamu Watanabe, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 8(2), 114-130, September 14, 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210207

ISSN: 2313-8734
Distributed Graph Algorithms for Multiple Vector Engines of NEC SX-Aurora TSUBASA Systems (peer-reviewed)

Ilya V. Afanasyev, Vadim V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 8(2), 95-113, September 14, 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210206

ISSN: 2313-8734
Performance and Power Analysis of a Vector Computing System (peer-reviewed)

Kazuhiko Komatsu, Akito Onodera, Erich Focht, Soya Fujimoto, Yoko Isobe, Shintaro Momose, Masayuki Sato, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 8(2), 75-94, September 14, 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210205

ISSN: 2313-8734
Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs (peer-reviewed)

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing, 11(2), 267-282, July 2021
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.11.2_267

ISSN: 2185-2839

eISSN: 2185-2847
An External Definition of the One-Hot Constraint and Fast QUBO Generation for High-Performance Combinatorial Clustering (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing, 11(2), 463-491, July 2021
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.11.2_463

ISSN: 2185-2839

eISSN: 2185-2847
A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs (peer-reviewed)

Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 779-788, June 2021
Publisher: IEEE
DOI: 10.1109/ipdpsw52791.2021.00116
Performance Evaluation of Parallel DNS Codes on the Supercomputer SX-AURORA TSUBASA (peer-reviewed)

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

International Conference on Parallel Computational Fluid Dynamics 2020-2021, May 2021
A hierarchical wavefront method for LU-SGS on modern multi-core vector processors (peer-reviewed)

Yuta Hougi, Kazuhiko Komatsu, Osamu Watanabe, Masayuki Sato, Hiroaki Kobayashi

International Conference on Parallel Computational Fluid Dynamics 2020-2021, May 2021
A Metadata Prefetching Mechanism for Hybrid Memory Architectures (peer-reviewed)

Shunsuke Tsukada, Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 1-3, April 14, 2021
Publisher: IEEE
DOI: 10.1109/coolchips52128.2021.9410321
Optimizations of DNS Codes for Turbulence on SX-Aurora TSUBASA (invited)

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2019 and 2020, 51-59, March 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-68049-7_4
Performance Evaluation of SX-Aurora TSUBASA and Its QA-Assisted Application Design (invited)

Hiroaki Kobayashi, Kazuhiko Komatsu

Sustained Simulation Performance 2019 and 2020, 3-20, March 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-68049-7_1
A Dynamic Parameter Tuning Method for High Performance SpMM (peer-reviewed)

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, 318-329, February 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-69244-5_28

ISSN: 0302-9743

eISSN: 1611-3349
A Deep Reinforcement Learning Based Feature Selector (peer-reviewed)

Yiran Cheng, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel Architectures, Algorithms and Programming, 1362, 378-389, February 2021
Publisher: Springer Singapore
DOI: 10.1007/978-981-16-0010-4_33

ISSN: 1865-0929

eISSN: 1865-0937
VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture (peer-reviewed)

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

The Journal of Supercomputing, 77, 8694-8715, January 26, 2021
Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s11227-020-03564-9

ISSN: 0920-8542

eISSN: 1573-0484
Optimization of the Himeno Benchmark for SX-Aurora TSUBASA (peer-reviewed)

Akito Onodera, Kazuhiko Komatsu, Soya Fujimoto, Yoko Isobe, Masayuki Sato, Hiroaki Kobayashi

Benchmarking, Measuring, and Optimizing, 127-143, 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-71058-3_8

ISSN: 0302-9743

eISSN: 1611-3349
Evaluation of Tsunami Inundation Simulation using Vector-Scalar Hybrid MPI on SX-Aurora TSUBASA (peer-reviewed)

Akihiro Musa, Takashi Soga, Takashi Abe, Masayuki Sato, Kazuhiko Komatsu, Shunichi Koshimura, Hiroaki Kobayashi

International Conference for High Performance Computing, Networking, Storage, and Analysis 2020 (SC'20) Poster, November 2020
Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking (CANDAR), 59-68, November 2020
Publisher: IEEE
DOI: 10.1109/candar51075.2020.00015
An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations (peer-reviewed)

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW), 164-167, November 2020
Publisher: IEEE
DOI: 10.1109/candarw51189.2020.00041
Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA (peer-reviewed)

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Communications in Computer and Information Science, 131-145, July 2020
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-55326-5_10

ISSN: 1865-0929

eISSN: 1865-0937
Metadata Management for Large-Scale Hybrid Memory Architectures (peer-reviewed)

Shunsuke Tsukada, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

International Supercomputing Conference 2020 (ISC2020) Research Poster Session, June 2020
An Evaluation of a Hierarchical Clustering Method Using Quantum Annealing (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

International Supercomputing Conference 2020 (ISC2020) Research Poster Session, June 2020
Optimizations for the Himeno Benchmark on Vector Computing System SX-Aurora TSUBASA (peer-reviewed)

Akito Onodera, Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Yoko Isobe, Hiroaki Kobayashi

International Supercomputing Conference 2020 (ISC2020) Research Poster Session, June 2020
I/O Performance of the SX-Aurora TSUBASA (peer-reviewed)

Mitsuo Yokokawa, Ayano Nakai, Kazuhiko Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 27-35, May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00014
Importance of Selecting Data Layouts in the Tsunami Simulation Code (peer-reviewed)

Takumi Kishitani, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 830-837, May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00140
Xevolver: A code transformation framework for separation of system-awareness from application codes (peer-reviewed)

Kazuhiko Komatsu, Ayumu Gomi, Ryusuke Egawa, Daisuke Takahashi, Reiji Suda, Hiroyuki Takizawa

Concurrency and Computation: Practice and Experience, 32(7), e5577, April 10, 2020
Publisher: Wiley
DOI: 10.1002/cpe.5577

ISSN: 1532-0626

eISSN: 1532-0634
Energy-efficient Design of an STT-RAM-based Hybrid Cache Architecture (peer-reviewed)

Masayuki Sato, Xue Hao, Kazuhiko Komatsu, Hiroaki Kobayashi

2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 1-3, April 2020
Publisher: IEEE
DOI: 10.1109/coolchips49199.2020.9097643
Performance Evaluation of SX-Aurora TSUBASA by Using Benchmark Programs (invited)

Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2018 and 2019, 69-77, March 2020
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-39181-2_7
Developing Efficient Implementations of Shortest Paths and Page Rank Algorithms for NEC SX-Aurora TSUBASA Architecture (peer-reviewed)

Ilya V. Afanasyev, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Lobachevskii Journal of Mathematics, 40(11), 1753-1762, November 2019
Publisher: Pleiades Publishing Ltd
DOI: 10.1134/s1995080219110039

ISSN: 1995-0802

eISSN: 1818-9962
Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA (peer-reviewed)

Osamu Watanabe, Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), 25-32, November 2019
Publisher: IEEE
DOI: 10.1109/mchpc49590.2019.00011
A Hardware Prefetching Mechanism for Vector Gather Instructions (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), 59-66, November 2019
Publisher: IEEE
DOI: 10.1109/ia349570.2019.00015
A Skewed Multi-banked Cache for Many-core Vector Processors (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 6(3), 86-101, September 2019
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi190305

ISSN: 2313-8734
A Skewed Multi-Bank Cache for Vector Processors (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

International Supercomputing Conference (ISC2019) Poster, June 2019
An Application Parameter Search Method Based on the Binary Search Algorithm for Performance Tuning (peer-reviewed)

Takumi Kishitani, Kazuhiko Komatsu, Akihiro Musa, Masayuki Sato, Hiroaki Kobayashi

International Supercomputing Conference (ISC2019) Poster, June 2019
An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications (peer-reviewed)

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 768-777, May 2019
Publisher: IEEE
DOI: 10.1109/ipdpsw.2019.00127
Perceptron-based Cache Bypassing for Way-Adaptable Caches (peer-reviewed)

Masayuki Sato, Yongcheng Chen, Haruya Kikuchi, Kazuhiko Komatsu, Hiroaki Kobayashi

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 1-3, April 2019
Publisher: IEEE
DOI: 10.1109/coolchips.2019.8721331

ISSN: 2473-4683
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines (peer-reviewed)

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 6(1), 36-47, March 2019

DOI: 10.14529/jsfi190106
Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors (peer-reviewed)

Ilya V. Afanasyev, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

International Conference on Parallel Computing Technologies 2019 (PaCT2019), 125-139, 2019
Publisher: Springer
DOI: 10.1007/978-3-030-25636-4_10
Performance Evaluation of Tsunami Inundation Simulation on SX-Aurora TSUBASA (peer-reviewed)

Akihiro Musa, Takashi Abe, Takumi Kishitani, Takuya Inoue, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

International Conference on Computational Science 2019, 363-376, 2019
Publisher: Springer
DOI: 10.1007/978-3-030-22741-8_26
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA (peer-reviewed)

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi

SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, 685-696, November 2018
Publisher: IEEE
DOI: 10.1109/sc.2018.00057
Developing Efficient Implementations of Bellman–Ford and Forward-Backward Graph Algorithms for NEC SX-ACE (peer-reviewed)

Ilya V. Afanasyev, Alexander S. Antonov, Dmitry A. Nikitenko, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Osamu Watanabe, Akihiro Musa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 5(3), 65-69, November 2018

DOI: 10.14529/jsfi180311
Search Space Reduction for Parameter Tuning of a Tsunami Simulation on the Intel Knights Landing Processor 査読有り

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　117-124　2018年9月
出版者・発行元：IEEE
DOI： 10.1109/mcsoc2018.2018.00030 　
Expressing the Differences in Code Optimizations between Intel Knights Landing and NEC SX-ACE Processors 査読有り

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

13th World Congress on Computational Mechanics/2nd Pan American Congress on Computational Mechanics　2018年7月
Early Evaluation of a New Vector Processor SX-Aurora TSUBASA 査読有り

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

Poster Proceedings of International Supercomputing Conference　2018年6月
Performance Evaluation of a Real-Time Tsunami Inundation Forecast System on Modern Supercomputers 査読有り

Akihiro Musa, Takumi Kishitani, Takuya Inoue, Hiroaki Hokari, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

15th Annual Meeting Asia Oceania Geoscience Society　2018年6月
Migrating an Old Vector Code to Modern Vector Machines 査読有り

Hiroyuki Takizawa, Kenta Yamaguchi, Takashi Soga, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

30th International Conference on Parallel Computational Fluid Dynamics　May 2018
Use of Code Structural Features for Machine Learning to Predict Effective Optimizations [Peer-reviewed]

Yuki Kawarabatake, Mulya Agung, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　1049-1055　May 2018
Publisher: IEEE
DOI： 10.1109/ipdpsw.2018.00163
A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems [Peer-reviewed]

Mulya Agung, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa

2017 IEEE 24th International Conference on High Performance Computing (HiPC)　152-161　December 2017
Publisher: IEEE
DOI： 10.1109/hipc.2017.00026

ISSN：1094-7256
An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning [Peer-reviewed]

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

2017 Fifth International Symposium on Computing and Networking (CANDAR)　389-394　November 2017
Publisher: IEEE
DOI： 10.1109/candar.2017.96

ISSN：2379-1888
Designing an Open Database of System-Aware Code Optimizations [Peer-reviewed]

Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa

2017 Fifth International Symposium on Computing and Networking (CANDAR)　369-374　November 2017
Publisher: IEEE
DOI： 10.1109/candar.2017.102

ISSN：2379-1888
Vectorization-Aware Loop Optimization with User-Defined Code Transformations [Peer-reviewed]

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

2017 IEEE International Conference on Cluster Computing (CLUSTER)　685-692　September 2017
Publisher: IEEE
DOI： 10.1109/cluster.2017.102

ISSN：1552-5244
Performance and Power Analysis of SX-ACE Using HP-X Benchmark Programs [Peer-reviewed]

Ryusuke Egawa, Kazuhiko Komatsu, Yoko Isobe, Toshihiro Kato, Souya Fujimoto, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

2017 IEEE International Conference on Cluster Computing (CLUSTER)　693-700　September 2017
Publisher: IEEE
DOI： 10.1109/cluster.2017.65

ISSN：1552-5244
Program Optimization of Numerical Turbine for Vector Supercomputer SX-ACE [Peer-reviewed]

Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto

Computers & Fluids　2017
A Directive Generation Approach to High Code-Maintainability for Various HPC Systems

Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

International Journal of Networking and Computing　7　(2)　405-418　2017
Publisher: IJNC Editorial Board
DOI： 10.15803/ijnc.7.2_405 　

ISSN：2185-2839

The emergence of various high-performance computing (HPC) systems compels users to write code that considers the characteristics of each HPC system. To describe system-dependent information without drastic code modifications, directive sets such as the OpenMP and OpenACC directive sets have proved useful. However, the code becomes complex when high performance is required on various HPC systems, because different directive sets are needed for different systems. Thus, code maintainability and readability are degraded. This paper proposes a directive generation approach that generates various kinds of directive sets using user-defined rules. Instead of using several kinds of directive sets, users only have to write special placeholders that specify a unique code pattern where several directives are inserted. The special placeholders then trigger the generation of appropriate directives for each system using a user-defined rule with the code transformation framework Xevolver. Because only special placeholders are inserted in the code, the proposed approach can preserve code maintainability and readability. Performance evaluations of directive-based implementations on various HPC systems show that the best implementation differs among the HPC systems. Then, a demonstration of transformation into multiple kinds of implementations shows that the proposed approach can successfully generate directives from a smaller number of special placeholders. Therefore, the proposed directive generation approach is shown to be effective in keeping the maintainability of a code to be executed on various HPC systems.
Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE. [Peer-reviewed]

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

The Journal of Supercomputing　73　(9)　3948-3976　2017
Publisher: SPRINGER
DOI： 10.1007/s11227-017-1993-y 　

ISSN：0920-8542

eISSN：1573-0484

Achieving a high sustained simulation performance is the most important concern in the HPC community. To this end, many kinds of HPC system architectures have been proposed, and the diversity of the HPC systems grows rapidly. Under this circumstance, a vector-parallel supercomputer SX-ACE has been designed to achieve a high sustained performance of memory-intensive applications by providing a high memory bandwidth commensurate with its high computational capability. This paper examines the potential of the modern vector-parallel supercomputer through the performance evaluation of SX-ACE using practical engineering and scientific applications. To improve the sustained simulation performances of practical applications, SX-ACE adopts an advanced memory subsystem with several new architectural features. This paper discusses how these features, such as MSHR, a large on-chip memory, and novel vector processing mechanisms, are beneficial to achieve a high sustained performance for large-scale engineering and scientific simulations. Evaluation results clearly indicate that the high sustained memory performance per core enables the modern vector supercomputer to achieve outstanding performances that are unreachable by simply increasing the number of fine-grain scalar processor cores. This paper also discusses the performance of the HPCG benchmark to evaluate the potentials of supercomputers with balanced memory and computational performance against heterogeneous and cutting-edge scalar parallel systems.
Directive Translation for Various HPC Systems Using the Xevolver Framework [Invited]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2016　109-117　December 2016

DOI： 10.1007/978-3-319-46735-1_9
A Directive Generation Approach Using User-Defined Rules [Peer-reviewed]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 Fourth International Symposium on Computing and Networking (CANDAR)　515-521　November 2016
Publisher: IEEE
DOI： 10.1109/candar.2016.0095

ISSN：2379-1888
Performance Optimization of Numerical Turbine for Supercomputer SX-ACE [Peer-reviewed]

Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto

Proceedings of International Conference on Parallel Computational Fluid Dynamics　May 2016
Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework

Komatsu Kazuhiko, Egawa Ryusuke, Hirasawa Shoichi, Takizawa Hiroyuki, Itakura Ken'ichi, Kobayashi Hiroaki

International Journal of Networking and Computing　6　(2)　167-180　2016
Publisher: IJNC Editorial Board
DOI： 10.15803/ijnc.6.2_167 　

ISSN：2185-2839

As the diversity of high-performance computing (HPC) systems increases, even legacy HPC applications often need to use accelerators for higher performance. To migrate large-scale legacy HPC applications to modern HPC systems equipped with accelerators, a promising way is to use OpenACC because its directive-based approach can prevent drastic code modifications. This paper shows translation of a large-scale simulation code for an OpenACC platform by keeping the maintainability of the original code. Although OpenACC enables an application to use accelerators by adding a small number of directives, it requires modifying the original code to achieve a high performance in most cases, which tends to degrade the code maintainability and performance portability. To avoid such code modifications, this paper adopts a code translation framework, Xevolver. Instead of directly modifying a code, a pair of a custom code translation rule and a custom directive is defined, and is applied to the original code using the Xevolver framework. This paper first shows that simply inserting OpenACC directives does not lead to high performance and non-trivial code modifications are required in practice. In addition, the code modifications sometimes decrease the performance when migrating a code to other platforms, which leads to low performance portability. The direct code modifications can be avoided by using pairs of an externally-defined translation rule and a custom directive to keep the original code unchanged as much as possible. Finally, the performance evaluation shows that the performance portability can be improved by selectively applying translation with the Xevolver framework compared with directly modifying a code.
Code Optimization Activities Toward a High Sustained Simulation Performance [Invited]

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2015　159-168　December 2015
Publisher: Springer International Publishing
DOI： 10.1007/978-3-319-20340-9_13
Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems [Invited]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015　147-157　December 2015

DOI： 10.1007/978-3-319-20340-9_12
Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework [Peer-reviewed]

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

2015 Third International Symposium on Computing and Networking (CANDAR)　515-520　December 2015
Publisher: IEEE
DOI： 10.1109/candar.2015.102

ISSN：2379-1888
An Approach to the Highest Efficiency of the HPCG Benchmark on the SX-ACE Supercomputer [Peer-reviewed]

Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis SC15 Poster　November 2015
Expressing system-awareness as code transformations for performance portability across diverse HPC systems [Peer-reviewed]

Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of International Workshop on Portability Among HPC Architectures for Scientific Applications 2015　1-67　November 2015
An energy-efficient dynamic memory address mapping mechanism [Peer-reviewed]

Masayuki Sato, Chengguang Han, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2015 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XVIII)　1-3　April 2015
Publisher: IEEE
DOI： 10.1109/coolchips.2015.7158660 　
Designing an HPC Refactoring Catalog Toward the Exa-scale Computing Era

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2014　91-98　November 2014

DOI： 10.1007/978-3-319-10626-7_8
Early Evaluation of the SX-ACE Processor [Peer-reviewed]

Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

Poster proceedings in the 27th International Conference for High Performance Computing, Networking, Storage and Analysis　November 2014
Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information [Invited]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2014　119-126　November 2014
Publisher: Springer International Publishing
DOI： 10.1007/978-3-319-10626-7_10
OpenMP Parallelization Method using Compiler Information of Automatic Optimization [Invited]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Legacy HPC Application Migration 2014　September 23, 2014
A compiler-assisted OpenMP migration method based on automatic parallelizing information [Peer-reviewed]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　8488　450-459　2014
Publisher: Springer Verlag
DOI： 10.1007/978-3-319-07518-1_30 　

ISSN：1611-3349 0302-9743

Performance of a serial code often relies on compilers' capabilities for automatic parallelization. In such a case, the performance is not portable to a new system because a new compiler on the new system may be unable to effectively parallelize the code originally developed assuming a particular target compiler. As the compiler messages from the target compiler are still useful to identify key kernels that should be optimized even for the different system, this paper proposes a method to migrate a serial code to the OpenMP programming model by using such compiler messages. The aim of the proposed method is to improve the performance portability across different systems and compilers. Experimental results indicate that the migrated OpenMP code can achieve a comparable or even better performance than the original code with automatic parallelization. © 2014 Springer International Publishing.
Exploring system architectures for next-generation CFD simulations in the postpeta-scale era

KOMATSU Kazuhiko, EGAWA Ryusuke, TAKIZAWA Hiroyuki, SOGA Takashi, MUSA Akihiro, KOBAYASHI Hiroaki

Journal of Fluid Science and Technology　9　(5)　JFST0073-JFST0073　2014
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/jfst.2014jfst0073 　

ISSN：1880-5558

CFD simulations with uniform grids have attracted attention as next-generation CFD simulations on large-scale supercomputing systems. The Building-Cube Method (BCM) is one of the next-generation CFD methods. The basic idea is to balance the computational load among processing elements on a supercomputing system by dividing the whole calculation into many parallel tasks with the same amount of computation. Thus, it is suitable for highly parallel computation on supercomputing systems. This paper first implements BCM on five supercomputing systems as an example of a next-generation CFD simulation in the upcoming postpeta-scale era. Then, by theoretical analyses and performance evaluations, this paper clarifies the requirements of future supercomputing systems for a next-generation CFD simulation. The performance evaluations show that as the number of processing elements increases, the imbalance of data exchanges among nodes becomes more serious than that of calculations even in a next-generation CFD simulation. While the calculation time can ideally be reduced according to the number of processing elements, the data transfer time becomes dominant in the total execution time. Unlike in a massively parallel system architecture, the number of nodes in a system should be as small as possible to limit data transfers. The performance analyses also show that the memory bandwidth limits the performance of BCM and that use of an on-chip memory is effective in improving the performance. A memory subsystem that achieves a higher sustained memory bandwidth is required. Therefore, a supercomputing system that consists of a small number of high-performance nodes is essential to achieve high sustained performance of the next-generation CFD in the upcoming postpeta-scale era by reducing the data transfers, which eventually become a bottleneck in large-scale simulation.
Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD [Peer-reviewed]

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

International Conference on Fluid Dynamics(ICFD2013)　November 2013
Analysing the Performance Improvements of Optimizations on Modern HPC Systems [Invited]

Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2013　13-25　July 2013
Publisher: Springer International Publishing
DOI： 10.1007/978-3-319-01439-5_2
A comparison of performance tunabilities between OpenCL and OpenACC [Peer-reviewed]

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013　147-152　2013
Publisher: IEEE Computer Society
DOI： 10.1109/MCSoC.2013.31 　

To design and develop any auto tuning mechanisms for OpenACC, it is important to clarify the differences between conventional GPU programming models and OpenACC in terms of available programming and tuning techniques, called performance tunabilities. This paper hence discusses the performance tunabilities of OpenACC and OpenCL. As OpenACC cannot synchronize threads running on GPUs, some important techniques are not available to OpenACC. Therefore, we also design an additional compiler directive for thread synchronization. Evaluation results show that both OpenCL and OpenACC need architecture-aware optimizations, and similar approaches to performance optimization are effective for both OpenCL and OpenACC. The additional directive can allow OpenACC to describe more tuning techniques in the same approach as OpenCL. As it is obvious that OpenACC is more productive than OpenCL especially for legacy application migration, OpenACC is a very promising programming model if it can achieve the same performance as the conventional GPU programming models such as CUDA and OpenCL. © 2013 IEEE.
Performance Evaluation of a Next-Generation CFD on Various Supercomputing Systems [Invited]

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2012　123-132　August 2012
Publisher: Springer Berlin Heidelberg
DOI： 10.1007/978-3-642-32454-3_11
A Runtime Dependency Analysis Method for Supporting Task Parallelization in OpenCL

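Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Transactions on Advanced Computing Systems (ACS)　5　(1)　53-67　January 27, 2012
Publisher: Information Processing Society of Japan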
ISSN：1882-7829

This paper proposes a runtime dependency analysis method to find task parallelism in an OpenCL application for use of multiple accelerators. The proposed method can visualize data dependencies among tasks that represent the constraints on memory access sequences, and event dependencies that show the constraints on API call sequences. As a result, the proposed method can help programmers to find unnecessary synchronization points that often become performance bottlenecks in task-parallel processing. We analyze 54 benchmarks to demonstrate that the proposed method can find programs with task parallelism. Besides, we show that the proposed method is also useful to detect potential bugs.
Performance Evaluation of BCM on Various Supercomputing Systems [Peer-reviewed]

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

Proceedings of International Conference on Parallel Computational Fluid Dynamics　2012
Improving the Scalability of Transparent Checkpointing for GPU Computing Systems [Peer-reviewed]

Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY　2012
Publisher: IEEE
DOI： 10.1109/TENCON.2012.6412343 　

ISSN：2159-3442

As the number of nodes in a GPU computing system increases, checkpointing to a global file system becomes more time-consuming due to the I/O bottlenecks and network congestion. To solve this problem, in this paper, we propose a transparent and scalable checkpoint/restart mechanism for OpenCL applications, named Two-level CheCL. As its name implies, Two-level CheCL consists of two different checkpoint implementations, Local CheCL and Global CheCL. Local CheCL avoids checkpointing to the global file system by utilizing node's local storage. Our experimental results show that Local CheCL can accelerate the checkpointing process by up to four times faster than a conventional checkpointing mechanism. We also implement Global CheCL, which utilizes a global file system, to make sure that we always have a global checkpoint file even in the case of a catastrophic failure. We discuss the performance of our proposed mechanism through an analysis with a two-level checkpoint model.
An Automatic Task Assignment Method for Heterogeneous Computing Systems [Peer-reviewed]

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on Flow Dynamics　November 2011
Performance of Building Cube Method on Various Platforms [Peer-reviewed]

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

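Proceedings of International Conference on Flow Dynamics　November 2011
Job Scheduling for Heterogeneous Computing Systems Using Migration

Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Transactions on Advanced Computing Systems (ACS)　4　(4)　203-213　October 5, 2011
Publisher: Information Processing Society of Japan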
ISSN：1882-7829

A heterogeneous computing system of general-purpose processors and accelerators is a promising approach to improve the system performance under severe power consumption limitation. This paper proposes a job scheduling method that uses job migration and preemptive backfilling to reduce the turnaround time of job execution in a large-scale heterogeneous computing system. A prediction model is also proposed to predict the migration cost of a job when the job is submitted. The evaluation results indicate that the prediction model can accurately estimate the worst-case migration costs of most applications from their maximum memory usage. It is also demonstrated that the proposed mechanism can reduce the turnaround time of a job in the situations where either job migration or backfilling works well because it has the advantages of both of the two scheduling policies.
A Patch-Based Bit Mask Filtering Method for Micropolygon Rasterization [Peer-reviewed]

Jiali Yao, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High-Performance Graphics Poster　August 2011
Performance of SOR methods on modern vector and scalar processors [Peer-reviewed]

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　215-221　June 2011
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
DOI： 10.1016/j.compfluid.2010.12.024 　

ISSN：0045-7930

The building-cube method (BCM) is a new generation algorithm for CFD simulations. The basic idea of BCM is to simplify the algorithm in all stages of flow computation to achieve large-scale simulations. Calculation of a pressure field using the Successive Over Relaxation (SOR) method consumes most of the total execution time required for BCM. In this paper, effective implementations on modern vector and scalar processors are investigated. NEC SX-9 and Intel Nehalem-EX are the latest vector and scalar processors. Those processors have much higher peak performances than their previous-generation processors. However, their memory bandwidth improvement cannot catch up with the performance improvement of processors. This is the so-called memory wall problem. In our paper, we discuss optimization techniques for implementation of the SOR method based on architectural characteristics of these modern processors, and evaluate their effects on the sustained performances of these processors for BCM. (C) 2010 Elsevier Ltd. All rights reserved.
Parallel processing of the Building-Cube Method on a GPU platform [Peer-reviewed]

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　122-128　June 2011
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
DOI： 10.1016/j.compfluid.2010.12.019 　

ISSN：0045-7930

The Building-Cube Method (BCM) based on equally-spaced Cartesian meshes has been proposed as a next generation CFD method. Due to the equally-spaced meshes, it is well suited for highly parallel computation. This paper proposes a parallel implementation scheme of BCM on a GPU cluster system, which needs efficient hierarchical parallel processing to exploit the potential of the cluster system. The proposed scheme employs the Red-Black SOR method for the pressure calculations, which is the most time-consuming part of BCM, to obtain massive data parallelism of BCM. By exploiting the coarse-grain and fine-grain parallelism of BCM, the proposed scheme hierarchically assigns equally-divided tasks into the GPU cluster system. Furthermore, to exploit the computational power of GPUs in the cluster system, the proposed scheme employs efficient data management such as coalesced data transfer and reusing data on an on-chip memory. Experimental results show that the single GPU implementation can achieve about three times higher performance than the single CPU one. Moreover, the multiple GPU implementation can achieve an almost ideal scalability. Finally, the possibility of further acceleration of not only the pressure calculation but also the whole BCM is discussed. (C) 2011 Elsevier Ltd. All rights reserved.
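Job Scheduling for Heterogeneous Computing Systems Using Migration

Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the Symposium on Advanced Computing Systems and Infrastructures　2011　(2011)　35-44　May 18, 2011
A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems [Peer-reviewed]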

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications　135-142　May 2011
Publisher: IEEE
DOI： 10.1109/ispa.2011.36
CheCL: Transparent checkpointing and process migration of OpenCL applications [Peer-reviewed]

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011　864-876　2011

DOI： 10.1109/IPDPS.2011.85 　

In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function; two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles but instead holds CheCL handles that keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU. © 2011 IEEE.
A Runtime Task Reallocation Library for Heterogeneous Computational Environments [Peer-reviewed]

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on Flow Dynamics　November 2010
Efficient Data Management for the Building Cube Method using Cartesian Meshes on the GPU Platform [Peer-reviewed]

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

Proceedings of International Supercomputing Conference Poster　June 2010
Evaluating Performance and Portability of OpenCL Programs [Peer-reviewed]

Kazuhiko Komatsu, Katsuto Sato, Yusuke Arai, Kentaro Koyama, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Workshop on Automatic Performance Tuning　June 2010
Performance of SOR Methods on Vector Processor SX-9 [Peer-reviewed]

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

Proceedings of International Conference on Parallel Computational Fluid Dynamics　May 2010
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering [Peer-reviewed]

Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Kazuhiko Komatsu, Ryusuke Egawa, Nobuyuki Ohba, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E93D　(4)　891-902　April 2010
Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
DOI： 10.1587/transinf.E93.D.891 　

ISSN：1745-1361

Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives 13 to 392 times speed up in comparison with a ray tracing algorithm without organized rays and spheres. We found out that RTRPS also provides competitive performance even if only primary rays are used.
A High-level Programming Framework for Efficient Hybrid-architecture Computing [Invited]

Kazuhiko Komatsu, Kentaro Koyama, Katsuto Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of SIAM Conference on Parallel Processing for Scientific Computing Minisymposium　February 2010
Automatic tuning of CUDA execution parameters for stencil processing [Peer-reviewed]

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Software Automatic Tuning: From Concepts to State-of-the-Art Results　209-228　2010年
出版者・発行元：Springer New York
DOI： 10.1007/978-1-4419-6935-4_13 　

Recently, Compute Unified Device Architecture (CUDA) has enabled Graphics Processing Units (GPUs) to accelerate various applications. However, to exploit the GPU's computing power fully, a programmer has to carefully adjust some CUDA execution parameters even for simple stencil processing kernels. Hence, this paper develops an automatic parameter tuning mechanism based on profiling to predict the optimal execution parameters. This paper first discusses the scope of the parameter exploration space determined by GPU's architectural restrictions. To find the optimal execution parameters, performance models are created by profiling execution times of kernel using each promising parameter configuration. The execution parameters are determined by using those performance models. This paper evaluates the performance improvement due to the proposed mechanism using two benchmark programs. From the evaluation results, it is clarified that the proposed mechanism can appropriately select a suboptimal Cooperative Thread Array (CTA) configuration whose performance is comparable to the optimal one. © 2010 Springer Science+Business Media LLC.
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications 査読有り

Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2009 International Conference on Parallel and Distributed Computing, Applications and Technologies　408-413　2009年12月
出版者・発行元：IEEE
DOI： 10.1109/pdcat.2009.78 　
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High Performance Computing Symposium　81-88　2008年
An Efficient Intersection Algorithm Design of Ray Tracing For Many-Core Graphics Processors 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of Computer Graphics and Imaging　315-320　2008年
Hierarchical Parallel Processing of Ray Tracing on a Cell Cluster 査読有り

Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Workshop on Super Visualization　2008年
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination

Komatsu Kazuhiko, Kaeriyama Yoshiyuki, Suzuki Kenichi, Takizawa Hiroyuki, Kobayashi Hiroaki

IPSJ Online Transactions　1　(1)　1-11　2008年
出版者・発行元：一般社団法人情報処理学会
DOI： 10.2197/ipsjtrans.1.1 　

ISSN：1882-6660

Although ray tracing is the best approach to high-quality image synthesis, much time is required to generate images due to its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time required for ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes a good use of ray and object coherence in ray tracing. The proposed algorithm utilizes the features whereby the rays in a bundle share the same origin and have massive coherence. By reducing the redundant calculations in the innermost intersection tests for the bundles by precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that the proposed algorithm achieves 1.43 times faster intersection tests compared with Möller's algorithm by exploiting the features of the bundles of rays.
LI-004 Accelerating Moller Intersection Algorithm Using Ray Packets

Komatsu Kazuhiko, Kaeriyama Yoshiyuki, Suzuki Kenichi, Kobayashi Hiroaki, Nakamura Tadao

情報科学技術レターズ　6　(6)　265-268　2007年8月22日
出版者・発行元：FIT(電子情報通信学会・情報処理学会)運営委員会

Many implementation methods of ray tracing have been proposed; however, the execution time of ray-primitive intersection tests still dominates the total rendering time, and faster algorithms are strongly required. This paper presents a new fast algorithm for the intersection tests between packets of rays and triangles. Experimental results show that the proposed algorithm achieves faster intersection tests by exploiting the features of the packets of rays.
Programmable Graphics Hardware for Image Synthesis Using the Global Illumination Model 査読有り

Yoshiyuki Kaeriyama, Daichi Zaitsu, Kazuhiko Komatsu, Kenichi Suzuki, Nobuyuki Ohba, Tadao Nakamura

Proceedings of International Symposium on Low-Power and High-Speed Chips(COOL Chips IX)　183-185　2006年
Hardware for a Ray Tracing Technique Using Plane-Sphere Intersections 査読有り

Yoshiyuki Kaeriyama, Daichi Zaitsu, Kazuhiko Komatsu, Kenichi Suzuki, Nobuyuki Ohba, Tadao Nakamura

Proceedings of Eurographics Symposium on Parallel Graphics and Visualization　9-12　2006年
Ray tracing hardware system using plane-sphere intersections 査読有り

Yoshiyuki Kaeriyama, Daichi Zaitsu, Kazuhiko Komatsu, Kenichi Suzuki, Tadao Nakamura, Nobuyuki Ohba

2006 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS　315-320　2006年
出版者・発行元：IEEE
DOI： 10.1109/FPL.2006.311231 　

ISSN：1946-1488

Ray tracing is a global illumination based rendering method widely used in computer graphics. Although it generates photo-realistic images, it requires a large number of computations. In ray tracing, the ray-object intersection test is one of the dominant factors for the processing speed. To accelerate the intersection test, we propose a new method based on a plane-sphere intersection algorithm, and show a hardware system using an FPGA. The computations used in the method are highly pipelined and parallelized by optimizing the balance between the computation speed and the memory data bandwidth. As a result, the prototype makes full use of 512 DSP cores built in Xilinx Virtex-4 SX FPGA, and the average utilization of the DSP cores is close to 90%. The simulation results show that the proposed system running at 160MHz performs the intersection test a few hundred times faster than a commodity PC with a 3.4GHz Pentium 4.
Packet-Primitive Intersection Method 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Daichi Zaitsu, Kenichi Suzuki, Nobuyuki Ohba, Tadao Nakamura

Poster Compendium of IEEE Symposium on Interactive Ray Tracing　6-6　2006年

MISC 41

生産計画・製造方法の最適化ソリューション

小松一彦, 熊谷政仁, 深水一聖, 小野田誠

仙台市スタートアップスタジオ構築プロジェクトハンズオン支援プログラム　2024年3月15日
EVerify EV for Everyone Every time Everywhere

小松一彦, 熊谷政仁, 深水一聖, 小野田誠

Forbes JAPAN ACADEMIA ENTREPRENEUR SUMMIT Japan Mobility Show　2023年11月6日
リアルタイム津波浸水被害推計シミュレーションの性能評価 招待有り

撫佐昭裕, 岸谷拓海, 阿部孝志, 佐藤佳彦, 田野邊睦, 鈴木崇之, 村嶋陽一, 佐藤雅之, 小松一彦, 伊達進, 越村俊一, 小林広明

SENAC : 東北大学大型計算機センター広報　53　(2)　10-18　2020年4月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
第29回高性能シミュレーションに関するワークショップ（WSSP29）開催報告

江川隆輔, 小松一彦

SENAC : 東北大学大型計算機センター広報　52　(2)　55-55　2019年4月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　50　(4)　35-35　2017年10月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
HPGMG-FVを用いたSX-ACEの性能評価

江川隆輔, 磯部洋子, 加藤季広, 小松一彦, 滝沢寛之, 小林広明, 撫佐昭裕

SENAC : 東北大学大型計算機センター広報　50　(3)　15-18　2017年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
Xevolverによる大気・海洋結合マルチスケールモデルMSSGの性能最適化コード管理の評価

板倉憲一, 小松一彦, 江川隆輔, 滝沢寛之

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集　(2017)　12-12　2017年5月29日
SC16報告

小松一彦

SENAC : 東北大学大型計算機センター広報　50　(1)　45-45　2017年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　49　(4)　35-35　2016年10月
テクニカルアシスタント自己紹介

小松一彦

東北大学サイバーサイエンスセンター大規模科学計算システム広報 SENAC　49　(3)　23-23　2016年7月
SC15報告

小松一彦

SENAC : 東北大学大型計算機センター広報　49　(1)　41-41　2016年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
スタッフ便り

小松一彦

東北大学サイバーサイエンスセンター大規模科学計算システム広報 SENAC　49　(1)　46-46　2016年1月
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　48　(4)　56-56　2015年10月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
SX-ACEにおけるHPCG ベンチマークの性能評価 招待有り

小松一彦, 江川隆輔, 磯部洋子, 緒方隆盛, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　48　(3)　14-19　2015年7月
ベクトルコンピュータにおける高速化

小林広明, 江川隆輔, 小松一彦, 岡部公起, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修, 曽我隆, 山口健太

SENAC : 東北大学大型計算機センター広報　48　(3)　20-51　2015年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
東北大学サイバーサイエンスセンター高速化推進研究活動報告書（第6号）

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 小松一彦, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修他

2015年4月
高速化推進研究活動報告

江川隆輔, 小松一彦, 小林広明

高速化推進研究活動報告第6号　2-7　2015年2月
ベクトルコンピュータにおける高速化

小林広明, 江川隆輔, 小松一彦, 岡部公起, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修, 曽我隆, 山口健太

高速化推進研究活動報告第6号　13-60　2015年2月
MPI化による高速化

小林広明, 江川隆輔, 小松一彦, 岡部公起, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修, 曽我隆, 山口健太

高速化推進研究活動報告第6号　61-78　2015年2月
SC14報告

小松一彦

SENAC : 東北大学大型計算機センター広報　48　(1)　66-66　2015年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　47　(4)　26-26　2014年10月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
東北大学サイバーサイエンスセンターにおける分子動力学シミュレーションコードの高速化支援について 招待有り

森谷友映, 佐々木大輔, 山下毅, 小野敏, 大泉健治, 小松一彦, 江川隆輔, 小林広明

SENAC : 東北大学大型計算機センター広報　47　(1)　51-56　2014年1月
SC13報告

小松一彦

SENAC : 東北大学大型計算機センター広報　47　(1)　65-65　2014年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　46　(4)　27-27　2013年10月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
マルチプラットフォームにおける最適化手法の効果に関する一検討

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(24)　1-7　2013年7月24日
出版者・発行元：一般社団法人情報処理学会

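HPC systems have been diversifying in recent years, and there is strong demand for performance-portable HPC codes that can achieve high performance on multiple kinds of HPC systems with different characteristics. This study aims to analyze in detail how optimization techniques targeting various HPC systems affect the performance of HPC codes, and to develop highly performance-portable HPC codes based on the findings. This report analyzes the performance portability of HPC codes when different manual optimizations are combined with one another or with automatic optimizations. For each HPC system, it evaluates the synergistic effects of these combinations and discusses optimizations that may degrade performance portability.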
テクニカルアシスタント自己紹介

小松一彦

東北大学サイバーサイエンスセンター大規模科学計算システム広報 SENAC　46　(3)　47-47　2013年7月
SC12報告

小松一彦

SENAC : 東北大学大型計算機センター広報　46　(1)　68-68　2013年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
スタッフ便り

小松一彦

東北大学サイバーサイエンスセンター大規模科学計算システム広報 SENAC　46　(1)　76-76　2013年1月
大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

安田一平, 小松一彦, 江川隆輔, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(7)　1-6　2012年12月6日

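As the number of nodes in large-scale parallel systems grows, exploiting their high computational capability requires considering not only the computational performance of each node but also the communication performance between nodes. Methods that make it easy to analyze application performance on such large systems are therefore needed. Bottleneck analysis using a performance model is one way to analyze application performance and to derive optimization guidelines. However, neither performance models that account for inter-node communication nor analysis and optimization methods based on such models have been established. This report proposes a system performance model that accounts for inter-node communication and examines its validity on five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.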
サイバーサイエンスセンターオープンキャンパス報告

小松一彦

SENAC : 東北大学大型計算機センター広報　45　(4)　52-52　2012年10月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN：0286-7419
ナノ粒子群形成アプリケーションのOpenACCによる実装と性能評価

菅原誠, 小松一彦, 平澤将一, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(10)　1-7　2012年9月26日

This paper presents an implementation of the plasma-assisted nanopowder growth simulation with OpenACC. OpenACC provides compiler directives to allow an existing application to use GPUs. On the other hand, OpenCL is a lower-level programming model. Since OpenACC and OpenCL offer programming models of different abstraction levels, they require different optimizations for a given application code. Therefore, in this paper, several versions of a practical application, the nanopowder growth simulation, are implemented using different optimizations. Then, the performance impact of each optimization is discussed through some experimental results. The evaluation results show that OpenACC and OpenCL can achieve up to about 1.9x and 5.6x performance improvements over CPU execution, respectively. It is also demonstrated that the current version of OpenACC requires low-level performance tuning such as OpenCL programming in order to achieve a high performance comparable with OpenCL.
HPCアプリケーションの性能可搬性に関する一検討

小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(27)　1-8　2012年9月26日

Since many types of HPC systems have become available recently, developing HPC applications that can exploit the potential of various HPC systems is becoming very important. However, because HPC applications are generally highly optimized for a single HPC system, it is difficult for them to achieve high performance on other HPC systems. This report discusses the performance portability of basic optimizations for individual HPC systems through performance evaluations using five different HPC systems.
大規模計算システムにおけるBCMの性能評価 招待有り

小松一彦, 曽我隆, 江川隆輔, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　45　(3)　17-25　2012年7月
新テクニカルアシスタント自己紹介

小松一彦

東北大学サイバーサイエンスセンター大規模科学計算システム広報 SENAC　45　(3)　53-53　2012年7月
プログラム自動生成技術に基づくGPUコンピューティングの性能評価

菅原誠, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2011　(18)　1-7　2011年7月20日

Recently, heterogeneous computing systems that achieve high-performance computing by using Graphics Processing Units (GPUs) as accelerators have drawn much attention in the area of computational science. However, a problem in the use of GPUs is that an existing program must be ported to a program for GPUs. To relieve the porting effort, this paper focuses on the technology to automatically generate a GPU program by inserting directives into an existing sequential code and evaluates the sustained performance of the auto-generated program. In addition, we show the code optimizations achievable by using directives. A simple matrix multiplication program is used for the evaluation to demonstrate that the automatically generated code can achieve a high sustained performance.
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering

SUZUKI Ken-ichi, KAERIYAMA Yoshiyuki, KOMATSU Kazuhiko, EGAWA Ryusuke, OHBA Nobuyuki, KOBAYASHI Hiroaki

IEICE transactions on information and systems　93　(4)　891-902　2010年4月1日
出版者・発行元：一般社団法人電子情報通信学会
DOI： 10.1587/transinf.E93.D.891 　

ISSN：0916-8532

Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with interactive dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives 13 to 392 times speed up in comparison with a ray tracing algorithm without organized rays and spheres. We found out that RTRPS also provides competitive performance even if only primary rays are used.
CUDAアプリケーション向けチェックポイント・リスタート機能の実装と評価

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

情報処理学会研究報告. [ハイパフォーマンスコンピューティング]　122　(7)　G1-G7　2009年10月9日
出版者・発行元：情報処理学会
ISSN：0919-6072

In this paper, a tool named CheCUDA is designed to enable checkpoint/restart of CUDA applications. To allow an existing checkpoint/restart implementation to checkpoint CUDA applications, CheCUDA is developed as an add-on package working at each CUDA API call to record the GPU status changes onto the main memory. This paper demonstrates that our prototype implementation of CheCUDA can correctly checkpoint and restart some CUDA applications. It is also shown that CheCUDA can restart a CUDA process from a checkpoint file generated on another PC. Accordingly, CheCUDA is useful not only to enhance the dependability of CUDA applications but also to attain task migration of CUDA applications. This paper also quantitatively evaluates the timing overhead of checkpointing.
C-024 An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer to Peer Environment

Satayapiwat Chainan, Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

情報科学技術フォーラム講演論文集　8　(1)　491-494　2009年8月20日
出版者・発行元：FIT(電子情報通信学会・情報処理学会)運営委員会

Recently, many market-based approaches have been studied as promising alternatives for the resource allocation problem. In particular, auction-based approaches are widely chosen due to their distributed nature and relatively low complexity. However, employing an auction to allocate jobs is suitable only for homogeneous resource environments. This paper proposes an auction-based resource allocation mechanism that enables resource allocation in a heterogeneous environment while minimizing the user's inputs. Our preliminary results show that the mechanism improves the performance of important jobs under high load.
C-023 プロセッサ自動選択機能を有するBLASの実現に向けた性能評価(ハードウェア・アーキテクチャ,一般論文)

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

情報科学技術フォーラム講演論文集　8　(1)　485-490　2009年8月20日
出版者・発行元：FIT(電子情報通信学会・情報処理学会)運営委員会
GPU向け線形代数ライブラリの性能評価

小山賢太郎, 佐藤功人, 小松一彦

計算工学講演会論文集　14　(1)　289-292　2009年5月
出版者・発行元：日本計算工学会
ISSN：1342-145X
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination (コンピューティングシステム Vol.1 No.1)

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

情報処理学会論文誌コンピューティングシステム（ACS）　1　(1)　85-95　2008年6月26日
出版者・発行元：情報処理学会
ISSN：1882-7829

Although ray tracing is the best approach to high-quality image synthesis, much time is required to generate images due to its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time required for ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes a good use of ray and object coherence in ray tracing. The proposed algorithm utilizes the features whereby the rays in a bundle share the same origin and have massive coherence. By reducing the redundant calculations in the innermost intersection tests for the bundles by precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that the proposed algorithm achieves 1.43 times faster intersection tests compared with Möller's algorithm by exploiting the features of the bundles of rays.

講演・口頭発表等 85

イジングモデルに基づく量子クラスタリングフレームワーク

熊谷政仁, 小松一彦, 小野田誠, 小林広明

第11回量子ソフトウェア研究発表会　2024年3月28日
巡回セールスマン問題による並列ベクトルアニーリングの評価

小野田誠, 小松一彦, 伴内光太郎, 百瀬真太郎, 佐藤雅之, 小林広明

第193回ハイパフォーマンスコンピューティング研究発表会　2024年3月19日
VVCの高速化のためのフレーム差分画像を用いたブロック分割に関する一検討

原田零生, 近藤嘉昭, 佐藤雅之, 岩崎裕江, 小松一彦, 小林広明

情報処理学会第86回全国大会　2024年3月17日
イジングマシンを用いた救助経路の最適化に関する一検討

長南和希, 小松一彦, 佐藤雅之, 小林広明

情報処理学会第86回全国大会　2024年3月15日
機械学習モデルを用いた断層パラメータ予測に関する一検討

JEONG SANGUK, 小松一彦, 佐藤雅之, 小林広明

情報処理学会第86回全国大会　2024年3月17日
渋滞解消問題を用いたイジングマシンの評価

百南匠人, 丹羽直也, 小松一彦, 岩崎裕江, 小林広明

2024年電子情報通信学会総合大会　2024年3月7日
イジングマシンを用いた電気自動車シェアのための定式化

熊谷政仁, 深水一聖, 小野田誠, 小松一彦, 小林広明

第247回システム・アーキテクチャ・第192回ハイパフォーマンスコンピューティング合同研究発表会　2023年12月5日
Performance Evaluation of Ising Machines using Constraint Combinatorial Optimization Problems 招待有り

Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Hiroaki Kobayashi

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023)　2023年8月24日
コンピュータ研究者は、量子コンピュータを研究する(勉強する)必要があるのだろうか？ 招待有り

天野英晴, 谷本輝夫, 上野洋典, 小松一彦, 佐野健太郎, 平木敬

並列／分散／協調処理に関するサマー・ワークショップ (SWoPP2023)　2023年8月4日
A feasibility study of quantum annealing for the next-generation computing infrastructure 招待有り

Kazuhiko Komatsu

35th Workshop on Sustained Simulation Performance (WSSP’35)　2023年4月14日
複数の自動並列化情報を用いたスレッド並列化に関する一検討

坂本龍介, 小松一彦, 佐藤雅之, 小林広明

情報処理学会第85回全国大会　2023年3月3日
QUBO問題における制約重み分割による解の高精度化に関する一検討

小野田誠, 小松一彦, 熊谷政仁, 佐藤雅之, 小林広明

情報処理学会第85回全国大会　2023年3月3日
機械学習を用いたグラフアルゴリズムの実行時間予測に関する一検討

深澤祐輔, 小松一彦, 佐藤雅之, 小林広明

情報処理学会第85回全国大会　2023年3月3日
VVC映像符号化並列処理のための映像分割に関する一検討

小野内花倫, 近藤嘉昭, 佐藤雅之, 岩崎裕江, 小松一彦, 小林広明

情報処理学会第85回全国大会　2023年3月2日
A feasibility study of quantum computing for the next-generation computing infrastructure: Early evaluation of annealing machines 招待有り

Kazuhiko Komatsu

34th Workshop on Sustained Simulation Performance　2022年10月24日
組み合わせクラスタリングによるアニーリングマシンの評価

小松一彦, 小野田誠, 熊谷政仁, 小林広明

第185回ハイパフォーマンスコンピューティング研究発表会（SWoPP2022)　2022年7月29日
クラスタ型アーキテクチャにおけるメモリ性能特性に関する一検討

佐藤雅之, 小松一彦, 小林広明

xSIG 2022　2022年7月27日
Combinatorial Clustering for a Material Informatics Application using Aurora Vector Annealing 招待有り

Kazuhiko Komatsu

33rd Workshop on Sustained Simulation Performance　2022年5月23日
デジタルツインタービンを用いた異常検知のための空間探索手法に関する一検討

深水一聖, 小松一彦, 熊谷政仁, 小林広明

情報処理学会第84回全国大会　2022年3月3日
制約を含むQUBO問題のための探索空間分割に関する一考察

小野田誠, 熊谷政仁, 小松一彦, 小林広明

令和3年度情報処理学会東北支部研究会　2022年2月21日
Optimization of the stencil computation considering the architecture of SX-Aurora TSUBASA 招待有り

Kazuhiko Komatsu

Workshop on Sustained Simulation Performance 2021　2021年3月18日
非圧縮性乱流DNSコードに現れる高速フーリエ変換のSX-Aurora TSUBASAにおける性能評価

武中裕次郎, 横川三津夫, 石原卓, 小松一彦, 小林広明, 今村俊幸, 清水智也

第178回ハイパフォーマンスコンピューティング研究発表会　2021年3月8日
複合型メインメモリのメタデータ管理のためのデータアクセス解析

塚田竣介, 佐藤雅之, 高屋敷光, 小松一彦, 小林広明

第241回システム・アーキテクチャ(ARC)研究発表会　2020年7月23日
姫野ベンチマークを用いたベクトル計算システムSX-Aurora TSUBASAの性能評価

小野寺明人, 小松一彦, 磯部洋子, 佐藤雅之, 小林広明

2020年電子情報通信学会総合大会　2020年3月20日
複合型メインメモリのためのメタデータ管理手法に関する一考察

塚田竣介, 佐藤雅之, 小松一彦, 小林広明

2020年電子情報通信学会総合大会　2020年3月20日
量子アニーリングを用いたクラスタリング手法の評価

熊谷政仁, 小松一彦, 佐藤雅之, 小林広明

2020年電子情報通信学会総合大会　2020年3月20日
建物・地盤地震動応答シミュレーションのベクトル計算機向け最適化

後藤啓, 横川三津夫, 坂敏秀, 小松一彦, 小林広明

第173回ハイパフォーマンスコンピューティング研究発表会　2020年3月
SX-Aurora TSUBASAの入出力性能の評価

中井彩乃, 横川三津夫, 小松一彦, 渡辺裕太, 磯部洋子, 小林広明

第172回ハイパフォーマンスコンピューティング研究発表会　2019年12月
A System and its System Parameter Selection based on Bottleneck Prediction 国際会議招待有り

小松一彦

Workshop on Sustained Simulation Performance 30　2019年10月
A Virtual Machine Allocation Algorithm Based on Reinforcement Learning for Cloud Computing Systems

陳振宇, 佐藤雅之, 小松一彦, 小林広明

電気関係学会東北支部連合大会　2019年8月
A Refreshing Policy for eDRAM Last-Level Caches

王一汀, 佐藤雅之, 小松一彦, 小林広明

電気関係学会東北支部連合大会　2019年8月
A Pure STT-RAM Hybrid Cache Architecture for Last-Level Caches

薛昊, 小林広明, 小松一彦, 佐藤雅之

電気関係学会東北支部連合大会　2019年8月
ベクトルコンピュータを用いた機械学習の高速化に関する研究

村上洸, 佐藤雅之, 小松一彦, 小林広明

電気関係学会東北支部連合大会　2019年8月
ベクトルコンピュータを用いた数値タービンの高速化に関する一検討

法木祐太, 佐藤雅之, 小松一彦, 小林広明

電気関係学会東北支部連合大会　2019年8月
ベクトル間接参照命令のためのプリフェッチに関する一検討

高屋敷光, 佐藤雅之, 小松一彦, 小林広明

第237回システム・アーキテクチャ(ARC)研究発表会　2019年7月17日
Performance Evaluation of a Brand-New Vector Supercomputer SX-Aurora TSUBASA 国際会議招待有り

小松一彦

Aurora Forum SC18　2018年11月12日
新ベクトルプロセッサSX-Aurora TSUBASAの基本性能評価 招待有り

小松一彦

NEC C&Cユーザーフォーラム&iEXPO2018ワークショップ SP研究会　2018年11月8日
Performance evaluation and analysis of SX-Aurora TSUBASA 国際会議招待有り

小松一彦

Workshop on Sustained Simulation Performance 28　2018年10月9日
メニーコアプロセッサのためのパラメータチューニング時間削減手法

岸谷拓海, 小松一彦, 撫佐昭裕, 佐藤雅之, 小林広明

並列／分散／協調処理に関する『熊本』サマー・ワークショップ　2018年7月
マルチベクトルコアプロセッサの共有キャッシュ構成に関する一検討

高屋敷光, 佐藤雅之, 小松一彦, 江川隆輔, 小林広明

並列／分散／協調処理に関する『熊本』サマー・ワークショップ　2018年7月
Directive Translation Approach in Keeping a Code Clean 国際会議招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Advanced Topics and Auto Tuning in High Performance Scientific Computing　2017年3月10日
User-Defined Directive Translation Using the Xevolver Framework 国際会議招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

SIAM CSE 2017　2017年2月27日
A Directive Generation Using A Code Translation Framework 国際会議招待有り

24th Workshop on Sustained Simulation Performance (WSSP24)　2016年12月5日
ユーザ定義変換を用いた複数種類の指示行の活用 招待有り

自動チューニング研究会マイクロワークショップ 2016　2016年10月30日
Migration of an HPC Code to an OpenACC Platform Using a Code Translation Framework 国際会議招待有り

Advanced Topics and Auto Tuning in High Performance Scientific Computing　2016年2月19日
Migration of a Large-scale Code to an OpenACC Platform Using a Code Transformation Framework 国際会議招待有り

22nd Workshop on Sustained Simulation Performance (WSSP22)　2015年12月17日
コード変換フレームワークを用いたレガシーコードの移植 招待有り

自動チューニング研究会マイクロワークショップ 2015　2015年10月18日
高性能可搬性のためのHPCリファクタリング 招待有り

小松一彦

第9回AT研究会オープンアカデミックセッション(ATOS9)　2015年5月12日
Performance Portable Code Production using Automatic Parallelizing Information 国際会議招待有り

Kazuhiko Komatsu

The 1st IT Joint Seminar with Moscow State University　2015年3月5日
High-productive OpenMP migration using Automatic Parallelizing Information 国際会議招待有り

20th Workshop on Sustained Simulation Performance (WSSP20)　2014年12月15日
Performance Comparison of Auto-parallelized Codes and OpenMP Codes on Various Supercomputing Systems 国際会議招待有り

19th Workshop on Sustained Simulation Performance (WSSP19)　2014年3月27日
OpenMP Parallelization using Compile Log of Automatic Parallelization

Azmir Ridzuan bin Azlan, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

第12回情報シナジー研究会　2014年2月24日
東北大学サイバーサイエンスセンターにおける分子動力学シミュレーションコードの高速化支援について

森谷友映, 佐々木大輔, 山下毅, 小野敏, 大泉健治, 小松一彦, 江川隆輔, 小林広明

2013年度大学ICT推進協議会年次大会　2013年12月18日
Performance evaluation of auto-parallelized codes on various supercomputing systems 国際会議招待有り

18th Workshop on Sustained Simulation Performance(WSSP18)　2013年10月28日
OpenACCにおける性能チューニングとその効果 招待有り

滝沢寛之, 平澤将一, 小松一彦, 小林広明

日本応用数理学会2013年度年会　2013年9月10日
マルチプラットフォームにおける最適化手法の効果に関する一検討

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関するサマーワークショップ(SWoPP2013)　2013年7月
メモリバンド幅および通信バンド幅に着目した大規模並列システムの性能モデルに関する一検討

安田一平, 小松一彦, 江川隆輔, 滝沢寛之, 小林広明

第11回情報シナジー研究会　2013年2月
複合型計算システム向けのOpenACCの拡張

菅原誠, 平澤将一, 小松一彦, 滝沢寛之, 小林広明

第11回情報シナジー研究会　2013年2月
High-productive OpenMP migration using compile information 国際会議

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Symposium on Post Petascale System Software　2012年12月2日
Toward High Performance-Portabilities on Modern HPC Systems 国際会議招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

16th Workshop on Sustained Simulation Performance(WSSP16)　2012年12月
ナノ粒子群形成アプリケーションのOpenACCによる実装と性能評価

菅原誠, 平澤将一, 小松一彦, 滝沢寛之, 小林広明

第26回数値流体力学シンポジウムCFD2012　2012年12月
大規模計算システムにおけるBuilding Cube Methodの性能評価

小松一彦, 曽我隆, 江川隆輔, 滝沢寛之, 小林広明

第26回数値流体力学シンポジウムCFD2012　2012年12月
大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

安田一平, 小松一彦, 江川隆輔, 小林広明

第194回ARC・第137回HPC合同研究発表会(HOKKE-20)　2012年12月
Performance of Practical Applications on Modern Supercomputing Systems 国際会議招待有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

SC12 NEC booth presentation　2012年11月
マルチプラットフォーム環境における性能可搬性の調査 招待有り

自動チューニング研究会マイクロワークショップ 2012　2012年10月27日
ナノ粒子群形成アプリケーションのOpenACCによる実装と性能評価

菅原誠, 平澤将一, 小松一彦, 滝沢寛之, 小林広明

第136回HPC研究会　2012年10月
HPCアプリケーションの性能可搬性に関する一検討

小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

第136回HPC研究会　2012年10月
HPCシステムにおける最適化手法の性能可搬性に関する一検討

小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

HPCシステムにおける最適化手法の性能可搬性に関する一検討　2012年9月
Introduction to GPU Computing 国際会議招待有り

SICE2012 Tutorial　2012年8月20日
Performance Evaluation of a CFD using Cartesian Meshes on Various Supercomputing Systems 国際会議招待有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

NUG XXIV　2012年6月
OpenCLアプリケーションの実行時自動チューニング

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

計算工学講演会　2012年5月30日
Performance evaluation of a next-generation CFD on various supercomputing systems 国際会議招待有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

14th Teraflop Workshop　2011年12月5日
プログラム自動生成技術に基づくGPUコンピューティングの性能評価

菅原誠, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

2011年並列/分散/協調処理に関する『鹿児島』サマー・ワークショップ　2011年7月27日
複合型計算機におけるソフトウェア開発の課題と支援手法の検討

佐藤功人, 小松一彦, 滝沢寛之, 小林広明

日本学術振興会インターネット技術第163委員会情報流通基盤分科会(ITRC/INI）「情報流通基盤分科会ワークショップ」・「先端的ネットワーク＆コンピューティングテクノロジワークショップ」合同ワークショップ　2011年3月
GPUクラスタによるBuilding Cube Methodの性能評価

小松一彦, 滝沢寛之, 小林広明

第4回次世代CFD研究会　2011年2月
複合型計算システムのためのジョブスケジューリングの検討

小山賢太郎, 佐藤功人, 小松一彦, 村田善智, 滝沢寛之, 小林広明

第9回情報シナジー研究会　2011年2月
ハイブリッド型計算環境のためのプログラミングフレームワークSPRAT(A High-level Programming Framework for Efficient Hybrid-architecture Computing)

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

日本学術振興会インターネット技術第163委員会情報流通基盤分科会(ITRC/INI）「情報流通基盤分科会ワークショップ」・「先端的ネットワーク＆コンピューティングテクノロジワークショップ」合同ワークショップ　2010年3月
A High-level Programming Framework for Efficient Hybrid-architecture Computing 国際会議招待有り

Kazuhiko Komatsu, Kentaro Koyama, Katsuto Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

14th SIAM Conference on Parallel Processing for Scientific Computing　2010年2月
CUDAアプリケーション向けチェックポイント・リスタート機能の実装と評価(Implementation and Evaluation of a Checkpoint/Restart Tool for CUDA Applications)

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

IPSJ SIG Technical Report　2009年10月
プロセッサ自動選択機能を有するBLAS の実現に向けた性能評価(Performance Evaluation towards BLAS with Automatic Processor Selection)

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

第8回情報科学技術フォーラム　2009年9月
An Auction-based Resource Allocation Considering Multifaceted Utilities in a Peer-to-Peer Environment

Chainan Satayapiwat, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

8th Forum on Information Technology, September 2009
Early evaluation of linear algebra libraries for GPU computing

小山賢太郎, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

Proceedings of the Conference on Computational Engineering and Science, May 2009
A Large-scale Distributed Data-Mining System using Idle Time of Game Consoles

Yoshitomo Murata, Kazuhiko Komatsu, Yuki Ishimori, Hiroyuki Takizawa, Hiroaki Kobayashi

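JSPS 163rd Committee on Internet Technology, Information Infrastructure Subcommittee (ITRC/INI), joint workshop of the "Information Infrastructure Subcommittee Workshop" and the "Advanced Network & Computing Technology Workshop", February 2009
Future Prospects of Volume Rendering Using General-Purpose Graphics Accelerators (GPUs)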

小松一彦, 佐野健太郎, 鈴木健一, 中村維男

Neural Information Processing Research Meeting, September 2004
A High-Speed Processing Method for Volume Data Segmentation

小松一彦, 佐野健太郎, 鈴木健一, 中村維男

IPSJ Tohoku Branch Research Meeting, November 2003

Industrial Property Rights 18

PHYSICAL PROPERTY MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
SINGULAR MATERIAL DETECTION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
PHYSICAL PROPERTY MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
PHYSICAL PROPERTY MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
SINGULAR MATERIAL DETECTION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
SINGULAR MATERIAL DETECTION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
RECOMMENDATION DATA GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
RECOMMENDATION DATA GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
RECOMMENDATION DATA GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

Type of industrial property right: Patent
MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Naoki Kuwamori, Akihiro Musa, Yohei Takigawa, Yuta Kazama, Yoshihiko Satou, Hiroaki Kobayashi, Tota Kikugawa, Tomonaga Okabe, Kazuhiko Komatsu

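Type of industrial property right: Patent
RECOMMENDATION DATA GENERATION APPARATUS, RECOMMENDATION DATA GENERATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM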

小林広明, 菊川豪太, 岡部朋永, 小松一彦, 川越吉晃, 鍬守直樹, 撫佐昭裕, 佐藤佳彦

Type of industrial property right: Patent
MAP GENERATION APPARATUS, MAP GENERATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

小林広明, 菊川豪太, 岡部朋永, 小松一彦, 川越吉晃, 鍬守直樹, 撫佐昭裕, 佐藤佳彦

Type of industrial property right: Patent
RECOMMENDATION DATA GENERATION APPARATUS, CONTROL METHOD, AND PROGRAM

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Type of industrial property right: Patent
MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND PROGRAM

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Type of industrial property right: Patent
SINGULAR MATERIAL DETECTION APPARATUS, CONTROL METHOD, AND PROGRAM

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Type of industrial property right: Patent
PHYSICAL PROPERTY MAP IMAGE GENERATION APPARATUS, CONTROL METHOD, AND PROGRAM

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Type of industrial property right: Patent

Research Projects (Joint Research, Competitive Funding, etc.) 13

Building a Tsunami Disaster Digital Twin and Realizing Smart Resilience

越村俊一, 小林広明, 小松一彦, 佐藤雅之, 百瀬真太郎, 伴内光太郎

April 2023 to March 2028
Creation of a Computing Infrastructure Based on New Computing Principles through Large-Scale Quantum Computing

小松一彦, 小林広明, 佐藤雅之, 百瀬真太郎

April 2023 to March 2028
Innovating Drug Discovery through Visualization of Structures beyond Atomic Coordinates

米倉功治, 小林広明, 小松一彦, 佐藤雅之

April 2023 to March 2027

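With the development of advanced cryo-electron microscopy technology at its core, and with additional use of X-ray free-electron lasers (XFEL), this project aims to break through existing measurement limits by combining high spatiotemporal resolution with fast analysis for diverse samples, such as organic compounds and proteins, that are available only in minute quantities. This will elucidate previously "invisible" properties and phenomena, the so-called "structures beyond atomic coordinates", such as charge distribution, electronic structure, polarity of chemical bonds, protonation of functional groups, and electron motion. The project will first advance applications to drug discovery that help treat emerging infectious diseases and intractable diseases, and then, taking advantage of the technology's high versatility, promote applications to broader fields such as new materials development, energy, the environment, and life sciences. Through this research, the project also aims to develop next-generation cryo-electron microscopes, leading to a larger global market share and the establishment of analysis hubs. As a common platform technology, this visualization technology is thus expected to contribute to improved productivity at many research and development sites.
Research and Development of a Common Library Infrastructure to Accelerate the Use of Quantum-AI Hybrid Technologies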

小松一彦, 小林広明, 撫佐昭裕, 百瀬真太郎, 佐藤雅之, 熊谷政仁, 小野田誠

June 2023 to March 2026
Large-Scale Data Analysis with Machine Learning Algorithms Using Annealing Machines

小松一彦

March 2022 to March 2025
Survey Research on New Computing Principles

小松一彦, 横川三津夫, 佐藤雅之

August 2022 to March 2024
A Programming Infrastructure for Seamlessly Coupling Quantum Annealing Machines and High-Performance Computers

小松一彦

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

Research category: Grant-in-Aid for Scientific Research (C)

Research institution: Tohoku University

April 2020 to March 2024

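This research examines how program and library information used with conventional accelerators can be exploited, in order to establish the elemental technologies of a programming infrastructure in which a quantum annealing machine can easily be used as a new accelerator for high-performance computers. To this end, three research items have been set and are being pursued: a study of offloading to quantum annealing machines, development of a programming infrastructure for quantum annealing machines, and use of the quantum solvers of quantum annealing machines. For the offloading study this fiscal year, building on the clustering methods examined and analyzed for performance in the previous year, we developed an Ising-model-based clustering method that can achieve high clustering accuracy when offloaded to a quantum annealing machine. We showed that, unlike pseudo-optimal clustering methods typified by K-means, this method can achieve exact clustering by minimizing the total of the sums of distances within each cluster. For the programming infrastructure, we carried out the conceptual design of an infrastructure in which machine learning programs can be written seamlessly without awareness of quantum annealing machines, GPUs, and similar devices, and began its development. For quantum solver use, we examined how to use quantum solvers in machine learning programs that use quantum annealing machines, and carried out the conceptual design together with the infrastructure development. In particular, we examined how to use quantum solvers in Ising-model-based clustering.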
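The clustering objective described above, minimizing the summed distances between points that share a cluster, can be sketched as a QUBO (the binary-variable form equivalent to an Ising model). The following is only a minimal illustration under stated assumptions, not the project's implementation: the function names, the toy data, and the `penalty` value are hypothetical, the one-hot constraint is folded into the matrix as a penalty term, and an exhaustive search stands in for an annealing machine.

```python
import itertools

import numpy as np


def build_clustering_qubo(points, k, penalty):
    """Build a QUBO matrix for assigning each point to exactly one of k clusters.

    Binary variable x[i * k + c] == 1 means "point i belongs to cluster c".
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    q = np.zeros((n * k, n * k))
    for i in range(n):
        for c in range(k):
            # One-hot constraint penalty * (sum_c x_ic - 1)^2, expanded with
            # x^2 == x and with the constant term dropped.
            q[i * k + c, i * k + c] -= penalty
            for d in range(c + 1, k):
                q[i * k + c, i * k + d] += 2.0 * penalty
            # Objective: add d(i, j) when points i and j share cluster c.
            for j in range(i + 1, n):
                q[i * k + c, j * k + c] += dist[i, j]
    return q


def brute_force_minimize(q):
    """Exhaustively minimize x^T Q x (a stand-in for an annealer on tiny inputs)."""
    best_x, best_energy = None, float("inf")
    for bits in itertools.product((0, 1), repeat=q.shape[0]):
        x = np.array(bits)
        energy = float(x @ q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Toy data (hypothetical): two well-separated pairs of 2-D points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
k = 2
x, energy = brute_force_minimize(build_clustering_qubo(points, k, penalty=100.0))
labels = x.reshape(len(points), k).argmax(axis=1)
print(labels, energy)
```

The exhaustive search visits 2^(n·k) states, so it is usable only for a handful of points; the `penalty` must be large enough that violating the one-assignment-per-point constraint never pays off.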
New Developments in High-Performance Materials Informatics Infrastructure Enabled by Quantum Annealing

小林広明, 岡部朋永, 阿部圭晃, 菊川豪太, 佐藤雅之, 撫佐昭裕, 觀山正道, 大関真之, 小松一彦

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (A)

Research category: Grant-in-Aid for Scientific Research (A)

Research institution: Tohoku University

April 1, 2019 to March 31, 2023

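To accelerate simulation of crosslinked network structure formation in crosslinked polymer materials, we implemented the DPD (dissipative particle dynamics) method, a coarse-grained particle-scale simulation, in a form that couples with molecular dynamics simulation. We also incorporated the same reaction model as at the all-atom scale, realizing curing calculations at the coarse-grained level. Furthermore, we developed a reverse mapping method that attaches all-atom-scale structure templates to the structures obtained from DPD simulation. As a result, we achieved structure and property prediction consistent with all-atom simulation and a substantial speedup of curing calculations. For large-scale execution of unsteady compressible flow macro-analysis using a high-order unstructured solver, we advanced research and development on implementing and accelerating the open-source code PyFR on SX-Aurora TSUBASA. As a result, for the flux calculation kernels (tflux/intcflux), which had previously been only partially vectorized, we achieved full vectorization by revising array initialization and loop structure. For accelerating the molecular dynamics simulation (Peachgk_md), we showed that list vectors, which had been an obstacle to vectorization, can be vectorized using the perch method (止まり木法); implementing it in Peachgk_md raised the vectorization ratio to 98.8% and, as a result, improved computational performance by a factor of 6.5. In developing a clustering method that couples annealing machines with high-performance computing systems, we developed a method in which annealing-based clustering, with the constraint terms that define the clustering conditions specified separately from the QUBO, is performed on quantum annealing machines and digital annealing machines, while conventional high-performance computing systems handle preprocessing such as QUBO generation and postprocessing such as data aggregation.
Commercialization Feasibility Study of Resource Allocation for the Sharing Economy Using Quantum Computing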

June 2022 to March 2023
Research and Development of an Architecture-Independent Framework for Graph Algorithms

小松一彦, Voevodin Vadim, 小林広明, 撫佐昭裕, 佐藤雅之, Afanasyev Ilya

Funding agency: Japan Society for the Promotion of Science

Program: Research Cooperative Program

Research institution: Tohoku University

April 2021 to March 2023
Materials Revolution through an Integrated Materials Development System

小林広明, 小松一彦, 佐藤雅之

May 2020 to March 2023
Development of a Quantum-Annealing-Assisted Next-Generation Supercomputing Infrastructure

小林広明, 小松一彦, 滝沢寛之, 山口健太, 撫佐昭裕, 曽我隆, 渡部修, 横川三津夫, 江川隆輔, 下村陽一, 中田一人, 越村俊一, 佐藤雅之, 愛野茂幸, 磯部洋子, 政岡靖久, 百瀬真太郎, 藤本壮也, 山本悟, 古澤卓, 荒木拓也, 村嶋陽一, 大関真之, 觀山正道, 太田雄策, マスエリック, 星宗王, 萩原孝

April 2018 to March 2023
New Developments in Application Development Support and Vector Architecture Design for the Exascale Era

小林広明, 小松一彦, 滝沢寛之, 江川隆輔, 佐藤雅之, 撫佐昭裕, Vladimir Voevodin, Vadim Voevodin, Ilya Afanasyev

April 2018 to March 2020

Social Contribution Activities 14

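Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

September 29, 2016

Lecturer
Large-Scale Scientific Computing System Training Course: Overview of Parallel Programming and Introduction to OpenMP Programming

September 28, 2016

Lecturer
Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

June 1, 2016

Lecturer
Large-Scale Scientific Computing System Training Course: Overview of Parallel Programming and Introduction to OpenMP Programming

May 26, 2016

Lecturer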
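Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

October 28, 2015

Lecturer
Large-Scale Scientific Computing System Training Course: Overview of Parallel Programming and Introduction to OpenMP Programming

October 27, 2015

Lecturer
116th Tohoku University Science Cafe: "The Astonishing Power of Supercomputers"

May 29, 2015

Moderator and event support
Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

April 24, 2015

Lecturer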
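Large-Scale Scientific Computing System Training Course: Overview of Parallel Programming and Introduction to OpenMP Programming

April 22, 2015

Lecturer
Large-Scale Scientific Computing System Training Course: Fundamentals of Performance Optimization Techniques on the New Supercomputer

March 24, 2015

Lecturer
Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

May 30, 2014

Lecturer
Large-Scale Scientific Computing System Training Course: Overview of Parallel Programming and Introduction to OpenMP Programming

May 29, 2014

Lecturer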
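Large-Scale Scientific Computing System Training Course: Introduction to MPI Programming

September 12, 2013

Lecturer
Large-Scale Scientific Computing System Training Course: Introduction to UNIX

May 28, 2013

Lecturer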

Media Coverage 7

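ReGACY Innovation Group and the City of Sendai hold the Demo Day of their jointly run "Sendai Startup Studio Hands-on Support Program"

PRTIMES, https://prtimes.jp/main/html/rd/p/000000076.000099287.html

March 26, 2024

Media type: Internet media
ReGACY Innovation Group and the City of Sendai decide to hold the Demo Day of their jointly run "Sendai Startup Studio Hands-on Support Program"

PRTIMES, press release

February 27, 2024

Media type: Internet media
Industry-academia proof-of-concept experiment conducted toward business use of quantum technology: approximately 26 percent efficiency improvement derived using real data from a car-sharing business

Sumitomo Corporation, press release

November 21, 2023

Media type: Internet media
Industry-academia proof-of-concept experiment conducted toward business use of quantum technology: approximately 26 percent efficiency improvement derived using real data from a car-sharing business

Tohoku University, press releases and research results

November 21, 2023

Media type: Internet media
Sumitomo Corporation, together with Tohoku University, streamlines car-sharing operations with quantum computing

Nihon Keizai Shimbun (Nikkei), quantum technology

October 30, 2023

Media type: Newspaper/magazine
ReGACY Innovation Group and the City of Sendai select six projects for fiscal 2023 in their jointly run "Sendai Startup Studio Hands-on Support Program"

PRTIMES, press release

October 2, 2023

Media type: Internet media
[Selected project introduction] Optimizing production planning and manufacturing methods using quantum annealing technology: Associate Professor Kazuhiko Komatsu, Cyberscience Center, Tohoku University

note, Sendai Startup Studio

August 8, 2023

Media type: Internet media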