Tohoku University Researcher Directory

Researcher Details

コバヤシ　ヒロアキ

小林　広明

Hiroaki Kobayashi

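Affiliation

Graduate School of Information Sciences, Department of Computer and Mathematical Sciences, Division of Software Science (Computer Architecture Laboratory)

Title

Professor

Degree

Doctor of Engineering (Tohoku University)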

researchmap

https://researchmap.jp/hiroaki.kobayashi

J-GLOBAL ID

200901077534166118

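Career (14)

July 2018 to present
Tohoku University, Special Assistant to the President

April 2016 to present
Tohoku University, Graduate School of Information Sciences, Professor

April 2017 to March 2019
Tohoku University, Graduate School of Information Sciences, Chair of the Department of Computer and Mathematical Sciences

April 2012 to March 2016
Tohoku University, Member of the Education and Research Council

April 2008 to March 2016
Tohoku University, Cyberscience Center, Professor

April 2008 to March 2016
Tohoku University, Cyberscience Center, Director

April 2008 to March 2016
Tohoku University, Information Synergy Organization, Deputy Director

October 2006 to March 2016
National Institute of Informatics, Visiting Professor

December 2002 to March 2008
Tohoku University, Information Synergy Center, Deputy Director

October 2001 to March 2008
Tohoku University, Information Synergy Center, Professor

October 1995 to January 2002
Stanford University, Department of Electrical Engineering and Computer Systems Laboratory, Visiting Associate Professor

April 1993 to September 2001
Tohoku University, Graduate School of Information Sciences, Associate Professor

April 1991 to March 1993
Tohoku University, Faculty of Engineering, Lecturer

April 1988 to March 1991
Tohoku University, Faculty of Engineering, Research Associate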

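Education (2)

Tohoku University, Graduate School of Engineering, Department of Information Engineering
Until March 25, 1988

Tohoku University, Faculty of Engineering, Communication Engineering
Until March 25, 1983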

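Committee Memberships (29)

Ministry of Education, Culture, Sports, Science and Technology (MEXT), Council for Science and Technology, Expert Member
April 2021 to present

Osaka University Cybermedia Center, Nationwide Joint Use Steering Committee, Member
April 2014 to present

Science Council of Japan, Associate Member
April 2014 to present

Editorial Board of the International Journal of Networked and Distributed Computing, Member
March 2011 to present

Workshop on Sustained Simulation Performance, Organizing Committee Chair
October 2006 to present

MEXT, Member of the HPCI Program Promotion Committee
March 2017 to March 2025

HPCI Consortium, Vice President and Vice Chair
April 2020 to May 2024

Priority Issue 8 "Development of Innovative Design and Manufacturing Processes Leading Near-Future Manufacturing", Advisory Committee Chair
April 2015 to March 2020

Post-K Priority Issue "Construction of an Integrated Prediction System for Compound Disasters Caused by Earthquakes and Tsunamis", Steering Committee Member
April 2015 to March 2020

HPCI Consortium, Board Member
April 2014 to March 2018

JST CREST "Development of System Software Technologies for Post-Peta Scale High Performance Computing", Research Area Advisor
April 2012 to March 2018

IEEE COOL Chips, Organizing Committee Chair
April 2011 to April 2017

HPCI Collaborative Services Committee, Chair
April 2013 to March 2016

Strategic Programs for Next-Generation Supercomputing, Field 3 "Earth Change Prediction for Disaster Prevention and Mitigation", Steering Committee Member
April 2013 to March 2016

National Institute of Informatics, "Academic Information Network Operation and Collaboration Headquarters", Member
April 2012 to March 2016

HPCI Collaborative Services Committee, Member
April 2011 to March 2016

Hokkaido University Information Initiative Center, External Evaluation Committee, Member
April 2014 to March 2015

Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Department Evaluation Committee, Department Evaluation Advisor
April 2012 to March 2015

Research Organization for Information Science and Technology (RIST), "Interdisciplinary Joint Research WG", Member
April 2013 to March 2014

Information Processing Society of Japan (IPSJ), Representative Member
April 2012 to March 2014

Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures, Joint Research Project Review Committee, Chair
April 2012 to March 2014

IPSJ Tohoku Branch, Branch Chair
April 2012 to March 2014

Council of Joint Usage/Research Centers of National Universities, Officer
April 2012 to March 2014

HPCI Consortium, Auditor
April 2012 to March 2014

Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Executive Committee Chair
April 2013 to August 2013

JAMSTEC, "Forum for Integrated Research on Environmental and Social Systems", Member
April 2012 to March 2013

Grants-in-Aid for Scientific Research Committee, Expert Member
April 2011 to March 2013

Tokyo Institute of Technology Global Scientific Information and Computing Center, External Evaluation Committee, Member
From April 2014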

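Professional Memberships (4)

Association for Computing Machinery (ACM)
Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Information Processing Society of Japan (IPSJ)
Institute of Electronics, Information and Communication Engineers (IEICE)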

Research Keywords (2)

Computer architecture
Supercomputers

Research Fields (4)

Informatics / High-performance computing / Supercomputers
Informatics / Software
Informatics / Information networks
Informatics / Computer systems

Awards (10)

Best Paper Award
November 2020, The Eighth International Symposium on Computing and Networking (CANDAR'20), "Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint"

Best Poster Winner, HPC-in-Asia
2019, "A Skewed Multi-Bank Cache for Vector Processors"

Best Paper Award of PaCT 2019
2019, "Analysis of relationship between SIMD-processing features used in NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors"

FY2018 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Prize for Science and Technology (Development Category)
April 2018, Ministry of Education, Culture, Sports, Science and Technology (MEXT)

2018 All-NUA Case Study Paper Technical Contribution Award
2018, "Basic Performance Evaluation of the New Vector Processor SX-Aurora TSUBASA"

Minister of Education, Culture, Sports, Science and Technology Award, "Commendation for Individuals Contributing to the Promotion of Informatization"
October 2017, MEXT

Japan Resilience Award 2016, Excellence Award
2016

Best Paper Award
2015, "Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework"

Best Paper Award, 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA'04)
December 13, 2004

IP Design Award, Research Grant Prize
May 29, 2002, Nikkei BP, "Research and Development of a Real-Time Ray-Tracing Engine Based on the 3DCGiRAM Architecture"

Papers (436)

An analysis of memory access patterns in RISC-V vector workloads on heterogeneous memory architectures

Ryo Yokoyama, Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops, pp. 255-262, January 25, 2026
Publisher: ACM
DOI: 10.1145/3784828.3785405
Disaster Rescue Resource Allocation Based on the Ising Model

Kosei Nakamoto, Masahito Kumagai, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 274-281, December 15, 2025
Publisher: IEEE
DOI: 10.1109/mcsoc67473.2025.00052
Classification of Three-dimensional Electron Diffraction Data with a Large Language Model

Kazuyuki Yasuda, Masahito Kumagai, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 96-103, November 15, 2025
Publisher: ACM
DOI: 10.1145/3731599.3767351
Single photon coherent Ising machines for constrained optimization problems

Masahito Kumagai, Yoshitaka Inui, Edwin Ng, Satoshi Kako, Kazuhiko Komatsu, Hiroaki Kobayashi, Yoshihisa Yamamoto

Quantum Science and Technology, 10(3), 035042, June 20, 2025
Publisher: IOP Publishing
DOI: 10.1088/2058-9565/addde5

eISSN: 2058-9565

Abstract: A coherent Ising machine (CIM) is an oscillator-network-based analog computing system designed to circumvent the bottleneck of von Neumann digital computing architectures. The CIM consists of a network of degenerate optical parametric oscillators (DOPOs) and is designed to find a ground state or to perform Boltzmann sampling over all degenerate ground states and low-energy excited states in combinatorial optimization problems. A nonlinear measurement-feedback scheme called chaotic amplitude control (CAC) has recently been proposed to correct pulse-amplitude inhomogeneity and thereby faithfully map the Ising Hamiltonian onto the loss landscape of the DOPO network. However, the quantum limit of CIM-CAC performance has not yet been fully explored. This work clarifies how quantum noise squeezing and the measurement-induced state shift in repeated indirect quantum measurements improve system performance. Numerical simulations of the Ising model with Zeeman terms, obtained from combinatorial clustering problems formulated as constrained optimization problems, reveal that the CIM-CAC operating in the single-photon-per-pulse regime dramatically outperforms the standard CIM-CAC with a large photon number per pulse. This is because the standard CIM-CAC is often trapped in a periodic trajectory from which it cannot escape, whereas in the single-photon CIM-CAC, noise-induced amplitude jumps bring about a significant improvement.
Performance Evaluation of Vector Annealing on Multiple Nodes using the Traveling Salesperson Problem

Makoto Onoda, Kazuhiko Komatsu, Kotaro Bannai, Shintaro Momose, Masayuki Sato, Hiroaki Kobayashi

ISC High Performance 2025 Research Paper Proceedings (40th International Conference), June 2025
A Compressed QUBO Format for Traveling Salesperson Problems

Chu-Yuan Huang, Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the IEEE Workshop on Parallel / Distributed Combinatorics and Optimization (PDCO 2025), June 2025
A Fast Block Partitioning Decision Method Using Luminance Textures for VVC Encoders

Rikita Uchiyama, Karin Onouchi, Naoya Niwa, Masayuki Sato, Hiroaki Kobayashi, Hiroe Iwasaki

2025 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-4, January 11, 2025
Publisher: IEEE
DOI: 10.1109/icce63647.2025.10929966
A Graph-based Molecular Structure Identification Method via Feature Extraction for Three-dimensional Electron Diffraction Data

Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Saori Maki-Yonekura, Hirofumi Kurokawa, Koji Yonekura, Hiroaki Kobayashi

2024 Twelfth International Symposium on Computing and Networking Workshops (CANDARW), pp. 325-329, November 26, 2024
Publisher: IEEE
DOI: 10.1109/candarw64572.2024.00060
Adaptive Parallelization based on Frame-level and Tile-level Parallelisms for VVC Encoding

Karin Onouchi, Masayuki Sato, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

2024 Twelfth International Symposium on Computing and Networking (CANDAR), pp. 87-95, November 26, 2024
Publisher: IEEE
DOI: 10.1109/candar64496.2024.00018
An Ising-based Decision Method for Intra Prediction Mode in Video Coding

Takuto Momominami, Naoya Niwa, Masahito Kumagai, Kazuhiko Komatsu, Hiroaki Kobayashi, Hiroe Iwasaki

SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1748-1754, November 17, 2024
Publisher: IEEE
DOI: 10.1109/scw63240.2024.00218
File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-Core Direct Numerical Simulation Code of Turbulence

Yuto Hatanaka, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Akihiro Musa, Takashi Ishihara, Atsuya Uno, Kazuhiko Komatsu, Hiroaki Kobayashi, Mitsuo Yokokawa

Computational Science – ICCS 2024, pp. 173-187, June 30, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-63778-0_13

ISSN: 0302-9743

eISSN: 1611-3349
An Asymptotic Parallel Linear Solver and Its Application to Direct Numerical Simulation for Compressible Turbulence

Mitsuo Yokokawa, Taiki Matsumoto, Ryo Takegami, Yukiya Sugiura, Naoki Watanabe, Yoshiki Sakurai, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Computational Science – ICCS 2024, pp. 383-397, June 27, 2024
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-63751-3_26

ISSN: 0302-9743

eISSN: 1611-3349
Prediction of Steam Turbine Blade Erosion Using CFD Simulation Data and Hierarchical Machine Learning

Issei Fukamizu, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Journal of Engineering for Gas Turbines and Power, pp. 1-10, June 25, 2024
Publisher: ASME International
DOI: 10.1115/1.4065815

ISSN: 0742-4795

eISSN: 1528-8919

Abstract: Information on the degree of blade erosion is vital for the efficient operation of steam turbines. However, it is nearly impossible to measure the degree of blade erosion directly during operation, and collecting sufficient data on eroded cases for predictive analysis is challenging. Therefore, this paper proposes a blade erosion prediction method using numerical simulation and machine learning. Pressure data for several blade erosion cases are collected from a numerical turbine simulation. The machine learning models are trained on the collected simulation data to predict, from internal pressure data, the degree of erosion of the first-stage stator (1S) and the first-stage rotor blade (1R). The proposed erosion prediction model takes a two-step hierarchical approach. First, it predicts the 1S erosion degree using k-nearest neighbor (k-NN) regression. Second, it estimates the 1R erosion degree with linear regression models, each tailored to one of the 1S erosion degrees and using pressure data processed by the fast Fourier transform (FFT). The evaluation shows that the proposed method predicts 1S erosion with a mean absolute error (MAE) of 0.000693 mm and 1R erosion with an MAE of 0.458 mm. These results indicate that the proposed method can accurately capture the degree of turbine blade erosion from internal pressure data, and suggest that it can be used effectively to determine the optimal timing for Maintenance and Repair Operations (MRO).
Quantum annealing-based algorithm for lattice gas automata

Yuichi Kuya, Kazuhiko Komatsu, Kouki Yonaga, Hiroaki Kobayashi

Computers and Fluids, 274, April 30, 2024

DOI: 10.1016/j.compfluid.2024.106238

ISSN: 0045-7930
A Constraint Partition Method for Combinatorial Optimization Problems (peer-reviewed)

Makoto Onoda, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 600-607, December 2023

DOI: 10.1109/MCSoC60832.2023.00093
Appropriate Graph-Algorithm Selection for Edge Devices Using Machine Learning (peer-reviewed)

Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 544-551, December 2023

DOI: 10.1109/MCSoC60832.2023.00086
Multi-scale Loss based Electron Microscopic Image Pair Matching Method (peer-reviewed)

Chunting Duan, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 22nd IEEE International Conference on Machine Learning and Applications, pp. 1957-1964, December 2023

DOI: 10.1109/ICMLA58977.2023.00295
Investigating the Characteristics of Ising Machines (peer-reviewed)

Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Hiroaki Kobayashi

Proceedings of IEEE International Conference on Quantum Computing and Engineering, September 2023

DOI: 10.1109/QCE57702.2023.00108
Performance Evaluation of Tsunami Evacuation Route Planning on Multiple Annealing Machines

Yihui Liu, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the 20th ACM International Conference on Computing Frontiers, May 9, 2023
Publisher: ACM
DOI: 10.1145/3587135.3592193
I/O Performance Evaluation of a Memory-Saving DNS Code on SX-Aurora TSUBASA

Mitsuo Yokokawa, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Taiki Matsumoto, Akihiro Musa, Kazuhiko Komatsu, Takashi Ishihara, Hiroaki Kobayashi

2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2023
Publisher: IEEE
DOI: 10.1109/ipdpsw59300.2023.00117
Ising-Based Kernel Clustering

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Algorithms, 16(4), 214, April 19, 2023
Publisher: MDPI AG
DOI: 10.3390/a16040214

eISSN: 1999-4893

Combinatorial clustering based on the Ising model is drawing attention as a high-quality clustering method. However, conventional Ising-based clustering methods using the Euclidean distance cannot handle irregular data. To overcome this problem, this paper proposes an Ising-based kernel clustering method built on two key ideas. One is to cluster irregular data by mapping it onto a high-dimensional feature space using the kernel trick. The other is to use the matrix-matrix calculations in numerical libraries to accelerate the preprocessing for annealing. Whereas conventional Ising-based clustering is not designed to accept data transformed by the kernel trick, this paper extends Ising-based clustering to process a distance matrix defined in a high-dimensional data space: the proposed method treats the Gram matrix determined by the kernel method as a high-dimensional distance matrix to handle irregular data. A comparison of the proposed Ising-based kernel clustering method with conventional Euclidean-distance-based combinatorial clustering shows that the proposed method produces significantly better clustering results for irregular data. Furthermore, the proposed preprocessing for annealing using numerical libraries is up to 12.4 million times faster than the conventional naive Python implementation. Comparisons between Ising-based kernel clustering and kernel K-means, a representative state-of-the-art kernel clustering method, reveal that the proposed method has the potential to obtain higher-quality clustering results.
Analysis of Precision Vectors for Ising-Based Linear Regression

Kaho Aoyama, Kazuhiko Komatsu, Masahito Kumagai, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, pp. 251-261, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_20

ISSN: 0302-9743

eISSN: 1611-3349
A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders

Masayuki Sato, Yuya Omori, Ryusuke Egawa, Ken Nakamura, Daisuke Kobayashi, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, pp. 288-300, April 8, 2023
Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-29927-8_23

ISSN: 0302-9743

eISSN: 1611-3349
Performance evaluation of parallel direct numerical simulation code on supercomputer SX-Aurora TSUBASA

Mitsuo Yokokawa, Yujiro Takenaka, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Computers & Fluids, 261, 105913, April 2023
Publisher: Elsevier BV
DOI: 10.1016/j.compfluid.2023.105913

ISSN: 0045-7930
Rapid and quantitative uncertainty estimation of coseismic slip distribution for large interplate earthquakes using real-time GNSS data and its application to tsunami inundation prediction

Keitaro Ohno, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Takashi Abe, Hiroaki Kobayashi

Earth, Planets and Space, 74(1), December 2022
Publisher: Springer Science and Business Media LLC
DOI: 10.1186/s40623-022-01586-6

eISSN: 1880-5981

Abstract: This study proposes a new method for estimating the uncertainty of the coseismic slip distribution on the plate interface deduced from real-time global navigation satellite system (GNSS) data and explores its application to tsunami inundation prediction. Jointly developed by the Geospatial Information Authority of Japan and Tohoku University, REGARD (REal-time GEONET Analysis system for Rapid Deformation monitoring) estimates coseismic fault models (a single rectangular fault model and a slip distribution model) in real time to support tsunami prediction. The estimated results are adopted as part of the Disaster Information System, which the Cabinet Office of the Government of Japan uses to assess tsunami inundation and damage. However, the REGARD system currently struggles to estimate the quantitative uncertainty of its results, even though those results necessarily contain both observation errors and modeling errors caused by the model settings. Understanding such quantitative uncertainties based on the input data is essential for using this resource in disaster response. We developed an algorithm that estimates the coseismic slip distribution and its uncertainties using Markov chain Monte Carlo methods. We focused on the Nankai Trough of southwest Japan, where megathrust earthquakes have repeatedly occurred, and used simulation data assuming a Hoei-type earthquake. We divided the plate interface into 2951 rectangular subfaults and designed a multistage sampling flow with stepwise perturbation groups. As a result, we successfully estimated the slip distribution and its uncertainty at the 95% confidence interval of the posterior probability density function. Furthermore, we developed a new visualization procedure that shows the risk of tsunami inundation and its probability on a map.
Within this procedure, we regarded the Markov chain Monte Carlo samples as individual fault models and clustered them using the k-means approach to obtain different tsunami source scenarios. We then calculated tsunami inundation for these scenarios in parallel and integrated the results on the map. This map, which expresses the uncertainty of tsunami inundation caused by uncertainty in the coseismic fault estimation, offers quantitative, real-time insight into possible worst-case scenarios.
Page-Address Coalescing of Vector Gather Instructions for Efficient Address Translation (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of 2022 IEEE/ACM 12th Workshop on Irregular Applications: Architectures and Algorithms (IA3), pp. 1-8, November 2022

DOI: 10.1109/IA356718.2022.00007
A hierarchical wavefront method for LU-SGS

Kazuhiko Komatsu, Yuta Hougi, Masayuki Sato, Hiroaki Kobayashi

Computers & Fluids, 245, 105572, June 2022
Publisher: Elsevier BV
DOI: 10.1016/j.compfluid.2022.105572

ISSN: 0045-7930
High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA

Ilya Afanasyev, Kazuhiko Komatsu, Dmitry Lichmanov, Vadim Voevodin, Hiroaki Kobayashi

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2022
Publisher: IEEE
DOI: 10.1109/ipdpsw55747.2022.00050
An Efficient Reference Image Sharing Method for the Image-division Parallel Video Encoding Architecture

Ken Nakamura, Yuya Omori, Daisuke Kobayashi, Koyo Nitta, Kimikazu Sano, Masayuki Sato, Hiroe Iwasaki, Hiroaki Kobayashi

IEICE Transactions on Electronics, advance publication, 2022
Publisher: The Institute of Electronics, Information and Communication Engineers
DOI: 10.1587/transele.2022lhp0002

ISSN: 0916-8524

eISSN: 1745-1353

This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. The method reduces the amount of data transfer by combining pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that, on average, the data transfer can be reduced to 19.8-35.3% of that of the conventional method without major degradation of coding performance. This reduction in data transfer makes it possible to lower the bandwidth required of the inter-chip transfer interface.
Optimizations of a Linear Matrix Solver in a Composite Simulation for a Vector Computer

Zhilin He, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), December 10, 2021
Publisher: IEEE
DOI: 10.1109/paap54281.2021.9720445
A dynamic parameter tuning method for SpMM parallel execution (peer-reviewed)

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Concurrency and Computation: Practice and Experience, December 9, 2021
Publisher: Wiley
DOI: 10.1002/cpe.6755

ISSN: 1532-0626

eISSN: 1532-0634
Ising-Based Combinatorial Clustering Using the Kernel Method

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Publisher: IEEE
DOI: 10.1109/mcsoc51149.2021.00037
Real-time automatic uncertainty estimation of coseismic single rectangular fault model using GNSS data (peer-reviewed)

Keitaro Ohno, Yusaku Ohta, Satoshi Kawamoto, Satoshi Abe, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Hiroaki Kobayashi

Earth, Planets and Space, 73(1), December 2021
Publisher: Springer Science and Business Media LLC
DOI: 10.1186/s40623-021-01425-0

ISSN: 1343-8832

eISSN: 1880-5981
An Externally-Constrained Ising Clustering Method for Material Informatics

Kazuhiko Komatsu, Masahito Kumagai, Ji Qi, Masayuki Sato, Hiroaki Kobayashi

2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), November 2021
Publisher: IEEE
DOI: 10.1109/candarw53999.2021.00040
Register Flush-free Runahead Execution for Modern Vector Processors

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2021
Publisher: IEEE
DOI: 10.1109/sbac-pad53543.2021.00023
Detection of Machinery Failure Signs From Big Time-Series Data Obtained by Flow Simulation of Intermediate-Pressure Steam Turbines (peer-reviewed)

Kazuhiko Komatsu, Hironori Miyazawa, Cheng Yiran, Masayuki Sato, Takashi Furusawa, Satoru Yamamoto, Hiroaki Kobayashi

Journal of Engineering for Gas Turbines and Power, 144(1), August 13, 2021
Publisher: ASME International
DOI: 10.1115/1.4052142

ISSN: 0742-4795

eISSN: 1528-8919

Abstract: The periodic maintenance, repair, and overhaul (MRO) of turbine blades in thermal power plants is essential to maintain a stable power supply. During MRO, older and less efficient power plants are put into operation, which wastes additional fuel. Such a situation forces thermal power plants to work under off-design conditions. Moreover, such operation accelerates blade deterioration, which may lead to sudden failure. Therefore, a method for avoiding unexpected failures needs to be developed. Detecting the signs of machinery failure requires the analysis of time-series data. However, such data for various blade conditions would have to be collected from actual operating steam turbines, and obtaining abnormal or failure data is difficult. Thus, this paper proposes a classification approach that analyzes big time-series data collected instead from numerical results. Time-series data for various normal and abnormal cases of actual intermediate-pressure steam-turbine operation were obtained through numerical simulation. Useful features were then extracted and classified using K-means clustering to judge whether the turbine is operating normally or abnormally. The experimental results indicate that the status of the blade can be appropriately classified. By checking data from real turbine blades against our classification results, the status of these blades can be estimated. Thus, this approach can help decide the appropriate timing for MRO.
Distributed Graph Algorithms for Multiple Vector Engines of NEC SX-Aurora TSUBASA Systems (peer-reviewed)

Ilya V. Afanasyev, Vadim V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 8(2), June 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210206

ISSN: 2313-8734
Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer (peer-reviewed)

Osamu Watanabe, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 8(2), June 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210207

ISSN: 2313-8734
Performance and Power Analysis of a Vector Computing System (peer-reviewed)

Supercomputing Frontiers and Innovations, 8(2), June 2021
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi210205

ISSN: 2313-8734
A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs (peer-reviewed)

Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), June 2021
Publisher: IEEE
DOI: 10.1109/ipdpsw52791.2021.00116
A Metadata Prefetching Mechanism for Hybrid Memory Architectures (peer-reviewed)

Shunsuke Tsukada, Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), April 14, 2021
Publisher: IEEE
DOI: 10.1109/coolchips52128.2021.9410321

ISSN: 0916-8524

eISSN: 1745-1353
Optimization of the Himeno Benchmark for SX-Aurora TSUBASA (peer-reviewed)

Akito Onodera, Kazuhiko Komatsu, Soya Fujimoto, Yoko Isobe, Masayuki Sato, Hiroaki Kobayashi

Benchmarking, Measuring, and Optimizing, pp. 127-143, March 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-71058-3_8

ISSN: 0302-9743

eISSN: 1611-3349
VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture (peer-reviewed)

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

The Journal of Supercomputing, January 26, 2021
Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s11227-020-03564-9

ISSN: 0920-8542

eISSN: 1573-0484
Performance Evaluation of SX-Aurora TSUBASA and Its QA-Assisted Application Design

Hiroaki Kobayashi, Kazuhiko Komatsu

Sustained Simulation Performance 2019 and 2020, pp. 3-20, 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-68049-7_1
Optimizations of DNS Codes for Turbulence on SX-Aurora TSUBASA

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2019 and 2020, pp. 51-59, 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-68049-7_4
Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs (peer-reviewed)

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing, 11(2), pp. 267-282, 2021
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.11.2_267

ISSN: 2185-2839

eISSN: 2185-2847
An External Definition of the One-Hot Constraint and Fast QUBO Generation for High-Performance Combinatorial Clustering (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing, 11(2), pp. 463-491, 2021
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.11.2_463

ISSN: 2185-2839

eISSN: 2185-2847
A Deep Reinforcement Learning Based Feature Selector (peer-reviewed)

Yiran Cheng, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel Architectures, Algorithms and Programming, pp. 378-389, 2021
Publisher: Springer Singapore
DOI: 10.1007/978-981-16-0010-4_33

ISSN: 1865-0929

eISSN: 1865-0937
A Dynamic Parameter Tuning Method for High Performance SpMM (peer-reviewed)

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies, pp. 318-329, 2021
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-69244-5_28

ISSN: 0302-9743

eISSN: 1611-3349
Effects of Using a Memory Stalled Core for Handling MPI Communication Overlapping in the SOR Solver on SX-ACE and SX-Aurora TSUBASA (peer-reviewed)

Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 7(4), pp. 4-15, December 2020
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi200401

ISSN: 2313-8734
An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations (peer-reviewed)

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW), November 2020
Publisher: IEEE
DOI: 10.1109/candarw51189.2020.00041
Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking (CANDAR), November 2020
Publisher: IEEE
DOI: 10.1109/candar51075.2020.00015
Importance of Selecting Data Layouts in the Tsunami Simulation Code (peer-reviewed)

Takumi Kishitani, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 830-837, May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00140
I/O Performance of the SX-Aurora TSUBASA (peer-reviewed)

Mitsuo Yokokawa, Ayano Nakai, Kazuhiko Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020
Publisher: IEEE
DOI: 10.1109/ipdpsw50202.2020.00014
Energy-efficient Design of an STT-RAM-based Hybrid Cache Architecture (peer-reviewed)

Masayuki Sato, Xue Hao, Kazuhiko Komatsu, Hiroaki Kobayashi

2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), April 2020
Publisher: IEEE
DOI: 10.1109/coolchips49199.2020.9097643
Performance Evaluation of SX-Aurora TSUBASA by Using Benchmark Programs

Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2018 and 2019, pp. 69-77, 2020
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-39181-2_7
Optimizations for the Himeno Benchmark on Vector Computing System SX-Aurora TSUBASA (peer-reviewed)

Akito Onodera, Kazuhiko Komatsu, Masayuki Sato, Yoko Isobe, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation, 2020
Metadata Management for Large-Scale Hybrid Memory Architectures (peer-reviewed)

Shunsuke Tsukada, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation, 2020
An Evaluation of a Hierarchical Clustering Method Using Quantum Annealing (peer-reviewed)

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation, 2020
Acceleration of Numerical Turbine using the Red-Black Method (peer-reviewed)

Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia), 2020
Performance evaluation of a clustering approach based on thermophysical properties by using multiple platforms 査読有り

Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia)　2020年
Evaluation of Tsunami Inundation Simulation Using Vector Scalar Hybrid MPI on SX-Aurora TSUBASA 査読有り

Akihiro Musa, Takashi Soga, Takashi Abe, Masayuki Sato, Kazuhiko Komatsu, Shunichi Koshimura, Hiroaki Kobayashi

Proceedings of Research Poster Presentation of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), 2020
Performance Evaluation of Parallel DNS Codes on the Supercomputer SX-Aurora TSUBASA (peer-reviewed)

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of the 32nd International Conference on Parallel Computational Fluid Dynamics (ParCFD 2020), 2020
A hierarchical wavefront method for LU-SGS on modern multi-core vector processors (peer-reviewed)

Yuta Hougi, Kazuhiko Komatsu, Osamu Watanabe, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the 32nd International Conference on Parallel Computational Fluid Dynamics (ParCFD 2020), 2020
Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA (peer-reviewed)

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Communications in Computer and Information Science, pp. 131-145, 2020
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-55326-5_10

ISSN: 1865-0929

eISSN: 1865-0937
An Energy-aware Dynamic Data Allocation Mechanism for Many-channel Memory Systems (peer-reviewed)

Masayuki Sato, Takuya Toyoshima, Hikaru Takayashiki, Ryusuke Egawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 6(4), pp. 4-19, Dec 2019
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi190401

ISSN: 2313-8734
Developing Efficient Implementations of Shortest Paths and Page Rank Algorithms for NEC SX-Aurora TSUBASA Architecture (peer-reviewed)

I. V. Afanasyev, Vad. V. Voevodin, Vl. V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Lobachevskii Journal of Mathematics, 40(11), pp. 1753-1762, Nov 2019

DOI: 10.1134/S1995080219110039

ISSN: 1995-0802

eISSN: 1818-9962
A Skewed Multi-banked Cache for Many-core Vector Processors (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 6(3), pp. 86-101, Sep 2019
Publisher: FSAEIHE South Ural State University (National Research University)
DOI: 10.14529/jsfi190305

ISSN: 2313-8734
A layer-adaptable cache hierarchy by a multiple-layer bypass mechanism

Ryusuke Egawa, Ryoma Saito, Masayuki Sato, Hiroaki Kobayashi

PervasiveHealth: Pervasive Computing Technologies for Healthcare, Jun 6, 2019
Publisher: ICST
DOI: 10.1145/3337801.3337820

ISSN: 2153-1633
Development and Validation of a Tsunami Numerical Model with the Polygonally Nested Grid System and its MPI-Parallelization for Real-Time Tsunami Inundation Forecast on a Regional Scale (invited)

T. Inoue, T. Abe, S. Koshimura, A. Musa, Y. Murashima, H. Kobayashi

Journal of Disaster Research, 14(3), pp. 416-434, Mar 2019

DOI: 10.20965/jdr.2019.p0416

ISSN: 1881-2473

eISSN: 1883-8030
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines (peer-reviewed)

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations, 6(1), pp. 36-47, Mar 2019

DOI: 10.14529/jsfi190106
A Hardware Prefetching Mechanism for Vector Gather Instructions (peer-reviewed)

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms (IA3@SC), pp. 59-66, 2019
Publisher: IEEE
DOI: 10.1109/IA349570.2019.00015
Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors (peer-reviewed)

Ilya V. Afanasyev, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Parallel Computing Technologies - 15th International Conference (PaCT), pp. 125-139, 2019
Publisher: Springer
DOI: 10.1007/978-3-030-25636-4_10
An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications (peer-reviewed)

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi

IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 768-777, 2019
Publisher: IEEE
DOI: 10.1109/IPDPSW.2019.00127
Perceptron-based Cache Bypassing for Way-Adaptable Caches (peer-reviewed)

Masayuki Sato, Yongcheng Chen, Haruya Kikuchi, Kazuhiko Komatsu, Hiroaki Kobayashi

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS 22), pp. 1-3, 2019
Publisher: IEEE
DOI: 10.1109/CoolChips.2019.8721331

ISSN: 2473-4683
Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA (peer-reviewed)

Osamu Watanabe, Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

Proceedings of MCHPC'19: 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp. 25-32, 2019

DOI: 10.1109/MCHPC49590.2019.00011
Performance Evaluation of Tsunami Inundation Simulation on SX-Aurora TSUBASA (peer-reviewed)

Akihiro Musa, Takashi Abe, Takumi Kishitani, Takuya Inoue, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

Computational Science - ICCS 2019 - 19th International Conference, Faro, Portugal, June 12-14, 2019, Proceedings, Part II, pp. 363-376, 2019
Publisher: Springer
DOI: 10.1007/978-3-030-22741-8_26
An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches

Masayuki Sato, Yoshiki Shoji, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

IEEE Transactions on Multi-Scale Computing Systems, 4(4), pp. 593-604, Oct 1, 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TMSCS.2018.2827955

ISSN: 2332-7766
Developing Efficient Implementations of Bellman–Ford and Forward-Backward Graph Algorithms for NEC SX-ACE (peer-reviewed)

Supercomputing Frontiers and Innovations, 5(3), pp. 65-69, Oct 2018

DOI: 10.14529/jsfi180311
A Machine Learning-based Approach for Selecting SpMV Kernels and Matrix Storage Formats (peer-reviewed)

Hang Cui, Shoichi Hirasawa, Hiroaki Kobayashi, Hiroyuki Takizawa

IEICE Transactions on Information and Systems, E101-D(9), pp. 2307-2314, Sep 2018
A Method for Reducing Parameter-Tuning Time for Many-core Processors (in Japanese)

岸谷拓海, 小松一彦, 撫佐昭裕, 佐藤雅之, 小林広明

Summer Workshop on Parallel/Distributed/Cooperative Processing (Kumamoto), Jul 2018
A Study of Shared-Cache Configurations for Multi-Vector-Core Processors (in Japanese)

高屋敷光, 佐藤雅之, 小松一彦, 江川隆輔, 小林広明

Summer Workshop on Parallel/Distributed/Cooperative Processing (Kumamoto), Jul 2018
Expressing the Differences in Code Optimizations between Intel Knights Landing and NEC SX-ACE Processors

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

The 13th World Congress on Computational Mechanics/2nd Pan American Congress on Computational Mechanics, Jul 2018
An energy-aware set-level refreshing mechanism for eDRAM last-level caches (peer-reviewed)

Masayuki Sato, Zehua Li, Ryusuke Egawa, Hiroaki Kobayashi

21st IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2018 - Proceedings, pp. 1-3, Jun 5, 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/CoolChips.2018.8373082
Early Evaluation of a New Vector Processor SX-Aurora TSUBASA (peer-reviewed)

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

International Supercomputing Conference 2018 (ISC18), Jun 2018
Performance Evaluation of a Real-Time Tsunami Inundation Forecast System on Modern Supercomputers (peer-reviewed)

Akihiro Musa, Takumi Kishitani, Takuya Inoue, Hiroaki Hokari, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

15th Annual Meeting Asia Oceania Geoscience Society, Jun 2018

DOI: 10.20965/jdr.2018.p0234
Migrating an Old Vector Code to Modern Vector Machines (peer-reviewed)

Hiroyuki Takizawa, Kenta Yamaguchi, Takashi Soga, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

Proceedings of the 30th International Conference on Parallel Computational Fluid Dynamics, May 2018
Real-time tsunami inundation forecast system for tsunami disaster prevention and mitigation (peer-reviewed)

Akihiro Musa, Osamu Watanabe, Hiroshi Matsuoka, Hiroaki Hokari, Takuya Inoue, Yoichi Murashima, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Hiroaki Kobayashi

Journal of Supercomputing, 74(7), pp. 1-21, Apr 16, 2018
Publisher: Springer New York LLC
DOI: 10.1007/s11227-018-2363-0

ISSN: 1573-0484, 0920-8542
A Real-Time Tsunami Inundation Forecast System Using Vector Supercomputer SX-ACE (peer-reviewed)

Akihiro Musa, Takashi Abe, Takuya Inoue, Hiroaki Hokari, Yoichi Murashima, Yoshiyuki Kido, Susumu Date, Shinji Shimojo, Shunichi Koshimura, Hiroaki Kobayashi

Journal of Disaster Research, 13(2), pp. 234-244, Mar 2018

DOI: 10.20965/jdr.2018.p0234

ISSN: 1881-2473

eISSN: 1883-8030
Tsunami inundation and damage forecasting with high-performance computing infrastructure

S. Koshimura, Y. Murashima, A. Musa, R. Hino, Y. Ohta, H. Kobayashi, M. Kachi, Y. Sato

11th National Conference on Earthquake Engineering 2018, NCEE 2018: Integrating Science, Engineering, and Policy, 6, pp. 3423-3427, 2018
Publisher: Earthquake Engineering Research Institute
Optimization of a Simulation Code for Polydisperse Multiphase Flows with Reactions and Phase Changes (in Japanese)

佐々木大輔, 加藤季広, 磯部洋子, 笠原弘貴, 渡部広吾輝, 志村啓, 奥野航平, 松尾亜紀子, 江川隆輔, 滝沢寛之, 小林広明

SENAC: Bulletin of the Large-Scale Computer Center, Tohoku University, 51(1), pp. 47-51, Jan 2018
Publisher: Cyberscience Center, Tohoku University
ISSN: 0286-7419

Publication type: bulletin
Search Space Reduction for Parameter Tuning of a Tsunami Simulation on the Intel Knights Landing Processor (peer-reviewed)

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2018), pp. 117-124, 2018

DOI: 10.1109/MCSoC2018.2018.00030
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA (peer-reviewed)

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18), 2018
Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems (peer-reviewed)

Muhammad Alfian Amrizal, Atsuya Uno, Yukinori Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE Transactions on Information and Systems, E100-D(12), pp. 2749-2760, Dec 2017

DOI: 10.1587/transinf.2017PAP0002

ISSN: 1745-1361
Advances of tsunami inundation forecasting and its future perspectives (peer-reviewed)

Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Yoichi Murashima, Akihiro Musa

OCEANS 2017 - Aberdeen, 2017-, pp. 1-4, Oct 25, 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/OCEANSE.2017.8084753
A Multiple-layer Bypass Mechanism for Energy-Efficient Computing

Ryusuke Egawa, Masayuki Sato, Ryoma Saito, Hiroaki Kobayashi

Proceedings of the 26th Workshop on Sustained Simulation Performance, Oct 2017
Early Evaluation of a Heterogeneous Memory Architecture on a Vector Supercomputer

Ryosuke Sato, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, 2017, p. 20, Aug 2017
Publisher: Executive Committee of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers
DOI: 10.11528/tsjc.2017.0_20
A power-aware LLC control mechanism for the 3D-stacked memory system (peer-reviewed)

Ryusuke Egawa, Wataru Uno, Masayuki Sato, Hiroaki Kobayashi, Jubee Tada

2016 IEEE International 3D Systems Integration Conference, 3DIC 2016, Jul 5, 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/3DIC.2016.7970034
Toward Dynamic Load Balancing across OpenMP Thread Teams for Irregular Workloads (peer-reviewed)

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing, 7(2), pp. 387-404, Jul 2017
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.7.2_387

ISSN: 2185-2839

In the field of high performance computing, massively parallel many-core processors such as Intel Xeon Phi coprocessors are becoming popular because they can significantly accelerate various applications. Several high-level programming models have been proposed to parallelize applications efficiently for such many-core processors. OpenMP is the de facto standard programming model, mainly for shared-memory parallel processing. For hierarchical parallel processing, OpenMP 4.0 and later allow programmers to create multiple thread teams, each containing a set of newly created threads that can synchronize with one another. When multiple thread teams execute an application, dynamic load balancing across the teams is important, because static load balancing easily causes load imbalance across teams and thus degrades performance. In this paper, we first motivate our work by clarifying the benefit of using multiple thread teams to execute an irregular workload on a many-core processor. We then demonstrate that dynamic load balancing across those thread teams can significantly improve the performance of irregular workloads on a many-core processor, even when the scheduling overhead is taken into account. Although the current OpenMP specification does not provide such a dynamic load balancing mechanism, we discuss its benefits through experiments on an Intel Xeon Phi coprocessor, evaluating the performance gain with a ray tracing code. The results show that dynamic load balancing across thread teams can improve performance by up to 14% compared with static load balancing across teams, including the scheduling overhead.
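As a hedged illustration of the static-versus-dynamic contrast in the abstract above (not the authors' runtime, OpenMP code, or ray-tracing benchmark), the toy model below treats each thread team as a single worker. Static balancing hands each team a contiguous block of tasks; dynamic balancing sends each task to whichever team becomes idle first and charges a fixed per-dispatch overhead. The task costs, the overhead value, and the function names are made-up assumptions.

```python
import heapq


def static_makespan(costs, teams):
    """Static balancing: contiguous, equally sized blocks of tasks, one per team."""
    block = -(-len(costs) // teams)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, teams, overhead=0.0):
    """Dynamic balancing (toy model): each task is dispatched to the team that
    becomes idle first, and every dispatch pays a fixed scheduling overhead."""
    finish = [0.0] * teams  # finish time per team; equal values already form a valid heap
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost + overhead)
    return max(finish)


# Hypothetical irregular workload: two expensive tasks followed by six cheap ones.
tasks = [8, 8, 1, 1, 1, 1, 1, 1]
print(static_makespan(tasks, teams=2))                 # 18
print(dynamic_makespan(tasks, teams=2))                # 11.0
print(dynamic_makespan(tasks, teams=2, overhead=0.5))  # 13.0
```

With these made-up costs the static split leaves one team holding both expensive tasks, while dynamic dispatch stays ahead of static balancing even after paying the per-task overhead, which matches the qualitative claim of the abstract.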
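Development of a Heatstroke Risk Assessment Simulator for Simultaneous Exposure to Solar Radiation and Heat (in Japanese; peer-reviewed)

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 小林広明, 曽根秀昭

IEICE Transactions C (Japanese Edition), J100-C(5), pp. 208-216, May 2017
Effects of Using a Memory-Stalled Core for Handling MPI Communication Overlapping in the SOR Solver (peer-reviewed)

Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of the 29th International Conference on Parallel Computational Fluid Dynamics (ParallelCFD 2017), May 2017
Accelerating Heatstroke Risk Assessment for Simultaneous Human Exposure to Solar Radiation and Heat (in Japanese; peer-reviewed)

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 曽根秀昭, 小林広明

IEICE Transactions C, J100-C(5), pp. 208-216, Apr 2017
A Study on Auto-tuning Using Scenario Templates (in Japanese)

佐藤大智, 平澤将一, 滝沢寛之, 小林広明

Proceedings of the 79th National Convention, 2017(1), pp. 45-46, Mar 2017
Applicability of a Polygonally Nested, MPI-Parallel Tsunami Model to Nationwide Tsunami Analysis at Multiple Resolutions (in Japanese; peer-reviewed)

井上拓也, 阿部孝志, 越村俊一, 撫佐昭裕, 村嶋陽一, 小林広明

Journal of JSCE, Ser. B2 (Coastal Engineering), 73(2), pp. I_319-I_324, 2017
Publisher: Japan Society of Civil Engineers
DOI: 10.2208/kaigan.73.I_319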

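Toward real-time tsunami inundation and damage forecasting after an earthquake by means of high-performance computing, we performed tsunami analyses based on the two-dimensional nonlinear long-wave theory at spatial resolutions of 270 m, 90 m, and 30 m, and examined their computational accuracy and cost. For the analysis model, we extended the shapes of the analysis domains and nesting from the conventional rectangles to polygons, and built and optimized an efficient polygonally nested, MPI-parallel model that restricts high-accuracy analysis to the coastal areas where a tsunami may run up. This improved computational efficiency by about 14%, and we also proposed a method for setting the polygonal domains automatically. A wide-area tsunami analysis from Kanagawa Prefecture to Kagoshima Prefecture on Tohoku University's supercomputer showed that nationwide real-time tsunami inundation forecasting on a 30 m grid requires a computer of about 140 Tflop/s.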
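The abstract compares 270 m, 90 m, and 30 m grids. As a back-of-the-envelope sketch (an assumption, not the cost model used in the paper), an explicit 2-D long-wave solver whose time step is CFL-limited does roughly the cube of the refinement ratio more work on a uniform grid: the cell count grows with the square of the ratio and the number of time steps grows linearly. The function name is hypothetical, and the paper's nested, polygonal grids will deviate from this uniform-grid idealization.

```python
def relative_work(dx_coarse_m: float, dx_fine_m: float) -> float:
    """Hypothetical helper: work ratio for refining a uniform 2-D grid
    from dx_coarse_m to dx_fine_m.

    Assumes an explicit scheme with a CFL-limited time step (dt proportional
    to dx), the same domain, and the same simulated duration.
    """
    ratio = dx_coarse_m / dx_fine_m
    cells = ratio ** 2  # cells per unit area grow with the square of the ratio
    steps = ratio       # CFL: a finer dx forces a proportionally smaller dt
    return cells * steps


for coarse, fine in [(270.0, 90.0), (90.0, 30.0), (270.0, 30.0)]:
    print(f"{coarse:.0f} m -> {fine:.0f} m: about {relative_work(coarse, fine):.0f}x the work")
```

Under these assumptions each threefold refinement costs about 27 times more work, so going straight from 270 m to 30 m costs about 729 times more, which is why restricting the finest grid to run-up areas matters.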
Optimization of a tsunami inundation model with the polygonally nested grid system and MPI parallelization (peer-reviewed)

Takuya Inoue, Takashi Abe, Shunichi Koshimura, Akihiro Musa, Yoichi Murashima, Hiroaki Kobayashi

Proceedings of International Tsunami Symposium 2017, 2017

DOI: 10.1109/OCEANSE.2017.8084753
Rapid Tsunami Inundation and Damage Estimation System with High-performance Computing and Networking (peer-reviewed)

Shunichi Koshimura, Yoichi Murashima, Akihiro Musa, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Masahiro Kachi, Yoshihiro Sato

Proceedings of International Tsunami Symposium 2017, 2017
An Application-adaptive Data Allocation Method for Multi-channel Memory (peer-reviewed)

Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

2017 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2017

DOI: 10.1109/CoolChips.2017.7946381

ISSN: 2473-4683
An Adjacent-Line-Merging Writeback Scheme for STT-RAM Last-Level Caches (peer-reviewed)

Masayuki Sato, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

2017 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2017

DOI: 10.1109/CoolChips.2017.7946380

ISSN: 2473-4683
Performance and Power Analysis of SX-ACE using HP-X Benchmark Programs (peer-reviewed)

Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi, Yoko Isobe, Toshihiro Kato, Souya Fujimoto

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 693-700, 2017

DOI: 10.1109/CLUSTER.2017.65

ISSN: 1552-5244
Performance Evaluation of Quantum ESPRESSO on NEC SX-ACE (peer-reviewed)

Osamu Watanabe, Akihiro Musa, Hiroaki Hokari, Shivanshu Singh, Raghunandan Mathur, Hiroaki Kobayashi

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 701-708, 2017

DOI: 10.1109/CLUSTER.2017.57

ISSN: 1552-5244
Vectorization-aware Loop Optimization with User-defined Code Transformations (peer-reviewed)

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 685-692, 2017

DOI: 10.1109/CLUSTER.2017.102

ISSN: 1552-5244
Program optimization of numerical turbine for vector supercomputer SX-ACE (peer-reviewed)

Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto

Computers & Fluids, 2017
A Directive Generation Approach to High Code-Maintainability for Various HPC Systems (peer-reviewed)

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Int. J. Netw. Comput., 7(2), pp. 405-418, 2017
Potential of a Modern Vector Supercomputer for Practical Applications: Performance Evaluation of SX-ACE (peer-reviewed)

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

Journal of Supercomputing, 73(9), pp. 3948-3976, 2017

DOI: 10.1007/s11227-017-1993-y
Directive Translation for Various HPC Systems Using the Xevolver Framework (invited)

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2016, pp. 109-117, Dec 2016

DOI: 10.1007/978-3-319-46735-1_9
Making a Legacy Code Auto-tunable without Messing It Up (peer-reviewed)

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

Proceedings of the 29th International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Nov 2016
A Study on a Power-Saving Data Allocation Method for High-Bandwidth Memory (in Japanese)

豊嶋拓也, 佐藤雅之, 江川隆輔, 小林広明

Proceedings of the Tohoku-Section Joint Convention, 2016, p. 39, Aug 2016
Publisher: Executive Committee of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers
DOI: 10.11528/tsjc.2016.0_39
Message from the organizing committee chair (peer-reviewed)

Hiroaki Kobayashi

19th IEEE Symposium on Low-Power and High-Speed Chips, IEEE COOL Chips 2016 - Proceedings, pp. i-ii, Jul 5, 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/CoolChips.2016.7503663
Effects of Stacking Granularity on 3-D Stacked Floating-point Fused Multiply Add Units (peer-reviewed)

Jubee Tada, Maiki Hosokawa, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART 2016), Jul 2016
Performance Optimization of Numerical Turbine for Supercomputer SX-ACE (peer-reviewed)

Y. Sakaguchi, K. Kataumi, H. Matsuoka, O. Watanabe, A. Musa, K. Komatsu, R. Egawa, H. Kobayashi, S. Yamamoto

Proceedings of the 28th International Conference on Parallel Computational Fluid Dynamics, May 2016
A Power-Performance Tradeoff of HBM by Limiting Access Channels (peer-reviewed)

Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of IEEE Symposium on Low-Power and High-Speed Chips, Apr 2016
A Bypass Mechanism for Application-Adaptive Cache Resizing (in Japanese; peer-reviewed)

佐藤雅之, 高井拓実, 江川隆輔, 滝沢寛之, 小林広明

IEICE Transactions D, J99-D(3), pp. 337-347, Mar 2016

DOI: 10.14923/transinfj.2014JDP7131
A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE (peer-reviewed)

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

International Journal of Networking and Computing, 6(2), pp. 243-262, Feb 2016
A Study on Code Transformation Using Machine Learning (in Japanese)

川原畑勇希, 平澤将一, 滝沢寛之, 小林広明

Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, 2016, p. 227, 2016
Publisher: Executive Committee of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers
DOI: 10.11528/tsjc.2016.0_227
Improving the Efficiency of Wide-Area Tsunami Analysis with Polygonally Nested Domains and MPI Parallelization (in Japanese; peer-reviewed)

井上拓也, 阿部孝志, 越村俊一, 撫佐昭裕, 村嶋陽一, 小林広明

Journal of JSCE, Ser. B2, 72(2), pp. I_373-I_378, 2016
Publisher: Japan Society of Civil Engineers
DOI: 10.2208/kaigan.72.I_373

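Toward real-time tsunami inundation and damage forecasting using the HPCI, we showed that running a high-accuracy, 10 m-grid tsunami inundation forecast for the coasts of the whole country within 10 minutes, using a tsunami code based on the two-dimensional nonlinear long-wave theory, requires a computer of about 2 Pflop/s. To reduce this computational load substantially, we extended the shapes of the analysis domains and nesting from the conventional rectangles to polygons and limited the areas targeted by high-accuracy analysis to the coastal zones where a tsunami may run up, yielding an efficient method suited to wide-area tsunami analysis. We also verified its accuracy against an existing model. Wide-area tsunami analyses at the scale of tsunami forecast regions achieved more than a threefold efficiency gain over the conventional approach, demonstrating the feasibility of uniform, nationwide real-time tsunami inundation forecasting.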
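The key idea above is to run the fine grid only where a tsunami can run up rather than over a whole rectangle. The sketch below illustrates the cell-count saving with a standard even-odd (ray casting) point-in-polygon test; the L-shaped coastal band, the grid size, and the function names are made-up assumptions, and the real model's polygon setup and nesting are not reproduced here.

```python
def point_in_polygon(x, y, poly):
    """Even-odd (ray casting) test: is the point (x, y) inside polygon poly?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_cells(poly, nx, ny, dx):
    """Count cells of an nx-by-ny rectangular grid whose centres lie inside poly."""
    return sum(
        point_in_polygon((i + 0.5) * dx, (j + 0.5) * dx, poly)
        for j in range(ny)
        for i in range(nx)
    )


# Hypothetical coastal band: a 10-cell-wide strip along two edges of a 100 x 100 box.
COAST_BAND = [(0, 0), (100, 0), (100, 10), (10, 10), (10, 100), (0, 100)]

fine_cells = count_cells(COAST_BAND, nx=100, ny=100, dx=1.0)
print(fine_cells, "of", 100 * 100, "cells")  # 1900 of 10000
```

In this toy geometry the polygon keeps 19% of the bounding rectangle's cells at fine resolution; the actual savings depend on the coastline and are the ones reported in the abstract.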
Directive-based Automatic Setting of Performance Parameters for Stencil Computations (in Japanese; peer-reviewed)

角川拓也, 平澤将一, 滝沢寛之, 小林広明

IPSJ Transactions on Advanced Computing Systems (ACS), 9(4), pp. 25-37, 2016
Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework (peer-reviewed)

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

Int. J. Netw. Comput., 6(2), pp. 167-180, 2016
A Code Selection Mechanism Using Deep Learning (peer-reviewed)

Hang Cui, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 385-392, 2016

DOI: 10.1109/MCSoC.2016.46
A Cache Partitioning Mechanism to Protect Shared Data for CMPs (peer-reviewed)

Masayuki Sato, Shin Nishimura, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX), 2016

DOI: 10.1109/CoolChips.2016.7503674

ISSN: 2473-4683
A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation (peer-reviewed)

Yasuharu Hayashi, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 508-514, 2016

DOI: 10.1109/CANDAR.2016.35

ISSN: 2379-1888
A Directive Generation Approach Using User-defined Rules (peer-reviewed)

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 515-521, 2016

DOI: 10.1109/CANDAR.2016.94

ISSN: 2379-1888
The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads (peer-reviewed)

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 529-535, 2016

DOI: 10.1109/CANDAR.2016.48

ISSN: 2379-1888
Parallel processing model for Cholesky decomposition algorithm in AlgoWiki project (peer-reviewed)

Alexander S. Antonov, Alexey V. Frolov, Hiroaki Kobayashi, Igor N. Konshin, Alexey M. Teplov, Vadim V. Voevodin, Vladimir V. Voevodin

Supercomputing Frontiers and Innovations, 3(3), pp. 61-70, 2016
Publisher: South Ural State University, Publishing Center
DOI: 10.14529/jsfi160307

ISSN: 2313-8734, 2409-6008
Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems (invited)

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015, pp. 147-157, Dec 2015

DOI: 10.1007/978-3-319-20340-9_12
A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning (peer-reviewed)

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE Transactions on Information and Systems, E98-D(12), pp. 2178-2186, Dec 2015

DOI: 10.1587/transinf.2015PAP0028

ISSN: 1745-1361
A Real-Time Tsunami Inundation Forecast System for Tsunami Disaster and Mitigation (peer-reviewed)

Akihiro Musa, Hiroshi Matsuoka, Osamu Watanabe, Yoichi Murashima, Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi

The 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Nov 2015
An Approach to the Highest Efficiency of the HPCG Benchmark on the SX-ACE Supercomputer (peer-reviewed)

Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa, Hiroaki Kobayashi

The 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Nov 2015
Power-Efficient Memory Hierarchy Design in the 3-D Stacking Era (in Japanese)

宇野渉, 佐藤雅之, 江川隆輔, 小林広明

IEICE Technical Report, 115(271), pp. 19-24, Oct 2015
Publisher: IEICE
ISSN: 0913-5685
A Cache Mechanism Accounting for Inter-thread Shared Data in Multi-core Processors (in Japanese)

西村秦, 佐藤雅之, 江川隆輔, 小林広明

Research Reports: Computer Architecture (ARC), 2015-ARC-216(38), pp. 1-8, Aug 2015
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms (peer-reviewed)

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE Transactions on Electronics, E98-C(7), pp. 550-558, Jul 2015

DOI: 10.1587/transele.E98.C.550

ISSN: 1745-1353
Achieving Both Performance and Maintainability of Real Applications with Xevolver (in Japanese)

平澤将一, 滝沢寛之, 小林広明

Proceedings of the Conference on Computational Engineering and Science, 20, 4 pp., Jun 2015
Publisher: Japan Society for Computational Engineering and Science
Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2014, pp. 119-126, 2015
Publisher: Springer International Publishing
DOI: 10.1007/978-3-319-10626-7_10
Code Optimization Activities Toward a High Sustained Simulation Performance

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2015, pp. 159-168, 2015
Publisher: Springer International Publishing
DOI: 10.1007/978-3-319-20340-9_13
Design of a 3-D Stacked Floating-point Goldschmidt Divider (peer-reviewed)

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2015 International 3D Systems Integration Conference (3DIC 2015), 2015

ISSN: 2164-0157
A Data Management Policy for Energy-Efficient Cache Mechanisms

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015, pp. 61-75, 2015

DOI: 10.1007/978-3-319-20340-9_6
Auto-tuning Using Xevolver (in Japanese)

平澤将一, 肖熊, 滝沢寛之, 小林広明

Computational Engineering (journal of the Japan Society for Computational Engineering and Science), 20(2), pp. 14-17, 2015
Identification and elimination of platform-specific code smells in high performance computing applications (peer-reviewed)

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing, 5(1), pp. 180-199, 2015
Publisher: IJNC Editorial Committee
DOI: 10.15803/ijnc.5.1_180

ISSN: 2185-2839

A code smell is a code pattern that might indicate a code or design problem, which makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed mainly for object-oriented applications, but rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve a high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system to help users find PSCSs of HPC applications to improve the performance portability across different platforms. This paper presents a PSCS alert system that is based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined in a formal way using the AST information represented in XML. XML Path Language (XPath) is used to describe those patterns. A database is built to store the transformation recipes written in XSLT files for eliminating detected PSCSs. The recall and precision evaluation results obtained by using real applications show that the proposed system can detect potential PSCSs accurately. The evaluation on performance portability of real applications demonstrates that eliminating PSCSs leads to significant performance changes and therefore the code portions with detected PSCSs have to be refactored to improve the performance portability across multiple platforms.
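To make the abstract's "patterns written in XPath over an XML-encoded AST" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath 1.0 (the alert system itself may use a full XPath engine). The XML schema, the vendor-directive pattern, and the function name are hypothetical; the XSLT-based elimination step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a Fortran AST. The element and attribute names
# are illustrative assumptions, not the schema used by the alert system.
AST_XML = """
<program>
  <subroutine name="solve">
    <loop var="i">
      <directive kind="vendor" text="!CDIR NODEP"/>
      <assign target="a(i)"/>
    </loop>
    <loop var="j">
      <assign target="b(j)"/>
    </loop>
  </subroutine>
</program>
"""

# Hypothetical PSCS pattern: the parent of any vendor-specific directive.
# ".." (parent) and [@attr='value'] predicates are in ElementTree's XPath subset.
VENDOR_DIRECTIVE_PATTERN = ".//directive[@kind='vendor']/.."


def find_smells(xml_text, pattern=VENDOR_DIRECTIVE_PATTERN):
    """Return the loop variables of loops matched by the XPath-style pattern."""
    root = ET.fromstring(xml_text)
    return [elem.get("var") for elem in root.findall(pattern) if elem.tag == "loop"]


print(find_smells(AST_XML))  # ['i']
```

Keeping the pattern as data rather than code mirrors the abstract's design: new smell definitions can be added to a database without changing the detector.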
Optimized Data Transfers Based on the OpenCL Event Management Mechanism (peer-reviewed)

Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu

Scientific Programming, 2015, article 576498, 2015

DOI: 10.1155/2015/576498

ISSN: 1058-9244

eISSN: 1875-919X
Real-time tsunami inundation forecasting and damage estimation method by fusion of real-time crustal deformation monitoring and high-performance computing (peer-reviewed)

S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

The 26th International Union of Geodesy and Geophysics, 2015
Expressing system-awareness as code transformations for performance portability across diverse HPC (peer-reviewed)

Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

Workshop on Portability Among HPC Architectures for Scientific Applications, 2015
Combining code refactoring and auto-tuning to improve performance portability of high-performance computing applications (peer-reviewed)

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

The Sixth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2015), 2015
Automatic Parameter Tuning of Hierarchical Incremental Checkpointing (peer-reviewed)

Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

High Performance Computing for Computational Science - VECPAR 2014, 8969, pp. 298-309, 2015

DOI: 10.1007/978-3-319-17353-5_25

ISSN: 0302-9743
A Verification Framework for Streamlining Empirical Auto-tuning (peer-reviewed)

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of 2015 Third International Symposium on Computing and Networking (CANDAR), pp. 508-514, 2015

DOI: 10.1109/CANDAR.2015.115

ISSN: 2379-1888
Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework (peer-reviewed)

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

Proceedings of 2015 Third International Symposium on Computing and Networking (CANDAR), pp. 515-520, 2015

DOI: 10.1109/CANDAR.2015.102

ISSN: 2379-1888
A Case Study of Memory Optimization for Migration of a Plasmonics Simulation Application to SX-ACE (peer-reviewed)

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of 2015 Third International Symposium on Computing and Networking (CANDAR), pp. 521-527, 2015

DOI: 10.1109/CANDAR.2015.105

ISSN: 2379-1888
A Case Study of User-Defined Code Transformations for Data Layout Optimizations (peer-reviewed)

Takeshi Yamada, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of 2015 Third International Symposium on Computing and Networking (CANDAR), pp. 535-541, 2015

DOI: 10.1109/CANDAR.2015.96

ISSN: 2379-1888
An Energy-Efficient Dynamic Memory Address Mapping Mechanism (peer-reviewed)

Masayuki Sato, Chengguang Han, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2015 IEEE Symposium on Low-Power and High-Speed Chips, 2015

DOI: 10.1109/CoolChips.2015.7158660
Designing an HPC Refactoring Catalog Toward the Exa-scale Computing Era

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2014, pp. 91-98, Nov 2014

DOI: 10.1007/978-3-319-10626-7_8
Early Evaluation of the SX-ACE Processor 査読有り

Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

the 27th International Conference for High Performance Computing, Networking, Storage and Analysis (SC14)　2014年11月
MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications 査読有り

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E97D　(11)　2835-2843　2014年11月

DOI： 10.1587/transinf.2014EDP7227 　

ISSN：1745-1361
ベクトル型メディアプロセッサの低消費電力化に関する研究

宇野渉, 高也, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会予稿集　2014年8月
キャッシュメモリにおけるスレッド間共有データの管理に関する研究

西村秦, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会予稿集　2014年8月
Exploring system architectures for next-generation CFD simulations in the postpeta-scale era 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

Journal of Fluid Science and Technology　9　(5)　JFST0073-JFST0073　2014年
出版者・発行元：一般社団法人日本機械学会
DOI： 10.1299/jfst.2014jfst0073 　

ISSN：1880-5558

CFD simulations with uniform grids have attracted attention as next-generation CFD simulations on large-scale supercomputing systems. The Building-Cube Method (BCM) is one such next-generation CFD method. Its basic idea is to balance the computational load among the processing elements of a supercomputing system by dividing the whole calculation into many parallel tasks with the same amount of computation, which makes it suitable for highly parallel computation on supercomputing systems. This paper first implements BCM on five supercomputing systems as an example of a next-generation CFD simulation in the upcoming postpeta-scale era. Then, through theoretical analyses and performance evaluations, it clarifies the requirements that future supercomputing systems must meet for next-generation CFD simulations. The performance evaluations show that, as the number of processing elements increases, the imbalance of data exchanges among nodes becomes more serious than that of calculations, even in a next-generation CFD simulation. While the calculation time can ideally be reduced in proportion to the number of processing elements, the data transfer time comes to dominate the total execution time. Unlike in massively parallel system architectures, the number of nodes in a system should therefore be kept as small as possible to reduce data transfers. The performance analyses also show that memory bandwidth limits the performance of BCM and that using on-chip memory is effective in improving performance, so a memory subsystem that achieves a higher sustained memory bandwidth is required. Therefore, a supercomputing system consisting of a small number of high-performance nodes is essential to achieve high sustained performance of next-generation CFD in the upcoming postpeta-scale era, because it reduces data transfers, which eventually become a bottleneck in large-scale simulations.
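The equal-work decomposition described in this abstract can be illustrated with a short Python sketch. It assumes a uniform arrangement of identical cubes and a simple contiguous block assignment; the function names and the cube resolution are hypothetical, not taken from the paper. It models only computation, not the inter-node data exchange that the abstract identifies as the eventual bottleneck.

```python
# Minimal sketch (hypothetical names, not the paper's code): a Building-Cube-style
# decomposition in which every cube holds the same number of cells, so spreading
# cubes evenly over processing elements (PEs) balances the computation.


def make_cubes(nx: int, ny: int, nz: int) -> list[tuple[int, int, int]]:
    """Return the (i, j, k) index of every cube in an nx x ny x nz arrangement."""
    return [(i, j, k) for i in range(nx) for j in range(ny) for k in range(nz)]


def assign_cubes(num_cubes: int, num_pes: int) -> list[range]:
    """Split cube indices 0..num_cubes-1 into num_pes contiguous blocks.

    The first (num_cubes % num_pes) PEs get one extra cube, so per-PE cube
    counts, and hence per-PE cell counts, differ by at most one cube.
    """
    base, extra = divmod(num_cubes, num_pes)
    blocks, start = [], 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    CELLS_PER_CUBE = 16 ** 3                 # assumed cube resolution; identical for all cubes
    cubes = make_cubes(8, 8, 4)              # 256 cubes
    blocks = assign_cubes(len(cubes), 10)    # 10 PEs
    work = [len(b) * CELLS_PER_CUBE for b in blocks]
    print([len(b) for b in blocks])          # [26, 26, 26, 26, 26, 26, 25, 25, 25, 25]
    print(max(work) - min(work))             # 4096 cells: at most one cube of imbalance
```

Because every cube carries identical work, balancing reduces to balancing cube counts; what this sketch leaves out is the halo exchange between neighboring cubes on different nodes, which grows with the node count.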
On-Chip Checkpointing with 3D-Stacked Memories 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-6　2014年

DOI： 10.1109/3DIC.2014.7152173 　

ISSN：2164-0157
OpenMP Parallelization Method using Compiler Information of Automatic Optimization 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Legacy HPC Application Migration 2014　2014年
Real-time tsunami inundation forecasting and damage mapping towards enhancing tsunami disaster resilience 査読有り

S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

American Geophysical Union Fall Meeting　2014年
An Approach to Customization of Compiler Directives for Application-Specific Code Transformations 査読有り

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC)　99-106　2014年

DOI： 10.1109/MCSoC.2014.23 　
Xevolver: An XML-based Code Translation Framework for Supporting HPC Application Migration 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Yasuharu Hayashi, Ryusuke Egawa, Hiroaki Kobayashi

2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC)　2014年

DOI： 10.1109/HiPC.2014.7116902 　

ISSN：1094-7256
A compiler-assisted OpenMP migration method based on automatic parallelizing information 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　8488　450-459　2014年
出版者・発行元： Springer Verlag
DOI： 10.1007/978-3-319-07518-1_30 　

ISSN：1611-3349 0302-9743
A Platform-Specific Code Smell Alert System for High Performance Computing Applications 査読有り

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)　653-662　2014年

DOI： 10.1109/IPDPSW.2014.76 　
An Energy Optimization Method for Vector Processing Mechanisms 査読有り

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE COOL CHIPS XVII　2014年

DOI： 10.1109/CoolChips.2014.6842957 　

ISSN：2473-4683
An Impact of Circuit Scale on the Performance of 3-D Stacked Arithmetic Units 査読有り

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　2014年

ISSN：2164-0157
An XML-based Programming Framework for User-defined Code Transformations 査読有り

Hiroyuki Takizawa, Xiong Xiao, Shoichi Hirasawa, Hiroaki Kobayashi

The 4th AICS International Symposium　2013年12月2日
複合システムにおけるチェックポイントリスタート 査読有り

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

日本信頼性学会誌　35　(8)　515-516　2013年12月

DOI： 10.11348/reajshinrai.35.8_515 　
三次元LSIの課題と高信頼化 査読有り

小柳光正, 小林広明, 末吉敏則, 鎌田忠

日本信頼性学会誌　35　(8)　471-471　2013年12月
出版者・発行元：日本信頼性学会
DOI： 10.11348/reajshinrai.35.8_471 　

ISSN：0919-2697
Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD 査読有り

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

International Conference on Flow Dynamics (ICFD2013)　2013年11月27日
Xevolver : an XML-based Programming Framework for Software Evolution 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

Supercomputing Conference 2013 (SC13)　2013年11月
ソフトウェア進化のための自動性能追跡システム 査読有り

平澤将一, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム（ACS）　6　(4)　96-104　2013年10月30日

ISSN：1882-7829

In this work, we propose an automatic performance tracking system that analyzes changes in execution performance and identifies the source code modifications that cause them. The proposed system works together with an Integrated Development Environment (IDE) to interactively support the evolution of a high-performance computing (HPC) application while maintaining its performance portability across multiple target computing platforms. By evaluating the performance of an application on every target platform whenever its code is committed to the repository of a version control system, the proposed system helps application developers find the source code modifications that degrade performance portability. Moreover, when a new target platform is added, the proposed system retrieves multiple versions of the application from the repository and automatically executes them on the new platform. As a result, application developers can analyze how past source code modifications affect performance on the new platform. Based on this analysis, they can review the source code changes and improve the software design of the HPC application.
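The commit-by-commit check at the heart of this system can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' implementation: it presumes benchmark timings have already been collected per commit and per platform, and it flags a commit whenever the runtime on some platform grows by more than a chosen threshold (10% here, an arbitrary value) relative to the previous commit. The class, function, commit, and platform names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    commit: str      # revision ID in the version control system (hypothetical values)
    platform: str    # target computing system the benchmark ran on
    seconds: float   # measured execution time


def find_regressions(history: list[Measurement],
                     threshold: float = 0.10) -> list[tuple[str, str, float]]:
    """Flag commits that slowed a platform down by more than `threshold`.

    `history` must be ordered by commit. Each measurement is compared with the
    previous measurement on the same platform; the relative slowdown is
    returned as (commit, platform, slowdown).
    """
    previous: dict[str, float] = {}
    flagged: list[tuple[str, str, float]] = []
    for m in history:
        prev = previous.get(m.platform)
        if prev is not None:
            slowdown = m.seconds / prev - 1.0
            if slowdown > threshold:
                flagged.append((m.commit, m.platform, slowdown))
        previous[m.platform] = m.seconds
    return flagged


if __name__ == "__main__":
    history = [
        Measurement("c1", "vector", 10.0), Measurement("c1", "gpu", 4.0),
        Measurement("c2", "vector", 10.2), Measurement("c2", "gpu", 5.0),
        Measurement("c3", "vector", 12.5), Measurement("c3", "gpu", 4.9),
    ]
    for commit, platform, slowdown in find_regressions(history):
        print(f"{commit} slowed {platform} by {slowdown:.0%}")
```

The same function would apply when historical versions are re-run on a newly added platform: feed it that platform's timings in commit order to see which past modification introduced a slowdown there.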
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E96D　(9)　2047-2054　2013年9月

DOI： 10.1587/transinf.E96.D.2047 　

ISSN：1745-1361
ブロックバイパス機構によるキャッシュのエネルギ効率化に関する研究

高井拓実, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関する「北九州」サマー・ワークショップ (SWoPP2013)　1-9　2013年7月
Autotuning for Improving the Fault Tolerance of Large-scale Simulations 査読有り

Hiroyuki Takizawa, Alfian Amrizal, Shoichi Hirasawa, Hiroaki Kobayashi

Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC)　2013年5月
An Automatic Performance Tracking System for Scientific Software Evolution 査読有り

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC)　2013年5月
An IDE Integrated Cross-Platform Build System for Scientific Applications 査読有り

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

SIAM CSE2013 Minisymposium on Auto-tuning Technologies for Tools and Development Environment in Extreme-Scale Scientific Computing　2013年2月
Performance Evaluation of a Next-Generation CFD on Various Supercomputing Systems

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2012　123-132　2013年
出版者・発行元： Springer Berlin Heidelberg
DOI： 10.1007/978-3-642-32454-3_11 　
Analysing the performance improvements of optimizations on modern HPC systems 査読有り

Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance　13-25　2013年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-319-01439-5_2 　
Feasibility study of future HPC systems for memory-intensive applications 査読有り

Hiroaki Kobayashi

Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance　3-11　2013年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-319-01439-5_1 　
Exploring a design space of 3-D stacked vector processors 査読有り

Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi

Sustained Simulation Performance 2012 - Proceedings of the Joint Workshop on High Performance Computing on Vector Systems, and Workshop on Sustained Simulation Performance　35-49　2013年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-32454-3_4 　
Message from the organizing committee chair 査読有り

Hiroaki Kobayashi

IEEE Symposium on Low-Power and High-Speed Chips - Proceedings for 2013 COOL Chips XVI　i-ii　2013年

DOI： 10.1109/CoolChips.2013.6547906 　
clMPI: An OpenCL Extension for Interoperation with the Message Passing Interface 査読有り

Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, Wen-Mei W. Hwu

Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013　1138-1148　2013年
出版者・発行元： IEEE Computer Society
DOI： 10.1109/IPDPSW.2013.183 　
Power and Performance Evaluation of 3-D Stacked Floating-point Multipliers 査読有り

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

IEEE Computer Society Annual Symposium on VLSI (ISVLSI2013)　218-223　2013年
Design and Evaluation of a Media-oriented Vector Processor with a Multi-banked Cache Memory 査読有り

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE 11TH SYMPOSIUM ON EMBEDDED SYSTEMS FOR REAL-TIME MULTIMEDIA (ESTIMEDIA)　78-87　2013年

DOI： 10.1109/ESTIMedia.2013.6704506 　

ISSN：2325-1271
Vertically Integrated Processor and Memory Module Design for Vector Supercomputers 査読有り

Ryusuke Egawa, Masayuki Sato, Jubee Tada, Hiroaki Kobayashi

2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-8　2013年

ISSN：2164-0157
Design of a 3-D Stacked Floating-Point Adder 査読有り

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-5　2013年

ISSN：2164-0157
Performance evaluation of phase-based correspondence matching on GPUs 査読有り

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXVI　8856　2013年

DOI： 10.1117/12.2023550 　

ISSN：0277-786X

eISSN：1996-756X
A comparison of performance tunabilities between OpenCL and OpenACC 査読有り

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013　147-152　2013年
出版者・発行元： IEEE Computer Society
DOI： 10.1109/MCSoC.2013.31 　
A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms 査読有り

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE COOL CHIPS XVI (COOL CHIPS)　2013年

DOI： 10.1109/CoolChips.2013.6547923 　

ISSN：2473-4683
HPCアプリケーションの性能可搬性に関する一検討

小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

情報処理学会研究報告(CD-ROM)　2012　(4)　ROMBUNNO.HPC-136,NO.27　2012年12月15日

ISSN：2186-2583
ウェイ適応型キャッシュの高エネルギ効率化のためのデッドブロック早期追い出しポリシ 査読有り

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

先進的計算基盤シンポジウムSACSIS2012　2012　4-5　2012年5月
メタ情報拡散に基づくP2P型自己組織化サービス資源検索機構 査読有り

稲葉勉, 村田善智, 滝沢寛之, 小林広明

電子情報通信学会論文誌 D　J95-D　(5)　1110-1122　2012年5月
出版者・発行元：一般社団法人電子情報通信学会
ISSN：1880-4535

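To realize a resource discovery mechanism for a service-resource sharing infrastructure that targets large numbers of unspecified service resources such as PCs and game consoles, the authors previously proposed a self-organizing service resource discovery mechanism (SORMS). SORMS narrows down the destinations to which queries are forwarded by rewiring the logical links between service resources on an overlay network according to users' usage characteristics, thereby increasing the number of desired resources found and the search efficiency. However, the number of resources discovered and the search efficiency of the original SORMS are not sufficient for practical use in large-scale computing environments. SORMS also causes infrequently used service resources to become isolated from the network. To further improve the practicality of SORMS, this paper therefore proposes an overlay network reconstruction method that efficiently diffuses the meta-information of service resources through the network according to users' usage characteristics and actively uses it in searches, thereby avoiding resource isolation and improving search performance. Simulation-based evaluations show that the proposed mechanism improves the number of discovered service resources by about 3.9 times and the search efficiency by about 3.6 times, avoids isolating service resources from the network, and works effectively for the mutual use of service resources.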
A bypass mechanism for way-adaptable caches 査読有り

Takumi Takai, Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XV　2012年4月
OpenCLにおけるタスク並列化支援のための実行時依存関係解析手法 査読有り

佐藤功人, 小松一彦, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム（ACS）　5　(1)　53-67　2012年1月27日
出版者・発行元：情報処理学会
ISSN：1882-7829

This paper proposes a runtime dependency analysis method for finding task parallelism in OpenCL applications so that multiple accelerators can be used. The proposed method visualizes data dependencies among tasks, which represent constraints on the order of memory reads and writes, and event dependencies, which represent constraints on the order of API calls. As a result, it can help programmers find unnecessary synchronization points, which often become performance bottlenecks in task-parallel processing. We analyze 54 benchmark programs to demonstrate that the proposed method can identify programs that could be parallelized through task parallelism. We also show that the method is useful for detecting potential bugs.
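The data-dependency half of this analysis can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the paper's tool: each command recorded at runtime is reduced to hypothetical sets of buffers it reads and writes, RAW/WAR/WAW dependencies are derived from issue order, and command pairs with no dependency path in either direction are reported as task-parallel candidates. Event dependencies between OpenCL API calls are not modeled, command names are assumed unique, and the names used are illustrative rather than OpenCL API identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str                                       # hypothetical label for an enqueued command
    reads: set[str] = field(default_factory=set)    # buffers the command reads
    writes: set[str] = field(default_factory=set)   # buffers the command writes


def data_dependencies(trace: list[Command]) -> list[tuple[str, str, str]]:
    """Return (earlier, later, kind) for each ordered pair of commands in the
    runtime trace that share a buffer, where kind is RAW, WAR or WAW."""
    edges = []
    for i, a in enumerate(trace):
        for b in trace[i + 1:]:
            if a.writes & b.reads:
                edges.append((a.name, b.name, "RAW"))
            if a.reads & b.writes:
                edges.append((a.name, b.name, "WAR"))
            if a.writes & b.writes:
                edges.append((a.name, b.name, "WAW"))
    return edges


def independent_pairs(trace: list[Command],
                      edges: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Pairs of commands with no dependency path between them in either
    direction; such pairs are candidates for concurrent execution."""
    succ: dict[str, set[str]] = {c.name: set() for c in trace}
    for a, b, _ in edges:
        succ[a].add(b)

    def reachable(src: str) -> set[str]:
        seen, stack = set(), [src]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {c.name: reachable(c.name) for c in trace}
    names = [c.name for c in trace]
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if y not in reach[x] and x not in reach[y]]


if __name__ == "__main__":
    trace = [
        Command("write_A", writes={"A"}),
        Command("write_B", writes={"B"}),
        Command("kernel1", reads={"A"}, writes={"C"}),
        Command("kernel2", reads={"B"}, writes={"D"}),
        Command("read_C", reads={"C"}),
        Command("read_D", reads={"D"}),
    ]
    edges = data_dependencies(trace)
    print(edges)
    print(("kernel1", "kernel2") in independent_pairs(trace, edges))  # True
```

In this toy trace the two kernels touch disjoint buffers, so the analysis reports them as independent; a program that nevertheless serializes them with a blocking call would be exhibiting the kind of unnecessary synchronization the abstract mentions.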
Performance and scalability analysis of a chip multi vector processor 査読有り

Yoshiei Sato, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　3-20　2012年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3_1 　
A prototype implementation of OpenCL for SX vector systems 査読有り

Hiroyuki Takizawa, Ryusuke Egawa, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　41-50　2012年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3_3 　
A media-oriented vector architectural extension with a high bandwidth cache system 査読有り

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Symposium on Low-Power and High-Speed Chips - Proceedings for 2012 IEEE COOL Chips XV　1-3　2012年

DOI： 10.1109/COOLChips.2012.6216588 　
Exploring design space of a 3D stacked vector cache 査読有り

Ryusuke Egawa, Jubee Tada, Yusuke Endo, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012　1475-1477　2012年

DOI： 10.1109/SC.Companion.2012.270 　
Performance Evaluation of BCM on Various Supercomputing Systems 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

Proceedings of 24th International Conference on Parallel Computational Fluid Dynamics　2012年
An out-of-order vector processing mechanism for multimedia applications 査読有り

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　233-235　2012年

DOI： 10.1145/2212908.2212941 　
A capacity-efficient insertion policy for dynamic cache resizing mechanisms 査読有り

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　265-267　2012年

DOI： 10.1145/2212908.2212949 　
GPU Implementation of Phase-Based Stereo Correspondence and Its Application 査読有り

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012)　1697-1700　2012年

DOI： 10.1109/ICIP.2012.6467205 　

ISSN：1522-4880
Improving the Scalability of Transparent Checkpointing for GPU Computing Systems 査読有り

Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY　2012年

ISSN：2159-3442
A Network Clustering Algorithm for Sybil-Attack Resisting 査読有り

Ling Xu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E94D　(12)　2345-2352　2011年12月

DOI： 10.1587/transinf.E94.D.2345 　

ISSN：0916-8532

eISSN：1745-1361
Performance of building cube method on various platforms 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011年11月
An automatic task assignment method for heterogeneous computing systems 査読有り

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011年11月
マイグレーションによる複合型計算システム向けジョブスケジューリング 査読有り

小山賢太郎, 佐藤功人, 小松一彦, 村田善智, 滝沢寛之, 小林広明

情報処理学会論文誌コンピューティングシステム（ACS）　4　(4)　203-213　2011年10月5日
出版者・発行元：情報処理学会
ISSN：1882-7829

A heterogeneous computing system that combines general-purpose processors with accelerators is a promising approach to improving system performance under severe power constraints. This paper proposes a job scheduling method that uses job migration and preemptive backfilling to reduce job turnaround time in a large-scale heterogeneous computing system. A prediction model is also proposed to estimate the migration cost of a job at submission time. The evaluation results indicate that, for almost all applications, the prediction model can accurately estimate the worst-case migration cost from the job's maximum memory usage. The results also demonstrate that, because the proposed method combines the advantages of both scheduling policies, it can reduce job turnaround time whenever either job migration or backfilling works well.
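The abstract reports that the worst-case migration cost can be predicted from a job's maximum memory usage, but it does not give the model's form. The Python sketch below assumes a linear relationship purely for illustration: it fits a least-squares line to hypothetical profiling data and then raises the intercept so the line bounds every observed run, which is one simple way to turn an average-case fit into a worst-case estimate. The data, units, and function names are all assumptions, not the paper's model.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_worst_case(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Keep the least-squares slope but raise the intercept until the line
    lies on or above every profiled point (a simple upper-bound model)."""
    a, _ = fit_line(xs, ys)
    b = max(y - a * x for x, y in zip(xs, ys))
    return a, b


def predict_cost(a: float, b: float, max_mem_gib: float) -> float:
    """Predicted worst-case migration cost (seconds) for a job at submission."""
    return a * max_mem_gib + b


if __name__ == "__main__":
    # Hypothetical profiling data: (max memory usage in GiB, observed migration time in s)
    mem = [1.0, 2.0, 4.0, 8.0]
    cost = [2.2, 3.4, 6.9, 12.3]
    a, b = fit_worst_case(mem, cost)
    print(f"cost ~= {a:.2f} * mem + {b:.2f}")
    print(f"predicted worst case for a 16 GiB job: {predict_cost(a, b, 16.0):.1f} s")
```

The scheduler only needs a number at submission time, so a cheap closed-form predictor like this fits that role; whether a linear form is adequate for real applications is exactly what the paper's evaluation would have to establish.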
A Patch-Based Bit Mask Filtering Method for Micropolygon Rasterization 査読有り

Jiali Yao, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High-Performance Graphics (HPG)　2011年8月
Performance of SOR methods on modern vector and scalar processors 査読有り

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　215-221　2011年6月

DOI： 10.1016/j.compfluid.2010.12.024 　

ISSN：0045-7930
Parallel processing of the Building-Cube Method on a GPU platform 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　122-128　2011年6月

DOI： 10.1016/j.compfluid.2010.12.019 　

ISSN：0045-7930
ウェイ適応型キャッシュのための低消費エネルギ指向挿入ポリシ 査読有り

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

先進的計算基盤シンポジウムSACSIS2011　2011　213-214　2011年5月
A Power-Aware Insertion Policy for the Way-Adaptable Caches 査読有り

Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of COOLChips XIV　2011年4月
実アプリケーションを用いたチップマルチベクトルプロセッサの消費エネルギ評価

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

情報処理学会研究報告(CD-ROM)　2010　(5)　ROMBUNNO.ARC-192,NO.3　2011年2月15日

ISSN：2186-2583
A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments 査読有り

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Journal　52　(2)　320-333　2011年2月
出版者・発行元： Information and Media Technologies 編集運営会議
DOI： 10.11185/imt.6.546 　

Cloud computing and NGN technologies are now driving a paradigm shift in which various services are provided to business users over the network. In conjunction with this trend, many studies aim to realize a ubiquitous computing environment in which a huge number of individual users can share computing resources on the Internet, such as personal computers (PCs), game consoles, and sensors. To realize an effective resource discovery mechanism for such an environment, this paper presents an adaptive overlay network that enables a self-organizing resource management system to adapt efficiently to a heterogeneous environment. The proposed mechanism consists of two functions. The first adjusts the number of logical links of each resource that forward search queries, so that less useful query flooding is reduced. The second connects resources so as to reduce the communication latency on the physical network, rather than the number of query hops on the overlay network. To further improve discovery efficiency, this paper integrates these functions into SORMS, the self-organizing resource management system proposed in our previous work. The simulation results indicate that, compared with the original SORMS, the proposed mechanism can increase the number of discovered resources by 60% without decreasing discovery efficiency and can reduce total communication traffic by 80%. These improvements come from efficient control of logical links in a large-scale network.
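The second function of the proposed mechanism, preferring logical links with low physical-network latency over links that merely minimize overlay hops, can be sketched as follows. This is a hypothetical illustration, not SORMS itself: it assumes a table of measured pairwise latencies and a fixed number k of links per node, whereas the paper also adapts the number of links per resource, which this sketch does not attempt. Node names and latency values are made up.

```python
import heapq


def choose_neighbors(node: str, latency_ms: dict[str, dict[str, float]], k: int) -> list[str]:
    """Return the k peers of `node` with the lowest measured physical-network latency."""
    candidates = ((lat, peer) for peer, lat in latency_ms[node].items() if peer != node)
    return [peer for _, peer in heapq.nsmallest(k, candidates)]


def build_overlay(latency_ms: dict[str, dict[str, float]], k: int) -> dict[str, list[str]]:
    """Give every node k logical links chosen by latency rather than overlay hop count."""
    return {node: choose_neighbors(node, latency_ms, k) for node in latency_ms}


if __name__ == "__main__":
    # Hypothetical pairwise round-trip latencies in milliseconds
    LATENCY = {
        "A": {"B": 5.0, "C": 40.0, "D": 12.0},
        "B": {"A": 5.0, "C": 30.0, "D": 8.0},
        "C": {"A": 40.0, "B": 30.0, "D": 25.0},
        "D": {"A": 12.0, "B": 8.0, "C": 25.0},
    }
    for node, links in build_overlay(LATENCY, k=2).items():
        print(node, "->", links)
```

Links are chosen per node, so the resulting overlay is directed: in this example C links to D and B, but no node links back to C. Purely latency-driven selection can therefore leave distant resources poorly reachable, which is one reason a real mechanism needs additional link control.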
動的負荷分散機能を持つ高性能ボランティアコンピューティングの実現 査読有り

村田善智, 石杜佑記, 滝沢寛之, 小林広明

情報処理学会論文誌　52　(2)　401-414　2011年2月
Performance Evaluation of Real-Time Stereo Correspondence on GPU

三浦衛, 札野欽也, 伊藤康一, 青木孝文, 滝沢寛之, 小林広明

電気関係学会東北支部連合大会講演論文集　2011　31-31　2011年
出版者・発行元：電気関係学会東北支部連合大会実行委員会
DOI： 10.11528/tsjc.2011.0_31 　
Power-aware dynamic cache partitioning for CMPs 査読有り

Isao Kotera, Kenta Abe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　6590　135-153　2011年

DOI： 10.1007/978-3-642-19448-1_8 　

ISSN：0302-9743 1611-3349
Large scaled computation of incompressible flows on Cartesian mesh using a vector-parallel supercomputer 査読有り

Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

Lecture Notes in Computational Science and Engineering　74　332-338　2011年

DOI： 10.1007/978-3-642-14438-7_35 　

ISSN：1439-7358
A self-organized overlay network management mechanism for heterogeneous environments 査読有り

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

Journal of Information Processing　19　(0)　25-38　2011年
出版者・発行元： Information Processing Society of Japan
DOI： 10.2197/ipsjjip.19.25 　

ISSN：1882-6652 0387-5806
ルーフラインモデルに基づくベクトルプロセッサ向けプログラム最適化戦略 査読有り

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　4　(3)　77-87　2011年

ISSN：1882-7772
A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems 査読有り

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011　135-142　2011年

DOI： 10.1109/ISPA.2011.36 　
CheCL: Transparent checkpointing and process migration of OpenCL applications 査読有り

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011　864-876　2011年

DOI： 10.1109/IPDPS.2011.85 　
Effects of 3-D stacked vector cache on energy consumption 査読有り

Ryusuke Egawa, Yusuke Funaya, Ryuichi Nagaoka, Yusuke Endo, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

2011 IEEE International 3D Systems Integration Conference, 3DIC 2011　2011年

DOI： 10.1109/3DIC.2012.6263026 　
A middle-grain circuit partitioning strategy for 3-D integrated floating-point multipliers 査読有り

Jubee Tada, Ryusuke Egawa, Kazushige Kawai, Hiroaki Kobayashi, Gensuke Goto

2011 IEEE International 3D Systems Integration Conference, 3DIC 2011　2011年

DOI： 10.1109/3DIC.2012.6263031 　
A performance tuning strategy under combining loop transforms for a vector processor with an on-chip cache 査読有り

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC10)　2010年11月
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering 査読有り

Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Kazuhiko Komatsu, Ryusuke Egawa, Nobuyuki Ohba, Hiroaki Kobayashi

IEICE transactions on information and systems　93　(4)　891-902　2010年4月1日
出版者・発行元：一般社団法人電子情報通信学会
DOI： 10.1587/transinf.E93.D.891 　

ISSN：0916-8532

Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development has made interactive rendering of static scenes practical. This paper deals with interactive dynamic scene rendering, in which not only the eye point but also the objects in the scene change their 3D locations every frame. To realize interactive dynamic scene rendering, we introduce RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which exploits coherency in rays, objects, and grouped rays. RTRPS uses bounding spheres as its spatial data structure to exploit coherency in objects. Because a bounding sphere is unchanged by rotation about its center, RTRPS can ignore the rotation of moving objects within a sphere and shorten the update time between frames. RTRPS exploits coherency in rays by merging rays into a ray-plane, assuming that secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection test for that ray can be completed using the coherency in the ray-planes. Thanks to these three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product into a 2D operation and reduces the number of floating-point operations. Techniques based on frustum culling and binary-tree-structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, reducing the number of intersection tests by 50% to 90%. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS achieves speedups of 13 to 392 times over a ray tracing algorithm without organized rays and spheres. We also found that RTRPS provides competitive performance even when only primary rays are used.
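The bounding-sphere culling that RTRPS builds on can be illustrated with the standard ray-sphere test, sketched here in Python. This is a generic textbook test, not the paper's ray-plane-sphere algorithm (which tests whole ray-planes at once and uses parallel projection); the function names are hypothetical. It shows why spheres suit moving objects: the test depends only on a center and a radius, so an object rotating about its sphere's center needs no bounding-volume update.

```python
import math

Vec3 = tuple[float, float, float]


def dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def ray_hits_sphere(origin: Vec3, direction: Vec3, center: Vec3, radius: float) -> bool:
    """True if the ray origin + t * direction (t >= 0) intersects the sphere.

    Solves |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    """
    f = (origin[0] - center[0], origin[1] - center[1], origin[2] - center[2])
    a = dot(direction, direction)
    b = 2.0 * dot(f, direction)
    c = dot(f, f) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False                 # the ray's line misses the sphere entirely
    t_far = (-b + math.sqrt(disc)) / (2.0 * a)
    return t_far >= 0.0              # otherwise the sphere lies wholly behind the origin


def candidate_objects(origin: Vec3, direction: Vec3,
                      spheres: list[tuple[Vec3, float]]) -> list[int]:
    """Indices of objects whose bounding sphere the ray hits; only these
    objects need the more expensive ray-triangle tests."""
    return [i for i, (center, radius) in enumerate(spheres)
            if ray_hits_sphere(origin, direction, center, radius)]


if __name__ == "__main__":
    spheres = [((0.0, 0.0, 5.0), 1.0),    # straight ahead
               ((0.0, 3.0, 5.0), 1.0),    # off to the side
               ((0.0, 0.0, -5.0), 1.0)]   # behind the ray origin
    print(candidate_objects((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres))  # [0]
```

Each rejected sphere saves every ray-triangle test for the object inside it; RTRPS pushes this further by amortizing the sphere test over a whole ray-plane instead of a single ray.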
The vector computing cloud: Toward a vector meta-computing environment 査読有り

Ryusuke Egawa, Manabu Higashida, Yoshitomo Murata, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2010　75-91　2010年
出版者・発行元： Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-11851-7_6 　
Automatic tuning of CUDA execution parameters for stencil processing 査読有り

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Software Automatic Tuning: From Concepts to State-of-the-Art Results　209-228　2010年
出版者・発行元： Springer New York
DOI： 10.1007/978-1-4419-6935-4_13 　
Lessons Learned from 1-Year Experience with SX-9 and Toward the Next Generation Vector Computing 査読有り

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoko Isobe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009　3-+　2010年

DOI： 10.1007/978-3-642-03913-3_1 　
Large-Scale Flow Computation of Complex Geometries by Building-Cube Method 査読有り

Daisuke Sasaki, Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009　167-+　2010年

DOI： 10.1007/978-3-642-03913-3_13 　
Cache partitioning strategies for 3-D stacked vector processors 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　1-6　2010年

DOI： 10.1109/3DIC.2010.5751453 　
Efficient data management for the building cube method using cartesian meshes on the GPU platform 査読有り

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

International Supercomputing Conference (ISC10)　2010年
A Majority-Based Control Scheme for Way-Adaptable Caches 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

FACING THE MULTICORE-CHALLENGE: ASPECTS OF NEW PARADIGMS AND TECHNOLOGIES IN PARALLEL COMPUTING　6310　16-+　2010年

DOI： 10.1007/978-3-642-16233-6_5 　

ISSN：0302-9743

eISSN：1611-3349
Evaluating Performance and Portability of OpenCL Programs 査読有り

Kazuhiko Komatsu, Katsuto Sato, Yusuke Arai, Kentaro Koyama, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 5th international Workshop on Automatic Performance Tuning　2010年
Resisting sybil attack by social network and network clustering 査読有り

Ling Xu, Satayapiwat Chainan, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010　15-21　2010年

DOI： 10.1109/SAINT.2010.32 　
A history-based job scheduling mechanism for the vector computing cloud 査読有り

Yoshitomo Murata, Ryusuke Egawa, Manabu Higashida, Hiroaki Kobayashi

Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010　125-128　2010年

DOI： 10.1109/SAINT.2010.43 　
A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications 査読有り

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

13TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN: ARCHITECTURES, METHODS AND TOOLS　412-415　2010年

DOI： 10.1109/DSD.2010.93 　
A Voting-Based Working Set Assessment Scheme for Dynamic Cache Resizing Mechanisms 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN　98-105　2010年

DOI： 10.1109/ICCD.2010.5647599 　

ISSN：1063-6404
Design and early evaluation of a 3-D die stacked chip multi-vector processor 査読有り

Ryusuke Egawa, Yusuke Funaya, Ryu-Ichi Nagaoka, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　2010年

DOI： 10.1109/3DIC.2010.5751448 　
キャッシュメモリを有するベクトルプロセッサのためのプログラム最適化手法

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会研究報告(CD-ROM)　2009　(3)　ROMBUNNO.ARC-184,6　2009年10月15日

ISSN：2186-2583
Working Sets based Thread Scheduling with Cache Partitioning 査読有り

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Poster Abstracts of The Eighteenth International Conference on Parallel Architecture and Compilation Techniques　12　2009年9月
ワーキングセット評価に基づくスレッドスケジューリング

佐藤雅之, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関する「仙台」サマー・ワークショップ (SWoPP仙台2009)　1-10　2009年8月
Early evaluation of a memory-stacked vector processor 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XII　165　2009年4月
実アプリケーションによるSX‐9の性能評価

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会シンポジウム論文集　2009　(2)　57-64　2009年1月15日

ISSN：1344-0640
Evaluating Computational Performance of Backpropagation Learning on Graphics Hardware 査読有り

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

Electronic Notes in Theoretical Computer Science　225　(C)　379-389　2009年1月2日

DOI： 10.1016/j.entcs.2008.12.087 　

ISSN：1571-0661
Study of high resolution incompressible flow simulation based on Cartesian mesh

Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiko Musa

47th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition　2009年
3D On-Chip Memory for the Vector Architecture 査読有り

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION　352-357　2009年

ISSN：2164-0157
Characteristics of an On-Chip Cache on NEC SX Vector Architecture 査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Interdisciplinary Information Sciences　15　(1)　51-66　2009年
出版者・発行元： The Editorial Committee of the Interdisciplinary Information Sciences
DOI： 10.4036/iis.2009.51 　

ISSN：1340-9050


Thanks to their high effective memory bandwidth, vector systems can achieve high computational efficiency for computation-intensive scientific applications. However, they have been encountering the memory wall problem, and their effective memory bandwidth rate has decreased: the bytes-per-flop rates of recent vector systems have fallen from 4 (SX-7 and SX-8) to 2 (SX-8R) and 2.5 (SX-9). The situation is getting worse as more function units and/or cores are integrated into a single chip, because pin bandwidth is limited and does not scale. To solve this problem, we propose an on-chip cache, called the vector cache, to maintain the effective memory bandwidth rate of future vector supercomputers. The vector cache employs a bypass mechanism between the main memory and the register files under software control. To clarify the basic characteristics of the vector cache, we evaluate its performance on the NEC SX vector processor architecture with bytes-per-flop rates of 2 B/FLOP and 1 B/FLOP, using an NEC SX-7 simulator extended with the vector cache mechanism. The benchmark programs are two DAXPY-like loops and five leading scientific applications. The results indicate that the vector cache boosts the computational efficiency of the 2 B/FLOP and 1 B/FLOP systems up to the level of the 4 B/FLOP system. In particular, when cache hit rates exceed 50%, the 2 B/FLOP system achieves performance comparable to that of the 4 B/FLOP system. With the bypass mechanism, the vector cache can supply data from the main memory and the cache simultaneously. In addition, from the viewpoint of cache design, we investigate the impact of cache associativity on the cache hit rate and the relationship between cache latency and performance.
The results also suggest that associativity hardly affects the cache hit rate, and that the effect of cache latency depends on the vector loop length of the application. A shorter cache latency improves the performance of applications with shorter loop lengths, even on the 4 B/FLOP system. For loop lengths of 256 or more, the latency can be effectively hidden, and performance is not sensitive to cache latency. Finally, we discuss how selective caching using the bypass mechanism and loop unrolling affect vector cache performance for the scientific applications. Selective caching makes efficient use of the limited cache capacity. Loop unrolling also improves performance and has a synergistic effect with caching. However, there are exceptions: loop unrolling can worsen the cache hit rate because the working space needed to process the unrolled loops grows beyond the cache capacity. In such cases, the increased cache miss rate cancels the gain obtained by unrolling.
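The claim that a 2 B/FLOP system with a hit rate above 50% can match a 4 B/FLOP system can be checked with simple bandwidth arithmetic. The Python sketch below is an idealized model, not the simulator used in the paper: it assumes that cache hits consume no off-chip bandwidth, that the cache itself never limits throughput, and that latency is fully hidden (the long-loop case described in the abstract). The function names and the 4 B/FLOP example loop are hypothetical.

```python
def effective_bytes_per_flop(memory_bpf: float, hit_rate: float) -> float:
    """Idealized effective B/FLOP when a fraction `hit_rate` of bytes is served on chip.

    Only the missing fraction (1 - hit_rate) of the requested bytes has to
    cross the memory pins, so the off-chip rate is stretched by 1 / (1 - hit_rate).
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return memory_bpf / (1.0 - hit_rate)


def fraction_of_peak(required_bpf: float, memory_bpf: float, hit_rate: float) -> float:
    """Roofline-style estimate: a loop needing `required_bpf` runs at this fraction of peak."""
    return min(1.0, effective_bytes_per_flop(memory_bpf, hit_rate) / required_bpf)


if __name__ == "__main__":
    # Hypothetical loop that needs 4 bytes per flop to run at full speed.
    for bpf in (2.0, 1.0):
        for h in (0.0, 0.5, 0.75):
            print(f"{bpf} B/FLOP, hit rate {h:.0%}: "
                  f"{fraction_of_peak(4.0, bpf, h):.2f} of peak")
```

Under these assumptions a 1 B/FLOP system needs a 75% hit rate to reach the same point as a 4 B/FLOP system; in practice the achievable hit rate and the vector loop length decide how close a real machine gets.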
A Cache-Aware Thread Scheduling Policy for Multi-Core Processors 査読有り

Masayuki Sato, Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks　109-114　2009年
Evaluation of Fine Grain 3-D Integrated Arithmetic Units 査読有り

Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi, Gensuke Goto

2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION　198-+　2009年

ISSN：2164-0157
Performance tuning and analysis of future vector processors based on the roofline model 査読有り

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM International Conference Proceeding Series　7-14　2009年

DOI： 10.1145/1621960.1621962 　
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications 査読有り

Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009)　408-+　2009年

DOI： 10.1109/PDCAT.2009.78 　
Performance Evaluation of NEC SX-9 using Real Science and Engineering Applications 査読有り

Takashi Soga, Akihiro Musa, Youichi Shimomura, Ken'ichi Itakura, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS　2009年

DOI： 10.1145/1654059.1654088 　
Activities of Cyberscience Center and Performance Evaluation of the SX-9 Supercomputer 査読有り

Hiroaki Kobayashi, Ryusuke Egawa, Kouki Okabe, Eiichi Ito, Kenji Oizumi

NEC TECHNICAL JOURNAL　3　(4)　64-72　2008年12月

ISSN：1880-5884
Caching on a chip multi vector processor 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC08)　2008年11月
A PARALLEL IMAGE GENERATION ALGORITHM BASED ON PHOTON MAPPING 査読有り

Masahide Tamura, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008)　145-151　2008年2月
First Experiences with NEC SX-9.

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiko Musa, Takashi Soga, Yoichi Shimomura

High Performance Computing on Vector Systems　3-11　2008年
出版者・発行元： Springer
DOI： 10.1007/978-3-540-85869-0_1 　
The potential of on-chip memory systems for future vector architectures 査読有り

Hiroaki Kobayashi, Akihiko Musa, Yoshiei Sato, Hiroyuki Takizawa, Koki Okabe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007　247-+　2008年
A Utility-based Double Auction Mechanism for Efficient Grid Resource Allocation 査読有り

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　252-260　2008年

DOI： 10.1109/ISPA.2008.103 　
大規模計算環境における分散協調型負荷分散手法査読有り

村田善智, 稲葉勉, 滝沢寛之, 小林広明

情報処理学会論文誌　49　(3)　1214-1228　2008年
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination 査読有り

Komatsu Kazuhiko, Kaeriyama Yoshiyuki, Suzuki Kenichi, Takizawa Hiroyuki, Kobayashi Hiroaki

IPSJ Online Transactions　1　(1)　1-11　2008年
出版者・発行元：一般社団法人情報処理学会
DOI： 10.2197/ipsjtrans.1.1 　

ISSN：1882-6660


Although ray tracing is the best approach to high-quality image synthesis, generating images takes a long time because of its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time of ray tracing, and faster ray-primitive intersection algorithms are strongly needed to interactively generate higher-quality images with more advanced effects. This paper presents a new fast intersection-test algorithm that makes good use of ray and object coherence in ray tracing. The proposed algorithm exploits the fact that the rays in a bundle share the same origin and are highly coherent. By using precomputation and early termination to eliminate redundant calculations in the innermost intersection tests for ray bundles, the proposed algorithm accelerates the intersection tests. Experimental results show that, by exploiting these properties of ray bundles, the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm.
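The baseline named in this abstract, Möller's (Möller-Trumbore) ray-triangle test, shows where a shared ray origin can help: its vectors T = O - V0 and Q = T × E1 depend only on the origin O and the triangle, so for a bundle with a common origin they can be computed once per triangle instead of once per ray. The Python sketch below illustrates that idea together with early termination on the barycentric tests. It is an assumption-based illustration, not the authors' published algorithm, and all names in it are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
EPS = 1e-9


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_bundle(origin: Vec3, directions: Sequence[Vec3],
                     v0: Vec3, v1: Vec3, v2: Vec3) -> List[Optional[float]]:
    """Moller-Trumbore test of one triangle against rays that share `origin`.

    Returns the hit distance t for each ray direction, or None on a miss.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    # Per-bundle precomputation: T and Q depend only on the shared origin
    # and the triangle, so they are hoisted out of the per-ray loop.
    t_vec = sub(origin, v0)
    q_vec = cross(t_vec, e1)
    hits: List[Optional[float]] = []
    for d in directions:
        p_vec = cross(d, e2)
        det = dot(e1, p_vec)
        if abs(det) < EPS:              # ray parallel to the triangle plane
            hits.append(None)
            continue
        inv_det = 1.0 / det
        u = dot(t_vec, p_vec) * inv_det
        if u < 0.0 or u > 1.0:          # early termination on the first barycentric test
            hits.append(None)
            continue
        v = dot(d, q_vec) * inv_det
        if v < 0.0 or u + v > 1.0:
            hits.append(None)
            continue
        t = dot(e2, q_vec) * inv_det
        hits.append(t if t > EPS else None)
    return hits
```

For example, with the triangle (0, 0, 0), (1, 0, 0), (0, 1, 0) and the shared origin (0.2, 0.2, 1.0), the direction (0, 0, -1) hits at t = 1.0, while (0.9, 0.9, -1) is rejected at the u test because it passes outside the triangle.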
SPRAT:実行時自動チューニング機能を備えるストリーム処理記述用言語査読有り

滝沢寛之, 白取寛貴, 佐藤功人, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　1　(2)　207-220　2008年
出版者・発行元：情報処理学会
ISSN：1882-7829


Through cooperation between a stream programming language and its runtime environment, this paper makes it possible to write programs without any architecture-specific descriptions and to select an appropriate processor at runtime from among the different processors in a computing system. To realize this runtime auto-tuning capability, the paper focuses on stream processing, whose execution time can be estimated with a simple linear performance model, and proposes a method that switches between processors based on runtime performance prediction. Assuming a CPU and a GPU as the available processors, the paper shows that appropriate switching between them in a PC allows even code written at a high level of abstraction to achieve high-performance computing that exploits the characteristics of each processor. The evaluation results demonstrate the effectiveness of switching between the CPU and the GPU according to their performance difference. The results also show that the proposed method automatically exploits the fact that the relative performance of the CPU and the GPU can reverse depending on the data size.
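The SPRAT abstract states that stream-processing time can be estimated with a simple linear performance model and that the better processor may change with data size. A minimal Python sketch of such a selection step is shown below, assuming a model of the form t(n) = setup + per_element · n for each processor. The coefficients are made-up placeholders, not measurements from SPRAT, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical per-processor model: t(n) = setup + per_element * n (seconds)."""

    setup: float        # fixed cost, e.g. launch and transfer overhead
    per_element: float  # incremental cost per stream element

    def predict(self, n: int) -> float:
        return self.setup + self.per_element * n


def select_processor(models: Dict[str, LinearModel], n: int) -> str:
    """Return the name of the processor with the smallest predicted time for n elements."""
    return min(models, key=lambda name: models[name].predict(n))


def crossover(a: LinearModel, b: LinearModel) -> float:
    """Stream length at which models a and b predict the same execution time."""
    if a.per_element == b.per_element:
        raise ValueError("per-element costs must differ for the models to cross")
    return (b.setup - a.setup) / (a.per_element - b.per_element)


# Placeholder coefficients (not measured): cheap-to-start CPU vs. high-throughput GPU.
EXAMPLE_MODELS = {
    "CPU": LinearModel(setup=1e-5, per_element=1e-8),
    "GPU": LinearModel(setup=1e-3, per_element=1e-10),
}

if __name__ == "__main__":
    for n in (1_000, 1_000_000, 10_000_000):
        print(n, select_processor(EXAMPLE_MODELS, n))
```

With these placeholder numbers the CPU is chosen for short streams and the GPU for long ones, with the crossover near 100,000 elements; on real hardware the coefficients would have to be measured, which is where the runtime prediction described in the abstract comes in.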
An Efficient Intersection Algorithm Design of Ray Tracing for Many-Core Graphics Processors 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008)　165-171　2008年
A Performance Study of Secure Data Mining on the Cell Processor 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS　633-+　2008年

DOI： 10.1109/CCGRID.2008.16 　
Implementation and Evaluation of a Distributed and Cooperative Load-Balancing Mechanism for Dependable Volunteer Computing 査読有り

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS & NETWORKS WITH FTCS & DCC　316-+　2008年

DOI： 10.1109/DSN.2008.4630100 　

ISSN：1530-0889
Hierarchical Parallel Processing of Ray Tracing on a Cell Cluster 招待有り査読有り

Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 1st International Workshop on Super Visualization (IWSV08)　2008年
Consideration of resource access history for optimizing overlay networks in P2P-based resource discovery 査読有り

Tsutomu Inaba, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2008 International Symposium on Applications and the Internet, SAINT 2008　269-272　2008年

DOI： 10.1109/SAINT.2008.104 　
A Reliability Model for Result Checking in Volunteer Computing 査読有り

Ling Xu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of DAS-P2P 2008 Workshop　201-204　2008年

DOI： 10.1109/SAINT.2008.25 　
Gain Based Delay Balancing in the Deep Submicron Era 査読有り

Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi, Gensuke Goto

Proceedings of The 23rd International Technical Conference on Circuits/Systems (ITC-CSCC 2008)　577-580　2008年
SPRAT: Runtime Processor Selection for Energy-aware Computing 査読有り

Hiroyuki Takizawa, Katsuto Sato, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING　386-393　2008年

DOI： 10.1109/CLUSTR.2008.4663799 　

ISSN：1552-5244
Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　335-+　2008年

DOI： 10.1109/ISPA.2008.100 　
Auction-based Resource Allocation for Activating Incentives in Resource Trading in Grid Computing 査読有り

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of The 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications　252-260　2008年
Modeling of cache access behavior based on Zipf's law 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　9-15　2008年

DOI： 10.1145/1509084.1509086 　

ISSN：1089-795X
A shared cache for a chip multi vector processor 査読有り

Akihiro Musa, Yoshiei Sato, Takashi Soga, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　24-29　2008年

DOI： 10.1145/1509084.1509088 　

ISSN：1089-795X
A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Transactions on High-Performance Embedded Architectures and Compilers　3　(1)　149-167　2008年
Early evaluation of on-chip vector caching for the NEC SX vector architecture 査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC07)　2007年11月
A progressive 3D-meshing algorithm for interactive simulation of soft bodies 査読有り

Tomoyuki Saoi, Hiroyuki Takizawa, Hiroaki Kobayashi

INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL　10　(6)　761-776　2007年11月

ISSN：1343-4500
A dependable Peer-to-Peer computing platform 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE　23　(8)　939-955　2007年11月

DOI： 10.1016/j.future.2007.03.004 　

ISSN：0167-739X

eISSN：1872-7115
Partial distortion entropy maximization for online data clustering 招待有り査読有り

Hiroyuki Takizawa, Hiroaki Kobayashi

NEURAL NETWORKS　20　(7)　819-831　2007年9月

DOI： 10.1016/j.neunet.2007.04.029 　

ISSN：0893-6080
消費電力を考慮したウェイアロケーション型共有キャッシュ機構査読有り

小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　55-58　2007年9月
Accelerating Möller Intersection Algorithm Using Ray Packets 査読有り

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Ken-ichi Suzuki, Hiroaki Kobayashi, Tadao Nakamura

Information Technology Letters　265-268　2007年9月
SMTプロセッサの実行時性能予測のためのハードウェアリソース競合解析招待有り査読有り

佐藤雅之, 船矢祐介, 小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　67-70　2007年9月
An Estimation-Based Redundant Task Dispatch Policy for Volunteer Computing Platforms 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Dependable Systems and Networks　348-349　2007年6月25日


Fast Abstract (Supplemental Volume)
A fair-sharing and power-aware L2 cache system for chip multiprocessors 査読有り

Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips X　2007年4月
Memory Efficient Scheme for Fast Spectral Photon Mapping 査読有り

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the Ninth IASTED International Conference on Computer Graphics and Imaging (CGIM 2007)　2007年2月
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　113-120　2007年

DOI： 10.1145/1327171.1327185 　

ISSN：1089-795X
Preliminary evaluation for runtime auto-tuning of GPGPU applications 査読有り

Hiroyuki Takizawa, Hiroki Shiratori, Hiroaki Kobayashi

The 2nd International Workshop on Automatic Performance Tuning　37-37　2007年
An Efficient Control Mechanism for Self-Organizing Overlay Networks of Large-Scale P2P Systems 査読有り

Hiroaki Kobayashi, Hiroyuki Takizawa, Takuro Okawa, Tsutomu Inaba

Interdisciplinary Information Sciences　13　(2)　227-237　2007年
出版者・発行元：東北大学
DOI： 10.4036/iis.2007.227 　

ISSN：1340-9050


P2P (peer-to-peer) computing has great potential for handling highly distributed computing resources and is expected to be a key technology for realizing ubiquitous computing environments over the Internet. However, P2P systems tend to waste network bandwidth on resource acquisition because of their decentralized resource management. This paper presents an efficient control mechanism for self-organizing overlay networks of large-scale P2P systems and evaluates its performance in detail. The overlay network is configured by forming local clusters that reflect the current interests of individual peers and connecting these clusters based on their similarity. As a result, the overlay network provides a resource search space for specific interests. In addition, the overlay network can be dynamically reconfigured as the interests of individual peers change over time, so that the peers most useful at a given time are reconnected closer to their client peers. Therefore, resource-request messages can be multicast only among peers with similar interests that are dynamically connected through the overlay network, which remarkably decreases both the number of messages needed for resource acquisition and the number of hops a resource-request query travels to reach a peer that satisfies the request. Experimental results indicate that the proposed mechanism achieves effective self-organization of the overlay network, in which useful peers are dynamically relocated around client peers. In addition, adaptively allocating links to peers according to their capabilities helps maintain the high performance and fault tolerance of the self-organizing overlay network.
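The overlay described in this abstract groups peers by the similarity of their current interests and multicasts resource requests only among similar peers. The abstract does not name a similarity measure, so the Python sketch below swaps in Jaccard similarity over sets of interest keywords as an assumption; the peer names, topics, and functions are hypothetical, and ties are broken by peer name only to make the result deterministic.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two interest sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_neighbors(peer: str, interests: Dict[str, Set[str]], k: int) -> List[str]:
    """Return the k peers whose current interests are most similar to `peer`'s.

    Re-running this whenever interests change re-wires the overlay so that
    peers with similar interests stay close to each other.
    """
    others = [p for p in interests if p != peer]
    others.sort(key=lambda p: (-jaccard(interests[peer], interests[p]), p))
    return others[:k]


def forward_query(peer: str, topic: str, interests: Dict[str, Set[str]], k: int) -> List[str]:
    """Multicast a resource query only to overlay neighbors that share the topic."""
    return [n for n in choose_neighbors(peer, interests, k) if topic in interests[n]]


if __name__ == "__main__":
    interests = {
        "A": {"cfd", "hpc"},
        "B": {"cfd", "hpc", "gpu"},
        "C": {"music"},
        "D": {"gpu", "hpc"},
    }
    print(choose_neighbors("A", interests, 2))
    print(forward_query("A", "cfd", interests, 2))
```

In this toy setup peer A links to B and D, and a query about "cfd" is sent only to B, so the unrelated peer C never sees the message; this is the bandwidth saving the abstract attributes to interest-based clustering, shown here on a single hop.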
A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs 査読有り

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the MEDEA workshop (PACT 07)　121-128　2007年
Performance Evaluation of K-Means Clustering on the Cell Processor 査読有り

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High Performance Computing Symposium 2007　2007　(1)　161-168　2007年1月
An on-chip cache design for vector processors 招待有り査読有り

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　17-23　2007年

DOI： 10.1145/1327171.1327173 　

ISSN：1089-795X
Multi-Core Data Streaming Architecture for Ray Tracing 査読有り

Yoshiyuki Kaeriyama, Daichi Zaitsu, Kenichi Suzuki, Hiroaki Kobayashi, Nobuyuki Ohba

2007 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, VOLS, 1 AND 2　171-+　2007年

DOI： 10.1109/ICCD.2007.4601897 　

ISSN：1063-6404
スレッド特徴量に基づくマルチコアプロセッサスケジューリング招待有り査読有り

船矢祐介, 小寺功, 滝沢寛之, 小林広明

Information Technology Letters　5　(5)　37-40　2006年9月
P2P 型資源検索システムにおける動的論理リンク管理機構査読有り

大川拓郎, 滝沢寛之, 小林広明

Information Technology Letters　5　(5)　363-366　2006年9月
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
Towards Effective GPU Implementation of Neural Networks 査読有り

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

Proceedings of the fourth Irish Conference on Mathematical Foundations of Computer Science and Information Technology (MFCSIT)　2006年7月
Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing 査読有り

H Takizawa, H Kobayashi

JOURNAL OF SUPERCOMPUTING　36　(3)　219-234　2006年6月

DOI： 10.1007/s11227-006-8294-1 　

ISSN：0920-8542
Radiative heat transfer simulation using programmable graphics hardware 査読有り

Hiroyuki Takizawa, Noboru Yamada, Seigo Sakai, Hiroaki Kobayashi

Proceedings - 5th IEEE/ACIS Int. Conf. on Comput. and Info. Sci., ICIS 2006. In conjunction with 1st IEEE/ACIS, Int. Workshop Component-Based Software Eng., Softw. Archi. and Reuse, COMSAR 2006　2006　29-37　2006年

DOI： 10.1109/ICIS-COMSAR.2006.70 　
Design and Implementation of an Efficient Search Mechanism based on the Hybrid P2P Model for Ubiquitous Computing Systems 査読有り

T Inaba, T Okawa, Y Murata, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET , PROCEEDINGS　45-+　2006年

DOI： 10.1109/SAINT.2006.23 　
A distributed and cooperative load balancing mechanism for large-scale P2P systems 査読有り

Y Murata, T Inaba, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS　126-129　2006年

DOI： 10.1109/SAINT-W.2006.2 　
An efficient text capture method for moving robots using DCT feature and text tracking 査読有り

Hiroki Shiratori, Hideaki Goto, Hiroaki Kobayashi

18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS　1050-+　2006年

DOI： 10.1109/ICPR.2006.243 　

ISSN：1051-4651
Implications of memory performance for highly efficient supercomputing of scientific applications 査読有り

Akihiro Musa, Hiroyuki Takizawa, Koki Okabe, Takashi Soga, Hiroaki Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　4330　845-+　2006年

ISSN：0302-9743
アクティブカメラを用いた環境中の文字の効率的探索法査読有り

齋藤精二, 後藤英昭, 小林広明

電子情報通信学会論文誌　J88-D-II　(9)　2003-2006　2005年9月
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1923


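We developed an active text-extraction system that uses an active camera to search for and extract text in human living spaces. By combining local image contrast, texture features, and zooming, the system can efficiently extract even small characters that measure less than 8 pixels at the wide end of the zoom range.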
大規模P2Pシステムにおける計算資源探索のモデル化と性能評価査読有り

大川拓郎, 滝沢寛之, 小林広明

情報科学技術レターズ　46　(4)　21-24　2005年9月
出版者・発行元： FIT(電子情報通信学会・情報処理学会)運営委員会
An Incremental Photon-Mapping Algorithm for Fast Walk-Through Animations 査読有り

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on Computer Graphics and Imaging　1-7　2005年8月
HPC Challenge ベンチマークを用いたSX-7 システムの性能評価査読有り

滝沢寛之, 小久保達信, 片海健亮, 小林広明

先進的計算基盤システムシンポジウム(SACSIS2005)　2005　(5)　25-33　2005年5月
A New Dynamic Decomposition Method for Parallel Molecular Dynamics Simulation 査読有り

V.Zhakhovskii, K.Nishihara, Y.Fukuda, S.Shimojo, T.Akiyama, S.Miyanaga, H.Sone, H.Kobayashi, E.Ito, Y.Seo, M.Tamura, Y.Ueshima

Proceedings of Cluster Computing and Grid 2005　9-12　2005年5月
P2Pコンピューティングのための分散協調スケジューリング機構

村田善智, 稲葉努, 滝沢寛之, 小林広明

先端的ネットワーク＆コンピューティングテクノロジワークショップ　(33)　23-30　2005年1月24日
A P2P Semantic Information Search Mechanism for Ubiquitous Grid Computing Systems

Tsutomu Inaba, Takuro Okawa, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

先端的ネットワーク＆コンピューティングテクノロジワークショップ　(33)　45-52　2005年1月
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET 査読有り

Hiroyuki Takizawa, Hiroaki Kobayashi

Information　8　(3)　383-389　2005年
HPC Challenge ベンチマークを用いたSX-7 システムの性能評価査読有り

滝沢寛之, 小久保達信, 片海健亮, 小林広明

情報処理学会論文誌　46　(SIG12)　37-45　2005年
出版者・発行元：情報処理学会
ISSN：1882-7829


The HPC Challenge (HPCC) benchmark is a benchmark suite developed for comprehensive performance evaluation of high-performance computing (HPC) systems. Because it evaluates HPC systems from several viewpoints, such as memory access and network communication performance, in addition to the widely used floating-point operation rate, HPCC is expected to serve as an appropriate measure of effective performance for practical scientific computing. In this paper, we report the results of evaluating the NEC SX-7 system operated by the Information Synergy Center, Tohoku University, with the HPCC benchmark. Based on the result that the system achieves remarkably high scores in 16 of the 28 tests, we discuss the advantages of the vector architecture in the field of HPC.
Text detection in color scene images based on unsupervised clustering of multi-channel wavelet features 査読有り

T Saoi, H Goto, H Kobayashi

Eighth International Conference on Document Analysis and Recognition, Vols 1 and 2, Proceedings　690-694　2005年

DOI： 10.1109/ICDAR.2005.227 　
A self-organizing overlay network to exploit the locality of interests for effective resource discovery in P2P systems 査読有り

H Kobayashi, H Takizawa, T Inaba, Y Takizawa

2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS　246-255　2005年
A workflow management mechanism for peer-to-peer computing platforms 査読有り

H Wang, H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　3758　827-832　2005年

ISSN：0302-9743
Efficient parallel processing of competitive learning algorithms 査読有り

K Sano, S Momose, H Takizawa, H Kobayashi, T Nakamura

PARALLEL COMPUTING　30　(12)　1361-1383　2004年12月

DOI： 10.1016/j.parco.2004.10.001 　

ISSN：0167-8191

eISSN：1872-7336
スーパーSINETを介した大規模遠隔対話的可視化の評価実験

滝沢寛之, 小林広明

全国共同利用情報基盤センター研究開発論文集　26　24-29　2004年11月
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET 査読有り

Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 3rd International Conference on Information (INFO2004)　456-459　2004年11月
スーパーSINETを利用した大規模遠隔可視化処理の評価

滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　90-96　2004年6月
出版者・発行元：東北大学情報シナジーセンター
グリッドミドルウェアGlobusの資源探索と通信に関するオーバヘッドの定量的評価

村田善智, 稲葉勉, 滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　115-123　2004年6月
出版者・発行元：東北大学情報シナジーセンター
An Effective Implementation of Vector Quantization Encoder on Commodity Graphics Hardware 査読有り

Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on IT and Applications (ICITA)　2004年
A fast computation scheme of partial distortion entropy updating 査読有り

H Takizawa, H Kobayashi

ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS　736-741　2004年

DOI： 10.1109/ITCC.2004.1286555 　
Multi-grain parallel processing of data-clustering on programmable graphics hardware 査読有り

H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS　3358　16-27　2004年

ISSN：0302-9743
Locality analysis to control dynamically way-adaptable caches 査読有り

Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

Proceedings of the 2004 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '04　33　(3)　25-32　2004年

DOI： 10.1145/1152922.1101874 　
グリッド用動的資源管理のための自己組織化P2Pネットワークに関する一検討

瀧澤泰明, 滝沢寛之, 佐野健太郎, 小林広明, 中村維男

情報処理学会東北支部研究会　2003年11月
画像のエッジ劣化を抑制するベクトル量子化符号帳設計査読有り

滝沢寛之, 三浦健, 小林広明, 中村維男

Information Technology Letters　2　243-244　2003年9月
Vector quantization codebook design using the law-of-the-jungle algorithm 査読有り

H Takizawa, T Nakajima, K Sano, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E86D　(6)　1068-1077　2003年6月

ISSN：0916-8532
A Comparison Study Of Vector Quantization Codebook Design Algorithms Based On The Equidistortion Principle 査読有り

Hiroyuki Takizawa, Taira Nakajima, Kentaro Sano, Hiroaki Kobayashi

Proceedings of the 21st IASTED International Conference on Applied Informatics　255-261　2003年
An Instruction Cache Mechanism for Simultaneous Multithreaded VLIW Processors 査読有り

Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

The Journal of Asian Information-Science-Life　2　(1)　2003年
ベクトル量子化のためのコードブック生成並列処理に関する研究

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男

並列/協調/分散処理に関する「湯布院」サマーワークショップ資料　2002年8月
MULHIキャッシュの設計および評価査読有り

多田十兵衛, 仲池卓也, 大庭信之, 小林広明, 中村維男

電子情報通信学会論文誌　J85-D-I　(3)　274-285　2002年
3DCGiRAMアーキテクチャによる実時間レイトレーシングシステム査読有り

鈴木健一, 斎田泰昌, 佐野健太郎, 大庭信之, 小林広明, 中村維男

電子情報通信学会論文誌　J85-D-II　(8)　1365-1367　2002年
An Interleaved Multiple-Hit Cache for Simultaneous Multithreaded VLIW Processors 査読有り

Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the Third International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'02)　25-32　2002年
Practical Volume Compression based on Vector Quantization using the Law-of-the-Jungle Algorithm 査読有り

Kentaro Sano, Hiroyuki Takizawa, Taira Nakajima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the 2nd International Conference on Visualization, Imaging, and Image Processing　519-526　2002年
Interactive Ray-Tracing on the 3DCGiRAM Architecture 査読有り

Hiroaki Kobayashi, Ken-ichi Suzuki, Kentaro Sano, Nobuyuki Oba

Proceedings of ACM/IEEE MICRO-35 4th Workshop on Media and Streaming Processors　53-59　2002年
High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture 査読有り

KOBAYASHI Hiroaki

Proceedings of International Conference on Optical Communication and Multimedia (ICOCM2002)　114-117　2002年
PARALLEL ALGORITHM FOR THE LAW-OF-THE-JUNGLE LEARNING TO THE FAST DESIGN OF OPTIMAL CODEBOOKS 査読有り

Kentaro Sano, Shintaro Momose, Hiroyuki Takizawa, Clecio Donizete Lima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002)　582-587　2002年
視覚的画質劣化を抑制するベクトル量子化手法 査読有り

三浦健, 滝沢寛之, 佐野健太郎, 中島平, 小林広明, 中村維男

Information Technology Letters　1　185-186　2002年
Object-Space Parallel Processing of the Multi-Pass Rendering Method for Message-Passing Parallel Processing Systems 査読有り

Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

The International Journal of High Performance Computer Graphics, Multimedia and Visualisation　1　(3)　1-14　2001年
画像生成インテリジェントメモリ 3DCGiRAM の演算器設計 査読有り

鈴木健一, 帰山芳行, 杉山潤, 斎田泰昌, 小林広明, 中村維男

情報処理学会シンポジウムシリーズ(JSPP 2001)　2001　(6)　295-302　2001年
A Technology-Scalable Multithreaded Architecture 査読有り

KOBAYASHI Hiroaki

Proceedings of the 13th Symposium on Computer Architecture and High-Performance Computing　82-89　2001年
3DCGiRAM: An intelligent memory architecture for photo-realistic image synthesis 査読有り

H Kobayashi, K Suzuki, K Sano, Y Kaeriyama, Y Saida, N Oba, T Nakamura

2001 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD 2001, PROCEEDINGS　462-467　2001年

ISSN：1063-6404
VLIWアーキテクチャのためのダイナミックブースティング機構 査読有り

小林広明

電子情報通信学会論文誌　J80-D-I　(1)　171-183　2000年
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1915

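To build a high-performance VLIW computer, it is important not only to increase the number of execution units that can operate in parallel but also to extract instruction-level parallelism efficiently from programs so that these hardware resources are used effectively. The dynamic boosting mechanism proposed in this paper uses hardware and compiler support to adapt to the dynamic behavior of branches in a program, making it possible to extract instruction-level parallelism beyond basic-block boundaries. This paper describes the hardware and compiler techniques for realizing dynamic boosting. Simulation-based performance evaluation shows that introducing dynamic boosting into a VLIW processor yields a speedup of about 20% on branch-intensive non-numerical benchmarks.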
Data-parallel volume rendering with adaptive volume subdivision 査読有り

K Sano, H Kitajima, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E83D　(1)　80-89　2000年1月

ISSN：1745-1361
An active learning algorithm based on existing training data 査読有り

H Takizawa, T Nakajima, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E83D　(1)　90-99　2000年1月

ISSN：0916-8532
Reconfigurable synchronized dataflow processor 査読有り

Hiroshi Sasaki, Hitoshi Maruyama, Hideaki Tsukioka, Nobuyoshi Shoji, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC　27-28　2000年

DOI： 10.1145/368434.368490 　
A Pre-attributed Resampling Algorithm for Controlled-Precision Volume Ray-Casting 査読有り

Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

IPSJ Journal　41　(SIG 5)　113-124　2000年
出版者・発行元：一般社団法人情報処理学会
ISSN：0387-5806

Accurate volume rendering is essential for some visualization applications, e.g., medical imaging. However, the high computational cost of conventional volume rendering algorithms for high-quality image generation has restricted their practical use. In this paper, we propose a pre-attributed resampling algorithm that accomplishes controlled-precision volume ray-casting at low computational cost. This algorithm changes resampling intervals based on numerical errors of the volume rendering integral so that the number of resampling points is minimized for a given error bound. In addition, to reduce the computational cost of resampling, a simple interpolation method is applied to resampling points in regions where intensities and opacities are constant. To suppress the overhead of precision control, information on the numerical errors and the constant regions is obtained for each voxel in pre-processing and then associated with the volume data as voxel attributes. The experimental results demonstrate that the proposed algorithm outperforms conventional ray-casting algorithms without precision control for accurate visualization in terms of accuracy/processing-time performance.
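The error-controlled choice of resampling interval described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the composite trapezoid rule, whose error on a segment of length L with step h is at most L·h²·M/12 where M bounds |f''| of the integrand, and it treats M as a hypothetical per-voxel attribute computed in pre-processing. Voxels with M = 0, such as constant-attribute regions, get a single step. The names step_bound, resample_count and uniform_count are illustrative.

```python
import math


def step_bound(seg_len, curvature_bound, eps):
    """Largest step h for which the composite trapezoid-rule error bound
    seg_len * h**2 * curvature_bound / 12 stays within eps on one segment."""
    if curvature_bound == 0.0:
        # The integrand is (at most) linear here, e.g. a constant-attribute
        # region, so a single step is exact.
        return seg_len
    h = math.sqrt(12.0 * eps / (seg_len * curvature_bound))
    return min(h, seg_len)


def resample_count(segments, eps_total):
    """Number of resampling points along one ray.

    segments: list of (length, curvature_bound) pairs, one per voxel crossed;
    curvature_bound is a hypothetical pre-computed voxel attribute bounding
    |f''|. The error budget eps_total is split across segments in proportion
    to their length, so the per-segment errors sum to at most eps_total.
    """
    total_len = sum(length for length, _ in segments)
    count = 0
    for length, m in segments:
        eps = eps_total * length / total_len
        h = step_bound(length, m, eps)
        # Round the step count up so the actual step (length / n) never exceeds h.
        count += math.ceil(length / h)
    return count


def uniform_count(segments, h):
    """Samples needed by fixed-interval ray casting with step h, for comparison."""
    return sum(math.ceil(length / h) for length, _ in segments)
```

For example, resample_count([(1.0, 0.0), (1.0, 12.0)], 0.125) gives 5 samples, whereas fixed-interval sampling at the finest step of 0.25 (uniform_count with h = 0.25) needs 8.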
Developing a Practical Parallel Multi-pass Render in Java and C --- Toward a Grande Application in Java 査読有り

Hitoshi Yamauchi, Atsusi Maeda, Hiroaki Kobayashi

Proceedings of the ACM 2000 Java Grande Conference　126-133　2000年
A Scheduling Method for Instruction-Level Parallel Processing of Vector and Scalar Instructions 査読有り

Takuya Nakaike, Takehito Sasaki, Masayuki Katahira, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　30　(13)　23-33　1999年11月30日
出版者・発行元： John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(19991130)30:13<23::AID-SCJ3>3.0.CO;2-3 　

ISSN：0882-1666
A topology preserving neural network for nonstationary distributions 査読有り

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(7)　1131-1135　1999年7月

ISSN：0916-8532
Acceleration techniques for the network inversion algorithm 査読有り

H Takizawa, T Nakajima, M Nishi, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(2)　508-511　1999年2月

ISSN：0916-8532
Time stamp invalidation of TLB-unified cache and its performance evaluation 査読有り

Ken-Ichi Suzuki, Nobuyuki Oba, Shigenori Shimizu, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　30　(11)　94-106　1999年
出版者・発行元： John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(199910)30:11<94::AID-SCJ11>3.0.CO;2-S 　

ISSN：0882-1666
MULHI Cache: VLIWプロセッサのための命令キャッシュ機構 査読有り

小林広明

情報処理学会論文誌　40　(5)　1996-2007　1999年
出版者・発行元：情報処理学会
ISSN：1882-7764

VLIW (Very Long Instruction Word) processors, which achieve high performance through advanced compiler extraction of instruction-level parallelism, have recently attracted attention as a next-generation processor architecture. They need a high-bandwidth, high-hit-rate instruction cache to fetch very long instructions, each consisting of multiple operations that can execute in parallel, and to issue them quickly to the function units. However, VLIW instructions generally contain many nops (no operations); when such instructions are stored as-is in a conventional instruction cache, cache utilization drops and the instruction-cache miss rate rises, degrading processor performance. This paper proposes a new instruction cache mechanism for VLIW processors, the MULHI (MULtiple HIt) cache, and evaluates it using several programs from the SPEC95 benchmark suite. The experimental results show that the MULHI cache achieves up to 1.68 times higher performance than a conventional instruction cache that stores VLIW instructions together with their nops.
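The utilization problem that motivates the MULHI cache can be quantified with a short Python sketch. It does not model the MULHI cache itself; it only counts how many cache slots a sequence of bundles occupies with and without its nops, assuming hypothetical 4-slot bundles, nops encoded as the string "nop", and operations packed contiguously into cache lines. A real compressed layout also needs per-operation slot information to re-expand bundles at fetch time, which this sketch ignores.

```python
import math

NOP = "nop"  # hypothetical encoding of an empty VLIW slot


def slot_utilization(bundles):
    """Fraction of stored operation slots that hold useful (non-nop) work
    when each VLIW bundle is cached as-is, nops included."""
    total = sum(len(b) for b in bundles)
    useful = sum(1 for b in bundles for op in b if op != NOP)
    return useful / total if total else 0.0


def lines_needed(bundles, ops_per_line, drop_nops):
    """Cache lines occupied by a straight-line sequence of bundles, assuming
    operations are packed contiguously into lines of ops_per_line slots."""
    ops = sum(
        sum(1 for op in b if op != NOP) if drop_nops else len(b)
        for b in bundles
    )
    return math.ceil(ops / ops_per_line)
```

For three 4-slot bundles holding four useful operations in total, slot_utilization returns 1/3, and lines_needed reports 3 lines when nops are stored versus 1 line when they are dropped.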
A Self-organizing network system forming memory from nonstationary probability distributions 査読有り

KOBAYASHI Hiroaki

Proceedings of the International Joint Conference on Neural Networks 99　1999年
再構成可能な同期式データフロー計算機の構成とそのソフトウェア開発環境 査読有り

KOBAYASHI Hiroaki

9-14　1999年
Kohonen learning with a mechanism, the law of the jungle, capable of dealing with nonstationary probability distribution functions 査読有り

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E81D　(6)　584-591　1998年6月

ISSN：0916-8532
ウエーブレット変換を用いた顔画像処理に関する一考察

飯村海児, 滝沢寛之, 中島平, 小林広明, 中村維男

電気関係学会東北支部連合大会　1998年
ベクトル命令とスカラ命令を融合した命令レベル並列処理のためのスケジューリング手法 査読有り

小林広明

電子情報通信学会論文誌D-I　J81-D-I　(7)　910-920　1998年
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1915

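The Jetpipeline is an architecture that enables high-speed computation by combining vector processing with instruction-level parallel processing. To draw out its maximum performance, it is therefore necessary both to raise the vectorization ratio and to extract more instructions that can execute in parallel. The Jetpipeline extracts parallel-executable instructions from code in which vector and scalar instructions are mixed. However, because vector instructions take many more execution cycles than scalar instructions, the parallelization techniques used for VLIW computers and similar machines cannot be applied as-is. This paper proposes an effective parallelization technique that integrates vector and scalar instructions to draw out the full performance of the Jetpipeline, and confirms its effectiveness through simulation.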
Automated design of wave pipelined multiport register files 査読有り

K Takano, T Sasaki, N Oba, H Kobayashi, T Nakamura

PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98　197-202　1998年
Performance Evaluation of a Parallel Multi-Pass Rendering Algorithm Based on the Object-Space Parallel Processing Model 査読有り

Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of JSPP 98　98　(7)　175-182　1998年
Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System 査読有り

KOBAYASHI Hiroaki

Proceedings of the 2nd Eurographics Workshop on Parallel Rendering　133-144　1998年
オブジェクト空間分割型並列レイトレーシング法の汎用計算機上への実装と評価 査読有り

前田隆之, 徳永麻由美, 山内斉, 小林広明, 中村維男

Visual Computing/グラフィックスとCAD合同シンポジウム98論文集　55-60　1998年
Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System 査読有り

Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

Proceedings of the 2nd Eurographics Workshop on Parallel Rendering　133-144　1998年
The object-space parallel processing of the multipass rendering method on the (Mπ)² with a distributed-frame buffer system 査読有り

H Yamauchi, T Maeda, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E80D　(9)　909-918　1997年9月

ISSN：0916-8532
Decoupled modified-bit cache 査読有り

Masafumi Takahashi, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　28　(6)　49-59　1997年6月15日
出版者・発行元： John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(19970615)28:6<49::AID-SCJ6>3.0.CO;2-M 　

ISSN：0882-1666
The Object-Space Parallel Processing of the Multipass Rendering Method on the (Mπ)² with a Distributed-Frame Buffer System

Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

IEICE Transactions on Information and Systems　E80-D　(9)　899-908　1997年
出版者・発行元： Institute of Electronics, Information and Communication Engineers (IEICE)
ISSN：0916-8532
ハードウェアキャッシュ評価システムRICE 査読有り

小林広明

電子情報通信学会論文誌　J80-D-I　(1)　121-123　1997年
多層パーセプトロンの分類能力向上法に関する一検討 査読有り

小林広明

電子情報通信学会論文誌　J80-D-II　(1)　390-393　1997年
ウェーブパイプラインを用いた時分割疑似マルチポートレジスタファイル 査読有り

小林広明

電子情報通信学会論文誌　J80-D-I　(3)　223-226　1997年
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1915

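High-performance microprocessors that exploit instruction-level parallelism, such as superscalar and VLIW processors, must supply data to multiple functional units simultaneously. This paper proposes a time-division pseudo-multiport register file that reduces the cost of multiporting by reading and writing data through a single port in a time-multiplexed manner. To keep the required hardware resources low, the proposed register file adopts wave pipelining. A register file designed in VHDL according to the proposed method is compared with a conventional multiport register file in terms of hardware cost, and its effectiveness is examined.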
RICEによる2次キャッシュメモリの性能評価 査読有り

小林広明

電子情報通信学会論文誌　J80-D-I　(10)　793-802　1997年
Memory hierarchy design for jetpipeline: To execute scalar and vector instructions in parallel 査読有り

T Sasaki, T Nakaike, K Takano, M Katahira, H Kobayashi, T Nakamura

SECOND AIZU INTERNATIONAL SYMPOSIUM ON PARALLEL ALGORITHMS/ARCHITECTURE SYNTHESIS, PROCEEDINGS　66-73　1997年
A cached frame buffer system for object-space parallel processing systems 査読有り

H Kobayashi, T Maeda, H Yamauchi, T Nakamura

COMPUTER GRAPHICS INTERNATIONAL, PROCEEDINGS　146-+　1997年
Multiport Register File Using Wave Pipelining 査読有り

KOBAYASHI Hiroaki

Proceedings of ACM/IEEE International Workshop on Logic Synthesis'97　1997年
Parallel processing of the shear-warp factorization with the binary-swap method on a distributed-memory multiprocessor system 査読有り

K Sano, HH Kitajima, H Kobayashi, T Nakamura

1997 IEEE SYMPOSIUM ON PARALLEL RENDERING (PRS '97), PROCEEDINGS　87-+　1997年
分散フレームバッファシステムを持つ画像生成用超並列処理システム(Mp)2の性能評価 査読有り

小林広明

電子情報通信学会コンピュータシステム研究会資料　96　(503)　25-32　1997年
出版者・発行元：一般社団法人電子情報通信学会

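Parallel processing of the multipass rendering method based on the object-space parallel computation model can generate photo-realistic images at high speed. However, when images are actually generated on a massively parallel computer based on this model, many processing elements access the frame buffer at the same time, and the resulting access contention may degrade performance. This paper therefore proposes a distributed frame buffer system that alleviates this contention. The distributed frame buffer system makes it possible to fully exploit the capability of the massively parallel image synthesis system.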
TLB統一型キャッシュのためのタイムスタンプ無効化方式とその性能評価 査読有り

小林広明

電子情報通信学会論文誌　J80-D-I　(12)　941-953　1997年
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1915

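This paper describes the TLB-unified cache, one implementation of an indirect-tagged cache. An indirect-tagged cache is a cache whose tags are pointers to other address tags, and it can be implemented with less hardware than conventional schemes. However, to keep the indirect tags consistent with the cache, an indirect-tagged cache needs a fast selective cache invalidation mechanism. This paper proposes time-stamp invalidation as one such mechanism and shows how to implement it in the TLB-unified cache, in which the TLB and the cache share tags. Next, the hardware cost of the TLB-unified cache is evaluated, together with the amount of hardware that indirect tags save. The paper then shows that the saved hardware can be reallocated to other on-chip units to improve their performance. Finally, trace-driven simulation of the TLB-unified cache shows that it achieves higher performance than conventional schemes with less hardware.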
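One way to realize the fast selective invalidation described above is sketched below in Python. This is an assumed interpretation, not the paper's exact design: each TLB entry carries a stamp that is incremented whenever the entry is reassigned, each cache line records the stamp of the TLB entry it points to at fill time, and a lookup treats any line whose stamp no longer matches as invalid, so no cache walk is needed. The class and field names are hypothetical, and real hardware would use a small wrapping counter whose overflow must be handled, whereas Python integers never wrap.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TLBEntry:
    vpn: Optional[int] = None  # virtual page number currently mapped
    stamp: int = 0             # bumped each time the entry is reassigned


@dataclass
class CacheLine:
    tlb_index: int     # indirect tag: a pointer into the TLB, not a full address tag
    stamp: int         # the TLB entry's stamp at the time this line was filled
    line_in_page: int  # which line of the page this is
    data: object


class IndirectTagCache:
    """Direct-mapped cache whose tags point at TLB entries (hypothetical model)."""

    def __init__(self, tlb_size: int):
        self.tlb = [TLBEntry() for _ in range(tlb_size)]
        self.lines: Dict[int, CacheLine] = {}

    def map_page(self, tlb_index: int, vpn: int) -> None:
        entry = self.tlb[tlb_index]
        entry.vpn = vpn
        # Selective invalidation in O(1): every cache line still carrying the
        # old stamp for this entry becomes stale without walking the cache.
        entry.stamp += 1

    def fill(self, set_index: int, tlb_index: int, line_in_page: int, data: object) -> None:
        stamp = self.tlb[tlb_index].stamp
        self.lines[set_index] = CacheLine(tlb_index, stamp, line_in_page, data)

    def lookup(self, set_index: int, vpn: int, line_in_page: int) -> Optional[object]:
        line = self.lines.get(set_index)
        if line is None:
            return None
        entry = self.tlb[line.tlb_index]
        if line.stamp != entry.stamp:
            return None  # stale: the TLB entry was reassigned after this fill
        if entry.vpn != vpn or line.line_in_page != line_in_page:
            return None  # ordinary tag mismatch
        return line.data
```

Reassigning a TLB entry, even to the same page, makes every line filled under the old stamp miss on its next lookup.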
(Mπ)²: A hierarchical parallel processing system for the multipass rendering method 査読有り

H Kobayashi, H Yamauchi, Y Toh, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E79D　(8)　1055-1064　1996年8月

ISSN：0916-8532
ニューラルネットワークの最適学習法に関する一考察

滝沢寛之, 中島平, 小林広明, 中村維男

情報処理学会東北支部連合大会　1996年
データの更新をバイト単位で管理するキャッシュメモリ 査読有り

小林広明

電子情報通信学会論文誌　1996年
分散共有メモリ型並列計算機のためのメッセージ損失を許容するメモリアクセスプロトコル 査読有り

小林広明

電子情報通信学会論文誌　79　(9)　567-571　1996年
出版者・発行元：一般社団法人電子情報通信学会
ISSN：0915-1915

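This paper proposes a memory access protocol for distributed shared-memory parallel computers that tolerates the loss of memory access requests and responses. The protocol simplifies network flow control, which in turn makes it easier to simplify and speed up the hardware.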
Decoupled modified-bit cache 査読有り

M Takahashi, N Oba, H Kobayashi, T Nakamura

CONFERENCE PROCEEDINGS OF THE 1996 IEEE FIFTEENTH ANNUAL INTERNATIONAL PHOENIX CONFERENCE ON COMPUTERS AND COMMUNICATIONS　136-143　1996年
A hierarchical parallel processing system for the multipass-rendering method 査読有り

H Kobayashi, H Yamauchi, Y Toh, T Nakamura

10TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM - PROCEEDINGS OF IPPS '96　62-67　1996年
Facial Expression Recognition Using Neural Networks Capable of Recognizing at an Infant Level 査読有り

KOBAYASHI Hiroaki

Proceedings of the Sixth World Congress of World Association for Infant Mental Health　1996年
並列グラフ簡約マシンにおけるタスク割り当て手法とメモリ参照局所性評価 査読有り

小林広明

情報処理学会論文誌　37　(11)　2020-2029　1996年
出版者・発行元：情報処理学会
ISSN：1882-7764

Unlike procedural languages, functional programming languages have many appealing properties, such as referential transparency, ease of verification, and high programming productivity. However, because they cannot be executed fast enough on conventional computers, their use has been severely limited. In this paper, we propose a task scheduling strategy for high-speed processing of functional programs on a shared-memory multiprocessor system. Graph reduction, widely used to execute functional programs on computers, generally has a small processing granularity, so task allocation must make effective use of the processors while keeping run-time overheads such as memory accesses low. To reduce shared-memory accesses in parallel graph reduction, the proposed strategy detects parallel tasks at run time and allocates them to processors while dynamically taking the locality of data references among the tasks into account. Simulation experiments on a multiprocessor system with local and shared memories show that the proposed strategy suppresses global memory accesses and achieves speedups proportional to the number of processors by making good use of local and cluster cache memories, confirming the effectiveness of the locality-aware scheduling strategy.
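The locality-aware allocation idea can be sketched as a greedy choice in Python. The rule below is an assumption for illustration, not the paper's strategy: a new task goes to the processor whose local memory already holds the most graph nodes the task references, with ties broken toward the least-loaded processor. The names pick_processor, owner and loads are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def pick_processor(refs: Sequence[Hashable], owner: Dict[Hashable, int], loads: List[int]) -> int:
    """Choose a processor for a new reduction task (hypothetical greedy rule).

    refs:  graph nodes the task will reference
    owner: node -> processor whose local memory/cache currently holds it
    loads: number of queued tasks per processor

    Prefer the processor that already holds the most referenced nodes
    (fewest shared-memory accesses); break ties by choosing the least loaded.
    """
    local_hits = [0] * len(loads)
    for node in refs:
        p = owner.get(node)
        if p is not None:
            local_hits[p] += 1
    return max(range(len(loads)), key=lambda p: (local_hits[p], -loads[p]))
```

A pure locality-first rule like this can pile work onto one processor; how the proposed strategy balances locality against load is not specified here.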
プロセッサクラスタ用メモリアクセスバッファリング機構 査読有り

高橋雅史, 大庭信之, 小林広明, 中村維男

電子情報通信学会論文誌　J78-D-I　(10)　861-864　1995年10月
ニューラルネットワークを用いた顔画像認識について

滝沢寛之, 中島平, 島村三重子, 小林広明, 中村維男

電気関係学会東北支部連合大会　1995年
Task Scheduling with Locality Consideration for a Clustered Parallel FL Reduction System 査読有り

KOBAYASHI Hiroaki

Proceedings of the Aizu International Symposium on Parallel Algorithm/Architecture Synthesis　234-240　1995年
Design and performance measurements of an execution model for the parallel processing of Prolog programs 査読有り

D Wang, H Kobayashi, T Nakamura

IEEE FIRST ICA3PP - IEEE FIRST INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, VOLS 1 AND 2　650-658　1995年
Mechanical-Design-Oriented Description Language : MODEL 査読有り

KOBAYASHI Hiroaki

Japanese Journal of Advanced Automation Technology　7　(1)　29-34　1995年
適応分割ポイントマッチング法 査読有り

小林広明

日本機械学会論文集(A編)　60　(570)　543-548　1994年
出版者・発行元：日本機械学会
DOI： 10.1299/kikaia.60.543 　

ISSN：0387-5008

The contact stress analysis of elastic bodies is important in mechanical engineering areas such as friction, wear, and fatigue. The point-matching method is a well-known analytical model that satisfies Hertzian contact theory. However, the point-matching method has critical problems: large amounts of computation time and memory are required as the number of cells increases. Although there have been many studies on its accuracy to date, there are few studies on efficient processing of the point-matching method. This paper proposes an efficient discretization method for the contact region to reduce processing time and save memory space in the point-matching method.
機械設計記述言語MODEL 査読有り

小林広明

日本機械学会論文集(C編)　60　(570)　715-720　1994年
出版者・発行元：日本機械学会
DOI： 10.1299/kikaic.60.715 　

ISSN：0387-5024

Designing mechanical systems by means of special-purpose languages is very effective because such languages can define objects precisely. However, this approach causes serious problems. First, the amount of description is very large when designing complex systems. Second, those languages are not suited for modeling objects at higher abstraction levels. To solve these problems, this paper presents a novel description language for mechanical design called MODEL (Mechanical-design-Oriented DEscription Language). MODEL is designed so that the designer's intentions can be efficiently reflected in the specifications of mechanical systems. We introduce a new concept, design granularity, so that designers can model objects of a mechanical system at different abstraction levels. Moreover, to reduce the amount of description, we use knowledge bases for mechanical design as a library for MODEL. The design process with MODEL is discussed in detail to clarify the capabilities of the language.
TLBとキャッシュメモリの統一的管理方式 査読有り

小林広明

情報処理学会論文誌　35　(6)　1149-1152　1994年
出版者・発行元：情報処理学会
ISSN：1882-7764

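This paper proposes the TLB-Unified Cache (TUC), which manages the TLB, previously used only for address translation, in a unified way together with the cache. In the TUC, each cache tag holds a pointer to a page number stored in the TLB, so that cached data are associated with their addresses indirectly. This greatly reduces the capacity of the fast memory arrays. In addition, the paper proposes the Black and White invalidation method to quickly invalidate the cache entries associated with a TLB entry affected by a TLB miss. Simulations show that, despite the large reduction in memory arrays, the TUC achieves a cache miss rate equivalent to that of conventional schemes.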
STARCORE - A HIGH-SPEED ATM SWITCHING SYSTEM 査読有り

N OBA, K SUZUKI, H KOBAYASHI, T NAKAMURA

1994 IEEE GLOBECOM - CONFERENCE RECORD, VOLS 1-3, AND COMMUNICATIONS THEORY MINI-CONFERENCE RECORD　139-143　1994年
Breadth-first Parallel Processing of Sequential Prolog Programs 査読有り

KOBAYASHI Hiroaki

Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems　86-89　1994年
A Hierarchical System for Parallel Processing of Prolog Programs 査読有り

KOBAYASHI Hiroaki

Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems　90-93　1994年
Jetpipeline : A Hybrid Pipeline Architecture for Instruction-Level Parallelism 査読有り

KOBAYASHI Hiroaki

Proceedings of High Performance Computing Conference'94　317-323　1994年
出版者・発行元： National Computing Research Centre, National University of Singapore
A Hierarchical Parallel Reduction System for the Functional Language FL 査読有り

KOBAYASHI Hiroaki

Proceedings of High Performance Computing Conference'94　270-278　1994年
Software Pipelining for JetPipeline Architecture 査読有り

KOBAYASHI Hiroaki

Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks　127-134　1994年
(Mp)^2 : A Hierarchical Parallel Processing System for a Global Illumination Model 査読有り

KOBAYASHI Hiroaki

Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks　157-164　1994年
LOAD BALANCING BASED ON LOAD COHERENCE BETWEEN CONTINUOUS IMAGES FOR AN OBJECT-SPACE PARALLEL RAY-TRACING SYSTEM 査読有り

H KOBAYASHI, H KUBOTA, S HORIGUCHI, T NAKAMURA

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E76D　(12)　1490-1499　1993年12月

ISSN：0916-8532
Ants routing: An adaptive packet flow control scheme in multimedia communication

Emad Rashid, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of 2nd IEEE International Conference on Universal Personal Communications: Gateway to the 21st Century, ICUPC 1993　1　228-234　1993年
出版者・発行元： Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/ICUPC.1993.528382 　
機械設計記述言語MODELを核とした統合化機械設計支援システム 査読有り

小林広明

日本機械学会論文集(C編)　59　(567)　3597-3602　1993年
出版者・発行元：日本機械学会
DOI： 10.1299/kikaic.59.3597 　

ISSN：0387-5024

Recently, for the purpose of production rationalization, demand for CAD (computer-aided design) systems has been rapidly increasing. However, most CAD systems for mechanical design have mainly performed graphical processing, such as drawing. In this paper, we propose an integrated computer-aided mechanical design system that supports the design process as well as the drawing process. The system employs a mechanical-design-oriented description language called MODEL to design mechanical systems. To reduce the amount of description in MODEL, we introduce knowledge bases for mechanical design. With these knowledge bases, the system can infer final designs from incomplete descriptions of objects at higher abstraction levels and complete them. Inference and knowledge representation schemes are discussed in detail. We also construct a prototype system and examine its effectiveness.
AN ADAPTIVE NETWORK ROUTING METHOD BY ELECTRICAL-CIRCUIT MODELING 査読有り

N OBA, H KOBAYASHI, T NAKAMURA

IEEE INFOCOM 93 : THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3　586-592　1993年
INCORPORATING THE PARALLEL-PROCESSING TECHNIQUES WITH THE DEMAND-DRIVEN MODEL OF FUNCTIONAL PROGRAMMING-LANGUAGES 査読有り

H SHEN, H KOBAYASHI, T NAKAMURA

TENCON '93: 1993 IEEE REGION 10 CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND POWER ENGINEERING, VOL 1　146-149　1993年
Developing the Lambda Calculus for FL-oriented Parallel Reductions 査読有り

KOBAYASHI Hiroaki

Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS　6.49-6.50　1993年
Expression Recognition Using the Reformed Back-propagation Network 査読有り

KOBAYASHI Hiroaki

Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS　3.27-3.30　1993年
A Massively Parallel Processing Approach to Fast Photo-Realistic Image Synthesis 査読有り

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International'93　497-507　1993年
出版者・発行元： CG International
EXPRESSION RECOGNITION USING NEURAL NETWORKS 査読有り

J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

WCNN'93 - PORTLAND, WORLD CONGRESS ON NEURAL NETWORKS, VOL IV　IV-231-IV-234　1993年
Ants Routing : An Adaptive Packets Flow Control Scheme in Multimedia Networks 査読有り

KOBAYASHI Hiroaki

Proceedings of IEEE 2nd International Conference on Universal Personal Communications　228-234　1993年
KNOWLEDGE REPRESENTATION FOR ADAPTIVE OVERLOAD PACKETS CONTROL IN MULTIMEDIA NETWORKS 査読有り

E RASHID, H KOBAYASHI, T NAKAMURA

GLOBECOM '93 COMMUNICATIONS FOR A CHANGING WORLD, CONFERENCE RECORD　1516-1520　1993年
NEURAL-NETWORK STRUCTURES FOR EXPRESSION RECOGNITION 査読有り

J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

IJCNN '93-NAGOYA : PROCEEDINGS OF 1993 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3　1430-1433　1993年
機械設計支援用知識ベースの構築法について 査読有り

小林広明

情報処理学会グラフィックスとCADシンポジウム論文集　1991年
機械設計向き記述言語に関する研究 査読有り

小林広明

情報処理学会グラフィックスとCADシンポジウム論文集　1991年
統合化機械設計支援方式について 査読有り

小林広明

情報処理学会グラフィックスとCADシンポジウム論文集　1990年
Effective Parallel Processing for synthesizing Continuous Images 査読有り

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International 89　343-352　1989年
Load balancing strategies for a parallel ray-tracing system based on constant subdivision 査読有り

Hiroaki Kobayashi, Satoshi Nishimura, Hideyuki Kubota, Tadao Nakamura, Yoshiharu Shigei

The Visual Computer　4　(4)　197-209　1988年7月
出版者・発行元： Springer-Verlag
DOI： 10.1007/BF01887592 　

ISSN：0178-2789
A Strategy for Mapping Parallel Ray-Tracing into a Hypercube Multiprocessor System 査読有り

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International 88　1988年
Parallel processing of an object space for image synthesis using ray tracing 査読有り

Hiroaki Kobayashi, Tadao Nakamura, Yoshiharu Shigei

The Visual Computer　3　(1)　13-22　1987年2月
出版者・発行元： Springer-Verlag
DOI： 10.1007/BF02153647 　

ISSN：0178-2789
汎用パイプライン処理システムの性能評価 査読有り

小林広明

電子通信学会論文誌　J68-D　(10)　1985年
A Language Processor of an Intelligent Link System 査読有り

KOBAYASHI Hiroaki

Proceedings of the IEEE International Conference on Communications　1984年
汎用パイプライン処理システムの構成と評価 査読有り

小林広明

電子通信学会論文誌　J67-D　(12)　1984年

MISC 118

リアルタイム津波浸水被害推計シミュレーションの性能評価

撫佐昭裕, 岸谷拓海, 阿部孝志, 佐藤佳彦, 田野邊睦, 鈴木崇之, 村嶋陽一, 佐藤雅之, 小松一彦, 伊達進, 越村俊一, 小林広明

SENAC : 東北大学大型計算機センター広報　53　(2)　10-18　2020年4月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
リアルタイム津波浸水被害予測の全国展開に向けた検討

越村俊一, 阿部孝志, 井上拓也, 撫佐昭裕, 村嶋陽一, 鈴木崇之, 太田雄策, 日野亮太, 佐藤佳彦, 加地正明, 小林広明

SENAC : 東北大学大型計算機センター広報　52　(2)　2-8　2019年4月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
スーパーコンピュータによるリアルタイム津波浸水被害予測

越村俊一, 阿部孝志, 撫佐昭裕, 村嶋陽一, 鈴木崇之, 井上拓也, 太田雄策, 日野亮太, 佐藤佳彦, 加地正明, 小林広明

SENAC : 東北大学大型計算機センター広報　51　(1)　30-34　2018年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
HPGMG-FVを用いたSX-ACEの性能評価

江川隆輔, 磯部洋子, 加藤季広, 小松一彦, 滝沢寛之, 小林広明, 撫佐昭裕

SENAC : 東北大学大型計算機センター広報　50　(3)　15-18　2017年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
太陽光及び暑熱同時ばく露に対する熱中症リスク評価シミュレータの開発

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 小林広明, 曽根秀昭

電子情報通信学会論文誌 C(Web)　J100-C　(5)　2017年

ISSN： 1881-0217
『銅酸化物の有効モデルに対する揺らぎ交換近似』コードのSX-ACE 向け最適化

山下毅, 山崎国人, 江川隆輔, 吉岡匠哉, 土浦宏紀, 小林広明, 曽根秀昭

SENAC : 東北大学大型計算機センター広報　50　(1)　25-30　2017年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
防災減災に資するUrgent Computingへの挑戦（防災・減災に貢献するスーパーコンピュータの開発を目指して／東日本大震災の教訓と津波減災に向けてのシミュレーションの課題と展望／防災減災のための可視化と情報通信システム／JAMSTECのHPCシステムを利用した海溝型巨大地震の防災・減災への取り組み）

小林広明, 越村俊一, 下條真司, 有吉慶介

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集　(2016)　128-129　2016年5月30日
リアルタイム津波浸水被害予測技術の実証

越村俊一, 井上拓也, 日野亮太, 太田雄策, 小林広明, 撫佐昭裕, 村嶋陽一, 目黒公郎

地域安全学会梗概集(CD-ROM)　(38)　ROMBUNNO.C‐15　2016年5月
SX-ACEにおけるHPCG ベンチマークの性能評価

小松一彦, 江川隆輔, 磯部洋子, 緒方隆盛, 滝沢寛之, 小林広明

SENAC : 東北大学大型計算機センター広報　48　(3)　14-19　2015年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
ベクトルコンピュータにおける高速化

小林広明, 江川隆輔, 小松一彦, 岡部公起, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修, 曽我隆, 山口健太

SENAC : 東北大学大型計算機センター広報　48　(3)　20-51　2015年7月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
東北大学サイバーサイエンスセンター高速化推進研究活動報告書（第6号）

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 小松一彦, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修他

2015年4月
リアルタイム津波浸水・被害予測シミュレーションシステム開発の取り組み

大泉健治, 阿部孝志, 佐藤佳彦, 松岡浩司, 撫佐昭裕, 小林広明

SENAC : 東北大学大型計算機センター広報　48　(1)　54-57　2015年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
東北大学サイバーサイエンスセンターにおける分子動力学シミュレーションコードの高速化支援について

森谷友映, 佐々木大輔, 山下毅, 小野敏, 大泉健治, 小松一彦, 江川隆輔, 小林広明

SENAC : 東北大学大型計算機センター広報　47　(1)　51-56　2014年1月
出版者・発行元：東北大学サイバーサイエンスセンター
ISSN： 0286-7419
Heuristic Data Partitioning for Social Networking Service

Sugianto Angkasa, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(34)　1-8　2013年12月9日

Managing SNS data is expensive because SNS data grow explosively and are highly interconnected. Because of this high interconnectivity, every read/write activity of a user is associated with all of his/her friends. The response time for accessing SNS data generally increases if the data of users and their many connections (friends/followers) are widely distributed over the network. Most SNS providers are commercial companies and hence need a cost-effective solution to SNS data management. In this paper, we propose a heuristic data partitioning mechanism that stores all related data of a pair of users in the same place if they interact frequently. Moreover, our mechanism uses activity-based replication: more replicas are created for active users than for inactive users. In a performance evaluation against MySQL random partitioning using real Facebook and Twitter datasets, the proposed heuristic data partitioning and replication mechanism reduces the average response time of read and write accesses by 53% and 50%, respectively.
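A minimal Python sketch of interaction-based co-location and activity-based replication follows. It is a hypothetical greedy heuristic in the spirit of the abstract, not the authors' mechanism: user pairs are visited in descending order of interaction count, an unplaced user is put on its partner's server when that server has room (otherwise on the least-loaded server), and the replica count grows with a user's activity. The function names and the threshold parameter are illustrative assumptions.

```python
from typing import Dict, Hashable, Optional, Tuple


def greedy_partition(
    interactions: Dict[Tuple[Hashable, Hashable], int],
    num_servers: int,
    capacity: int,
) -> Dict[Hashable, int]:
    """Co-locate frequently interacting users (hypothetical greedy heuristic).

    Assumes num_servers * capacity is at least the number of distinct users.
    """
    placement: Dict[Hashable, int] = {}
    load = [0] * num_servers

    def place(user: Hashable, preferred: Optional[int]) -> None:
        if user in placement:
            return
        if preferred is not None and load[preferred] < capacity:
            server = preferred  # join the partner's server
        else:
            server = min(range(num_servers), key=lambda s: load[s])
        placement[user] = server
        load[server] += 1

    # Heaviest interactions first, so the strongest pairs get co-located.
    for (u, v), _count in sorted(interactions.items(), key=lambda kv: -kv[1]):
        place(u, placement.get(v))
        place(v, placement.get(u))
    return placement


def replica_count(activity: int, threshold: int, max_replicas: int) -> int:
    """Activity-based replication: one extra replica per `threshold` actions, capped."""
    return min(max_replicas, 1 + activity // threshold)
```

For interactions {("a", "b"): 10, ("c", "d"): 8, ("a", "c"): 1} on two servers of capacity 2, the heuristic places a and b on one server and c and d on the other.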
複合システムにおけるチェックポイントリスタート

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

日本信頼性学会誌　35　(12)　2013年12月

DOI： 10.11348/reajshinrai.35.8_515 　
三次元LSIの課題と高信頼化

小柳光正, 小林広明, 末吉敏則, 鎌田忠

日本信頼性学会誌　35　(12)　2013年12月

DOI： 10.11348/reajshinrai.35.8_471 　
マルチプラットフォームにおける最適化手法の効果に関する一検討

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(24)　1-7　2013年7月24日
出版者・発行元：一般社団法人情報処理学会

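HPC systems have become increasingly diverse in recent years, and there is strong demand for HPC code with high performance portability, that is, code that achieves high performance on multiple kinds of HPC systems with different characteristics. This research aims to analyze in detail how optimization techniques for various HPC systems affect the performance of HPC code, and to use those findings to develop HPC code with high performance portability. This report analyzes the performance portability of HPC code when different manual optimizations are combined with one another or with automatic optimizations. For each HPC system, we evaluate the synergistic effects of these combinations and discuss optimizations that may reduce performance portability.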
チューニング対象の限定による効率の良い性能可搬性向上手法

平澤将一, 秋葉諒, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2013　(19)　1-8　2013年5月22日
出版者・発行元：一般社団法人情報処理学会

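As computing systems diversify, existing scientific programs must often be ported to new systems and their performance optimized. However, porting and optimizing large scientific programs requires enormous effort, which has become a problem. This research proposes a method that efficiently improves the performance portability of an entire application by identifying the parts of the source code that should be optimized first when the goal is better performance portability. Evaluation with benchmark programs and a real application shows that the proposed method can narrow down the parts of the source code that must be optimized to efficiently improve the performance portability of the whole application.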
大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

安田一平, 小松一彦, 江川隆輔, 小林広明

研究報告計算機アーキテクチャ（ARC）　2012　(7)　1-6　2012年12月6日

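As the number of nodes in large-scale parallel systems has grown in recent years, drawing out their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. Methods that make it easy to analyze application performance on such large-scale systems are therefore needed. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, performance models that account for inter-node communication, and analysis and optimization methods based on such models, have not yet been established. This report proposes a system performance model that considers inter-node communication and investigates its validity on five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.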
大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

安田一平, 小松一彦, 江川隆輔, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(7)　1-6　2012年12月6日

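As the number of nodes in large-scale parallel systems has grown in recent years, drawing out their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. Methods that make it easy to analyze application performance on such large-scale systems are therefore needed. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, performance models that account for inter-node communication, and analysis and optimization methods based on such models, have not yet been established. This report proposes a system performance model that considers inter-node communication and investigates its validity on five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.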
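As a hedged illustration of a performance model that adds inter-node communication to per-node performance, the Python sketch below combines a roofline-style node term with a latency-bandwidth network term. The formula, the parameter names, and the assumption that computation and communication do not overlap are illustrative only; they are not the model proposed in these reports.

```python
def _terms(flops, mem_bytes, msg_bytes, n_msgs,
           peak_flops, mem_bw, net_bw, net_latency):
    """Per-resource time estimates in seconds (hypothetical model)."""
    return {
        "compute": flops / peak_flops,
        "memory": mem_bytes / mem_bw,
        # Latency-bandwidth cost: a fixed latency per message plus transfer time.
        "network": n_msgs * net_latency + msg_bytes / net_bw,
    }


def predicted_time(flops, mem_bytes, msg_bytes, n_msgs,
                   peak_flops, mem_bw, net_bw, net_latency):
    """Node part: roofline-style max of compute and memory time.
    Network part: added on top, assuming no compute/communication overlap."""
    t = _terms(flops, mem_bytes, msg_bytes, n_msgs,
               peak_flops, mem_bw, net_bw, net_latency)
    return max(t["compute"], t["memory"]) + t["network"]


def bottleneck(flops, mem_bytes, msg_bytes, n_msgs,
               peak_flops, mem_bw, net_bw, net_latency):
    """Name the single term that dominates the estimate."""
    t = _terms(flops, mem_bytes, msg_bytes, n_msgs,
               peak_flops, mem_bw, net_bw, net_latency)
    return max(t, key=t.get)
```

For 10^9 flops and 10^9 bytes of memory traffic on a node with 10^10 flop/s and 10^9 B/s, plus ten messages totalling 10^6 bytes over a 10^9 B/s network with 1 µs latency, predicted_time returns about 1.001 s and bottleneck reports "memory".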
履歴情報に基づくジョブスケジューリングによる広域ベクトルコンピュータ連携の実現

村田善智, 江川隆輔, 小林広明

電子情報通信学会技術研究報告. IA, インターネットアーキテクチャ　112　(236)　15-19　2012年10月5日
出版者・発行元：一般社団法人電子情報通信学会
ISSN： 0913-5685

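We have proposed the vector computing cloud, which federates vector computers over a wide area, as a next-generation high-performance computing infrastructure. A vector computing cloud consists of multiple computing sites, but because each site has its own job management policy, efficient job execution is difficult. This paper proposes a history-based job scheduling method that efficiently executes parallel jobs, such as MPI applications, in a wide-area federated environment. The proposed job scheduler first uses past job execution history to predict, for each computing site, the waiting time until a job starts. The scheduler then searches for the combination of sites whose predicted waiting times differ the least and allocates those sites to the parallel job. Simulation-based evaluation shows that the proposed method improves the utilization of computing resources compared with a conventional round-robin method.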
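The two steps of the scheduler described above (predict each site's waiting time from history, then pick the sites whose predictions are closest) can be sketched in Python. Predicting with a moving average over the last few waits is an assumption, as is measuring "difference" as the range (max minus min) of the chosen sites' predictions; the paper's predictor and selection criterion may differ. Keeping the range small aims to let all parts of a multi-site MPI job start at about the same time.

```python
from statistics import mean
from typing import Dict, List, Sequence


def predict_wait(history: Sequence[float], window: int = 5) -> float:
    """Predict a site's queue waiting time as the mean of its last `window` waits."""
    recent = list(history[-window:])
    return float(mean(recent)) if recent else 0.0


def choose_sites(histories: Dict[str, Sequence[float]], k: int, window: int = 5) -> List[str]:
    """Pick k sites whose predicted waits differ the least (smallest max - min)."""
    preds = sorted((predict_wait(h, window), site) for site, h in histories.items())
    best_spread, best_group = None, []
    # After sorting by prediction, the k sites with the smallest range are
    # always contiguous, so a sliding window over the sorted list suffices.
    for i in range(len(preds) - k + 1):
        group = preds[i:i + k]
        spread = group[-1][0] - group[0][0]
        if best_spread is None or spread < best_spread:
            best_spread, best_group = spread, [site for _, site in group]
    return best_group
```

For sites with recent waits A: 10, 12, 14; B: 100, 110; C: 13, 13; D: 50 (in any time unit), choose_sites(..., k=2) selects A and C, whose predicted waits of 12 and 13 differ the least.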
A Portable Build System Cooperating with an Integrated Development Environment

平澤将一, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2012　(28)　1-8　September 26, 2012

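This study develops a portable build system as a step toward a framework for developing applications while maintaining performance portability. The configurations of current high-performance computing (HPC) systems are increasingly complex, and it is difficult to predict an application's sustained performance without running it. This study therefore envisions supporting performance portability by periodically running the application under development, implicitly collecting its performance profiles, identifying the parts with low performance portability, and presenting them interactively to the programmer. Such a development aid must be able to implicitly build and run the application under development on various systems. We develop a build system with this portability and discuss the functions required of an application development support environment.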
Implementation and Performance Evaluation of a Nanopowder Growth Application with OpenACC

菅原誠, 小松一彦, 平澤将一, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2012　(10)　1-7　September 26, 2012

This paper implements, with both OpenACC and OpenCL, the plasma-assisted nanopowder growth simulation, which models collective particle formation in the creation of nanopowders with thermal plasma, and compares the two implementations. OpenACC provides compiler directives that let an existing application use GPUs simply by adding directives, whereas OpenCL is a lower-level programming model. Because the two offer programming models at different abstraction levels, they permit different optimizations of a given application code. Several versions of this practical application are therefore implemented with different optimizations, and the performance impact of each optimization is discussed through experimental results. The evaluation results show that OpenACC and OpenCL achieve up to about 1.9x and 5.6x speedups over CPU execution, respectively. The paper also discusses the performance limit currently achievable with OpenACC, and shows that low-level tuning such as OpenCL programming is required to reach performance comparable with OpenCL.
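Performance Evaluation of BCM on Large-Scale Computing Systems

小松一彦, 曽我隆, 江川隆輔, 滝沢寛之, 小林広明

SENAC: Bulletin of the Large-Scale Computer Center, Tohoku University　45　(3)　17-25　July 2012
Publisher: Cyberscience Center, Tohoku University
ISSN: 0286-7419
Performance Evaluation of a Wide-Area Federation Infrastructure for Vector Supercomputers

山下毅, 村田善智, 江川隆輔, 小野敏, 大泉健治, 小林広明

SENAC　45　(1)　42-45　January 2012
A Study of Circuit Partitioning Methods for 3D-Stacked Floating-Point Multipliers (Component Parts and Materials)

川合一茂, 多田十兵衛, 江川隆輔, 小林広明, 後藤源助

IEICE Technical Report　111　(326)　67-72　November 28, 2011
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685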

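3D stacking technology has recently attracted attention as a means of further improving LSI performance. When an arithmetic circuit is implemented with 3D stacking, it is divided into several sub-circuits according to a circuit partitioning method, and each sub-circuit is placed on one layer, so the partitioning method strongly affects the circuit's performance. This study proposes a circuit partitioning method for floating-point multipliers that focuses on the critical path and the circuit size. To avoid inserting TSVs into the critical path as far as possible, the proposed method places the critical path of the significand multiplier, the normalization unit, and the rounding unit on the same layer. Simulation results show that a 3D-stacked floating-point multiplier partitioned with the proposed method is up to 8% faster for single precision and up to 17% faster for double precision than a 2D implementation.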
Performance Evaluation of GPU Computing Based on Automatic Program Generation

菅原誠, 佐藤功人, 小松一彦, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2011　(18)　1-7　July 20, 2011

Heterogeneous computing systems that accelerate computation by using Graphics Processing Units (GPUs) as accelerators have recently been attracting much attention in computational science. However, using GPUs requires porting existing programs to GPU programs, and the porting cost is a problem. To reduce this effort, this paper focuses on technology that automatically generates a GPU program when directives are inserted into an existing sequential code, and evaluates its practicality and the sustained performance of the generated program. We also show the code optimizations achievable with directives. A simple matrix multiplication program is used to demonstrate that the automatically generated code can achieve practical sustained performance.
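A Deadline-Aware Client-Level Scheduling Method for Volunteer Computing

村田善智, 遠藤聡明, 江川隆輔, 滝沢寛之, 小林広明

Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS)　2011　45-54　May 18, 2011
A Program Optimization Strategy for Vector Processors Based on the Roofline Model

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

IPSJ Transactions on Advanced Computing Systems (ACS)　4　(3)　77-87　May 12, 2011

ISSN: 1882-7829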

Over the last decade, the ratio of memory bandwidth to peak computational performance (Bytes/Flop, B/F) of vector processors has decreased. To compensate for the insufficient B/F, modern vector processors are equipped with an on-chip vector cache. The purpose of this work is to establish a program optimization strategy that achieves high execution efficiency on vector processors with a cache memory. When several tuning techniques are applied to an application, each involves trade-offs in its tuning parameters, and when the techniques are combined their parameters affect each other, so a strategy that systematically finds a good overall trade-off is required. This paper proposes such a strategy for modern vector processors based on the roofline model. We focus on two important loop transformations, loop unrolling and cache blocking. To decide which of them to apply first, the roofline model is used to analyze the performance bottleneck of the target application, and the optimization that is effective at removing that bottleneck is applied preferentially. The number of loop unrolls and the cache blocking size are determined by a greedy search algorithm. The sustained performance and energy consumption obtained with the strategy are evaluated on several applications.
The evaluation results show that the strategy improves sustained performance and also drastically reduces energy consumption.
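The greedy parameter search mentioned above can be sketched as a coordinate-by-coordinate sweep. This is a minimal illustration, not the paper's implementation: in the paper the roofline analysis decides which transformation is tuned first, whereas here `order` is simply passed in, and the `measure` callback stands in for actually running the tuned code.

```python
from typing import Callable, Sequence


def greedy_tune(space: dict[str, Sequence[int]], order: Sequence[str],
                measure: Callable[[dict[str, int]], float]) -> dict[str, int]:
    """Fix one tuning parameter at a time, in the given order.

    Each parameter is swept over its candidates while parameters already
    decided keep their chosen values and undecided ones stay at their first
    candidate; the candidate with the smallest measured time is kept.
    """
    config = {name: values[0] for name, values in space.items()}
    for name in order:
        best_value, best_time = config[name], float("inf")
        for value in space[name]:
            t = measure({**config, name: value})
            if t < best_time:
                best_value, best_time = value, t
        config[name] = best_value
    return config
```

For example, with a hypothetical cost such as `abs(c["unroll"] - 4) + abs(c["block"] - 64) / 16` and candidates `{"unroll": [1, 2, 4, 8], "block": [16, 32, 64, 128]}`, either order returns `{"unroll": 4, "block": 64}`. When the parameters interact, as the abstract notes they do, greedy search can stop at a different configuration depending on the order, which is why choosing the order from the bottleneck analysis matters.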
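Program Optimization Techniques for Chip Multi-Vector Processors

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

SENAC: Newsletter of the Large-Scale Scientific Computing System, Information Synergy Center, Tohoku University　44　(2)　29-36　April 2011
Cyberscience Center, Tohoku University: Report on Research Activities for Promoting Code Acceleration (No. 5)

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 伊藤英一, 大泉健治, 小野敏, 小久保達信, 橋本ユキ子, 磯部洋子, 撫佐昭裕, 神山典, 金野浩伸

April 2011
A Study of Circuit Partitioning Methods for 3D-Stacked Multipliers

坂井一仁, 多田十兵衛, 江川隆輔, 小林広明, 後藤源助

IEICE Technical Report. ICD, Integrated Circuits　110　(344)　153-158　December 9, 2010
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685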

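3D stacking technology has recently attracted attention as a means of further improving LSI performance. A 3D implementation of an arithmetic circuit requires dividing the circuit into several sub-circuits and stacking them. Because the partitioning method strongly affects the circuit's performance, the best partitioning method must be studied for each kind of arithmetic circuit. This study proposes a partitioning method that improves performance by keeping the number of vertical wires as small as possible. Using a multiplier as the target circuit, we evaluate the effect of a conventional partitioning method and the proposed method on the maximum delay of the circuit. Simulation results show that the proposed method achieves up to a 20% speedup over the conventional multiplier.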
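Energy Consumption Evaluation of a Chip Multi-Vector Processor Using Real Applications

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2010　(3)　1-8　December 9, 2010
Publisher: Information Processing Society of Japan
ISSN: 1884-0930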

Vector supercomputers enable high-precision, large-scale scientific simulations, but the high memory bandwidth and large memory capacity that support their high execution efficiency consume a great deal of power. Future vector supercomputers must therefore be designed for low power consumption as well as high performance. The chip multi-vector processor (CMVP) has been proposed as an architecture that achieves high sustained performance with low energy consumption, but it has not yet been evaluated from the viewpoint of energy consumption. In this paper, we first establish an energy consumption model of the CMVP, and then use real applications to evaluate the effectiveness of the vector cache in the CMVP, comparing several designs with different hardware parameters.
An Out-of-order Vector Processing Mechanism for Multimedia Applications (Computer Architecture (ARC)) -- (Processor Architecture)

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ SIG Technical Report on Computer Architecture (ARC)　2010　(24)　1-10　July 27, 2010
Publisher: Information Processing Society of Japan
ISSN: 0919-6072

Multimedia applications (MMAs) now form an important workload for general-purpose processors. Vector processing is considered one of the most promising approaches for MMAs because of the abundant data-level parallelism they contain. However, traditional vector architectures follow an in-order issue policy (IIP), which blocks subsequent instructions from being issued whether or not they are ready. This paper proposes a media-oriented vector architectural extension with an out-of-order vector processing mechanism (OVPM), which overcomes the inefficient utilization of the memory bandwidth and the vector functional units. As a result, the proposed architecture achieves higher performance at lower hardware cost than the traditional one. The paper evaluates the proposed architecture across its design parameters and identifies the most efficient size of the vector architecture for executing MMAs.
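The blocking behavior of an in-order issue policy can be seen in a toy single-issue model. This sketch is only a generic illustration of in-order versus out-of-order issue, not the OVPM itself; `issue_cycles` and the one-issue-per-cycle assumption are hypothetical simplifications.

```python
def issue_cycles(ready: list[int], in_order: bool) -> list[int]:
    """Cycle at which each instruction issues, with at most one issue per cycle.

    ready[i] is the first cycle at which instruction i's operands are ready.
    In-order issue stalls every younger instruction behind an unready head;
    out-of-order issue takes the oldest ready instruction in each cycle.
    """
    issued = [-1] * len(ready)
    pending = list(range(len(ready)))  # program order
    cycle = 0
    while pending:
        candidates = pending[:1] if in_order else pending
        for i in candidates:
            if ready[i] <= cycle:
                issued[i] = cycle
                pending.remove(i)
                break
        cycle += 1
    return issued
```

With `ready = [5, 0, 0, 1]` (for instance, a long-latency vector load at the head), in-order issue gives `[5, 6, 7, 8]` because the three independent instructions wait behind the head, while out-of-order issue gives `[5, 0, 1, 2]`, keeping functional units busy in the meantime.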
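Building a Next-Generation HPC Infrastructure through Wide-Area Vector Computer Federation (3.2 The 8th Information Synergy Workshop, 3. Research Activity Reports)

村田善智, 江川隆輔, 東田学, 小林広明

Annual Report　9　94-98　July 2010
Publisher: Cyberscience Center, Tohoku University
Performance Evaluation of GPU Computing with OpenCL

荒井勇亮, 佐藤功人, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2010　(11)　1-7　February 15, 2010
Publisher: Information Processing Society of Japan
ISSN: 0919-6072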

OpenCL has recently become available as a new open programming standard for GPGPU programming, in addition to CUDA. In this paper, we quantitatively evaluate the performance difference between CUDA and OpenCL programs. First, we develop CUDA and OpenCL programs that perform almost the same computations and compare their performance. We then investigate the main cause of the performance differences and show that the current OpenCL compiler does not support several compiler optimizations that the CUDA compiler performs. Finally, our results show that OpenCL programs achieve performance comparable to CUDA programs when the code generated by the OpenCL compiler is manually optimized in the same way as the CUDA compiler does. These results suggest that, once the OpenCL compiler supports those optimizations, OpenCL code obtained by simply translating CUDA code may achieve the same performance as the original CUDA code.
High performance computing on vector systems 2009

Michael Resch, Sabine Roller, Katharina Benkert, Martin Galle, Wolfgang Bez, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2009　1-250　2010
Publisher: Springer Berlin Heidelberg
DOI: 10.1007/978-3-642-03913-3
Implementation and Evaluation of a Checkpoint/Restart Function for CUDA Applications

滝沢寛之, 佐藤功人, 小松一彦, 小林広明

IPSJ SIG Technical Report [High Performance Computing]　122　(7)　G1-G7　October 9, 2009
Publisher: Information Processing Society of Japan
ISSN: 0919-6072

In this paper, a tool named CheCUDA is designed to enable checkpoint/restart of CUDA applications. So that an existing checkpoint/restart implementation can checkpoint CUDA applications, CheCUDA is built as an add-on package that records GPU status changes in main memory at each CUDA API call. This paper demonstrates that our prototype of CheCUDA can correctly checkpoint and restart some CUDA applications. It is also shown that CheCUDA can restart a CUDA process on a PC whose environment differs from that of the PC that generated the checkpoint file. CheCUDA is therefore useful not only for enhancing the dependability of CUDA applications but also for task migration. The paper also quantitatively evaluates the timing overhead of checkpointing.
RC-008 Client-Level Scheduling for Efficient Volunteer Computing (Hardware Architecture, Refereed Paper)

村田善智, 遠藤聡明, 滝沢寛之, 小林広明

Proceedings of the Forum on Information Technology (FIT)　8　(1)　165-172　August 20, 2009
Publisher: FIT (IEICE and IPSJ) Steering Committee
C-024 An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer to Peer Environment

Satayapiwat Chainan, Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

Proceedings of the Forum on Information Technology (FIT)　8　(1)　491-494　August 20, 2009
Publisher: FIT (IEICE and IPSJ) Steering Committee

Many market-based approaches have recently been studied as promising alternatives for the resource allocation problem. Auction-based approaches in particular are widely chosen because of their distributed nature and relatively low complexity. However, using an auction to allocate jobs is suitable only for environments with homogeneous resources. This paper proposes an auction-based resource allocation mechanism that enables resource allocation in a heterogeneous environment while minimizing user input. Our preliminary results show that the mechanism improves the performance of important jobs under high load.
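C-023 Performance Evaluation toward a BLAS with Automatic Processor Selection (Hardware Architecture, General Paper)

小松一彦, 小山賢太郎, 佐藤功人, 滝沢寛之, 小林広明

Proceedings of the Forum on Information Technology (FIT)　8　(1)　485-490　August 20, 2009
Publisher: FIT (IEICE and IPSJ) Steering Committee
Program Optimization Techniques for Vector Processors with a Cache Memory

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

IPSJ SIG Technical Report on Computer Architecture (ARC)　2009　(6)　1-10　July 28, 2009
Publisher: Information Processing Society of Japan
ISSN: 0919-6072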

Because the ratio of memory bandwidth to computational performance (B/F) of vector processors has recently been decreasing, there is concern that the sustained performance of future vector processors will degrade. To mitigate this degradation, a vector cache memory with high bandwidth has been proposed and its effectiveness demonstrated. The purpose of this report is to establish program optimization techniques that further exploit the performance of the vector cache. To analyze the relationship between the vector cache and sustained performance, the report first builds a performance model of vector processors with a vector cache based on the roofline model. Several optimization techniques are then applied to real applications, and their effects are assessed with the performance model.
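A minimal sketch of a roofline bound with a cache, assuming one bandwidth ceiling per memory level (an extension often called a hierarchical roofline). This is the standard textbook form, not the specific model built in the report; the level list and all numbers are illustrative.

```python
def attainable_gflops(peak_gflops: float, flops: float,
                      levels: list[tuple[float, float]]) -> float:
    """Roofline bound with one bandwidth ceiling per memory level.

    levels holds (bandwidth in GB/s, bytes moved at that level) pairs, e.g.
    off-chip memory and the on-chip cache. The arithmetic intensity at each
    level is flops / bytes, and performance can exceed neither the peak nor
    bandwidth * intensity at any level.
    """
    bound = peak_gflops
    for bandwidth_gbs, nbytes in levels:
        bound = min(bound, bandwidth_gbs * (flops / nbytes))
    return bound


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity (flop/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs
```

For a kernel doing 10 units of flops per 40 bytes of off-chip traffic on a 100 GFLOP/s, 256 GB/s processor, the bound is 64 GFLOP/s (memory-bound). If a cache filters the off-chip traffic down to 20 bytes, the memory ceiling rises to 128 GFLOP/s and the kernel reaches the 100 GFLOP/s peak, provided the cache ceiling is not lower. This is the sense in which a cache compensates for a low B/F.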
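Large-Scale Parallel Simulation on the SX-9 (3.2 The 7th Information Synergy Workshop, 3. Research Activity Reports)

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明, 高橋俊, 中橋和博

Annual Report　8　88-93　July 2009
Publisher: Cyberscience Center, Tohoku University
Report on the Creative Engineering Seminar: Experiencing the Fascination of Computational Science and Computer Science with a Supercomputer

滝沢寛之, 江川隆輔, 笹尾泰洋, 佐野健太郎, 山本悟, 小林広明

SENAC: Newsletter of the Large-Scale Scientific Computing System, Cyberscience Center, Tohoku University　42　(2)　87-90　February 2009
Application of Large-Scale Incompressible Flow Simulation to Engineering Problems

高橋俊, 石田崇, 中橋和博, 小林広明, 岡部公起, 下村陽一, 曽我隆, 撫佐昭裕

SENAC: Bulletin of the Large-Scale Computer Center, Tohoku University　42　(1)　107-114　January 2009
Publisher: Cyberscience Center, Tohoku University
ISSN: 0286-7419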
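624 A Study of GPU Computing Considering Energy Consumption (OS3. GPGPU Computing (3), Organized Session)

滝沢寛之, 佐藤功人, 小林広明

Proceedings of the Computational Mechanics Conference　2008　(21)　558-559　November 1, 2008
Publisher: The Japan Society of Mechanical Engineers
ISSN: 1348-026X
Initiatives of the Cyberscience Center, Tohoku University, and Performance Evaluation of the SX-9 (Special Issue: Supercomputer SX-9)

小林広明, 江川隆輔, 岡部公起

NEC Technical Journal　61　(4)　58-65　October 2008
Publisher: NEC Corporation
ISSN: 0285-4139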
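RC-006 A Study of the Hardware Design of a Way-Allocatable Shared Cache Mechanism (Hardware Architecture, Refereed Paper)

阿部健太, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

Proceedings of the Forum on Information Technology (FIT)　7　(1)　35-38　August 20, 2008
Publisher: FIT (IEICE and IPSJ) Steering Committee
Language Extensions and Automatic Optimization Techniques for Efficient Use of GPUs

佐藤功人, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on High Performance Computing (HPC)　2008　(74)　199-204　July 29, 2008
Publisher: Information Processing Society of Japan
ISSN: 0919-6072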

GPUs are graphics processors, yet they are also highly effective at accelerating general-purpose computation and are being used in a variety of applications. Because of their unique hardware architecture, GPUs deliver high computational performance only when various constraints are satisfied. We have proposed SPRAT, a programming language for hybrid systems of CPUs and GPUs; achieving both program portability and high execution efficiency requires extending the language to match processor characteristics and optimizing the code automatically. This paper proposes automatic GPU code optimization techniques that exploit the shared memory and reduce the impact of misaligned memory accesses. The results show that adjusting memory accesses significantly improves the sustained performance of edge detection and LU decomposition.
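A Study of a Cache Mechanism for Next-Generation Vector Processors (3.2 The 6th Information Synergy Workshop, 3. Research Activity Reports)

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

Annual Report　7　89-93　July 2008
Publisher: Cyberscience Center, Tohoku University
The New Large-Scale Scientific Computing System SX-9 and Its Performance Evaluation (3.2 The 6th Information Synergy Workshop, 3. Research Activity Reports)

江川隆輔, 大泉健治, 伊藤英一, 岡部公起, 小林広明

Annual Report　7　85-88　July 2008
Publisher: Cyberscience Center, Tohoku University
A Stream Processing Description Language for GPU Computing

滝沢寛之, 佐藤功人, 小林広明

Journal of the Visualization Society of Japan, Suppl.　28　(1)　271-274　July 1, 2008
Publisher: The Visualization Society of Japan
ISSN: 0916-4731
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination (Computing Systems Vol.1 No.1)

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Transactions on Advanced Computing Systems (ACS)　1　(1)　85-95　June 26, 2008
Publisher: Information Processing Society of Japan
ISSN: 1882-7829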

Although ray tracing is the best approach to high-quality image synthesis, generating images takes much time because of the huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time of ray tracing, and faster intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast intersection algorithm that makes good use of ray and object coherence. The algorithm exploits the fact that the rays in a bundle share the same origin and are highly coherent, and accelerates the tests by removing redundant calculations from the innermost intersection tests for the bundles through precomputation and early termination.
Experimental results show that, by exploiting these features of ray bundles, the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm.
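Why a shared origin helps can be sketched with the Möller–Trumbore test (Möller's algorithm, the baseline named above). Every term that depends only on the origin and the triangle can be computed once per bundle, and scalar triple products turn the determinant and the u coordinate into one dot product each per ray. This illustrates the idea under those assumptions and is not the paper's algorithm; `precompute`, `intersect_bundle`, and the early-termination placement are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def precompute(origin: Vec, v0: Vec, v1: Vec, v2: Vec) -> tuple:
    """Per-triangle terms shared by every ray that starts at `origin`."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    t_vec = sub(origin, v0)
    q = cross(t_vec, e1)
    # Scalar triple products let det and u become one dot product per ray:
    # det = e1 . (d x e2) = d . (e2 x e1),  u*det = t . (d x e2) = d . (e2 x t)
    return cross(e2, e1), cross(e2, t_vec), q, dot(e2, q)


def intersect_bundle(pre: tuple, dirs: list[Vec], eps: float = 1e-12) -> list[float]:
    """Hit distance for each ray direction (math.inf when the ray misses)."""
    n, r, q, t_num = pre
    hits = []
    for d in dirs:
        det = dot(d, n)
        if abs(det) < eps:                 # ray parallel to the triangle
            hits.append(math.inf)
            continue
        inv = 1.0 / det
        u = dot(d, r) * inv
        if u < 0.0 or u > 1.0:             # early termination: skip v and t
            hits.append(math.inf)
            continue
        v = dot(d, q) * inv
        t = t_num * inv                    # t * det is the same for all rays
        hits.append(t if v >= 0.0 and u + v <= 1.0 and t > eps else math.inf)
    return hits
```

After precomputation, each ray costs at most three dot products and one division, instead of two cross products and several dot products in the per-ray Möller–Trumbore test.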
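Performance Evaluation of a Cache Memory for Vector Processors

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

IPSJ Symposium Series　2008　(2)　55　January 17, 2008

ISSN: 1344-0640
High-Density Numerical Computation with the Building-Cube Method on a Vector-Parallel Computer

高橋俊, 石田崇, 中橋和博, 小林広明, 岡部公起, 下村陽一, 曽我隆, 撫佐昭裕

Proceedings of the Fluid Dynamics Conference / Aerospace Numerical Simulation Technology Symposium　40th-2008　433-434　2008
Introduction to MPI Programming

野口孝明, 曽我隆, 金野浩伸, 撫佐昭裕, 大泉健治, 小野敏, 伊藤英一, 岡部公起, 江川隆輔, 小林広明

SENAC: Bulletin of the Large-Scale Computer Center, Tohoku University　40　(4)　69-94　October 2007
Publisher: Super-Computing System Information Synergy Center, Tohoku University
ISSN: 0286-7419
I-004 A Parallel Image Generation Algorithm Based on Photon Map Partitioning (Field I: Graphics and Images)

田村壮秀, 滝沢寛之, 小林広明

Proceedings of the Forum on Information Technology (FIT), General Papers　6　(3)　203-206　August 22, 2007
Publisher: FIT (IEICE and IPSJ) Steering Committee
A Study of Dynamic Task Assignment to CPUs and GPUs Based on Runtime Performance Prediction

白取寛貴, 滝沢寛之, 小林広明

IEICE Technical Report. CPSY, Computer Systems　107　(175)　37-42　August 2, 2007
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685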

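Research on using graphics processing units (GPUs) for general-purpose computation (GPGPU) has shown that a PC with a high-performance CPU and GPU can serve as a heterogeneous parallel computing system. Programming such systems, however, has become complex, and research has been conducted on unifying the program descriptions that run on CPUs and GPUs so that they can be used efficiently. Even so, many current GPGPU development tools require the processor that executes a program to be chosen manually and statically. Because the appropriate choice depends on runtime information, efficiency can be improved further by predicting the appropriate processor dynamically at runtime. This report examines the effectiveness of dynamically predicting the appropriate processor from estimates of a program's execution time on the CPU and the GPU and of the cost of switching the executing processor. Experimental results show that, apart from data transfer between the CPU and the GPU, the cost of switching between them is small, suggesting that dynamic switching can improve performance when the prediction error is sufficiently small relative to the execution time.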
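The selection rule described above, which weighs predicted execution times against the cost of switching processors, can be sketched as follows. The predictors themselves are outside the sketch; `choose_processor`, its parameters, and the cost split into a fixed switching cost plus a transfer cost are assumptions for illustration.

```python
def choose_processor(pred_cpu: float, pred_gpu: float, current: str,
                     switch_cost: float, transfer_cost: float) -> str:
    """Pick the processor with the lowest predicted time including switching.

    Moving to the other processor adds a fixed switching cost plus the data
    transfer time; staying on the current processor adds nothing.
    """
    costs = {"cpu": pred_cpu, "gpu": pred_gpu}
    for proc in costs:
        if proc != current:
            costs[proc] += switch_cost + transfer_cost
    return min(costs, key=costs.get)
```

A task currently on the CPU moves to the GPU only when the predicted GPU saving exceeds the switching and transfer overhead; with predictions of 10.0 and 4.0 time units and 2.1 units of overhead it switches, but with 10.0 and 9.0 it stays. Prediction error directly shifts this decision boundary, which matches the report's condition that the error be small relative to the execution time.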
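Performance Evaluation of a Way-Allocatable Shared Cache Mechanism

小寺功, 江川隆輔, 滝沢寛之, 小林広明

IPSJ SIG Technical Report on Computer Architecture (ARC)　2007　(79)　31-36　August 1, 2007
Publisher: Information Processing Society of Japan
ISSN: 0919-6072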

We have proposed a way-allocatable shared cache mechanism for chip multiprocessors that reduces power consumption while maintaining performance by combining cache partitioning with power gating of part of the cache. The mechanism defines a metric of cache access locality and uses it for both cache partitioning and power gating. Based on this metric, the mechanism can be flexibly configured to be either performance-oriented or power-oriented. This paper evaluates the effectiveness of the mechanism using benchmarks with different cache access behaviors. The results show that the mechanism appropriately partitions the shared cache for applications with high locality. In addition, in the performance-oriented configuration it reduces energy consumption by about 28% while improving performance by about 0.3% on average.
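The abstract does not define the locality metric, so this sketch substitutes a common utility-based rule to show how partitioning and power gating can be driven by one quantity: each way goes to the core that would gain the most additional hits from it (such marginal gains are typically derived from stack-distance counters), and once the best remaining gain drops below a threshold the remaining ways stay unallocated as power-gating candidates. `allocate_ways` and `gate_threshold` are hypothetical; a lower threshold corresponds loosely to a performance-oriented setting.

```python
def allocate_ways(marginal_hits: dict[str, list[int]], total_ways: int,
                  gate_threshold: int) -> tuple[dict[str, int], int]:
    """Greedy way partitioning with power gating of low-utility ways.

    marginal_hits[core][k] is the number of extra hits the core would get
    from its (k+1)-th way. Each way goes to the core with the largest next
    gain; once the best gain falls below gate_threshold, the remaining ways
    are left unallocated (candidates for power gating).
    """
    alloc = {core: 0 for core in marginal_hits}
    for _ in range(total_ways):
        best_core, best_gain = None, -1
        for core, gains in marginal_hits.items():
            k = alloc[core]
            gain = gains[k] if k < len(gains) else 0
            if gain > best_gain:
                best_core, best_gain = core, gain
        if best_gain < gate_threshold:
            break
        alloc[best_core] += 1
    return alloc, total_ways - sum(alloc.values())
```

With marginal gains `{"c0": [100, 50, 10, 1], "c1": [80, 5, 1, 0]}`, eight ways, and a threshold of 8, core c0 receives three ways, c1 one, and four ways are left for power gating; with a threshold of 0 all eight ways are allocated.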
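SC|06 Survey Report (3.2 The 5th Information Synergy Workshop, 3. Research Activity Reports)

小野敏, 滝沢寛之, 小林広明

Annual Report　6　83-87　July 2007
Publisher: Information Synergy Center, Tohoku University
SC|05 Survey Report (3.2 The 4th Information Synergy Workshop, 3. Research Activities)

大泉健治, 伊藤英一, 滝沢寛之, 小林広明

Annual Report　5　71-74　June 2006
Publisher: Information Synergy Center, Tohoku University
A Runtime Optimization Method for Redundant Task Dispatch on P2P Computing Platforms (3.2 The 4th Information Synergy Workshop, 3. Research Activities)

Wang Hong, Takizawa Hiroyuki, Kobayashi Hiroaki

Annual Report　5　100-105　June 2006
Publisher: Information Synergy Center, Tohoku University
Performance Evaluation of a Large-Scale Scientific Computing System Using Real Simulation Codes (3.2 The 4th Information Synergy Workshop, 3. Research Activities)

滝沢寛之, 岡部公起, 伊藤英一, 撫佐昭裕, 曽我隆, 伊藤学, 小林広明

Annual Report　5　78-83　June 2006
Publisher: Information Synergy Center, Tohoku University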
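The SX-7, Tohoku University's Supercomputer Rated Number One in the World

小林広明

Bulletin of the Sendai City Medical Association　(504)　8-10　2006
The SX-7, the World's Top Supercomputer Contributing to a Safe and Secure Society

小林広明

Manabi no Mori (Tohoku University): An Invitation to Intellectual Exploration　3　32-33　2006
A Weighted Voting Method for Integrating Multiple Character Recognition Engines

金子勝一朗, 後藤英昭, 小林広明

IEICE Technical Report. PRMU, Pattern Recognition and Media Understanding　105　(477)　13-18　December 15, 2005
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685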

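Integrating multiple character recognition engines by majority voting is known to improve recognition accuracy. Conventional methods, however, do not take into account how well each engine suits the input image. Separately, a method has been proposed that automatically switches among multiple recognition engines according to their suitability for the input image. Building on these two approaches, this report proposes a new method for integrating multiple character recognition engines. In recognition experiments on 14 character sets of 3,195 character classes each, the proposed method raised the recognition rate from 97.82% with conventional majority voting to 98.23%.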
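Weighted voting generalizes the majority vote above by giving each engine a weight. How the report derives the weights is not described in the abstract, so in this sketch the weights are simply supplied by the caller, standing in for an engine's estimated suitability for the current image; `weighted_vote` is a hypothetical name.

```python
from collections import defaultdict


def weighted_vote(candidates: dict[str, str], weights: dict[str, float]) -> str:
    """Combine per-engine recognition results by weighted voting.

    candidates maps engine name -> recognized character; weights maps engine
    name -> reliability weight (engines without a weight count as 1.0).
    Ties go to the character that appears first in candidates.
    """
    scores: dict[str, float] = defaultdict(float)
    for engine, char in candidates.items():
        scores[char] += weights.get(engine, 1.0)
    return max(scores, key=scores.get)
```

With equal weights this reduces to the conventional majority vote: if engines A, B, and C read "O", "0", "0", the result is "0". If engine A is known to suit this kind of image and has weight 3.0, its reading "O" wins instead.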
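Performance Evaluation of a Large-Scale Scientific Computing System Using Real Simulation Codes

小林広明, 岡部公起, 撫佐昭裕, 曽我隆, 松村佳昭, 伊藤学

SENAC: Bulletin of the Large-Scale Computer Center, Tohoku University　38　(4)　39-59　October 2005
Publisher: Super-Computing System Information Synergy Center, Tohoku University
ISSN: 0286-7419
Performance Evaluation of SX Systems in the HPC Challenge (3.2 The 3rd Information Synergy Workshop, 3. Research Activities)

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

Annual Report　4　98-116　May 2005
Publisher: Information Synergy Center, Tohoku University
Performance Evaluation of SX Systems in the HPC Challenge

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

SENAC: Newsletter of the Large-Scale Scientific Computing System, Information Synergy Center, Tohoku University　38　(1)　5-28　January 2005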
A new dynamical domain decomposition method for parallel molecular dynamics simulation

V. Zhakhovskii, K. Nishihara, Y. Fukuda, S. Shimojo, T. Akiyama, S. Miyanaga, H. Sone, H. Kobayashi, E. Ito, Y. Seo, M. Tamura, Y. Ueshima

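2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005　2　848-854　2005

DOI: 10.1109/CCGRID.2005.1558650
Analysis and Comparison of Frequency Features for Text Region Detection in Scene Images

齋藤靖二, 後藤英昭, 小林広明

IEICE Technical Report. PRMU, Pattern Recognition and Media Understanding　104　(523)　31-36　December 16, 2004
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685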

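Several methods using frequency features, such as the discrete cosine transform (DCT) and the wavelet transform, have been proposed for extracting text regions from images. Previous studies have mainly evaluated them by final text extraction accuracy, and analysis and comparison of the quality of the features themselves have been insufficient. This report analyzes and compares the DCT and the wavelet transform using Fisher's discriminant criterion and derives features suited to text region extraction. We also propose an unsupervised method for setting the threshold that separates text from non-text. The experiments show that analysis and comparison based on Fisher's criterion make it possible to select DCT coefficients in appropriate frequency ranges and obtain good features, and text extraction experiments using these features achieved higher extraction accuracy.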
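Fisher's discriminant criterion for a single feature is the squared difference of the class means divided by the sum of the class variances; ranking candidate features (for example, individual DCT coefficients) by this ratio is one straightforward way to apply it. A minimal sketch, assuming labeled feature samples for text and non-text blocks; the function names and sample values are hypothetical, and the report's unsupervised threshold method is not shown.

```python
from statistics import mean, pvariance


def fisher_ratio(text_vals: list[float], nontext_vals: list[float]) -> float:
    """Fisher's criterion for one feature: (m1 - m2)^2 / (s1^2 + s2^2)."""
    m1, m2 = mean(text_vals), mean(nontext_vals)
    v1, v2 = pvariance(text_vals, m1), pvariance(nontext_vals, m2)
    return (m1 - m2) ** 2 / (v1 + v2)


def rank_features(text: dict[str, list[float]],
                  nontext: dict[str, list[float]]) -> list[str]:
    """Feature names (e.g. DCT coefficient indices) by decreasing Fisher ratio."""
    return sorted(text, key=lambda f: fisher_ratio(text[f], nontext[f]),
                  reverse=True)
```

A feature whose class means coincide scores 0 however it varies, while one with well-separated means and small spread scores high; for example, means 8.5 and 1.5 with variances of 0.25 each give (7.0)^2 / 0.5 = 98. The ratio is undefined when both classes have zero variance, which a practical version would need to guard against.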
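Evaluation of Large-Scale Remote Visualization Using SuperSINET

滝沢寛之, 小林広明

SENAC: Newsletter of the Large-Scale Scientific Computing System, Information Synergy Center, Tohoku University　37　(2)　5-10　April 2004
Performance Analysis of a Parallel Law-of-the-Jungle Algorithm for Codebook Generation in Vector Quantization

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男

IEICE Technical Report. NC, Neurocomputing　103　(92)　25-30　May 22, 2003
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN: 0913-5685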

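Vector quantization is a highly efficient data compression technique and a core technology for data storage and transfer. Various methods have been proposed for generating optimal codebooks that minimize quantization error; among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single processor, algorithmic improvements alone can reduce processing time only so far, so we have pursued further speedup through parallel processing. This paper analyzes and evaluates the performance of our previously proposed parallel LOJ algorithm through experiments on the IBM SP2, the NEC AzusA, and a PC cluster.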
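A-19-4 A Study of Character Information Identification for Speech-Based Computer Use Support

菊池裕人, 沈紅, 川島丈賢, 小林広明, 中村維男

Proceedings of the IEICE General Conference　2003　369-369　March 3, 2003
Publisher: The Institute of Electronics, Information and Communication Engineers
A Study of Parallel Codebook Generation for Vector Quantization

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, Clecio Donizete Lima, 小林広明, 中村維男

IPSJ SIG Technical Report on High Performance Computing (HPC)　2002　(80)　67-72　August 21, 2002
Publisher: Information Processing Society of Japan
ISSN: 0919-6072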

Vector quantization is an attractive technique for lossy data compression and a key technology for data storage and transfer. Various algorithms have been proposed to design optimal codebooks that minimize quantization error; in particular, the Law-of-the-Jungle (LOJ) learning algorithm achieves rapid codebook design through algorithmic improvements. However, further acceleration is still required when large data sets are processed on a single computer, so a scalable parallel codebook design algorithm is needed. This paper presents a parallel LOJ learning algorithm suited to distributed-memory parallel computers with a message-passing mechanism. Parallel codebook generation experiments on the IBM SP2 with 32 processing elements achieved a 27.4x speedup, indicating high scalability of the proposed algorithm.
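The abstract does not describe LOJ's learning rule, so this sketch swaps in the standard generalized Lloyd (LBG) codebook update to show only the data-parallel pattern that a distributed-memory version relies on: each process computes partial sums and counts over its own shard of the training vectors, and a global reduction (an `MPI_Allreduce` on a real machine) combines them into the new codebook. Shards are simulated as Python lists, and all function names are hypothetical.

```python
def nearest(codebook: list[list[float]], x) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], x)))


def local_stats(codebook, shard):
    """Partial sums and counts per codeword for one process's data shard."""
    k, dim = len(codebook), len(codebook[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in shard:
        j = nearest(codebook, x)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def parallel_lloyd_step(codebook, shards):
    """One update: local statistics per shard, then a global sum reduction."""
    partial = [local_stats(codebook, s) for s in shards]
    new = []
    for j, old in enumerate(codebook):
        n = sum(counts[j] for _, counts in partial)
        if n == 0:
            new.append(list(old))  # keep a codeword that attracted no vectors
            continue
        new.append([sum(sums[j][d] for sums, _ in partial) / n
                    for d in range(len(old))])
    return new
```

Only the k codeword sums and counts cross process boundaries in each iteration, independent of the number of training vectors, which is what makes this pattern scale on message-passing machines. Splitting the data into one shard or several gives the same updated codebook.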
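Performance Evaluation of a Parallel Codebook Generation Algorithm for Vector Quantization (2. <Special Issue> The 1st Information Synergy Workshop)

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男, Clecio Donizete Lima; affiliations as listed: Graduate School of Information Sciences, Tohoku University; Graduate School of Information Sciences, Tohoku University; Information Synergy Center, Tohoku University; Graduate School of Engineering, Tohoku University; Graduate School of Information Sciences, Tohoku University; Information Synergy Center, Tohoku University; Graduate School of Information Sciences, Tohoku University

Annual Report　2　33-42　July 1, 2002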

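Vector quantization is a highly efficient data compression technique and a core technology for data storage and transfer. Various methods have been proposed for generating optimal codebooks that minimize quantization error; among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single CPU, algorithmic improvements alone can reduce processing time only so far, and further speedup through parallel processing is required. This paper proposes a parallel LOJ algorithm suited to distributed-memory parallel computers. Performance evaluations on the IBM SP2, the NEC AzusA, and a PC cluster all showed high speedup with respect to the number of processors.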
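D-11-73 A Study of the Computational Precision of Intersection Units for Ray Tracing Hardware

島倉孝満, 斉田泰昌, 佐野健太郎, 鈴木健一, 中田武男, 大庭信之, 小林広明, 中村維男

Proceedings of the IEICE Society Conference　2001　158-158　August 29, 2001
Publisher: The Institute of Electronics, Information and Communication Engineers
Design of an Instruction Supply Mechanism for VLIW Processors with Speculative Execution

ハラダ・ウゴ・ケンジ・ペレイラ, 仲池卓也, 小林広明, 中村維男

IPSJ SIG Technical Report on Computer Architecture (ARC)　1999　(100)　63-68　November 26, 1999
Publisher: Information Processing Society of Japan
ISSN: 0919-6072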

This paper presents an instruction fetch scheme for VLIW processors that supports speculative execution. It relies on the compiler and the underlying hardware working together in a scheme called Dynamic Boosting (DB), which performs speculative execution in response to the dynamic behavior of programs. In dynamic boosting, the compiler is responsible for finding instruction-level parallelism (ILP) beyond basic-block boundaries: it detects independent instructions in consecutive basic blocks that can run in parallel, and schedules and labels them so that the hardware can detect them at run time and execute them speculatively. Simulation results show a speedup of up to 20% on the SPECint95 benchmarks. In addition, preliminary results on hardware cost and gate-level speed show that the hardware complexity and cost are reasonable given the speedups obtained.
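A Study of Accelerating Ray Tracing Using Reference Images

及川周, 山内斉, 小林広明, 中村維男

Proceedings of the National Convention　59　149-150　September 28, 1999
A Study of a Representation Method for Gaseous Objects Based on a Global Illumination Model

内山知之, 山内斉, 小林広明, 中村維男

Proceedings of the National Convention　59　145-146　September 28, 1999
An Active Contour Model Considering the Shape of the Region of Interest

北島宏之, 帰山芳行, 小林広明, 中村維男

Proceedings of the National Convention　59　257-258　September 28, 1999
A Study of a Reconfigurable Synchronous Dataflow Computer

佐々木浩志, 槻岡秀朗, 庄司修芳, 小林広明, 中村維男

IEICE Technical Report. VLD, VLSI Design Technologies　98　(446)　17-22　December 10, 1998
Publisher: The Institute of Electronics, Information and Communication Engineers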

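This report proposes a synchronous dataflow computer that reconfigures its hardware to match an application expressed as a dataflow graph and computes with its processing elements operating in synchrony. A JPEG encoder was implemented as a dataflow graph to identify the required hardware resources. The results show that introducing processing elements (units) dedicated to memory access lets the processing be expressed naturally as dataflow. The report also describes a software development environment for building applications for the synchronous dataflow computer and discusses the properties of applications whose performance can be expected to improve on the proposed computer.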
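An Adaptive Volume Subdivision Method for Data-Parallel Volume Rendering

佐野健太郎, 北島宏之, 小林広明, 中村維男

IPSJ SIG Technical Report on High Performance Computing (HPC)　1998　(93)　7-12　October 9, 1998
Publisher: Information Processing Society of Japan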

Parallel processing on general-purpose parallel computers is a promising approach to fast volume rendering, which visualizes three-dimensional data, and we have proposed a data-parallel volume rendering algorithm based on the image composition method that suits distributed-memory general-purpose parallel computers. Although the algorithm achieves real-time rendering, the image composition step, which cannot be parallelized, comes to dominate the processing time as the number of processing elements grows, lowering overall parallel efficiency. To solve this problem, this study proposes an adaptive subdivision method for volume data that reduces image composition time, and discusses its performance through experiments. The experimental results show that the more processing elements are used, the more the method reduces the image composition time.
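The image composition step referred to above blends the partial images rendered by each processing element in visibility order. A minimal sketch, assuming premultiplied-alpha RGBA pixels and the Porter-Duff "over" operator; the adaptive subdivision itself is not shown, and `over` and `composite` are hypothetical names. Because "over" is associative, partial images can also be combined pairwise in a tree (as in binary-swap compositing), which is what allows the composition work itself to be distributed.

```python
RGBA = tuple[float, float, float, float]


def over(front: RGBA, back: RGBA) -> RGBA:
    """Porter-Duff 'over' for premultiplied-alpha pixels."""
    k = 1.0 - front[3]  # fraction of the back layer that remains visible
    return tuple(f + k * b for f, b in zip(front, back))


def composite(partials: list[list[RGBA]]) -> list[RGBA]:
    """Blend per-processor partial images, ordered front to back."""
    result = partials[0]
    for image in partials[1:]:
        result = [over(f, b) for f, b in zip(result, image)]
    return result
```

For example, a half-transparent red pixel `(0.5, 0.0, 0.0, 0.5)` over an opaque blue pixel `(0.0, 0.0, 1.0, 1.0)` yields `(0.5, 0.0, 0.5, 1.0)`. Grouping three layers as `over(over(a, b), c)` or `over(a, over(b, c))` gives the same result up to rounding.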
TLB-Assisted Cache

鈴木健一, 大庭信之, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1997　(61)　7-12　1997年6月27日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

オンチップキャッシュでは，キャッシュのアクセス時間がプロセッサの動作周波数の制約となるため，アクセス時間の短縮は重要である．そこで，本報告では，TLB Assisted Cache (TAC)を提案する．TACは，一般にTLBのアクセス時間がキャッシュのアクセス時間よりも短いことに着目し，TLBのアクセス結果をキャッシュのヒット/ミス判定にも利用する．これにより，キャッシュタグの比較が完了するよりも前に，キャッシュデータアレイへのアクセスが可能となり，論理的にはV-Pキャッシュでありながら，V-Vキャッシュに匹敵するアクセス時間を実現できる．しかも，論理的にはV-Pキャッシュとして扱えるため，V-Vキャッシュのかかえる様々な欠点を回避することができる．This report proposes a new on-chip cache system named "TLB Assisted Cache (TAC)." The TAC determines a cache hit/miss by referring to the TLB and to small assist-tag comparisons, both of which are faster than a conventional cache tag comparison. It is therefore possible to initiate a cache data array access before the cache tag comparison completes. Consequently, the TAC achieves an access time as short as that of a V-V cache. Moreover, the TAC logically acts as a V-P cache, so it does not suffer from the V-V cache's shortcomings, such as the synonym problem.
SPMDモデルによる関数型プログラム実行の一検討

中泉光広, 沈紅, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1997　(61)　25-30　1997年6月27日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

関数型言語は，手続き型言語と異なり，参照透明性や高いプログラムの生産性，検証容易性など，多くの有用な特徴を持つ．しかし，通常の計算機上では十分な処理速度が得られないため，その使用が大きく制限されてきた．これに対し，関数型言語の特徴の一つである並列実行の容易性を利用することによって，並列計算機上で関数型言語の高速実行を図ることが可能である．本稿では，SPMDモデルを用いてグラフ簡約を並列化し，関数型プログラムを並列計算機上で実行する方式を提案する．提案した手法を並列計算機IBM SP2に実装し，評価を行なう．ベンチマークプログラムによる実験結果は本手法の見通しを示した．Unlike imperative languages, functional languages are characterized by referential transparency, high programming productivity, and ease of program verification. However, they have not gained wide acceptance because of the inefficiency of their implementations on conventional computers. Parallel execution of functional programs that exploits their potential parallelism is a promising way to solve this problem. This paper studies the parallel execution of functional programs based on the SPMD model, in which graph reduction is parallelized. We implemented the parallel execution of functional programs on the IBM SP2 parallel computer. The experimental results on benchmark programs indicate the promise of the execution model.
A Parallel Volume Rendering Algorithm for Distributed-Memory Multiprocessor Systems

小林広明

豊田研究報告　(50)　41-54　1997年5月
出版者・発行元：豊田理化学研究所
ISSN： 0372-039X
ジェットパイプラインの並列化命令スケジューリングに関する一検討

仲池卓也, 佐々木毅人, 片平昌幸, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1996　(106)　25-30　1996年10月31日
出版者・発行元：一般社団法人情報処理学会
ISSN： 0919-6072

ジェットパイプラインは、ベクトル処理と命令レベル並列処理を併用することによって、高速演算を可能にするアーキテクチャである。したがって、ジェットパイプラインのコンパイラは、ベクトル命令とスカラ命令が混在したコードを並列化する必要がある。しかし、ベクトル命令とスカラ命令は、性質が異なるため、VLIW計算機などで用いられている並列化手法を、そのまま適用することはできない。本稿では、スーパスカラプロセッサなどで用いられているディスパッチスタック法を基に、ベクトル命令とスカラ命令を融合した並列化手法を提案し、シミュレーションによりその効果を確認する。Jetpipeline is an architecture based on instruction-level parallelism (ILP) that combines vector and scalar processing to achieve high performance. Its compiler must therefore parallelize code in which vector and scalar instructions are mixed. However, because vector instructions take more cycles to complete than scalar instructions, the parallelizing methods used for VLIW machines cannot be applied directly. In this paper, we propose a parallelizing method for Jetpipeline that extends the dispatch stack method, used in superscalar processors, to handle vector and scalar instructions together. We show the effectiveness of the proposed method through simulation experiments.
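The following is a minimal sketch of a generic dispatch-stack-style issue loop, in which long-latency vector ops and short scalar ops share one window. It is not the authors' extended method. The `Op` record, register names, latencies and issue width are hypothetical. Hazards are checked conservatively against every older op still waiting in the stack.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Op:
    name: str
    dst: str
    srcs: Tuple[str, ...]
    latency: int  # hypothetical cycle counts: vector ops long, scalar ops short


def dispatch_stack(ops: List[Op], width: int = 2) -> Tuple[Dict[str, int], int]:
    """Issue ops from an in-order window, possibly out of order, once no hazards remain.

    Returns the issue cycle of each op and the cycle at which the last result is ready.
    """
    pending = list(ops)              # the dispatch stack, oldest op first
    ready: Dict[str, int] = {}       # register -> cycle its latest value is available
    issue: Dict[str, int] = {}
    cycle = 0
    while pending:
        chosen: List[Op] = []
        for i, op in enumerate(pending):
            if len(chosen) == width:
                break
            older = pending[:i]      # older ops still waiting (including ones chosen this cycle)
            raw = any(o.dst in op.srcs for o in older) or \
                any(ready.get(s, 0) > cycle for s in op.srcs)
            war = any(op.dst in o.srcs for o in older)
            waw = any(o.dst == op.dst for o in older) or ready.get(op.dst, 0) > cycle
            if not (raw or war or waw):
                chosen.append(op)
        for op in chosen:
            issue[op.name] = cycle
            ready[op.dst] = cycle + op.latency
        chosen_ids = {id(op) for op in chosen}
        pending = [op for op in pending if id(op) not in chosen_ids]
        cycle += 1
    return issue, max(issue[op.name] + op.latency for op in ops)


ops = [
    Op("vadd", "V1", ("V2", "V3"), 4),
    Op("vmul", "V4", ("V1",), 4),      # depends on vadd
    Op("sadd", "R1", ("R2", "R3"), 1),
    Op("smul", "R4", ("R1",), 1),
]
print(dispatch_stack(ops))  # ({'vadd': 0, 'sadd': 0, 'smul': 1, 'vmul': 4}, 8)
```

Here the scalar pair `sadd`/`smul` issues while `vmul` waits four cycles for `V1`. That is the kind of vector/scalar overlap a combined parallelizing method aims to exploit.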
遅延素子を用いた非同期式ベクトル演算器の設計

高野光司, 佐々木毅人, 片平昌幸, 小林広明, 中村維男

電気関係学会東北支部連合大会講演論文集　1996　1996年
FL階層化並列簡約システムの共有メモリシステム

森規昭, 北島宏之, 沈紅, 小林広明, 中村維男

電子情報通信学会ソサイエティ大会講演論文集　1995　34-34　1995年9月5日
出版者・発行元：一般社団法人電子情報通信学会

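The memory system is one of the key components of a computer. In a functional machine, exploiting the referential transparency of functional languages makes it possible to implement the memory system with a simple mechanism. This report examines the memory system of the FL hierarchical parallel reduction system.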
メッセージロスのあるネットワークを用いた分散共有メモリマルチプロセッサシステムの評価

栗山一成, 高橋雅史, 大庭信之, 小林広明, 中村維男

電子情報通信学会ソサイエティ大会講演論文集　1995　35-35　1995年9月5日
出版者・発行元：一般社団法人電子情報通信学会

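A distributed shared-memory multiprocessor (DSM) consists of processors, memory modules, and an interconnection network linking them. Its performance depends not only on the speed of the processors and the memory but also heavily on the communication speed of the interconnection network. Meanwhile, ATM technology has attracted attention for high-speed network communication. ATM networks achieve high throughput by tolerating a certain degree of message loss (hereafter, loss). We are studying a DSM that uses an ATM network as its interconnection network, and in this report we model the system and evaluate its performance.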
ニューラルネットワークを利用した自動表情認識システム

中島平, 滝沢寛之, 島村三重子, 小林広明, 中村維男

電子情報通信学会ソサイエティ大会講演論文集　1995　173-173　1995年9月5日
出版者・発行元：一般社団法人電子情報通信学会

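For computer recognition of facial expressions from face images, no formulation of expression classification has been established. Neural networks (NNs) are therefore effective for this problem, and high recognition rates have been reported. However, in conventional methods the recognition process is not fully automated, and speed is also a problem, which makes practical use difficult. This paper proposes an expression recognition system aimed at full automation and evaluates it.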
FL階層化並列簡約システムの性能評価

北島宏之, 沈紅, 片平昌幸, 小林広明, 中村維男

情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）　1995　(56)　1-8　1995年6月1日
出版者・発行元：一般社団法人情報処理学会

関数型言語は手続き型言語と違い、参照透明性やプログラムの高い生産性などの多くの有用な特徴を持つ。しかし、従来の計算機上では十分な処理速度が得られないために、その使用が制限されている。我々は、関数型言語FLで記述されたプログラムを高速に実行するために、マルチプロセッサ処理とパイプライン処理を統合した、階層化並列簡約システムを提案してきた。本論文では、システムの持つ性能を十分に引き出すために参照の局所性を考慮した動的タスク割り当て手法について考察し、シミュレーション実験を通じてシステムの性能評価を行う。その結果、提案するシステムおよびタスク割り当て手法の有効性が明らかになる。Functional programming languages differ from traditional imperative ones in having many appealing properties, such as referential transparency and high programming productivity. However, the inefficiency of their implementation on conventional computers has prevented them from gaining wide acceptance. In earlier work, we proposed a hierarchical parallel reduction system that combines multiprocessing and pipeline processing to execute programs written in the functional language FL at high speed. In this paper, we investigate a dynamic task scheduling strategy that takes locality of reference into account in order to exploit the system's full performance, and we carry out software simulation experiments. The simulation results reveal the effectiveness of the proposed system and scheduling strategy.
ジェットパイプラインのためのコンパイル技術に関する一検討

佐々木毅人, 仲池卓也, 片平昌幸, 沈紅, 小林広明, 中村維男

情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）　1995　(56)　9-16　1995年6月1日
出版者・発行元：一般社団法人情報処理学会

命令レベルの並列性を利用し、ベクトル演算機能を有するジェットパイプラインは高速演算を可能にする。しかし、その性能を充分に活用するためには、コンパイラによって充分な最適化が行われなければならない。ベクトル化や並列化は、各々単独では、ベクトル計算機やVLIW計算機用コンパイラでさまざまな手法が用いられている。しかし、ジェットパイプラインではその両方の組合せにより高い演算性能を実現することを目的としているため、ベクトル化および並列化を組み合わせた命令スケジューリングについて検討する必要がある。本稿では、そのアプローチについて述べ、シミュレーションによりその効果を確認する。To achieve high computational performance, we have proposed the Jetpipeline architecture, which exploits ILP (Instruction Level Parallelism) among vector operations as well as scalar operations. In the Jetpipeline architecture, the compiler plays an important role because it is responsible for extracting ILP from the operations. In this paper, we present a compilation technique for Jetpipeline that combines parallelization for scalar operations with vectorization for vector operations. The proposed technique is examined through simulation experiments.
サブギガネットワークでマルチメディア・アプリケーションを実現する東北大学「SuperTAINS」

亀山幸義, 伊藤彰則, 小林広明

コンピュータ＆ネットワークLAN　13　(6)　114-120　1995年6月
出版者・発行元：オーム社
Prolog言語の階層処理システムとその評価

王東, 小林広明, 中村維男

情報処理学会研究報告. 記号処理研究会報告　95　(2)　1-8　1995年1月13日
出版者・発行元：一般社団法人情報処理学会

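This paper proposes a hierarchical parallel processing system for Prolog programs and evaluates its performance. First, we examine hierarchical processing that integrates the coarse-grained AND- and OR-parallelism of Prolog programs with fine-grained parallel unification. To handle coarse-grained AND- and OR-parallelism efficiently, we propose an extended AND/OR tree, which can fully represent the parallelism of Prolog programs. We further propose a shared-memory multiprocessor system whose processors are connected for control by a combined mesh-and-tree network, and we implement hierarchical parallel processing on it. Finally, we evaluate its performance.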
パイプライン型アーキテクチャにおけるOR並列型Prolog実行の一検討

稲葉勉, 沈紅, 片平昌幸, 小林広明, 中村維男

情報処理学会研究報告. 記号処理研究会報告　95　(2)　9-16　1995年1月13日
出版者・発行元：一般社団法人情報処理学会

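To speed up Prolog processing, this study proposes a code-block-level parallel processing method for OR-parallel Prolog on a pipelined architecture. The method extends J. Beer's pipelined execution of sequential Prolog to OR-parallel Prolog, and the extension introduces a global shared memory and a crossbar switch. The global shared memory consists of two kinds of modules: a choice-point stack module, which stores the choice-point history, and modules across which the environment frames at each choice point are evenly distributed. This paper outlines the system and evaluates its performance with benchmark programs. The evaluation showed about 2.5 times the LIPS of pipelined execution of sequential Prolog.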
ベクトル命令とスカラ命令を融合したジェットパイプラインの性能評価

仲池卓也, 佐々木毅人, 片平昌幸, 沈紅, 小林広明, 中村維男

電気関係学会東北支部連合大会講演論文集　1995　1995年
並列画像生成システム(Mπ)^2の性能評価

藤勇一郎, 山内斉, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　378-378　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

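In recent years, fields such as landscape simulation and virtual reality have demanded fast generation of photorealistic images. However, the two-pass rendering method, considered to produce the most realistic images at present, requires a great deal of processing time and is therefore not widely used. This paper reports a performance evaluation of (Mπ)^2, a parallel image generation system that executes the two-pass rendering method at high speed.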
ATMネットワークを用いた分散処理システムにおけるメモリアクセスプロトコル

栗山一成, 高橋雅史, 大庭信之, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　79-79　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

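In a distributed processing system, components that each have their own control unit are connected by a network, and the processing is distributed among them. The network should be fast and highly scalable, and ATM networks are being studied as networks with these characteristics. This report presents a memory access protocol for a distributed processing system that uses an ATM network and analyzes the system.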
パイプライン型Prologアーキテクチャにおける負荷分散の一検討

稲葉勉, 沈紅, 片平昌幸, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　84-84　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

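Prolog, the representative logic programming language, is a symbolic-processing language based on predicate logic and is expected to find wide application in knowledge information processing. However, its execution centers on costly operations such as unification and backtracking, and it therefore takes considerable time. This paper focuses on the coarse-grained OR-parallelism and stream parallelism of Prolog programs. It proposes a method that processes them at high speed by mapping them onto an architecture in which PEs are connected in a pipeline.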
単語の情緒的印象と心情モデルによる推論

井形伸之, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　71-71　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

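Quantifying human emotional impressions may offer important insights into various problems in natural language processing, such as figurative expressions. This report proposes a method that acquires the emotional impressions of more conceptual words. It combines them with a simple model of mental states to infer the psychological states of the characters in a discourse.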
ニューラルネットワークを用いた自動感情認識に関する一検討

佐々木恒, 長田敏明, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　132-132　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

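No formulation of the class classification has been established for recognizing emotions from face images. Applying neural networks (NNs) to this problem is effective, and high recognition rates have been reported. However, the conventional methods of generating NN input values from face images are difficult to automate. This report examines an emotion recognition system designed with the automatic generation of input values for the emotion-recognition NN in mind.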
TLBとキャッシュの統一的管理とその性能評価

鈴木健一, 小林広明, 中村維男

電子情報通信学会秋季大会講演論文集　1994　88-88　1994年9月26日
出版者・発行元：一般社団法人電子情報通信学会

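The TLB (Translation Lookaside Buffer) is a kind of cache memory that holds entries of the address translation table. The memory management unit uses it to translate virtual addresses into physical addresses quickly. Noting that both the cache memory and the TLB have address tags, we proposed the TLB-Unified Cache (TUC), which manages the cache and the TLB in a unified way. Through this unified management, the TUC eliminates the address information duplicated between the TLB and the cache and substantially reduces the required capacity of high-speed memory arrays. This report demonstrates the effectiveness of the TUC by evaluating the TLB and cache miss rates together.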
複数のプロセッサによる共有を考慮したメモリアクセスバッファの構成

高橋雅史, 大庭信之, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1994　(66)　225-232　1994年7月21日

共有メモリ型並列計算機システムは、既存のソフトウェア技術との親和性の高さから、広く研究・開発が行われている。しかしながら、共有メモリ型並列計算機システムではフォンノイマンボトルネックとして知られる問題がシステムの性能に深刻な悪影響をあたえる。この問題を解決するためには、メモリのアクセス時間の短縮と、内部結合ネットワークのトラヒックの分散を行う機構が不可欠である。本稿では、クラスタ化された共有メモリ型並列計算機システムにおいて、メモリアクセス時間を短縮する新しいメモリアクセスバッファ機構を提案する。また、本機構の性能を評価するために行ったソフトウエアシミュレーションの結果を示し、その有効性を検証する。Shared-memory multiprocessor systems have been extensively and intensively studied because of their good affinity with existing software. However, the problem known as the von Neumann bottleneck severely restricts the scalability of shared-memory multiprocessor systems. To solve this problem, a mechanism that shortens memory access latency and avoids traffic congestion on networks is essential. In this paper, we present a new buffering mechanism for shortening the latency of memory accesses over an inter-cluster network. To evaluate the effectiveness of the proposed mechanism, we built a trace-driven simulator of clustered shared-memory multiprocessor systems. Simulation results show that the mechanism shortens the average memory access latency over the network.
ジェットパイプラインのための並列化コンパイラに関する一検討

佐々木毅人, 片平昌幸, 小林広明, 中村維男

日本機械学会東北支部総会・講演会講演論文集　29th　1994年
ボリュームシェーディングに関する一考察

佐藤大輔, 片平昌幸, 小林広明, 中村維男

電子情報通信学会大会講演論文集　1994　(Shuki Pt 6)　1994年

ISSN： 1349-1369
ジェットパイプラインのための命令スケジューリングに関する一検討

佐々木毅人, 片平昌幸, 小林広明, 中村維男

電子情報通信学会秋期大会講演論文集, June 1994　83-83　1994年
出版者・発行元：一般社団法人電子情報通信学会

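In recent years, architectures capable of instruction-level parallel processing have come into use even in personal computers. Such architectures exploit parallelism at the level of executed instructions and can execute up to about four instructions simultaneously per execution cycle. However, no architecture can fully extract a program's parallelism by hardware alone, so software scheduling is indispensable for sufficient parallelization. This paper describes an instruction scheduling method for Jetpipeline and presents simulation results. Jetpipeline is a computer architecture that enables high-speed computation by combining such parallel execution capability with vector processing capability.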
トークンリングにおける高多重伝送のための制御手法

屠東原, ラシッドイマド, 小林広明, 中村維男

マルチメディア通信と分散処理ワークショップ論文集　1993　(2)　231-237　1993年11月17日
ATMネットワークを用いた分散共有メモリ型分散処理システム

高橋雅史, 大庭信之, 小林広明, 中村維男

マルチメディア通信と分散処理ワークショップ論文集　1993　(2)　277-286　1993年11月17日
B-ISDNのためのインテリジェントセルフルーティングアルゴリズム

ラシッドイマド, 小林広明, 中村維男

電子情報通信学会技術研究報告. AI, 人工知能と知識処理　93　(240)　39-46　1993年9月20日
出版者・発行元：一般社団法人電子情報通信学会

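This paper proposes a new intelligent self-routing algorithm for B-ISDN based on the asynchronous transfer mode (ATM). The algorithm is based on a new congestion control scheme called ants routing. It routes bursty traffic arising at the output ports of the network's switches along optimal paths, thereby achieving high throughput and a low packet loss rate. The congestion status of each switch, which ants routing requires, is monitored continuously and shared between adjacent switches. The paper also discusses the effectiveness of the scheme using an analytical model based on queueing theory.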
ボリュームレンダリングのための並列計算機に関する一検討

佐藤大輔, 小林広明, 中村維男

全国大会講演論文集　46　477-478　1993年3月1日

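Volume rendering, which has recently attracted attention in fields such as medical image processing, renders three-dimensional data directly without converting it into polygons or other primitives. Volume rendering handles enormous amounts of data, and various algorithms and architectures have been proposed to speed it up. The α-blending method proposed by Drebin et al. is an algorithm suited to medical image processing in which the source data come from CT, MRI and similar modalities. This report outlines MARBLE (Multi processor architecture for blending slices), a massively parallel image processing architecture that implements the α-blending method.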
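The α-blending recurrence can be written per pixel, and every pixel is independent of the others. The pixels can therefore be spread over many processors, which is the kind of parallelism a slice-blending multiprocessor can exploit. This is a minimal sketch assuming grey-level samples blended back to front over a black background. `blend_slices` is a hypothetical name, and the code says nothing about MARBLE's actual hardware.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (grey-level colour, opacity alpha) of one slice pixel; illustrative


def blend_slices(slices: List[List[Sample]]) -> List[float]:
    """Back-to-front alpha blending of a stack of slices (index 0 = nearest the viewer).

    For each pixel: C_out = alpha * c + (1 - alpha) * C_in, applied from the farthest slice forward.
    """
    acc = [0.0] * len(slices[0])          # assumed black background behind the volume
    for image in reversed(slices):         # farthest slice first
        acc = [a * c + (1.0 - a) * prev for (c, a), prev in zip(image, acc)]
    return acc


front = [(1.0, 0.5)]   # bright, half-transparent tissue (hypothetical values)
back = [(0.2, 1.0)]    # dim, opaque tissue
print(blend_slices([front, back]))
```

Each list comprehension touches every pixel independently, so a parallel version could assign disjoint pixel ranges to different processors without any communication inside one blending step.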
画像生成の並列処理に関する一検討

山内斉, 礒田隆生, 小林広明, 中村維男

全国大会講演論文集　46　369-370　1993年3月1日

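In recent years, many applications such as landscape design and virtual reality have taken an interest in techniques for generating photorealistic images. However, generating more realistic images requires enormous computation time and is not practical, so a reduction in computation time is strongly desired. This report proposes a new parallel processing scheme for generating such images.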
計算機アーキテクチャ記述言語CARD-L

高橋雅史, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1993　(6)　121-128　1993年1月21日

新たな計算機アーキテクチャに基づく計算機を考案した場合、そのアーキテクチャの評価と検証のためにシミュレーションを行う必要がある。しかし、現在までに提案されているモデルの表現方法は、実行効率、記述効率、表現能力において、計算機のモデル化には適さない点がある。本稿では、計算機のモデル化を行い、そのモデルを簡潔に表現可能な記述言語CARD-Lを提案する。そして、この言語の文法と、その意味を定義する。特に、文脈から制御を自動的に抽出するための手法について詳しく述べる。最後に、記述例を通し、様々なアルゴリズムによる機能の記述が可能であることを示す。Simulation experiments must be carried out to evaluate performance when a new computer architecture is designed. However, the previously proposed description languages for this purpose have problems in execution efficiency, description efficiency and descriptive power. In this paper, we present a Computer ARchitecture Description Language (CARD-L) that can effectively construct simulation models of target architectures. The semantics of CARD-L is described in detail. Moreover, a strategy for automatically abstracting controlling factors from descriptions is presented. Finally, the description capability for various functions is discussed through examples.
ネットワークフロー制御支援エキスパートシステム

イマドハッサンラシッド, 小林広明, 中村維男

情報処理学会研究報告マルチメディア通信と分散処理（DPS）　1992　(76)　33-40　1992年9月24日

本論文では、スイッチングシステムにおける呼のフロー制御支援のためのエキスパートシステムを提案する。本エキスパートシステムは、直接接続されたノード間の呼の流れを変え、トラフィック量の多いリンクにチャネルを追加することにより、多くの呼が迂回経路を通ることなく、直接リンクにより目的ノードに到達するようにフロー制御を行なう。本エキスパートシステムでの意思決定に必要なトラフィックの定量化のための理論モデルについて議論する。This paper introduces an expert system's decision-making process into telecommunications to assist in the flow control of calls in switching systems. The system can decide to reverse the direction of call flow, from output to input or vice versa, on direct routes. The strategy temporarily reassigns low-traffic trunks between a source and a destination, adding the modified trunks to high-traffic links under uncertain traffic load with the given hardware. Our purpose is to make use of trunks with low traffic by adding them to higher-traffic links, which also allows network resources to be used more economically. With these additional resources, more calls can pass through a direct route before an alternative route has to be tried. This paper discusses the theoretical model of the traffic, and of its measurement, on which the decision to reconfigure the trunk scheme is based.
計算機ネットワークの適応経路制御方式Potential Routing

大庭信之, 小林広明, 中村維男

情報処理学会研究報告計算機アーキテクチャ（ARC）　1992　(64)　65-72　1992年8月19日

高いネットワーク性能を得るためには、よい経路制御とフロー制御が必要不可欠である。本論文では計算機ネットワークのパケット転送方式に応用するための新しい適応経路制御方式Potential Routingを提案する。Potential Routingでは、計算機ネットワークを電気回路網とみなし、出発地から到着地までをノード間の電位差にそってパケットを流す。ノードの電位はキルヒホッフの第一法則で求め、ネットワークの混雑度によって経路を決定する。Potential Routingは次の特徴をもつ。（1）経路制御を行う場合に、ネットワーク全体の構造を反映した制御が可能であり、またどのような構成のネットワークにも応用可能。（2）制御テーブルはキルヒホッフの法則から得られる連立1次方程式で簡単にかつ高速に求められる。（3）ピンポン現象やループ現象を引き起こさない。（4）常に最短経路を与えるという保証はないが、ほとんどのネットワーク構造で最短経路を与える。To realize high network performance, adequate routing and flow controls are indispensable for the communication of information among computing nodes. We propose a new routing control method, called potential routing, for packet communication in computer networks. Potential routing models a computer network as an electrical circuit, and packet routing from a source node to a destination node is performed according to the potential differences between adjacent nodes. The node potentials are first given by Kirchhoff's law and are then dynamically adjusted according to the traffic situation. Potential routing has the following features : (1) It can be applied to arbitrary network topologies. It takes account of the global network topology in determining the route. (2) The routing table is easily and therefore quickly computed by Kirchhoff's law, using simple simultaneous equations. (3) It does not involve the ping-pong (loop) problem. (4) It is not guaranteed always to give the shortest path, but it actually does so in most cases. By simulation, we verified that potential routing shortens transmission delays, especially when the traffic is heavy or unbalanced.
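The core of Potential Routing can be sketched on a toy network: node potentials come from Kirchhoff's current law, and packets are forwarded along potential drops. This is a minimal sketch with several assumptions. All links have unit conductance, a single unit current is injected at the source, and the destination is grounded. The linear system is solved with a small dense stdlib solver, and the function names are hypothetical. The paper's dynamic adjustment of potentials according to traffic is not modelled.

```python
from typing import Dict, List, Tuple

Links = Dict[Tuple[int, int], float]  # undirected link (u, v) -> conductance (assumed values)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (adequate for a toy network)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def potentials(links: Links, n: int, src: int, dst: int) -> Dict[int, float]:
    """Kirchhoff's current law at every node: ground dst (V = 0), inject unit current at src."""
    unknown = [v for v in range(n) if v != dst]
    idx = {v: k for k, v in enumerate(unknown)}
    a = [[0.0] * len(unknown) for _ in unknown]
    b = [0.0] * len(unknown)
    for (u, v), g in links.items():
        for p, q in ((u, v), (v, u)):
            if p == dst:
                continue
            a[idx[p]][idx[p]] += g          # + g * V_p
            if q != dst:
                a[idx[p]][idx[q]] -= g      # - g * V_q (the V_dst = 0 term drops out)
    b[idx[src]] = 1.0
    x = solve(a, b)
    pot = {dst: 0.0}
    pot.update({v: x[idx[v]] for v in unknown})
    return pot


def route(links: Links, n: int, src: int, dst: int) -> List[int]:
    """Forward each packet to the neighbour with the lowest potential (largest potential drop)."""
    pot = potentials(links, n, src, dst)
    nbrs: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in links:
        nbrs[u].append(v)
        nbrs[v].append(u)
    path = [src]
    while path[-1] != dst:
        path.append(min(nbrs[path[-1]], key=lambda w: pot[w]))
    return path


# Toy network: a long path 0-1-2-3 and a short path 0-4-3, all links with unit conductance.
net = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 4): 1.0, (4, 3): 1.0}
print(route(net, 5, src=0, dst=3))  # [0, 4, 3]
```

With the destination grounded, every node other than the source has a potential equal to the conductance-weighted average of its neighbours. The source itself lies strictly above that average. Each hop therefore moves to a strictly lower potential, so the packet cannot loop.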
ポジション・ディスプレイ・マップによる知識表現

山崎智民, 小林広明, 中村維男

全国大会講演論文集　42　220-221　1991年2月25日

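The human decision-making process can be broadly divided into three steps: (1) there is a goal to decide on; (2) the feasibility of the goal is evaluated; (3) the goal is carried out according to the evaluation. In evaluating feasibility, various attributes serve as factors for judgment. However, people do not appear to evaluate all of them at once. In most cases they consider the values of two attributes out of the whole and evaluate those. When three or more attributes must be evaluated at the same time, they select two at a time and combine the results. For example, when considering attributes a, b and c, they evaluate a with b and b with c, and from these they evaluate a with c. Moreover, the attributes are not all at the same level but form a hierarchy, so many judgments are required. These judgments about attributes are obtained by inference from human knowledge. Semantic networks are a known model of such knowledge, but as networks they are complex, and few of them focus on attribute values. We therefore examine the Position Display Map (PDM), which focuses on attribute values and is visually clear.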
機械構造の階層性に基づいた機械設計向き知識ベースに関する検討

米山正樹, 竹田好晴, 小林広明, 中村維男

全国大会講演論文集　42　325-326　1991年2月25日

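In recent years, CAD/CAM has attracted attention as a way to improve productivity in the design and manufacturing stages. However, conventional CAD/CAM systems were often little more than tools with which the computer drew drawings or issued commands to NC (numerical control) equipment on the user's behalf. As a result, the user had to judge whether what the CAD/CAM system produced met the goal, and CAD/CAM systems still lack capability for estimation design, part design, production design and process design. One approach to these problems is a CAD/CAM system that uses knowledge processing. Such a system should reduce the exchange of data between user and computer, and the burden it imposes, so that users can concentrate on developing new mechanisms. Developing a knowledge base for mechanical design should therefore make CAD/CAM systems genuinely support designers. As a first step toward building a knowledge base that supports a series of mechanical design tasks, this report examines how to model design work. It then describes a method for constructing a mechanical design knowledge base based on that model.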
空間分割型並列処理による光線追跡法の高速化に関する一検討

窪田英幸, 小林広明, 中村維男, 重井芳治

情報処理学会研究報告計算機アーキテクチャ（ARC）　1987　(78)　9-16　1987年11月12日

本論文では、光線追跡法の高速化を目的としたオブジェクト空間分割型並列処理システムの構成、画像生成の並列処理機構、負荷分散法について述べ、そのシミュレーションによる性能評価の結果を報告する。小規模なシステムでは、オブジェクト空間のマッピングによる静的負荷分散法により並列処理の台数効果が得られる。しかし、効果的な負荷分散が得られるプロセッサ数には上限がある。そこで、高性能な大規模システムの構築のために階層化システムについて検討する。これにより、メモリ要求量を低く抑えつつ、多数のプロセッサを高効率で稼働できることを示す。This paper presents a multiprocessor system for fast ray tracing based on object-space parallel processing. A parallel processing scheme for image synthesis and load balancing methods for the system are discussed. First, as a form of static load balancing, a strategy for mapping a regularly subdivided object space onto the processors is evaluated by simulation. We then study a hierarchical multiprocessor system to overcome the limitation of static load balancing in large-scale multiprocessor systems. With this architecture, nearly "ideal" load balancing can be achieved without a noticeable increase in the memory required for object description.
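A static mapping of a regularly subdivided object space onto processors can be compared numerically. This is a minimal sketch assuming that the ray-tracing work in each voxel is known in advance; here it comes from a hypothetical `corner_heavy` scene. The sketch compares two generic mappings, and neither is claimed to be the mapping strategy evaluated in the paper.

```python
from itertools import product
from typing import Callable, List

Owner = Callable[[int, int, int, int, int, int], int]


def slab_owner(i: int, j: int, k: int, n: int, px: int, py: int) -> int:
    """Block mapping: contiguous slabs of voxels along x, one slab per processor."""
    return i * (px * py) // n


def interleaved_owner(i: int, j: int, k: int, n: int, px: int, py: int) -> int:
    """Interleaved mapping onto a px x py processor grid: voxels adjacent in x or y
    go to different processors (for px, py >= 2)."""
    return (i % px) + px * (j % py)


def loads(work: Callable[[int, int, int], int], n: int, px: int, py: int, owner: Owner) -> List[int]:
    """Sum the per-voxel work assigned to each processor for an n x n x n subdivision."""
    load = [0] * (px * py)
    for i, j, k in product(range(n), repeat=3):
        load[owner(i, j, k, n, px, py)] += work(i, j, k)
    return load


def corner_heavy(i: int, j: int, k: int) -> int:
    """Hypothetical scene: objects crowded into one 2 x 2 x 2 corner of a 4 x 4 x 4 grid."""
    return 10 if i < 2 and j < 2 and k < 2 else 1


print(loads(corner_heavy, 4, 2, 2, slab_owner))         # [52, 52, 16, 16]
print(loads(corner_heavy, 4, 2, 2, interleaved_owner))  # [34, 34, 34, 34]
```

Interleaving evens out the load when objects cluster. The likely cost is more inter-processor ray traffic, because a ray changes owner at most voxel boundaries in x and y.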

書籍等出版物 17

Sustained Simulation Performance 2019 and 2020

Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

2021年
Sustained Simulation Performance 2018 and 2019

Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

2020年
Sustained Simulation Performance 2017

Michael Resch, Wolfgang Bez, Erich Focht, Michael Gienger, Hiroaki Kobayashi

2017年
Sustained Simulation Performance 2016

Michael M. Resch, Wolfgang Bez, Erich Focht, Nisarg Patel, Hiroaki Kobayashi (Editors)

2016年

ISBN: 9783319467344
コンピュータ工学入門

鏡慎吾, 佐野健太郎, 滝沢寛之, 岡谷貴之

コロナ社　2015年3月

ISBN: 9784339024920
Sustained Simulation Performance 2015

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Qi, J, Roller, S

Springer　2015年

ISBN: 9783319203409
Sustained Simulation Performance 2014

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Patel, N

Springer　2014年

ISBN: 9783319106267
Sustained Simulation Performance 2013

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Kovalenko, Y

Springer　2013年

ISBN: 9783319014395
Sustained Simulation Performance 2012

Resch, M.M, Wang, X, Bez, W, Focht, E, Kobayashi, H

Springer　2012年

ISBN: 9783642324543
High Performance Computing on Vector Systems 2011

Resch, M, Wang, X, Focht, E, Kobayashi, H, Roller, S

Springer　2011年

ISBN: 9783642222436
Cloud, Grid and High Performance Computing: Emerging Applications

Hong Wang, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

IGI Global　2011年

ISBN: 9781609606039
High Performance Computing on Vector Systems 2010

M. Resch, K. Benkert, X. Wang, M. Galle, W. Bez, H. Kobayashi, S. Roller

Springer　2010年11月

ISBN: 9783642118500
Software Automatic Tuning: From Concepts to State-of-the-Art Results

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Springer　2010年

ISBN: 9781441969347
High Performance Computing on Vector Systems 2009

Resch, M, Roller, S, Benkert, K, Galle, M, Bez, W, Kobayashi, H

Springer-Verlag　2009年11月

ISBN: 9783642039126
High Performance Computing on Vector Systems 2008

M. Resch, M. Galle, H. Kobayashi, T. Hirayama

Springer-Verlag　2008年11月
High Performance Computing on Vector Systems 2007

Hiroaki Kobayashi

Springer-Verlag　2007年11月

ISBN: 9783540743835
High Performance Computing on Vector Systems 2006

Hiroaki Kobayashi

Springer Verlag　2006年1月

講演・口頭発表等 77

イジングマシンを用いた救助資源配分の最適化に関する一検討

中本光星, 小野田誠, 熊谷政仁, 佐藤雅之, 小松一彦

情報処理学会第87回全国大会講演論文集　2025年3月15日
量子コンピューティングとシミュレーションの融合にむけて:量子アニーリング-HPC連携基盤に関する研究開発 招待有り

小林広明

Q-STAR(一般社団法人量子技術による新産業創出協議会)セミナー　2024年12月23日
HPCとQuantum Computingの連携とその応用

小林広明

AIチップ設計拠点フォーラム　2024年10月25日
QC & HPC Hybrid Computing for Simulation & Data-analysis Hybrid Applications 招待有り

Hiroaki Kobayashi

German Aerospace Center Seminar　2024年9月19日
QA-HPC Hybrid Computing Infrastructure for Quantum Transformation of Simulation-Data Analysis Combined Applications 招待有り

Hiroaki Kobayashi

IEEE Quantum Week　2024年9月19日
R&D of QA-HPC Hybrid Computing Infrastructure and Quantum Transformation of Simulation-Data Science Combined Applications 招待有り

Hiroaki Kobayashi

Tohoku-Chicago Quantum Interaction　2024年6月29日
Performance Evaluation of Vector Annealing on NEC Vector Processor SX-Aurora TSUBASA

Hiroaki Kobayashi

HPC2024　2024年6月27日
Accelerating Quantum Innovation & Startup Creation at Tohoku University 招待有り

Hiroaki Kobayashi

Chicago-Tohoku Quantum Alliance Symposium　2024年2月14日
NEC SX-ACE's Operations and Applications Development for the Future

24th Workshop on Sustained Simulation Performance　2016年12月4日
Overview of Vector Supercomputer SX-ACE and Its Applications 国際会議

Russian Supercomputing Days 2016　2016年9月26日
防災・減災に貢献するスーパーコンピュータの開発を目指して

2016年ハイパフォーマンスコンピューティングと計算科学シンポジウム　2016年6月6日
東北大学大規模科学計算システムとその利用支援について

第25回東北CAE懇話会　2016年5月13日
Highly-Productive Computing on Modern and Future Vector Platforms

The 23rd Workshop on Sustained Simulation Performance　2016年3月16日
One-year experience with SX-ACE 国際会議

22nd Workshop on Sustained Simulation Performance　2015年12月17日
Highly-Productive HPC on Modern Vector Supercomputers: present and future 国際会議

Russia Supercomputing Days　2015年9月28日
スーパーコンピュータの驚異的な力

第116回東北大学サイエンスカフェ　2015年5月29日
Real-Time Tsunami Inundation Forecasting and Damage Estimation on SX-ACE: A HPC System as a Social Infrastructure for Tsunami Disaster Prevention and Mitigation 国際会議

NUG XXVII　2015年5月11日
東北大学サイバーサイエンスセンターの高性能計算に関する研究開発活動: 普通の人々のためのスーパーコンピュータセンターを目指して

第25回TOPIC総会講演会　2015年4月20日
普通の人々のためのスーパーコンピュータセンターを目指して

CyberHPC Symposium　2015年3月20日
A SX-ACE-based New Computer System of Tohoku University and Its Early Evaluation by using Real Applications 国際会議

20th Workshop on Sustained Simulation Performance (WSSP20)　2014年12月15日
東北大学サイバーサイエンスセンターの新スーパーコンピュータシステムの概要と高性能計算に関する研究開発活動

第133回NEC C&Cシステム SP研究会　2014年11月11日
Tohoku Univ.’s New Supercomputer System and R&D on Highly-Productive HPC for Memory Intensive Applications 国際会議

NUG2014　2014年5月12日
防災・減災に資する次世代スーパーコンピュータの開発をめざして〜スーパーコンピューティングによる津波のリアルタイム予測〜

G空間情報を活用した次世代防災・被災地支援システム研究会第3回シンポジウム　2014年3月12日
高バンド幅アプリケーションに適した将来のHPCIシステムのあり方に関する調査研究

第11回戦略的高性能計算システム開発に関するワークショップ　2014年3月10日
高バンド幅アプリケーションに適した将来のHPCIシステムのあり方の調査研究の取り組み

第132回NEC C&Cシステム SP研究会　2014年1月23日
Feasibility study of the next generation vector system architecture for memory intensive applications 国際会議

18th Workshop on Sustained Simulation Performance, Stuttgart, Germany　2013年10月28日
東北大学大規模科学計算システムの運用と次世代ベクトルコンピューティングに関する研究開発

日本学術会議電気電子工学委員会 URSI分科会無線通信システム信号処理小委員会URSI-C 研究会　2013年9月26日
高バンド幅アプリケーションに適した将来のHPCIシステムのあり方に関する調査研究

文部科学省「革新的ハイパフォーマンス・コンピューティング・インフラ（HPCI）の構築」 HPCI戦略分野2「新物質・エネルギー創成」計算物質科学イニシアティブ（CMSI）計算分子科学研究拠点第4回研究会　2013年9月10日
高バンド幅アプリケーションに適した将来のHPCIシステムのあり方に関する調査研究

第10回戦略的高性能計算システム開発に関するワークショップ　2013年7月30日
防災・減災に資する次世代スーパーコンピュータの開発をめざして

東北大学電子通信研究機構シンポジウム—耐災害ICTによる東北復興に向けて　2013年7月23日
スーパーコンピュータが拓く未来

東北活性化ユニバーサイエンス・新潟県立十日町高校キャリア教育講演会　2013年7月5日
Early evaluation of NGV and feasibility study of the next generation vector system architecture for memory intensive applications 国際会議

NUG2013　2013年6月23日
Feasibility study of future HPC systems for memory-intensive applications 国際会議

1st International Workshop on Strategic Development of High Performance Computers　2013年3月18日
Feasibility study of future HPC systems for memory-intensive applications 国際会議

17th Workshop on Sustained Simulation Performance　2013年3月12日
イベント企画「安全・安心な暮らしを支えるハイパフォーマンスコンピューティング～防災・減災に向けて～」

第75回情報処理学会全国大会　2013年3月8日
Potentials of the vector architecture in the post-peta era 国際会議

Workshop on Sustained Simulation Performance　2012年12月10日
Design Space Exploration of the Vector Processor Architecture using 3D Die-Stacking Technology

筑波大学計算科学研究センター設立20周年記念シンポジウム　2012年9月7日
High-End Computing Systems: Past, Present and Future 国際会議

SICE2012 SICE Annual Conference　2012年8月20日
Capability and Potential of Vector Processors: Present and Future 国際会議

NUG2012　2012年6月12日
Capability of Vector-Parallel Computing Platforms 国際会議

the HPC Workshop in Singapore　2012年5月7日
高生産・高性能コンピューティングと新世代ベクトルコンピューティングに関するR&D 国際会議

SP研究会 SC10講演会　2010年11月17日
Activities for Highly-Productive Computing and R&D on New-Generation Vector Computing 国際会議

JAEA SC10 Workshop　2010年11月16日
Performance Discussion on Scalar and Vector Systems and R&D on New-Generation Vector Computing 国際会議

the 13th Teraflop Workshop　2010年10月21日
Performance Discussion on Scalar and Vector Systems and R&D for New-Generation Vector Computing at Tohoku University 国際会議

NUG2010　2010年6月29日
東北大学大規模科学計算システムの運用とベクトルコンピューティングに関する研究開発

第九回PCクラスタシンポジウム　2009年12月10日
Supercomputers and Supercomputing in Tohoku University 国際会議

JAEA SC09-Workshop　2009年11月18日
ラボコンピューティングからペタコンピューティングへの橋渡しを目指して〜共同利用・共同研究拠点として新しい時代の情報基盤センターの役割〜

第4回国立大学法人情報系センター長会議基調講演　2009年10月23日
21世紀はベクトルコンピューティングの時代！？

第8回情報科学技術フォーラム特別企画　2009年9月3日
Lessons Learned from 1-Year SX-9 Experiences and Toward the Next Generation Vector Computing 国際会議

20th CCSE Workshop on Advanced Computing Technologies toward PetaFLOPS　2009年4月24日
Tohoku University View to Supercomputing 国際会議

10th Teraflop Workshop　2009年3月16日
On-chip Caching for vector architectures 国際会議

JAEA Symposium at SC08　2008年11月20日
The new era of the vector architecture: experiences with the early adaption of SX-9 国際会議

NEC HPC Workshop at SC08　2008年11月19日
A news update of Cyberscience Center 国際会議

the 9th Teraflop workshop　2008年11月12日
実アプリケーションを用いたSX-9の性能評価

大阪大学サイバーメディアセンター平成20年度スーパーコンピュータシンポジウム　2008年10月24日
HPC Activities at Tohoku University: Experiences with the early adaption of SX-9 国際会議

DWD (ドイツ気象庁)特別講演会　2008年10月2日
HPC Activities at Tohoku University 国際会議

Barcelona Supercomputer Center Seminar　2008年9月30日
New Supercomputer System SX-9 and its Early Evaluation

IEEE EMC Sendai Chapter Lecture and Seminar　2008年5月14日
新しいスーパーコンピュータシステムSX-9とその評価について

SP研究会　2008年5月9日
New Supercomputer System SX-9 and its Early Evaluation 国際会議

the 18th CCSE Workshop on Computational Technologies Supporting Development of Future Applications　2008年4月22日
Experiences with SX-9 国際会議

the 8th Teraflop workshop　2008年4月10日
Experiences with SX-9 国際会議

Worldwide NEC Users’ Meeting　2008年4月6日
メディアプロセッサによる高性能計算

電子情報通信学会専門講習会　2008年2月22日
New System Design and Its Early Evaluation 国際会議

The Seventh Teraflop Workshop　2007年11月21日
The Potential of On-Chip Memory Systems for Future Vector Architectures 国際会議

the 16th CCSE Workshop on High-Performance Computing on Vector Based Architectures – Recent Achievements and Future Directions-　2007年4月23日
ISC Plans and Update 国際会議

The Sixth Teraflop Workshop　2007年3月26日
HPC Activities at Information Synergy Center 国際会議

The Fifth Teraflop Workshop　2006年11月20日
Implication of Memory Performance in HEC Systems 国際会議

The Fourth Teraflop Workshop　2006年3月30日
Performance Evaluation of SX-7 using HPCC and Real Application Codes 国際会議

3rd Teraflop Workshop　2005年11月11日
情報シナジーセンターのHPC研究活動とペタフロップス時代のセンターの役割

NEC HPC研究会　2005年11月9日
スーパーコンピュータにまつわる誤信と落し穴

東北大学大学院情報科学研究科談話会　2005年7月26日
大規模科学計算システムの技術動向

NUA東北地区ユーザ研修会　2003年6月5日
High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture 国際会議

2002 International Conference on Optical Communication and Multimedia　2002年11月14日
高性能・高機能ネットワーク社会を支える基盤技術の展望

NetOne Tohoku Seminar 2000　2000年10月17日
機械を知能化するコンピュータ

日本機械学会特別企画フォーラム「機械と知能」　1998年10月11日
並列処理を用いた高速ボリュームレンダリング手法と医用画像における興味部位の自動抽出手法

秋田県立脳血管研究センター講演会　1997年2月5日
東北大学情報科学研究科のマルチメディア環境

（株）アシスト，日本サン・マイクロシステムズ（株）合同主催セミナー　1996年3月7日
スーパーコンピュータと数値流体力学

大阪大学溶接工学研究所研究集会　1991年3月29日

産業財産権 13

参照画像キャッシュ、削除先決定方法及びコンピュータプログラム

小林広明

特許第7416380号

産業財産権の種類: 特許権
参照画像キャッシュメモリ、データ要求方法及びコンピュータプログラム

小林広明他

特許第7425446号

産業財産権の種類: 特許権
津波浸水予測システム，制御装置，並列計算システムの制御方法及びプログラム

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

特許第6362178号

産業財産権の種類: 特許権
津波浸水予測システム，データ処理サーバ，津波浸水予測の依頼方法及びプログラム

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

特許第6323880号

産業財産権の種類: 特許権
津波浸水予測システム、制御装置、津波浸水予測の提供方法及びプログラム

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

特許第6161130号

産業財産権の種類: 特許権
キャッシュメモリおよびキャッシュ制御方法

小林広明, 斎田泰昌

特許第3834323号

産業財産権の種類: 特許権
利用形態指向P2Pネットワークシステム、及び、コンピュータプログラム

小林広明, 滝沢寛之, 稲葉勉

特許第4170285号

産業財産権の種類: 特許権
グリッドコンピューティングシステム、及びグリッドコンピューティングシステムにおける計算資源収集方法

小林広明, 稲葉勉, 松村龍太郎

特許第3857258号

産業財産権の種類: 特許権
グリッドコンピューティングシステム

小林広明, 稲葉勉, 松村龍太郎

特許第3977298号

産業財産権の種類: 特許権
物性マップ画像生成装置、制御方法、及びプログラム

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

産業財産権の種類: 特許権
特異材料検出装置、制御方法、及びプログラム

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

産業財産権の種類: 特許権
マップ画像生成装置、制御方法、及びプログラム

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

産業財産権の種類: 特許権
推奨データ生成装置、制御方法、及びプログラム

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

産業財産権の種類: 特許権

共同研究・競争的資金等の研究課題 53

量子・古典ハイブリッド計算によるソフトマテリアル研究開発デジタルツインの創成

小林広明, 撫佐昭裕, 阿部圭晃, 佐藤雅之, 小松一彦, 菊川豪太

2024年4月1日～ 2028年3月31日
大規模量子コンピューティングによる新計算原理計算基盤の創生

小松一彦, 小林広明, 佐藤雅之, 百瀬真太郎

提供機関：Japan Society for the Promotion of Science

制度名：Grants-in-Aid for Scientific Research

研究種目：Grant-in-Aid for Scientific Research (B)

研究機関：Tohoku University

2023年4月1日～ 2028年3月31日

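Quantum computers, which rest on a computing principle different from that of conventional computers, have recently attracted attention as a next-generation computing technology and are being actively studied. This research anticipates that quantum computers will come to be used about as routinely as conventional computers, and studies in advance the elemental technologies for the large-scale quantum computing that will be needed beyond that point. Scaling up will not rely on quantum-science approaches alone. The research develops the ideas, principles and technologies of high-performance computing, which has achieved massively parallel computation, and transfers them to new parallel quantum algorithms for large-scale quantum computing. By developing elemental technologies for parallel quantum computation, the project aims to clarify three things: the feasibility of large-scale computation by quantum computing, methods for evaluating quantum computing, and the potential of quantum computing across multiple nodes. To this end, the research is organized into three items: quantum algorithms for large-scale computation, development of a simulator for parallel quantum computation, and performance evaluation and analysis. Item 1 (quantum algorithms for large-scale computation) assumes that practical computation is possible on a single quantum computer node and develops algorithms that realize large-scale computation. Item 2 (simulator for parallel quantum computation) develops the simulator for parallel quantum computation that is needed to evaluate and analyze large-scale quantum computation. Item 3 (performance evaluation and analysis) develops evaluation methods, analysis methods and benchmarks for quantum processors, which have not yet been established.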
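Development of a disaster-medicine digital twin and rebuilding of medical resilience through collaboration among science, engineering, and medicine

越村俊一, 江川新一, 久保達彦, 近藤久禎, マスエリック, 小林広明, 金谷泰宏, 太田雄策, 市川学, 柴崎亮介, 佐々木宏之

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (S)

Research institution: Tohoku University

July 5, 2021 – March 31, 2026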
Pioneering real-time video coding technology based on the latest coding standard VVC/H.266 and its applications

岩崎裕江, 小林広明, 佐藤雅之, 新田高庸, 江川隆輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tokyo University of Agriculture and Technology

April 1, 2022 – March 31, 2025

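Video traffic has been growing explosively and was forecast to exceed 80% of all traffic by 2022, making its compression an urgent issue. The latest international video coding standard, standardized in February 2021 (hereafter VVC/H.266), achieves roughly twice the compression of its predecessor H.265 (about half the bit rate), but requires an enormous amount of computation. This project therefore developed hardware-oriented algorithms that learn in advance, from videos with diverse characteristics, the coding modes that yield the true compression efficiency, and then use the learned results to perform ultra-high-compression video coding in real time. 1. Establishing a video coding architecture for realizing VVC/H.266: Taking the current state of hardware technology into account, we studied hardware architectures for VVC/H.266 encoding. VVC/H.266 improves compression performance substantially over HEVC/H.265 through new coding tools and larger block sizes, but requires an enormous amount of computation. Aiming at a real-time hardware encoder, we studied an efficient block-partitioning algorithm that reduces encoding time in intra-frame coding. 2. Establishing a memory architecture for realizing VVC/H.266: VVC/H.266 can reduce the bit rate to about 50% of that of HEVC/H.265, but obtaining its true compression efficiency on real-time hardware requires an enormous number of memory accesses during encoding. Assuming efficient real-time encoding, we proposed a cache organization that makes memory access more efficient, incorporated the memory accesses under this cache organization into a simulator, and evaluated them quantitatively.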
Expanding industrial use of innovative transport-equipment design technology using microdevices through large-scale simulation

藤井孝藏, 立川智章, 浅田健吾, 小川拓人, 滝沢寛之, 小林広明, 江川隆輔, 磯部洋子

Funding agency: Tohoku University Cyber Science Center

Program: JHPCN: Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures

Research institution: Tohoku University

2017 – 2024
Materials revolution through an integrated materials development system

小林広明, 小松一彦, 佐藤雅之

May 2020 – March 2023
New developments in high-performance materials informatics infrastructure enabled by quantum annealing

小林広明, 岡部朋永, 阿部圭晃, 菊川豪太, 佐藤雅之, 撫佐昭裕, 觀山正道, 大関真之, 小松一彦

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (A)

Research institution: Tohoku University

April 2019 – March 2023

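To accelerate simulations of crosslinked network formation in crosslinked polymer materials, we implemented the dissipative particle dynamics (DPD) method, a coarse-grained particle-scale simulation, in coordination with molecular dynamics simulation. We also incorporated the same reaction model used at the all-atom scale, enabling curing calculations at the coarse-grained level. Furthermore, we developed a reverse-mapping method that attaches all-atom-scale structural templates to structures obtained from DPD simulations. As a result, we achieved structure and property predictions consistent with all-atom simulations together with a substantial speedup of the curing calculations. For large-scale execution of unsteady compressible-flow macroscopic analysis with a high-order unstructured solver, we worked on porting and accelerating the open-source code PyFR on SX-Aurora TSUBASA. As a result, the flux-computation kernels (tflux/intcflux), which had previously been only partially vectorized, were fully vectorized by revising array initialization and loop structures. To accelerate the molecular dynamics simulation code Peachgk_md, we showed that the list vectors that had been preventing vectorization can be vectorized by using the "tomarigi" (perch) method; implementing this method in Peachgk_md raised the vectorization ratio to 98.8% and improved computational performance by a factor of 6.5. To develop clustering methods that couple annealing machines with high-performance computing systems, we developed an annealing-based clustering method in which the constraint terms defining the clustering conditions are defined separately from the QUBO. The clustering itself runs on quantum annealing machines or digital annealing machines, while conventional high-performance computing systems handle preprocessing such as QUBO generation and postprocessing such as data aggregation.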
Development of a quantum-annealing-assisted next-generation supercomputing infrastructure

小林広明, 滝沢寛之, 山口健太, 撫佐昭裕, 曽我隆, 渡部修, 横川三津夫, 江川隆輔, 下村陽一, 中田一人, 越村俊一, 小松一彦, 佐藤雅之, 愛野茂幸, 磯部洋子, 政岡靖久, 百瀬真太郎, 藤本壮也, 山本悟, 古澤卓, 荒木拓也, 村嶋陽一, 大関真之, 觀山正道, 太田雄策, マスエリック, 星宗王, 萩原孝

April 2018 – March 2023
Advancing wide-area tsunami damage assessment technology and innovating disaster medical support systems through collaboration among science, engineering, and medicine

越村俊一, 江川新一, 近藤久禎, マスエリック, 小林広明, 金谷泰宏, 太田雄策, 柴崎亮介

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (S)

Research institution: Tohoku University

May 2017 – March 2022

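We built prediction models that use simulation to forecast, immediately after an event, the overall damage and the damage to hospitals, in order to help medical institutions maintain their functions and to support hospitals. The main results are as follows. 1) We accelerated and enhanced high-resolution tsunami inundation and damage prediction simulations for nationwide deployment. We achieved up to a 4.3-fold performance improvement for the tsunami simulation program, studied optimization of the simulation's computational domain and load-balancing methods for parallel processing, and identified the issues to be addressed for nationwide deployment. 2) We enhanced the method that estimates fault slip directly from carrier phase, the raw GNSS observable (Phase To Slip, PTS), applied it to estimating coseismic slip distributions, evaluated its performance, and showed that coseismic slip can be estimated accurately even with broadcast ephemerides instead of precise GNSS satellite ephemerides. 3) For analyzing the large volumes of location data continuously obtained from mobile phones, we devised correction methods that use passenger counts at railway station ticket gates and traffic volumes at expressway toll gates, in addition to the conventional correction by matching against nighttime population. For estimating places of stay and similar quantities, we built a system assuming the use of location data from targeted advertising, which is ten times larger than the data used previously. 4) By providing disaster-drill scenarios via the disaster medical system H-CRISIS, we evaluated the health and medical response capabilities of municipalities under regional disaster prevention plans, based on the scale of human casualties computed by simulation. In regions that must prepare for tsunami damage from a Nankai Trough earthquake, we examined capability gaps in responding to the incidents expected to occur, based on the time series of casualty scale and on records from the Great East Japan Earthquake. The main awards were two: the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Development Category), and the Minister for Internal Affairs and Communications Award at the 1st Japan Open Innovation Prize.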
Advancing wide-area tsunami damage assessment technology through collaboration among science, engineering, and medicine, and its application to disaster medical support

越村俊一, 江川新一, マスエリック, 小林広明, 太田雄策, 柴崎亮介

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (A)

Research institution: Tohoku University

April 2017 – March 2021
New developments in application development support and vector architecture design for the exascale era

小林広明, 滝沢寛之, 江川隆輔, 佐藤雅之, 撫佐昭裕, Vladimir Voevodin, Vadim Voevodin, Ilya Afanasyev, et al.

April 2018 – March 2020
Making expert, craft-like programming intelligent through machine learning

滝沢寛之, 片桐孝洋, 横川三津夫, 南一生, 小林広明, 須田礼仁, 岡谷貴之, 江川隆輔, 大島聡史

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

April 1, 2016 – March 31, 2019

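This study presented cases in which machine learning can be used effectively to support high-performance computing (HPC) programming. By transforming them into problems for which machine learning has already been applied successfully, various problems in code optimization may also become solvable with machine learning. Because problems in HPC programming for which a very large amount of training data can be prepared are rare, the study also showed that the target problem must be analyzed thoroughly to collect data efficiently. Furthermore, it became clear that using machine learning, like HPC programming, depends on the experience and intuition of experts; however, because this amounts to tuning hyperparameters that are already numerical, it can be recast as a computational-cost problem.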
Design space exploration of microprocessors using post-CMOS devices

江川隆輔, 小林広明, 滝沢寛之, 多田十兵衛, 佐藤雅之, 宇野渉, 豊嶋拓也, 坂井然太郎, 小笠原大輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Challenging Exploratory Research

Research institution: Tohoku University

April 2015 – March 2018

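Aiming to realize energy-efficient microprocessors using new device technologies expected to reach practical use around 2025, this study addressed circuit design and memory subsystems built on such devices. For circuit design, we worked on design methods for wave-pipelined circuits using CNFETs. For memory subsystems, focusing on 3D stacking technology and STT-RAM, we studied a cache bypass mechanism for future memory subsystems, a power-saving data placement method for multi-bank memories, and a low-power management mechanism for the last-level cache (LLC), and demonstrated their effectiveness through simulation.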
Research and development of high-density self-assembled interconnect technology for low-power stacked semiconductors

小柳光正, 東和幸, 元吉真, 知京豊裕, 川喜多仁, 田中徹, 福島誉史, 李康旭, 池田誠, 小林広明, 岡谷貴之, 清山浩司

April 2015 – March 2017
Creation of green microarchitectures for the 5.5-dimensional design era

江川隆輔, 多田十兵衛, 小林広明, 滝沢寛之, 佐藤雅之, 宇野渉, 西村秦, 細川麿生, 豊嶋拓也

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

April 2014 – March 2017

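This research aims to realize microarchitectures that fully exploit the potential of 2.5D and 3D integration technologies, which are expected to support processor design after the end of Moore's law, and that achieve power efficiency surpassing that of existing processors. Specifically, looking ahead to the "over-the-Moore" era, which does not rely solely on miniaturization, we pursued research on elemental technologies for processor design that actively exploits vertical interconnects. By examining the effectiveness of stacking at various design granularities, from fine to coarse, we showed that using TSVs where they fit best, while weighing the trade-offs among performance, power, and cost, can dramatically improve the power efficiency of processor systems.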
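Demonstration project for building a "Tsunami L-Alert" by linking a real-time tsunami prediction system with L-Alert, and for advancing disaster response

越村俊一, 小林広明, et al.

April 2015 – March 2016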
New developments in checkpoint/restart technology for the era of hierarchical storage

滝沢寛之, 宇野篤也, 小林広明, 江川隆輔, 佐藤幸紀

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Challenging Exploratory Research

Research institution: Tohoku University

April 2014 – March 2016

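Assuming that an application's state is saved periodically during execution, we studied methods for automatically tuning parameters such as how often these states are written to hierarchical storage. We also studied ways to shorten the time these writes take. Because speculatively writing data that is likely to be written later is an effective approach for this, we also considered how to predict such data. Since the prediction requires knowing the memory access patterns of the target application, we developed a memory analysis tool. Finally, we developed a job-scheduling simulator for large-scale systems and used it to verify the effects of these methods.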
Creation of a 3D processor architecture based on dependable-processing co-design

小林広明, 滝沢寛之, 江川隆輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Challenging Exploratory Research

Research institution: Tohoku University

April 2014 – March 2016

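Processor development now faces both the manufacturing limits of conventional semiconductor technology and limits in architecture design. The goal of this research is to establish new architecture design technologies that use 3D integration, which has attracted attention in recent years, to achieve both high performance and high reliability in processors. Because the memory subsystem limits performance in many applications, we worked on the design of a large-capacity, high-performance on-chip memory hierarchy using 3D integration, and on an online checkpoint mechanism that uses this memory hierarchy not only for program execution but also to improve reliability.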
A software-hardware co-design infrastructure for accelerating synergy effects

滝沢寛之, 小林広明, 青木孝文, 佐野健太郎, 江川隆輔, 多田十兵衛, 伊藤康一

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

April 2013 – March 2016

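Assuming OpenCL as the standard programming environment, we identified the features it lacks for exploiting a wider variety of accelerator architectures and studied extensions to OpenCL. In addition, although OpenCL has also come to be used for hardware description, the OpenCL C language is not necessarily efficient for writing kernels; to address this, we designed and implemented a high-productivity language for describing operations frequently used in image processing and high-performance computing. Furthermore, we proposed, implemented, and evaluated a method for automatically setting parameters whose appropriate values differ from one accelerator to another.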
Creation of smart universal memory through device-architecture co-design

小林広明, 滝沢寛之, 江川隆輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

April 2013 – March 2016

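The goal of this research is to establish the fundamental technologies of a new memory architecture in which the memory subsystem manages data intelligently according to the behavior of application programs, thereby delivering the data supply capability that applications require with minimum energy consumption. To realize an intelligent hierarchical memory subsystem, we worked on the design of a cache architecture that supplies data at high bandwidth with low power, and clarified its effectiveness and the remaining issues.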
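Demonstration project for strengthening municipalities' disaster mitigation capabilities through real-time tsunami inundation and damage prediction and disaster information distribution

越村俊一, 小林広明, et al.

April 2014 – March 2015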
Survey study on future HPCI systems suited to high-memory-bandwidth applications

小林広明, 金田義行, 橋本ユキ子

April 2012 – March 2014
Development of an application-adaptive, dynamic, deeply multi-level memory architecture

小林広明, 滝沢寛之, 江川隆輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Challenging Exploratory Research

Research institution: Tohoku University

April 2012 – March 2014

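The purpose of this research is to rethink architecture design from the memory functions and performance that applications require, and to establish multi-level, application-adaptive on-chip memory architectures together with the technologies for using them. Toward higher performance and lower power consumption in microprocessors, we worked on efficient resource management that takes the cache memory into account. Such resource management avoids inter-thread resource contention in the cache and makes efficient use of cache resources, thereby improving microprocessor performance and reducing power consumption.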
Research and development of next-generation CFD for petaflops-class computers

中橋和博, 山本悟, 大林茂, 小林広明, 山本一臣, 佐々木大輔, 鄭信圭, 滝沢寛之, 江川隆輔, 黒滝卓司, 榎本俊治, 今村太郎, 高橋俊

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (S)

May 2009 – March 2014

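This research aimed to fundamentally resolve various problems of the CFD currently used in the aerodynamic design of aircraft and other vehicles, such as the dependence of computed results on physical models and the growing workload for complex geometries. Anticipating further improvements in computer performance, we proposed the Building-Cube Method and studied various algorithms for putting it into practice. As one result, we reproduced the flow around an automobile in a world-class large-scale numerical computation on the K computer. Showing that this CFD approach can perform flow computations directly from extremely complex and incomplete CAD data is highly significant, because it has the potential to transform the aerodynamic design process for aircraft and automobiles.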
Creation of 3D VLSI systems with self-repair capability

小柳光正, 小林広明, 青木孝文, 末吉敏則, 鎌田忠, 元吉真

April 2009 – March 2013
Creation of a new-generation vector microarchitecture based on 3D integration

小林広明, 滝沢寛之, 江川隆輔

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

2010 – 2012

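To realize low-power, high-performance next-generation vector processors, this study worked on microarchitecture design using 3D integration, which has attracted attention as a new device technology. We provided design guidelines for combining conventional 2D design with 3D design, and achieved effective replacement of 2D wiring with 3D TSVs (through-silicon vias) at every level from intra-unit wiring, such as in arithmetic circuits and on-chip memory, to inter-unit wiring. We then demonstrated the effectiveness of the resulting 3D-integrated processor through performance evaluation.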
Ultra-high-precision biological function measurement system based on ultrasonic-measurement-integrated analysis

早瀬敏幸, 小杉隆司, 小林広明, 小玉哲也

April 2007 – March 2011
Study of an instruction steering scheme based on static data dependences

鈴木健一, 小林広明

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (C)

Research institution: Tohoku Institute of Technology

2008 – 2010

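Current microprocessors achieve high-speed processing by executing multiple instructions in parallel. In this study, the instructions that strongly affect execution speed are identified in advance when the program is translated into machine code (static data-dependence analysis), so that only simple processing (instruction steering) is needed when the program actually runs, aiming at more efficient processing than conventional schemes. Evaluations carried out during the research period showed that static steering achieves performance comparable to that of conventional schemes.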
Building a research support platform for new IT infrastructure for the information explosion era

安達淳, 田中克己, 西田豊明, 國吉康夫, 須藤修, 黒橋禎夫, 原隆弘, 松岡聡, 田浦健次朗, 建部修見, 棟朝雅晴, 廣津登志夫, 松原仁, 下條真司, 千葉滋, 湯淺太一, 松山隆司, 近山隆, 近堂徹, 河野健二, 岡本正宏, 合田憲人, 鎌田十三郎, 喜連川優, 山名早人, 中村豊, 小林広明, 中島浩

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research on Priority Areas

Research institution: National Institute of Informatics

2006 – 2010

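We built a research infrastructure shared by the planned and publicly solicited research groups participating in this priority area, and supported their research activities. This allowed research resources to be shared within a limited budget and deepened collaboration among the groups. Specifically, we provided a large-scale corpus through the open search engine TSUBAKI, and built the wide-area distributed computing testbed InTrigger, the real-world interaction measurement and analysis environment IMADE, and an experimental environment for sensor-network-based preventive medicine.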
Research and development of a safe, secure, and low-cost ubiquitous computing platform for creating an ICT eco-society

小林広明, 堀口進, 滝沢寛之, 福士将

April 2006 – March 2009
Study of hardware-software cooperative, high-efficiency multithread scheduling

小林広明, 中村維男, 鈴木健一, 滝沢寛之, 江川隆輔, 佐藤幸紀, 小寺功, 船矢祐介, 佐藤雅之

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

2006 – 2009

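Aiming at low-power, high-performance processing through efficient use of on-chip computational resources in next-generation on-chip multicore processors (CMPs), we researched and developed power-aware, high-efficiency multithreaded processing technologies. Specifically, we defined feature metrics for the threads running on a CMP and, based on this definition, established a high-efficiency thread scheduling method for multicore processors. We also developed a dynamic cache partitioning mechanism that achieves both high performance and low power consumption, and demonstrated its effectiveness through simulation.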
Study of ultra-high-bandwidth vector processor design using 3D stacking technology

小林広明

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Exploratory Research

Research institution: Tohoku University

2008

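The purpose of this study is to clarify the design constraints on high-performance microprocessor architectures in the coming era of 3D integration, and the optimal architecture design methods under those constraints. In FY2008, we surveyed and examined research trends in the elemental technologies of 3D stacking and in new architecture designs that use it. This confirmed that 3D stacking dramatically increases the number of transistors available on a chip, and that Through Silicon Vias (TSVs), which connect the silicon layers stacked vertically, can shorten on-chip wire lengths and wire delays. We also focused on vector processors, whose memory bandwidth is expected to fall behind because of the limits of input/output pin packaging technology, and proposed a 3D vector processor with a large on-chip memory built by 3D stacking, designed to make the most of these advantages. The proposed 3D vector processor consists of a processor layer and multiple memory layers. Its on-chip memory capacity can be increased easily by adding memory layers, and by reducing the number of off-chip memory accesses it effectively hides memory access latency while limiting the power consumed by off-chip accesses. Evaluation showed that, compared with an existing 2D vector processor, the proposed memory-stacked 3D vector processor reduces energy consumption by up to 14% and execution cycles by up to 63%.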
Ultra-large-scale data mining through safe and secure volunteer computing

小林広明, 滝沢寛之

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research on Priority Areas

Research institution: Tohoku University

2007 – 2008

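The purpose of this study is to establish the fundamental technologies for large-scale data mining through volunteer computing that exploits the functions and performance of home video game consoles. In FY2008, we built a distributed data mining system that analyzes physical phenomena near a rocket nozzle, and conducted a large-scale data mining demonstration in a volunteer computing environment consisting of PLAYSTATION 3 consoles and InTrigger. The results showed that when conventional centralized task scheduling is used for dynamic load balancing, load balancing becomes inefficient as computational resources increase, so the expected performance cannot be achieved in a large-scale volunteer computing environment. In contrast, the distributed cooperative scheduling mechanism proposed in this study kept dynamic load balancing efficient even as the number of computational resources increased. This evaluation demonstrated that the proposed mechanism is effective for dynamic load balancing in large-scale volunteer computing environments. We also proposed a worker-side scheduling method so that volunteers participating in multiple projects do not waste their idle computing capacity. To improve the reliability of volunteer computing, we proposed a method for efficiently checking the validity of computation results: by quantifying each worker's credibility and updating it according to the validity of its results, malicious workers can be detected, as shown by simulation. Furthermore, noting the high graphics performance of home game consoles, we studied how to use it for data mining and researched a programming framework that makes such programming easy.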
Study of architectures for ultra-high-speed photonic networks

堀口進, 小林広明, JIANG Xiaohong, 福士将, 山森一人

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

2005 – 2007

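This study aimed to establish architectures for ultra-high-speed optical networks built from large-scale photonic switches using optical devices. First, as optical cross-connect switches, we devised a recursive multistage-interconnection photonic switch and its routing control scheme, and showed that it supports self-routing and scales well. Next, we proposed a crosstalk-free 3D implementation of multistage-interconnection optical switches, and showed that non-blocking performance can be improved by an effective scheme that assigns connection requests to many multistage switch layers. In doing so, we found that conventional assignment schemes concentrate the load of connection requests on particular layers. We devised a new load-distribution metric computed from the number of connection requests on each layer and proposed an assignment scheme that spreads the load; detailed simulation experiments showed that it distributes the load better than conventional methods and allows large-scale switches with a lower blocking probability. Furthermore, we studied fault recovery and routing for switch failures and fiber cuts in large-scale photonic networks. For fault recovery, we devised an active fault recovery scheme to replace conventional passive path protection, proposed a mathematical model that analytically derives the effect of overlapping alternate paths, and evaluated fault recovery schemes in detail on WDM networks whose topologies are publicly available: the US NSFNET and the Canadian and Italian backbone networks. The results showed that the active scheme recovers from faults faster than conventional methods, and comparing the analytical model with experimental results showed that performance can be predicted by accounting for alternate-path overlap up to about the fourth order.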
Ultra-large-scale data mining through safe and secure volunteer computing

小林広明, 滝沢寛之

2006

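This year, we focused on data clustering (DC) and neural networks (NN), which among representative data mining methods demand particularly high computational performance, and studied how to execute them efficiently on home game consoles. Specifically, we studied how to make effective use of the Cell Broadband Engine (CBE), the high-performance processor in home game consoles, and of graphics processing units (GPUs) for data mining, and carried out implementations and quantitative performance evaluations. As research on large-scale P2P computing, we studied a distributed computational resource management mechanism for efficiently finding, among the enormous number of idle computers scattered across the network, those that meet users' requirements. We showed that user requests exhibit temporal and spatial locality similar to that seen in computer memory access behavior, and that exploiting this locality can improve search efficiency dramatically. This year, with resource discovery in heterogeneous environments particularly in mind, we studied a mechanism that automatically adjusts the number of P2P connections according to how frequently resources are used. In addition, as a mechanism for coordinating an enormous number of computers, we continued research on a fully distributed dynamic load balancing mechanism and designed its basic control scheme. In preparation for realizing a safe and secure distributed data mining system based on tamper-resistant computation on a volunteer computing platform, we set up the development environment this year. We also collected related materials and held discussions with the people involved.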
Study of autonomously reconfigurable hardware with evolutionary computation capability

堀口進, 小林広明, 福士将

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Exploratory Research

Research institution: Tohoku University

2004 – 2006

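With advances in VLSI technology, evolvable hardware, which uses reconfigurable logic arrays to change its functions autonomously according to the operating environment, has attracted attention. In this study, we applied evolutionary computation to practical-scale VLSI systems built from programmable logic devices such as static and dynamic FPGAs, and researched hardware schemes capable of autonomous reconfiguration. In particular, we evaluated the performance of reconfigurable systems based on evolutionary computation in detail. We showed the usefulness of this approach by implementing in hardware an evolutionary-computation functional circuit system for fault-compensating reconfigurable hardware for hierarchical neural networks, together with circuit information learned by a genetic algorithm. Next, we examined in detail an autonomously reconfigurable hardware system whose neural network configuration can be changed according to the fault situation, and a fault-avoiding degradation-reconfiguration system for mesh-connected processors that applies evolutionary computation. Building on our results for a fault-compensating hierarchical neural network implemented in hardware with an FPGA-based evolutionary-computation circuit system, we found that the newly devised genetic-algorithm learning, circuit information, and fault-compensating neural network can change the network configuration autonomously according to problem size and operating environment. Furthermore, we proposed and implemented an autonomous degradation-reconfiguration scheme for mesh-connected processors based on evolutionary computation, together with a fault-avoidance coding learning scheme for genetic algorithms, and evaluated their performance. These results demonstrated the usefulness of an autonomous, fault-avoiding degradation-reconfiguration scheme for mesh-connected processors based on evolutionary computation.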
Study of intelligent memory architecture for 3D graphics

小林広明, 中村維男, 鈴木健一, 滝沢寛之, 佐野健太郎

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

2002 – 2004

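This research produced the following results. (1) High-performance graphics algorithms and their hardware implementation: We analyzed the parallelism and data-reference locality of rendering algorithms based on global illumination models and designed the basic architecture of a new rendering pipeline. We then designed and developed hardware algorithms for this pipeline, evaluated its performance through software simulation, and showed that real-time rendering is feasible. We also developed a fast rendering algorithm for walkthrough animation and demonstrated its effectiveness through performance evaluation. (2) Power-saving memory control mechanisms: To embed this graphics architecture in low-power information devices such as mobile terminals, we produced a basic design of a dynamically reconfigurable memory system that maximizes computational efficiency per unit of power. We designed a dynamically reconfigurable intelligent memory mechanism that activates and deactivates the system's arithmetic and memory elements as the computational load varies, quantitatively evaluated the amount of activated hardware and its effect on performance, and showed that the hardware can be controlled optimally as the application's demand for computational resources changes over time. (3) Data compression algorithms for graphics hardware: We studied efficient, high-performance compression techniques for graphics data. Applying vector quantization to volume data, we achieved efficient compression with minimal information loss. We further developed a visualization algorithm that operates directly on the compressed data, achieving fast volume rendering. To accelerate data clustering, the main step in the compression, we developed a parallel algorithm for computing distances between high-dimensional vectors that runs on graphics hardware.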
Study of low-power, ultra-high-speed microprocessor architectures

中村維男, 後藤源助, 深瀬政秋, 小林広明, 萩原将文, 鈴木健一

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

2002 – 2004

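As microprocessor performance increases, power consumption grows and the temperature rise caused by heat generated across the chip becomes severe, so low-power, ultra-high-speed microprocessors are needed. This research aimed to establish a microprocessor architecture that operates at a low clock frequency and achieves low-power operation by combining its components, including memory, in a rational way. First, we defined what low-power, ultra-high-speed operation of a microprocessor means, and presented this as one design guideline for future microprocessors. The guideline drew strong interest at relevant international conferences, and an invited talk had already been scheduled for FY2005. Based on this definition, we proposed and evaluated several architectures and clarified their potential. We also showed that extracting thread-level parallelism over a wide range of granularities, from fine to coarse, is essential for the proposed architectures to reach their performance, and devised a parallelism extraction method to achieve this. In microprocessor hardware, designing the datapath, the most basic part of processing, for low-power high-speed operation is extremely important. We showed that applying wave pipelining to the datapath achieves both high speed and low power. For the cache memory between the datapath and main memory, we also presented a new method for hiding the speed gap between the two. As a target application for parallel processing, we took codebook design, an information compression technique; by pursuing low power on both the hardware and software sides, we implemented a low-power, high-speed processor for it and demonstrated the effect.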
Study of fast, high-functionality instruction supply mechanisms for VLIW architectures

鈴木健一, 小林広明, 中村維男

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (C)

2000 – 2002

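In VLIW architectures, which are expected to be a next-generation high-performance microprocessor architecture, good performance requires a memory system that can steadily supply instructions that can execute in parallel from memory to the functional units with little delay. In this study, we first proposed the MULHI (MULtiple HIt) cache mechanism as a high-performance instruction cache for VLIW architectures. The MULHI cache does not store no-operation instructions (nops); this raises the utilization of the cache memory array and hence the hit rate, yielding a high-bandwidth memory system. The MULHI cache shares the idea of not storing nops with the earlier COMPRESS and SILO caches, but aims for higher efficiency than these competing schemes by making good use of cache associativity. Miss-rate-based performance evaluation showed that the MULHI cache achieves a higher OPC (Operations Per Cycle) than conventional schemes. A detailed hardware design also showed that the overhead of the MULHI cache control circuitry is small enough for it to be applied to VLIW caches. Finally, as a new application of data supply through caches, we considered real-time ray-tracing hardware and demonstrated the usefulness of the approach.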
Prototype construction of an autonomously reconfigurable network for reconfigurable mesh-connected multiprocessors

堀口進, 林亮子, 山森一人, 小林広明, 井口寧

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Japan Advanced Institute of Science and Technology

1999 – 2001

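To integrate large-scale mesh arrays on a wafer with WSI and build high-performance systems, reconfiguration methods that bypass defective PEs are indispensable. The problem with reconfiguration in large-scale integrated systems is that testing individual PEs, reconfiguring the mesh array, and realizing the logical array all take an enormous amount of time. To address this, autonomous reconfiguration, in which the reconfiguration function is embedded in the system as hardware so that the system reconfigures itself quickly and autonomously, has recently attracted attention. The purpose of this study is to establish an implementation scheme for hardware-based autonomous reconfiguration that achieves a high reconfiguration rate with a small amount of hardware. For the reconfiguration problem of mesh-connected networks with redundancy, we proposed an autonomous reconfiguration (BC) algorithm that simplifies control by having each switch change its own state using only the defect information of its neighboring PEs. Simulation experiments showed that the BC algorithm achieves reconfiguration rates as high as those of graph-theoretic and recursive-procedure reconfiguration methods. Beyond the reconfiguration rate, we also evaluated many other criteria, such as the processing time required for reconfiguration and the maximum interconnection length after reconfiguration, while taking into account the amount of redundant hardware needed. Based on these discussions, we designed and developed an FPGA-based prototype system that implements the BC algorithm for autonomous reconfiguration of mesh-connected networks. Using this prototype, we verified the operation of the hardware and evaluated the reconfiguration time, the reconfigured network, and the circuit size. The results showed that an autonomous reconfiguration scheme that uses only local processor-fault information in a mesh-connected processor network can reconfigure with relatively little hardware and can also shorten reconfiguration time.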
Development of a real-time photorealistic computer graphics system

小林広明, 片平昌幸, 北島宏之, 中村維男, 鈴木健一, 山内斉

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

1998 – 2000

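With the goal of establishing a parallel computer architecture for photorealistic image generation based on global illumination models, this study carried out the basic design of a graphics engine based on the object-space-partitioning parallel processing model proposed by the principal investigator, and developed Thunder, a graphics engine for radiosity and ray tracing, as a prototype. Thunder carries, on a printed circuit board with a PCI interface, two FPGAs each able to hold 200,000 gates of logic and four 256 MB SDRAM modules (1 GB in total). The FPGAs implement the engine's basic components: a 3D line generation unit, an intersection test unit, and a secondary ray generation unit. The SDRAM serves as object memory and can supply object data to the FPGAs at a memory bandwidth of 512 MB/s. In designing Thunder, we concentrated on optimizing the intersection test unit, which plays the central role. Weighing hardware cost against performance, we adopted fixed-point arithmetic units for the intersection test unit. Because a fixed-point implementation risks degrading image quality through overflow, we devised a new algorithm whose precision does not degrade even with fixed-point units. To confirm its effectiveness, we developed a simulator of the intersection test unit and confirmed through image generation experiments that fixed-point units can produce images comparable to those produced with floating-point units. To further improve Thunder's performance, we proposed parallelizing the computation engine with multiple intersection test units and evaluated the result: speedups of 6.4 times with 8-way parallelism and 11 times with 16-way parallelism within the intersection test unit. With 16-way parallelism in particular, the engine is 20 times faster than a Pentium II (400 MHz) running at the same clock frequency. However, achieving this performance requires a memory bandwidth of 100 GB/s. Achieving such high memory bandwidth through technologies such as embedded memory remains a topic for future research.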
Study of advanced architectures for brain-structured supercomputers

中村維男, 深瀬政秋, 小柳光正, 長谷川勝夫, 小林広明, 萩原将文, FLYNN Michael

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Tohoku University

1998 – 1999

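Rather than treating the supercomputer merely as a tool for high-speed computation, we focused on a design philosophy for computers that, in addition to higher processing speed, gain flexibility in how they process by incorporating functions of the brain. Drawing hints for realizing these functions from the structure of the brain, we developed a conceptual design of a brain-structured supercomputer that is closer to the brain than earlier designs. In advancing this brain-structured supercomputer from basic design to detailed design and toward realization, we built a simulator for realistic simulation of the design and implementation of left-brain functions. With it, the content of processing can be observed clearly as it unfolds, as if the workings of the left brain were being visualized, which we consider a highly significant result. In research on computer graphics and volume rendering for realizing right-brain functions, we researched and developed algorithms for fast, high-precision visualization, and succeeded in realizing right-brain functions on a machine flexibly and at high speed, an important result. We also devised concrete methods for naturally integrating left-brain and right-brain functions. Concluding that both functions require a considerable increase in processing speed, we actively pursued research on caches and on speculative instruction execution to make maximal use of the instruction-level parallelism in computer programs. As a result, functionally satisfactory results were obtained through structural developments specific to computers rather than by imitating the structure of the left and right brain themselves, which is an important finding.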
Study of volume rendering algorithms based on space-partitioning parallel processing

小林広明

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Encouragement of Young Scientists (A)

Research institution: Tohoku University

1998 – 1999

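This study investigated parallel algorithms that enable real-time visualization of volumes, that is, 3D data. Specifically, we implemented on a parallel computer the parallel shear-warp algorithm with load balancing by adaptive partitioning that we designed in FY1998, and evaluated its performance. The evaluation showed that the algorithm's performance scales in proportion to the number of processors, the processing elements of the parallel computer. We also found that adaptive partitioning balances the load among processors while reducing the communication volume inherent in the parallel algorithm, which improves parallel efficiency. We confirmed that a parallel computer with 32 processors can generate more than 10 images of 256 × 256 pixels per second. In addition, to generate photorealistic images of scenes containing both volume data and polygon data, we improved and parallelized ray tracing and radiosity, which are image generation methods based on global illumination models. Specifically, we integrated an energy-exchange model for light propagating through volumes with the illumination models of radiosity and ray tracing, and parallelized the integrated model using the object-space-partitioning parallel processing model. With this improved parallel algorithm, photorealistic images under a global illumination model can be generated quickly for scenes in which polygonal objects coexist with phenomena such as clouds and fog.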
Comprehensive study on visualization of massively parallel simulations

堀口進, 阿部亨, 小林広明, 安倍正人, 川添良幸, 丹野州宣, ハミドイサム

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Scientific Research (B)

Research institution: Japan Advanced Institute of Science and Technology, Hokuriku

1995 – 1996

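Computer simulation is used in many fields of science and technology, and practical-scale numerical experiments require huge memory spaces and enormous computation times. In this field, massively parallel computers consisting of many high-speed processors are attracting attention as an alternative to supercomputers, and research on simulation methods and data visualization for leading-edge fields is urgently needed. In this study, we examined in detail, from the perspectives of physics, materials, computational engineering, computer science, and software science, massively parallel simulation and its visualization, in which computer simulations in leading-edge fields such as fundamental physics and chemistry, materials design, fluid problems, and neural network learning are executed on loosely coupled massively parallel computers. We found that the quantitative and qualitative limits on simulation targets imposed by conventional computers can be greatly relaxed. For example, in molecular dynamics, which simulates the behavior of molecules in physics and chemistry, the number of molecules, previously limited to several thousand, can easily be extended to tens of thousands. Visualizing the simulation data also made it possible to observe molecular behavior such as the phase transition from gas to liquid. To accelerate massively parallel simulation, we proposed dynamically load-balanced parallel simulation algorithms that take into account the interprocessor network, message passing, and data placement, and confirmed their usefulness. Beyond this field, we carried out fluid simulations, thermal simulations of a massively parallel computer with a 3D wafer-stack structure, and massively parallel simulations of visual-neuron activation during learning in the brain and of self-organization, and demonstrated their speed and effectiveness. Furthermore, we visualized the enormous amounts of data produced by massively parallel simulations using 3D computer graphics and showed the effectiveness of color and 3D visualization methods for complex data.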
Study of a TLB-unified cache memory system

小林広明

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Encouragement of Young Scientists (A)

Research institution: Tohoku University

1995

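The TLB and the cache memory are implemented separately on a microprocessor chip and together occupy a large fraction of its area. This study attempted to reduce that area by integrating the two through shared tags, and examined whether reusing the freed area to enlarge the TLB can reduce memory access cycles. First, we specified the organization and control method of the TLB-unified cache memory and evaluated its hardware cost in equivalent register bits. The results showed that the TLB-unified cache memory requires substantially less hardware than the conventional separate cache and TLB. If the saved hardware is reused to extend a 16-entry TLB, the TLB can be enlarged 2-fold for a 4 KB cache, 4-fold for an 8 KB cache, 8-fold for 16 KB and 32 KB caches, and 16-fold for a 128 KB cache. Next, we evaluated the performance of the TLB-unified cache memory by trace-driven simulation. We recorded the memory accesses made while eight practical application programs executed 8 million instructions on a workstation, and fed these traces, as the memory accesses required for instruction execution, into a TLB-unified cache simulator and a conventional TLB-plus-cache simulator. From the accesses through the cache and TLB in each simulator we obtained their miss rates, and from these miss rates the average number of memory cycles required per instruction. The simulation results showed that reusing the hardware area saved by unifying the TLB and cache to extend the TLB reduces the number of memory cycles compared with a conventional organization that uses the same amount of hardware.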
Study of an autonomously reconfigurable massively parallel computer with a wafer-stack structure

堀口進, 沼田一成, 阿部亨, 武田利浩, 丹野州宣, 小林広明, 阿曽弘具, JAIN Vijay, LOMBARDI Fabrizio, KIM Jung Hwa, KNIGHT Thomas F., 下平博, 中村維男, WYATT Peter

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for International Scientific Research

Research institution: Japan Advanced Institute of Science and Technology, Hokuriku

1993 – 1995

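Wafer-scale integration (WSI), which seeks to build highly functional, ultra-dense integrated circuit systems on a single wafer as integrated circuit technology advances, is attracting attention as a technology for realizing massively parallel computers. Research in this area has shown that the most important issues for massively parallel computers are flexible and efficient interprocessor interconnection networks, and defect- and fault-repair techniques (fault tolerance and defect tolerance) that provide the high reliability and high system configuration rates required of WSI systems. Several massively parallel computers with computing power far exceeding that of supercomputers have been proposed. Seitz built the Cosmic Cube multiprocessor system, based on the binary n-cube. Several commercial multiprocessor systems have also adopted binary n-cube (hypercube) networks for interprocessor connection. Preparata and Vuillemin proposed the cube-connected cycles (CCC), in which each node of a hypercube is replaced by a cycle. For 3D massively parallel computers, M. Little et al. successfully prototyped a 3D image processing computer consisting of a 32 × 32 cellular array built from five stacked wafers. They proposed building a neurocomputer with a similar system and pointed out that the interconnections between stacked wafers can be made very short and are well suited to interprocessor networks. However, defect- and fault-repair techniques and architectures, which matter most for 3D computers, had not been studied sufficiently. In this study, we propose an architecture for a stacked 3D array processor and a fault-tolerant architecture using redundancy. The proposed 3D computer architecture supports autonomous reconfiguration using the recursive shift method. Anuj Chandra et al. discussed a 3D 1/2 track model that extends the compensation path method proposed by S. Y. Kung et al., but their work was limited to a theoretical extension of the 2D 1/2 track model developed for 2D array processors to three dimensions. Going beyond theory, we derived and discussed in detail the defect- and fault-repair performance of a stacked massively parallel computer using the autonomous reconfiguration scheme based on the recursive shift method that we proposed earlier. We also surveyed in detail the research and development trends of WSI devices to date, discussed the problems of WSI technology and massively parallel computers, and pointed out the potential of WSI-based massively parallel computers. Based on these studies, we investigated mesh-connected multiprocessor architectures, one of the interconnection schemes for massively parallel computers, together with reconfiguration schemes that take WSI implementation into account. A detailed comparison of the proposed autonomous reconfiguration scheme with conventional reconfiguration schemes showed the superiority of the autonomous scheme, which is a new finding. We also proposed a WSI implementation scheme for the CCC. For this implementation, we presented a method for analytically deriving system yield from the areas of the processors, switches, and network wiring, examined its yield, and showed that a redundant configuration based on hierarchical hypercube connection achieves higher yield. Furthermore, we proposed the HCQ network, a further improvement of the crossed cube network that reduces the diameter and average distance of processor interconnections in massively parallel computers. A detailed analysis of the network performance of the proposed HCQ showed it to be an excellent network, another new and useful finding of this study. Finally, we proposed a stacked 3D mesh-connected array processor architecture as a massively parallel computer. A detailed discussion of the autonomous reconfiguration algorithm using the recursive shift method for this 3D mesh-connected array processor showed that relatively high defect- and fault-repair performance can be obtained.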
Study of a massively parallel system for photorealistic image generation

小林広明

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for Encouragement of Young Scientists (A)

Research institution: Tohoku University

1994

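Toward a massively parallel system for photorealistic image generation, this study proposed a new global illumination model as its foundation and examined a system architecture and control scheme based on that model. Specifically, to realize a new massively parallel photorealistic image generation scheme on a memory model in which object information is distributed among the processors, we focused on object-space-partitioning parallel processing and devised a new massively parallel photorealistic image generation algorithm by applying it to a global illumination model that integrates ray tracing and radiosity. Next, we examined massively parallel computer architectures suited to this algorithm and specified the system configuration and its control method. Finally, to evaluate the system's performance, we developed a simulator capable of register-transfer-level simulation of the system and evaluated it by generating several test images. The evaluation showed that processing time decreases in proportion to the number of processors up to about 256, so adding processors yields a corresponding speedup. Examining system utilization, we found that utilization is high with 256 or fewer processors but drops sharply in larger systems. The reason is that the parallel algorithm devised in this study uses static load balancing: tasks are assigned by statically partitioning the object definition space and allocating the partitions evenly to the processors. As the number of processors grows, load imbalance arises unless the space is partitioned correspondingly finely, which skews processor utilization. Avoiding this appears to require either finer spatial partitioning or dynamic load balancing that redistributes tasks according to processor utilization at run time. This is the most important issue for future work.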
Study of brain-structured supercomputers

中村維男, 杉本理, 小林広明, 萩原将文, 後藤英介, 深瀬政秋, 長谷川勝夫, FLYNN Michael

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Grant category: Grant-in-Aid for International Scientific Research

Research institution: Tohoku University

1993 – 1994

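With the goal of analyzing and integrating brain-structured supercomputers and evaluating their performance, this research planned joint work between Stanford University and Tohoku University. To this end, the principal investigator and co-investigators held a total of 10 research liaison meetings: 8 at Stanford University and 2 at Tohoku University. Japanese and U.S. research collaborators joined the discussions as appropriate. We also held meetings to which leading researchers in computer architecture were invited, and communicated frequently by e-mail on a day-to-day basis. As a result, we obtained the following achievements for each item of the research plan. This year we also produced papers on a mechanical design support system, parallel algorithms, and multimedia, and one book on computer hardware from a computer architecture perspective. 1. Integration of the brain-structured supercomputer: All items of the research plan were integrated into a brain-structured supercomputer. Taking into account the roles of the mind computer, expression-recognition associative memory, brain-wave studies, the artificial cochlea, sparse distributed memory, the wave pipeline, the jet pipeline, logic-based architectures, symbolic processing architectures, functional architectures, and computer graphics, we diagrammed clearly where each fits within the brain-structured supercomputer. 2. Construction of sparse distributed memory: For the wave pipeline system, which pairs with sparse distributed memory in the brain-structured supercomputer, we carried out an implementation design using a CMOS VLSI vector unit. We also clarified the problems of, and guidelines for, the ultra-high-speed processors (vector machines, superscalar processors, multiprocessors) and computer networks that are indispensable for processing and data transmission in the brain-structured supercomputer. 3. Analysis of the RIGHT computer: Extending our earlier research on methodologies for realizing brain functions on supercomputers, we clarified a conceptual model of a brain-structured supercomputer using a hierarchically structured distributed associative memory system. In particular, we analyzed the RIGHT computer with respect to this model, and attempted to merge the conceptual model with a concrete model. These results are to be published soon. 4. Construction of a massively parallel symbolic processing system: Aiming to build a massively parallel symbolic processing system in VLSI, we systematized the underlying discipline and compiled it into a book. We also studied original methods for VLSI design, and papers based on these results are in preparation. 5. Research on brain processing models: We examined the interaction between genes and the brain from a medical standpoint and are developing original research on brain processing models. These results are to be published in the near future. 6. Performance evaluation of the RIGHT computer: We published four papers on the fusion of neural networks and fuzzy inference systems, which are components of the RIGHT computer, on intelligent information processing using distributed representations, and on performance evaluation of associative memory. 7. Performance evaluation of the LEFT and RIGHT computers: The LEFT and RIGHT computers correspond to the processing unit and the input/output unit of the brain-structured supercomputer. Under this item, we evaluated in detail ray tracing and multipath representation in the computer graphics system, which handles data processing and output. 8. Research on neural networks for the RIGHT computer: We introduced the wavelet transform, a topic of recent interest, into neural networks for the RIGHT computer and pursued research on speech data processing.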
Basic Research on the Configuration and Performance of Supercomputers

中村維男, 杉本理, 小林広明, 萩原将文, 後藤英介, 深瀬政秋, 長谷川勝夫, Michael J. Flynn

Funding agency: Japan Society for the Promotion of Science

Program: Grants-in-Aid for Scientific Research

Research category: Grant-in-Aid for International Scientific Research

Institution: Tohoku University

1991 – 1992

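1. Summary of research activities: This project planned joint research with Stanford University with the aim of producing basic design concepts, at the instruction-set level, for the configuration of supercomputers. To this end, the principal investigator and co-investigators (長谷川勝夫, 深瀬政秋, 萩原将文, 小林広明, 杉本理) held a total of 35 research meetings: (1) 21 at Stanford University, (2) 7 at Tohoku University with co-investigators Prof. Michael J. Flynn and 杉本理 as invited guests, and (3) 7 at Tohoku University with co-investigator 後藤英介. Observers joined the discussions as appropriate.
2. Research direction: At these meetings, the functions of the left and right hemispheres of the brain were first organized within the scope of computer science, with a view to making supercomputers general-purpose. It was then decided to aim for a transcendent supercomputer composed of a RIGHT computer serving as the input/output unit and a LEFT computer serving as the processing unit.
3. Results on the LEFT computer: The plan was to design an instruction-level-centered computer architecture in which scalar operations, logical operations, language processing, and similar tasks, in addition to vector operations, run at least an order of magnitude faster than before. Through research on jet pipelining, wave pipelining, and symbolic architecture, this goal was achieved to a considerable extent.
(1) Jet pipelining: A jet pipeline integrates nearly all of the processors found in von Neumann computers into a single processor. Interesting results were obtained on the processing time of the Livermore Loops on a jet pipeline, results that cannot be explained by conventional concepts of von Neumann computers.
(2) Wave pipelining: Wave pipelining, which eliminates the wasted waiting time of conventional uniform pipelining, was studied. Useful insights were obtained on how to place wave pipelines optimally on a silicon chip.
(3) Symbolic architecture: As a third architecture for building the LEFT computer, a symbolic architecture, a new parallel processing system for list structures, was studied. Interesting results were obtained on how list-processing speed depends on the structure of the symbolic architecture.
4. Results on the RIGHT computer: The plan was to examine a consistent design methodology for computer architecture spanning everything from the OS level down to the microinstruction level. To this end, earlier results on neurocomputers and fuzzy computers were reviewed, which made clear that a new model of the right brain must be examined from a medical standpoint. The following research was carried out on components of the RIGHT computer.
(1) Sparse distributed memory: Sparse distributed memory, an associative memory scheme, was applied to pattern recognition, for which no established scheme or algorithm had existed. This work showed that sparse distributed memory can combine analog-model pattern recognition with digital processing.
(2) Will-oriented computer (MOC): The MOC was proposed as a computer concerned with human imagination, and its potential for realizing right-brain functions was examined.
(3) A new right-brain-type computer: A new right-brain-type computer based on neural networks was studied, yielding useful insights into learning algorithms, associative memory, and their performance evaluation.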
Research on Parallel Modeling in Object-Oriented Ray Tracing

小林広明

1990
Research on an Object-Oriented Parallel Ray Tracing System

小林広明

1989

Social Contribution Activities 10

7th Teraflop Workshop

November 21, 2007 – November 22, 2007

International academic meeting on supercomputers and their applications
5th Teraflop Workshop

November 20, 2006 – November 21, 2006

International academic meeting on supercomputers and their applications
Used for tsunami damage prediction / The many roles of supercomputers

June 6, 2015 –
Japan Concludes Exascale Feasibility Study

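December 3, 2014 –
Tsunami inundation areas predicted in 20 minutes: Tohoku University and others put supercomputers to use

August 3, 2014 –
Tohoku University and NEC: a joint research organization for next-generation supercomputer technology

June 29, 2014 –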
Feasibility Study of Advanced Vector Architecture System toward Exascale at Cyberscience Center, Tohoku University, Japan

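May 13, 2013 –
The future envisioned for Tohoku University's supercomputer after it came through the Great East Japan Earthquake

October 28, 2011 –
Lecture at Sendai Ikuei Gakuen Shuko Secondary School

December 14, 2006 –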

Visiting lecture at a high school
Sendai Medical Association academic lecture

April 19, 2006 –

Technical lecture for physicians

Media Coverage 7

Kagaku no Izumi (Fountain of Science): "Supercomputers That Open the Future (1)–(9)"

Kahoku Shimpo

May 2015

Media type: Newspaper/magazine
Tohoku University visualizes disasters in 3D for use in tsunami inundation prediction

Kahoku Shimpo, NHK

June 29, 2014

Media type: Newspaper/magazine
The "new industrial revolution" sparked by ultra-high-speed computing: the future opened by the K computer

NHK

January 8, 2013

Media type: TV/radio program
Hope for a comeback of vector-type supercomputers

Nikkei Sangyo Shimbun

December 25, 2007

Media type: Newspaper/magazine
World's top-performing supercomputer: Tohoku University's "SX-7"

Asahi Shimbun

February 24, 2005

Media type: Newspaper/magazine
Supercomputers: Tohoku University ranks first in the world in performance

NHK General TV

February 9, 2005

Media type: TV/radio program
World's best measured performance: Tohoku University supercomputer

Kahoku Shimpo

January 24, 2005

Media type: Newspaper/magazine

Other 8

Demonstration project to build a "Tsunami L-Alert" by linking a real-time tsunami prediction system with L-Alert, and to advance disaster response

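Real-time tsunami simulations on remotely located supercomputers were made to back each other up when a large earthquake occurs. On this basis, a real-time tsunami inundation and damage prediction system covering all of Japan was researched and developed, and providing the simulation results through L-Alert made it possible to distribute them to local governments nationwide.
Demonstration project to strengthen local governments' disaster-mitigation capacity through real-time tsunami inundation and damage prediction and disaster information distribution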

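Research and development of a system that links seismic observation data with real-time supercomputer simulation in order to deliver tsunami inundation damage predictions to the affected local governments within 20 minutes of an earthquake.
Feasibility study of future HPCI systems suited to high-memory-bandwidth applications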

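This project aims to acquire the technical knowledge needed to realize future supercomputer systems, expected around 2018, that support the advanced manufacturing essential to a safe and secure society in Japan and to the international competitiveness of its industry. For applications, system architecture, system software, and device technology, it identifies the technical issues, studies the element technologies needed to solve them, carries out system design research, and investigates what future HPCI systems should look like.
"Facility expansion to support industrial use of HPCI centered on the K computer and to broaden its user base"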

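Research and development on expanding the advanced computing facilities that support HPCI and on enhancing their usage environment.
Application performance evaluation with a vector mechanism equipped with a programmable cache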

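Co-design of an on-chip memory mechanism and the software techniques for using it, as a technology for accelerating simulation programs.
Creation of a 3D VLSI system with self-repair capability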

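This research project addresses the dependability of in-vehicle image processing systems. Building on the idea of improving dependability from the architecture and OS levels, it pursues research from several angles: the image processing and recognition capabilities needed for a dependable image processing system, an overall system design that takes these requirements into account, hardware technologies such as reconfigurable logic with diagnosis and repair functions, and dependable software technologies based on virtual machines (VMs). The work is divided into three areas (image processing systems, software technologies, and hardware technologies), with a division of responsibilities that keeps the areas closely coordinated.
Ultra-high-precision biological function measurement system based on coupled ultrasonic measurement and simulation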

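Responsible for designing and developing the interface between the supercomputer and the measurement equipment, within the research and development of a system that performs high-precision biological function measurement quickly by fusing supercomputer simulation with data from ultrasonic measurement equipment.
Research and development of a safe, secure, and inexpensive ubiquitous computing platform for an ICT eco-society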

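Aiming to establish an ecology model for the information and communications field, this project researches and develops a safe, secure, and inexpensive volunteer computing infrastructure for the ubiquitous era that uses computing resources distributed throughout society. In particular, it studies how to make volunteer computing more efficient and reliable, along with incentive models that encourage participation. The goal is to establish base technologies for inexpensively providing a large-scale computing infrastructure that can also handle highly confidential computations, at a scale that conventional implementation techniques cannot easily achieve.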
