TOHOKU UNIVERSITY Researchers - Hiroaki Kobayashi

Details of the Researcher

Hiroaki Kobayashi

Section

Graduate School of Information Sciences

Job title

Professor

Degree

Doctor of Engineering (Tohoku University)

researchmap

https://researchmap.jp/hiroaki.kobayashi

J-GLOBAL ID

200901077534166118

Research History 14

2018/07 - Present

Tohoku University
2016/04 - Present

Tohoku University　Graduate School of Information Sciences　Professor
2017/04 - 2019/03

Tohoku University　Graduate School of Information Sciences　Chair, Department of Computer and Mathematical Sciences
2012/04 - 2016/03

Tohoku University　Member, Education and Research Council
2008/04 - 2016/03

Tohoku University
2008/04 - 2016/03

Tohoku University　Cyberscience Center
2008/04 - 2016/03

Tohoku University　Information Synergy Organization　Deputy Director
2006/10 - 2016/03

National Institute of Informatics　Visiting Professor
2002/12 - 2008/03

Tohoku University
2001/10 - 2008/03

Tohoku University　Information Synergy Center　Professor
1995/10 - 2002/01

Stanford University　Department of Electrical Engineering, Computer Systems Laboratory　Visiting Associate Professor
1993/04 - 2001/09

Tohoku University　Graduate School of Information Sciences　Associate Professor
1991/04 - 1993/03

Tohoku University　Faculty of Engineering　Lecturer
1988/04 - 1991/03

Tohoku University　Faculty of Engineering　Research Associate

Education 2

Tohoku University　Graduate School, Division of Engineering　Department of Information Engineering

- 1988/03/25
Tohoku University　Faculty of Engineering　Communication Engineering

- 1983/03/25

Committee Memberships 29

Ministry of Education, Culture, Sports, Science and Technology (MEXT)　Expert Member, Council for Science and Technology

2021/04 - Present
Osaka University Cybermedia Center, Nationwide Joint-Use Steering Committee　Member

2014/04 - Present
Science Council of Japan　Associate Member

2014/04 - Present
Editorial Board of International Journal of Networked and Distributed Computing　Member

2011/03 - Present
Workshop on Sustained Simulation Performance　Organizing Committee Chair

2006/10 - Present
Ministry of Education, Culture, Sports, Science and Technology (MEXT)　Member, HPCI Plan Promotion Committee

2017/03 - 2025/03
HPCI Consortium　Vice President and Vice Chair

2020/04 - 2024/05
Priority Issue (8) "Development of Innovative Design and Manufacturing Processes Leading Near-Future Manufacturing", Advisory Committee　Chair

2015/04 - 2020/03
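Post-K Priority Issue "Construction of an Integrated Prediction System for Compound Disasters Caused by Earthquakes and Tsunamis", Steering Committee　Member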

2015/04 - 2020/03
HPCI Consortium　Director

2014/04 - 2018/03
JST CREST "Development of System Software Technologies for Post-Peta Scale High Performance Computing"　Research Area Advisor

2012/04 - 2018/03
IEEE COOL Chips　Organizing Committee Chair

2011/04 - 2017/04
HPCI Collaborative Services Committee　Chair

2013/04 - 2016/03
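Next-Generation Supercomputer Strategic Program, Field 3 "Projection of Planet Earth Variations for Mitigating Natural Disasters", Steering Committee　Member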

2013/04 - 2016/03
National Institute of Informatics, "Organization for Science Network Operations and Coordination"　Member

2012/04 - 2016/03
HPCI Collaborative Services Committee　Member

2011/04 - 2016/03
Hokkaido University Information Initiative Center, External Evaluation Committee　Member

2014/04 - 2015/03
Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Department Evaluation Committee　Department Evaluation Advisor

2012/04 - 2015/03
Research Organization for Information Science and Technology (RIST), "Interdisciplinary Joint Research WG"　Member

2013/04 - 2014/03
Information Processing Society of Japan (IPSJ)　Representative Member

2012/04 - 2014/03
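Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures, Joint Research Project Review Committee　Chair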

2012/04 - 2014/03
Information Processing Society of Japan (IPSJ), Tohoku Chapter　Chapter Chair

2012/04 - 2014/03
Council of Joint Usage/Research Centers of National Universities　Officer

2012/04 - 2014/03
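Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures, Joint Research Project Review Committee　Chair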

2012/04 - 2014/03
HPCI Consortium　Auditor

2012/04 - 2014/03
Executive Committee of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　Executive Committee Chair

2013/04 - 2013/08
Japan Agency for Marine-Earth Science and Technology (JAMSTEC), "Environment and Social Systems Integrated Research Forum"　Member

2012/04 - 2013/03
Grants-in-Aid for Scientific Research Committee　Expert Member

2011/04 - 2013/03
Tokyo Institute of Technology Global Scientific Information and Computing Center, External Evaluation Committee　Member

2014/04 -

Professional Memberships 4

Association for Computing Machinery (ACM)
Institute of Electrical and Electronics Engineers (IEEE)
Information Processing Society of Japan (IPSJ)
Institute of Electronics, Information and Communication Engineers (IEICE)

Research Interests 2

Computer Architectures
Supercomputers

Research Areas 4

Informatics / High-performance computing / Supercomputers
Informatics / Software
Informatics / Information networks
Informatics / Computer systems

Awards 10

Best Paper Award

2020/11　The Eighth International Symposium on Computing and Networking (CANDAR'20)　Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint
Best Poster Winner HPC-in-Asia

2019　A Skewed Multi-Bank Cache for Vector Processors
Best Paper Award of PaCT, 2019

2019　Analysis of relationship between SIMD-processing features used in NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors
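FY2018 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Prize for Science and Technology (Development Category)

2018/04　Ministry of Education, Culture, Sports, Science and Technology (MEXT)
2018 All-NUA Case Study Paper Technical Contribution Award

2018　Basic Performance Evaluation of the New Vector Processor SX-Aurora TSUBASA
Minister of Education, Culture, Sports, Science and Technology Award, "Commendation for Individuals Contributing to the Promotion of Informatization"

2017/10　Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Japan Resilience Award 2016, Excellence Award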

2016
Best Paper Award

2015　Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework
Best Paper Award at the 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA’04)

2004/12/13　The 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA’04)
IP Design Award, Research Grant Prize

2002/05/29　Nikkei Business Publications　Research and Development of a Real-Time Ray Tracing Engine Based on the 3DCGiRAM Architecture

Papers 436

An analysis of memory access patterns in RISC-V vector workloads on heterogeneous memory architectures

Ryo Yokoyama, Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops　255-262　2026/01/25
Publisher: ACM
DOI： 10.1145/3784828.3785405 　
Disaster Rescue Resource Allocation Based on the Ising Model

Kosei Nakamoto, Masahito Kumagai, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　274-281　2025/12/15
Publisher: IEEE
DOI： 10.1109/mcsoc67473.2025.00052 　
Classification of Three-dimensional Electron Diffraction Data with a Large Language Model

Kazuyuki Yasuda, Masahito Kumagai, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis　96-103　2025/11/15
Publisher: ACM
DOI： 10.1145/3731599.3767351 　
Single photon coherent Ising machines for constrained optimization problems

Masahito Kumagai, Yoshitaka Inui, Edwin Ng, Satoshi Kako, Kazuhiko Komatsu, Hiroaki Kobayashi, Yoshihisa Yamamoto

Quantum Science and Technology　10　(3)　035042-035042　2025/06/20
Publisher: IOP Publishing
DOI： 10.1088/2058-9565/addde5 　

eISSN： 2058-9565

Abstract: A Coherent Ising machine (CIM) is an oscillator-network-based analog computing system to circumvent the bottleneck in von Neumann digital computing architectures. The CIM consists of a network of degenerate optical parametric oscillators (DOPOs) and is designed to find a ground state or perform Boltzmann sampling for all degenerate ground states and low-energy excited states in combinatorial optimization problems. A nonlinear measurement feedback scheme, called chaotic amplitude control (CAC), has recently been proposed to correct pulse amplitude inhomogeneity and thereby faithfully map the Ising Hamiltonian to the loss landscape of the DOPO network. However, the quantum limit of the CIM-CAC performance is not fully explored yet. This work clarifies how the quantum noise squeezing and the measurement-induced state shift in repeated indirect quantum measurements improve the system performance. From the numerical simulation on the Ising model with the Zeeman terms, obtained from combinatorial clustering problems formulated as constrained optimization problems, it is revealed that the CIM-CAC operating in a single photon per pulse regime dramatically outperforms the standard CIM-CAC with a large photon number per pulse. This is because the standard CIM-CAC is often trapped in a periodic trajectory and cannot escape from there. On the other hand, the significant improvement is brought by the noise-induced amplitude jump in the single photon CIM-CAC.
Performance Evaluation of Vector Annealing on Multiple Nodes using the Traveling Salesperson Problem

Makoto Onoda, Kazuhiko Komatsu, Kotaro Bannai, Shintaro Momose, Masayuki Sato, Hiroaki Kobayashi

ISC High Performance 2025 Research Paper Proceedings (40th International Conference)　2025/06
A Compressed QUBO Format for Traveling Salesperson Problems

Chu-Yuan Huang, Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the IEEE Workshop on Parallel / Distributed Combinatorics and Optimization (PDCO 2025)　2025/06
A Fast Block Partitioning Decision Method Using Luminance Textures for VVC Encoders

Rikita Uchiyama, Karin Onouchi, Naoya Niwa, Masayuki Sato, Hiroaki Kobayashi, Hiroe Iwasaki

2025 IEEE International Conference on Consumer Electronics (ICCE)　1-4　2025/01/11
Publisher: IEEE
DOI： 10.1109/icce63647.2025.10929966 　
A Graph-based Molecular Structure Identification Method via Feature Extraction for Three-dimensional Electron Diffraction Data

Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Saori Maki-Yonekura, Hirofumi Kurokawa, Koji Yonekura, Hiroaki Kobayashi

2024 Twelfth International Symposium on Computing and Networking Workshops (CANDARW)　325-329　2024/11/26
Publisher: IEEE
DOI： 10.1109/candarw64572.2024.00060 　
Adaptive Parallelization based on Frame-level and Tile-level Parallelisms for VVC Encoding

Karin Onouchi, Masayuki Sato, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

2024 Twelfth International Symposium on Computing and Networking (CANDAR)　87-95　2024/11/26
Publisher: IEEE
DOI： 10.1109/candar64496.2024.00018 　
An Ising-based Decision Method for Intra Prediction Mode in Video Coding

Takuto Momominami, Naoya Niwa, Masahito Kumagai, Kazuhiko Komatsu, Hiroaki Kobayashi, Hiroe Iwasaki

SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis　1748-1754　2024/11/17
Publisher: IEEE
DOI： 10.1109/scw63240.2024.00218 　
File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-Core Direct Numerical Simulation Code of Turbulence

Yuto Hatanaka, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Akihiro Musa, Takashi Ishihara, Atsuya Uno, Kazuhiko Komatsu, Hiroaki Kobayashi, Mitsuo Yokokawa

Computational Science – ICCS 2024　173-187　2024/06/30
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-63778-0_13 　

ISSN： 0302-9743

eISSN： 1611-3349
An Asymptotic Parallel Linear Solver and Its Application to Direct Numerical Simulation for Compressible Turbulence

Mitsuo Yokokawa, Taiki Matsumoto, Ryo Takegami, Yukiya Sugiura, Naoki Watanabe, Yoshiki Sakurai, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Computational Science – ICCS 2024　383-397　2024/06/27
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-63751-3_26 　

ISSN： 0302-9743

eISSN： 1611-3349
Prediction of Steam Turbine Blade Erosion Using CFD Simulation Data and Hierarchical Machine Learning

Issei Fukamizu, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Journal of Engineering for Gas Turbines and Power　1-10　2024/06/25
Publisher: ASME International
DOI： 10.1115/1.4065815 　

ISSN： 0742-4795

eISSN： 1528-8919

Abstract: The information of the degree of blade erosion is vital for the efficient operation of steam turbines. However, it is nearly impossible to directly measure the degree of blade erosion during operation. Moreover, collecting sufficient data of eroded cases for predictive analysis is challenging. Therefore, this paper proposes a blade erosion prediction method using numerical simulation and machine learning. Pressure data of several blade erosion cases are collected from the numerical turbine simulation. The machine learning approach involves training on collected simulation data to predict the degree of erosion for the first-stage stator (1S) and the first-stage rotor blade (1R) from internal pressure data. The proposed erosion prediction model employs a two-step hierarchical approach. First, the proposed model predicts the 1S erosion degree using the k-NN (k-Nearest Neighbor) regression. Second, the proposed model estimates the 1R erosion degree with Linear Regression models. These models are tailored for each of the 1S erosion degrees, utilizing pressure data processed through Fast Fourier Transform (FFT). The evaluation shows that the proposed method achieves the prediction of the 1S erosion with a Mean Absolute Error (MAE) of 0.000693 mm, and the 1R erosion with an MAE of 0.458 mm. The evaluation results indicate that the proposed method can accurately capture the degree of turbine blade erosion from internal pressure data. As a result, the proposed method suggests that the erosion prediction method can be effectively used to determine the optimal timing for Maintenance and Repair Operations (MRO).
Quantum annealing-based algorithm for lattice gas automata

Yuichi Kuya, Kazuhiko Komatsu, Kouki Yonaga, Hiroaki Kobayashi

Computers and Fluids　274　2024/04/30

DOI： 10.1016/j.compfluid.2024.106238 　

ISSN： 0045-7930
A Constraint Partition Method for Combinatorial Optimization Problems Peer-reviewed

Makoto Onoda, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　600-607　2023/12

DOI： 10.1109/MCSoC60832.2023.00093 　
Appropriate Graph-Algorithm Selection for Edge Devices Using Machine Learning Peer-reviewed

Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　544-551　2023/12

DOI： 10.1109/MCSoC60832.2023.00086 　
Multi-scale Loss based Electron Microscopic Image Pair Matching Method Peer-reviewed

Chunting Duan, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

In Proceedings of 22nd IEEE International Conference on Machine Learning and Applications　1957-1964　2023/12

DOI： 10.1109/ICMLA58977.2023.00295 　
Investigating the Characteristics of Ising Machines Peer-reviewed

Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Hiroaki Kobayashi

Proceedings of IEEE International Conference on Quantum Computing and Engineering　2023/09

DOI： 10.1109/QCE57702.2023.00108 　
Performance Evaluation of Tsunami Evacuation Route Planning on Multiple Annealing Machines

Yihui Liu, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the 20th ACM International Conference on Computing Frontiers　2023/05/09
Publisher: ACM
DOI： 10.1145/3587135.3592193 　
I/O Performance Evaluation of a Memory-Saving DNS Code on SX-Aurora TSUBASA

Mitsuo Yokokawa, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Taiki Matsumoto, Akihiro Musa, Kazuhiko Komatsu, Takashi Ishihara, Hiroaki Kobayashi

2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　2023/05
Publisher: IEEE
DOI： 10.1109/ipdpsw59300.2023.00117 　
Ising-Based Kernel Clustering

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Algorithms　16　(4)　214-214　2023/04/19
Publisher: MDPI AG
DOI： 10.3390/a16040214 　

eISSN： 1999-4893

Abstract: Combinatorial clustering based on the Ising model is drawing attention as a high-quality clustering method. However, conventional Ising-based clustering methods using the Euclidean distance cannot handle irregular data. To overcome this problem, this paper proposes an Ising-based kernel clustering method. The kernel clustering method is designed based on two critical ideas. One is to perform clustering of irregular data by mapping the data onto a high-dimensional feature space by using a kernel trick. The other is the utilization of matrix–matrix calculations in the numerical libraries to accelerate the preprocessing for annealing. While the conventional Ising-based clustering is not designed to accept the data transformed by the kernel trick, this paper extends the availability of Ising-based clustering to process a distance matrix defined in high-dimensional data space. The proposed method can handle the Gram matrix determined by the kernel method as a high-dimensional distance matrix to handle irregular data. By comparing the proposed Ising-based kernel clustering method with the conventional Euclidean distance-based combinatorial clustering, it is clarified that the quality of the clustering results of the proposed method for irregular data is significantly better than that of the conventional method. Furthermore, the preprocessing for annealing by the proposed method using numerical libraries is faster by a factor of up to 12.4 million than the conventional naive Python implementation. Comparisons between Ising-based kernel clustering and kernel K-means reveal that the proposed method has the potential to obtain higher-quality clustering results than the kernel K-means as a representative of the state-of-the-art kernel clustering methods.
Analysis of Precision Vectors for Ising-Based Linear Regression

Kaho Aoyama, Kazuhiko Komatsu, Masahito Kumagai, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies　251-261　2023/04/08
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-29927-8_20 　

ISSN： 0302-9743

eISSN： 1611-3349
A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders

Masayuki Sato, Yuya Omori, Ryusuke Egawa, Ken Nakamura, Daisuke Kobayashi, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies　288-300　2023/04/08
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-29927-8_23 　

ISSN： 0302-9743

eISSN： 1611-3349
Performance evaluation of parallel direct numerical simulation code on supercomputer SX-Aurora TSUBASA

Mitsuo Yokokawa, Yujiro Takenaka, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Computers & Fluids　261　105913-105913　2023/04
Publisher: Elsevier BV
DOI： 10.1016/j.compfluid.2023.105913 　

ISSN： 0045-7930
Rapid and quantitative uncertainty estimation of coseismic slip distribution for large interplate earthquakes using real-time GNSS data and its application to tsunami inundation prediction

Keitaro Ohno, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Takashi Abe, Hiroaki Kobayashi

Earth, Planets and Space　74　(1)　2022/12
Publisher: Springer Science and Business Media LLC
DOI： 10.1186/s40623-022-01586-6 　

eISSN： 1880-5981

Abstract: This study proposes a new method for the uncertainty estimation of coseismic slip distribution on the plate interface deduced from real-time global navigation satellite system (GNSS) data and explores its application for tsunami inundation prediction. Jointly developed by the Geospatial Information Authority of Japan and Tohoku University, REGARD (REal-time GEONET Analysis system for Rapid Deformation monitoring) estimates coseismic fault models (a single rectangular fault model and slip distribution model) in real time to support tsunami prediction. The estimated results are adopted as part of the Disaster Information System, which is used by the Cabinet Office of the Government of Japan to assess tsunami inundation and damage. However, the REGARD system currently struggles to estimate the quantitative uncertainty of the estimated result, although the obtained result should contain both observation and modeling errors caused by the model settings. Understanding such quantitative uncertainties based on the input data is essential for utilizing this resource for disaster response. We developed an algorithm that estimates the coseismic slip distribution and its uncertainties using Markov chain Monte Carlo methods. We focused on the Nankai Trough of southwest Japan, where megathrust earthquakes have repeatedly occurred, and used simulation data to assume a Hoei-type earthquake. We divided the 2951 rectangular subfaults on the plate interface and designed a multistage sampling flow with stepwise perturbation groups. As a result, we successfully estimated the slip distribution and its uncertainty at the 95% confidence interval of the posterior probability density function. Furthermore, we developed a new visualization procedure that shows the risk of tsunami inundation and the probability on a map. Under the algorithm, we regarded the Markov chain Monte Carlo samples as individual fault models and clustered them using the k-means approach to obtain different tsunami source scenarios. We then calculated the parallel tsunami inundations and integrated the results on the map. This map, which expresses the uncertainties of tsunami inundation caused by uncertainties in the coseismic fault estimation, offers quantitative and real time insights into possible worst-case scenarios.
Page-Address Coalescing of Vector Gather Instructions for Efficient Address Translation Peer-reviewed

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of 2022 IEEE/ACM 12th Workshop on Irregular Applications: Architectures and Algorithms (IA3)　1-8　2022/11

DOI： 10.1109/IA356718.2022.00007 　
A hierarchical wavefront method for LU-SGS

Kazuhiko Komatsu, Yuta Hougi, Masayuki Sato, Hiroaki Kobayashi

Computers & Fluids　245　105572-105572　2022/06
Publisher: Elsevier BV
DOI： 10.1016/j.compfluid.2022.105572 　

ISSN： 0045-7930
High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA

Ilya Afanasyev, Kazuhiko Komatsu, Dmitry Lichmanov, Vadim Voevodin, Hiroaki Kobayashi

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　2022/05
Publisher: IEEE
DOI： 10.1109/ipdpsw55747.2022.00050 　
An Efficient Reference Image Sharing Method for the Image-division Parallel Video Encoding Architecture

Ken Nakamura, Yuya Omori, Daisuke Kobayashi, Koyo Nitta, Kimikazu Sano, Masayuki Sato, Hiroe Iwasaki, Hiroaki Kobayashi

IEICE Transactions on Electronics　advpub　2022
Publisher: The Institute of Electronics, Information and Communication Engineers
DOI： 10.1587/transele.2022lhp0002 　

ISSN： 0916-8524

eISSN： 1745-1353

Abstract: This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. This method efficiently reduces the amount of data transfer by using pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that the data transfer can be reduced to 19.8-35.3% of the conventional method on average without major degradation of coding performance. This makes it possible to reduce the required bandwidth of the inter-chip transfer interface by saving the amount of data transfer.
Optimizations of a Linear Matrix Solver in a Composite Simulation for a Vector Computer

Zhilin He, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)　2021/12/10
Publisher: IEEE
DOI： 10.1109/paap54281.2021.9720445 　
A dynamic parameter tuning method for SpMM parallel execution Peer-reviewed

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Concurrency and Computation: Practice and Experience　2021/12/09
Publisher: Wiley
DOI： 10.1002/cpe.6755 　

ISSN： 1532-0626

eISSN： 1532-0634
Ising-Based Combinatorial Clustering Using the Kernel Method

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)　2021/12
Publisher: IEEE
DOI： 10.1109/mcsoc51149.2021.00037 　
Real-time automatic uncertainty estimation of coseismic single rectangular fault model using GNSS data Peer-reviewed

Keitaro Ohno, Yusaku Ohta, Satoshi Kawamoto, Satoshi Abe, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Hiroaki Kobayashi

Earth, Planets and Space　73　(1)　2021/12
Publisher: Springer Science and Business Media LLC
DOI： 10.1186/s40623-021-01425-0 　

ISSN： 1343-8832

eISSN： 1880-5981
An Externally-Constrained Ising Clustering Method for Material Informatics

Kazuhiko Komatsu, Masahito Kumagai, Ji Qi, Masayuki Sato, Hiroaki Kobayashi

2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)　2021/11
Publisher: IEEE
DOI： 10.1109/candarw53999.2021.00040 　
Register Flush-free Runahead Execution for Modern Vector Processors

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)　2021/10
Publisher: IEEE
DOI： 10.1109/sbac-pad53543.2021.00023 　
Detection of Machinery Failure Signs From Big Time-Series Data Obtained by Flow Simulation of Intermediate-Pressure Steam Turbines Peer-reviewed

Kazuhiko Komatsu, Hironori Miyazawa, Cheng Yiran, Masayuki Sato, Takashi Furusawa, Satoru Yamamoto, Hiroaki Kobayashi

Journal of Engineering for Gas Turbines and Power　144　(1)　2021/08/13
Publisher: ASME International
DOI： 10.1115/1.4052142 　

ISSN： 0742-4795

eISSN： 1528-8919

Abstract: The periodic maintenance, repair, and overhaul (MRO) of turbine blades in thermal power plants are essential to maintain a stable power supply. During MRO, older and less-efficient power plants are put into operation, which results in wastage of additional fuels. Such a situation forces thermal power plants to work under off-design conditions. Moreover, such an operation accelerates blade deterioration, which may lead to sudden failure. Therefore, a method for avoiding unexpected failures needs to be developed. To detect the signs of machinery failures, the analysis of time-series data is required. However, data for various blade conditions must be collected from actual operating steam turbines. Further, obtaining abnormal or failure data is difficult. Thus, this paper proposes a classification approach to analyze big time-series data alternatively collected from numerical results. The time-series data from various normal and abnormal cases of actual intermediate-pressure steam-turbine operation were obtained through numerical simulation. Thereafter, useful features were extracted and classified using K-means clustering to judge whether the turbine is operating normally or abnormally. The experimental results indicate that the status of the blade can be appropriately classified. By checking data from real turbine blades using our classification results, the status of these blades can be estimated. Thus, this approach can help decide on the appropriate timing for MRO.
Distributed Graph Algorithms for Multiple Vector Engines of NEC SX-Aurora TSUBASA Systems Peer-reviewed

Ilya V. Afanasyev, Vadim V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　8　(2)　2021/06
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi210206 　

ISSN： 2313-8734
Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer Peer-reviewed

Osamu Watanabe, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　8　(2)　2021/06
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi210207 　

ISSN： 2313-8734
Performance and Power Analysis of a Vector Computing System Peer-reviewed

Supercomputing Frontiers and Innovations　8　(2)　2021/06
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi210205 　

ISSN： 2313-8734
A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs Peer-reviewed

Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　2021/06
Publisher: IEEE
DOI： 10.1109/ipdpsw52791.2021.00116 　
A Metadata Prefetching Mechanism for Hybrid Memory Architectures Peer-reviewed

Shunsuke Tsukada, Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)　2021/04/14
Publisher: IEEE
DOI： 10.1109/coolchips52128.2021.9410321 　

ISSN： 0916-8524

eISSN： 1745-1353
Optimization of the Himeno Benchmark for SX-Aurora TSUBASA Peer-reviewed

Akito Onodera, Kazuhiko Komatsu, Soya Fujimoto, Yoko Isobe, Masayuki Sato, Hiroaki Kobayashi

Benchmarking, Measuring, and Optimizing　127-143　2021/03
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-71058-3_8 　

ISSN： 0302-9743

eISSN： 1611-3349
VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture Peer-reviewed

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

The Journal of Supercomputing　2021/01/26
Publisher: Springer Science and Business Media LLC
DOI： 10.1007/s11227-020-03564-9 　

ISSN： 0920-8542

eISSN： 1573-0484
Performance Evaluation of SX-Aurora TSUBASA and Its QA-Assisted Application Design

Hiroaki Kobayashi, Kazuhiko Komatsu

Sustained Simulation Performance 2019 and 2020　3-20　2021
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-68049-7_1 　
Optimizations of DNS Codes for Turbulence on SX-Aurora TSUBASA

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2019 and 2020　51-59　2021
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-68049-7_4 　
Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs Peer-reviewed

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing　11　(2)　267-282　2021
Publisher: IJNC Editorial Committee
DOI： 10.15803/ijnc.11.2_267 　

ISSN： 2185-2839

eISSN： 2185-2847
An External Definition of the One-Hot Constraint and Fast QUBO Generation for High-Performance Combinatorial Clustering Peer-reviewed

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

International Journal of Networking and Computing　11　(2)　463-491　2021
Publisher: IJNC Editorial Committee
DOI： 10.15803/ijnc.11.2_463 　

ISSN： 2185-2839

eISSN： 2185-2847
A Deep Reinforcement Learning Based Feature Selector Peer-reviewed

Yiran Cheng, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel Architectures, Algorithms and Programming　378-389　2021
Publisher: Springer Singapore
DOI： 10.1007/978-981-16-0010-4_33 　

ISSN： 1865-0929

eISSN： 1865-0937
A Dynamic Parameter Tuning Method for High Performance SpMM Peer-reviewed

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Parallel and Distributed Computing, Applications and Technologies　318-329　2021
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-69244-5_28 　

ISSN： 0302-9743

eISSN： 1611-3349
Effects of Using a Memory Stalled Core for Handling MPI Communication Overlapping in the SOR Solver on SX-ACE and SX-Aurora TSUBASA Peer-reviewed

Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　7　(4)　4-15　2020/12
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi200401 　

ISSN： 2313-8734
An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations Peer-reviewed

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW)　2020/11
Publisher: IEEE
DOI： 10.1109/candarw51189.2020.00041 　
Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint Peer-reviewed

Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

2020 Eighth International Symposium on Computing and Networking (CANDAR)　2020/11
Publisher: IEEE
DOI： 10.1109/candar51075.2020.00015 　
Importance of Selecting Data Layouts in the Tsunami Simulation Code Peer-reviewed

Takumi Kishitani, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　830-837　2020/05
Publisher: IEEE
DOI： 10.1109/ipdpsw50202.2020.00140 　
I/O Performance of the SX-Aurora TSUBASA Peer-reviewed

Mitsuo Yokokawa, Ayano Nakai, Kazuhiko Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)　2020/05
Publisher: IEEE
DOI： 10.1109/ipdpsw50202.2020.00014 　
Energy-efficient Design of an STT-RAM-based Hybrid Cache Architecture Peer-reviewed

Masayuki Sato, Xue Hao, Kazuhiko Komatsu, Hiroaki Kobayashi

2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)　2020/04
Publisher: IEEE
DOI： 10.1109/coolchips49199.2020.9097643 　
Performance Evaluation of SX-Aurora TSUBASA by Using Benchmark Programs

Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2018 and 2019　69-77　2020
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-39181-2_7 　
Optimizations for the Himeno Benchmark on Vector Computing System SX-Aurora TSUBASA Peer-reviewed

Akito Onodera, Kazuhiko Komatsu, Masayuki Sato, Yoko Isobe, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation　2020
Metadata Management for Large-Scale Hybrid Memory Architectures Peer-reviewed

Shunsuke Tsukada, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation　2020
An Evaluation of a Hierarchical Clustering Method Using Quantum Annealing Peer-reviewed

Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Proceedings of ISC High Performance 2020 Poster Presentation　2020
Acceleration of Numerical Turbine using the Red-Black Method Peer-reviewed

Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia)　2020
Performance evaluation of a clustering approach based on thermophysical properties by using multiple platforms Peer-reviewed

Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia)　2020
Evaluation of Tsunami Inundation Simulation Using Vector Scalar Hybrid MPI on SX-Aurora TSUBASA Peer-reviewed

Akihiro Musa, Takashi Soga, Takashi Abe, Masayuki Sato, Kazuhiko Komatsu, Shunichi Koshimura, Hiroaki Kobayashi

Proceedings of Research Poster Presentation of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20)　2020
PERFORMANCE EVALUATION OF PARALLEL DNS CODES ON THE SUPERCOMPUTER SX-AURORA TSUBASA Peer-reviewed

Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings of the 32nd International conference on Parallel Computational Fluid Dynamics (ParCFD 2020)　2020
A hierarchical wavefront method for LU-SGS on modern multi-core vector processors Peer-reviewed

Yuta Hougi, Kazuhiko Komatsu, Osamu Watanabe, Masayuki Sato, Hiroaki Kobayashi

Proceedings of the 32nd International conference on Parallel Computational Fluid Dynamics (ParCFD 2020)　2020
Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA Peer-reviewed

Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Communications in Computer and Information Science　131-145　2020
Publisher: Springer International Publishing
DOI： 10.1007/978-3-030-55326-5_10 　

ISSN： 1865-0929

eISSN： 1865-0937
An Energy-aware Dynamic Data Allocation Mechanism for Many-channel Memory Systems Peer-reviewed

Masayuki Sato, Takuya Toyoshima, Hikaru Takayashiki, Ryusuke Egawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　6　(4)　4-19　2019/12
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi190401 　

ISSN： 2313-8734
Developing Efficient Implementations of Shortest Paths and Page Rank Algorithms for NEC SX-Aurora TSUBASA Architecture Peer-reviewed

I. V. Afanasyev, Vad. V. Voevodin, Vl. V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

LOBACHEVSKII JOURNAL OF MATHEMATICS　40　(11)　1753-1762　2019/11

DOI： 10.1134/S1995080219110039 　

ISSN： 1995-0802

eISSN： 1818-9962
A Skewed Multi-banked Cache for Many-core Vector Processors Peer-reviewed

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　6　(3)　86-101　2019/09
Publisher: FSAEIHE South Ural State University (National Research University)
DOI： 10.14529/jsfi190305 　

ISSN： 2313-8734
A layer-adaptable cache hierarchy by a multiple-layer bypass mechanism

Ryusuke Egawa, Ryoma Saito, Masayuki Sato, Hiroaki Kobayashi

PervasiveHealth: Pervasive Computing Technologies for Healthcare　2019/06/06
Publisher: ICST
DOI： 10.1145/3337801.3337820 　

ISSN： 2153-1633
Development and Validation of a Tsunami Numerical Model with the Polygonally Nested Grid System and its MPI-Parallelization for Real-Time Tsunami Inundation Forecast on a Regional Scale Invited

T. Inoue, T. Abe, S. Koshimura, A. Musa, Y. Murashima, H. Kobayashi

Journal of Disaster Research　14　(3)　416-434　2019/03

DOI： 10.20965/jdr.2019.p0416 　

ISSN： 1881-2473

eISSN： 1883-8030
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines Peer-reviewed

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

Supercomputing Frontiers and Innovations　6　(1)　36-47　2019/03

DOI： 10.14529/jsfi190106 　
A Hardware Prefetching Mechanism for Vector Gather Instructions. Peer-reviewed

Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms(IA3@SC)　59-66　2019
Publisher: IEEE
DOI： 10.1109/IA349570.2019.00015 　
Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors. Peer-reviewed

Ilya V. Afanasyev, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

Parallel Computing Technologies - 15th International Conference(PaCT)　125-139　2019
Publisher: Springer
DOI： 10.1007/978-3-030-25636-4_10 　
An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications. Peer-reviewed

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi

IEEE International Parallel and Distributed Processing Symposium Workshops　768-777　2019
Publisher: IEEE
DOI： 10.1109/IPDPSW.2019.00127 　
Perceptron-based Cache Bypassing for Way-Adaptable Caches Peer-reviewed

Masayuki Sato, Yongcheng Chen, Haruya Kikuchi, Kazuhiko Komatsu, Hiroaki Kobayashi

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS 22)　1-3　2019
Publisher: IEEE
DOI： 10.1109/CoolChips.2019.8721331 　

ISSN： 2473-4683
Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA Peer-reviewed

Osamu Watanabe, Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

PROCEEDINGS OF MCHPC'19: 2019 IEEE/ACM WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING (MCHPC)　25-32　2019

DOI： 10.1109/MCHPC49590.2019.00011 　
Performance Evaluation of Tsunami Inundation Simulation on SX-Aurora TSUBASA. Peer-reviewed

Akihiro Musa, Takashi Abe, Takumi Kishitani, Takuya Inoue, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

Computational Science - ICCS 2019 - 19th International Conference, Faro, Portugal, June 12-14, 2019, Proceedings, Part II　363-376　2019
Publisher: Springer
DOI： 10.1007/978-3-030-22741-8_26 　
An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches

Masayuki Sato, Yoshiki Shoji, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

IEEE Transactions on Multi-Scale Computing Systems　4　(4)　593-604　2018/10/01
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/TMSCS.2018.2827955 　

ISSN： 2332-7766
Developing Efficient Implementations of Bellman–Ford and Forward-Backward Graph Algorithms for NEC SX-ACE Peer-reviewed

Ilya V. Afanasyev, Alexander S. Antonov, Dmitry A. Nikitenko, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Osamu Watanabe, Akihiro Musa, Hiroaki Kobayashi

SUPERCOMPUTING FRONTIERS AND INNOVATIONS　5　(3)　65-69　2018/10

DOI： 10.14529/jsfi180311 　
A Machine Learning-based Approach for Selecting SpMV Kernels and Matrix Storage Formats Peer-reviewed

Hang Cui, Shoichi Hirasawa, Hiroaki Kobayashi, Hiroyuki Takizawa

IEICE Transactions on Information and Systems　E101-D　(9)　2307-2314　2018/09
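A Method for Reducing Parameter Tuning Time for Many-Core Processors

岸谷拓海, 小松一彦, 撫佐昭裕, 佐藤雅之, 小林広明

Kumamoto Summer Workshop on Parallel/Distributed/Cooperative Processing　2018/07
A Study on Shared Cache Organization for Multi-Vector-Core Processors

高屋敷光, 佐藤雅之, 小松一彦, 江川隆輔, 小林広明

Kumamoto Summer Workshop on Parallel/Distributed/Cooperative Processing　2018/07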
Expressing the Differences in Code Optimizations between Intel Knights Landing and NEC SX-ACE Processors

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

The 13th World Congress on Computational Mechanics/2nd Pan American Congress on Computational Mechanics　2018/07
An energy-aware set-level refreshing mechanism for eDRAM last-level caches Peer-reviewed

Masayuki Sato, Zehua Li, Ryusuke Egawa, Hiroaki Kobayashi

21st IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2018 - Proceedings　1-3　2018/06/05
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/CoolChips.2018.8373082 　
Early Evaluation of a New Vector Processor SX-Aurora TSUBASA Peer-reviewed

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

International Supercomputing Conference 2018 (ISC18)　2018/06
Performance Evaluation of a Real-Time Tsunami Inundation Forecast System on Modern Supercomputers Peer-reviewed

Akihiro Musa, Takumi Kishitani, Takuya Inoue, Hiroaki Hokari, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

15th Annual Meeting Asia Oceania Geoscience Society　2018/06

DOI： 10.20965/jdr.2018.p0234 　
MIGRATING AN OLD VECTOR CODE TO MODERN VECTOR MACHINES Peer-reviewed

Hiroyuki Takizawa, Kenta Yamaguchi, Takashi Soga, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

Proceedings of the 30th International Conference on Parallel Computational Fluid Dynamics　2018/05
Real-time tsunami inundation forecast system for tsunami disaster prevention and mitigation Peer-reviewed

Akihiro Musa, Osamu Watanabe, Hiroshi Matsuoka, Hiroaki Hokari, Takuya Inoue, Yoichi Murashima, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Hiroaki Kobayashi

Journal of Supercomputing　74　(7)　1-21　2018/04/16
Publisher: Springer New York LLC
DOI： 10.1007/s11227-018-2363-0 　

ISSN： 1573-0484 0920-8542
A Real-Time Tsunami Inundation Forecast System Using Vector Supercomputer SX-ACE Peer-reviewed

Akihiro Musa, Takashi Abe, Takuya Inoue, Hiroaki Hokari, Yoichi Murashima, Yoshiyuki Kido, Susumu Date, Shinji Shimojo, Shunichi Koshimura, Hiroaki Kobayashi

Journal of Disaster Research　13　(2)　234-244　2018/03

DOI： 10.20965/jdr.2018.p0234 　

ISSN： 1881-2473

eISSN： 1883-8030
Tsunami inundation and damage forecasting with high-performance computing infrastructure

S. Koshimura, Y. Murashima, A. Musa, R. Hino, Y. Ohta, H. Kobayashi, M. Kachi, Y. Sato

11th National Conference on Earthquake Engineering 2018, NCEE 2018: Integrating Science, Engineering, and Policy　6　3423-3427　2018
Publisher: Earthquake Engineering Research Institute
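Optimization of a Simulation Code for Polydisperse Multiphase Flows with Reactions and Phase Changes

佐々木大輔, 加藤季広, 磯部洋子, 笠原弘貴, 渡部広吾輝, 志村啓, 奥野航平, 松尾亜紀子, 江川隆輔, 滝沢寛之, 小林広明

SENAC: Bulletin of the Tohoku University Large-Scale Computer Center　51　(1)　47-51　2018/01
Publisher: Cyberscience Center, Tohoku University
ISSN： 0286-7419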

Departmental bulletin paper
Search Space Reduction for Parameter Tuning of a Tsunami Simulation on the Intel Knights Landing Processor Peer-reviewed

Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

2018 IEEE 12TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2018)　117-124　2018

DOI： 10.1109/MCSoC2018.2018.00030 　
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA Peer-reviewed

Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18)　2018
Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems Peer-reviewed

Muhammad Alfian Amrizal, Atsuya Uno, Yukinori Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E100D　(12)　2749-2760　2017/12

DOI： 10.1587/transinf.2017PAP0002 　

ISSN： 1745-1361
Advances of tsunami inundation forecasting and its future perspectives Peer-reviewed

Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Yoichi Murashima, Akihiro Musa

OCEANS 2017 - Aberdeen　2017-　1-4　2017/10/25
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/OCEANSE.2017.8084753 　
A Multiple-layer Bypass Mechanism for Energy-Efficient Computing

Ryusuke Egawa, Masayuki Sato, Ryoma Saito, Hiroaki Kobayashi

In Proceedings of 26th Workshop on Sustained Simulation Performance　2017/10
Early Evaluation of a Heterogeneous Memory Architecture on a Vector Supercomputer

Ryosuke Sato, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　2017　20-20　2017/08
Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan
DOI： 10.11528/tsjc.2017.0_20 　
A power-aware LLC control mechanism for the 3D-stacked memory system Peer-reviewed

Ryusuke Egawa, Wataru Uno, Masayuki Sato, Hiroaki Kobayashi, Jubee Tada

2016 IEEE International 3D Systems Integration Conference, 3DIC 2016　2017/07/05
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/3DIC.2016.7970034 　
Toward Dynamic Load Balancing across OpenMP Thread Teams for Irregular Workloads Peer-reviewed

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing　7　(2)　387-404　2017/07
Publisher: IJNC Editorial Committee
DOI： 10.15803/ijnc.7.2_387 　

ISSN： 2185-2839

In high-performance computing, massively parallel many-core processors such as Intel Xeon Phi coprocessors are becoming popular because they can significantly accelerate various applications. Several high-level programming models have been proposed to parallelize applications efficiently for such processors, and OpenMP is the de facto standard for shared-memory parallel processing. For hierarchical parallel processing, OpenMP 4.0 and later allow programmers to create multiple thread teams, each containing a set of newly created threads that can synchronize with one another. When multiple thread teams execute an application, dynamic load balancing across teams is important, because static load balancing easily leads to load imbalance across teams and thus degrades performance. In this paper, we first motivate our work by clarifying the benefit of using multiple thread teams to execute an irregular workload on a many-core processor. We then demonstrate that dynamic load balancing across those thread teams can significantly improve the performance of irregular workloads on a many-core processor, even when scheduling overhead is taken into account. Although the current OpenMP specification does not provide such a mechanism, we discuss its benefits through experiments on the Intel Xeon Phi coprocessor, evaluating the performance gain with a ray tracing code. The results show that dynamic load balancing across thread teams can improve performance by up to 14% compared with static load balancing across teams, with scheduling overhead included.
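The contrast between static and dynamic load balancing described in this abstract can be illustrated with a toy model. The sketch below is not the paper's implementation and does not use OpenMP. It assumes a list of per-chunk costs, a hypothetical fixed per-chunk scheduling overhead, and teams that each process one chunk at a time. The function names `static_makespan` and `dynamic_makespan` are made up for illustration.

```python
import heapq


def static_makespan(costs, teams):
    """Static balancing: split the chunk list into contiguous blocks.

    Each team gets one block of (at most) ceil(len(costs) / teams) chunks;
    the makespan is the finish time of the slowest team.
    """
    block = -(-len(costs) // teams)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, teams, overhead=0.0):
    """Dynamic balancing: the team that becomes idle first takes the next chunk.

    `overhead` is an assumed fixed scheduling cost paid for every chunk grab.
    """
    finish = [0.0] * teams  # min-heap of per-team finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost + overhead)
    return max(finish)


# Irregular workload: expensive chunks first, cheap chunks last.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(costs, teams=2))
print(dynamic_makespan(costs, teams=2))
print(dynamic_makespan(costs, teams=2, overhead=0.5))
```

In this toy workload the static split gives one team all the expensive chunks, while the dynamic scheme evens out the finish times even after paying the assumed per-chunk overhead. The actual gain on real hardware depends on the workload and on the scheduling cost, as the abstract notes.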
Development of a Heatstroke Risk Assessment Simulator for Simultaneous Exposure to Solar Radiation and Ambient Heat Peer-reviewed

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 小林広明, 曽根秀昭

IEICE Transactions on Electronics (Japanese Edition)　J100-C　(5)　208-216　2017/05
Effects of Using a Memory-Stalled Core for Handling MPI Communication Overlapping in The SOR Solver Peer-reviewed

Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of The 29th International Conference on Parallel Computational Fluid Dynamics (ParallelCFD 2017)　2017/05
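Acceleration of Heatstroke Risk Assessment for Simultaneous Human Exposure to Solar Radiation and Ambient Heat Peer-reviewed

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 曽根秀昭, 小林広明

IEICE Transactions on Electronics (Japanese Edition)　J100-C　(5)　208-216　2017/04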
A Study on Auto-Tuning Using Scenario Templates

Daichi Sato, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ National Convention　2017　(1)　45-46　2017/03
A STUDY ON APPLICABILITY OF A TSUNAMI INUNDATION MODEL WITH THE POLYGONALLY NESTED GRID SYSTEM AND ITS MPI-PARALLELIZATION TO NATION-WIDE TSUNAMI FORECAST AT MULTIPLE GRID RESOLUTIONS Peer-reviewed

INOUE Takuya, ABE Takashi, KOSHIMURA Shunichi, MUSA Akihiro, MURASHIMA Yoichi, KOBAYASHI Hiroaki

Journal of Japan Society of Civil Engineers, Ser. B2 (Coastal Engineering)　73　(2)　I_319-I_324　2017
Publisher: Japan Society of Civil Engineers
DOI： 10.2208/kaigan.73.I_319 　

The applicability of a tsunami inundation model with a polygonally nested grid system and MPI parallelization to nationwide tsunami forecasting was examined in terms of accuracy and computational cost through tsunami simulations at grid resolutions of 270, 90, and 30 m. The computational efficiency of the model, in which the grid configuration is extended from rectangular to polygonal regions so that high-resolution grids are confined to coastal lowland, was further improved by about 14%. This paper also proposes an automatic way of setting up the polygonally nested grid system, and shows that supercomputer resources of 140 Tflop/s are required to complete a tsunami inundation forecast for the entire coast of Japan on 30-meter grids within 10 minutes.
Optimization of a tsunami inundation model with the polygonally nested grid system and MPI parallelization Peer-reviewed

Takuya Inoue, Takashi Abe, Shunichi Koshimura, Akihiro Musa, Yoichi Murashima, Hiroaki Kobayashi

Proceedings of International Tsunami Symposium 2017　2017

DOI： 10.1109/OCEANSE.2017.8084753 　
Rapid Tsunami Inundation and Damage Estimation System with High-performance Computing and Networking Peer-reviewed

Shunichi Koshimura, Yoichi Murashima, Akihiro Musa, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Masahiro Kachi, Yoshihiro Sato

Proceedings of International Tsunami Symposium 2017　2017
An Application-adaptive Data Allocation Method for Multi-channel Memory Peer-reviewed

Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

2017 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS)　2017

DOI： 10.1109/CoolChips.2017.7946381 　

ISSN： 2473-4683
An Adjacent-Line-Merging Writeback Scheme for STT-RAM Last-Level Caches Peer-reviewed

Masayuki Sato, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

2017 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS)　2017

DOI： 10.1109/CoolChips.2017.7946380 　

ISSN： 2473-4683
Performance and Power Analysis of SX-ACE using HP-X Benchmark Programs Peer-reviewed

Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi, Yoko Isobe, Toshihiro Kato, Souya Fujimoto

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)　693-700　2017

DOI： 10.1109/CLUSTER.2017.65 　

ISSN： 1552-5244
Performance Evaluation of Quantum ESPRESSO on NEC SX-ACE Peer-reviewed

Osamu Watanabe, Akihiro Musa, Hiroaki Hokari, Shivanshu Singh, Raghunandan Mathur, Hiroaki Kobayashi

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)　701-708　2017

DOI： 10.1109/CLUSTER.2017.57 　

ISSN： 1552-5244
Vectorization-aware Loop Optimization with User-defined Code Transformations Peer-reviewed

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)　685-692　2017

DOI： 10.1109/CLUSTER.2017.102 　

ISSN： 1552-5244
Program optimization of numerical turbine for vector supercomputer SX-ACE Peer-reviewed

Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto

Computers & Fluids　2017
A Directive Generation Approach to High Code-Maintainability for Various HPC Systems. Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Int. J. Netw. Comput.　7　(2)　405-418　2017
Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE. Peer-reviewed

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

The Journal of Supercomputing　73　(9)　3948-3976　2017

DOI： 10.1007/s11227-017-1993-y 　
Directive Translation for Various HPC Systems Using the Xevolver Framework Invited

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2016　109-117　2016/12

DOI： 10.1007/978-3-319-46735-1_9 　
Making a Legacy Code Auto-tunable without Messing It Up Peer-reviewed

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

Proceedings of the 29th International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)　2016/11
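A Study on a Power-Saving Data Allocation Method for High-Bandwidth Memory

豊嶋拓也, 佐藤雅之, 江川隆輔, 小林広明

Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　2016　39-39　2016/08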
DOI： 10.11528/tsjc.2016.0_39 　
Message from the organizing committee chair Peer-reviewed

Hiroaki Kobayashi

19th IEEE Symposium on Low-Power and High-Speed Chips, IEEE COOL Chips 2016 - Proceedings　i-ii　2016/07/05
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/CoolChips.2016.7503663 　
Effects of Stacking Granularity on 3-D Stacked Floating-point Fused Multiply Add Units Peer-reviewed

Jubee Tada, Maiki Hosokawa, Ryusuke Egawa, Hiroaki Kobayashi

Proceedings of International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART 2016)　2016/07
Performance Optimization of Numerical Turbine for Supercomputer SX-ACE Peer-reviewed

Y. Sakaguchi, K. Kataumi, H. Matsuoka, O. Watanabe, A. Musa, K. Komatsu, R. Egawa, H. Kobayashi, S. Yamamoto

Proceedings of the 28th International Conference on Parallel Computational Fluid Dynamics　2016/05
A Power-Performance Tradeoff of HBM by Limiting Access Channels Peer-reviewed

Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of IEEE Symposium on Low-Power and High-Speed Chips　2016/04
A Bypassing Mechanism for Application-Adaptive Cache Resizing Peer-reviewed

Masayuki Sato, Takumi Takai, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

The IEICE Transactions on Information and Systems　J99-D　(3)　337-347　2016/03

DOI： 10.14923/transinfj.2014JDP7131 　
A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE Peer-reviewed

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

International Journal of Networking and Computing　6　(2)　243-262　2016/02
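A Study on Code Transformation Using Machine Learning

川原畑勇希, 平澤将一, 滝沢寛之, 小林広明

Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　2016　227-227　2016
Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan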
DOI： 10.11528/tsjc.2016.0_227 　
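Improving the Efficiency of Wide-Area Tsunami Analysis with Polygonal Region Nesting and MPI Parallelization Peer-reviewed

井上拓也, 阿部孝志, 越村俊一, 撫佐昭裕, 村嶋陽一, 小林広明

Journal of Japan Society of Civil Engineers, Ser. B2 (Coastal Engineering)　72　(2)　I_373-I_378　2016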
Publisher: Japan Society of Civil Engineers
DOI： 10.2208/kaigan.72.I_373 　

This paper shows that supercomputer resources of 2 Pflop/s are required to complete a tsunami inundation forecast for the entire coast of Japan on 10-meter grids within 10 minutes when using a numerical model that solves the non-linear shallow water equations. We therefore improved the efficiency of the model by extending the geometry of the calculation regions from rectangular to polygonal, so that high-resolution grids are confined to coastal lowland, and validated its accuracy against the existing model. A wide-area tsunami simulation on the prefectural scale became more than three times more efficient, indicating that nationwide tsunami inundation forecasting is feasible.
Directive-Based Automatic Setting of Performance Parameters for Stencil Computations Peer-reviewed

角川拓也, 平澤将一, 滝沢寛之, 小林広明

IPSJ Transactions on Advanced Computing Systems (ACS)　9　(4)　25-37　2016
Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework. Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

Int. J. Netw. Comput.　6　(2)　167-180　2016
A Code Selection Mechanism Using Deep Learning Peer-reviewed

Hang Cui, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC)　385-392　2016

DOI： 10.1109/MCSoC.2016.46 　
A Cache Partitioning Mechanism to Protect Shared Data for CMPs Peer-reviewed

Masayuki Sato, Shin Nishimura, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS XIX)　2016

DOI： 10.1109/CoolChips.2016.7503674 　

ISSN： 2473-4683
A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation Peer-reviewed

Yasuharu Hayashi, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　508-514　2016

DOI： 10.1109/CANDAR.2016.35 　

ISSN： 2379-1888
A Directive Generation Approach Using User-defined Rules Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　515-521　2016

DOI： 10.1109/CANDAR.2016.94 　

ISSN： 2379-1888
The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads Peer-reviewed

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　529-535　2016

DOI： 10.1109/CANDAR.2016.48 　

ISSN： 2379-1888
Parallel processing model for cholesky decomposition algorithm in AlgoWiki project Peer-reviewed

Alexander S. Antonov, Alexey V. Frolov, Hiroaki Kobayashi, Igor N. Konshin, Alexey M. Teplov, Vadim V. Voevodin, Vladimir V. Voevodin

Supercomputing Frontiers and Innovations　3　(3)　61-70　2016
Publisher: South Ural State University, Publishing Center
DOI： 10.14529/jsfi160307 　

ISSN： 2313-8734 2409-6008
Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems Invited

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015　147-157　2015/12

DOI： 10.1007/978-3-319-20340-9_12 　
A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning Peer-reviewed

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E98D　(12)　2178-2186　2015/12

DOI： 10.1587/transinf.2015PAP0028 　

ISSN： 1745-1361
A Real-Time Tsunami Inundation Forecast System for Tsunami Disaster and Mitigation Peer-reviewed

Akihiro Musa, Hiroshi Matsuoka, Osamu Watanabe, Yoichi Murashima, Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi

the 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15)　2015/11
An Approach to the Highest Efficiency of the HPCG Benchmark on the SX-ACE Supercomputer Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa, Hiroaki Kobayashi

the 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15)　2015/11
Highly Power-Efficient Memory Hierarchy Design in the Era of 3D Stacking

宇野渉, 佐藤雅之, 江川隆輔, 小林広明

IEICE Technical Report　115　(271)　19-24　2015/10
ISSN： 0913-5685
A Cache Mechanism Considering Inter-Thread Shared Data for Multi-Core Processors

西村秦, 佐藤雅之, 江川隆輔, 小林広明

IPSJ SIG Technical Reports: Computer Architecture (ARC)　2015-ARC-216　(38)　1-8　2015/08
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON ELECTRONICS　E98C　(7)　550-558　2015/07

DOI： 10.1587/transele.E98.C.550 　

ISSN： 1745-1353
Achieving Both Performance and Maintainability of Real Applications with Xevolver

平澤将一, 滝沢寛之, 小林広明

Proceedings of the Conference on Computational Engineering and Science　20　4p　2015/06
Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2014　119-126　2015
Publisher: Springer International Publishing
DOI： 10.1007/978-3-319-10626-7_10 　
Code Optimization Activities Toward a High Sustained Simulation Performance

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2015　159-168　2015
Publisher: Springer International Publishing
DOI： 10.1007/978-3-319-20340-9_13 　
Design of a 3-D Stacked Floating-point Goldschmidt Divider Peer-reviewed

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2015 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC 2015)　2015

ISSN： 2164-0157
A Data Management Policy for Energy-Efficient Cache Mechanisms

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2015　61-75　2015

DOI： 10.1007/978-3-319-20340-9_6 　
Auto-Tuning Using Xevolver

平澤将一, 肖熊, 滝沢寛之, 小林広明

Keisan Kogaku (Journal of the Japan Society for Computational Engineering and Science)　20　(2)　14-17　2015
Identification and elimination of platform-specific code smells in high performance computing applications Peer-reviewed

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

International Journal of Networking and Computing　5　(1)　180-199　2015
Publisher: IJNC Editorial Committee
DOI： 10.15803/ijnc.5.1_180 　

ISSN： 2185-2839

A code smell is a code pattern that might indicate a code or design problem, which makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed mainly for object-oriented applications, but rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve a high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system to help users find PSCSs of HPC applications to improve the performance portability across different platforms. This paper presents a PSCS alert system that is based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined in a formal way using the AST information represented in XML. XML Path Language (XPath) is used to describe those patterns. A database is built to store the transformation recipes written in XSLT files for eliminating detected PSCSs. The recall and precision evaluation results obtained by using real applications show that the proposed system can detect potential PSCSs accurately. The evaluation on performance portability of real applications demonstrates that eliminating PSCSs leads to significant performance changes and therefore the code portions with detected PSCSs have to be refactored to improve the performance portability across multiple platforms.
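The detection step described in this abstract, matching XPath patterns against an XML representation of an AST, can be sketched as follows. This is only an illustration under stated assumptions: the XML schema (`function`, `loop`, `pragma`, `assign`), the pattern table `PATTERNS`, and the helper `find_smells` are hypothetical and are not taken from the paper. The sketch uses the limited XPath subset of Python's standard `xml.etree.ElementTree`, and the XSLT-based elimination step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML form of an AST; the real system's schema is not shown here.
AST_XML = """
<function name="kernel">
  <pragma text="cdir nodep"/>
  <loop var="i">
    <pragma text="omp simd"/>
    <assign target="a"/>
  </loop>
</function>
"""

# Hypothetical pattern table: smell name -> XPath expression.
# Here a compiler-specific directive is treated as a platform-specific smell.
PATTERNS = {
    "vendor-directive": ".//pragma[@text='cdir nodep']",
}


def find_smells(xml_text, patterns):
    """Return, for each named pattern, how many AST nodes match its XPath."""
    root = ET.fromstring(xml_text)
    return {name: len(root.findall(xpath)) for name, xpath in patterns.items()}


print(find_smells(AST_XML, PATTERNS))
```

Keeping the patterns in a table separate from the matching code mirrors the idea of defining smells formally as data, so new patterns can be added without changing the detector.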
Optimized Data Transfers Based on the OpenCL Event Management Mechanism Peer-reviewed

Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu

SCIENTIFIC PROGRAMMING　2015　(576498)　2015

DOI： 10.1155/2015/576498 　

ISSN： 1058-9244

eISSN： 1875-919X
Real-time tsunami inundation forecasting and damage estimation method by fusion of real-time crustal deformation monitoring and high-performance computing Peer-reviewed

S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

the 26th International Union of Geodesy and Geophysics　2015
Expressing system-awareness as code transformations for performance portability across diverse HPC Peer-reviewed

Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

Workshop on Portability Among HPC Architectures for Scientific Applications　2015
Combining code refactoring and auto-tuning to improve performance portability of high-performance computing applications Peer-reviewed

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

The Sixth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2015)　2015
Automatic Parameter Tuning of Hierarchical Incremental Checkpointing Peer-reviewed

Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2014　8969　298-309　2015

DOI： 10.1007/978-3-319-17353-5_25 　

ISSN： 0302-9743
A Verification Framework for Streamlining Empirical Auto-tuning Peer-reviewed

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　508-514　2015

DOI： 10.1109/CANDAR.2015.115 　

ISSN： 2379-1888
Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　515-520　2015

DOI： 10.1109/CANDAR.2015.102 　

ISSN： 2379-1888
A Case Study of Memory Optimization for Migration of a Plasmonics Simulation Application to SX-ACE Peer-reviewed

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　521-527　2015

DOI： 10.1109/CANDAR.2015.105 　

ISSN： 2379-1888
A Case Study of User-Defined Code Transformations for Data Layout Optimizations Peer-reviewed

Takeshi Yamada, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR)　535-541　2015

DOI： 10.1109/CANDAR.2015.96 　

ISSN： 2379-1888
An Energy-Efficient Dynamic Memory Address Mapping Mechanism Peer-reviewed

Masayuki Sato, Chengguang Han, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2015 IEEE SYMPOSIUM ON LOW-POWER AND HIGH-SPEED CHIPS　2015

DOI： 10.1109/CoolChips.2015.7158660 　
Designing an HPC Refactoring Catalog Toward the Exa-scale Computing Era

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Sustained Simulation Performance 2014　91-98　2014/11

DOI： 10.1007/978-3-319-10626-7_8 　
Early Evaluation of the SX-ACE Processor Peer-reviewed

Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

the 27th International Conference for High Performance Computing, Networking, Storage and Analysis (SC14)　2014/11
MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications Peer-reviewed

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E97D　(11)　2835-2843　2014/11

DOI： 10.1587/transinf.2014EDP7227 　

ISSN： 1745-1361
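A Study on Reducing the Power Consumption of Vector Media Processors

宇野渉, 高也, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan　2014/08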
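A Study on Managing Inter-Thread Shared Data in Cache Memory

西村秦, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan　2014/08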
Exploring system architectures for next-generation CFD simulations in the postpeta-scale era Peer-reviewed

KOMATSU Kazuhiko, EGAWA Ryusuke, TAKIZAWA Hiroyuki, SOGA Takashi, MUSA Akihiro, KOBAYASHI Hiroaki

Journal of Fluid Science and Technology　9　(5)　JFST0073-JFST0073　2014
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/jfst.2014jfst0073 　

ISSN： 1880-5558


CFD simulations with uniform grids have attracted attention as next-generation CFD simulations on large-scale supercomputing systems. The Building-Cube Method (BCM) is one of the next-generation CFD methods. The basic idea is to balance the computational load among processing elements on a supercomputing system by dividing the whole calculation into many parallel tasks with the same amount of computation. Thus, it is suitable for highly parallel computation on supercomputing systems. This paper first implements BCM on five supercomputing systems as an example of a next-generation CFD simulation in the upcoming postpeta-scale era. Then, by theoretical analyses and performance evaluations, this paper clarifies the requirements of future supercomputing systems for a next-generation CFD simulation. The performance evaluations show that as the number of processing elements increases, the imbalance of data exchanges among nodes becomes more serious than that of calculations even in a next-generation CFD simulation. While the calculation time can ideally be reduced according to the number of processing elements, the data transfer time becomes dominant in the total execution time. Unlike in a massively parallel system architecture, the number of nodes in a system should be as small as possible to reduce data transfers. The performance analyses also show that the memory bandwidth limits the performance of BCM and that use of an on-chip memory is effective to improve the performance. A memory subsystem that achieves a higher sustained memory bandwidth is required. Therefore, a supercomputing system that consists of a small number of high-performance nodes is essential to achieve high sustained performance of the next-generation CFD in the upcoming postpeta-scale era by reducing the data transfers, which eventually become a bottleneck in large-scale simulation.
On-Chip Checkpointing with 3D-Stacked Memories Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-6　2014

DOI： 10.1109/3DIC.2014.7152173 　

ISSN： 2164-0157
OpenMP Parallelization Method using Compiler Information of Automatic Optimization Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Legacy HPC Application Migration 2014　2014
Real-time tsunami inundation forecasting and damage mapping towards enhancing tsunami disaster resilience Peer-reviewed

S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

American Geophysical Union Fall Meeting　2014
An Approach to Customization of Compiler Directives for Application-Specific Code Transformations Peer-reviewed

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC)　99-106　2014

DOI： 10.1109/MCSoC.2014.23 　
Xevolver: An XML-based Code Translation Framework for Supporting HPC Application Migration Peer-reviewed

Hiroyuki Takizawa, Shoichi Hirasawa, Yasuharu Hayashi, Ryusuke Egawa, Hiroaki Kobayashi

2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC)　2014

DOI： 10.1109/HiPC.2014.7116902 　

ISSN： 1094-7256
A compiler-assisted OpenMP migration method based on automatic parallelizing information Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　8488　450-459　2014
Publisher: Springer Verlag
DOI： 10.1007/978-3-319-07518-1_30 　

ISSN： 1611-3349 0302-9743
A Platform-Specific Code Smell Alert System for High Performance Computing Applications Peer-reviewed

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)　653-662　2014

DOI： 10.1109/IPDPSW.2014.76 　
An Energy Optimization Method for Vector Processing Mechanisms Peer-reviewed

Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2014 IEEE COOL CHIPS XVII　2014

DOI： 10.1109/CoolChips.2014.6842957 　

ISSN： 2473-4683
An Impact of Circuit Scale on the Performance of 3-D Stacked Arithmetic Units Peer-reviewed

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　2014

ISSN： 2164-0157
An XML-based Programming Framework for User-defined Code Transformations Peer-reviewed

Hiroyuki Takizawa, Xiong Xiao, Shoichi Hirasawa, Hiroaki Kobayashi

The 4th AICS International Symposium　2013/12/02
Checkpoint/Restart for Hybrid Systems Peer-reviewed

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

The Journal of Reliability Engineering Association of Japan　35　(12)　515-516　2013/12

DOI： 10.11348/reajshinrai.35.8_515 　
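Challenges and Reliability Enhancement of Three-Dimensional LSIs Peer-reviewed

小柳光正, 小林広明, 末吉敏則, 鎌田忠

The Journal of Reliability Engineering Association of Japan　35　(8)　471-471　2013/12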
Publisher: Reliability Engineering Association of Japan (REAJ)
DOI： 10.11348/reajshinrai.35.8_471 　

ISSN： 0919-2697
Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD Peer-reviewed

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

International Conference on Fluid Dynamics(ICFD2013), November 27　2013/11/27
Xevolver : an XML-based Programming Framework for Software Evolution Peer-reviewed

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

Supercomputing Conference 2013 (SC13)　2013/11
An Automatic Performance Tracking System for Software Evolution Peer-reviewed

平澤将一, 滝沢寛之, 小林広明

IPSJ Transactions on Computing Systems (ACS)　6　(4)　96-104　2013/10/30

ISSN： 1882-7829
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E96D　(9)　2047-2054　2013/09

DOI： 10.1587/transinf.E96.D.2047 　

ISSN： 1745-1361
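A Study on Improving Cache Energy Efficiency with a Block Bypass Mechanism

高井拓実, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

Summer United Workshops on Parallel, Distributed and Cooperative Processing in Kitakyushu (SWoPP2013)　1-9　2013/07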
Autotuning for Improving the Fault Tolerance of Large-scale Simulations Peer-reviewed

Hiroyuki Takizawa, Alfian Amrizal, Shoichi Hirasawa, Hiroaki Kobayashi

Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC)　2013/05
An Automatic Performance Tracking System for Scientific Software Evolution Peer-reviewed

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC)　2013/05
An IDE Integrated Cross-Platform Build System for Scientific Applications Peer-reviewed

Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

SIAM CSE2013 Minisymposium on Auto-tuning Technologies for Tools and Development Environment in Extreme-Scale Scientific Computing　2013/02
Performance Evaluation of a Next-Generation CFD on Various Supercomputing Systems

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2012　123-132　2013
Publisher: Springer Berlin Heidelberg
DOI： 10.1007/978-3-642-32454-3_11 　
Analysing the performance improvements of optimizations on modern HPC systems Peer-reviewed

Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance　13-25　2013
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-319-01439-5-2 　
Feasibility study of future HPC systems for memory-intensive applications Peer-reviewed

Hiroaki Kobayashi

Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance　3-11　2013
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-319-01439-5-1 　
Exploring a design space of 3-D stacked vector processors Peer-reviewed

Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi

Sustained Simulation Performance 2012 - Proceedings of the Joint Workshop on High Performance Computing on Vector Systems, and Workshop on Sustained Simulation Performance　35-49　2013
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-32454-3-4 　
Message from the organizing committee chair Peer-reviewed

Hiroaki Kobayashi

IEEE Symposium on Low-Power and High-Speed Chips - Proceedings for 2013 COOL Chips XVI　i-ii　2013

DOI： 10.1109/CoolChips.2013.6547906 　
clMPI: An OpenCL extension for interoperation with the message passing interface Peer-reviewed

Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, Wen-Mei W. Hwu

Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013　1138-1148　2013
Publisher: IEEE Computer Society
DOI： 10.1109/IPDPSW.2013.183 　
Power and Performance Evaluation of 3-D Stacked Floating-point Multipliers Peer-reviewed

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

IEEE Computer Society Annual Symposium on VLSI (ISVLSI2013)　218-223　2013
Design and Evaluation of a Media-oriented Vector Processor with a Multi-banked Cache Memory Peer-reviewed

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE 11TH SYMPOSIUM ON EMBEDDED SYSTEMS FOR REAL-TIME MULTIMEDIA (ESTIMEDIA)　78-87　2013

DOI： 10.1109/ESTIMedia.2013.6704506 　

ISSN： 2325-1271
Vertically Integrated Processor and Memory Module Design for Vector Supercomputers Peer-reviewed

Ryusuke Egawa, Masayuki Sato, Jubee Tada, Hiroaki Kobayashi

2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-8　2013

ISSN： 2164-0157
Design of a 3-D Stacked Floating-Point Adder Peer-reviewed

Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC)　1-5　2013

ISSN： 2164-0157
Performance evaluation of phase-based correspondence matching on GPUs Peer-reviewed

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXVI　8856　2013

DOI： 10.1117/12.2023550 　

ISSN： 0277-786X

eISSN： 1996-756X
A comparison of performance tunabilities between OpenCL and OpenACC Peer-reviewed

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013　147-152　2013
Publisher: IEEE Computer Society
DOI： 10.1109/MCSoC.2013.31 　
A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms Peer-reviewed

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2013 IEEE COOL CHIPS XVI (COOL CHIPS)　2013

DOI： 10.1109/CoolChips.2013.6547923 　

ISSN： 2473-4683
Performance Portability Issues on Modern HPC Systems

小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

IPSJ SIG Technical Reports (CD-ROM)　2012　(4)　Paper No. HPC-136, No. 27　2012/12/15

ISSN： 2186-2583
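An Early Dead-Block Eviction Policy for Improving the Energy Efficiency of Way-Adaptable Caches Peer-reviewed

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

Symposium on Advanced Computing Systems and Infrastructures (SACSIS2012)　2012　4-5　2012/05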
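A P2P Self-Organizing Service Resource Discovery Mechanism Based on Meta-Information Diffusion Peer-reviewed

稲葉勉, 村田善智, 滝沢寛之, 小林広明

IEICE Transactions on Information and Systems (Japanese Edition)　J95-D　(5)　1110-1122　2012/05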
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 1880-4535
A bypass mechanism for way-adaptable caches Peer-reviewed

Takumi Takai, Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XV　2012/04
A Runtime Dependency Analysis Method for Task Parallelization of OpenCL Programs Peer-reviewed

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Transactions on Computing Systems　5　(1)　53-67　2012/01/27
ISSN： 1882-7829
Performance and scalability analysis of a chip multi vector processor Peer-reviewed

Yoshiei Sato, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　3-20　2012
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3-1 　
A prototype implementation of OpenCL for SX vector systems Peer-reviewed

Hiroyuki Takizawa, Ryusuke Egawa, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2011　41-50　2012
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-22244-3-3 　
A media-oriented vector architectural extension with a high bandwidth cache system Peer-reviewed

Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Symposium on Low-Power and High-Speed Chips - Proceedings for 2012 IEEE COOL Chips XV　1-3　2012

DOI： 10.1109/COOLChips.2012.6216588 　
Exploring design space of a 3D stacked vector cache Peer-reviewed

Ryusuke Egawa, Jubee Tada, Yusuke Endo, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012　1475-1477　2012

DOI： 10.1109/SC.Companion.2012.270 　
Performance Evaluation of BCM on Various Supercomputing Systems Peer-reviewed

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

Proceedings of 24th International Conference on Parallel Computational Fluid Dynamics　2012
An out-of-order vector processing mechanism for multimedia applications Peer-reviewed

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　233-235　2012

DOI： 10.1145/2212908.2212941 　
A capacity-efficient insertion policy for dynamic cache resizing mechanisms Peer-reviewed

Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

CF '12 - Proceedings of the ACM Computing Frontiers Conference　265-267　2012

DOI： 10.1145/2212908.2212949 　
GPU IMPLEMENTATION OF PHASE-BASED STEREO CORRESPONDENCE AND ITS APPLICATION Peer-reviewed

Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012)　1697-1700　2012

DOI： 10.1109/ICIP.2012.6467205 　

ISSN： 1522-4880
Improving the Scalability of Transparent Checkpointing for GPU Computing Systems Peer-reviewed

Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY　2012

ISSN： 2159-3442
A Network Clustering Algorithm for Sybil-Attack Resisting Peer-reviewed

Ling Xu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E94D　(12)　2345-2352　2011/12

DOI： 10.1587/transinf.E94.D.2345 　

ISSN： 0916-8532

eISSN： 1745-1361
Performance of building cube method on various platforms Peer-reviewed

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011/11
An automatic task assignment method for heterogeneous computing systems Peer-reviewed

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

The 8th International Conference on Flow Dynamics 2011 (ICFD2011)　2011/11
Job Scheduling with Migration for Heterogeneous Computing Systems Peer-reviewed

Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Transactions on Computing Systems　4　(4)　203-213　2011/10/05
ISSN： 1882-7829
A Patch-Based Bit Mask Filtering Method for Micropolygon Rasterization Peer-reviewed

Jiali Yao, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High-Performance Graphics(HPG)　2011/08
Performance of SOR methods on modern vector and scalar processors Peer-reviewed

Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　215-221　2011/06

DOI： 10.1016/j.compfluid.2010.12.024 　

ISSN： 0045-7930
Parallel processing of the Building-Cube Method on a GPU platform Peer-reviewed

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

COMPUTERS & FLUIDS　45　(1)　122-128　2011/06

DOI： 10.1016/j.compfluid.2010.12.019 　

ISSN： 0045-7930
A Low-Energy-Oriented Insertion Policy for Way-Adaptable Caches Peer-reviewed

東方雄亮, 佐藤雅之, 江川隆輔, 滝沢寛之, 小林広明

Symposium on Advanced Computing Systems and Infrastructures (SACSIS2011)　2011　213-214　2011/05
A Power-Aware Insertion Policy for the Way-Adaptable Caches Peer-reviewed

Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of COOLChips XIV　2011/04
Energy Consumption of a Chip Multi-Vector Processor Using Real Applications

永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

IPSJ SIG Technical Reports (CD-ROM)　2010　(5)　Paper No. ARC-192, No. 3　2011/02/15

ISSN： 2186-2583
A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments Peer-reviewed

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Journal　52　(2)　320-333　2011/02
Publisher: Information and Media Technologies Editorial Board
DOI： 10.11185/imt.6.546 　


The technologies of Cloud Computing and NGN are now driving a paradigm shift in which various services are provided to business users over the network. In conjunction with this movement, many studies are under way to realize a ubiquitous computing environment in which a huge number of individual users can share their computing resources on the Internet, such as personal computers (PCs), game consoles, sensors and so on. To realize an effective resource discovery mechanism for such an environment, this paper presents an adaptive overlay network that enables a self-organizing resource management system to efficiently adapt to a heterogeneous environment. The proposed mechanism is composed of two functions. One is to adjust the number of logical links of a resource, which forward search queries, so that less-useful query flooding can be reduced. The other is to connect resources so as to decrease the communication latency on the physical network rather than the number of query hops on an overlay network. To further improve the discovery efficiency, this paper integrates these functions into a self-organizing resource management system, SORMS, which has been proposed in our previous work. The simulation results indicate that the proposed mechanism can increase the number of discovered resources by 60% without decreasing the discovery efficiency, and can reduce the total communication traffic by 80% compared with the original SORMS. This performance improvement is obtained by efficient control of logical links in a large-scale network.
A High-Performance Volunteer Computing Environment with a Dynamic Load-Balancing Mechanism Peer-reviewed

Yoshitomo Murata, Yuki Ishimori, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ Journal　52　(2)　401-414　2011/02
Performance Evaluation of Real-Time Stereo Correspondence on GPU

Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan　2011　31-31　2011
Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan
DOI： 10.11528/tsjc.2011.0_31 　
Power-aware dynamic cache partitioning for CMPs Peer-reviewed

Isao Kotera, Kenta Abe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)　6590　135-153　2011

DOI： 10.1007/978-3-642-19448-1_8 　

ISSN： 0302-9743 1611-3349
Large scaled computation of incompressible flows on Cartesian mesh using a vector-parallel supercomputer Peer-reviewed

Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

Lecture Notes in Computational Science and Engineering　74　332-338　2011

DOI： 10.1007/978-3-642-14438-7-35 　

ISSN： 1439-7358
A self-organized overlay network management mechanism for heterogeneous environments Peer-reviewed

Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

Journal of Information Processing　19　(0)　25-38　2011
Publisher: Information Processing Society of Japan
DOI： 10.2197/ipsjjip.19.25 　

ISSN： 1882-6652 0387-5806
A Performance Tuning Strategy Based on the Roofline Model for Vector Processors Peer-reviewed

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

IPSJ Transactions on Computing Systems (ACS)　4　(3)　77-87　2011

ISSN： 1882-7772
A Runtime Dependency Analysis Method for Task Parallelization of OpenCL Programs Peer-reviewed

佐藤功人, 小松一彦, 滝沢寛之, 小林広明

IPSJ Transactions on Computing Systems (ACS)　5　(1)　53-67　2011/01
A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems Peer-reviewed

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011　135-142　2011

DOI： 10.1109/ISPA.2011.36 　
CheCL: Transparent checkpointing and process migration of OpenCL applications Peer-reviewed

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011　864-876　2011

DOI： 10.1109/IPDPS.2011.85 　
Effects of 3-D stacked vector cache on energy consumption Peer-reviewed

Ryusuke Egawa, Yusuke Funaya, Ryuichi Nagaoka, Yusuke Endo, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

2011 IEEE International 3D Systems Integration Conference, 3DIC 2011　2011

DOI： 10.1109/3DIC.2012.6263026 　
A middle-grain circuit partitioning strategy for 3-D integrated floating-point multipliers Peer-reviewed

Jubee Tada, Ryusuke Egawa, Kazushige Kawai, Hiroaki Kobayashi, Gensuke Goto

2011 IEEE International 3D Systems Integration Conference, 3DIC 2011　2011

DOI： 10.1109/3DIC.2012.6263031 　
A performance tuning strategy under combining loop transforms for a vector processor with an on-chip cache Peer-reviewed

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC10)　2010/11
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering Peer-reviewed

SUZUKI Ken-ichi, KAERIYAMA Yoshiyuki, KOMATSU Kazuhiko, EGAWA Ryusuke, OHBA Nobuyuki, KOBAYASHI Hiroaki

IEICE Transactions on Information and Systems　93　(4)　891-902　2010/04/01
Publisher: The Institute of Electronics, Information and Communication Engineers
DOI： 10.1587/transinf.E93.D.891 　

ISSN： 0916-8532


Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with interactive dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in a 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives a speedup of 13 to 392 times in comparison with a ray tracing algorithm without organized rays and spheres. We also found that RTRPS provides competitive performance even if only primary rays are used.
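RTRPS itself is not reproduced here. As a minimal sketch of why bounding spheres make cheap rejection tests possible, the following Python function implements only the standard single-ray versus bounding-sphere test; the ray-plane grouping, frustum culling, and parallel projection techniques from the abstract are not covered. The name `ray_hits_sphere` and its parameters are illustrative assumptions, and the ray direction is assumed to be non-zero.

```python
def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t*direction (t >= 0) touches the sphere."""
    # Vector from the ray origin to the sphere center.
    oc = [c - o for o, c in zip(origin, center)]
    dir_len2 = sum(d * d for d in direction)  # assumed non-zero
    oc_len2 = sum(a * a for a in oc)
    # Ray parameter of the point on the ray's line closest to the center.
    t = sum(a * d for a, d in zip(oc, direction)) / dir_len2
    if t < 0:
        # The center lies behind the origin: hit only if the origin is inside.
        return oc_len2 <= radius * radius
    # Squared distance from the center to the closest point on the ray.
    dist2 = oc_len2 - t * t * dir_len2
    return dist2 <= radius * radius


# A ray along +z hits a unit sphere centered on the axis, misses one offset by 3.
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1))
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (3, 0, 5), 1))
```

Because the test depends only on the sphere's center and radius, an object rotating inside its sphere does not change the result, which is consistent with the abstract's remark that rotation within a sphere can be ignored.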
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering Peer-reviewed

Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Kazuhiko Komatsu, Ryusuke Egawa, Nobuyuki Ohba, Hiroaki Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E93D　(4)　891-902　2010/04

DOI： 10.1587/transinf.E93.D.891 　

ISSN： 1745-1361
The vector computing cloud: Toward a vector meta-computing environment Peer-reviewed

Ryusuke Egawa, Manabu Higashida, Yoshitomo Murata, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2010　75-91　2010
Publisher: Springer Science and Business Media, LLC
DOI： 10.1007/978-3-642-11851-7-6 　
Automatic tuning of CUDA execution parameters for stencil processing Peer-reviewed

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Software Automatic Tuning: From Concepts to State-of-the-Art Results　209-228　2010
Publisher: Springer New York
DOI： 10.1007/978-1-4419-6935-4_13 　
Lessons Learned from 1-Year Experience with SX-9 and Toward the Next Generation Vector Computing Peer-reviewed

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoko Isobe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009　3-+　2010

DOI： 10.1007/978-3-642-03913-3_1 　
Large-Scale Flow Computation of Complex Geometries by Building-Cube Method Peer-reviewed

Daisuke Sasaki, Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009　167-+　2010

DOI： 10.1007/978-3-642-03913-3_13 　
Cache partitioning strategies for 3-D stacked vector processors Peer-reviewed

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　1-6　2010

DOI： 10.1109/3DIC.2010.5751453 　
Efficient data management for the building cube method using cartesian meshes on the GPU platform Peer-reviewed

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

International Supercomputing Conference (ISC10)　2010
A Majority-Based Control Scheme for Way-Adaptable Caches Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

FACING THE MULTICORE-CHALLENGE: ASPECTS OF NEW PARADIGMS AND TECHNOLOGIES IN PARALLEL COMPUTING　6310　16-+　2010

DOI： 10.1007/978-3-642-16233-6_5 　

ISSN： 0302-9743

eISSN： 1611-3349
Evaluating Performance and Portability of OpenCL Programs Peer-reviewed

Kazuhiko Komatsu, Katsuto Sato, Yusuke Arai, Kentaro Koyama, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 5th international Workshop on Automatic Performance Tuning　2010
Resisting sybil attack by social network and network clustering Peer-reviewed

Ling Xu, Satayapiwat Chainan, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010　15-21　2010

DOI： 10.1109/SAINT.2010.32 　
A history-based job scheduling mechanism for the vector computing cloud Peer-reviewed

Yoshitomo Murata, Ryusuke Egawa, Manabu Higashida, Hiroaki Kobayashi

Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010　125-128　2010

DOI： 10.1109/SAINT.2010.43 　
A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications Peer-reviewed

Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

13TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN: ARCHITECTURES, METHODS AND TOOLS　412-415　2010

DOI： 10.1109/DSD.2010.93 　
A Voting-Based Working Set Assessment Scheme for Dynamic Cache Resizing Mechanisms Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN　98-105　2010

DOI： 10.1109/ICCD.2010.5647599 　

ISSN： 1063-6404
Design and early evaluation of a 3-D die stacked chip multi-vector processor Peer-reviewed

Ryusuke Egawa, Yusuke Funaya, Ryu-Ichi Nagaoka, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE 3D System Integration Conference 2010, 3DIC 2010　2010

DOI： 10.1109/3DIC.2010.5751448 　
Performance Optimization Techniques for Vector Processors with Cache Memory

佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会研究報告(CD-ROM)　2009　(3)　ROMBUNNO.ARC-184,6　2009/10/15

ISSN： 2186-2583
Working Sets based Thread Scheduling with Cache Partitioning Peer-reviewed

Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Poster Abstracts of The Eighteenth International Conference on Parallel Architecture and Compilation Techniques　12　2009/09
Thread Scheduling Based on Working Set Evaluation

佐藤雅之, 小寺功, 江川隆輔, 滝沢寛之, 小林広明

並列/分散/協調処理に関する「仙台」サマー・ワークショップ (SWoPP仙台2009)　1-10　2009/08
Early evaluation of a memory-stacked vector processor Peer-reviewed

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips XII　165　2009/04
Performance Evaluation of the SX-9 Using Real Applications

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会シンポジウム論文集　2009　(2)　57-64　2009/01/15

ISSN： 1344-0640
Evaluating Computational Performance of Backpropagation Learning on Graphics Hardware Peer-reviewed

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

Electronic Notes in Theoretical Computer Science　225　(C)　379-389　2009/01/02

DOI： 10.1016/j.entcs.2008.12.087 　

ISSN： 1571-0661
Study of high resolution incompressible flow simulation based on Cartesian mesh

Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiko Musa

47th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition　2009
3D On-Chip Memory for the Vector Architecture Peer-reviewed

Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION　352-357　2009

ISSN： 2164-0157
Characteristics of an On-Chip Cache on NEC SX Vector Architecture Peer-reviewed

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Interdisciplinary Information Sciences　15　(1)　51-66　2009
Publisher: Graduate School of Information Sciences, Tohoku University
DOI： 10.4036/iis.2009.51 　

ISSN： 1340-9050

Thanks to their high effective memory bandwidth, vector systems can achieve high computational efficiency for computation-intensive scientific applications. However, they have been encountering the memory wall problem, and the effective memory bandwidth rate has decreased: the bytes-per-flop rates of recent vector systems have dropped from 4 (SX-7 and SX-8) to 2 (SX-8R) and 2.5 (SX-9). The situation is getting worse as many function units and/or cores are brought onto a single chip, because the pin bandwidth is limited and does not scale. To solve the problem, we propose an on-chip cache, called the vector cache, to maintain the effective memory bandwidth rate of future vector supercomputers. The vector cache employs a bypass mechanism between the main memory and the register files under software control. We evaluate the performance of the vector cache on the NEC SX vector processor architecture with bytes-per-flop rates of 2 B/FLOP and 1 B/FLOP to clarify the basic characteristics of the vector cache. For the evaluation, we use the NEC SX-7 simulator extended with the vector cache mechanism. The benchmark programs for the performance evaluation are two DAXPY-like loops and five leading scientific applications. The results indicate that the vector cache boosts the computational efficiencies of the 2 B/FLOP and 1 B/FLOP systems up to the level of the 4 B/FLOP system. In particular, when cache hit rates exceed 50%, the 2 B/FLOP system can achieve performance comparable to that of the 4 B/FLOP system. The vector cache with the bypass mechanism can provide data from both the main memory and the cache simultaneously. In addition, from the viewpoint of cache design, we investigate the impact of cache associativity on the cache hit rate, and the relationship between cache latency and performance.
The results also suggest that the associativity hardly affects the cache hit rate, and that the effects of the cache latency depend on the vector loop length of the applications. A shorter cache latency contributes to the performance improvement of applications with shorter loop lengths, even in the case of the 4 B/FLOP system. For longer loop lengths of 256 or more, the latency can be hidden effectively, and the performance is not sensitive to the cache latency. Finally, we discuss the effects of selective caching using the bypass mechanism and of loop unrolling on the vector cache performance for the scientific applications. Selective caching is effective for efficient use of the limited cache capacity. Loop unrolling is also effective in improving performance, resulting in a synergistic effect with caching. However, there are exceptional cases in which loop unrolling worsens the cache hit rate because of an increase in the working space needed to process the unrolled loops over the cache. In these cases, the increase in the cache miss rate cancels the gain obtained by unrolling.
A Cache-Aware Thread Scheduling Policy for Multi-Core Processors Peer-reviewed

Masayuki Sato, Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks　109-114　2009
Evaluation of Fine Grain 3-D Integrated Arithmetic Units Peer-reviewed

Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi, Gensuke Goto

2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION　198-+　2009

ISSN： 2164-0157
Performance tuning and analysis of future vector processors based on the roofline model Peer-reviewed

Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM International Conference Proceeding Series　7-14　2009

DOI： 10.1145/1621960.1621962 　
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications Peer-reviewed

Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009)　408-+　2009

DOI： 10.1109/PDCAT.2009.78 　
Performance Evaluation of NEC SX-9 using Real Science and Engineering Applications Peer-reviewed

Takashi Soga, Akihiro Musa, Youichi Shimomura, Ken'ichi Itakura, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS　2009

DOI： 10.1145/1654059.1654088 　
Activities of Cyberscience Center and Performance Evaluation of the SX-9 Supercomputer Peer-reviewed

Hiroaki Kobayashi, Ryusuke Egawa, Kouki Okabe, Eiichi Ito, Kenji Oizumi

NEC TECHNICAL JOURNAL　3　(4)　64-72　2008/12

ISSN： 1880-5884
Caching on a chip multi vector processor Peer-reviewed

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC08)　2008/11
A PARALLEL IMAGE GENERATION ALGORITHM BASED ON PHOTON MAPPING Peer-reviewed

Masahide Tamura, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008)　145-151　2008/02
First Experiences with NEC SX-9.

Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiko Musa, Takashi Soga, Yoichi Shimomura

High Performance Computing on Vector Systems　3-11　2008
Publisher: Springer
DOI： 10.1007/978-3-540-85869-0_1 　
The potential of on-chip memory systems for future vector architectures Peer-reviewed

Hiroaki Kobayashi, Akihiko Musa, Yoshiei Sato, Hiroyuki Takizawa, Koki Okabe

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007　247-+　2008
A Utility-based Double Auction Mechanism for Efficient Grid Resource Allocation Peer-reviewed

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　252-260　2008

DOI： 10.1109/ISPA.2008.103 　
A Distributed and Cooperative Load Balancing Method for Large-Scale Computing Environments Peer-reviewed

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

IPSJ (Information Processing Society of Japan) Journal　49　(3)　1214-1228　2008
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination Peer-reviewed

Komatsu Kazuhiko, Kaeriyama Yoshiyuki, Suzuki Kenichi, Takizawa Hiroyuki, Kobayashi Hiroaki

IPSJ Online Transactions　1　(1)　1-11　2008
Publisher: Information Processing Society of Japan
DOI： 10.2197/ipsjtrans.1.1 　

ISSN： 1882-6660

Although ray tracing is the best approach to high-quality image synthesis, generating images takes a long time because of the huge amount of computation involved. In particular, ray-primitive intersection tests still dominate the execution time of ray tracing, and faster ray-primitive intersection algorithms are strongly needed to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm exploits the fact that the rays in a bundle share the same origin and are highly coherent. By reducing the redundant calculations in the innermost intersection tests for the bundles through precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that, by exploiting these features of ray bundles, the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm.
SPRAT: A Stream Processing Description Language with Runtime Auto-Tuning Peer-reviewed

滝沢寛之, 白取寛貴, 佐藤功人, 小林広明

情報処理学会論文誌：コンピューティングシステム(ACS)　1　(2)　207-220　2008
ISSN： 1882-7829
A Performance Study of Secure Data Mining on the Cell Processor Peer-reviewed

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS　1　(2)　633-+　2008
An Efficient Intersection Algorithm Design of Ray Tracing for Many-Core Graphics Processors Peer-reviewed

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008)　165-171　2008
A Performance Study of Secure Data Mining on the Cell Processor Peer-reviewed

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS　633-+　2008

DOI： 10.1109/CCGRID.2008.16 　
Implementation and Evaluation of a Distributed and Cooperative Load-Balancing Mechanism for Dependable Volunteer Computing Peer-reviewed

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS & NETWORKS WITH FTCS & DCC　316-+　2008

DOI： 10.1109/DSN.2008.4630100 　

ISSN： 1530-0889
Hierarchical Parallel Processing of Ray Tracing on a Cell Cluster Invited Peer-reviewed

Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of 1st International Workshop on Super Visualization (IWSV08)　2008
Consideration of resource access history for optimizing overlay networks in P2P-based resource discovery Peer-reviewed

Tsutomu Inaba, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings - 2008 International Symposium on Applications and the Internet, SAINT 2008　269-272　2008

DOI： 10.1109/SAINT.2008.104 　
A Reliability Model for Result Checking in Volunteer Computing Peer-reviewed

Ling Xu, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of DAS-P2P 2008 Workshop　201-204　2008

DOI： 10.1109/SAINT.2008.25 　
Gain Based Delay Balancing in the Deep Submicron Era Peer-reviewed

Ryusuke EGAWA, Jubee TADA, Hiroaki Kobayashi, Gensuke GOTO

Proceedings of The 23rd International Technical Conference on Circuits/Systems (ITC-CSCC 2008)　577-580　2008
SPRAT: Runtime Processor Selection for Energy-aware Computing Peer-reviewed

Hiroyuki Takizawa, Katuto Sato, Hiroaki Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING　386-393　2008

DOI： 10.1109/CLUSTR.2008.4663799 　

ISSN： 1552-5244
Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture Peer-reviewed

Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS　335-+　2008

DOI： 10.1109/ISPA.2008.100 　
Auction-based Resource Allocation for Activating Incentives in Resource Trading in Grid Computing Peer-reviewed

Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of The 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications　252-260　2008
Modeling of cache access behavior based on Zipf's law Peer-reviewed

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　9-15　2008

DOI： 10.1145/1509084.1509086 　

ISSN： 1089-795X
A shared cache for a chip multi vector processor Peer-reviewed

Akihiro Musa, Yoshiei Sato, Takashi Soga, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　310　24-29　2008

DOI： 10.1145/1509084.1509088 　

ISSN： 1089-795X
A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs Peer-reviewed

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Transactions on High-Performance Embedded Architectures and Compilers　3　(1)　149-167　2008
Early evaluation of on-chip vector caching for the NEC SX vector architecture Peer-reviewed

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

ACM/IEEE Supercomputing Conference (SC07)　2007/11
A progressive 3D-meshing algorithm for interactive simulation of soft bodies Peer-reviewed

Tomoyuki Saoi, Hiroyuki Takizawa, Hiroaki Kobayashi

INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL　10　(6)　761-776　2007/11

ISSN： 1343-4500
A dependable Peer-to-Peer computing platform Peer-reviewed

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE　23　(8)　939-955　2007/11

DOI： 10.1016/j.future.2007.03.004 　

ISSN： 0167-739X

eISSN： 1872-7115
Partial distortion entropy maximization for online data clustering Invited Peer-reviewed

Hiroyuki Takizawa, Hiroaki Kobayashi

NEURAL NETWORKS　20　(7)　819-831　2007/09

DOI： 10.1016/j.neunet.2007.04.029 　

ISSN： 0893-6080
A Power-Aware Way-Allocation Shared Cache Mechanism Peer-reviewed

小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　55-58　2007/09
Accelerating Möller Intersection Algorithm Using Ray Packets Peer-reviewed

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Ken-ichi Suzuki, Hiroaki Kobayashi, Tadao Nakamura

Information Technology Letters　265-268　2007/09
Hardware Resource Contention Analysis for Runtime Performance Prediction of SMT Processors Invited Peer-reviewed

佐藤雅之, 船矢祐介, 小寺功, 滝沢寛之, 小林広明

情報科学技術レターズ　67-70　2007/09
An Estimation-Based Redundant Task Dispatch Policy for Volunteer Computing Platforms Peer-reviewed

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the International Conference on Dependable Systems and Networks　348-349　2007/06/25

Fast Abstract (Supplemental Volume)
A fair-sharing and power-aware L2 cache system for chip multiprocessors Peer-reviewed

Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

IEEE COOL Chips X　2007/04
Memory Efficient Scheme for Fast Spectral Photon Mapping Peer-reviewed

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the Ninth IASTED International Conference on Computer Graphics and Imaging (CGIM 2007)　2007/02
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs Peer-reviewed

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　113-120　2007

DOI： 10.1145/1327171.1327185 　

ISSN： 1089-795X
Preliminary evaluation for runtime auto-tuning of GPGPU applications Peer-reviewed

Hiroyuki Takizawa, Hiroki Shiratori, Hiroaki Kobayashi

The 2nd International Workshop on Automatic Performance Tuning　37-37　2007
An Efficient Control Mechanism for Self-Organizing Overlay Networks of Large-Scale P2P Systems Peer-reviewed

Hiroaki Kobayashi, Hiroyuki Takizawa, Takuro Okawa, Tsutomu Inaba

Interdisciplinary Information Sciences　13　(2)　227-237　2007
Publisher: Tohoku University
DOI： 10.4036/iis.2007.227 　

ISSN： 1340-9050

P2P (peer-to-peer) has great potential for handling highly distributed computing resources and is expected to be a key technology for realizing ubiquitous computing environments over the Internet. However, P2P systems tend to waste network bandwidth on resource acquisition because of their decentralized resource management. This paper presents an efficient control mechanism for self-organizing overlay networks of large-scale P2P systems, and evaluates its performance in detail. The overlay network is configured by forming local clusters that reflect the current interests of individual peers and connecting them together based on their similarity. As a result, the overlay network provides a resource exploitation space for specific interests. In addition, the overlay network can be dynamically reconfigured as the interests of individual peers change over time, so that the peers that are more useful at a given time are reconnected closer to their client peers. Therefore, resource-request messages can be multicast only over peers with similar interests that are dynamically connected through the overlay network, resulting in a remarkable decrease in both the number of messages for resource acquisition and the number of hops a resource-request query travels to reach the peer that satisfies the request. Experimental results indicate that the proposed mechanism can realize effective self-organization of the overlay network, in which useful peers are dynamically relocated around client peers. In addition, the adaptive allocation of links to peers according to their capability works well to maintain the high performance and fault tolerance of the self-organizing overlay network.
An on-chip cache design for vector processors Peer-reviewed

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　17-23　2007

DOI： 10.1145/1327171.1327173 　

ISSN： 1089-795X
A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs Peer-reviewed

Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the MEDEA workshop (PACT 07)　121-128　2007
Performance Evaluation of K-Means Clustering on the Cell Processor Peer-reviewed

Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of High Performance Computing Symposium 2007　2007　(1)　161-168　2007/01
An on-chip cache design for vector processors Invited Peer-reviewed

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT　17-23　2007

DOI： 10.1145/1327171.1327173 　

ISSN： 1089-795X
Multi-Core Data Streaming Architecture for Ray Tracing Peer-reviewed

Yoshiyuki Kaeriyama, Daichi Zaitsu, Kenichi Suzuki, Hiroaki Kobayashi, Nobuyuki Ohba

2007 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, VOLS 1 AND 2　171-+　2007

DOI： 10.1109/ICCD.2007.4601897 　

ISSN： 1063-6404
Thread Scheduling Based on the Thread Characteristics for Multi-Core Processors Invited Peer-reviewed

Yusuke Funaya, Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

Information Technology Letters　5　(5)　37-40　2006/09
A Dynamic Logical Link Management Mechanism for P2P Resource Discovery Systems Peer-reviewed

Takuro Okawa, Hiroyuki Takizawa, Hiroaki Kobayashi

Information Technology Letters　5　(5)　363-366　2006/09
Publisher: Forum on Information Technology
Towards Effective GPU Implementation of Neural Networks Peer-reviewed

Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

Proceedings of the fourth Irish Conference on Mathematical Foundations of Computer Science and Information Technology (MFCSIT)　2006/07
Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing Peer-reviewed

H Takizawa, H Kobayashi

JOURNAL OF SUPERCOMPUTING　36　(3)　219-234　2006/06

DOI： 10.1007/s11227-006-8294-1 　

ISSN： 0920-8542
Radiative heat transfer simulation using programmable graphics hardware Peer-reviewed

Hiroyuki Takizawa, Noboru Yamada, Seigo Sakai, Hiroaki Kobayashi

Proceedings - 5th IEEE/ACIS Int. Conf. on Comput. and Info. Sci., ICIS 2006. In conjunction with 1st IEEE/ACIS, Int. Workshop Component-Based Software Eng., Softw. Archi. and Reuse, COMSAR 2006　2006　29-37　2006

DOI： 10.1109/ICIS-COMSAR.2006.70 　
Design and Implementation of an Efficient Search Mechanism based on the Hybrid P2P Model for Ubiquitous Computing Systems Peer-reviewed

T Inaba, T Okawa, Y Murata, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS　45-+　2006

DOI： 10.1109/SAINT.2006.23 　
A distributed and cooperative load balancing mechanism for large-scale P2P systems Peer-reviewed

Y Murata, T Inaba, H Takizawa, H Kobayashi

INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS　126-129　2006

DOI： 10.1109/SAINT-W.2006.2 　
An efficient text capture method for moving robots using DCT feature and text tracking Peer-reviewed

Hiroki Shiratori, Hideaki Goto, Hiroaki Kobayashi

18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS　1050-+　2006

DOI： 10.1109/ICPR.2006.243 　

ISSN： 1051-4651
Implications of memory performance for highly efficient supercomputing of scientific applications Peer-reviewed

Akihiro Musa, Hiroyuki Takizawa, Koki Okabe, Takashi Soga, Hiroaki Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　4330　845-+　2006

ISSN： 0302-9743
An Efficient Method for Finding Texts in Living Environments Using an Active Camera Peer-reviewed

齋藤精二, 後藤英昭, 小林広明

電子情報通信学会論文誌　J88-D-II　(9)　2003-2006　2005/09
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1923
Modeling and Performance Evaluation of Computing Resource Discovery in Large-Scale P2P Systems Peer-reviewed

大川拓郎, 滝沢寛之, 小林広明

情報科学技術レターズ　46　(4)　21-24　2005/09
Publisher: Forum on Information Technology
An Incremental Photon-Mapping Algorithm for Fast Walk-Through Animations Peer-reviewed

Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on Computer Graphics and Imaging　1-7　2005/08
Performance Evaluation of the SX-7 System Using the HPC Challenge Benchmark Peer-reviewed

滝沢寛之, 小久保達信, 片海健亮, 小林広明

先進的計算基盤システムシンポジウム(SACSIS2005)　2005　(5)　25-33　2005/05
A New Dynamic Decomposition Method for Parallel Molecular Dynamics Simulation Peer-reviewed

V. Zhakhovskii, K. Nishihara, Y. Fukuda, S. Shimojo, T. Akiyama, S. Miyanaga, H. Sone, H. Kobayashi, E. Ito, Y. Seo, M. Tamura, Y. Ueshima

Proceedings of Cluster Computing and Grid 2005　9-12　2005/05
A distributed cooperative scheduling mechanism for P2P computing

Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

Advanced Network & Computing Technology Workshop　(33)　23-30　2005/01/24
A P2P Semantic Information Search Mechanism for Ubiquitous Grid Computing Systems

Tsutomu Inaba, Takuro Okawa, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

Advanced Network & Computing Technology Workshop　(33)　45-52　2005/01
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET Peer-reviewed

Hiroyuki Takizawa, Hiroaki Kobayashi

Information　8　(3)　383-389　2005
Performance Evaluation of the SX-7 System using the HPC Challenge Benchmark Peer-reviewed

滝沢寛之, 小久保達信, 片海健亮, 小林広明

情報処理学会論文誌　46　(SIG12)　37-45　2005
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 1882-7829

The HPC Challenge benchmark (HPCC) is a benchmark suite developed for comprehensive performance evaluation of high-performance computing (HPC) systems. HPCC is a promising means of appropriately evaluating the effective performance of HPC systems for practical scientific computing, because it evaluates systems from several viewpoints, such as memory access and networking performance, in addition to the floating-point operation rate that has been widely used until now. In this paper, we report the performance evaluation results of an NEC SX-7 system at the Information Synergy Center, Tohoku University, using the HPCC benchmark. Based on the result that the system achieves excellent scores in 16 of the 28 tests in the benchmark, we discuss the superiority of its vector architecture in the field of HPC.
Text detection in color scene images based on unsupervised clustering of multi-channel wavelet features Peer-reviewed

T Saoi, H Goto, H Kobayashi

Eighth International Conference on Document Analysis and Recognition, Vols 1 and 2, Proceedings　690-694　2005

DOI： 10.1109/ICDAR.2005.227 　
A self-organizing overlay network to exploit the locality of interests for effective resource discovery in P2P systems Peer-reviewed

H Kobayashi, H Takizawa, T Inaba, Y Takizawa

2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS　246-255　2005
A workflow management mechanism for peer-to-peer computing platforms Peer-reviewed

H Wang, H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS　3758　827-832　2005

ISSN： 0302-9743
Efficient parallel processing of competitive learning algorithms Peer-reviewed

K Sano, S Momose, H Takizawa, H Kobayashi, T Nakamura

PARALLEL COMPUTING　30　(12)　1361-1383　2004/12

DOI： 10.1016/j.parco.2004.10.001 　

ISSN： 0167-8191

eISSN： 1872-7336
Evaluation Experiments on Large-Scale Remote Interactive Visualization via Super SINET

滝沢寛之, 小林広明

全国共同利用情報基盤センター研究開発論文集　26　24-29　2004/11
Evaluation of Large-Scale Remote Interactive Visualization via Super SINET Peer-reviewed

Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of the 3rd International Conference on Information (INFO2004)　456-459　2004/11
Evaluation of Large-Scale Remote Visualization Processing Using Super SINET

滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　90-96　2004/06
Quantitative Evaluation of Resource Discovery and Communication Overheads in the Globus Grid Middleware

村田善智, 稲葉勉, 滝沢寛之, 小林広明

東北大学情報シナジーセンター年報　3　115-123　2004/06
An Effective Implementation of Vector Quantization Encoder on Commodity Graphics Hardware Peer-reviewed

Hiroyuki Takizawa, Hiroaki Kobayashi

Proceedings of International Conference on IT and Applications (ICITA)　2004
A fast computation scheme of partial distortion entropy updating Peer-reviewed

H Takizawa, H Kobayashi

ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS　736-741　2004

DOI： 10.1109/ITCC.2004.1286555 　
Locality analysis to control dynamically way-adaptable caches Peer-reviewed

Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

Proceedings of the 2004 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '04　25-32　2004

DOI： 10.1145/1152922.1101874 　
Multi-grain parallel processing of data-clustering on programmable graphics hardware Peer-reviewed

H Takizawa, H Kobayashi

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS　3358　16-27　2004

ISSN： 0302-9743
Locality analysis to control dynamically way-adaptable caches Peer-reviewed

Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

Proceedings of the 2004 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '04　33　(3)　25-32　2004

DOI： 10.1145/1152922.1101874 　
A Study of Self-Organizing P2P Networks for Dynamic Resource Management in Grids

瀧澤泰明, 滝沢寛之, 佐野健太郎, 小林広明, 中村維男

情報処理学会東北支部研究会　2003/11
Vector Quantization Codebook Design That Suppresses Edge Degradation in Images Peer-reviewed

滝沢寛之, 三浦健, 小林広明, 中村維男

Information Technology Letters　2　243-244　2003/09
Vector quantization codebook design using the law-of-the-jungle algorithm Peer-reviewed

H Takizawa, T Nakajima, K Sano, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E86D　(6)　1068-1077　2003/06

ISSN： 0916-8532
A Comparison Study Of Vector Quantization Codebook Design Algorithms Based On The Equidistortion Principle Peer-reviewed

Hiroyuki Takizawa, Taira Nakajima, Kentaro Sano, Hiroaki Kobayashi

Proceedings of the 21st IASTED International Conference on Applied Informatics　255-261　2003
An Instruction Cache Mechanism for Simultaneous Multithreaded VLIW Processors Peer-reviewed

Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

The Journal of Asian Information-Science-Life　2　(1)　2003
Parallel processing for vector quantization codebook design

S. Momose, K. Sano, H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

並列/協調/分散処理に関する「湯布院」サマーワークショップ資料　2002/08
Design and Evaluation of the Mulhi Cache Peer-reviewed

Jubei Tada, Takuya Nakaike, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

電子情報通信学会論文誌　J85-D-I　(3)　274-285　2002
Real-Time Ray-Tracing with the 3DCGiRAM Architecture Peer-reviewed

Ken-ichi Suzuki, Yasumasa Saida, Kentaro Sano, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

IEICE Transactions　J85-D-II　(8)　1365-1367　2002
An Interleaved Multiple-Hit Cache for Simultaneous Multithreaded VLIW Processors Peer-reviewed

Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the Third International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'02)　25-32　2002
Practical Volume Compression based on Vector Quantization using the Law-of-the-Jungle Algorithm Peer-reviewed

Kentaro Sano, Hiroyuki Takizawa, Taira Nakajima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the 2nd International Conference on Visualization, Imaging, and Image Processing　519-526　2002
Interactive Ray-Tracing on the 3DCGiRAM Architecture Peer-reviewed

Hiroaki Kobayashi, Ken-ichi Suzuki, Kentaro Sano, Nobuyuki Oba

Proceedings of ACM/IEEE MICRO-35 4th Workshop on Media and Streaming Processors　53-59　2002
High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of International Conference on Optical Communication and Multimedia (ICOCM2002)　114-117　2002
PARALLEL ALGORITHM FOR THE LAW-OF-THE-JUNGLE LEARNING TO THE FAST DESIGN OF OPTIMAL CODEBOOKS Peer-reviewed

Kentaro Sano, Shintaro Momose, Hiroyuki Takizawa, Clecio Donizete Lima, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002)　582-587　2002
A Vector Quantization Method That Suppresses Visual Image Quality Degradation Peer-reviewed

三浦健, 滝沢寛之, 佐野健太郎, 中島平, 小林広明, 中村維男

Information Technology Letters　1　185-186　2002
Object-Space Parallel Processing of the Multi-Pass Rendering Method for Message-Passing Parallel Processing Systems Peer-reviewed

Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

The International Journal of High Performance Computer Graphics, Multimedia and Visualisation　1　(3)　1-14　2001
A Design of Calculation Units for the Image Synthesis Intelligent Memory 3DCGiRAM Peer-reviewed

Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Jun Sugiyama, Yasumasa Saida, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of JSPP2001　2001　(6)　295-302　2001
A Technology-Scalable Multithreaded Architecture Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the 13th Symposium on Computer Architecture and High-Performance Computing　82-89　2001
3DCGiRAM: An intelligent memory architecture for photo-realistic image synthesis Peer-reviewed

H Kobayashi, K Suzuki, K Sano, Y Kaeriyama, Y Saida, N Oba, T Nakamura

2001 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD 2001, PROCEEDINGS　462-467　2001

ISSN： 1063-6404
Dynamic Boosting for VLIW Architectures Peer-reviewed

KOBAYASHI Hiroaki

IEICE Transactions on Information and Systems　J80-D-I　(1)　171-183　2000
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1915
Data-parallel volume rendering with adaptive volume subdivision Peer-reviewed

K Sano, H Kitajima, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E83D　(1)　80-89　2000/01

ISSN： 1745-1361
An active learning algorithm based on existing training data Peer-reviewed

H Takizawa, T Nakajima, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E83D　(1)　90-99　2000/01

ISSN： 0916-8532
Reconfigurable synchronized dataflow processor Peer-reviewed

Hiroshi Sasaki, Hitoshi Maruyama, Hideaki Tsukioka, Nobuyoshi Shoji, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC　27-28　2000

DOI： 10.1145/368434.368490 　
A Pre-attributed Resampling Algorithm for Controlled-Precision Volume Ray-Casting Peer-reviewed

Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

IPSJ Journal　41　(SIG 5)　113-124　2000
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0387-5806

Accurate volume rendering is essential for some visualization applications, e.g., medical imaging. However, the high computational cost of conventional volume rendering algorithms for high-quality image generation has restricted their practical use. In this paper, we propose a pre-attributed resampling algorithm that accomplishes controlled-precision volume ray-casting at low computational cost. This algorithm changes resampling intervals based on numerical errors of the volume rendering integral so that the number of resampling points is minimized for a given error bound. In addition, to reduce the computational cost of resampling, a simple interpolation method is applied to resampling points in regions where intensities and opacities are constant. To suppress the overhead of precision control, information on the numerical errors and the constant regions is obtained for each voxel in pre-processing and then attached to the volume data as voxel attributes. The experimental results demonstrate that, for accurate visualization, the proposed algorithm outperforms conventional ray-casting algorithms without precision control in terms of accuracy/processing-time performance.
Developing a Practical Parallel Multi-pass Renderer in Java and C --- Toward a Grande Application in Java Peer-reviewed

Hitoshi Yamauchi, Atsusi Maeda, Hiroaki Kobayashi

Proceedings of the ACM 2000 Java Grande Conference　126-133　2000
A Scheduling Method for Instruction-Level Parallel Processing of Vector and Scalar Instructions Peer-reviewed

Takuya Nakaike, Takehito Sasaki, Masayuki Katahira, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　30　(13)　23-33　1999/11/30
Publisher: John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(19991130)30:13<23::AID-SCJ3>3.0.CO;2-3 　

ISSN： 0882-1666
A topology preserving neural network for nonstationary distributions Peer-reviewed

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(7)　1131-1135　1999/07

ISSN： 0916-8532
Acceleration techniques for the network inversion algorithm Peer-reviewed

H Takizawa, T Nakajima, M Nishi, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E82D　(2)　508-511　1999/02

ISSN： 0916-8532
Time stamp invalidation of TLB-unified cache and its performance evaluation Peer-reviewed

Ken-Ichi Suzuki, Nobuyuki Oba, Shigenori Shimizu, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　30　(11)　94-106　1999
Publisher: John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(199910)30:11<94::AID-SCJ11>3.0.CO;2-S 　

ISSN： 0882-1666
MULHI Cache : An Instruction Cache Mechanism for VLIW Processors Peer-reviewed

KOBAYASHI Hiroaki

Transactions of Information Processing Society of Japan　40　(5)　1996-2007　1999
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 1882-7764

VLIW (Very Long Instruction Word) processors, which are expected to be a next generation high performance microprocessor architecture, need a high-bandwidth, high-hit-rate instruction cache to fetch VLIWs and issue operations of each VLIW to function units quickly. However, when VLIWs including many nops (no operations) are stored in a conventional instruction cache, the cache utilization is not high, resulting in the performance degradation of VLIW processors. In this paper, a new instruction cache mechanism for VLIW processors, named MULHI (MULtiple HIt) cache, is proposed and evaluated using several programs in the SPEC95 benchmark suite. The experimental results indicate that the MULHI cache achieves 1.68 times higher performance than a conventional instruction cache that stores VLIWs with nops.
A Self-organizing network system forming memory from nonstationary probability distributions Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the International Joint Conference on Neural Networks 99　1999
An Architecture of the Reconfigurable Synchronous Dataflow Computer and its Software Development Environment Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of The Seventh Japanese FPGA/PLD Design Conference and Exhibit　9-14　1999
Kohonen learning with a mechanism, the law of the jungle, capable of dealing with nonstationary probability distribution functions Peer-reviewed

T Nakajima, H Takizawa, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E81D　(6)　584-591　1998/06

ISSN： 0916-8532
Facial image processing using wavelet transform

K. Iimura, H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　1998
A Scheduling method for instruction level parallel processing of vector and scalar instructions Peer-reviewed

KOBAYASHI Hiroaki

The Transactions of the Institute of Electronics, Information and Communication Engineers D-I　J81-D-I　(7)　910-920　1998
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1915
Automated design of wave pipelined multiport register files Peer-reviewed

K Takano, T Sasaki, N Oba, H Kobayashi, T Nakamura

PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98　197-202　1998
Performance Evaluation of a Parallel Multi-Pass Rendering Algorithm Based on the Object-Space Parallel Processing Model Peer-reviewed

Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of JSPP 98　98　(7)　175-182　1998
Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the 2nd Eurographics Workshop on Parallel Rendering　133-144　1998
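Implementation and Evaluation of an Object-Space-Subdivision Parallel Ray-Tracing Method on General-Purpose Computers Peer-reviewed

Takayuki Maeda, Mayumi Tokunaga, Hitoshi Yamauchi, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of the Visual Computing / Graphics and CAD Joint Symposium '98　55-60　1998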
Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System Peer-reviewed

Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

Proceedings of the 2nd Eurographics Workshop on Parallel Rendering　133-144　1998
The object-space parallel processing of the multipass rendering method on the (Mπ)^2 with a distributed-frame buffer system Peer-reviewed

H Yamauchi, T Maeda, H Kobayashi, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E80D　(9)　909-918　1997/09

ISSN： 0916-8532
Decoupled modified-bit cache Peer-reviewed

Masafumi Takahashi, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

Systems and Computers in Japan　28　(6)　49-59　1997/06/15
Publisher: John Wiley and Sons Inc.
DOI： 10.1002/(SICI)1520-684X(19970615)28:6<49::AID-SCJ6>3.0.CO;2-M 　

ISSN： 0882-1666
The Object-Space Parallel Processing of the Multipass Rendering Method on the (Mπ)^2 with a Distributed-Frame Buffer System

Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

IEICE Transactions on Information and Systems　E80-D　(9)　899-908　1997
Publisher: Institute of Electronics, Information and Communication Engineers, IEICE
ISSN： 0916-8532
A Hardware Cache Evaluation System : RICE Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Institute of Electronics, Information and Communication Engineers　J80-D-I　(1)　121-123　1997
A method for improving classification capability of multilayer perceptrons Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Institute of Electronics, Information and Communication Engineers　J80-D-II　(1)　390-393　1997
Time-Division Pseudo Multi-Port Register File with Wave Pipelining Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Institute of Electronics, Information and Communication Engineers　J80-D-I　(3)　223-226　1997
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1915
Performance Evaluation of Level-2 Cache by Using RICE Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Institute of Electronics, Information and Communication Engineers　J80-D-I　(10)　793-802　1997
Memory hierarchy design for jetpipeline: To execute scalar and vector instructions in parallel Peer-reviewed

T Sasaki, T Nakaike, K Takano, M Katahira, H Kobayashi, T Nakamura

SECOND AIZU INTERNATIONAL SYMPOSIUM ON PARALLEL ALGORITHMS/ARCHITECTURE SYNTHESIS, PROCEEDINGS　66-73　1997
A cached frame buffer system for object-space parallel processing systems Peer-reviewed

H Kobayashi, T Maeda, H Yamauchi, T Nakamura

COMPUTER GRAPHICS INTERNATIONAL, PROCEEDINGS　146-+　1997
Multiport Register File Using Wave Pipelining Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of ACM/IEEE International Workshop on Logic Synthesis'97　1997
Parallel processing of the shear-warp factorization with the binary-swap method on a distributed-memory multiprocessor system Peer-reviewed

K Sano, HH Kitajima, H Kobayashi, T Nakamura

1997 IEEE SYMPOSIUM ON PARALLEL RENDERING (PRS '97), PROCEEDINGS　87-+　1997
Performance Evaluation of (Mp)^2, a Massively Parallel Processing System for Image Synthesis with a Distributed Frame Buffer System Peer-reviewed

KOBAYASHI Hiroaki

IEICE Technical Report, Computer Systems　96　(503)　25-32　1997
Publisher: The Institute of Electronics, Information and Communication Engineers

Object-space parallel processing for global illumination models is one of the most promising approaches to fast photo-realistic image synthesis. However, in massively parallel processing systems based on object-space parallel processing, there is a potential bottleneck between the processing elements and the frame buffer, and this bottleneck may restrict their scalable performance. To solve this problem, this paper presents a novel frame buffer system, named a distributed frame buffer system. By adopting the distributed frame buffer system in object-space parallel processing systems, the overhead of frame buffer access due to conflicts and long latency can be reduced, and the potential of an object-space parallel processing system with a large number of processing elements can be fully exploited.
Time Stamp Invalidation of TLB-Unified Cache and Its Performance Evaluation Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Institute of Electronics, Information and Communication Engineers　J80-D-I　(12)　941-953　1997
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1915
(Mπ)^2: A hierarchical parallel processing system for the multipass rendering method Peer-reviewed

H Kobayashi, H Yamauchi, Y Toh, T Nakamura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E79D　(8)　1055-1064　1996/08

ISSN： 0916-8532
A study of optimal learning methods in neural networks

H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

IPSJ Regional Symposium in Tohoku　1996
Decoupled Modified-bit Cache Peer-reviewed

KOBAYASHI Hiroaki

The Transactions of the Institute of Electronics, Information and Communication Engineers　1996
A Memory Access Protocol for Interconnection Networks with Message Losses Peer-reviewed

KOBAYASHI Hiroaki

The Transactions of the Institute of Electronics, Information and Communication Engineers　79　(9)　567-571　1996
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1915
Decoupled modified-bit cache Peer-reviewed

M Takahashi, N Oba, H Kobayashi, T Nakamura

CONFERENCE PROCEEDINGS OF THE 1996 IEEE FIFTEENTH ANNUAL INTERNATIONAL PHOENIX CONFERENCE ON COMPUTERS AND COMMUNICATIONS　136-143　1996
A hierarchical parallel processing system for the multipass-rendering method Peer-reviewed

H Kobayashi, H Yamauchi, Y Toh, T Nakamura

10TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM - PROCEEDINGS OF IPPS '96　62-67　1996
Facial Expression Recognition Using Neural Networks Capable of Recognizing at an Infant Level Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the Sixth World Congress of World Association for Infant Mental Health　1996
Task Scheduling Strategies and Their Locality Evaluation of Memory References on a Parallel Graph Reduction System Peer-reviewed

KOBAYASHI Hiroaki

Transactions of Information Processing Society of Japan　37　(11)　2020-2029　1996
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 1882-7764

Functional programming languages have many appealing properties such as referential transparency and high programming productivity. On the other hand, the inefficiency of their implementation on conventional computers has prevented them from gaining wide acceptance. In this paper, we propose a task scheduling strategy for high-speed processing of functional programs on a shared-memory multiprocessor system. To reduce shared-memory accesses in parallel graph reduction, the proposed task scheduling strategy allocates tasks to processors by dynamically taking the locality of data references among the tasks into account. Software simulation experiments on a multiprocessor system with the proposed strategy show that speedups proportional to the number of processors can be achieved by making good use of local and cluster cache memories. These results demonstrate the effectiveness of the proposed locality-aware scheduling strategy.
A Memory Access Buffering Mechanism for a Processor Cluster Peer-reviewed

Masafumi Takahashi, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

The Transactions of the Institute of Electronics, Information and Communication Engineers　J78-D-I　(10)　861-864　1995/10
Facial image recognition using neural networks

H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers　1995
Task Scheduling with Locality Consideration for a Clustered Parallel FL Reduction System Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the Aizu International Symposium on Parallel Algorithm/Architecture Synthesis　234-240　1995
Design and performance measurements of an execution model for the parallel processing of Prolog programs Peer-reviewed

D Wang, H Kobayashi, T Nakamura

IEEE FIRST ICA3PP - IEEE FIRST INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, VOLS 1 AND 2　650-658　1995
Mechanical-Design-Oriented Description Language : MODEL Peer-reviewed

KOBAYASHI Hiroaki

Japanese Journal of Advanced Automation Technology　7　(1)　29-34　1995
Adaptive Subdivision for the Point-Matching Method Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Japan Society of Mechanical Engineers　60　(570)　543-548　1994
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/kikaia.60.543 　

ISSN： 0387-5008

The contact stress analysis of elastic bodies is important for mechanical engineering in areas such as friction, wear and fatigue. The point-matching method is a well-known analytical model that satisfies Hertzian contact theory. However, the point-matching method has critical problems: large amounts of computation time and memory are required as the number of cells increases. Although there have been many studies on its accuracy to date, there are few studies on efficient processing of the point-matching method. This paper proposes an efficient discretization method for the contact region to reduce processing time and save memory space in the point-matching method.
Mechanical-Design-Oriented Description Language : MODEL Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Japan Society of Mechanical Engineers　60　(570)　715-720　1994
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/kikaic.60.715 　

ISSN： 0387-5024

Designing mechanical systems by means of special-purpose languages is very effective because they can define objects precisely. However, this causes serious problems. First, the amount of description is very large when designing complex systems. Second, those languages are not suited for modeling objects at higher abstraction levels. To solve these problems, this paper presents a novel description language for mechanical design called MODEL (Mechanical-design-Oriented DEscription Language). MODEL is designed so that the designer's intentions can be efficiently reflected in the specifications of mechanical systems. We introduce a new concept, design granularity, so that designers can model objects of a mechanical system at different abstraction levels. Moreover, to reduce the amount of description, we use knowledge bases for mechanical design as a library for MODEL. The design process with MODEL is discussed in detail to clarify the capabilities of the language.
A TLB-Unified Cache Management Scheme Peer-reviewed

KOBAYASHI Hiroaki

Transactions of Information Processing Society of Japan　35　(6)　1149-1152　1994
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 1882-7764

This paper proposes the TLB-Unified Cache (TUC), which integrates the management of a cache and a translation-lookaside buffer (TLB). In the TUC, a pointer to a TLB entry is stored as a cache tag instead of an address. Therefore, cached data and its address are indirectly related, and the space for cache tags is drastically reduced. This paper also proposes Black and White Invalidation for the fast invalidation of the cache entries pointing to a missed TLB entry. Simulation results show that, in spite of the space saving, the TUC has the same performance in terms of cache miss ratio as conventional caches.
STARCORE - A HIGH-SPEED ATM SWITCHING SYSTEM Peer-reviewed

N OBA, K SUZUKI, H KOBAYASHI, T NAKAMURA

1994 IEEE GLOBECOM - CONFERENCE RECORD, VOLS 1-3, AND COMMUNICATIONS THEORY MINI-CONFERENCE RECORD　139-143　1994
Breadth-first Parallel Processing of Sequential Prolog Programs Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems　86-89　1994
A Hierarchical System for Parallel Processing of Prolog Programs Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems　90-93　1994
Jetpipeline : A Hybrid Pipeline Architecture for Instruction-Level Parallelism Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of High Performance Computing Conference'94　317-323　1994
A Hierarchical Parallel Reduction System for the Functional Language FL Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of High Performance Computing Conference'94　270-278　1994
Software Pipelining for JetPipeline Architecture Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks　127-134　1994
(Mp)^2 : A Hierarchical Parallel Processing System for a Global Illumination Model Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks　157-164　1994
LOAD BALANCING BASED ON LOAD COHERENCE BETWEEN CONTINUOUS IMAGES FOR AN OBJECT-SPACE PARALLEL RAY-TRACING SYSTEM Peer-reviewed

H KOBAYASHI, H KUBOTA, S HORIGUCHI, T NAKAMURA

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E76D　(12)　1490-1499　1993/12

ISSN： 0916-8532
Ants routing: An adaptive packet flow control scheme in multimedia communication

Emad Rashid, Hiroaki Kobayashi, Tadao Nakamura

Proceedings of 2nd IEEE International Conference on Universal Personal Communications: Gateway to the 21st Century, ICUPC 1993　1　228-234　1993
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/ICUPC.1993.528382 　
Integrated Computer-Aided Mechanical Design System using MODEL Peer-reviewed

KOBAYASHI Hiroaki

Transactions of the Japan Society of Mechanical Engineers(section C)　59　(567)　3597-3602　1993
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/kikaic.59.3597 　

ISSN： 0387-5024

Recently, for the purpose of production rationalization, demand for CAD (computer-aided design) systems has been rapidly increasing. However, most CAD systems for mechanical design have mainly performed graphical processing, such as drawing. In this paper, we propose an integrated computer-aided mechanical design system to support the design process as well as the drawing process. The system employs a mechanical-design-oriented description language called MODEL to design mechanical systems. To reduce the amount of description in MODEL, we introduce knowledge bases for mechanical design. With these knowledge bases, the system can infer final designs from incomplete descriptions of objects at higher abstraction levels and complete them. Inference and knowledge representation schemes are discussed in detail. We also construct a prototype system and examine the effectiveness of our system.
AN ADAPTIVE NETWORK ROUTING METHOD BY ELECTRICAL-CIRCUIT MODELING Peer-reviewed

N OBA, H KOBAYASHI, T NAKAMURA

IEEE INFOCOM 93 : THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3　586-592　1993
INCORPORATING THE PARALLEL-PROCESSING TECHNIQUES WITH THE DEMAND-DRIVEN MODEL OF FUNCTIONAL PROGRAMMING-LANGUAGES Peer-reviewed

H SHEN, H KOBAYASHI, T NAKAMURA

TENCON '93: 1993 IEEE REGION 10 CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND POWER ENGINEERING, VOL 1　146-149　1993
Developing the Lambda Calculus for FL-oriented Parallel Reductions Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS　6.49-6.50　1993
Expression Recognition Using the Reformed Back-propagation Network Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS　3.27-3.30　1993
A Massively Parallel Processing Approach to Fast Photo-Realistic Image Synthesis Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International'93　497-507　1993
EXPRESSION RECOGNITION USING NEURAL NETWORKS Peer-reviewed

J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

WCNN'93 - PORTLAND, WORLD CONGRESS ON NEURAL NETWORKS, VOL IV　IV-231-IV-234　231-234　1993
Ants Routing : An Adaptive Packets Flow Control Scheme in Multimedia Networks Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of IEEE 2nd International Conference on Universal Personal Communications　228-234　1993
KNOWLEDGE REPRESENTATION FOR ADAPTIVE OVERLOAD PACKETS CONTROL IN MULTIMEDIA NETWORKS Peer-reviewed

E RASHID, H KOBAYASHI, T NAKAMURA

GLOBECOM '93 COMMUNICATIONS FOR A CHANGING WORLD, CONFERENCE RECORD　1516-1520　1993
NEURAL-NETWORK STRUCTURES FOR EXPRESSION RECOGNITION Peer-reviewed

J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

IJCNN '93-NAGOYA : PROCEEDINGS OF 1993 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3　1430-1433　1993
An Architecture of a Knowledge-base System to Support Mechanical Design Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of IPSJ Graphics and CAD Symposium　1991
A Study on a Mechanical-Design-oriented Description Language Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of IPSJ Graphics and CAD Symposium　1991
A Proposal on Integrated Computer-Aided Mechanical Design Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of IPSJ Graphics and CAD Symposium　1990
Effective Parallel Processing for synthesizing Continuous Images Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International 89　343-352　1989
Load balancing strategies for a parallel ray-tracing system based on constant subdivision Peer-reviewed

Hiroaki Kobayashi, Satoshi Nishimura, Hideyuki Kubota, Tadao Nakamura, Yoshiharu Shigei

The Visual Computer　4　(4)　197-209　1988/07
Publisher: Springer-Verlag
DOI： 10.1007/BF01887592 　

ISSN： 0178-2789
A Strategy for Mapping Parallel Ray-Tracing into a Hypercube Multiprocessor System Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of Computer Graphics International 88　1988
Parallel processing of an object space for image synthesis using ray tracing Peer-reviewed

Hiroaki Kobayashi, Tadao Nakamura, Yoshiharu Shigei

The Visual Computer　3　(1)　13-22　1987/02
Publisher: Springer-Verlag
DOI： 10.1007/BF02153647 　

ISSN： 0178-2789
Performance Evaluation of a General Purpose Pipeline System Peer-reviewed

KOBAYASHI Hiroaki

The Transactions of the Institute of Electronics and Communication Eng　J68-D　(10)　1985
A Language Processor of an Intelligent Link System Peer-reviewed

KOBAYASHI Hiroaki

Proceedings of the IEEE International Conference on Communications　1984
Organization and Evaluation of a General Purpose Pipeline System Peer-reviewed

KOBAYASHI Hiroaki

The Transactions of the Institute of Electronics and Communication Eng　J67-D　(12)　1984

Misc. 118

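Performance Evaluation of a Real-Time Tsunami Inundation Damage Estimation Simulation

撫佐昭裕, 岸谷拓海, 阿部孝志, 佐藤佳彦, 田野邊睦, 鈴木崇之, 村嶋陽一, 佐藤雅之, 小松一彦, 伊達進, 越村俊一, 小林広明

SENAC: Tohoku University Computer Center Bulletin　53　(2)　10-18　2020/04
Publisher: Cyberscience Center, Tohoku University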
ISSN： 0286-7419
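A Study Toward Nationwide Deployment of Real-Time Tsunami Inundation Damage Forecasting

越村俊一, 阿部孝志, 井上拓也, 撫佐昭裕, 村嶋陽一, 鈴木崇之, 太田雄策, 日野亮太, 佐藤佳彦, 加地正明, 小林広明

SENAC: Tohoku University Computer Center Bulletin　52　(2)　2-8　2019/04
Publisher: Cyberscience Center, Tohoku University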
ISSN： 0286-7419
Real-Time Tsunami Inundation Damage Forecasting Using a Supercomputer

越村俊一, 阿部孝志, 撫佐昭裕, 村嶋陽一, 鈴木崇之, 井上拓也, 太田雄策, 日野亮太, 佐藤佳彦, 加地正明, 小林広明

SENAC: Tohoku University Computer Center Bulletin　51　(1)　30-34　2018/01
Publisher: Cyberscience Center, Tohoku University
ISSN： 0286-7419
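Performance Evaluation of SX-ACE Using HPCMG-FV

江川隆輔, 磯部洋子, 加藤季広, 小松一彦, 滝沢寛之, 小林広明, 撫佐昭裕

SENAC: Bulletin of the Large-Scale Scientific Computing System, Information Synergy Center, Tohoku University　50　(3)　15-18　2017/07
Publisher: Cyberscience Center, Tohoku University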
ISSN： 0286-7419
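Development of a Heatstroke Risk Assessment Simulator for Simultaneous Exposure to Sunlight and Heat

西尾渉, 小寺紗千子, 平田晃正, 佐々木大輔, 山下毅, 江川隆輔, 小林広明, 曽根秀昭

The Transactions of the Institute of Electronics, Information and Communication Engineers C (Web)　J100-C　(5)　2017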

ISSN： 1881-0217
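Optimization of the "Fluctuation Exchange Approximation for an Effective Model of Cuprates" Code for SX-ACE

山下毅, 山崎国人, 江川隆輔, 吉岡匠哉, 土浦宏紀, 小林広明, 曽根秀昭

SENAC: Tohoku University Computer Center Bulletin　50　(1)　25-30　2017/01
Publisher: Cyberscience Center, Tohoku University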
ISSN： 0286-7419
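The Challenge of Urgent Computing for Disaster Prevention and Mitigation (Toward the Development of Supercomputers that Contribute to Disaster Prevention and Mitigation / Lessons from the Great East Japan Earthquake and the Challenges and Prospects of Simulation for Tsunami Disaster Mitigation / Visualization and Information and Communication Systems for Disaster Prevention and Mitigation / Efforts Toward Disaster Prevention and Mitigation for Megathrust Earthquakes Using JAMSTEC's HPC Systems)

小林広明, 越村俊一, 下條真司, 有吉慶介

Proceedings of the High Performance Computing and Computational Science Symposium　(2016)　128-129　2016/05/30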
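Demonstration of Real-Time Tsunami Inundation Damage Forecasting Technology

越村俊一, 井上拓也, 日野亮太, 太田雄策, 小林広明, 撫佐昭裕, 村嶋陽一, 目黒公郎

Institute of Social Safety Science Conference Abstracts (CD-ROM)　(38)　Paper No. C-15　2016/05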
Performance Evaluation of the HPCG Benchmark on SX-ACE

小松一彦, 江川隆輔, 磯部洋子, 緒方隆盛, 滝沢寛之, 小林広明

SENAC: Tohoku University Computer Center Bulletin　48　(3)　14-19　2015/07
Publisher: Cyberscience Center, Tohoku University
ISSN： 0286-7419
Performance Optimization on Vector Computers

小林広明, 江川隆輔, 小松一彦, 岡部公起, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修, 曽我隆, 山口健太

SENAC: Tohoku University Computer Center Bulletin　48　(3)　20-51　2015/07
Publisher: Cyberscience Center, Tohoku University
ISSN： 0286-7419
Tohoku University Cyberscience Center Report on Performance Optimization Research Activities (No. 6)

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 小松一彦, 大泉健治, 小野敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修他

2015/04
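Efforts to Develop a Real-Time Tsunami Inundation and Damage Forecasting Simulation System

大泉健治, 阿部孝志, 佐藤佳彦, 松岡浩司, 撫佐昭裕, 小林広明

SENAC: Tohoku University Computer Center Bulletin　48　(1)　54-57　2015/01
Publisher: Cyberscience Center, Tohoku University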
ISSN： 0286-7419
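Performance Optimization Support for Molecular Dynamics Simulation Codes at the Tohoku University Cyberscience Center

森谷友映, 佐々木大輔, 山下毅, 小野敏, 大泉健治, 小松一彦, 江川隆輔, 小林広明

SENAC: Tohoku University Computer Center Bulletin　47　(1)　51-56　2014/01
Publisher: Cyberscience Center, Tohoku University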
ISSN： 0286-7419
Heuristic Data Partitioning for Social Networking Service

2013　(34)　1-8　2013/12/09
Checkpoint/Restart in Hybrid Systems

滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

The Journal of Reliability Engineering Association of Japan　35　(12)　2013/12

DOI： 10.11348/reajshinrai.35.8_515 　
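Challenges of Three-Dimensional LSIs and Improving Their Reliability

小柳光正, 小林広明, 末吉敏則, 鎌田忠

The Journal of Reliability Engineering Association of Japan　35　(12)　2013/12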

DOI： 10.11348/reajshinrai.35.8_471 　
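A Study on the Effects of Optimization Techniques across Multiple Platforms

小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

IPSJ SIG Technical Reports: High Performance Computing (HPC)　2013　(24)　1-7　2013/07/24
Publisher: Information Processing Society of Japan (IPSJ)

In recent years, HPC systems have become increasingly diverse, and there is strong demand for performance-portable HPC codes that can achieve high performance on multiple kinds of HPC systems with different characteristics. The goal of this research is to analyze in detail the effects that optimization techniques for various HPC systems have on the performance of HPC codes, and to develop highly performance-portable HPC codes based on those findings. This report analyzes the performance portability of HPC codes when different manual optimizations are combined with one another or with automatic optimizations. For each HPC system, we evaluate the synergistic effects of these combinations and discuss optimizations that may degrade performance portability.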
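An Efficient Method for Improving Performance Portability by Limiting Tuning Targets

平澤将一, 秋葉諒, 滝沢寛之, 小林広明

IPSJ SIG Technical Reports: High Performance Computing (HPC)　2013　(19)　1-8　2013/05/22
Publisher: Information Processing Society of Japan (IPSJ)

As computing systems diversify, existing scientific computing programs often need to be ported to new computing systems and have their performance optimized. However, porting and optimizing large-scale scientific computing programs requires enormous effort, which has become a problem. This research proposes a method that efficiently improves the performance portability of an entire application by limiting the parts of the source code that should be optimized first when the goal is to improve performance portability. Evaluation with benchmark programs and a real application shows that the proposed method can limit the parts of the source code to be optimized so as to efficiently improve the performance portability of the entire application.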
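A Study on a Performance Model Considering Inter-Node Communication in Large-Scale Parallel Systems

安田一平, 小松一彦, 江川隆輔, 小林広明

IPSJ SIG Technical Reports: Computer Architecture (ARC)　2012　(7)　1-6　2012/12/06

In recent years, as the number of nodes in large-scale parallel systems has grown, exploiting their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. Consequently, there is demand for a method that can easily analyze application performance on such large-scale systems. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, neither performance models that take inter-node communication into account nor analysis and optimization methods based on such models have been established. This report proposes a system performance model that takes inter-node communication into account, and examines the validity of the proposed model using five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.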
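A Study on a Performance Model Considering Inter-Node Communication in Large-Scale Parallel Systems

安田一平, 小松一彦, 江川隆輔, 小林広明

IPSJ SIG Technical Reports: High Performance Computing (HPC)　2012　(7)　1-6　2012/12/06

In recent years, as the number of nodes in large-scale parallel systems has grown, exploiting their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. Consequently, there is demand for a method that can easily analyze application performance on such large-scale systems. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, neither performance models that take inter-node communication into account nor analysis and optimization methods based on such models have been established. This report proposes a system performance model that takes inter-node communication into account, and examines the validity of the proposed model using five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.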
A History-Based Job Scheduling Mechanism for the Vector Meta Computing

MURATA YOSHITOMO, EGAWA Ryusuke, KOBAYASHI Hiroaki

IEICE technical report. Internet Architecture　112　(236)　15-19　2012/10/05
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

A wide-area vector meta computing infrastructure named vector computing cloud has been proposed as a next generation high-performance computing infrastructure. However, in the vector computing cloud, the difference in site policies between organizations causes inefficient usage of vector computing resources. To execute a parallel job such as an MPI application on the vector computing cloud, this paper presents a history-based job scheduling mechanism. Firstly, the proposed job scheduler estimates the time to start the job execution from the history of job-execution on vector supercomputers. Next, based on the estimation, the job scheduling mechanism automatically allocates the parallel job to appropriate sites. The simulation results show that the proposed job scheduling mechanism improves the utilization efficiency of vector computing resources, compared to the conventional round-robin scheduling mechanism.
統合開発環境と連携するポータブルなビルドシステム

平澤将一, 滝沢寛之, 小林広明

研究報告ハイパフォーマンスコンピューティング（HPC）　2012　(28)　1-8　2012/09/26

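In this study, we develop a portable build system as a step toward a framework for developing applications while maintaining performance portability. The configurations of current high-performance computing (HPC) systems have become complex, making it difficult to predict an application's effective performance without running it. We therefore envision supporting performance portability by periodically running an application under development, transparently collecting its performance profile, identifying the parts with low performance portability, and presenting them interactively to the programmer. Realizing such a development-support tool requires the ability to transparently build and run the application under development on a variety of systems. In this study, we develop a build system with this kind of portability and discuss the functions required of an application development support environment.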
Implementation and Evaluation of the Nanopowder Growth Simulation with OpenACC

2012　(10)　1-7　2012/09/26
大規模計算システムにおけるBCMの性能評価

小松一彦, 曽我隆, 江川隆輔, 滝沢寛之, 小林広明

SENAC　45　(3)　17-25　2012/07
Publisher: 東北大学サイバーサイエンスセンター
ISSN： 0286-7419
ベクトル型スーパーコンピュータ広域連携基盤の性能評価

山下毅, 村田善智, 江川隆輔, 小野敏, 大泉健治, 小林広明

SENAC　45　(1)　42-45　2012/01
A Circuit Partitioning Strategy for 3-D Integrated Floating-point Multipliers

Kawai Kazushige, Tada Jubee, Egawa Ryusuke, Kobayashi Hiroaki, Goto Gensuke

IEICE technical report. Component parts and materials　111　(326)　67-72　2011/11/28
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Three-dimensional (3-D) integration technologies are attractive for enhancing the speed of arithmetic circuits. To implement 3-D stacked arithmetic units, effective circuit-partitioning strategies should be applied to exploit the potential of 3-D integration technologies. In this paper, we target single-precision and double-precision floating-point multipliers and speed up their circuits by using 3-D integration. In our partitioning strategy, the critical-path portions of the multiplication, normalizer, and rounder circuits are implemented on the same layer, avoiding the use of TSVs. The simulation analysis shows that the delay is reduced to 92% for the single-precision multiplier and 83% for the double-precision multiplier, compared with conventional 2-D floating-point multipliers.
Evaluation of GPU Computing Based on An Automatic Program Generation Technology

2011　(18)　1-7　2011/07/20
A Client-Level Deadline Scheduling Strategy for Volunteer Computing Systems

2011　45-54　2011/05/18
A Performance Tuning Strategy Based on the Roofline Model for Vector Processors

4　(3)　77-87　2011/05/12

ISSN： 1882-7829
チップマルチベクトルプロセッサのためのプログラム最適化技術

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　44　(2)　29-36　2011/04
東北大学サイバーサイエンスセンター高速化推進研究活動報告書（第5号）

小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 伊藤英一, 大泉健治, 小野敏, 小久保達信, 橋本ユキ子, 磯部洋子, 撫佐昭裕, 神山典, 金野浩伸

2011/04
A Circuit Partitioning Strategy for 3-D Integrated Multipliers

SAKAI Kazuhito, TADA Jubee, EGAWA Ryusuke, KOBAYASHI Hiroaki, GOTO Gensuke

IEICE technical report　110　(344)　153-158　2010/12/09
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Three-dimensional (3-D) integration technologies are attracting a lot of attention as a way to further enhance the performance of LSIs. To implement 3-D stacked arithmetic units, appropriate circuit partitioning strategies should be applied to exploit the potential of 3-D integration technologies. In this paper, we propose a circuit partitioning strategy that can improve the performance of arithmetic units with small overheads from vertical interconnects. To clarify the effectiveness of the proposed partitioning strategy, 3-D stacked parallel multipliers are designed and evaluated. The multipliers designed with the proposed circuit partitioning strategy achieve a 20% delay reduction compared to multipliers based on conventional 2-D implementations.
Energy Consumption of a Chip Multi-Vector Processor Using Real Applications

2010　(3)　1-8　2010/12/09
Publisher: 情報処理学会
ISSN： 1884-0930
An Out-of-order Vector Processing Mechanism for Multimedia Applications

GAO YE, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

2010　(24)　1-10　2010/07/27
Publisher: 情報処理学会
ISSN： 0919-6072
広域ベクトルコンピュータ連携による次世代HPC基盤の構築(3.2 第8回情報シナジー研究会, 3. 研究活動報告)

村田善智, 江川隆輔, 東田学, 小林広明

年報　9　94-98　2010/07
Publisher: 東北大学サイバーサイエンスセンター
Performance Evaluation of GPU Computing with OpenCL

ARAI YUSUKE, SATO KATSUTO, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

2010　(11)　1-7　2010/02/15
Publisher: 情報処理学会
ISSN： 0919-6072
High performance computing on vector systems 2009

Michael Resch, Sabine Roller, Katharina Benkert, Martin Galle, Wolfgang Bez, Hiroaki Kobayashi

High Performance Computing on Vector Systems 2009　1-250　2010
Publisher: Springer Berlin Heidelberg
DOI： 10.1007/978-3-642-03913-3 　
Implementation and Evaluation of a Checkpoint/Restart Tool for CUDA Applications

TAKIZAWA HIROYUKI, SATO KATSUTO, KOMATSU KAZUHIKO, KOBAYASHI HIROAKI

122　(7)　G1-G7　2009/10/09
Publisher: 情報処理学会
ISSN： 0919-6072
RC-008 Client-Level Task Scheduling for Effective Volunteer Computing

Murata Yoshitomo, Endo Toshiaki, Takizawa Hiroyuki, Kobayashi Hiroaki

8　(1)　165-172　2009/08/20
Publisher: Forum on Information Technology
C-024 An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer to Peer Environment

Satayapiwat Chainan, Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

8　(1)　491-494　2009/08/20
Publisher: Forum on Information Technology

Recently, many market-based approaches have been studied as promising alternatives for the resource allocation problem. In particular, auction-based approaches are widely chosen because of their distributed nature and relatively low complexity. However, employing an auction to allocate jobs is only suitable for homogeneous resource environments. This paper proposes an auction-based resource allocation mechanism that enables resource allocation in a heterogeneous environment while minimizing user input. Our preliminary results show that the mechanism improves the performance of important jobs under high load.
C-023 Performance Evaluation towards BLAS with Automatic Processor Selection

Komatsu Kazuhiko, Koyama Kentaro, Sato Katsuto, Takizawa Hiroyuki, Kobayashi Hiroaki

8　(1)　485-490　2009/08/20
Publisher: Forum on Information Technology
Performance Optimization Techniques for Vector Processors with Cache Memory

SATO YOSHIEI, NAGAOKA RYUICHI, MUSA AKIHIRO, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, OKABE KOKI, KOBAYASHI HIROAKI

2009　(6)　1-10　2009/07/28
Publisher: 情報処理学会
ISSN： 0919-6072
SX-9による大規模並列シミュレーション(3.2 第7回情報シナジー研究会, 3. 研究活動報告)

曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明, 高橋俊, 中橋和博

年報　8　88-93　2009/07
Publisher: 東北大学サイバーサイエンスセンター
創造工学研修の実施報告 ― スパコンを使って計算科学・計算機科学のおもしろさを体験 ―

滝沢寛之, 江川隆輔, 笹尾泰洋, 佐野健太郎, 山本悟, 小林広明

東北大学サイバーサイエンスセンター大規模科学計算システム広報SENAC　42　(2)　87-90　2009/02
大規模非圧縮性流体シミュレーションの工学問題への応用

高橋俊, 石田崇, 中橋和博, 小林広明, 岡部公起, 下村陽一, 曽我隆, 撫佐昭裕

SENAC : 東北大学大型計算機センター広報　42　(1)　107-114　2009/01
Publisher: 東北大学サイバーサイエンスセンター
ISSN： 0286-7419
624 A study of energy-aware GPU computing

Takizawa Hiroyuki, Sato Katuto, Kobayashi Hiroaki

The Computational Mechanics Conference　2008　(21)　558-559　2008/11/01
Publisher: The Japan Society of Mechanical Engineers
ISSN： 1348-026X
東北大学サイバーサイエンスセンターの取り組みとSX-9の性能評価 (スーパーコンピュータSX-9特集)

小林広明, 江川隆輔, 岡部公起

NEC技報　61　(4)　58-65　2008/10
Publisher: 日本電気
ISSN： 0285-4139
RC-006 Hardware Design of A Way-Allocatable Shared Cache Mechanism

Abe Kenta, Kotera Isao, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

7　(1)　35-38　2008/08/20
Publisher: Forum on Information Technology
A programming language extension and its automatic optimization techniques for exploiting the potential of GPUs

SATO KATUTO, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

IPSJ SIG Notes　2008　(74)　199-204　2008/07/29
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

GPUs have great potential for high-performance computing and have been used in various applications in addition to graphics processing. To achieve high performance with GPUs, we have to carry out architecture-aware optimizations because of their unique architecture. We have proposed SPRAT, a programming language for hybrid systems of CPUs and GPUs, to realize both program portability and high computational efficiency. This paper proposes some automatic optimization techniques based on memory access adjustments. The results show significant performance improvements in the execution of edge detection and LU decomposition.
On-Chip Cache Memory Systems for Next Vector Architectures

7　89-93　2008/07
Publisher: 東北大学サイバーサイエンスセンター
Early Performance Evaluations of SX-9 Supercomputer Systems

7　85-88　2008/07
Publisher: 東北大学サイバーサイエンスセンター
A Stream Programming Language for GPU Computing

TAKIZAWA Hiroyuki, SATO Katuto, KOBAYASHI Hiroaki

Journal of the Visualization Society of Japan　28　(1)　271-274　2008/07/01
Publisher: 可視化情報学会
ISSN： 0916-4731
A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination

Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

1　(1)　85-95　2008/06/26
Publisher: 情報処理学会
ISSN： 1882-7829

Although ray tracing is the best approach to high-quality image synthesis, much time is required to generate images due to its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time required for ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm utilizes the features whereby the rays in a bundle share the same origin and have massive coherence. By reducing the redundant calculations in the innermost intersection tests for the bundles by precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that the proposed algorithm achieves 1.43 times faster intersection tests compared with Möller's algorithm by exploiting the features of the bundles of rays.
ベクトルプロセッサ用キャッシュメモリの性能評価

佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

情報処理学会シンポジウム論文集　2008　(2)　55　2008/01/17

ISSN： 1344-0640
High-Density Computation in Building-Cube Method by Vector Super-computer

高橋俊, 石田崇, 中橋和博, 小林広明, 岡部公起, 下村陽一, 曽我隆, 撫佐昭裕

流体力学講演会/航空宇宙数値シミュレーション技術シンポジウム講演集　40th-2008　433-434　2008
MPIプログラミング入門

野口孝明, 曽我隆, 金野浩伸, 撫佐昭裕, 大泉健治, 小野敏, 伊藤英一, 岡部公起, 江川隆輔, 小林広明

SENAC : 東北大学大型計算機センター広報　40　(4)　69-94　2007/10
Publisher: Super-Computing System Information Synergy Center, Tohoku University
ISSN： 0286-7419
I-004 A Parallel Image Generation Algorithm based on Partitioning of Photon Maps

Tamura Masahide, Takizawa Hiroyuki, Kobayashi Hiroaki

6　(3)　203-206　2007/08/22
Publisher: Forum on Information Technology
A Study on Dynamic Task Assignment to CPU and GPU Based on Runtime Performance Prediction

SHIRATORI Hiroki, TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

IEICE technical report　107　(175)　37-42　2007/08/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Recent studies of general-purpose computation on graphics processing units (GPUs) have shown that a PC equipped with a high-performance CPU and GPU can be regarded as a heterogeneous parallel processing system. On the other hand, programming for such a system has become complicated. In order to exploit the potential of the system, unified programming models for the CPU and GPU have been studied. However, in most of the existing development tools for GPGPU applications, the choice of whether the CPU or the GPU executes a program must be made manually and statically. Because the appropriate choice depends on information determined at runtime, processing efficiency improves if the appropriate processor can be selected dynamically based on performance prediction at runtime. This paper examines the effectiveness of dynamically selecting the appropriate processor based on the execution time estimation and the processor switching cost. The experimental results show that the cost of processor switching, excluding data transfer, is negligible, and hence processor switching can improve performance if the execution time is long compared to the prediction error.
The Evaluation of A Way-Allocatable Shared Cache Mechanism

KOTERA ISAO, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

IPSJ SIG Notes　2007　(79)　31-36　2007/08/01
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

We have proposed a way-allocatable shared cache mechanism for chip multiprocessors, which can reduce power consumption while maintaining performance by employing cache partitioning and power gating. In the proposed mechanism, a metric of cache access locality is defined and used for the cache partitioning and the power gating. Based on the metric, the proposed mechanism can flexibly change the configuration to be either performance-oriented or power-oriented. This paper evaluates the validity of the proposed mechanism, using some benchmarks with different cache access behaviors. The evaluation results show that the proposed mechanism can appropriately partition the shared cache for applications with high localities. In addition, our proposal in the performance-oriented mode can reduce energy consumption by 28% while improving the performance by 0.3%.
SC|06調査報告(3.2 第5回情報シナジー研究会, 3. 研究活動報告)

小野敏, 滝沢寛之, 小林広明

年報　6　83-87　2007/07
Publisher: 東北大学情報シナジーセンター
SC|05調査報告(3.2 第4回情報シナジー研究会, 3. 研究活動)

大泉健治, 伊藤英一, 滝沢寛之, 小林広明

年報　5　71-74　2006/06
Publisher: 東北大学情報シナジーセンター
A Runtime Optimization Method for Redundant Task Dispatch on P2P Computing Platforms.(3.2 第4回情報シナジー研究会, 3. 研究活動)

Wang Hong, Takizawa Hiroyuki, Kobayashi Hiroaki

年報　5　100-105　2006/06
Publisher: 東北大学情報シナジーセンター
実シミュレーションコードによる大規模科学計算システムの性能評価(3.2 第4回情報シナジー研究会, 3. 研究活動)

滝沢寛之, 岡部公起, 伊藤英一, 撫佐昭裕, 曽我隆, 伊藤学, 小林広明

年報　5　78-83　2006/06
Publisher: 東北大学情報シナジーセンター
世界一の評価を受けた東北大学のスーパーコンピュータSX-7

小林広明

仙台市医師会報　(504)　8-10　2006
安全・安心な社会の構築に貢献する世界一のスーパーコンピュータSX-7

小林広明

まなびの杜＜東北大学＞知的探検のすすめ　3　32-33　2006
A Weighted Voting Method for Combining Multiple Character Recognition Engines

KANEKO Shoichiro, GOTO Hideaki, KOBAYASHI Hiroaki

IEICE technical report　105　(477)　13-18　2005/12/15
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

It is known that combining multiple character recognition engines by majority logic is useful for improving the accuracy of character recognition. In the previous work, however, the suitability of each recognition engine for the input characters was not taken into account. On the other hand, several methods for automatic selection of recognition engines using the suitability of the engines have been proposed. We combine the above two approaches and propose a new method for combining multiple recognition engines. Experimental results show that the recognition accuracy improves from 97.8% to 98.2% using 14 Japanese character sets with 3195 characters each.
実シミュレーションコードによる大規模科学計算システムの性能評価

小林広明, 岡部公起, 撫佐昭裕, 曽我隆, 松村佳昭, 伊藤学

SENAC : 東北大学大型計算機センター広報　38　(4)　39-59　2005/10
Publisher: Super-Computing System Information Synergy Center, Tohoku University
ISSN： 0286-7419
HPCチャレンジでのSXシステムの性能評価(3.2 第3回情報シナジー研究会, 3. 研究活動)

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

年報　4　98-116　2005/05
Publisher: 東北大学情報シナジーセンター
HPC チャレンジでのSX システムの性能評価

小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　38　(1)　5-28　2005/01
A new dynamical domain decomposition method for parallel molecular dynamics simulation

V. Zhakhovskii, K. Nishihara, Y. Fukuda, S. Shimojo, T. Akiyama, S. Miyanaga, H. Sone, H. Kobayashi, E. Ito, Y. Seo, M. Tamura, Y. Ueshima

2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005　2　848-854　2005

DOI： 10.1109/CCGRID.2005.1558650 　
Analysis and comparison of frequency features for scene text detection

SAITOH Seiji, GOTO Hideaki, KOBAYASHI Hiroaki

Technical report of IEICE. PRMU　104　(523)　31-36　2004/12/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Several methods using frequency-domain features obtained by the Discrete Cosine Transformation (DCT) and the Wavelet Transformation have been proposed so far for text region detection in images. The performance of the methods in previous work was evaluated mainly by the final precision of text region extraction. However, analyses and comparisons of the quality of the features themselves have been insufficient. This report proposes an analysis and comparison method using Fisher's discriminant analysis to obtain better features, and an unsupervised thresholding method to segment text and non-text. Better features can be obtained by choosing DCT coefficients in an appropriate frequency range. Experimental results indicate that the final precision of text region extraction is improved by using the optimized features.
スーパーSINET を利用した大規模遠隔可視化処理の評価

滝沢寛之, 小林広明

東北大学情報シナジーセンター大規模科学計算機システム広報SENAC　37　(2)　5-10　2004/04
Performance Analysis of a Parallel Law-of-the-Jungle Algorithm for Generating Codebooks of Vector Quantization

MOMOSE Shintaro, SANO Kentaro, TAKIZAWA Hiroyuki, NAKAJIMA Taira, KOBAYASHI Hiroaki, NAKAMURA Tadao

IEICE technical report. Neurocomputing　103　(92)　25-30　2003/05/22
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Vector quantization is an attractive technique for lossy data compression, which has been a key technology for efficient data storage and/or transfer. So far, various algorithms have been proposed to design optimal codebooks presenting quantization with minimized errors. In particular, the Law-of-the-Jungle (LOJ) learning algorithm has been proposed to achieve rapid codebook design by algorithmic improvements. However, its acceleration is still required when large data sets are processed on a single computer. In order to achieve faster codebook design, we have proposed a scalable parallel codebook design algorithm for parallel computers. This paper analyzes and evaluates the performance of the parallel LOJ learning algorithm on three types of parallel computers: an IBM SP2, an NEC AzusA and a PC cluster.
A Study of Characters Recognition for Auditory Computer-Utilization Support Systems

Kikuchi Hiroto, Shen Hong, Kawashima Tomoyoshi, Kobayashi Hiroaki, Nakamura Tadao

Proceedings of the IEICE General Conference　2003　369-369　2003/03/03
Publisher: The Institute of Electronics, Information and Communication Engineers
Parallel Codebook Generation for Optimal Vector Quantizer

MOMOSE Shintaro, SANO Kentaro, TAKIZAWA Hiroyuki, NAKAJIMA Taira, LIMA Clecio Donizete, KOBAYASHI Hiroaki, NAKAMURA Tadao

IPSJ SIG Notes　2002　(80)　67-72　2002/08/21
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Vector quantization is an attractive technique for lossy data compression, which has been a key technology for data storage and/or transfer. So far, various algorithms have been proposed to design optimal codebooks presenting quantization with minimized errors. In particular, the Law-of-the-Jungle (LOJ) learning algorithm has been proposed to achieve rapid codebook design by algorithmic improvements. However, its acceleration is still required when large data sets are processed on a single computer. Therefore, a scalable parallel codebook design algorithm for parallel computers is required. This paper presents a parallel algorithm for the LOJ learning, suitable for distributed-memory parallel computers with a message-passing mechanism. Experimental results indicate a high scalability of the proposed parallel algorithm on the IBM SP2 parallel computer with 32 processing elements.
ベクトル量子化のための並列コードブック生成アルゴリズムの性能評価(2.<特集>第1回情報シナジー研究会)

百瀬真太郎, 佐野健太郎, 滝沢寛之, 中島平, 小林広明, 中村維男, Clecio Donizete Lima, 東北大学大学院情報科学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院工学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院情報科学研究科

年報　2　33-42　2002/07/01

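Vector quantization is a highly efficient data compression technique and a core technology for data storage and transfer. Various methods have been proposed for generating optimal codebooks that yield low-error quantization; among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single CPU, there is a limit to how much algorithmic improvements can shorten the processing time, and further speedup through parallel processing is needed. This paper proposes a parallel LOJ algorithm suited to distributed-memory parallel computers. A performance evaluation of the parallel LOJ algorithm on an IBM SP2, an NEC AzusA, and a PC cluster showed high speedup relative to the number of processors in every case.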
A STUDY ON PRECISION OF INTERSECTION CALCULATION FOR RAY-TRACING HARDWARE

Shimakura Takamitsu, Saida Yasumasa, Sano Kentaro, Suzuki Ken-ichi, Nakada Takeo, Oba Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

Proceedings of the Society Conference of IEICE　2001　158-158　2001/08/29
Publisher: The Institute of Electronics, Information and Communication Engineers
The Design of an Instruction Fetch Unit for VLIW Processors Supporting Speculative Execution

HARADA HUGO KENJI PEREIRA, NAKAIKE TAKUYA, KOBAYASHI HIROAKI, NAKAMURA TADAO

IPSJ SIG Notes　1999　(100)　63-68　1999/11/26
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper presents an instruction fetch scheme capable of speculatively executing instructions in VLIW processors. This is achieved with the compiler and the underlying hardware working together in a scheme called Dynamic Boosting (DB). In dynamic boosting, the compiler is responsible for finding instruction-level parallelism (ILP) beyond the boundaries of basic blocks. It then schedules and labels the independent instructions belonging to different basic blocks in such a way that the hardware is able to detect and execute these instructions in parallel at run time. The software simulation results show that a speed-up of up to 20% was achieved in the SPECint 95 benchmarks. In addition, the preliminary results on hardware cost and gate-level speed show that the hardware complexity and cost are reasonable considering the obtained speed-ups.
A Study of Acceleration of Ray-Tracing by Using Reference Images

59　149-150　1999/09/28
A Study of A Global Illumination Model for Rendering Gaseous Objects

59　145-146　1999/09/28
An Active Contour Model with Consideration to the Shape of a Region-of-Interest

59　257-258　1999/09/28
A Study on a Reconfigurable Synchronous Dataflow Computer

SASAKI Hiroshi, TSUKIOKA Hideaki, SHOJI Nobuyoshi, KOBAYASHI Hiroaki, NAKAMURA Tadao

Technical report of IEICE. VLD　98　(446)　17-22　1998/12/10
Publisher: The Institute of Electronics, Information and Communication Engineers

This report proposes a synchronous dataflow computer, which constructs hardware to represent the dataflow graphs of applications and then processes data in a dataflow fashion. We implemented a JPEG encoder on the system and measured the amount of required hardware resources. The experimental results show that computations can naturally be expressed as dataflow graphs, using units only for accessing the shared memory. The exploitable features of applications are discussed and a software development environment is also presented.
Adaptive Volume-Subdivision for Efficient Data-Parallel Volume Rendering

SANO Kentaro, KITAJIMA Hiroyuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

IPSJ SIG Notes　1998　(93)　7-12　1998/10/09
Publisher: Information Processing Society of Japan (IPSJ)

Parallel processing on a general-purpose parallel computer is one of the promising strategies for fast volume rendering, and we have proposed a data-parallel volume rendering algorithm based on the image composition method. Although the algorithm achieves real-time rendering, the constant processing time of image composition lowers the efficiency of parallel processing as the number of processing elements increases. To solve this problem, this study proposes an adaptive method for subdividing volume data and discusses its performance through some experiments. The experimental results show that the method reduces the image-compositing time as the number of processing elements increases.
TLB-Assisted Cache

SUZUKI KEN-ICHI, OBA NOBUYUKI, KOBAYASHI HIROAKI, NAKAMURA TADAO

IPSJ SIG Notes　1997　(61)　7-12　1997/06/27
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This report proposes a new on-chip cache system named "TLB Assisted Cache (TAC)." The TAC determines a cache hit/miss by referring to the TLB and by small assist tag comparisons that are faster than a conventional cache tag comparison. Therefore, it is possible to initiate a cache data array access before a cache tag comparison. Consequently, the TAC achieves an access time as short as that of a V-V cache. Moreover, the TAC logically acts as a V-P cache, so it does not suffer from the V-V cache's shortcomings, such as the synonym problem.
Implementing Functional Programs Based on the SPMD Model

NAKAIZUMI MITSUHIRO, SHEN HONG, KOBAYASHI HIROAKI, NAKAMURA TADAO

IPSJ SIG Notes　1997　(61)　25-30　1997/06/27
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Functional languages, unlike imperative ones, are characterized by referential transparency, high programming productivity, and ease of program verification. However, they have been prevented from gaining wide acceptance by the inefficiency of their implementation on conventional computers. Parallel execution of functional programs that utilizes their potential parallelism is a promising way to solve this problem. This paper studies the parallel execution of functional programs based on the SPMD model. We realize the parallel execution of functional programs on an IBM SP2 parallel computer. The experimental results of benchmark programs reveal the prospects of the execution model.
A Parallel Volume Rendering Algorithm for Distributed-Memory Multiprocessor Systems

Reports of Toyoda Physical and Chemical Research Institute.　(50)　41-54　1997/05
Publisher: 豊田理化学研究所
ISSN： 0372-039X
A Study on Parallelizing Scheduling for Jetpipeline

NAKAIKE TAKUYA, SASAKI TAKEHITO, KATAHIRA MASAYUKI, KOBAYASHI HIROAKI, NAKAMURA TADAO

IPSJ SIG Notes　1996　(106)　25-30　1996/10/31
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Jetpipeline is an architecture based on instruction-level parallelism (ILP), which utilizes vector and scalar processing to achieve high performance. Therefore, the compiler for Jetpipeline must parallelize the vector and scalar instructions of programs. However, since vector instructions take more cycles to complete their execution than scalar instructions, the parallelizing methods used in VLIW machines are not suitable. In this paper, we propose a parallelizing method for Jetpipeline by improving the dispatch stack method to parallelize the vector and scalar instructions. We show the effectiveness of the proposed parallelizing method for Jetpipeline through simulation experiments.
Design of asynchronous vector calculator using delay element.

高野光司, 佐々木毅人, 片平昌幸, 小林広明, 中村維男

電気関係学会東北支部連合大会講演論文集　1996　1996
Shared Memory System for the Hierarchical Parallel Reduction System of FL

MORI Noriaki, KITAJIMA Hiroyuki, SHEN Hong, KOBAYASHI Hiroaki, NAKAMURA Tadao

Proceedings of the Society Conference of IEICE　1995　34-34　1995/09/05
Publisher: The Institute of Electronics, Information and Communication Engineers
Performance Evaluation of a Distributed Shared Memory Multiprocessor System using Network with Message Losses

KURIYAMA Kazunari, TAKAHASHI Kazunari, OBA Nobuyuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

Proceedings of the Society Conference of IEICE　1995　35-35　1995/09/05
Publisher: The Institute of Electronics, Information and Communication Engineers
An Automatic System for Facial Expression Recognition Using Neural Networks

Nakajima Taira, Takizawa Hiroyuki, Shimamura Mieko, Kobayashi Hiroaki, Nakamura Tadao

Proceedings of the Society Conference of IEICE　1995　173-173　1995/09/05
Publisher: The Institute of Electronics, Information and Communication Engineers
Performance Studies of the FL Hierarchical Parallel Reduction System

KITAJIMA Hiroyuki, SHEN Hong, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

IPSJ SIG Notes　1995　(56)　1-8　1995/06/01
Publisher: Information Processing Society of Japan (IPSJ)

Functional programming languages differ from traditional imperative ones in having many appealing properties such as referential transparency and high programming productivity. However, the inefficiency of their implementation on conventional computers has prevented them from gaining wide acceptance. In our earlier work, we proposed a hierarchical parallel reduction system that combines multiprocessor processing and pipeline processing. In this paper, we investigate a task scheduling strategy with locality consideration suitable for enhancing the system performance, and carry out software simulation experiments. The simulation results reveal the effectiveness of the proposed system with the scheduling strategy.
A Study on A Compile Technique for Jetpipeline

SASAKI Takehito, NAKAIKE Takuya, KATAHIRA Masayuki, SHEN Hong, KOBAYASHI Hiroaki, NAKAMURA Tadao

IPSJ SIG Notes　1995　(56)　9-16　1995/06/01
Publisher: Information Processing Society of Japan (IPSJ)

To achieve high computation power, we have proposed the Jetpipeline architecture that utilizes ILP (Instruction Level Parallelism) including vector operations in addition to scalar operations. In the Jetpipeline architecture, a compiler has an important role because it exploits ILP from operations. In this paper, we present a compile technique for Jetpipeline based on both parallelization for scalar operations and vectorization for vector operations. The proposed compile technique is examined through simulation experiments.
SuperTAINS: Tohoku University Network realizes multimedia applications through sub-giga network

Yukiyoshi Kameyama, Akinori Ito, Hiroaki Kobayashi

Computer and Network LAN　13　(6)　114-120　1995/06
Publisher: Ohmsha
A Hierarchical Processing System for Prolog Programs and Its Performance Evaluation

Wang Dong, Kobayashi Hiroaki, Nakamura Tadao

95　(2)　1-8　1995/01/13
Publisher: Information Processing Society of Japan (IPSJ)

This paper presents a hierarchical parallel execution model for Prolog programs based on Or-parallelism/And-parallelism as coarse-grain parallelism, and parallel unification as fine-grain parallelism. In the hierarchical model, we propose an extended And-Or tree for the high (coarse-grain) level, and use parallel unification at the low level. Thus, the execution model can exploit a high degree of parallelism in Prolog programs. Moreover, the execution model is implemented on a hierarchical processing system, which is a shared memory multiprocessor system with a mesh plus tree network devoted to control. Finally, the performance evaluation of this system is also carried out.
Pipelined Execution of OR-Parallel Prolog

Inaba Tsutomu, Shen Hong, Katahira Masayuki, Kobayashi Hiroaki, Nakamura Tadao

95　(2)　9-16　1995/01/13
Publisher: Information Processing Society of Japan (IPSJ)

In this study, we propose an OR-parallel Prolog execution model on a PE-pipeline architecture. This is an extension of J. Beer's idea in "Pipelined Execution of Sequential Prolog." In our model, we adopt a global shared memory and a crossbar network. The global shared memory consists of a Choice-Point Stack Module and several Environment Frame Modules that store the environments of each choice point. In this paper, the system organization and simulation results are described. According to the simulation results, our model achieves 2.5 times the LIPS of Beer's model.
Performance evaluation of Jetpipeline fusing vector and scalar instructions.

仲池卓也, 佐々木毅人, 片平昌幸, 沈紅, 小林広明, 中村維男

電気関係学会東北支部連合大会講演論文集　1995　1995
Performance Evaluation of (Mπ)^2

TOH Yuichiro, YAMAUCHI Hitoshi, KOBAYASHI Hiroaki, NAKAMURA Tadao

1994　378-378　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
A Memory Access Protocol of A Distributed Processing Systems with An ATM Network

Kuriyama Kazunari, Takahashi Masafumi, Oba Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

1994　79-79　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
A Study on Load Balancing on a Pipelined Prolog Architecture

INABA Tsutomu, SHEN Hong, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

1994　84-84　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
An Inference Method with Word-Impression and a Feeling Model

Igata Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

1994　71-71　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
A Study on Automatic Emotion Recognition with Neural Networks

Sasaki Kou, Osada Toshiaki, Kobayashi Hiroaki, Nakamura Tadao

1994　132-132　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
Performance Evaluation of The TLB-Unified Cache

Suzuki Ken-ichi, Kobayashi Hiroaki, Nakamura Tadao

1994　88-88　1994/09/26
Publisher: The Institute of Electronics, Information and Communication Engineers
A Memory Access Queuing Mechanism for a Clustered Multiprocessor System

1994　(66)　225-232　1994/07/21
A Study of a Parallelizing Compiler for the Jet Pipeline.

佐々木毅人, 片平昌幸, 小林広明, 中村維男

日本機械学会東北支部総会・講演会講演論文集　29th　1994
A Study on a Shading Method for Volume Rendering.

佐藤大輔, 片平昌幸, 小林広明, 中村維男

電子情報通信学会大会講演論文集　1994　(Shuki Pt 6)　1994

ISSN： 1349-1369
A Study on an Instruction Scheduling Strategy for Jetpipeline

SASAKI Takehito, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

83-83　1994
Publisher: The Institute of Electronics, Information and Communication Engineers
A New Control Scheme for Token Ring LANs

1993　(2)　231-237　1993/11/17
A Distributed Shared-Memory System Using ATM Networks

1993　(2)　277-286　1993/11/17
An Intelligent Self-Routing Algorithm for B-ISDN

Rashid Emad, Kobayashi Hiroaki, Nakamura Tadao

IEICE technical report. Artificial intelligence and knowledge-based processing　93　(240)　39-46　1993/09/20
Publisher: The Institute of Electronics, Information and Communication Engineers

This paper presents a new self-routing algorithm for broadband ISDN's asynchronous transfer mode (ATM) switching networks. The routing algorithm, called Ants Routing, is ambuscaded in a switch for congestion control. The congestion is controlled by regulating the input traffic rate to the switch element that has congestion on one of its output ports. High throughput and low packet loss probability can be achieved by rerouting packets' arrival due to the presence of bursty traffic on a switch's output port. The rerouting algorithm is based on information about the congestion status of each switch, which can be distributed among neighboring switches. Mathematical analysis based on the queuing model shows that our algorithm is capable of congestion avoidance on the interconnection network and of packet loss improvement, especially when traffic is bursty.
A study on parallel processing system for volume rendering

46　477-478　1993/03/01
A Study on A Parallel Processing System for Photo-Realistic Image Synthesis

46　369-370　1993/03/01
The Computer Architecture Description Language : CARD - L

1993　(6)　121-128　1993/01/21
Expert System Aid in Networks' Flow Control

1992　(76)　33-40　1992/09/24
AN ADAPTIVE NETWORK ROUTING METHOD - POTENTIAL ROUTING

1992　(64)　65-72　1992/08/19
Knowledge Representation by Using Position-Display-Map

42　220-221　1991/02/25
A Discussion on the knowledge-base for mechanical designs based on the hierarchy of the mechanical structure

42　325-326　1991/02/25
A Study of Object Space Parallel Processing for Fast Ray Tracing

1987　(78)　9-16　1987/11/12

Books and Other Publications 17

Sustained Simulation Performance 2019 and 2020

Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

2021
Sustained Simulation Performance 2018 and 2019

Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

2020
Sustained Simulation Performance 2017

Michael Resch, Wolfgang Bez, Erich Focht, Michael Gienger, Hiroaki Kobayashi

2017
Sustained Simulation Performance 2016

Michael M. Resch, Wolfgang Bez, Erich Focht, Nisarg Patel, Hiroaki Kobayashi (Editors)

2016

ISBN: 9783319467344
Introduction to Computer Engineering

鏡慎吾, 佐野健太郎, 滝沢寛之, 岡谷貴之

コロナ社　2015/03

ISBN: 9784339024920
Sustained Simulation Performance 2015

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Qi, J, Roller, S

Springer　2015

ISBN: 9783319203409
Sustained Simulation Performance 2014

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Patel, N

Springer　2014

ISBN: 9783319106267
Sustained Simulation Performance 2013

Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Kovalenko, Y

Springer　2013

ISBN: 9783319014395
Sustained Simulation Performance 2012

Resch, M.M, Wang, X, Bez, W, Focht, E, Kobayashi, H

Springer　2012

ISBN: 9783642324543
High Performance Computing on Vector Systems 2011

Resch, M. Wang, X. Focht, E. Kobayashi, H. Roller, S

Springer　2011

ISBN: 9783642222436
Cloud, Grid and High Performance Computing: Emerging Applications

Hong Wang, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

IGI Global　2011

ISBN: 9781609606039
High Performance Computing on Vector Systems 2010

M.Resch, K.Benkert, X.Wang, M.Galle, W.Bez, H.Kobayashi, S.Roller

Springer　2010/11

ISBN: 9783642118500
Software Automatic Tuning: From Concepts to State-of-the-Art Results

Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Springer　2010

ISBN: 9781441969347
High Performance Computing on Vector Systems 2009

Resch, M, Roller, S, Benkert, K, Galle, M, Bez, W, Kobayashi, H

Springer-Verlag　2009/11

ISBN: 9783642039126
High Performance Computing on Vector Systems 2008

M.Resch, M. Galle, H.Kobayashi, T.Hirayama

Springer-Verlag　2008/11
High Performance Computing on Vector Systems 2007

Hiroaki Kobayashi

Springer-Verlag　2007/11

ISBN: 9783540743835
High Performance Computing on Vector Systems 2006

Hiroaki Kobayashi

Springer Verlag　2006/01

Presentations 77

A Study on Optimizing Rescue Resource Allocation Using Ising Machines

中本光星, 小野田誠, 熊谷政仁, 佐藤雅之, 小松一彦

情報処理学会第87 回全国大会講演論文集　2025/03/15
Toward the Fusion of Quantum Computing and Simulation: R&D on a Quantum Annealing-HPC Hybrid Infrastructure Invited

小林広明

Q-STAR(一般社団法人量子技術による新産業創出協議会)セミナー　2024/12/23
Collaboration of HPC and Quantum Computing and Its Applications

小林広明

AIチップ設計拠点フォーラム　2024/10/25
QC & HPC Hybrid Computing for Simulation & Data-analysis Hybrid Applications Invited

Hiroaki Kobayashi

German Aerospace Center Seminar　2024/09/19
QA-HPC Hybrid Computing Infrastructure for Quantum Transformation of Simulation-Data Analysis Combined Applications Invited

Hiroaki Kobayashi

IEEE Quantum Week　2024/09/19
R&D of QA-HPC Hybrid Computing Infrastructure and Quantum Transformation of Simulation-Data Science Combined Applications Invited

Hiroaki Kobayashi

Tohoku-Chicago Quantum Interaction　2024/06/29
Performance Evaluation of Vector Annealing on NEC Vector Processor SX-Aurora TSUBASA

Hiroaki Kobayashi

HPC2024　2024/06/27
Accelerating Quantum Innovation & Startup Creation at Tohoku University Invited

Hiroaki Kobayashi

Chicago-Tohoku Quantum Alliance Symposium　2024/02/14
NEC SX-ACE's Operations and Applications Development for the Future

24th Workshop on Sustained Simulation Performance　2016/12/04
Overview of Vector Supercomputer SX-ACE and Its Applications International-presentation

Russian Supercomputing Days 2016　2016/09/26
Toward the Development of Supercomputers That Contribute to Disaster Prevention and Mitigation

2016年ハイパフォーマンスコンピューティングと計算科学シンポジウム　2016/06/06
The Tohoku University Large-Scale Scientific Computing System and Its User Support

第25回東北CAE懇話会　2016/05/13
Highly-Productive Computing on Modern and Future Vector Platforms

The 23rd Workshop on Sustained Simulation Performance　2016/03/16
One-year experience with SX-ACE International-presentation

22nd Workshop on Sustained Simulation Performance　2015/12/17
Highly-Productive HPC on Modern Vector Supercomputers: present and future International-presentation

Russia Supercomputing Days　2015/09/28
The Amazing Power of Supercomputers

第116回東北大学サイエンスカフェ　2015/05/29
Real-Time Tsunami Inundation Forecasting and Damage Estimation on SX-ACE: An HPC System as a Social Infrastructure for Tsunami Disaster Prevention and Mitigation International-presentation

NUG XXVII　2015/05/11
R&D Activities on High-Performance Computing at the Tohoku University Cyberscience Center: Toward a Supercomputer Center for Ordinary People

第25回TOPIC総会講演会　2015/04/20
Toward a Supercomputer Center for Ordinary People

CyberHPC Symposium　2015/03/20
An SX-ACE-based New Computer System of Tohoku University and Its Early Evaluation Using Real Applications International-presentation

20th Workshop on Sustained Simulation Performance (WSSP20)　2014/12/15
Overview of the New Supercomputer System at the Tohoku University Cyberscience Center and R&D Activities on High-Performance Computing

第133回NEC C&Cシステム SP研究会　2014/11/11
Tohoku Univ.’s New Supercomputer System and R&D on Highly-Productive HPC for Memory Intensive Applications International-presentation

NUG2014　2014/05/12
Toward the Development of Next-Generation Supercomputers for Disaster Prevention and Mitigation: Real-Time Tsunami Forecasting by Supercomputing

G 空間情報を活用した次世代防災・被災地支援システム研究会第３回シンポジウム　2014/03/12
A Feasibility Study on Future HPCI Systems Suited to High-Bandwidth Applications

第11回戦略的高性能計算システム開発に関するワークショップ,　2014/03/10
Efforts in the Feasibility Study on Future HPCI Systems Suited to High-Bandwidth Applications

第132回ＮＥＣＣ＆ＣシステムＳＰ研究会　2014/01/23
Feasibility study of the next generation vector system architecture for memory intensive applications International-presentation

18th workshop on Sustained Simulation Performance, Stuttgart Germany　2013/10/28
Operation of the Tohoku University Large-Scale Scientific Computing System and R&D on Next-Generation Vector Computing

日本学術会議電気電子工学委員会 URSI分科会無線通信システム信号処理小委員会URSI-C 研究会　2013/09/26
A Feasibility Study on Future HPCI Systems Suited to High-Bandwidth Applications

文部科学省「革新的ハイパフォーマンス・コンピューティング・インフラ（HPCI）の構築」 HPCI戦略分野2「新物質・エネルギー創成」計算物質科学イニシアティブ（CMSI）計算分子科学研究拠点第4回研究会　2013/09/10
A Feasibility Study on Future HPCI Systems Suited to High-Bandwidth Applications

第10回戦略的高性能計算システム開発に関するワークショップ　2013/07/30
Toward the Development of Next-Generation Supercomputers for Disaster Prevention and Mitigation

東北大学電子通信研究機構シンポジウム—耐災害ICTによる東北復興に向けて　2013/07/23
The Future Opened Up by Supercomputers

東北活性化ユニバーサイエンス・新潟県立十日町高校キャリア教育講演会,　2013/07/05
Early evaluation of NGV and feasibility study of the next generation vector system architecture for memory intensive applications International-presentation

NUG2013　2013/06/23
Feasibility study of future HPC systems for memory-intensive applications International-presentation

1st International Workshop on Strategic Development of High Performance Computers　2013/03/18
Feasibility study of future HPC systems for memory-intensive applications International-presentation

17th Workshop on Sustained Simulation Performance　2013/03/12
Special Event: "High-Performance Computing Supporting Safe and Secure Living: Toward Disaster Prevention and Mitigation"

第75回情報処理学会全国大会　2013/03/08
Potentials of the vector architecture in the post-peta era International-presentation

Workshop on Sustained Simulation Performance　2012/12/10
Design Space Exploration of the Vector Processor Architecture using 3D Die-Stacking Technology

筑波大学計算科学研究センター設立20周年記念シンポジウム　2012/09/07
High-End Computing Systems: Past, Present and Future International-presentation

SICE2012 SICE Annual Conference　2012/08/20
Capability and Potential of Vector Processors: Present and Future International-presentation

NUG2012　2012/06/12
Capability of Vector-Parallel Computing Platforms International-presentation

the HPC Workshop in Singapore　2012/05/07
R&D on Highly-Productive, High-Performance Computing and New-Generation Vector Computing International-presentation

SP研究会 SC10講演会　2010/11/17
Activities for Highly-Productive Computing and R&D on New-Generation Vector Computing International-presentation

JAEA SC10 Workshop　2010/11/16
Performance Discussion on Scalar and Vector Systems and R&D on New-Generation Vector Computing International-presentation

the 13th Teraflop Workshop　2010/10/21
Performance Discussion on Scalar and Vector Systems and R&D for New-Generation Vector Computing at Tohoku University International-presentation

NUG2010　2010/06/29
Operation of the Tohoku University Large-Scale Scientific Computing System and R&D on Vector Computing

第九回PCクラスタシンポジウム　2009/12/10
Supercomputers and Supercomputing in Tohoku University International-presentation

JAEA SC09-Workshop　2009/11/18
Toward Bridging Lab Computing and Peta Computing: The Role of Information Infrastructure Centers in a New Era as Joint Usage/Research Centers

第4回国立大学法人情報系センター長会議基調講演　2009/10/23
Is the 21st Century the Era of Vector Computing!?

第8回情報科学技術フォーラム特別企画　2009/09/03
Lessons Learned from 1-Year SX-9 Experiences and Toward the Next Generation Vector Computing International-presentation

20th CCSE Workshop on Advanced Computing Technologies toward PetaFLOPS　2009/04/24
Tohoku University View to Supercomputing International-presentation

10th Teraflop Workshop　2009/03/16
On-chip Caching for vector architectures International-presentation

JAEA -Symposium at SC08　2008/11/20
The new era of the vector architecture: experiences with the early adaption of SX-9 International-presentation

NEC HPC Workshop at SC08　2008/11/19
A news update of Cyberscience Center International-presentation

the 9th Teraflop workshop　2008/11/12
Performance Evaluation of SX-9 Using Real Applications

大阪大学サイバーメディアセンター平成20年度スーパーコンピュータシンポジウム　2008/10/24
HPC Activities at Tohoku University: Experiences with the early adaption of SX-9 International-presentation

DWD (ドイツ気象庁)特別講演会　2008/10/02
HPC Activities at Tohoku University International-presentation

Barcelona Supercomputer Center Seminar　2008/09/30
New Supercomputer System SX-9 and its Early Evaluation

IEEE EMC Sendai Chapter Lecture and Seminar　2008/05/14
The New Supercomputer System SX-9 and Its Evaluation

SP研究会　2008/05/09
New Supercomputer System SX-9 and its Early Evaluation International-presentation

the 18th CCSE Workshop on Computational Technologies Supporting Development of Future Applications　2008/04/22
Experiences with SX-9 International-presentation

the 8th Teraflop workshop　2008/04/10
Experiences with SX-9 International-presentation

Worldwide NEC Users’ Meeting　2008/04/06
High-Performance Computing with Media Processors

電子情報通信学会専門講習会　2008/02/22
New System Design and Its Early Evaluation International-presentation

The Seventh Teraflop Workshop　2007/11/21
The Potential of On-Chip Memory Systems for Future Vector Architectures International-presentation

the 16th CCSE Workshop on High-Performance Computing on Vector Based Architectures – Recent Achievements and Future Directions-　2007/04/23
ISC Plans and Update International-presentation

The Sixth Teraflop Workshop　2007/03/26
HPC Activities at Information Synergy Center International-presentation

The Fifth Teraflop Workshop　2006/11/20
Implication of Memory Performance in HEC Systems International-presentation

The Fourth Teraflop Workshop　2006/03/30
Performance Evaluation of SX-7 using HPCC and Real Application Codes International-presentation

3rd Teraflop Workshop　2005/11/11
HPC Research Activities at the Information Synergy Center and the Role of the Center in the Petaflops Era

NEC HPC研究会　2005/11/09
Fallacies and Pitfalls Concerning Supercomputers

東北大学大学院情報科学研究科談話会　2005/07/26
Technology Trends in Large-Scale Scientific Computing Systems

NUA東北地区ユーザ研修会　2003/06/05
High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture International-presentation

2002 International Conference on Optical Communication and Multimedia　2002/11/14
Prospects for Fundamental Technologies Supporting a High-Performance, Highly Functional Network Society

NetOne Tohoku Seminar 2000　2000/10/17
Computers That Make Machines Intelligent

日本機械学会特別企画フォーラム「機械と知能」　1998/10/11
A Fast Volume Rendering Method Using Parallel Processing and an Automatic Extraction Method for Regions of Interest in Medical Images

秋田県立脳血管研究センター講演会　1997/02/05
The Multimedia Environment of the Graduate School of Information Sciences, Tohoku University

（株）アシスト，日本サン・マイクロシステムズ（株）合同主催セミナー　1996/03/07
Supercomputers and Computational Fluid Dynamics

大阪大学溶接工学研究所研究集会　1991/03/29

Industrial Property Rights 13

Reference Image Cache, Eviction Target Determination Method, and Computer Program

小林広明

Patent No. 7416380

Property Type: Patent
Reference Image Cache Memory, Data Request Method, and Computer Program

小林広明他

Patent No. 7425446

Property Type: Patent
Tsunami Inundation Forecasting System, Control Device, Control Method for Parallel Computing System, and Program

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

Patent No. 6362178

Property Type: Patent
Tsunami Inundation Forecasting System, Data Processing Server, Method for Requesting Tsunami Inundation Forecasts, and Program

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

Patent No. 6323880

Property Type: Patent
Tsunami Inundation Forecasting System, Control Device, Method for Providing Tsunami Inundation Forecasts, and Program

越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

Patent No. 6161130

Property Type: Patent
Cache Memory and Cache Control Method

小林広明, 斎田泰昌

No. 3834323

Property Type: Patent
Usage-Pattern-Oriented P2P Network System and Computer Program

小林広明, 滝沢寛之, 稲葉勉

No. 4170285

Property Type: Patent
Grid Computing System and Method for Collecting Computing Resources in a Grid Computing System

小林広明, 稲葉勉, 松村龍太郎

No. 3857258

Property Type: Patent
Grid Computing System

小林広明, 稲葉勉, 松村龍太郎

No. 3977298

Property Type: Patent
Physical Property Map Image Generation Device, Control Method, and Program

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Property Type: Patent
Singular Material Detection Device, Control Method, and Program

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Property Type: Patent
Map Image Generation Device, Control Method, and Program

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Property Type: Patent
Recommendation Data Generation Device, Control Method, and Program

鍬守直樹, 撫佐昭裕, 瀧川陽平, 風間悠加, 佐藤佳彦, 小林広明, 菊川豪太, 岡部朋永, 小松一彦

Property Type: Patent

Research Projects 53

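Creation of a Digital Twin for Soft Material R&D Through Quantum-Classical Hybrid Computing

小林広明, 撫佐昭裕, 阿部圭晃, 佐藤雅之, 小松一彦, 菊川豪太

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University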

2024/04/01 - 2028/03/31
New principal infrastructure based on large-scale quantum computing

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2023/04/01 - 2028/03/31
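Creation of a New-Computing-Principle Computing Infrastructure Through Large-Scale Quantum Computing

小松一彦, 小林広明, 佐藤雅之, 百瀬真太郎

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University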

2023/04 - 2028/03
Digital twin computing for enhancing resilience of disaster medical system

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (S)

Institution: Tohoku University

2021/07/05 - 2026/03/31
Real-time video coding technology using the latest coding VVC/H.266 and its applications

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tokyo University of Agriculture and Technology

2022/04/01 - 2025/03/31
Expanding Industrial Use of Innovative Technology for Transportation Equipment Design Using Microdevices Through Large-Scale Simulation

Offer Organization: Tohoku University Cyber Science Center

System: JHPCN:Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures

Institution: Tohoku University

2017 - 2024
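Materials Revolution Through an Integrated Materials Development System

小林広明, 小松一彦, 佐藤雅之

Offer Organization: Cabinet Office

System: Strategic Innovation Program (SIP)

Category: High-Speed Implementation and Evaluation of a Materials Integration (MI) System for CFRP

Institution: Tohoku University, Toray Industries, Inc., University of Hyogo, Kyoto University, Kanazawa Institute of Technology, National Institute for Materials Science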

2020/05 - 2023/03
Quantum-Annealing Assisted Innovative Material Informatics Infrastructure

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

Category: Grant-in-Aid for Scientific Research (A)

Institution: Tohoku University

2019/04 - 2023/03
Development of a Quantum-Annealing-Assisted Next-Generation Supercomputing Infrastructure

小林広明, 滝沢寛之, 山口健太, 撫佐昭裕, 曽我隆, 渡部修, 横川三津夫, 江川隆輔, 下村陽一, 中田一人, 越村俊一, 小松一彦, 佐藤雅之, 愛野茂幸, 磯部洋子, 政岡靖久, 百瀬真太郎, 藤本壮也, 山本悟, 古澤卓, 荒木拓也, 村嶋陽一, 大関真之, 觀山正道, 太田雄策, マスエリック, 星宗王, 萩原孝

Offer Organization: Ministry of Education, Culture, Sports, Science and Technology (MEXT)

System: R&D for Next-Generation Fields

2018/04 - 2023/03
Fusion of sensing and simulation of tsunami damage assessment towards innovation of disaster medical system

KOSHIMURA Shunichi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

Category: Grant-in-Aid for Scientific Research (S)

Institution: Tohoku University

2017/05 - 2022/03

The project promoted collaborative research among science, engineering, and disaster medicine with the goal of enhancing the resilience of disaster medical systems by integrating real-time simulation and sensing. Considering the catastrophic tsunami disasters regarded as future risks in Japan, we achieved three outcomes: 1) quantitative and rapid estimation of human and physical damage caused by a tsunami, 2) immediate estimation of medical demands in disaster-affected areas, and 3) a methodology for planning and updating disaster medical activities through multi-agent simulation. Through the project, we examined the required specifications for an innovative medical support system in the anticipated disaster process of a future Nankai Trough earthquake and tsunami, which is expected to occur within the next 30 years.
Fusion of sensing and simulation towards enhancing disaster medical system

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

Category: Grant-in-Aid for Scientific Research (A)

Institution: Tohoku University

2017/04 - 2021/03
Theory and Practice of Vector Data Processing at Extreme Scale: Back to the Future

2018/04 - 2020/03
Supporting performance-aware programming with machine learning techniques

Hiroyuki Takizawa, Kobayashi Hiroaki, Suda Reiji, Okatani Takayuki, Egawa Ryusuke, Ohshima Satoshi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2016/04/01 - 2019/03/31

This work has demonstrated several case studies of effectively using machine learning techniques to support High-Performance Computing (HPC) programming. Various problems in code optimization can be solved by converting them into problems that have already been shown to be solvable by machine learning. Moreover, this work clarified the importance of analyzing the target problems before applying machine learning, because it is unlikely that a sufficient amount of training data is available for code optimization problems. Like HPC programming, machine learning also requires the knowledge and experience of human experts. However, in machine learning, the problem is already parameterized, and hence can be solved if sufficiently high performance is available.
Design Space Exploration of Future Microprocessors using the post CMOS devices

EGAWA Ryusuke, Kobayashi Hiroaki, Takizawa Hiroyuki, Tada Jubee, Sato Masayuki, Uno Wataru, Toyoshima Takuya, Sakai Zentaro, Ogasawara Daisuke

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Category: Grant-in-Aid for Challenging Exploratory Research

Institution: Tohoku University

2015/04 - 2018/03

In this research, aiming to realize a highly energy-efficient microprocessor using novel device technologies in the post-Moore era, which are expected to become practical around 2025, we have worked on circuit and memory subsystem designs. Regarding circuit design, we worked on a design method for wave-pipelined circuits using CNFETs. For the memory subsystem, we focused on die-stacking and STT-RAM technologies. We have examined a cache-bypass mechanism, an energy-efficient data allocation method for multi-bank memory, and a power-aware control mechanism for STT-RAM last-level caches.
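R&D on High-Density Self-Assembled Interconnect Technology for Low-Power Stacked Semiconductors

小柳光正, 東和幸, 元吉真, 知京豊裕, 川喜多仁, 田中徹, 福島誉史, 李康旭, 池田誠, 小林広明, 岡谷貴之, 清山浩司

Offer Organization: New Energy and Industrial Technology Development Organization (NEDO)

System: Advanced Research Program for Energy and Environmental Technologies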

2015/04 - 2017/03
A Green Microarchitecture in 5.5D-Design Era

EGAWA RYUSUKE, Kobayashi Hiroaki, Takizawa Hiroyuki, Sato Masayuki, Uno Wataru, Nishimura Shin, Hosokawa Mikio, Toyoshima Takuya

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2014/04 - 2017/03

To clarify the design space of future microprocessors after the end of Moore's law, this research project focuses on vertical integration technologies such as 2.5D and 3D technologies using through-silicon vias (TSVs). Since TSVs have a high potential for shortening latency and reducing power consumption in microprocessors and computing systems, these technologies are expected to overcome the limits of technology scaling. In this research, we explore the design space of future microprocessors by aggressively using TSVs at various stacking granularities. The evaluation results show that appropriate usage of TSVs, considering the trade-off among performance, power, and cost, can drastically improve the energy efficiency of microprocessors and computer systems.
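Demonstration Project for Building a "Tsunami L-Alert" by Linking a Real-Time Tsunami Forecasting System with L-Alert and for Advancing Disaster Response

越村俊一, 小林広明 et al.

Offer Organization: Ministry of Internal Affairs and Communications

System: Project for Advancing L-Alert Using G-Space (Geospatial) Information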

2015/04 - 2016/03
Checkpoint restart technologies for hierarchical storages

Hiroyuki Takizawa, Uno Atsuya, Kobayashi Hiroaki, Egawa Ryusuke, Sato Yukinori

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Category: Grant-in-Aid for Challenging Exploratory Research

Institution: Tohoku University

2014/04 - 2016/03

Assuming that the state of an application is periodically saved during its execution, we have considered an automatic tuning method for the frequency of saving the state to a hierarchical storage system, and have also discussed a way to reduce the time for writing the state to the storage. A promising approach to this reduction is to speculatively write data that will, with high probability, be written in the future. Hence, one technical issue is how to predict such data. For the prediction, we need to analyze the memory access patterns of the target application, so we have developed a performance analysis tool for this purpose. The validity and effectiveness of the proposed methods are evaluated based on job scheduling simulation of a large-scale computing system.
A 3D Processor Architecture Co-Designed with Dependable Processing

Kobayashi Hiroaki, TAKIZAWA HIROYUKI, EGAWA RYUSUKE

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Category: Grant-in-Aid for Challenging Exploratory Research

Institution: Tohoku University

2014/04 - 2016/03

The objective of this study is to establish a novel processor architecture that realizes both high performance and high dependability in the execution of a wide variety of applications by using 3D die-stacking technology toward the post-Moore era. In particular, we have developed a 3D die-stacked memory subsystem architecture integrated with processor cores, and its data management mechanism, for a highly power-efficient and high-throughput memory hierarchy. In addition, we have developed an on-line checkpoint/restart mechanism using a 3D die-stacked on-chip memory to increase the dependability of the processor. The proposed architecture has been evaluated quantitatively using a wide variety of applications, and its effectiveness and limitations have been clarified and discussed.
Infrastructures for accelerating the synergy effect of software-hardware co-design

Hiroyuki Takizawa, Kobayashi Hiroaki, Aoki Takafumi, Sano Kentaro, Egawa Ryusuke, Tada Jube, Ito Koichi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2013/04 - 2016/03

Assuming OpenCL as a standard environment for accelerator programming, we have pointed out some missing features needed to support a wider variety of accelerator architectures, and proposed OpenCL extensions. Although OpenCL has gradually come to be used for hardware description, OpenCL C is not necessarily appropriate for describing OpenCL kernels. Hence, we have designed and implemented high-productivity languages for typical computations in the fields of image processing and high-performance computing. In addition, we have proposed an automatic tuning method for performance parameters, which need to be adjusted for individual accelerators. The proposed method has been implemented to evaluate its performance impact.
A Universal Memory Architecture Based on Device-Architecture Co-Design

Kobayashi Hiroaki, TAKIZAWA HIROYUKI, EGAWA RYUSUKE

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2013/04 - 2016/03

The objective of this study is to establish a smart memory subsystem architecture that can consider the memory access behaviors of applications and effectively manage data in the memory hierarchy in terms of performance and power efficiency. In particular, we have developed 1) a low-power/high-bandwidth cache architecture, 2) a cache management policy with an on-line evaluation of the memory request behavior of an application for reducing its working set in the memory hierarchy, 3) a cache partitioning mechanism to protect performance-sensitive shared data for chip multicore processors, and 4) a memory address mapping mechanism with performance/power optimization based on an on-line estimation of memory access behavior.
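Demonstration Project for Strengthening the Disaster Mitigation Capability of Local Governments Through Real-Time Tsunami Inundation and Damage Forecasting and Disaster Information Delivery

越村俊一, 小林広明 et al.

Offer Organization: Ministry of Internal Affairs and Communications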

2014/04 - 2015/03
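Feasibility Study on Future HPCI Systems Suited to High-Memory-Bandwidth Applications

小林広明, 金田義行, 橋本ユキ子

Offer Organization: Ministry of Education, Culture, Sports, Science and Technology (MEXT)

System: Feasibility Study on Future HPCI Systems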

2012/04 - 2014/03
Application-Aware Highly Hierarchical Memory Architecture

KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki, EGAWA Ryusuke

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Category: Grant-in-Aid for Challenging Exploratory Research

Institution: Tohoku University

2012/04 - 2014/03

The objective of this study is to establish a novel on-chip memory architecture that can provide the necessary memory resources to running applications, taking into account their behaviors and their requirements on the memory subsystem of a multi-core processor. In this study, we have developed a cache-resource management mechanism to realize energy-efficient, high-performance execution of multi-threaded applications on a multi-core processor. In cooperation with the developed hardware functions for cache resizing and partitioning, which reduce cache conflicts and maximize the efficiency of cache utilization, this mechanism can extract the potential of multi-core processors with low power consumption.
Study of Next-Generation CFD toward Petaflops Computers

NAKAHASHI Kazuhiro, YAMAMOTO Satoru, OBAYASHI Shigeru, KOBAYASHI Hiroaki, YAMAMOTO Kazuomi, SASAKI Daisuke, JEONG Shinkyu, TAKIZAWA Hiroyuki, EGAWA Ryusuke, KUROTAKI Takuji, ENOMOTO Shunji, IMAMURA Taro, TAKAHASHI Shun

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

Category: Grant-in-Aid for Scientific Research (S)

2009/05 - 2014/03

This study aimed to solve problems with current CFD in the aerodynamic design of aircraft, such as the dependence of computational results on the physical model and the growing workload of handling complex geometries. The Building-Cube Method was proposed with further improvements in computer performance in mind, and various algorithmic studies were conducted toward practical use. One achievement was demonstrated by a world-leading large-scale flow computation around a car using the K computer. Significantly, the proposed CFD approach can treat extremely complicated and incomplete CAD data directly in the simulation. This can be a game-changing technology for the aerodynamic design process of aircraft and automobiles.
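Creation of 3D VLSI Systems with Self-Repair Functions

KOYANAGI Mitsumasa, KOBAYASHI Hiroaki, AOKI Takafumi, SUEYOSHI Toshinori, KAMADA Tadashi, MOTOYOSHI Makoto

Offer Organization: Japan Science and Technology Agency

System: Strategic Basic Research Programs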

2009/04 - 2013/03
Innovative 3D Design for the New Generation Vector Microarchitecture

KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki, EGAWA Ryusuke

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2010 - 2012

This study discusses a new design methodology for the microarchitecture of next-generation, low-power, high-performance vector processors that uses 3D die-stacking technology. We have also proposed a strategy that mixes conventional 2D design with TSV (through-silicon via)-based 3D design and achieves a good trade-off between them at all levels of on-chip unit design. Through the performance evaluation of a prototype 3D vector processor, the effectiveness of 3D design with respect to power consumption and performance has been clarified.
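Ultra-High-Precision Biological Function Measurement System Based on Ultrasonic-Measurement-Integrated Analysis

HAYASE Toshiyuki, KOSUGI Takashi, KOBAYASHI Hiroaki, KODAMA Tetsuya

Offer Organization: Japan Science and Technology Agency

System: Development of Systems and Technology for Advanced Measurement and Analysis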

2007/04 - 2011/03
Instruction Steering Based on Static Data Dependency

SUZUKI Ken-Ichi, KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Category: Grant-in-Aid for Scientific Research (C)

Institution: Tohoku Institute of Technology

2008 - 2010

Modern microprocessors achieve high performance by executing multiple instructions in parallel. In this research, we have introduced a new execution model in which instructions on a local critical path are identified statically at compile time, and only instruction steering is performed dynamically at execution time. Our performance evaluations show that the IPC of this execution model is comparable to that of existing models, even without dynamic steering.
Design and Development of Advanced IT Research Platform for Information Explosion Era

ADACHI Jun, TANAKA Katsumi, NISHIDA Toyoaki, KUNIYOSHI Yasuo, SUDOH Osamu, KUROHASHI Sadao, HARA Takahiro, MATSUOKA Satoshi, TAURA Kenjiro, TATEBE Osami, MUNETOMO Masaharu, HIROTSU Toshio, MATSUBARA Jin, SHIMOJYO Shinji, CHIBA Shigeru, YUASA Taichi, MATSUYAMA Takashi, CHIKAYAMA Takashi, KONDO Toru, KONO Kenji, OKAMOTO Masahiro, AIDA Kento, KAMADA Tomio, KITSUREGAWA Masaru, YAMANA Hayato, NAKAMURA Yutaka, KOBAYASHI Hiroaki, NAKAJIMA Hiroshi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

Category: Grant-in-Aid for Scientific Research on Priority Areas

Institution: National Institute of Informatics

2006 - 2010

This project implemented a common research infrastructure for all the research groups participating in this priority-area research initiative and thereby supported all research activities in the initiative. By providing this infrastructure, we succeeded in accelerating the shared use of research facilities and resources within the limits of research funding and in strengthening collaboration among research groups. These shared facilities include (a) TSUBAKI: an open search engine for a large-scale corpus, (b) InTrigger: a widely distributed computing test bed, (c) IMADE: an environment for real-world interaction measurement and analysis, and (d) prototyping for sensor-network-based preventive medicine.
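Research and Development of a Safe, Secure, and Low-Cost Ubiquitous Computing Platform for Creating an ICT Eco-Society

KOBAYASHI Hiroaki, HORIGUCHI Susumu, TAKIZAWA Hiroyuki, FUKUSHI Masaru

Offer Organization: Ministry of Internal Affairs and Communications

System: Strategic Information and Communications R&D Promotion Programme (SCOPE)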

2006/04 - 2009/03
Study on Hardware-Software Collaborative Scheduling for Highly Efficient Multithreading

KOBAYASHI Hiroaki, NAKAMURA Tadao, SUZUKI Kenichi, TAKIZAWA Hiroyuki, EGAWA Ryusuke, SATO Yukinori, KOTERA Isao, FUNAYA Yusuke, SATO Masayuki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2006 - 2009
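Study on the Design of Ultra-High-Bandwidth Vector Processors Using 3D Die-Stacking Technology

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research

Category: Grant-in-Aid for Exploratory Research

Institution: Tohoku University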

2008 - 2008

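This study aims to clarify the design constraints on high-performance microprocessor architectures for the coming era of 3D integration, and to identify optimal architecture design methods under those constraints. In fiscal year 2008, we surveyed and examined research trends in the elemental technologies of 3D stacking and in new architecture designs that use 3D stacking. We confirmed that 3D stacking dramatically increases the number of transistors available on a chip, and that through-silicon vias (TSVs), which connect the vertically stacked silicon layers, can shorten on-chip wire lengths and wire delays. We also focused on vector processors, for which a decline in memory bandwidth has become a concern owing to the limits of I/O pin packaging technology, and proposed a 3D vector processor with a large-capacity on-chip memory built with 3D stacking so as to make the most of these advantages. The proposed 3D vector processor consists of a processor layer and multiple memory layers. Adding memory layers easily increases the on-chip memory capacity, and reducing the number of off-chip memory accesses suppresses the power consumed by those accesses while effectively hiding memory access latency. Our evaluation showed that the proposed memory-stacked 3D vector processor reduces energy consumption by up to 14% and execution cycles by up to 63% compared with an existing 2D vector processor.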
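Ultra-Large-Scale Data Mining with Safe and Secure Volunteer Computing

KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

Category: Grant-in-Aid for Scientific Research on Priority Areas

Institution: Tohoku University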

2007 - 2008

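This study aims to establish the fundamental technologies for large-scale data mining through volunteer computing that exploits the functions and performance of home game consoles. In fiscal year 2008, we built a distributed data mining system that analyzes physical phenomena near a rocket exhaust nozzle, and ran large-scale data mining demonstration experiments in a volunteer computing environment made up of PLAYSTATION 3 consoles and InTrigger. The results showed that when conventional centralized task scheduling is used for dynamic load balancing, load balancing becomes inefficient as computing resources increase, and the expected performance cannot be achieved in a large-scale volunteer computing environment. By contrast, the distributed cooperative scheduling mechanism proposed in this study was found to perform dynamic load balancing efficiently even as the number of computing resources grows. These experiments demonstrated that the proposed mechanism is effective for dynamic load balancing in large-scale volunteer computing environments. We also proposed a worker-side scheduling method so that volunteers participating in multiple projects do not waste idle computing capacity. To improve the reliability of volunteer computing, we further proposed a method for efficiently verifying the validity of computation results. Simulations showed that malicious workers can be detected by quantifying the reliability of each worker and updating it according to the validity evaluation of its results. Finally, noting that home game consoles have high graphics processing performance, we examined how to use that performance for data mining and studied a programming framework that makes such programming easier.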
Network Architectures for High-speed Photonic Networks

HORIGUCHI Susumu, KOBAYASHI Hiroaki, JIANG Xiahong, FUKUSHI Masaru, YAMAMORI Kunihito

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2005 - 2007

Optical networks are expected to be promising high-speed networks for applications that require a high data transmission rate, a low error rate, and low delay. In this research, we proposed a new optical network architecture for high-speed photonic networks implemented entirely with photonic devices. First, we invented a recursive network architecture for non-blocking optical switches and showed that the recursive switch architecture offers good network performance as well as a simple self-routing control strategy. We also invented a multi-stage optical switch architecture that is non-blocking and crosstalk-free. The multi-stage optical switch architecture is a good choice for constructing non-blocking optical switch networks with low signal loss and crosstalk. Banyan networks built from optical switches are very attractive as optical switch architectures because of their small depth and absolute uniformity of signal loss. We investigated the vertically stacked optical banyan, which combines horizontal expansion and vertical stacking of optical banyan networks. We showed that horizontally expanded and vertically stacked optical banyan networks are usually non-blocking and crosstalk-free. We also analyzed blocking behavior and showed that the proposed method is an effective approach to studying network performance and finding a graceful compromise among hardware cost, network depth, and blocking probability. Finally, we studied highly survivable network systems using a new restoration strategy that achieves higher restoration performance than proactive restoration. We invented an active restoration strategy that compensates for switch node faults and link faults and achieves high survivability in large-scale networks.
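Ultra-Large-Scale Data Mining with Safe and Secure Volunteer Computing

KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research on Priority Areas

Institution: Tohoku University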

2006 - 2006

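This fiscal year, we focused on data clustering (DC) and neural networks (NN), which demand especially high computational performance among representative data mining methods, and examined implementation methods for running them efficiently on home game consoles. Specifically, we studied how to use the Cell Broadband Engine (CBE), a high-performance processor found in home game consoles, and graphics processing units (GPUs) effectively for data mining, and carried out implementations and quantitative performance evaluations. For large-scale P2P computing, we studied a distributed resource management mechanism that efficiently searches the vast number of idle computing resources spread across the network for resources that meet a user's requirements. We found that user requests exhibit temporal and spatial locality similar to that seen in memory access behavior, and that exploiting this locality can dramatically improve search efficiency. This year we considered resource discovery in heterogeneous environments in particular, and examined a mechanism that automatically adjusts the number of P2P connections according to frequency of use. We also advanced our work on a fully distributed dynamic load-balancing mechanism for coordinating a vast number of computers, and designed its basic control scheme. In preparation for realizing a safe and secure distributed data mining system based on tamper-resistant computation on a volunteer computing infrastructure, we set up the development environment this year. We also collected related materials and held discussions with the people involved.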
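Study on Autonomously Reconfigurable Hardware with Evolutionary Computation Functions

HORIGUCHI Susumu, KOBAYASHI Hiroaki, FUKUSHI Masaru

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research

Category: Grant-in-Aid for Exploratory Research

Institution: Tohoku University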

2004 - 2006

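With advances in VLSI technology, research on evolvable hardware, which uses programmable-interconnect logic array devices to change its functions autonomously according to the operating environment, has been attracting attention. In this study, we investigated hardware schemes capable of autonomous reconfiguration by applying evolutionary computation to practical-scale VLSI systems built with programmable logic devices such as static and dynamic FPGAs. In particular, we carried out detailed performance evaluations of reconfiguration systems based on evolutionary computation. As a result, we demonstrated the usefulness of this configuration scheme by implementing in hardware an evolutionary-computation circuit system adapted to fault-compensating reconfigurable hardware for layered neural networks, together with circuit information learned by a genetic algorithm. Next, we examined in detail an autonomously reconfigurable hardware system that can vary its neural network configuration according to fault conditions, and a fault-avoiding degradable reconfiguration system for mesh-connected processors that applies evolutionary computation. Building on our results for a fault-compensating layered neural network hardware system equipped with an FPGA-based evolutionary-computation circuit, we found that the newly devised genetic algorithm learning, circuit information, and fault-compensating neural network can autonomously change the network configuration according to the problem size and operating environment. Furthermore, we proposed and implemented a degradable reconfiguration scheme for autonomously reconfigurable mesh-connected processors based on evolutionary computation, and a fault-avoidance coding and learning scheme for genetic algorithms, and evaluated their performance. These results demonstrated the usefulness of the fault-avoiding, autonomous degradable reconfiguration scheme for mesh-connected processors based on evolutionary computation.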
An Intelligent Memory Architecture for 3D Graphics

KOBAYASHI Hiroaki, NAKAMURA Tadao, SUZUKI Ken-ichi, TAKIZAWA Hiroyuki, SANO Kentaro

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2002 - 2004

We have the following achievements. (1) High-performance graphics algorithm and its hardware: We analyzed parallelism and locality of reference in a graphics algorithm based on the global illumination model, and designed a novel rendering pipeline architecture for this algorithm. In addition, we designed and developed prototype hardware based on the architecture. Through the performance evaluation of the hardware, we showed its effectiveness for realizing interactive ray tracing. Moreover, we designed a new high-performance algorithm for generating walkthrough animations. (2) Power-efficient memory mechanism: For the design of the intelligent memory architecture for mobile devices, a low-power mechanism for the on-chip memory system was designed. In this mechanism, memory modules are activated and deactivated based on their activity during program execution. We clarified the relationship between activated memory modules and sustained performance, and showed the effectiveness of power-aware computing for on-chip cache memory. (3) Data compression algorithms for graphics hardware: We applied vector quantization to volume data sets to achieve efficient data compression, and designed a visualization algorithm that can directly visualize the compressed volume data. We also designed a novel data compression algorithm using data clustering for graphics hardware.
Low Power and Ultra High Speed Microprocessor Architectures

NAKAMURA Tadao, GOTO Gensuke, FUKASE Masaaki, KOBAYASHI Hiroaki, HAGIWARA Masafumi, SUZUKI Ken-ichi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2002 - 2004

In recent decades, microprocessor performance has improved at the cost of increased power consumption, which causes serious on-chip thermal problems. Nevertheless, since strong demand for low-power, high-performance processors still exists, a microprocessor architecture that solves this problem is required. In this research, our objective has been to establish microprocessor architectures that enable low-power and low-frequency operation by composing their modules reasonably. Firstly, we have shown a direction for future microprocessor design by defining the concept of low-power, high-speed operation. This direction is so revolutionary that the head investigator has been, and will be, invited to speak at international conferences. Based on this definition, we have proposed and evaluated several architectures for low-power microprocessors. We have also shown that fine- and coarse-grain thread parallelism should be extracted from application programs to exploit the features of the proposed architectures, and have implemented a method for obtaining this parallelism from programs. In addition, to achieve low-power, high-speed microprocessors, datapath design is essential. We have designed a datapath using wave pipelining, which enables both high-speed processing and low-power operation. We have also proposed a new cache mechanism to bridge the speed gap between the microprocessor datapath and main memory. Codebook design for information compression is a well-known application of parallel processing. We have tackled this application by investigating the possibility of reducing power consumption from both software and hardware viewpoints. We have shown the effectiveness of our architecture by implementing low-power, high-speed dedicated parallel processors.
A Study of a High-Speed and Highly-Functional Instruction Feeding Mechanism for the VLSI Architecture

SUZUKI Ken-ichi, NAKAMURA Tadao

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Category: Grant-in-Aid for Scientific Research (C)

2000 - 2002

The VLIW architecture, the most promising candidate for next-generation microprocessors, executes many instructions in parallel and therefore requires a high-performance memory system that can supply a huge number of instructions from main memory to the functional units in a short time. We introduce a high-performance instruction cache mechanism for the VLIW architecture, named the MULHI (MULtiple HIt) cache. A MULHI cache achieves a high cache hit ratio by eliminating unnecessary "nop" instructions from its cache memory array, which makes a high-bandwidth memory system possible. The MULHI cache is based on the same concept as the COMPRESS cache and the SILO cache in that all three eliminate nops from their data arrays. However, only the MULHI cache can apply cache associativity in its cache management policy to obtain a higher hit ratio. Using software simulations, we evaluate the MULHI cache miss ratio and show that it achieves higher OPC (operations per cycle) than the other cache mechanisms. Moreover, a detailed hardware design shows that the overhead of the MULHI cache control logic is very small. Consequently, the MULHI cache architecture is highly feasible for implementing a high-speed memory system for VLIW processors. Finally, as a new application of cache memory, we evaluate a real-time ray tracing system, which is remarkably powerful for rendering images.
Self-Reconfiguration Architecture of Mesh-Connected Network for Multiprocessor Systems and The Implementation

HORIGUCHI Susumu, HAYASHI Ryouko, YAMAMORI Kunihito, KOBAYASHI Hiroaki, INOGUCHI Yasushi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: Japan Advanced Institute of Science and Technology

1999 - 2001

This research deals with reconfiguring the network interconnection of mesh-connected processor arrays (mesh arrays) implemented in VLSI/WSI. For massively parallel systems, it is becoming necessary to develop a self-reconfiguration architecture that can automatically reconfigure partially faulty systems. Many reconfiguration algorithms have been proposed to date; however, most of them are not suitable for self-reconfiguration, and little of the literature actually shows a hardware implementation of the architecture. In this research, we propose a hardware-oriented self-reconfiguration architecture based on simple schemes of column bypass and south-directional rerouting, and show a hardware implementation of the proposed architecture using an FPGA. The main feature of the proposed self-reconfiguration architecture is that faulty processors are avoided by a switching mechanism whose desired function is determined automatically from the states of neighboring processors. Simulation results show that the proposed architecture achieves a higher system yield than previous architectures in rectangular mesh arrays. We also implemented the reconfiguration system on an FPGA and discussed its performance. The hardware overhead of redundant circuits such as switches and control circuits is less than 4%, where the hardware cost of a processor, including a test circuit, is 50 Kgates.
DEVELOPING A PHOTO-REALISTIC COMPUTER GRAPHICS SYSTEM

KOBAYASHI Hiroaki, KATAHIRA Masayuki, KITAJIMA Hiroyuki, NAKAMURA Tadao, SUZUKI Ken-ichi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: TOHOKU UNIVERSITY

1998 - 2000

In this research project, we produced a basic design of a graphics hardware architecture for photo-realistic image synthesis. The design is based on the object-space parallel processing model proposed by the principal investigator of the project. A prototype, named Thunder, was developed as a printed circuit board with a PCI interface, two FPGAs, each of which can implement a logic circuit of up to 200K gates, and four 256-MB SDRAMs (1 GB in total). We implemented the basic function units of Thunder, namely a 3DDDA unit, an intersection calculation unit (ICU), and a secondary ray generator, on the FPGAs, and an object memory on the SDRAMs. The maximum bandwidth between the object memory and the function units is 512 MB/s. In designing Thunder, we focused especially on optimizing the ICU. We employed fixed-point calculations instead of floating-point ones to achieve low latency and high throughput in the ICU. To avoid image quality degradation caused by fixed-point calculations, we developed a novel fixed-point intersection calculation algorithm that keeps calculation accuracy as high as possible. Through experiments, we confirmed that the image quality obtained with our fixed-point algorithm is comparable to that obtained with 64-bit floating-point calculations. In addition, we discussed performance scalability in terms of the number of ICUs. The experimental results showed that speedups of 6.4 with 8 ICUs and 11 with 16 ICUs can be obtained. In particular, for 16 ICUs running at 400 MHz, we estimated that the accelerator is 20 times faster than Pentium II-based image synthesis running at the same clock frequency. The accelerator also needs a memory bandwidth of around 100 GB/s. We believe that such a large bandwidth will become available as CMOS technology advances, for example through merged memory-logic technology.
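As a reading aid for the scalability figures in the Thunder abstract above, the short sketch below converts the reported speedups (6.4 with 8 ICUs, 11 with 16 ICUs) into parallel efficiency, taken here as speedup divided by the number of units. The helper name `parallel_efficiency` is hypothetical and introduced only for illustration; it is not part of the project.

```python
def parallel_efficiency(speedup: float, units: int) -> float:
    """Return parallel efficiency: speedup divided by the number of units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return speedup / units


# Speedups reported in the abstract, keyed by the number of ICUs.
reported_speedups = {8: 6.4, 16: 11.0}

for units, speedup in reported_speedups.items():
    efficiency = parallel_efficiency(speedup, units)
    print(f"{units} ICUs: speedup {speedup}, efficiency {efficiency:.4f}")
```

Under this definition, efficiency falls as ICUs are added, which is consistent with the abstract treating scalability as a point for discussion rather than assuming linear speedup.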
Advanced Architectures for Brain-Structured Supercomputers

NAKAMURA Tadao, FUKASE Masa-aki, KOYANAGI Mitsumasa, HASEGAWA Katsuo, KOBAYASHI Hiroaki, HAGIWARA Masafumi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: TOHOKU UNIVERSITY

1998 - 1999

In addition to high-speed data processing by supercomputing, introducing functions of the human brain has contributed to building flexible computers. Structural hints from the human brain allowed us to consider more flexible functions that lead to various kinds of data and information processing. In this research we have improved the architecture of brain-structured computers, designed an MISD processor, and implemented an example of the design. We call this architecture the SHIFT MACHINE architecture, which can be regarded as an MISD computer. This work builds on several research results, including an analysis of VLIW architectures with a cache analysis and a speculation method. A simulator of the SHIFT MACHINE is available to show its behavior visually. Related to the SHIFT MACHINE architecture, we have also developed a reconfigurable synchronous dataflow computer called SOUND. It was implemented as a chip processor and evaluated. As a result, the left-brain function has been implemented, and its speed was judged suitable under low-power conditions. The right-brain function, on the other hand, has been implemented through research on computer graphics and volume rendering. Both have been made fast and flexible by improving the algorithms. For computer graphics, the algorithm was evaluated on a commercially available parallel machine and achieved fast rendering. The same was demonstrated for volume rendering. By developing reasonable algorithms for computer graphics and volume rendering, we can achieve fast rendering suited to real-time processing. Neural network research has also been pursued further to explore the right-brain function. The results are useful for comparing artificial mechanisms with the structure of the human brain.
From these results in left-brain and right-brain research, we have discussed the integration of the two brain functions in terms of the mutual behavior of the left and right brains. We concluded that this integration depends on the processing speed of the computers that realize the left- and right-brain functions. To increase their processing speed, we have developed an architecture with software that includes special speculation. A cache mechanism has also been developed to achieve high-speed processing.
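Study on Volume Rendering Algorithms Based on Space-Partitioning Parallel Processing

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University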

1998 - 1999

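In this study, we investigated parallel algorithms that enable real-time visualization of volumes, that is, 3D data. Specifically, we implemented on a parallel computer the parallel shear-warp algorithm designed in fiscal year 1998, which uses adaptive partitioning to balance the load, and evaluated its performance. The evaluation showed that the performance of this parallel algorithm improves in proportion to the number of processors, the processing elements of the parallel computer. We also found that adaptive partitioning balances the load among processors while reducing the communication inherent in the parallel algorithm, which improves parallel efficiency. We confirmed that a parallel computer with 32 processors can generate at least 10 images of 256×256 pixels per second. In addition, to achieve photo-realistic image synthesis for scenes that mix volume data and polygon data, we improved and parallelized ray tracing and radiosity, two image synthesis methods based on global illumination models. Specifically, we integrated a model of energy exchange during light propagation through a volume with the illumination models of radiosity and ray tracing, and parallelized the integrated model on the basis of the object-space-partitioning parallel processing model. The improved parallel algorithm enables fast photo-realistic image synthesis under a global illumination model for scenes that combine polygonal objects with clouds, fog, and the like.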
Study on Massively Parallel Simulations and Visualizations

HORIGUCHI Susumu, ABE Toru, KOBAYASHI Hiroaki, ABE Masato, KAWAZOE Yoshiyuki, TANNO Kuninobu

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Category: Grant-in-Aid for Scientific Research (B)

Institution: JAPAN ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, Hokuriku

1995 - 1996

Computer simulations are important and frequently used in advanced science and technology. Advanced simulations require long computation times and huge memory capacities. In this decade, massively parallel computers have become an attractive alternative to supercomputers for advanced simulations. Parallelized simulations, however, have not been studied sufficiently, and visualization of simulation results has also become more important recently. This study on massively parallel simulations and visualizations focuses on advanced scientific fields such as physics, chemistry, materials science, fluid dynamics, and neural networks. We implemented many parallelized simulations on the massively parallel computers CM-5, ncube2, and Parsytec GC, and demonstrated visualization of simulation results with computer graphics. To achieve high-speed parallel simulations, we developed a dynamic load-balancing method for molecular dynamics on the CM-5, a global communication method for room illumination simulation by radiosity on the Parsytec GC, and a packed message-passing method for neural networks on the ncube2. We also show effective visualizations of huge simulation data sets using 3D computer graphics.
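Study on a TLB-Unified Cache Memory System

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University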

1995 - 1995

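In this study, we attempted to reduce the chip area occupied by the TLB and the cache memory, which are implemented separately on a microprocessor chip and account for a large share of its area, by unifying them through tag sharing. We also examined whether reusing the freed area to enlarge the TLB could reduce memory access cycles. First, we defined the organization and control method of the TLB-unified cache memory and evaluated its hardware cost in register-bit equivalents. We found that the TLB-unified cache memory substantially reduces hardware cost compared with a conventional cache and TLB organization. When the saved hardware is reused to expand the TLB, a 16-entry TLB can be expanded by a factor of 2 for a 4 KB cache, 4 for an 8 KB cache, 8 for 16 KB and 32 KB caches, and 16 for a 128 KB cache. Next, we evaluated the performance of the TLB-unified cache memory by trace-driven simulation. We recorded the memory accesses made while eight practical application programs executed 8 million instructions on a workstation, and fed these traces, as the memory accesses required for instruction execution, into a simulator of the TLB-unified cache memory and a simulator of a conventional TLB and cache. From the memory accesses through the cache and TLB in each simulator, we obtained the respective miss rates, and from the miss rates we derived the average number of memory cycles needed to execute one instruction. The simulations showed that reusing the hardware area saved by unifying the TLB and cache to expand the TLB reduces the number of memory cycles compared with a conventional organization requiring the same amount of hardware.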
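The abstract above derives the average number of memory cycles per instruction from simulated cache and TLB miss rates but does not state the formula used. The following is a minimal sketch under an assumed additive penalty model; the function name, the default cycle counts, and the example miss rates are all hypothetical and are not taken from the study.

```python
def avg_memory_cycles_per_instruction(
    accesses_per_instruction: float,
    cache_miss_rate: float,
    tlb_miss_rate: float,
    hit_cycles: float = 1.0,           # assumed cost of an access that hits
    cache_miss_penalty: float = 20.0,  # assumed extra cycles per cache miss
    tlb_miss_penalty: float = 30.0,    # assumed extra cycles per TLB miss
) -> float:
    """Average memory cycles per instruction under an additive penalty model."""
    for rate in (cache_miss_rate, tlb_miss_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("miss rates must lie in [0, 1]")
    cycles_per_access = (
        hit_cycles
        + cache_miss_rate * cache_miss_penalty
        + tlb_miss_rate * tlb_miss_penalty
    )
    return accesses_per_instruction * cycles_per_access


# Hypothetical comparison: same cache miss rate, but a larger TLB
# (lower TLB miss rate) in the unified organization.
baseline = avg_memory_cycles_per_instruction(1.3, 0.05, 0.02)
expanded_tlb = avg_memory_cycles_per_instruction(1.3, 0.05, 0.005)
print(f"conventional: {baseline:.3f} cycles/instruction")
print(f"expanded TLB: {expanded_tlb:.3f} cycles/instruction")
```

With the cache miss rate held fixed, any reduction in the TLB miss rate lowers the result in this model, which mirrors the qualitative conclusion of the abstract; the actual magnitudes depend on the penalties and traces used in the study.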
Self-Reconfigurable Massively Parallel Computer on Stacked Wafers

HORIGUCHI Susumu, NUMATA Issei, ABE Touru, TANNO Kuninobu, KOBAYASHI Hiroaki, ASO Hirotomo, JAIN Vijay k., KIM Jung h., TAKETA Hiroshi, SHIMODAIRA Hiroshi, THOMAS Knight jr., FABRIZIO Lambardi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

Category: Grant-in-Aid for international Scientific Research

Institution: JAPAN ADVANCED INSTITUTE of SCIENCE and Technology, Hokuriku

1993 - 1995

This research deals with a 3D-mesh array on stacked wafers and its fault-tolerant architecture. The 3D-mesh array architecture provides self-reconfiguration of interconnections using a recursive shift scheme. Anuj Chandra et al. also proposed a reconfiguration algorithm for the 3D 1/ track model based on a compensation path scheme originally proposed by S.Y.Kung et al. The 3D 1/ track model was, however, discussed only from the theoretical viewpoint of extending the 2D 1/ track model. This paper examines its fault-tolerant performance to obtain the system yield of a 3D-mesh array using a self-reconfiguration scheme. First, we review recent WSI devices for constructing massively parallel computers and summarize the merits of WSI parallel computers. Next, we deal with the mesh-connected multiprocessor architecture and reconfiguration strategies to enhance the array yield for WSI implementation. The reconfiguration performance of a mesh-connected parallel computer is discussed by comparing it with previous work. WSI implementation of cube-connected cycles (CCC) is addressed, and its yield performance is discussed taking into account the chip area of the PEs, switches, and links. We also propose a new interconnection network, HCQ, based on a crossed-cube interconnection, to reduce the diameter and the average distance of the interconnection network. The excellent network properties of HCQ are investigated theoretically. Finally, we discuss a 3D-mesh array on stacked wafers for massively parallel computers. A reconfiguration algorithm based on a recursive shift scheme is proposed. Applying the recursive shift scheme to a 3D-mesh array, we show that reconfiguration performance becomes high, making it possible to construct a massively parallel computer on stacked wafers such as the 3D-mesh array.
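Study on Massively Parallel Systems for Photo-Realistic Image Synthesis

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University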

1994 - 1994

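Toward a massively parallel system for photo-realistic image synthesis, this study proposed a new global illumination model as its foundation and examined a system organization and control scheme based on that model. First, to realize a new massively parallel photo-realistic image synthesis scheme on a memory model in which object data are distributed among the processors, we focused on object-space-partitioning parallel processing and applied it to a global illumination model that integrates ray tracing and radiosity, devising a new massively parallel photo-realistic image synthesis algorithm. Next, we examined a massively parallel computer architecture suited to this algorithm and specified the system organization and its control method. Finally, to evaluate the system, we developed a simulator capable of register-transfer-level simulation and evaluated performance by generating several test images. The evaluation showed that, up to about 256 processors, processing time decreases in proportion to the number of processors, so the system scales with processor count. Examining system utilization, we found that high utilization is achieved with 256 or fewer processors, but utilization drops markedly in systems with more processors. The reason is that the parallel algorithm devised in this study uses static load balancing, in which the object definition space is partitioned statically and assigned uniformly to processors; when the number of processors increases without a correspondingly fine spatial partition, the load becomes uneven and processor utilization becomes imbalanced. Avoiding this appears to require either a finer spatial partition or dynamic load balancing that reassigns tasks according to processor utilization at run time. This is the most important issue for future work.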
Studies in Brain-Structured Supercomputers

NAKAMURA Tadao, SUGIMOTO Osamu, KOBAYASHI Hiroaki, HAGIWARA Masafumi, GOTOH Eisuke, FUKASE Masa-aki, HASEGAWA Katsuo, FLYNN Michael j.

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

Category: Grant-in-Aid for international Scientific Research

Institution: TOHOKU UNIVERSITY

1993 - 1994

We planned this joint research project between Tohoku University and Stanford University with the aim of analyzing, synthesizing, and evaluating the performance of brain-structured supercomputers. During the fourth academic year of the project, ten meetings were held. The meetings were very productive for the progress of the project, and we published many papers on brain-structured supercomputers this year. The main subjects discussed extensively in these meetings were as follows. 1. Synthesis of brain-structured supercomputers: The model of a brain-structured supercomputer has been diagrammed using mind computer, expression recognition associative memory, wave cybernetics, artificial cochlea, sparse adaptive memory, wave pipeline, jet pipeline, logical architecture, symbolic architecture, functional architecture, distributed shared memory multiprocessor, computer graphics, etc. 2. Architecture of the sparse adaptive memory: The wave pipeline has been designed using a CMOS VLSI vector unit. Advances and problems in high-speed processor design have been made clear. 3. Analysis of the RIGHT computer: The methodologies developed so far have been further investigated to implement possible brain functionalities on supercomputers from various aspects, such as the memory system, the conversion system from input to memory reference frame, and mechanisms of heuristics. 4. Architecture of superparallel symbol processing: A book has been published that describes the basis of the VLSI architecture for superparallel symbol processing. 5. Performance evaluation of the LEFT and RIGHT computers: The whole computer graphics system has been investigated using multiprocessing techniques for ray tracing and multipass rendering. 6. Study of neural networks toward the RIGHT computer: Four papers have been published on the combination of neural networks and fuzzy inference and on knowledge processing by distributed representation.
Basic Studies in Supercomputer Organization and Performance

NAKAMURA Tadao, SUGIMOTO Osamu, KOBAYASHI Hiroaki, HAGIWARA Masafumi, GOTOH Eisuke, FUKASE Masa-aki, HASEGAWA Katsuo, FLYNN Michael J

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

Category: Grant-in-Aid for international Scientific Research

Institution: TOHOKU UNIVERSITY

1991 - 1992

Through 1991-1992, we continued the international joint research project entitled "Basic Studies in Supercomputer Organization and Performance," supported by the Ministry of Education of Japan. With this support we have made progress in research on supercomputers and have extended our research from the fundamental field of neuroscience to computer science and technology. In neuroscience in particular, our group aims at a next-generation computer, based on neural networks and their applications, that includes flexibility of thinking. We call this image (meaning)-oriented computer the RIGHT computer, after the right brain of the human being. The research field of the RIGHT computer covers the input/output devices of conventional computers in addition to the original functions of the right brain. Here, an original function means, for example, forming a concept of something, which is a step toward realizing real artificial intelligence or an artificial brain. To develop these fields, we have studied a concept of a mind-oriented computer; sparse distributed memory for pattern recognition and a variable-resolution, nonlinear silicon cochlea for speech recognition in the input device category; computer graphics in the output device category; and expression recognition using a trained neural network. In numerical calculations for scientific applications, the function of the left brain is expected to operate at the highest processing rate. Conventional von Neumann computers are LEFT computers in the sense of the left brain. Wave pipelining, which increases the clock frequency in practical circuits without increasing the number of storage elements, has been proposed to speed up calculations. A novel supercomputing architecture called the Jet Pipeline, whose feature is to integrate all the possible features used in conventional computers, has been proposed and advanced. Finally, in terms of theoretical models of computation, a functional programming language has been examined.
Research on Parallel Modeling in Object-Oriented Ray Tracing

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: TOHOKU UNIVERSITY

1990 - 1990
Research on an Object-Oriented Parallel Ray Tracing System

KOBAYASHI Hiroaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: TOHOKU UNIVERSITY

1989 - 1989

Social Activities 10

7th Teraflop Workshop

2007/11/21 - 2007/11/22

International academic lecture meeting on supercomputers and their applications
5th Teraflop Workshop

2006/11/20 - 2006/11/21

International academic lecture meeting on supercomputers and their applications
Used for Tsunami Damage Prediction: The Diverse Roles of Supercomputers

2015/06/06 -
Japan Concludes Exascale Feasibility Study

2014/12/03 -
Tsunami Inundation Areas Predicted in 20 Minutes: Tohoku University and Others Use Supercomputer

2014/08/03 -
Tohoku University and NEC: Joint Research Organization for Next-Generation Supercomputer Technology

2014/06/29 -
Feasibility Study of Advanced Vector Architecture System toward Exascale at Cyberscience Center, Tohoku University, Japan

2013/05/13 -
The Future Envisioned by Tohoku University's Supercomputer, Which Overcame the Earthquake Disaster

2011/10/28 -
Lecture at Sendai Ikuei Gakuen Shukoh Secondary School

2006/12/14 -

Visiting lecture at a high school
Sendai City Medical Association Academic Lecture

2006/04/19 -

Technical lecture for physicians

Media Coverage 7

Fountain of Science: "Supercomputers Opening Up the Future (1)-(9)"

Kahoku Shimpo

2015/05

Type: Newspaper, magazine
Visualizing Disasters in 3D for Use in Tsunami Inundation Prediction: Tohoku University

Kahoku Shimpo, NHK

2014/06/29

Type: Newspaper, magazine
The "New Industrial Revolution" Brought About by Ultra-High-Speed Computing: The Future Opened Up by the K Supercomputer

NHK

2013/01/08

Type: TV or radio program
A Ray of Hope for the Revival of Vector-Type Machines

Nikkei Sangyo Shimbun

2007/12/25

Type: Newspaper, magazine
World's Top-Performing Supercomputer: Tohoku University's "SX-7"

Asahi Shimbun

2005/02/24

Type: Newspaper, magazine
Supercomputer: Tohoku University Ranks First in the World in Performance

NHK General TV

2005/02/09

Type: TV or radio program
Tohoku University Supercomputer Ranks First in the World in Measured Performance

Kahoku Shimpo

2005/01/24

Type: Newspaper, magazine

Other 8

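Demonstration Project for Building a "Tsunami L-Alert" by Linking a Real-Time Tsunami Prediction System with L-Alert and for Advancing Disaster Response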

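For use when a large earthquake occurs, we researched and developed a real-time tsunami inundation damage prediction system covering all of Japan, in which real-time tsunami simulations on supercomputers installed at remote sites operate in a complementary manner. By providing the simulation results through L-Alert, we made it possible to deliver them to local governments nationwide.
Demonstration Project for Strengthening the Disaster Mitigation Capability of Local Governments through Real-Time Tsunami Inundation and Damage Prediction and Disaster Information Delivery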

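Research and development of a system that links earthquake observation data with real-time simulation on a supercomputer in order to deliver tsunami inundation damage prediction information to the relevant local governments within 20 minutes of an earthquake.
Feasibility Study on Future HPCI Systems Suited to High-Memory-Bandwidth Applications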

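This project aims to acquire the technical knowledge needed to realize a future supercomputer system, required around 2018, that supports the advanced manufacturing essential to building a safe and secure society in Japan and to strengthening the international competitiveness of industry. It identifies the technical challenges in applications, system architecture, system software, and device technology, examines the elemental technologies needed to resolve them, conducts system design research, and investigates what future HPCI systems should be.
Facility Expansion to Support Industrial Use and Broaden the User Base of the HPCI Centered on the K Computer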

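Research and development on expanding the advanced computing facilities that support HPCI and on enhancing their usage environment.
Application Performance Evaluation of a Vector Mechanism with a Programmable Cache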

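Co-design of an on-chip memory mechanism and the software techniques for using it, as a technology for accelerating simulation programs.
Creation of a 3D VLSI System with Self-Repair Functions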

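This research project addresses the dependability of in-vehicle image processing systems. Based on an approach to improving dependability from the architecture and OS levels, the research proceeds from three perspectives: overall system design that takes into account the image processing and recognition capabilities and the requirements needed to realize a dependable image processing system; hardware technologies such as reconfigurable logic with diagnosis and repair functions; and VM-based dependable software technology. The research as a whole is divided into three areas (image processing systems, software technology, and hardware technology) and proceeds under a shared research structure that allows close cooperation among the areas.
Ultra-High-Precision Biological Function Measurement System Based on Analysis Coupled with Ultrasonic Measurement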

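In the research and development of a system that performs high-precision biological function measurement at high speed by fusing supercomputer simulation analysis with data from ultrasonic measurement instruments, responsible for the design and development of the interface between the supercomputer and the measurement instruments.
Research and Development of a Safe, Secure, and Low-Cost Ubiquitous Computing Platform for Creating an ICT Eco-Society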

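Aiming to establish an ecology model in the information and communications field, this project researches and develops a safe, secure, and low-cost volunteer computing infrastructure for the ubiquitous era that makes use of the computing resources found throughout society. In particular, it studies higher efficiency and higher reliability in volunteer computing, as well as incentive models that encourage participation, and establishes the fundamental technology for inexpensively providing a large-scale computing infrastructure that can be used even for highly confidential computation and whose scale would be difficult to achieve with conventional implementation techniques.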
