Details of the Researcher

Hiroaki Kobayashi
Section
Graduate School of Information Sciences
Job title
Professor
Degree
  • Doctor of Engineering (Tohoku University)

Research History 14

  • 2018/07 - Present
    Tohoku University

  • 2016/04 - Present
    Tohoku University Graduate School of Information Sciences Professor

  • 2017/04 - 2019/03
    Tohoku University Graduate School of Information Sciences, Chair of the Department of Computer and Mathematical Sciences

  • 2012/04 - 2016/03
    Tohoku University, Member of the Education and Research Council

  • 2008/04 - 2016/03
    Tohoku University

  • 2008/04 - 2016/03
    Tohoku University Cyberscience Center

  • 2008/04 - 2016/03
    Tohoku University Information Synergy Organization, Deputy Director

  • 2006/10 - 2016/03
    National Institute of Informatics Visiting Professor

  • 2002/12 - 2008/03
    Tohoku University

  • 2001/10 - 2008/03
    Tohoku University Information Synergy Center, Professor

  • 1995/10 - 2002/01
    Stanford University, Department of Electrical Engineering, Computer Systems Laboratory, Visiting Associate Professor

  • 1993/04 - 2001/09
    Tohoku University Graduate School of Information Sciences, Associate Professor

  • 1991/04 - 1993/03
    Tohoku University Faculty of Engineering, Lecturer

  • 1988/04 - 1991/03
    Tohoku University Faculty of Engineering, Research Associate

Education 2

  • Tohoku University Graduate School, Division of Engineering Department of Information Engineering

    - 1988/03/25

  • Tohoku University Faculty of Engineering, Communication Engineering

    - 1983/03/25

Committee Memberships 29

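  • Ministry of Education, Culture, Sports, Science and Technology (MEXT), Council for Science and Technology, Expert Member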

    2021/04 - Present

  • Osaka University Cybermedia Center, Nationwide Joint Use Steering Committee, Member

    2014/04 - Present

  • Science Council of Japan, Associate Member

    2014/04 - Present

  • Editorial Board of International Journal of Networked and Distributed Computing Member

    2011/03 - Present

  • Workshop on Sustained Simulation Performance Organizing Committee Chair

    2006/10 - Present

  • Ministry of Education, Culture, Sports, Science and Technology (MEXT), HPCI Plan Promotion Committee, Member

    2017/03 - 2025/03

  • HPCI Consortium, Vice President and Vice Chair

    2020/04 - 2024/05

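  • Priority Issue 8, "Development of Innovative Design and Production Processes that Lead the Way for the Manufacturing Industry in the Near Future", Advisory Committee, Chair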

    2015/04 - 2020/03

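  • Post-K Priority Issue, "Development of Integrated Simulation Systems for Hazard and Disaster Induced by Earthquake and Tsunami", Steering Committee, Member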

    2015/04 - 2020/03

  • HPCI Consortium, Director

    2014/04 - 2018/03

  • JST CREST, "Development of System Software Technologies for Post-Peta Scale High Performance Computing", Research Area Advisor

    2012/04 - 2018/03

  • IEEE COOL Chips, Organizing Committee Chair

    2011/04 - 2017/04

  • HPCI Collaborative Service Committee, Chair

    2013/04 - 2016/03

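  • Next-Generation Supercomputer Strategic Program, Field 3 "Projection of Planet Earth Variations for Mitigating Natural Disasters", Steering Committee, Member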

    2013/04 - 2016/03

  • National Institute of Informatics, Organization for Science Network Operations and Coordination, Member

    2012/04 - 2016/03

  • HPCI Collaborative Service Committee, Member

    2011/04 - 2016/03

  • Hokkaido University Information Initiative Center, External Evaluation Committee, Member

    2014/04 - 2015/03

  • Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Department Evaluation Committee, Department Evaluation Advisor

    2012/04 - 2015/03

  • Research Organization for Information Science and Technology (RIST), Interdisciplinary Joint Research Working Group, Member

    2013/04 - 2014/03

  • Information Processing Society of Japan (IPSJ), Representative Member

    2012/04 - 2014/03

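  • Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN), Joint Research Project Review Committee, Chair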

    2012/04 - 2014/03

  • Information Processing Society of Japan (IPSJ) Tohoku Branch, Branch Chair

    2012/04 - 2014/03

  • Council of Joint Usage/Research Centers of National Universities, Officer

    2012/04 - 2014/03

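  • Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN), Joint Research Project Review Committee, Chair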

    2012/04 - 2014/03

  • HPCI Consortium, Auditor

    2012/04 - 2014/03

  • Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Executive Committee, Chair

    2013/04 - 2013/08

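  • Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Forum for Integrated Research on Environmental and Social Systems, Member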

    2012/04 - 2013/03

  • Grants-in-Aid for Scientific Research Committee, Expert Member

    2011/04 - 2013/03

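  • Tokyo Institute of Technology Global Scientific Information and Computing Center (GSIC), External Evaluation Committee, Member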

    2014/04 -

Professional Memberships 4

  • Association for Computing Machinery (ACM)

  • Institute of Electrical and Electronics Engineers (IEEE)

  • Information Processing Society of Japan (IPSJ)

  • Institute of Electronics, Information and Communication Engineers (IEICE)

Research Interests 2

  • Computer Architectures

  • Supercomputers

Research Areas 4

  • Informatics / High-performance computing / Supercomputers

  • Informatics / Software

  • Informatics / Information networks

  • Informatics / Computer systems

Awards 10

  1. Best Paper Award

    2020/11 The Eighth International Symposium on Computing and Networking (CANDAR'20) Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint

  2. Best Poster Winner HPC-in-Asia

    2019 A Skewed Multi-Bank Cache for Vector Processors

  3. Best Paper Award of PaCT, 2019

    2019 Analysis of relationship between SIMD-processing features used in NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors

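  4. FY2018 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Prize for Science and Technology (Development Category)

    2018/04 Ministry of Education, Culture, Sports, Science and Technology (MEXT)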

  5. 2018 All-NUA Case Study Paper Technical Contribution Award

    2018 Basic Performance Evaluation of the New Vector Processor SX-Aurora TSUBASA

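  6. Minister of Education, Culture, Sports, Science and Technology Award, Commendation for Individuals Contributing to the Promotion of Informatization

    2017/10 Ministry of Education, Culture, Sports, Science and Technology (MEXT)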

  7. Japan Resilience Award 2016, Excellence Award

    2016

  8. Best Paper Award

    2015 Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework

  9. Best Paper Award at the 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA’04)

    2004/12/13 The 2nd International Symposium on Parallel and Distributed Processing and Applications (ISPA’04)

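  10. IP Design Award, Research Grant Prize

    2002/05/29 Nikkei Business Publications, Inc. Research and Development of a Real-Time Ray Tracing Engine Based on the 3DCGiRAM Architecture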

Papers 429

  1. A Graph-based Molecular Structure Identification Method via Feature Extraction for Three-dimensional Electron Diffraction Data

    Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Saori Maki-Yonekura, Hirofumi Kurokawa, Koji Yonekura, Hiroaki Kobayashi

    2024 Twelfth International Symposium on Computing and Networking Workshops (CANDARW) 325-329 2024/11/26

    Publisher: IEEE

    DOI: 10.1109/candarw64572.2024.00060  

  2. Adaptive Parallelization based on Frame-level and Tile-level Parallelisms for VVC Encoding

    Karin Onouchi, Masayuki Sato, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

    2024 Twelfth International Symposium on Computing and Networking (CANDAR) 87-95 2024/11/26

    Publisher: IEEE

    DOI: 10.1109/candar64496.2024.00018  

  3. An Ising-based Decision Method for Intra Prediction Mode in Video Coding

    Takuto Momominami, Naoya Niwa, Masahito Kumagai, Kazuhiko Komatsu, Hiroaki Kobayashi, Hiroe Iwasaki

    SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 1748-1754 2024/11/17

    Publisher: IEEE

    DOI: 10.1109/scw63240.2024.00218  

  4. File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-Core Direct Numerical Simulation Code of Turbulence

    Yuto Hatanaka, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Akihiro Musa, Takashi Ishihara, Atsuya Uno, Kazuhiko Komatsu, Hiroaki Kobayashi, Mitsuo Yokokawa

    Computational Science – ICCS 2024 173-187 2024/06/30

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-63778-0_13  

    ISSN: 0302-9743

    eISSN: 1611-3349

  5. An Asymptotic Parallel Linear Solver and Its Application to Direct Numerical Simulation for Compressible Turbulence

    Mitsuo Yokokawa, Taiki Matsumoto, Ryo Takegami, Yukiya Sugiura, Naoki Watanabe, Yoshiki Sakurai, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

    Computational Science – ICCS 2024 383-397 2024/06/27

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-63751-3_26  

    ISSN: 0302-9743

    eISSN: 1611-3349

  6. Prediction of Steam Turbine Blade Erosion Using CFD Simulation Data and Hierarchical Machine Learning

    Issei Fukamizu, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Journal of Engineering for Gas Turbines and Power 1-10 2024/06/25

    Publisher: ASME International

    DOI: 10.1115/1.4065815  

    ISSN: 0742-4795

    eISSN: 1528-8919

    Information on the degree of blade erosion is vital for the efficient operation of steam turbines. However, it is nearly impossible to directly measure the degree of blade erosion during operation. Moreover, collecting sufficient data of eroded cases for predictive analysis is challenging. Therefore, this paper proposes a blade erosion prediction method using numerical simulation and machine learning. Pressure data of several blade erosion cases are collected from the numerical turbine simulation. The machine learning approach involves training on collected simulation data to predict the degree of erosion for the first-stage stator (1S) and the first-stage rotor blade (1R) from internal pressure data. The proposed erosion prediction model employs a two-step hierarchical approach. First, the proposed model predicts the 1S erosion degree using the k-NN (k-Nearest Neighbor) regression. Second, the proposed model estimates the 1R erosion degree with Linear Regression models. These models are tailored for each of the 1S erosion degrees, utilizing pressure data processed through Fast Fourier Transform (FFT). The evaluation shows that the proposed method achieves the prediction of the 1S erosion with a Mean Absolute Error (MAE) of 0.000693 mm, and the 1R erosion with an MAE of 0.458 mm. The evaluation results indicate that the proposed method can accurately capture the degree of turbine blade erosion from internal pressure data. As a result, the proposed method suggests that the erosion prediction method can be effectively used to determine the optimal timing for Maintenance and Repair Operations (MRO).

  7. Quantum annealing-based algorithm for lattice gas automata

    Yuichi Kuya, Kazuhiko Komatsu, Kouki Yonaga, Hiroaki Kobayashi

    Computers and Fluids 274 2024/04/30

    DOI: 10.1016/j.compfluid.2024.106238  

    ISSN: 0045-7930

  8. A Constraint Partition Method for Combinatorial Optimization Problems Peer-reviewed

    Makoto Onoda, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

    In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) 600-607 2023/12

    DOI: 10.1109/MCSoC60832.2023.00093  

  9. Appropriate Graph-Algorithm Selection for Edge Devices Using Machine Learning Peer-reviewed

    Yusuke Fukasawa, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    In Proceedings of 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) 544-551 2023/12

    DOI: 10.1109/MCSoC60832.2023.00086  

  10. Multi-scale Loss based Electron Microscopic Image Pair Matching Method Peer-reviewed

    Chunting Duan, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    In Proceedings of 22nd IEEE International Conference on Machine Learning and Applications 1957-1964 2023/12

    DOI: 10.1109/ICMLA58977.2023.00295  

  11. Investigating the Characteristics of Ising Machines Peer-reviewed

    Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Hiroaki Kobayashi

    Proceedings of IEEE International Conference on Quantum Computing and Engineering 2023/09

    DOI: 10.1109/QCE57702.2023.00108  

  12. Performance Evaluation of Tsunami Evacuation Route Planning on Multiple Annealing Machines

    Yihui Liu, Kazuhiko Komatsu, Masahito Kumagai, Masayuki Sato, Hiroaki Kobayashi

    Proceedings of the 20th ACM International Conference on Computing Frontiers 2023/05/09

    Publisher: ACM

    DOI: 10.1145/3587135.3592193  

  13. I/O Performance Evaluation of a Memory-Saving DNS Code on SX-Aurora TSUBASA

    Mitsuo Yokokawa, Yuki Yamane, Kenta Yamaguchi, Takashi Soga, Taiki Matsumoto, Akihiro Musa, Kazuhiko Komatsu, Takashi Ishihara, Hiroaki Kobayashi

    2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2023/05

    Publisher: IEEE

    DOI: 10.1109/ipdpsw59300.2023.00117  

  14. Ising-Based Kernel Clustering

    Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Algorithms 16 (4) 214-214 2023/04/19

    Publisher: MDPI AG

    DOI: 10.3390/a16040214  

    eISSN: 1999-4893

    Combinatorial clustering based on the Ising model is drawing attention as a high-quality clustering method. However, conventional Ising-based clustering methods using the Euclidean distance cannot handle irregular data. To overcome this problem, this paper proposes an Ising-based kernel clustering method. The kernel clustering method is designed based on two critical ideas. One is to perform clustering of irregular data by mapping the data onto a high-dimensional feature space by using a kernel trick. The other is the utilization of matrix–matrix calculations in the numerical libraries to accelerate preprocess for annealing. While the conventional Ising-based clustering is not designed to accept the transformed data by the kernel trick, this paper extends the availability of Ising-based clustering to process a distance matrix defined in high-dimensional data space. The proposed method can handle the Gram matrix determined by the kernel method as a high-dimensional distance matrix to handle irregular data. By comparing the proposed Ising-based kernel clustering method with the conventional Euclidean distance-based combinatorial clustering, it is clarified that the quality of the clustering results of the proposed method for irregular data is significantly better than that of the conventional method. Furthermore, the preprocess for annealing by the proposed method using numerical libraries is faster by a factor of up to 12.4 million than the conventional naive Python implementation. Comparisons between Ising-based kernel clustering and kernel K-means reveal that the proposed method has the potential to obtain higher-quality clustering results than the kernel K-means as a representative of the state-of-the-art kernel clustering methods.

  15. Analysis of Precision Vectors for Ising-Based Linear Regression

    Kaho Aoyama, Kazuhiko Komatsu, Masahito Kumagai, Hiroaki Kobayashi

    Parallel and Distributed Computing, Applications and Technologies 251-261 2023/04/08

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-29927-8_20  

    ISSN: 0302-9743

    eISSN: 1611-3349

  16. A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders

    Masayuki Sato, Yuya Omori, Ryusuke Egawa, Ken Nakamura, Daisuke Kobayashi, Hiroe Iwasaki, Kazuhiko Komatsu, Hiroaki Kobayashi

    Parallel and Distributed Computing, Applications and Technologies 288-300 2023/04/08

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-29927-8_23  

    ISSN: 0302-9743

    eISSN: 1611-3349

  17. Performance evaluation of parallel direct numerical simulation code on supercomputer SX-Aurora TSUBASA

    Mitsuo Yokokawa, Yujiro Takenaka, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

    Computers & Fluids 261 105913-105913 2023/04

    Publisher: Elsevier BV

    DOI: 10.1016/j.compfluid.2023.105913  

    ISSN: 0045-7930

  18. Rapid and quantitative uncertainty estimation of coseismic slip distribution for large interplate earthquakes using real-time GNSS data and its application to tsunami inundation prediction

    Keitaro Ohno, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Takashi Abe, Hiroaki Kobayashi

    Earth, Planets and Space 74 (1) 2022/12

    Publisher: Springer Science and Business Media LLC

    DOI: 10.1186/s40623-022-01586-6  

    eISSN: 1880-5981

    This study proposes a new method for the uncertainty estimation of coseismic slip distribution on the plate interface deduced from real-time global navigation satellite system (GNSS) data and explores its application for tsunami inundation prediction. Jointly developed by the Geospatial Information Authority of Japan and Tohoku University, REGARD (REal-time GEONET Analysis system for Rapid Deformation monitoring) estimates coseismic fault models (a single rectangular fault model and slip distribution model) in real time to support tsunami prediction. The estimated results are adopted as part of the Disaster Information System, which is used by the Cabinet Office of the Government of Japan to assess tsunami inundation and damage. However, the REGARD system currently struggles to estimate the quantitative uncertainty of the estimated result, although the obtained result should contain both observation and modeling errors caused by the model settings. Understanding such quantitative uncertainties based on the input data is essential for utilizing this resource for disaster response. We developed an algorithm that estimates the coseismic slip distribution and its uncertainties using Markov chain Monte Carlo methods. We focused on the Nankai Trough of southwest Japan, where megathrust earthquakes have repeatedly occurred, and used simulation data to assume a Hoei-type earthquake. We divided the 2951 rectangular subfaults on the plate interface and designed a multistage sampling flow with stepwise perturbation groups. As a result, we successfully estimated the slip distribution and its uncertainty at the 95% confidence interval of the posterior probability density function. Furthermore, we developed a new visualization procedure that shows the risk of tsunami inundation and the probability on a map. Under the algorithm, we regarded the Markov chain Monte Carlo samples as individual fault models and clustered them using the k-means approach to obtain different tsunami source scenarios. We then calculated the parallel tsunami inundations and integrated the results on the map. This map, which expresses the uncertainties of tsunami inundation caused by uncertainties in the coseismic fault estimation, offers quantitative and real time insights into possible worst-case scenarios.

  19. Page-Address Coalescing of Vector Gather Instructions for Efficient Address Translation Peer-reviewed

    Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    Proceedings of 2022 IEEE/ACM 12th Workshop on Irregular Applications: Architectures and Algorithms (IA3) 1-8 2022/11

    DOI: 10.1109/IA356718.2022.00007  

  20. A hierarchical wavefront method for LU-SGS

    Kazuhiko Komatsu, Yuta Hougi, Masayuki Sato, Hiroaki Kobayashi

    Computers & Fluids 245 105572-105572 2022/06

    Publisher: Elsevier BV

    DOI: 10.1016/j.compfluid.2022.105572  

    ISSN: 0045-7930

  21. High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA

    Ilya Afanasyev, Kazuhiko Komatsu, Dmitry Lichmanov, Vadim Voevodin, Hiroaki Kobayashi

    2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2022/05

    Publisher: IEEE

    DOI: 10.1109/ipdpsw55747.2022.00050  

  22. An Efficient Reference Image Sharing Method for the Image-division Parallel Video Encoding Architecture

    Ken Nakamura, Yuya Omori, Daisuke Kobayashi, Koyo Nitta, Kimikazu Sano, Masayuki Sato, Hiroe Iwasaki, Hiroaki Kobayashi

    IEICE Transactions on Electronics (advance publication) 2022

    Publisher: The Institute of Electronics, Information and Communication Engineers

    DOI: 10.1587/transele.2022lhp0002  

    ISSN: 0916-8524

    eISSN: 1745-1353

    This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. This method efficiently reduces the amount of data transfer by using pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that the data transfer can be reduced to 19.8-35.3% of the conventional method on average without major degradation of coding performance. This makes it possible to reduce the required bandwidth of the inter-chip transfer interface by saving the amount of data transfer.

  23. Optimizations of a Linear Matrix Solver in a Composite Simulation for a Vector Computer

    Zhilin He, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) 2021/12/10

    Publisher: IEEE

    DOI: 10.1109/paap54281.2021.9720445  

  24. A dynamic parameter tuning method for SpMM parallel execution Peer-reviewed

    Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Concurrency and Computation: Practice and Experience 2021/12/09

    Publisher: Wiley

    DOI: 10.1002/cpe.6755  

    ISSN: 1532-0626

    eISSN: 1532-0634

  25. Ising-Based Combinatorial Clustering Using the Kernel Method

    Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) 2021/12

    Publisher: IEEE

    DOI: 10.1109/mcsoc51149.2021.00037  

  26. Real-time automatic uncertainty estimation of coseismic single rectangular fault model using GNSS data Peer-reviewed

    Keitaro Ohno, Yusaku Ohta, Satoshi Kawamoto, Satoshi Abe, Ryota Hino, Shunichi Koshimura, Akihiro Musa, Hiroaki Kobayashi

    Earth, Planets and Space 73 (1) 2021/12

    Publisher: Springer Science and Business Media LLC

    DOI: 10.1186/s40623-021-01425-0  

    ISSN: 1343-8832

    eISSN: 1880-5981

  27. An Externally-Constrained Ising Clustering Method for Material Informatics

    Kazuhiko Komatsu, Masahito Kumagai, Ji Qi, Masayuki Sato, Hiroaki Kobayashi

    2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW) 2021/11

    Publisher: IEEE

    DOI: 10.1109/candarw53999.2021.00040  

  28. Register Flush-free Runahead Execution for Modern Vector Processors

    Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2021/10

    Publisher: IEEE

    DOI: 10.1109/sbac-pad53543.2021.00023  

  29. Detection of Machinery Failure Signs From Big Time-Series Data Obtained by Flow Simulation of Intermediate-Pressure Steam Turbines Peer-reviewed

    Kazuhiko Komatsu, Hironori Miyazawa, Cheng Yiran, Masayuki Sato, Takashi Furusawa, Satoru Yamamoto, Hiroaki Kobayashi

    Journal of Engineering for Gas Turbines and Power 144 (1) 2021/08/13

    Publisher: ASME International

    DOI: 10.1115/1.4052142  

    ISSN: 0742-4795

    eISSN: 1528-8919

    The periodic maintenance, repair, and overhaul (MRO) of turbine blades in thermal power plants are essential to maintain a stable power supply. During MRO, older and less-efficient power plants are put into operation, which results in wastage of additional fuels. Such a situation forces thermal power plants to work under off-design conditions. Moreover, such an operation accelerates blade deterioration, which may lead to sudden failure. Therefore, a method for avoiding unexpected failures needs to be developed. To detect the signs of machinery failures, the analysis of time-series data is required. However, data for various blade conditions must be collected from actual operating steam turbines. Further, obtaining abnormal or failure data is difficult. Thus, this paper proposes a classification approach to analyze big time-series data alternatively collected from numerical results. The time-series data from various normal and abnormal cases of actual intermediate-pressure steam-turbine operation were obtained through numerical simulation. Thereafter, useful features were extracted and classified using K-means clustering to judge whether the turbine is operating normally or abnormally. The experimental results indicate that the status of the blade can be appropriately classified. By checking data from real turbine blades using our classification results, the status of these blades can be estimated. Thus, this approach can help decide on the appropriate timing for MRO.

  30. Distributed Graph Algorithms for Multiple Vector Engines of NEC SX-Aurora TSUBASA Systems Peer-reviewed

    Ilya V. Afanasyev, Vadim V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 8 (2) 2021/06

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi210206  

    ISSN: 2313-8734

  31. Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer Peer-reviewed

    Osamu Watanabe, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 8 (2) 2021/06

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi210207  

    ISSN: 2313-8734

  32. Performance and Power Analysis of a Vector Computing System Peer-reviewed

    Supercomputing Frontiers and Innovations 8 (2) 2021/06

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi210205  

    ISSN: 2313-8734

  33. A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs Peer-reviewed

    Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2021/06

    Publisher: IEEE

    DOI: 10.1109/ipdpsw52791.2021.00116  

  34. A Metadata Prefetching Mechanism for Hybrid Memory Architectures Peer-reviewed

    Shunsuke Tsukada, Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) 2021/04/14

    Publisher: IEEE

    DOI: 10.1109/coolchips52128.2021.9410321  

    ISSN: 0916-8524

    eISSN: 1745-1353

  35. Optimization of the Himeno Benchmark for SX-Aurora TSUBASA Peer-reviewed

    Akito Onodera, Kazuhiko Komatsu, Soya Fujimoto, Yoko Isobe, Masayuki Sato, Hiroaki Kobayashi

    Benchmarking, Measuring, and Optimizing 127-143 2021/03

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-71058-3_8  

    ISSN: 0302-9743

    eISSN: 1611-3349

  36. VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture Peer-reviewed

    Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

    The Journal of Supercomputing 2021/01/26

    Publisher: Springer Science and Business Media LLC

    DOI: 10.1007/s11227-020-03564-9  

    ISSN: 0920-8542

    eISSN: 1573-0484

  37. Performance Evaluation of SX-Aurora TSUBASA and Its QA-Assisted Application Design

    Hiroaki Kobayashi, Kazuhiko Komatsu

    Sustained Simulation Performance 2019 and 2020 3-20 2021

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-68049-7_1  

  38. Optimizations of DNS Codes for Turbulence on SX-Aurora TSUBASA

    Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

    Sustained Simulation Performance 2019 and 2020 51-59 2021

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-68049-7_4  

  39. Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs Peer-reviewed

    Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    International Journal of Networking and Computing 11 (2) 267-282 2021

    Publisher: IJNC Editorial Committee

    DOI: 10.15803/ijnc.11.2_267  

    ISSN: 2185-2839

    eISSN: 2185-2847

  40. An External Definition of the One-Hot Constraint and Fast QUBO Generation for High-Performance Combinatorial Clustering Peer-reviewed

    Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

    International Journal of Networking and Computing 11 (2) 463-491 2021

    Publisher: IJNC Editorial Committee

    DOI: 10.15803/ijnc.11.2_463  

    ISSN: 2185-2839

    eISSN: 2185-2847

  41. A Deep Reinforcement Learning Based Feature Selector Peer-reviewed

    Yiran Cheng, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Parallel Architectures, Algorithms and Programming 378-389 2021

    Publisher: Springer Singapore

    DOI: 10.1007/978-981-16-0010-4_33  

    ISSN: 1865-0929

    eISSN: 1865-0937

  42. A Dynamic Parameter Tuning Method for High Performance SpMM Peer-reviewed

    Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Parallel and Distributed Computing, Applications and Technologies 318-329 2021

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-69244-5_28  

    ISSN: 0302-9743

    eISSN: 1611-3349

  43. Effects of Using a Memory Stalled Core for Handling MPI Communication Overlapping in the SOR Solver on SX-ACE and SX-Aurora TSUBASA Peer-reviewed

    Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 7 (4) 4-15 2020/12

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi200401  

    ISSN: 2313-8734

  44. An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations Peer-reviewed

    Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) 2020/11

    Publisher: IEEE

    DOI: 10.1109/candarw51189.2020.00041  

  45. Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint Peer-reviewed

    Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masayuki Sato, Hiroaki Kobayashi

    2020 Eighth International Symposium on Computing and Networking (CANDAR) 2020/11

    Publisher: IEEE

    DOI: 10.1109/candar51075.2020.00015  

  46. Importance of Selecting Data Layouts in the Tsunami Simulation Code Peer-reviewed

    Takumi Kishitani, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

    2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 830-837 2020/05

    Publisher: IEEE

    DOI: 10.1109/ipdpsw50202.2020.00140  

  47. I/O Performance of the SX-Aurora TSUBASA Peer-reviewed

    Mitsuo Yokokawa, Ayano Nakai, Kazuhiko Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi

    2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020/05

    Publisher: IEEE

    DOI: 10.1109/ipdpsw50202.2020.00014  

  48. Energy-efficient Design of an STT-RAM-based Hybrid Cache Architecture Peer-reviewed

    Masayuki Sato, Xue Hao, Kazuhiko Komatsu, Hiroaki Kobayashi

    2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) 2020/04

    Publisher: IEEE

    DOI: 10.1109/coolchips49199.2020.9097643  

  49. Performance Evaluation of SX-Aurora TSUBASA by Using Benchmark Programs

    Kazuhiko Komatsu, Hiroaki Kobayashi

    Sustained Simulation Performance 2018 and 2019 69-77 2020

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-39181-2_7  

  50. Optimizations for the Himeno Benchmark on Vector Computing System SX-Aurora TSUBASA Peer-reviewed

    Akito Onodera, Kazuhiko Komatsu, Masayuki Sato, Yoko Isobe, Hiroaki Kobayashi

    Proceedings of ISC High Performance 2020 Poster Presentation 2020

  51. Metadata Management for Large-Scale Hybrid Memory Architectures Peer-reviewed

    Shunsuke Tsukada, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    Proceedings of ISC High Performance 2020 Poster Presentation 2020

  52. An Evaluation of a Hierarchical Clustering Method Using Quantum Annealing Peer-reviewed

    Masahito Kumagai, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Proceedings of ISC High Performance 2020 Poster Presentation 2020

  53. Acceleration of Numerical Turbine using the Red-Black Method Peer-reviewed

    Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia) 2020

  54. Performance evaluation of a clustering approach based on thermophysical properties by using multiple platforms Peer-reviewed

    Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

    Poster Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia) 2020

  55. Evaluation of Tsunami Inundation Simulation Using Vector Scalar Hybrid MPI on SX-Aurora TSUBASA Peer-reviewed

    Akihiro Musa, Takashi Soga, Takashi Abe, Masayuki Sato, Kazuhiko Komatsu, Shunichi Koshimura, Hiroaki Kobayashi

    Proceedings of Research Poster Presentation of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20) 2020

  56. Performance Evaluation of Parallel DNS Codes on the Supercomputer SX-Aurora TSUBASA Peer-reviewed

    Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

    Proceedings of the 32nd International conference on Parallel Computational Fluid Dynamics (ParCFD 2020) 2020

  57. A hierarchical wavefront method for LU-SGS on modern multi-core vector processors Peer-reviewed

    Yuta Hougi, Kazuhiko Komatsu, Osamu Watanabe, Masayuki Sato, Hiroaki Kobayashi

    Proceedings of the 32nd International conference on Parallel Computational Fluid Dynamics (ParCFD 2020) 2020

  58. Developing an Efficient Vector-Friendly Implementation of the Breadth-First Search Algorithm for NEC SX-Aurora TSUBASA Peer-reviewed

    Ilya V. Afanasyev, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

    Communications in Computer and Information Science 131-145 2020

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-030-55326-5_10  

    ISSN: 1865-0929

    eISSN: 1865-0937

  59. An Energy-aware Dynamic Data Allocation Mechanism for Many-channel Memory Systems Peer-reviewed

    Masayuki Sato, Takuya Toyoshima, Hikaru Takayashiki, Ryusuke Egawa, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 6 (4) 4-19 2019/12

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi190401  

    ISSN: 2313-8734

  60. Developing Efficient Implementations of Shortest Paths and Page Rank Algorithms for NEC SX-Aurora TSUBASA Architecture Peer-reviewed

    I. V. Afanasyev, Vad. V. Voevodin, Vl. V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

    LOBACHEVSKII JOURNAL OF MATHEMATICS 40 (11) 1753-1762 2019/11

    DOI: 10.1134/S1995080219110039  

    ISSN: 1995-0802

    eISSN: 1818-9962

  61. A Skewed Multi-banked Cache for Many-core Vector Processors Peer-reviewed

    Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 6 (3) 86-101 2019/09

    Publisher: FSAEIHE South Ural State University (National Research University)

    DOI: 10.14529/jsfi190305  

    ISSN: 2313-8734

  62. A layer-adaptable cache hierarchy by a multiple-layer bypass mechanism

    Ryusuke Egawa, Ryoma Saito, Masayuki Sato, Hiroaki Kobayashi

    PervasiveHealth: Pervasive Computing Technologies for Healthcare 2019/06/06

    Publisher: ICST

    DOI: 10.1145/3337801.3337820  

    ISSN: 2153-1633

  63. Development and Validation of a Tsunami Numerical Model with the Polygonally Nested Grid System and its MPI-Parallelization for Real-Time Tsunami Inundation Forecast on a Regional Scale Invited

    T. Inoue, T. Abe, S. Koshimura, A. Musa, Y. Murashima, H. Kobayashi

    Journal of Disaster Research 14 (3) 416-434 2019/03

    DOI: 10.20965/jdr.2019.p0416  

    ISSN: 1881-2473

    eISSN: 1883-8030

  64. Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines Peer-reviewed

    Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Supercomputing Frontiers and Innovations 6 (1) 36-47 2019/03

    DOI: 10.14529/jsfi190106  

  65. A Hardware Prefetching Mechanism for Vector Gather Instructions Peer-reviewed

    Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms(IA3@SC) 59-66 2019

    Publisher: IEEE

    DOI: 10.1109/IA349570.2019.00015  

  66. Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA Peer-reviewed

    Osamu Watanabe, Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

    Proceedings of MCHPC'19: 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC) 25-32 2019

    DOI: 10.1109/MCHPC49590.2019.00011  

  67. Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors Peer-reviewed

    Ilya V. Afanasyev, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Hiroaki Kobayashi

    Parallel Computing Technologies - 15th International Conference(PaCT) 125-139 2019

    Publisher: Springer

    DOI: 10.1007/978-3-030-25636-4_10  

  68. An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications Peer-reviewed

    Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi

    IEEE International Parallel and Distributed Processing Symposium Workshops 768-777 2019

    Publisher: IEEE

    DOI: 10.1109/IPDPSW.2019.00127  

  69. Perceptron-based Cache Bypassing for Way-Adaptable Caches Peer-reviewed

    Masayuki Sato, Yongcheng Chen, Haruya Kikuchi, Kazuhiko Komatsu, Hiroaki Kobayashi

    IEEE Symposium in Low-Power and High-Speed Chips 1-3 2019

    Publisher: IEEE

    DOI: 10.1109/CoolChips.2019.8721331  

  70. Perceptron-based Cache Bypassing for Way-Adaptable Caches Peer-reviewed

    Masayuki Sato, Yongcheng Chen, Haruya Kikuchi, Kazuhiko Komatsu, Hiroaki Kobayashi

    2019 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS 22) 2019

    ISSN: 2473-4683

  71. Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA Peer-reviewed

    Osamu Watanabe, Yuta Hougi, Kazuhiko Komatsu, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

    PROCEEDINGS OF MCHPC'19: 2019 IEEE/ACM WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING (MCHPC) 25-32 2019

    DOI: 10.1109/MCHPC49590.2019.00011  

  72. Performance Evaluation of Tsunami Inundation Simulation on SX-Aurora TSUBASA Peer-reviewed

    Akihiro Musa, Takashi Abe, Takumi Kishitani, Takuya Inoue, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

    Computational Science - ICCS 2019 - 19th International Conference, Faro, Portugal, June 12-14, 2019, Proceedings, Part II 363-376 2019

    Publisher: Springer

    DOI: 10.1007/978-3-030-22741-8_26  

  73. An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches

    Masayuki Sato, Yoshiki Shoji, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

    IEEE Transactions on Multi-Scale Computing Systems 4 (4) 593-604 2018/10/01

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/TMSCS.2018.2827955  

    ISSN: 2332-7766

  74. Developing Efficient Implementations of Bellman–Ford and Forward-Backward Graph Algorithms for NEC SX-ACE Peer-reviewed

    Ilya V. Afanasyev, Alexander S. Antonov, Dmitry A. Nikitenko, Vadim V. Voevodin, Vladimir V. Voevodin, Kazuhiko Komatsu, Osamu Watanabe, Akihiro Musa, Hiroaki Kobayashi

    SUPERCOMPUTING FRONTIERS AND INNOVATIONS 5 (3) 65-69 2018/10

    DOI: 10.14529/jsfi180311  

  75. A Machine Learning-based Approach for Selecting SpMV Kernels and Matrix Storage Formats Peer-reviewed

    Hang Cui, Shoichi Hirasawa, Hiroaki Kobayashi, Hiroyuki Takizawa

    IEICE Transactions on Information and Systems E101-D (9) 2307-2314 2018/09

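  76. A Method for Reducing Parameter Tuning Time for Many-Core Processors

    Takumi Kishitani, Kazuhiko Komatsu, Akihiro Musa, Masayuki Sato, Hiroaki Kobayashi

    Summer Workshop on Parallel/Distributed/Cooperative Processing "Kumamoto" 2018/07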

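  77. A Study on Shared Cache Organization for Multi-Vector-Core Processors

    Hikaru Takayashiki, Masayuki Sato, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

    Summer Workshop on Parallel/Distributed/Cooperative Processing "Kumamoto" 2018/07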

  78. Expressing the Differences in Code Optimizations between Intel Knights Landing and NEC SX-ACE Processors

    Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

    The 13th World Congress on Computational Mechanics/2nd Pan American Congress on Computational Mechanics 2018/07

  79. An energy-aware set-level refreshing mechanism for eDRAM last-level caches Peer-reviewed

    Masayuki Sato, Zehua Li, Ryusuke Egawa, Hiroaki Kobayashi

    21st IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2018 - Proceedings 1-3 2018/06/05

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/CoolChips.2018.8373082  

  80. Early Evaluation of a New Vector Processor SX-Aurora TSUBASA Peer-reviewed

    Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

    International Supercomputing Conference 2018 (ISC18) 2018/06

  81. Performance Evaluation of a Real-Time Tsunami Inundation Forecast System on Modern Supercomputers Peer-reviewed

    Akihiro Musa, Takumi Kishitani, Takuya Inoue, Hiroaki Hokari, Masayuki Sato, Kazuhiko Komatsu, Yoichi Murashima, Shunichi Koshimura, Hiroaki Kobayashi

    15th Annual Meeting Asia Oceania Geoscience Society 2018/06

    DOI: 10.20965/jdr.2018.p0234  

  82. Migrating an Old Vector Code to Modern Vector Machines Peer-reviewed

    Hiroyuki Takizawa, Kenta Yamaguchi, Takashi Soga, Thorsten Reimann, Kazuhiko Komatsu, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

    Proceedings of the 30th International Conference on Parallel Computational Fluid Dynamics 2018/05

  83. Real-time tsunami inundation forecast system for tsunami disaster prevention and mitigation Peer-reviewed

    Akihiro Musa, Osamu Watanabe, Hiroshi Matsuoka, Hiroaki Hokari, Takuya Inoue, Yoichi Murashima, Yusaku Ohta, Ryota Hino, Shunichi Koshimura, Hiroaki Kobayashi

    Journal of Supercomputing 74 (7) 1-21 2018/04/16

    Publisher: Springer New York LLC

    DOI: 10.1007/s11227-018-2363-0  

    ISSN: 1573-0484 0920-8542

  84. A Real-Time Tsunami Inundation Forecast System Using Vector Supercomputer SX-ACE Peer-reviewed

    Akihiro Musa, Takashi Abe, Takuya Inoue, Hiroaki Hokari, Yoichi Murashima, Yoshiyuki Kido, Susumu Date, Shinji Shimojo, Shunichi Koshimura, Hiroaki Kobayashi

    Journal of Disaster Research 13 (2) 234-244 2018/03

    DOI: 10.20965/jdr.2018.p0234  

    ISSN: 1881-2473

    eISSN: 1883-8030

  85. Tsunami inundation and damage forecasting with high-performance computing infrastructure

    S. Koshimura, Y. Murashima, A. Musa, R. Hino, Y. Ohta, H. Kobayashi, M. Kachi, Y. Sato

    11th National Conference on Earthquake Engineering 2018, NCEE 2018: Integrating Science, Engineering, and Policy 6 3423-3427 2018

    Publisher: Earthquake Engineering Research Institute

  86. Optimization of a Polydisperse Multiphase Flow Simulation Code with Reactions and Phase Changes

    佐々木 大輔, 加藤 季広, 磯部 洋子, 笠原 弘貴, 渡部 広吾輝, 志村 啓, 奥野 航平, 松尾 亜紀子, 江川 隆輔, 滝沢 寛之, 小林 広明

    SENAC : 東北大学大型計算機センター広報 51 (1) 47-51 2018/01

    Publisher: Cyberscience Center, Tohoku University

    ISSN: 0286-7419

    Bulletin article

  87. Search Space Reduction for Parameter Tuning of a Tsunami Simulation on the Intel Knights Landing Processor Peer-reviewed

    Kazuhiko Komatsu, Takumi Kishitani, Masayuki Sato, Akihiro Musa, Hiroaki Kobayashi

    2018 IEEE 12TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2018) 117-124 2018

    DOI: 10.1109/MCSoC2018.2018.00030  

  88. Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA Peer-reviewed

    Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi

    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18) 2018

  89. Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems Peer-reviewed

    Muhammad Alfian Amrizal, Atsuya Uno, Yukinori Sato, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E100D (12) 2749-2760 2017/12

    DOI: 10.1587/transinf.2017PAP0002  

    ISSN: 1745-1361

  90. Advances of tsunami inundation forecasting and its future perspectives Peer-reviewed

    Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Yoichi Murashima, Akihiro Musa

    OCEANS 2017 - Aberdeen 1-4 2017/10/25

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/OCEANSE.2017.8084753  

  91. A Multiple-layer Bypass Mechanism for Energy-Efficient Computing

    Ryusuke Egawa, Masayuki Sato, Ryoma Saito, Hiroaki Kobayashi

    In Proceedings of 26th Workshop on Sustained Simulation Performance 2017/10

  92. Early Evaluation of a Heterogeneous Memory Architecture on a Vector Supercomputer

    Ryosuke Sato, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

    Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers 2017 20-20 2017/08

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan

    DOI: 10.11528/tsjc.2017.0_20  

  93. A power-aware LLC control mechanism for the 3D-stacked memory system Peer-reviewed

    Ryusuke Egawa, Wataru Uno, Masayuki Sato, Hiroaki Kobayashi, Jubee Tada

    2016 IEEE International 3D Systems Integration Conference, 3DIC 2016 2017/07/05

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/3DIC.2016.7970034  

  94. Toward Dynamic Load Balancing across OpenMP Thread Teams for Irregular Workloads Peer-reviewed

    Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    International Journal of Networking and Computing 7 (2) 387-404 2017/07

    Publisher: IJNC Editorial Committee

    DOI: 10.15803/ijnc.7.2_387  

    ISSN: 2185-2839

    In the field of high performance computing, massively parallel many-core processors such as Intel Xeon Phi coprocessors are becoming popular because they can significantly accelerate various applications. To parallelize applications efficiently for such many-core processors, several high-level programming models have been proposed. The de facto standard programming model, mainly for shared-memory parallel processing, is OpenMP. For hierarchical parallel processing, OpenMP version 4.0 or later allows programmers to create multiple thread teams. Each thread team contains a group of newly created threads that can synchronize with one another. When multiple thread teams are used to execute an application, dynamic load balancing across thread teams is important, since static load balancing easily leads to load imbalance across teams and thus degrades performance. In this paper, we first motivate our work by clarifying the benefit of using multiple thread teams to execute an irregular workload on a many-core processor. Then, we demonstrate that dynamic load balancing across those thread teams has the potential to significantly improve the performance of irregular workloads on a many-core processor, even when the scheduling overhead is taken into account. Although the current OpenMP specification does not provide such a dynamic load balancing mechanism, its benefits are discussed through experiments using the Intel Xeon Phi coprocessor. We evaluate the performance gain of dynamic load balancing across thread teams using a ray tracing code. The results show that such a mechanism can improve performance by up to 14% compared to static load balancing across teams, with the scheduling overhead taken into account.

  95. Development of a Heatstroke Risk Assessment Simulator for Simultaneous Exposure to Solar Radiation and Ambient Heat Peer-reviewed

    西尾 渉, 小寺 紗千子, 平田 晃正, 佐々木 大輔, 山下 毅, 江川 隆輔, 小林 広明, 曽根 秀昭

    IEICE Transactions on Electronics (Japanese Edition) J100-C (5) 208-216 2017/05

  96. Effects of Using a Memory-Stalled Core for Handling MPI Communication Overlapping in The SOR Solver Peer-reviewed

    Takashi Soga, Kenta Yamaguchi, Raghunandan Mathur, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

    Proceedings of The 29th International Conference on Parallel Computational Fluid Dynamics (ParallelCFD 2017) 2017/05

  97. Acceleration of Heatstroke Risk Assessment for the Human Body under Simultaneous Exposure to Solar Radiation and Ambient Heat Peer-reviewed

    西尾 渉, 小寺 紗千子, 平田 晃正, 佐々木 大輔, 山下 毅, 江川 隆輔, 曽根 秀昭, 小林 広明

    IEICE Transactions on Electronics (Japanese Edition) J100-C (5) 208-216 2017/04

  98. A Study on Auto-Tuning Using Scenario Templates

    Daichi Sato, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ National Convention 2017 (1) 45-46 2017/03

  99. A Study on Applicability of a Tsunami Inundation Model with the Polygonally Nested Grid System and Its MPI-Parallelization to Nation-Wide Tsunami Forecast at Multiple Grid Resolutions Peer-reviewed

    Takuya Inoue, Takashi Abe, Shunichi Koshimura, Akihiro Musa, Yoichi Murashima, Hiroaki Kobayashi

    Journal of Japan Society of Civil Engineers, Ser. B2 (Coastal Engineering) 73 (2) I_319-I_324 2017

    Publisher: Japan Society of Civil Engineers

    DOI: 10.2208/kaigan.73.I_319  

    The applicability of a tsunami inundation model with the polygonally nested grid system and its MPI parallelization to nation-wide tsunami forecasting was examined in terms of accuracy and computational cost through tsunami simulations at multiple grid resolutions of 270, 90, and 30 m. The computational efficiency of the tsunami model, in which the configuration of the grid system is extended from rectangular to polygonal regions so that the deployment of high-resolution grids is confined to coastal lowland, was further improved by about 14%. This paper also proposes an automatic way of setting up the polygonally nested grid system, and shows that supercomputer resources of 140 Tflop/s are required to complete a tsunami inundation forecast for the entire coast of Japan at a resolution of 30-meter grids within 10 minutes.

  100. Optimization of a tsunami inundation model with the polygonally nested grid system and MPI parallelization Peer-reviewed

    Takuya Inoue, Takashi Abe, Shunichi Koshimura, Akihiro Musa, Yoichi Murashima, Hiroaki Kobayashi

    Proceedings of International Tsunami Symposium 2017 2017

    DOI: 10.1109/OCEANSE.2017.8084753  

  101. Rapid Tsunami Inundation and Damage Estimation System with High-performance Computing and Networking Peer-reviewed

    Shunichi Koshimura, Yoichi Murashima, Akihiro Musa, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi, Masahiro Kachi, Yoshihiro Sato

    Proceedings of International Tsunami Symposium 2017 2017

  102. An Application-adaptive Data Allocation Method for Multi-channel Memory Peer-reviewed

    Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

    2017 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS) 2017

    DOI: 10.1109/CoolChips.2017.7946381  

    ISSN: 2473-4683

  103. An Adjacent-Line-Merging Writeback Scheme for STT-RAM Last-Level Caches Peer-reviewed

    Masayuki Sato, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

    2017 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS) 2017

    DOI: 10.1109/CoolChips.2017.7946380  

    ISSN: 2473-4683

  104. Performance and Power Analysis of SX-ACE using HP-X Benchmark Programs Peer-reviewed

    Ryusuke Egawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi, Yoko Isobe, Toshihiro Kato, Souya Fujimoto

    2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 693-700 2017

    DOI: 10.1109/CLUSTER.2017.65  

    ISSN: 1552-5244

  105. Performance Evaluation of Quantum ESPRESSO on NEC SX-ACE Peer-reviewed

    Osamu Watanabe, Akihiro Musa, Hiroaki Hokari, Shivanshu Singh, Raghunandan Mathur, Hiroaki Kobayashi

    2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 701-708 2017

    DOI: 10.1109/CLUSTER.2017.57  

    ISSN: 1552-5244

  106. Vectorization-aware Loop Optimization with User-defined Code Transformations Peer-reviewed

    Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Akihiro Musa, Hiroaki Kobayashi

    2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 685-692 2017

    DOI: 10.1109/CLUSTER.2017.102  

    ISSN: 1552-5244

  107. Program optimization of numerical turbine for vector supercomputer SX-ACE Peer-reviewed

    Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto

    Computers & Fluids 2017

  108. A Directive Generation Approach to High Code-Maintainability for Various HPC Systems Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Int. J. Netw. Comput. 7 (2) 405-418 2017

  109. Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE Peer-reviewed

    Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

    The Journal of Supercomputing 73 (9) 3948-3976 2017

    DOI: 10.1007/s11227-017-1993-y  

  110. Directive Translation for Various HPC Systems Using the Xevolver Framework Invited

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2016 109-117 2016/12

    DOI: 10.1007/978-3-319-46735-1_9  

  111. Making a Legacy Code Auto-tunable without Messing It Up Peer-reviewed

    Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, Hiroaki Kobayashi

    Proceedings of the 29th International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) 2016/11

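  112. A Study on a Power-Saving Data Allocation Method for High-Bandwidth Memory

    Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

    Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers 2016 39-39 2016/08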

    DOI: 10.11528/tsjc.2016.0_39  

  113. Message from the organizing committee chair Peer-reviewed

    Hiroaki Kobayashi

    19th IEEE Symposium on Low-Power and High-Speed Chips, IEEE COOL Chips 2016 - Proceedings i-ii 2016/07/05

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/CoolChips.2016.7503663  

  114. Effects of Stacking Granularity on 3-D Stacked Floating-point Fused Multiply Add Units Peer-reviewed

    Jubee Tada, Maiki Hosokawa, Ryusuke Egawa, Hiroaki Kobayashi

    Proceedings of International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART 2016) 2016/07

  115. Performance Optimization of Numerical Turbine for Supercomputer SX-ACE Peer-reviewed

    Y. Sakaguchi, K. Kataumi, H. Matsuoka, O. Watanabe, A. Musa, K. Komatsu, R. Egawa, H. Kobayashi, S. Yamamoto

    Proceedings of the 28th International Conference on Parallel Computational Fluid Dynamics 2016/05

  116. A Power-Performance Tradeoff of HBM by Limiting Access Channels Peer-reviewed

    Takuya Toyoshima, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of IEEE Symposium on Low-Power and High-Speed Chips 2016/04

  117. A Bypassing Mechanism for Application-Adaptive Cache Resizing Peer-reviewed

    Masayuki Sato, Takumi Takai, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    The IEICE Transactions on Information and Systems J99-D (3) 337-347 2016/03

    DOI: 10.14923/transinfj.2014JDP7131  

  118. A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE Peer-reviewed

    Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

    International Journal of Networking and Computing 6 (2) 243-262 2016/02

  119. A Study on Code Transformation Using Machine Learning

    川原畑 勇希, 平澤 将一, 滝沢 寛之, 小林 広明

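    Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers 2016 227-227 2016

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan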

    DOI: 10.11528/tsjc.2016.0_227  

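  120. Improving the Efficiency of Wide-Area Tsunami Analysis with Polygonally Nested Regions and MPI Parallelization Peer-reviewed

    Takuya Inoue, Takashi Abe, Shunichi Koshimura, Akihiro Musa, Yoichi Murashima, Hiroaki Kobayashi

    Journal of Japan Society of Civil Engineers, Ser. B2 (Coastal Engineering) 72 (2) I_373-I_378 2016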

    Publisher: Japan Society of Civil Engineers

    DOI: 10.2208/kaigan.72.I_373  

    This paper shows that supercomputer resources of 2 Pflop/s are required to complete a tsunami inundation forecast for the entire coast of Japan at a resolution of 10-meter grids within 10 minutes if a numerical model solving the non-linear shallow water equations is adopted. Therefore, we improved the efficiency of the model by extending the geometry of the calculation regions from rectangular to polygonal so that the deployment of high-resolution grids is confined to coastal lowland, and validated its accuracy against the existing model. A wide-area tsunami simulation on the prefectural scale became more than 3 times as efficient, indicating the feasibility of nation-wide tsunami inundation forecasting.

  121. Directive-Based Automatic Setting of Performance Parameters for Stencil Computations Peer-reviewed

    角川 拓也, 平澤 将一, 滝沢 寛之, 小林 広明

    IPSJ Transactions on Advanced Computing Systems (ACS) 9 (4) 25-37 2016

  122. Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

    Int. J. Netw. Comput. 6 (2) 167-180 2016

  123. A Code Selection Mechanism Using Deep Learning Peer-reviewed

    Hang Cui, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC) 385-392 2016

    DOI: 10.1109/MCSoC.2016.46  

  124. A Cache Partitioning Mechanism to Protect Shared Data for CMPs Peer-reviewed

    Masayuki Sato, Shin Nishimura, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2016 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS XIX) 2016

    DOI: 10.1109/CoolChips.2016.7503674  

    ISSN: 2473-4683

  125. A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation Peer-reviewed

    Yasuharu Hayashi, Hiroyuki Takizawa, Hiroaki Kobayashi

    2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 508-514 2016

    DOI: 10.1109/CANDAR.2016.35  

    ISSN: 2379-1888

  126. A Directive Generation Approach Using User-defined Rules Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 515-521 2016

    DOI: 10.1109/CANDAR.2016.94  

    ISSN: 2379-1888

  127. The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads Peer-reviewed

    Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 529-535 2016

    DOI: 10.1109/CANDAR.2016.48  

    ISSN: 2379-1888

  128. Parallel processing model for cholesky decomposition algorithm in AlgoWiki project Peer-reviewed

    Alexander S. Antonov, Alexey V. Frolov, Hiroaki Kobayashi, Igor N. Konshin, Alexey M. Teplov, Vadim V. Voevodin, Vladimir V. Voevodin

    Supercomputing Frontiers and Innovations 3 (3) 61-70 2016

    Publisher: South Ural State University, Publishing Center

    DOI: 10.14529/jsfi160307  

    ISSN: 2313-8734 2409-6008

  129. Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems Invited

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2015 147-157 2015/12

    DOI: 10.1007/978-3-319-20340-9_12  

  130. A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning Peer-reviewed

    Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E98D (12) 2178-2186 2015/12

    DOI: 10.1587/transinf.2015PAP0028  

    ISSN: 1745-1361

  131. A Real-Time Tsunami Inundation Forecast System for Tsunami Disaster and Mitigation Peer-reviewed

    Akihiro Musa, Hiroshi Matsuoka, Osamu Watanabe, Yoichi Murashima, Shunichi Koshimura, Ryota Hino, Yusaku Ohta, Hiroaki Kobayashi

    the 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) 2015/11

  132. An Approach to the Highest Efficiency of the HPCG Benchmark on the SX-ACE Supercomputer Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa, Hiroaki Kobayashi

    the 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) 2015/11

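  133. Design of a Highly Power-Efficient Memory Hierarchy in the Era of 3D Stacking

    Wataru Uno, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

    IEICE Technical Report 115 (271) 19-24 2015/10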

    ISSN: 0913-5685

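  134. A Cache Mechanism Considering Inter-Thread Shared Data for Multi-Core Processors

    Shin Nishimura, Masayuki Sato, Ryusuke Egawa, Hiroaki Kobayashi

    IPSJ SIG Technical Report: Computer Architecture (ARC) 2015-ARC-216 (38) 1-8 2015/08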

  135. FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON ELECTRONICS E98C (7) 550-558 2015/07

    DOI: 10.1587/transele.E98.C.550  

    ISSN: 1745-1353

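  136. Achieving Both Performance and Maintainability of Real Applications with Xevolver

    Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the Conference on Computational Engineering and Science 20 4p 2015/06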

  137. Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2014 119-126 2015

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-319-10626-7_10  

  138. Code Optimization Activities Toward a High Sustained Simulation Performance

    Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

    Sustained Simulation Performance 2015 159-168 2015

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-319-20340-9_13  

  139. Design of a 3-D Stacked Floating-point Goldschmidt Divider Peer-reviewed

    Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

    2015 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC 2015) 2015

    ISSN: 2164-0157

  140. A Data Management Policy for Energy-Efficient Cache Mechanisms

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2015 61-75 2015

    DOI: 10.1007/978-3-319-20340-9_6  

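  141. Auto-Tuning with Xevolver

    Shoichi Hirasawa, Xiong Xiao, Hiroyuki Takizawa, Hiroaki Kobayashi

    Journal of the Japan Society for Computational Engineering and Science "Computational Engineering" 20 (2) 14-17 2015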

  142. Identification and elimination of platform-specific code smells in high performance computing applications Peer-reviewed

    Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    International Journal of Networking and Computing 5 (1) 180-199 2015

    Publisher: IJNC Editorial Committee

    DOI: 10.15803/ijnc.5.1_180  

    ISSN: 2185-2839

    A code smell is a code pattern that might indicate a code or design problem, which makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed mainly for object-oriented applications, but rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve a high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system to help users find PSCSs of HPC applications to improve the performance portability across different platforms. This paper presents a PSCS alert system that is based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined in a formal way using the AST information represented in XML. XML Path Language (XPath) is used to describe those patterns. A database is built to store the transformation recipes written in XSLT files for eliminating detected PSCSs. The recall and precision evaluation results obtained by using real applications show that the proposed system can detect potential PSCSs accurately. The evaluation on performance portability of real applications demonstrates that eliminating PSCSs leads to significant performance changes and therefore the code portions with detected PSCSs have to be refactored to improve the performance portability across multiple platforms.

  143. Optimized Data Transfers Based on the OpenCL Event Management Mechanism Peer-reviewed

    Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu

    SCIENTIFIC PROGRAMMING 2015 (576498) 2015

    DOI: 10.1155/2015/576498  

    ISSN: 1058-9244

    eISSN: 1875-919X

  144. Real-time tsunami inundation forecasting and damage estimation method by fusion of real-time crustal deformation monitoring and high-performance computing Peer-reviewed

    S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

    the 26th International Union of Geodesy and Geophysics 2015

  145. Expressing system-awareness as code transformations for performance portability across diverse HPC Peer-reviewed

    Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi

    Workshop on Portability Among HPC Architectures for Scientific Applications 2015

  146. Combining code refactoring and auto-tuning to improve performance portability of high-performance computing applications Peer-reviewed

    Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    The Sixth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2015) 2015

  147. Automatic Parameter Tuning of Hierarchical Incremental Checkpointing Peer-reviewed

    Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2014 8969 298-309 2015

    DOI: 10.1007/978-3-319-17353-5_25  

    ISSN: 0302-9743

  148. A Verification Framework for Streamlining Empirical Auto-tuning Peer-reviewed

    Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 508-514 2015

    DOI: 10.1109/CANDAR.2015.115  

    ISSN: 2379-1888

  149. Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken'ichi Itakura, Hiroaki Kobayashi

    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 515-520 2015

    DOI: 10.1109/CANDAR.2015.102  

    ISSN: 2379-1888

  150. A Case Study of Memory Optimization for Migration of a Plasmonics Simulation Application to SX-ACE Peer-reviewed

    Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa, Hiroaki Kobayashi

    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 521-527 2015

    DOI: 10.1109/CANDAR.2015.105  

    ISSN: 2379-1888

  151. A Case Study of User-Defined Code Transformations for Data Layout Optimizations Peer-reviewed

    Takeshi Yamada, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) 535-541 2015

    DOI: 10.1109/CANDAR.2015.96  

    ISSN: 2379-1888

  152. An Energy-Efficient Dynamic Memory Address Mapping Mechanism Peer-reviewed

    Masayuki Sato, Chengguang Han, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2015 IEEE SYMPOSIUM ON LOW-POWER AND HIGH-SPEED CHIPS 2015

    DOI: 10.1109/CoolChips.2015.7158660  

  153. Designing an HPC Refactoring Catalog Toward the Exa-scale Computing Era

    Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi

    Sustained Simulation Performance 2014 91-98 2014/11

    DOI: 10.1007/978-3-319-10626-7_8  

  154. Early Evaluation of the SX-ACE Processor Peer-reviewed

    Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi

    the 27th International Conference for High Performance Computing, Networking, Storage and Analysis (SC14) 2014/11

  155. MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications Peer-reviewed

    Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E97D (11) 2835-2843 2014/11

    DOI: 10.1587/transinf.2014EDP7227  

    ISSN: 1745-1361

  156. A Study on Reducing the Power Consumption of Vector Media Processors

    宇野 渉, 高 也, 佐藤 雅之, 江川 隆輔, 滝沢 寛之, 小林 広明

    電気関係学会東北支部連合大会予稿集 2014/08

  157. A Study on Managing Inter-Thread Shared Data in Cache Memories

    西村 秦, 佐藤 雅之, 江川 隆輔, 滝沢 寛之, 小林 広明

    電気関係学会東北支部連合大会予稿集 2014/08

  158. Exploring system architectures for next-generation CFD simulations in the postpeta-scale era Peer-reviewed

    KOMATSU Kazuhiko, EGAWA Ryusuke, TAKIZAWA Hiroyuki, SOGA Takashi, MUSA Akihiro, KOBAYASHI Hiroaki

    Journal of Fluid Science and Technology 9 (5) JFST0073-JFST0073 2014

    Publisher: The Japan Society of Mechanical Engineers

    DOI: 10.1299/jfst.2014jfst0073  

    ISSN: 1880-5558

    CFD simulations with uniform grids have attracted attention as next-generation CFD simulations for large-scale supercomputing systems. The Building-Cube Method (BCM) is one such next-generation CFD method. Its basic idea is to balance the computational load among the processing elements of a supercomputing system by dividing the whole calculation into many parallel tasks with the same amount of computation. Thus, it is suitable for highly parallel computation on supercomputing systems. This paper first implements BCM on five supercomputing systems as an example of a next-generation CFD simulation in the upcoming postpeta-scale era. Then, through theoretical analyses and performance evaluations, this paper clarifies the requirements of future supercomputing systems for next-generation CFD simulations. The performance evaluations show that as the number of processing elements increases, the imbalance of data exchanges among nodes becomes more serious than that of calculations, even in a next-generation CFD simulation. While the calculation time can ideally be reduced as the number of processing elements increases, the data transfer time becomes dominant in the total execution time. Unlike in a massively parallel system architecture, the number of nodes in a system should be as small as possible to reduce data transfers. The performance analyses also show that memory bandwidth limits the performance of BCM and that the use of on-chip memory is effective in improving performance. A memory subsystem that achieves a higher sustained memory bandwidth is therefore required. Consequently, a supercomputing system that consists of a small number of high-performance nodes is essential to achieve high sustained performance for next-generation CFD in the upcoming postpeta-scale era, because it reduces the data transfers that eventually become a bottleneck in large-scale simulations.

  159. On-Chip Checkpointing with 3D-Stacked Memories Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC) 1-6 2014

    DOI: 10.1109/3DIC.2014.7152173  

    ISSN: 2164-0157

  160. OpenMP Parallelization Method using Compiler Information of Automatic Optimization Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Legacy HPC Application Migration 2014 2014

  161. Real-time tsunami inundation forecasting and damage mapping towards enhancing tsunami disaster resilience Peer-reviewed

    S. Koshimura, R. Hino, Y. Ohta, H. Kobayashi, A. Musa, Y. Murashima

    American Geophysical Union Fall Meeting 2014

  162. An Approach to Customization of Compiler Directives for Application-Specific Code Transformations Peer-reviewed

    Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC) 99-106 2014

    DOI: 10.1109/MCSoC.2014.23  

  163. Xevolver: An XML-based Code Translation Framework for Supporting HPC Application Migration Peer-reviewed

    Hiroyuki Takizawa, Shoichi Hirasawa, Yasuharu Hayashi, Ryusuke Egawa, Hiroaki Kobayashi

    2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) 2014

    DOI: 10.1109/HiPC.2014.7116902  

    ISSN: 1094-7256

  164. A compiler-assisted OpenMP migration method based on automatic parallelizing information Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8488 450-459 2014

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-07518-1_30  

    ISSN: 1611-3349 0302-9743

  165. A Platform-Specific Code Smell Alert System for High Performance Computing Applications Peer-reviewed

    Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 653-662 2014

    DOI: 10.1109/IPDPSW.2014.76  

  166. An Energy Optimization Method for Vector Processing Mechanisms Peer-reviewed

    Ye Gao, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2014 IEEE COOL CHIPS XVII 2014

    DOI: 10.1109/CoolChips.2014.6842957  

    ISSN: 2473-4683

  167. On-Chip Checkpointing with 3D-Stacked Memories Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC) 2014

    DOI: 10.1109/3DIC.2014.7152173  

    ISSN: 2164-0157

  168. An Impact of Circuit Scale on the Performance of 3-D Stacked Arithmetic Units Peer-reviewed

    Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

    2014 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC) 2014

    ISSN: 2164-0157

  169. An XML-based Programming Framework for User-defined Code Transformations Peer-reviewed

    Hiroyuki Takizawa, Xiong Xiao, Shoichi Hirasawa, Hiroaki Kobayashi

    The 4th AICS International Symposium 2013/12/02

  170. Checkpoint and Restart in Composite Systems Peer-reviewed

    滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

    日本信頼性学会誌 35 (8) 515-516 2013/12

    DOI: 10.11348/reajshinrai.35.8_515  

  171. Challenges and Reliability Enhancement of Three-Dimensional LSIs Peer-reviewed

    小柳光正, 小林広明, 末吉敏則, 鎌田忠

    日本信頼性学会誌 35 (8) 471-471 2013/12

    Publisher: Reliability Engineering Association of Japan (REAJ)

    DOI: 10.11348/reajshinrai.35.8_471  

    ISSN: 0919-2697

  172. Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

    International Conference on Fluid Dynamics (ICFD2013) 2013/11/27

  173. Xevolver : an XML-based Programming Framework for Software Evolution Peer-reviewed

    Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

    Supercomputing Conference 2013 (SC13) 2013/11

  174. An Automatic Performance Tracking System for Software Evolution Peer-reviewed

    平澤 将一, 滝沢 寛之, 小林 広明

    情報処理学会論文誌コンピューティングシステム(ACS) 6 (4) 96-104 2013/10/30

    ISSN: 1882-7829

  175. A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E96D (9) 2047-2054 2013/09

    DOI: 10.1587/transinf.E96.D.2047  

    ISSN: 1745-1361

  176. A Study on Improving Cache Energy Efficiency Using a Block Bypass Mechanism

    高井 拓実, 佐藤 雅之, 江川 隆輔, 滝沢 寛之, 小林 広明

    並列/分散/協調処理に関する「北九州」サマー・ワークショップ (SWoPP2013) 1-9 2013/07

  177. Autotuning for Improving the Fault Tolerance of Large-scale Simulations Peer-reviewed

    Hiroyuki Takizawa, Alfian Amrizal, Shoichi Hirasawa, Hiroaki Kobayashi

    Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC) 2013/05

  178. An Automatic Performance Tracking System for Scientific Software Evolution Peer-reviewed

    Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi

    Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@2HPC) 2013/05

  179. An IDE Integrated Cross-Platform Build System for Scientific Applications Peer-reviewed

    Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    SIAM CSE2013 Minisymposium on Auto-tuning Technologies for Tools and Development Environment in Extreme-Scale Scientific Computing 2013/02

  180. Performance Evaluation of a Next-Generation CFD on Various Supercomputing Systems

    Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2012 123-132 2013

    Publisher: Springer Berlin Heidelberg

    DOI: 10.1007/978-3-642-32454-3_11  

  181. Analysing the performance improvements of optimizations on modern HPC systems Peer-reviewed

    Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance 13-25 2013

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-319-01439-5-2  

  182. Feasibility study of future HPC systems for memory-intensive applications Peer-reviewed

    Hiroaki Kobayashi

    Sustained Simulation Performance 2013 - Proceedings of the Joint Workshop on Sustained Simulation Performance 3-11 2013

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-319-01439-5-1  

  183. Exploring a design space of 3-D stacked vector processors Peer-reviewed

    Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi

    Sustained Simulation Performance 2012 - Proceedings of the Joint Workshop on High Performance Computing on Vector Systems, and Workshop on Sustained Simulation Performance 35-49 2013

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-642-32454-3-4  

  184. Message from the organizing committee chair Peer-reviewed

    Hiroaki Kobayashi

    IEEE Symposium on Low-Power and High-Speed Chips - Proceedings for 2013 COOL Chips XVI i-ii 2013

    DOI: 10.1109/CoolChips.2013.6547906  

  185. ClMPI: An opencl extension for interoperation with the message passing interface Peer-reviewed

    Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, Wen-Mei W. Hwu

    Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013 1138-1148 2013

    Publisher: IEEE Computer Society

    DOI: 10.1109/IPDPSW.2013.183  

  186. Power and Performance Evaluation of 3-D Stacked Floating-point Multipliers Peer-reviewed

    Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

    IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2013) 218-223 2013

  187. Design and Evaluation of a Media-oriented Vector Processor with a Multi-banked Cache Memory Peer-reviewed

    Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2013 IEEE 11TH SYMPOSIUM ON EMBEDDED SYSTEMS FOR REAL-TIME MULTIMEDIA (ESTIMEDIA) 78-87 2013

    DOI: 10.1109/ESTIMedia.2013.6704506  

    ISSN: 2325-1271

  188. Vertically Integrated Processor and Memory Module Design for Vector Supercomputers Peer-reviewed

    Ryusuke Egawa, Masayuki Sato, Jubee Tada, Hiroaki Kobayashi

    2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC) 1-8 2013

    ISSN: 2164-0157

  189. Design of a 3-D Stacked Floating-Point Adder Peer-reviewed

    Jubee Tada, Ryusuke Egawa, Hiroaki Kobayashi

    2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC) 1-5 2013

    ISSN: 2164-0157

  190. Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD Peer-reviewed

    Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi

    International Conference on Fluid Dynamics(ICFD2013) 2013

  191. Performance evaluation of phase-based correspondence matching on GPUs Peer-reviewed

    Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

    APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXVI 8856 2013

    DOI: 10.1117/12.2023550  

    ISSN: 0277-786X

    eISSN: 1996-756X

  192. A comparison of performance tunabilities between OpenCL and OpenACC Peer-reviewed

    Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013 147-152 2013

    Publisher: IEEE Computer Society

    DOI: 10.1109/MCSoC.2013.31  

  193. A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms Peer-reviewed

    Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2013 IEEE COOL CHIPS XVI (COOL CHIPS) 2013

    DOI: 10.1109/CoolChips.2013.6547923  

    ISSN: 2473-4683

  194. Performance Portability Issues on Modern HPC Systems

    小松一彦, 江川隆輔, 安田一平, 撫佐昭裕, 松岡浩司, 小林広明

    情報処理学会研究報告(CD-ROM) 2012 (4) ROMBUNNO.HPC-136,NO.27 2012/12/15

    ISSN: 2186-2583

  195. An Early Dead-Block Eviction Policy for Improving the Energy Efficiency of Way-Adaptable Caches Peer-reviewed

    東方 雄亮, 佐藤 雅之, 江川 隆輔, 滝沢 寛之, 小林 広明

    先進的計算基盤シンポジウムSACSIS2012 2012 4-5 2012/05

  196. A P2P Self-Organizing Service Resource Discovery Mechanism Based on Meta-Information Diffusion Peer-reviewed

    稲葉勉, 村田善智, 滝沢寛之, 小林広明

    電子情報通信学会論文誌 D J95-D (5) 1110-1122 2012/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 1880-4535

  197. A bypass mechanism for way-adaptable caches Peer-reviewed

    Takumi Takai, Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEEE COOL Chips XV 2012/04

  198. A Runtime Dependency Analysis Method for Task Parallelization of OpenCL Programs Peer-reviewed

    Katuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ Transactions on Computing Systems 5 (1) 53-67 2012/01/27

    ISSN: 1882-7829

  199. A Runtime Dependency Analysis Method for Task Parallelization of OpenCL Programs Peer-reviewed

    佐藤功人, 小松一彦, 滝沢寛之, 小林広明

    情報処理学会論文誌 論文誌コンピューティングシステム(ACS) 5 (1) 53-67 2012/01/27

    ISSN: 1882-7829

  200. Performance and scalability analysis of a chip multi vector processor Peer-reviewed

    Yoshiei Sato, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    High Performance Computing on Vector Systems 2011 3-20 2012

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-642-22244-3-1  

  201. A prototype implementation of OpenCL for SX vector systems Peer-reviewed

    Hiroyuki Takizawa, Ryusuke Egawa, Hiroaki Kobayashi

    High Performance Computing on Vector Systems 2011 41-50 2012

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-642-22244-3-3  

  202. A media-oriented vector architectural extension with a high bandwidth cache system Peer-reviewed

    Ye Gao, Naoki Shoji, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Symposium on Low-Power and High-Speed Chips - Proceedings for 2012 IEEE COOL Chips XV 1-3 2012

    DOI: 10.1109/COOLChips.2012.6216588  

  203. Exploring design space of a 3D stacked vector cache Peer-reviewed

    Ryusuke Egawa, Jubee Tada, Yusuke Endo, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 1475-1477 2012

    DOI: 10.1109/SC.Companion.2012.270  

  204. Performance Evaluation of BCM on Various Supercomputing Systems Peer-reviewed

    Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

    Proceedings of 24th International Conference on Parallel Computational Fluid Dynamics 2012

  205. An out-of-order vector processing mechanism for multimedia applications Peer-reviewed

    Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    CF '12 - Proceedings of the ACM Computing Frontiers Conference 233-235 2012

    DOI: 10.1145/2212908.2212941  

  206. A capacity-efficient insertion policy for dynamic cache resizing mechanisms Peer-reviewed

    Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    CF '12 - Proceedings of the ACM Computing Frontiers Conference 265-267 2012

    DOI: 10.1145/2212908.2212949  

  207. GPU IMPLEMENTATION OF PHASE-BASED STEREO CORRESPONDENCE AND ITS APPLICATION Peer-reviewed

    Mamoru Miura, Kinya Fudano, Koichi Ito, Takafumi Aoki, Hiroyuki Takizawa, Hiroaki Kobayashi

    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012) 1697-1700 2012

    DOI: 10.1109/ICIP.2012.6467205  

    ISSN: 1522-4880

  208. Improving the Scalability of Transparent Checkpointing for GPU Computing Systems Peer-reviewed

    Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY 2012

    ISSN: 2159-3442

  209. A Network Clustering Algorithm for Sybil-Attack Resisting Peer-reviewed

    Ling Xu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94D (12) 2345-2352 2011/12

    DOI: 10.1587/transinf.E94.D.2345  

    ISSN: 0916-8532

    eISSN: 1745-1361

  210. Performance of building cube method on various platforms Peer-reviewed

    Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

    The 8th International Conference on Flow Dynamics 2011 (ICFD2011) 2011/11

  211. An automatic task assignment method for heterogeneous computing systems Peer-reviewed

    Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    The 8th International Conference on Flow Dynamics 2011 (ICFD2011) 2011/11

  212. Job Scheduling with Migration for Heterogeneous Computing Systems Peer-reviewed

    Kentaro Koyama, Katuto Sato, Kazuhiko Komatsu, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ Transactions on Computing Systems 4 (4) 203-213 2011/10/05

    ISSN: 1882-7829

  213. A Patch-Based Bit Mask Filtering Method for Micropolygon Rasterization Peer-reviewed

    Jiali Yao, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of High-Performance Graphics(HPG) 2011/08

  214. Performance of SOR methods on modern vector and scalar processors Peer-reviewed

    Takashi Soga, Akihiro Musa, Koki Okabe, Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

    COMPUTERS & FLUIDS 45 (1) 215-221 2011/06

    DOI: 10.1016/j.compfluid.2010.12.024  

    ISSN: 0045-7930

  215. Parallel processing of the Building-Cube Method on a GPU platform Peer-reviewed

    Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

    COMPUTERS & FLUIDS 45 (1) 122-128 2011/06

    DOI: 10.1016/j.compfluid.2010.12.019  

    ISSN: 0045-7930

  216. A Low-Energy-Oriented Insertion Policy for Way-Adaptable Caches Peer-reviewed

    東方 雄亮, 佐藤 雅之, 江川 隆輔, 滝沢 寛之, 小林 広明

    先進的計算基盤シンポジウムSACSIS2011 2011 213-214 2011/05

  217. A Power-Aware Insertion Policy for the Way-Adaptable Caches Peer-reviewed

    Yusuke Tobo, Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of COOLChips XIV 2011/04

  218. Energy Consumption of a Chip Multi-Vector Processor Using Real Applications

    永岡龍一, 佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 小林広明

    情報処理学会研究報告(CD-ROM) 2010 (5) ROMBUNNO.ARC-192,NO.3 2011/02/15

    ISSN: 2186-2583

  219. A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments Peer-reviewed

    Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ Journal 52 (2) 320-333 2011/02

    Publisher: Information and Media Technologies Editorial Board

    DOI: 10.11185/imt.6.546  

    Cloud computing and NGN technologies are now driving a paradigm shift in which various services are provided to business users over the network. In conjunction with this movement, many studies are working to realize a ubiquitous computing environment in which a huge number of individual users can share their computing resources on the Internet, such as personal computers (PCs), game consoles, and sensors. To realize an effective resource discovery mechanism for such an environment, this paper presents an adaptive overlay network that enables a self-organizing resource management system to efficiently adapt to a heterogeneous environment. The proposed mechanism is composed of two functions. One adjusts the number of logical links of a resource, which forward search queries, so that less-useful query flooding can be reduced. The other connects resources so as to decrease the communication latency on the physical network rather than the number of query hops on the overlay network. To further improve the discovery efficiency, this paper integrates these functions into a self-organizing resource management system, SORMS, which was proposed in our previous work. The simulation results indicate that the proposed mechanism can increase the number of discovered resources by 60% without decreasing the discovery efficiency, and can reduce the total communication traffic by 80% compared with the original SORMS. This performance improvement is obtained through efficient control of logical links in a large-scale network.

  220. A High-Performance Volunteer Computing Environment with a Dynamic Load-Balancing Mechanism Peer-reviewed

    Yoshitomo Murata, Yuki Ishimori, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ Journal 52 (2) 401-414 2011/02

  221. Performance Evaluation of Real-Time Stereo Correspondence on GPU

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2011 31-31 2011

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan

    DOI: 10.11528/tsjc.2011.0_31  

  222. Power-aware dynamic cache partitioning for CMPs Peer-reviewed

    Isao Kotera, Kenta Abe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6590 135-153 2011

    DOI: 10.1007/978-3-642-19448-1_8  

    ISSN: 0302-9743 1611-3349

  223. Large scaled computation of incompressible flows on Cartesian mesh using a vector-parallel supercomputer Peer-reviewed

    Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiko Musa

    Lecture Notes in Computational Science and Engineering 74 332-338 2011

    DOI: 10.1007/978-3-642-14438-7-35  

    ISSN: 1439-7358

  224. A self-organized overlay network management mechanism for heterogeneous environments Peer-reviewed

    Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

    Journal of Information Processing 19 (0) 25-38 2011

    Publisher: Information Processing Society of Japan

    DOI: 10.2197/ipsjjip.19.25  

    ISSN: 1882-6652 0387-5806

  225. A Performance Tuning Strategy Based on the Roofline Model for Vector Processors Peer-reviewed

    Yosiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    情報処理学会論文誌:コンピューティングシステム(ACS) 4 (3) 77-87 2011

    ISSN: 1882-7772

  226. A Runtime Dependency Analysis Method for Task Parallelization of OpenCL Programs Peer-reviewed

    佐藤功人, 小松一彦, 滝沢寛之, 小林広明

    情報処理学会論文誌 コンピューティングシステム(ACS) 5 (1) 53-67 2011/01

  227. A history-based performance prediction model with profile data classification for automatic task allocation in heterogeneous computing systems Peer-reviewed

    Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011 135-142 2011

    DOI: 10.1109/ISPA.2011.36  

  228. CheCL: Transparent checkpointing and process migration of OpenCL applications Peer-reviewed

    Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011 864-876 2011

    DOI: 10.1109/IPDPS.2011.85  

  229. Effects of 3-D stacked vector cache on energy consumption Peer-reviewed

    Ryusuke Egawa, Yusuke Funaya, Ryuichi Nagaoka, Yusuke Endo, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2011 IEEE International 3D Systems Integration Conference, 3DIC 2011 2011

    DOI: 10.1109/3DIC.2012.6263026  

  230. A middle-grain circuit partitioning strategy for 3-D integrated floating-point multipliers Peer-reviewed

    Jubee Tada, Ryusuke Egawa, Kazushige Kawai, Hiroaki Kobayashi, Gensuke Goto

    2011 IEEE International 3D Systems Integration Conference, 3DIC 2011 2011

    DOI: 10.1109/3DIC.2012.6263031  

  231. A performance tuning strategy under combining loop transforms for a vector processor with an on-chip cache Peer-reviewed

    Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    ACM/IEEE Supercomputing Conference (SC10) 2010/11

  232. A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering Peer-reviewed

    SUZUKI Ken-ichi, KAERIYAMA Yoshiyuki, KOMATSU Kazuhiko, EGAWA Ryusuke, OHBA Nobuyuki, KOBAYASHI Hiroaki

    IEICE Transactions on Information and Systems 93 (4) 891-902 2010/04/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

    DOI: 10.1587/transinf.E93.D.891  

    ISSN: 0916-8532

    Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering practical. This paper deals with interactive dynamic scene rendering, in which not only the eye point but also the objects in the scene change their 3D locations every frame. To realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which exploits the coherency in rays, objects, and grouped rays, is introduced. RTRPS uses bounding spheres as the spatial data structure to exploit the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere and shorten the update time between frames. RTRPS exploits the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be computed using the coherency in the ray-planes. Because of these three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating-point operations. Techniques based on frustum culling and binary-tree-structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in a 50% to 90% reduction in intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS achieves a 13- to 392-fold speedup in comparison with a ray tracing algorithm without organized rays and spheres. We also found that RTRPS provides competitive performance even if only primary rays are used.

  233. A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering Peer-reviewed

    Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Kazuhiko Komatsu, Ryusuke Egawa, Nobuyuki Ohba, Hiroaki Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (4) 891-902 2010/04

    DOI: 10.1587/transinf.E93.D.891  

    ISSN: 1745-1361

  234. The vector computing cloud: Toward a vector meta-computing environment Peer-reviewed

    Ryusuke Egawa, Manabu Higashida, Yoshitomo Murata, Hiroaki Kobayashi

    High Performance Computing on Vector Systems 2010 75-91 2010

    Publisher: Springer Science and Business Media, LLC

    DOI: 10.1007/978-3-642-11851-7-6  

  235. Automatic tuning of CUDA execution parameters for stencil processing Peer-reviewed

    Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

    Software Automatic Tuning: From Concepts to State-of-the-Art Results 209-228 2010

    Publisher: Springer New York

    DOI: 10.1007/978-1-4419-6935-4_13  

  236. Lessons Learned from 1-Year Experience with SX-9 and Toward the Next Generation Vector Computing Peer-reviewed

    Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoko Isobe

    HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009 3-+ 2010

    DOI: 10.1007/978-3-642-03913-3_1  

  237. Large-Scale Flow Computation of Complex Geometries by Building-Cube Method Peer-reviewed

    Daisuke Sasaki, Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

    HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2009 167-+ 2010

    DOI: 10.1007/978-3-642-03913-3_13  

  238. Cache partitioning strategies for 3-D stacked vector processors Peer-reviewed

    Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEEE 3D System Integration Conference 2010, 3DIC 2010 1-6 2010

    DOI: 10.1109/3DIC.2010.5751453  

  239. Efficient data management for the building cube method using cartesian meshes on the GPU platform Peer-reviewed

    Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, Kazuhiro Nakahashi

    International Supercomputing Conference (ISC10) 2010

  240. A Majority-Based Control Scheme for Way-Adaptable Caches Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    FACING THE MULTICORE-CHALLENGE: ASPECTS OF NEW PARADIGMS AND TECHNOLOGIES IN PARALLEL COMPUTING 6310 16-+ 2010

    DOI: 10.1007/978-3-642-16233-6_5  

    ISSN: 0302-9743

    eISSN: 1611-3349

  241. Evaluating Performance and Portability of OpenCL Programs Peer-reviewed

    Kazuhiko Komatsu, Katsuto Sato, Yusuke Arai, Kentaro Koyama, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the 5th international Workshop on Automatic Performance Tuning 2010

  242. Resisting sybil attack by social network and network clustering Peer-reviewed

    Ling Xu, Satayapiwat Chainan, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010 15-21 2010

    DOI: 10.1109/SAINT.2010.32  

  243. A history-based job scheduling mechanism for the vector computing cloud Peer-reviewed

    Yoshitomo Murata, Ryusuke Egawa, Manabu Higashida, Hiroaki Kobayashi

    Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010 125-128 2010

    DOI: 10.1109/SAINT.2010.43  

  244. A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications Peer-reviewed

    Ye Gao, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    13TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN: ARCHITECTURES, METHODS AND TOOLS 412-415 2010

    DOI: 10.1109/DSD.2010.93  

  245. A Voting-Based Working Set Assessment Scheme for Dynamic Cache Resizing Mechanisms Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN 98-105 2010

    DOI: 10.1109/ICCD.2010.5647599  

    ISSN: 1063-6404

  246. Design and early evaluation of a 3-D die stacked chip multi-vector processor Peer-reviewed

    Ryusuke Egawa, Yusuke Funaya, Ryu-Ichi Nagaoka, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEEE 3D System Integration Conference 2010, 3DIC 2010 2010

    DOI: 10.1109/3DIC.2010.5751448  

  247. Performance Optimization Techniques for Vector Processors with Cache Memory

    佐藤義永, 永岡龍一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

    IPSJ SIG Technical Reports (CD-ROM) 2009 (3) Paper No. ARC-184,6 2009/10/15

    ISSN: 2186-2583

  248. Working Sets based Thread Scheduling with Cache Partitioning Peer-reviewed

    Masayuki Sato, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Poster Abstracts of The Eighteenth International Conference on Parallel Architecture and Compilation Techniques 12 2009/09

  249. Thread Scheduling Based on Working Set Assessment

    佐藤 雅之, 小寺 功, 江川 隆輔, 滝沢 寛之, 小林 広明

    Summer Workshop on Parallel/Distributed/Cooperative Processing in Sendai (SWoPP Sendai 2009) 1-10 2009/08

  250. Early evaluation of a memory-stacked vector processor Peer-reviewed

    Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEEE COOL Chips XII 165 2009/04

  251. Performance Evaluation of SX-9 Using Real Applications

    曽我隆, 下村陽一, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

    IPSJ Symposium Series 2009 (2) 57-64 2009/01/15

    ISSN: 1344-0640

  252. Evaluating Computational Performance of Backpropagation Learning on Graphics Hardware Peer-reviewed

    Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

    Electronic Notes in Theoretical Computer Science 225 (C) 379-389 2009/01/02

    DOI: 10.1016/j.entcs.2008.12.087  

    ISSN: 1571-0661

  253. Study of high resolution incompressible flow simulation based on Cartesian mesh

    Shun Takahashi, Takashi Ishida, Kazuhiro Nakahashi, Hiroaki Kobayashi, Koki Okabe, Youichi Shimomura, Takashi Soga, Akihiro Musa

    47th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition 2009

  254. 3D On-Chip Memory for the Vector Architecture Peer-reviewed

    Yusuke Funaya, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION 352-357 2009

    ISSN: 2164-0157

  255. Characteristics of an On-Chip Cache on NEC SX Vector Architecture Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    Interdisciplinary Information Sciences 15 (1) 51-66 2009

    Publisher: Graduate School of Information Sciences, Tohoku University

    DOI: 10.4036/iis.2009.51  

    ISSN: 1340-9050

    Thanks to their high effective memory bandwidth, vector systems can achieve high computational efficiency for computation-intensive scientific applications. However, they have been encountering the memory wall problem, and the effective memory bandwidth rate has decreased: the bytes per flop rates of recent vector systems have dropped from 4 (SX-7 and SX-8) to 2 (SX-8R) and 2.5 (SX-9). The situation is getting worse as many function units and/or cores are brought into a single chip, because the pin bandwidth is limited and does not scale. To solve the problem, we propose an on-chip cache, called the vector cache, to maintain the effective memory bandwidth rate of future vector supercomputers. The vector cache employs a bypass mechanism between the main memory and register files under software control. We evaluate the performance of the vector cache on the NEC SX vector processor architecture with bytes per flop rates of 2 B/FLOP and 1 B/FLOP, to clarify the basic characteristics of the vector cache. For the evaluation, we use the NEC SX-7 simulator extended with the vector cache mechanism. The benchmark programs for performance evaluation are two DAXPY-like loops and five leading scientific applications. The results indicate that the vector cache boosts the computational efficiencies of the 2 B/FLOP and 1 B/FLOP systems up to the level of the 4 B/FLOP system. In particular, when cache hit rates exceed 50%, the 2 B/FLOP system can achieve performance comparable to the 4 B/FLOP system. The vector cache with the bypass mechanism can provide data from both the main memory and the cache simultaneously. In addition, from the viewpoint of cache design, we investigate the impact of cache associativity on the cache hit rate, and the relationship between cache latency and performance. The results also suggest that the associativity hardly affects the cache hit rate, and that the effects of the cache latency depend on the vector loop length of applications. A shorter cache latency contributes to the performance improvement of applications with shorter loop lengths, even in the case of the 4 B/FLOP system. For longer loop lengths of 256 or more, the latency can effectively be hidden, and the performance is not sensitive to the cache latency. Finally, we discuss the effects of selective caching using the bypass mechanism and of loop unrolling on the vector cache performance for the scientific applications. Selective caching is effective for efficient use of the limited cache capacity. Loop unrolling is also effective for improving performance, resulting in a synergistic effect with caching. However, there are exceptional cases in which loop unrolling worsens the cache hit rate due to an increase in the working space needed to process the unrolled loops over the cache. In such cases, the increase in the cache miss rate cancels the gain obtained by unrolling.

  256. A Cache-Aware Thread Scheduling Policy for Multi-Core Processors Peer-reviewed

    Masayuki Sato, Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks 109-114 2009

  257. Evaluation of Fine Grain 3-D Integrated Arithmetic Units Peer-reviewed

    Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi, Gensuke Goto

    2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION 198-+ 2009

    ISSN: 2164-0157

  258. Performance tuning and analysis of future vector processors based on the roofline model Peer-reviewed

    Yoshiei Sato, Ryuichi Nagaoka, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    ACM International Conference Proceeding Series 7-14 2009

    DOI: 10.1145/1621960.1621962  

  259. CheCUDA: A Checkpoint/Restart Tool for CUDA Applications Peer-reviewed

    Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi

    2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009) 408-+ 2009

    DOI: 10.1109/PDCAT.2009.78  

  260. Performance Evaluation of NEC SX-9 using Real Science and Engineering Applications Peer-reviewed

    Takashi Soga, Akihiro Musa, Youichi Shimomura, Ken'ichi Itakura, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS 2009

    DOI: 10.1145/1654059.1654088  

  261. Activities of Cyberscience Center and Performance Evaluation of the SX-9 Supercomputer Peer-reviewed

    Hiroaki Kobayashi, Ryusuke Egawa, Kouki Okabe, Eiichi Ito, Kenji Oizumi

    NEC TECHNICAL JOURNAL 3 (4) 64-72 2008/12

    ISSN: 1880-5884

  262. Caching on a chip multi vector processor Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    ACM/IEEE Supercomputing Conference (SC08) 2008/11

  263. A Parallel Image Generation Algorithm Based on Photon Mapping Peer-reviewed

    Masahide Tamura, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008) 145-151 2008/02

  264. First Experiences with NEC SX-9.

    Hiroaki Kobayashi, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Akihiro Musa, Takashi Soga, Yoichi Shimomura

    High Performance Computing on Vector Systems 3-11 2008

    Publisher: Springer

    DOI: 10.1007/978-3-540-85869-0_1  

  265. The potential of on-chip memory systems for future vector architectures Peer-reviewed

    Hiroaki Kobayashi, Akihiro Musa, Yoshiei Sato, Hiroyuki Takizawa, Koki Okabe

    HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007 247-+ 2008

  266. A Utility-based Double Auction Mechanism for Efficient Grid Resource Allocation Peer-reviewed

    Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS 252-260 2008

    DOI: 10.1109/ISPA.2008.103  

  267. A Distributed and Cooperative Load Balancing Method for Large-Scale Computing Environments Peer-reviewed

    Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ (Information Processing Society of Japan) Journal 49 (3) 1214-1228 2008

  268. A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination Peer-reviewed

    Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

    IPSJ Online Transactions 1 (1) 1-11 2008

    Publisher: Information Processing Society of Japan

    DOI: 10.2197/ipsjtrans.1.1  

    ISSN: 1882-6660

    Although ray tracing is the best approach to high-quality image synthesis, it takes a long time to generate images because of the huge amount of computation involved. In particular, ray-primitive intersection tests still dominate the execution time of ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm exploits the fact that the rays in a bundle share the same origin and are highly coherent. By reducing the redundant calculations in the innermost intersection tests for the bundles through precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that, by exploiting these features of ray bundles, the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm.

  269. SPRAT: A Stream Programming Language with Runtime Auto-Tuning Peer-reviewed

    滝沢寛之, 白取寛貴, 佐藤功人, 小林広明

    IPSJ Transactions on Advanced Computing Systems (ACS) 1 (2) 207-220 2008

    ISSN: 1882-7829

  270. A Performance Study of Secure Data Mining on the Cell Processor Peer-reviewed

    Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

    CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS 1 (2) 633-+ 2008

  271. An Efficient Intersection Algorithm Design of Ray Tracing for Many-Core Graphics Processors Peer-reviewed

    Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the International Conference on Computer Graphics and Imaging (CGIM 2008) 165-171 2008

  272. A Performance Study of Secure Data Mining on the Cell Processor Peer-reviewed

    Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

    CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS 633-+ 2008

    DOI: 10.1109/CCGRID.2008.16  

  273. Implementation and Evaluation of a Distributed and Cooperative Load-Balancing Mechanism for Dependable Volunteer Computing Peer-reviewed

    Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

    2008 IEEE INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS & NETWORKS WITH FTCS & DCC 316-+ 2008

    DOI: 10.1109/DSN.2008.4630100  

    ISSN: 1530-0889

  274. Hierarchical Parallel Processing of Ray Tracing on a Cell Cluster Invited Peer-reviewed

    Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of 1st International Workshop on Super Visualization (IWSV08) 2008

  275. Consideration of resource access history for optimizing overlay networks in P2P-based resource discovery Peer-reviewed

    Tsutomu Inaba, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings - 2008 International Symposium on Applications and the Internet, SAINT 2008 269-272 2008

    DOI: 10.1109/SAINT.2008.104  

  276. A Reliability Model for Result Checking in Volunteer Computing Peer-reviewed

    Ling Xu, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of DAS-P2P 2008 Workshop 201-204 2008

    DOI: 10.1109/SAINT.2008.25  

  277. Gain Based Delay Balancing in the Deep Submicron Era Peer-reviewed

    Ryusuke Egawa, Jubee Tada, Hiroaki Kobayashi, Gensuke Goto

    Proceedings of The 23rd International Technical Conference on Circuits/Systems (ITC-CSCC 2008) 577-580 2008

  278. SPRAT: Runtime Processor Selection for Energy-aware Computing Peer-reviewed

    Hiroyuki Takizawa, Katsuto Sato, Hiroaki Kobayashi

    2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING 386-393 2008

    DOI: 10.1109/CLUSTR.2008.4663799  

    ISSN: 1552-5244

  279. Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS 335-+ 2008

    DOI: 10.1109/ISPA.2008.100  

  280. Auction-based Resource Allocation for Activating Incentives in Resource Trading in Grid Computing Peer-reviewed

    Chainan Satayapiwat, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of The 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications 252-260 2008

  281. Modeling of cache access behavior based on Zipf's law Peer-reviewed

    Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT 310 9-15 2008

    DOI: 10.1145/1509084.1509086  

    ISSN: 1089-795X

  282. A shared cache for a chip multi vector processor Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Takashi Soga, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT 310 24-29 2008

    DOI: 10.1145/1509084.1509088  

    ISSN: 1089-795X

  283. A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs Peer-reviewed

    Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Transactions on High-Performance Embedded Architectures and Compilers 3 (1) 149-167 2008

  284. Early evaluation of on-chip vector caching for the NEC SX vector architecture Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    ACM/IEEE Supercomputing Conference (SC07) 2007/11

  285. A progressive 3D-meshing algorithm for interactive simulation of soft bodies Peer-reviewed

    Tomoyuki Saoi, Hiroyuki Takizawa, Hiroaki Kobayashi

    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL 10 (6) 761-776 2007/11

    ISSN: 1343-4500

  286. A dependable Peer-to-Peer computing platform Peer-reviewed

    Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 23 (8) 939-955 2007/11

    DOI: 10.1016/j.future.2007.03.004  

    ISSN: 0167-739X

    eISSN: 1872-7115

  287. Partial distortion entropy maximization for online data clustering Invited Peer-reviewed

    Hiroyuki Takizawa, Hiroaki Kobayashi

    NEURAL NETWORKS 20 (7) 819-831 2007/09

    DOI: 10.1016/j.neunet.2007.04.029  

    ISSN: 0893-6080

  288. A Power-Aware Way-Allocation Shared Cache Mechanism Peer-reviewed

    小寺功, 滝沢寛之, 小林広明

    Information Technology Letters 55-58 2007/09

  289. Accelerating Möller Intersection Algorithm Using Ray Packets Peer-reviewed

    Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Ken-ichi Suzuki, Hiroaki Kobayashi, Tadao Nakamura

    Information Technology Letters 265-268 2007/09

  290. Hardware Resource Contention Analysis for Runtime Performance Prediction of SMT Processors Invited Peer-reviewed

    佐藤雅之, 船矢祐介, 小寺功, 滝沢寛之, 小林広明

    Information Technology Letters 67-70 2007/09

  291. An Estimation-Based Redundant Task Dispatch Policy for Volunteer Computing Platforms Peer-reviewed

    Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the International Conference on Dependable Systems and Networks 348-349 2007/06/25

    Fast Abstract (Supplemental Volume)

  292. A fair-sharing and power-aware L2 cache system for chip multiprocessors Peer-reviewed

    Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

    IEEE COOL Chips X 2007/04

  293. Memory Efficient Scheme for Fast Spectral Photon Mapping Peer-reviewed

    Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the Ninth IASTED International Conference on Computer Graphics and Imaging (CGIM 2007) 2007/02

  294. A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs Peer-reviewed

    Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT 113-120 2007

    DOI: 10.1145/1327171.1327185  

    ISSN: 1089-795X

  295. Preliminary evaluation for runtime auto-tuning of GPGPU applications Peer-reviewed

    Hiroyuki Takizawa, Hiroki Shiratori, Hiroaki Kobayashi

    The 2nd International Workshop on Automatic Performance Tuning 37-37 2007

  296. An Efficient Control Mechanism for Self-Organizing Overlay Networks of Large-Scale P2P Systems Peer-reviewed

    Hiroaki Kobayashi, Hiroyuki Takizawa, Takuro Okawa, Tsutomu Inaba

    Interdisciplinary Information Sciences 13 (2) 227-237 2007

    Publisher: Tohoku University

    DOI: 10.4036/iis.2007.227  

    ISSN: 1340-9050

    P2P (peer-to-peer) technology has great potential to handle highly distributed computing resources and is expected to be a key technology for realizing ubiquitous computing environments over the Internet. However, P2P systems tend to waste network bandwidth on resource acquisition because of their decentralized resource management. This paper presents an efficient control mechanism for self-organizing overlay networks of large-scale P2P systems, and evaluates its performance in detail. The overlay network is configured by forming local clusters that reflect the current interests of individual peers and connecting them together based on their similarity. As a result, the overlay network provides a resource exploitation space for specific interests. In addition, the overlay network can be dynamically reconfigured as the interests of individual peers change over time, so that the peers most useful at a given time are reconnected closer to their client peers. Therefore, multicasting of resource-requesting messages can be carried out only over peers with similar interests that are dynamically connected through the overlay network, resulting in a remarkable decrease in both the number of messages for resource acquisition and the number of hops a resource-requesting query travels to reach a peer that satisfies the request. Experimental results indicate that the proposed mechanism can realize effective self-organization of the overlay network, in which useful peers are dynamically relocated around client peers. In addition, the adaptive allocation of links to peers according to their capability works well to maintain the high performance and fault tolerance of the self-organizing overlay network.

  297. An on-chip cache design for vector processors Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT 17-23 2007

    DOI: 10.1145/1327171.1327173  

    ISSN: 1089-795X

  298. A Power-Aware Shared Cache Mechanism Based on Locality Assessment of Memory Reference for CMPs Peer-reviewed

    Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the MEDEA workshop (PACT 07) 121-128 2007

  299. Performance Evaluation of K-Means Clustering on the Cell Processor Peer-reviewed

    Hong Wang, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of High Performance Computing Symposium 2007 2007 (1) 161-168 2007/01

  300. An on-chip cache design for vector processors Invited Peer-reviewed

    Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT 17-23 2007

    DOI: 10.1145/1327171.1327173  

    ISSN: 1089-795X

  301. Multi-Core Data Streaming Architecture for Ray Tracing Peer-reviewed

    Yoshiyuki Kaeriyama, Daichi Zaitsu, Kenichi Suzuki, Hiroaki Kobayashi, Nobuyuki Ohba

    2007 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, VOLS, 1 AND 2 171-+ 2007

    DOI: 10.1109/ICCD.2007.4601897  

    ISSN: 1063-6404

  302. Thread Scheduling Based on the Thread Characteristics for Multi-Core Processors Invited Peer-reviewed

    Yusuke Funaya, Isao Kotera, Hiroyuki Takizawa, Hiroaki Kobayashi

    Information Technology Letters 5 (5) 37-40 2006/09

  303. A Dynamic Logical Link Management Mechanism for P2P Resource Discovery Systems Peer-reviewed

    Takuro Okawa, Hiroyuki Takizawa, Hiroaki Kobayashi

    Information Technology Letters 5 (5) 363-366 2006/09

    Publisher: Forum on Information Technology

  304. Towards Effective GPU Implementation of Neural Networks Peer-reviewed

    Hiroyuki Takizawa, Tatsuya Chida, Hiroaki Kobayashi

    Proceedings of the fourth Irish Conference on Mathematical Foundations of Computer Science and Information Technology (MFCSIT) 2006/07

  305. Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing Peer-reviewed

    H Takizawa, H Kobayashi

    JOURNAL OF SUPERCOMPUTING 36 (3) 219-234 2006/06

    DOI: 10.1007/s11227-006-8294-1  

    ISSN: 0920-8542

  306. Radiative heat transfer simulation using programmable graphics hardware Peer-reviewed

    Hiroyuki Takizawa, Noboru Yamada, Seigo Sakai, Hiroaki Kobayashi

    Proceedings - 5th IEEE/ACIS Int. Conf. on Comput. and Info. Sci., ICIS 2006. In conjunction with 1st IEEE/ACIS, Int. Workshop Component-Based Software Eng., Softw. Archi. and Reuse, COMSAR 2006 2006 29-37 2006

    DOI: 10.1109/ICIS-COMSAR.2006.70  

  307. Design and Implementation of an Efficient Search Mechanism based on the Hybrid P2P Model for Ubiquitous Computing Systems Peer-reviewed

    T Inaba, T Okawa, Y Murata, H Takizawa, H Kobayashi

    INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET , PROCEEDINGS 45-+ 2006

    DOI: 10.1109/SAINT.2006.23  

  308. A distributed and cooperative load balancing mechanism for large-scale P2P systems Peer-reviewed

    Y Murata, T Inaba, H Takizawa, H Kobayashi

    INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS 126-129 2006

    DOI: 10.1109/SAINT-W.2006.2  

  309. An efficient text capture method for moving robots using DCT feature and text tracking Peer-reviewed

    Hiroki Shiratori, Hideaki Goto, Hiroaki Kobayashi

    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS 1050-+ 2006

    DOI: 10.1109/ICPR.2006.243  

    ISSN: 1051-4651

  310. Implications of memory performance for highly efficient supercomputing of scientific applications Peer-reviewed

    Akihiro Musa, Hiroyuki Takizawa, Koki Okabe, Takashi Soga, Hiroaki Kobayashi

    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS 4330 845-+ 2006

    ISSN: 0302-9743

  311. An Efficient Method for Finding Texts in Living Environments Using an Active Camera Peer-reviewed

    齋藤精二, 後藤英昭, 小林広明

    IEICE Transactions on Information and Systems (Japanese Edition) J88-D-II (9) 2003-2006 2005/09

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1923

  312. Modeling and Performance Evaluation of Computing Resource Discovery in Large-Scale P2P Systems Peer-reviewed

    大川拓郎, 滝沢寛之, 小林広明

    Information Technology Letters 46 (4) 21-24 2005/09

    Publisher: Forum on Information Technology

  313. An Incremental Photon-Mapping Algorithm for Fast Walk-Through Animations Peer-reviewed

    Kosuke Ikeda, Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of International Conference on Computer Graphics and Imaging 1-7 2005/08

  314. Performance Evaluation of the SX-7 System Using the HPC Challenge Benchmark Peer-reviewed

    滝沢寛之, 小久保達信, 片海健亮, 小林広明

    Symposium on Advanced Computing Systems and Infrastructures (SACSIS2005) 2005 (5) 25-33 2005/05

  315. A New Dynamic Decomposition Method for Parallel Molecular Dynamics Simulation Peer-reviewed

    V.Zhakhovskii, K.Nishihara, Y.Fukuda, S.Shimojo, T.Akiyama, S.Miyanaga, H.Sone, H.Kobayashi, E.Ito, Y.Seo, M.Tamura, Y.Ueshima

    Proceedings of Cluster Computing and Grid 2005 9-12 2005/05

  316. A distributed cooperative scheduling mechanism for P2P computing

    Yoshitomo Murata, Tsutomu Inaba, Hiroyuki Takizawa, Hiroaki Kobayashi

    Advanced Network & Computing Technology Workshop (33) 23-30 2005/01/24

  317. A P2P Semantic Information Search Mechanism for Ubiquitous Grid Computing Systems

    Tsutomu Inaba, Takuro Okawa, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

    Advanced Network & Computing Technology Workshop (33) 45-52 2005/01

  318. Evaluation of Large-Scale Remote Interactive Visualization via Super SINET Peer-reviewed

    Hiroyuki Takizawa, Hiroaki Kobayashi

    Information 8 (3) 383-389 2005

  319. Performance Evaluation of the SX-7 System using the HPC Challenge Benchmark Peer-reviewed

    滝沢寛之, 小久保達信, 片海健亮, 小林広明

    IPSJ Journal 46 (SIG12) 37-45 2005

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1882-7829

    The HPC Challenge benchmark (HPCC) is a benchmark suite developed for comprehensive performance evaluation of high-performance computing (HPC) systems. HPCC promises to evaluate the effective performance of HPC systems for practical scientific computing appropriately, because it assesses them from several viewpoints, such as memory access and networking performance, in addition to the widely used floating-point operation rate. In this paper, we report the performance evaluation results of an NEC SX-7 system at the Information Synergy Center, Tohoku University, using the HPCC benchmark. Based on the result that the system obtains excellent scores in 16 of 28 tests in the benchmark, we discuss the superiority of its vector architecture in the field of HPC.

  320. Text detection in color scene images based on unsupervised clustering of multi-channel wavelet features Peer-reviewed

    T Saoi, H Goto, H Kobayashi

    Eighth International Conference on Document Analysis and Recognition, Vols 1 and 2, Proceedings 690-694 2005

    DOI: 10.1109/ICDAR.2005.227  

  321. A self-organizing overlay network to exploit the locality of interests for effective resource discovery in P2P systems Peer-reviewed

    H Kobayashi, H Takizawa, T Inaba, Y Takizawa

    2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS 246-255 2005

  322. A workflow management mechanism for peer-to-peer computing platforms Peer-reviewed

    H Wang, H Takizawa, H Kobayashi

    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS 3758 827-832 2005

    ISSN: 0302-9743

  323. Efficient parallel processing of competitive learning algorithms Peer-reviewed

    K Sano, S Momose, H Takizawa, H Kobayashi, T Nakamura

    PARALLEL COMPUTING 30 (12) 1361-1383 2004/12

    DOI: 10.1016/j.parco.2004.10.001  

    ISSN: 0167-8191

    eISSN: 1872-7336

  324. Evaluation Experiments of Large-Scale Remote Interactive Visualization via Super SINET

    滝沢寛之, 小林広明

    Research and Development Papers of the National Joint-Use Information Infrastructure Centers 26 24-29 2004/11

  325. Evaluation of Large-Scale Remote Interactive Visualization via Super SINET Peer-reviewed

    Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of the 3rd International Conference on Information (INFO2004) 456-459 2004/11

  326. Evaluation of Large-Scale Remote Visualization Processing Using Super SINET

    滝沢寛之, 小林広明

    Annual Report of Information Synergy Center, Tohoku University 3 90-96 2004/06

  327. Quantitative Evaluation of Resource Discovery and Communication Overheads in the Globus Grid Middleware

    村田善智, 稲葉勉, 滝沢寛之, 小林広明

    Annual Report of Information Synergy Center, Tohoku University 3 115-123 2004/06

  328. An Effective Implementation of Vector Quantization Encoder on Commodity Graphics Hardware Peer-reviewed

    Hiroyuki Takizawa, Hiroaki Kobayashi

    Proceedings of International Conference on IT and Applications (ICITA) 2004

  329. A fast computation scheme of partial distortion entropy updating Peer-reviewed

    H Takizawa, H Kobayashi

    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS 736-741 2004

    DOI: 10.1109/ITCC.2004.1286555  

  330. Locality analysis to control dynamically way-adaptable caches Peer-reviewed

    Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

    Proceedings of the 2004 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '04 25-32 2004

    DOI: 10.1145/1152922.1101874  

  331. Multi-grain parallel processing of data-clustering on programmable graphics hardware Peer-reviewed

    H Takizawa, H Kobayashi

    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS 3358 16-27 2004

    ISSN: 0302-9743

  332. Locality analysis to control dynamically way-adaptable caches Peer-reviewed

    Hiroaki Kobayashi, Isao Kotera, Hiroyuki Takizawa

    Proceedings of the 2004 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '04 33 (3) 25-32 2004

    DOI: 10.1145/1152922.1101874  

  333. A Study of Self-Organizing P2P Networks for Dynamic Resource Management in Grids

    瀧澤泰明, 滝沢寛之, 佐野健太郎, 小林広明, 中村維男

    情報処理学会東北支部研究会 2003/11

  334. Vector Quantization Codebook Design That Suppresses Edge Degradation in Images Peer-reviewed

    滝沢寛之, 三浦 健, 小林広明, 中村維男

    Information Technology Letters 2 243-244 2003/09

  335. Vector quantization codebook design using the law-of-the-jungle algorithm Peer-reviewed

    H Takizawa, T Nakajima, K Sano, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E86D (6) 1068-1077 2003/06

    ISSN: 0916-8532

  336. A Comparison Study Of Vector Quantization Codebook Design Algorithms Based On The Equidistortion Principle Peer-reviewed

    Hiroyuki Takizawa, Taira Nakajima, Kentaro Sano, Hiroaki Kobayashi

    Proceedings of the 21st IASTED International Conference on Applied Informatics 255-261 2003

  337. An Instruction Cache Mechanism for Simultaneous Multithreaded VLIW Processors Peer-reviewed

    Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

    The Journal of Asian Information-Science-Life 2 (1) 2003

  338. Parallel processing for vector quantization codebook design

    S. Momose, K. Sano, H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

    並列/協調/分散処理に関する「湯布院」サマーワークショップ資料 2002/08

  339. Design and Evaluation of the MULHI Cache Peer-reviewed

    Jubei Tada, Takuya Nakaike, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

    電子情報通信学会論文誌 J85-D-I (3) 274-285 2002

  340. Real-Time Ray-Tracing with the 3DCGiRAM Architecture Peer-reviewed

    Ken-ichi Suzuki, Yasumasa Saida, Kentaro Sano, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

    IEICE Transactions J85-D-II (8) 1365-1367 2002

  341. An Interleaved Multiple-Hit Cache for Simultaneous Multithreaded VLIW Processors Peer-reviewed

    Jubei Tada, Hugo Kenji Pereira Harada, Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of the Third International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'02) 25-32 2002

  342. Practical Volume Compression based on Vector Quantization using the Law-of-the-Jungle Algorithm Peer-reviewed

    Kentaro Sano, Hiroyuki Takizawa, Taira Nakajima, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of the 2nd International Conference on Visualization, Imaging, and Image Processing 519-526 2002

  343. Interactive Ray-Tracing on the 3DCGiRAM Architecture Peer-reviewed

    Hiroaki Kobayashi, Ken-ichi Suzuki, Kentaro Sano, Nobuyuki Oba

    Proceedings of ACM/IEEE MICRO-35 4th Workshop on Media and Streaming Processors 53-59 2002

  344. High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of International Conference on Optical Communication and Multimedia (ICOCM2002) 114-117 2002

  345. PARALLEL ALGORITHM FOR THE LAW-OF-THE-JUNGLE LEARNING TO THE FAST DESIGN OF OPTIMAL CODEBOOKS Peer-reviewed

    Kentaro Sano, Shintaro Momose, Hiroyuki Takizawa, Clecio Donizete Lima, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002) 582-587 2002

  346. A Vector Quantization Method That Suppresses Visual Image-Quality Degradation Peer-reviewed

    三浦 健, 滝沢寛之, 佐野健太郎, 中島 平, 小林広明, 中村維男

    Information Technology Letters 1 185-186 2002

  347. Object-Space Parallel Processing of the Multi-Pass Rendering Method for Message-Passing Parallel Processing Systems Peer-reviewed

    Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

    The International Journal of High Performance Computer Graphics, Multimedia and Visualisation 1 (3) 1-14 2001

  348. A Design of Calculation Units for the Images Synthesis Intelligent Memory 3DCGiRAM Peer-reviewed

    Ken-ichi Suzuki, Yoshiyuki Kaeriyama, Jun Sugiyama, Yasumasa Saida, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of JSPP2001 2001 (6) 295-302 2001

  349. A Technology-Scalable Multithreaded Architecture Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the 13-th Symposium on Computer Architecture and High-Performance Computing 82-89 2001

  350. 3DCGiRAM: An intelligent memory architecture for photo-realistic image synthesis Peer-reviewed

    H Kobayashi, K Suzuki, K Sano, Y Kaeriyama, Y Saida, N Oba, T Nakamura

    2001 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD 2001, PROCEEDINGS 462-467 2001

    ISSN: 1063-6404

  351. Dynamic Boosting for VLIW Architectures Peer-reviewed

    KOBAYASHI Hiroaki

    IEICE Transactions on Information and Systems J80-D-I (1) 171-183 2000

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1915

  352. Data-parallel volume rendering with adaptive volume subdivision Peer-reviewed

    K Sano, H Kitajima, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E83D (1) 80-89 2000/01

    ISSN: 1745-1361

  353. An active learning algorithm based on existing training data Peer-reviewed

    H Takizawa, T Nakajima, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E83D (1) 90-99 2000/01

    ISSN: 0916-8532

  354. Reconfigurable synchronized dataflow processor Peer-reviewed

    Hiroshi Sasaki, Hitoshi Maruyama, Hideaki Tsukioka, Nobuyoshi Shoji, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 27-28 2000

    DOI: 10.1145/368434.368490  

  355. A Pre-attributed Resampling Algorithm for Controlled-Precision Volume Ray-Casting Peer-reviewed

    Kentaro Sano, Hiroaki Kobayashi, Tadao Nakamura

    IPSJ Journal 41 (SIG 5) 113-124 2000

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0387-5806

    Accurate volume rendering is essential for some visualization applications, e.g., medical imaging. However, the high computational cost of conventional volume rendering algorithms for high-quality image generation has restricted their practical use. In this paper, we propose a pre-attributed resampling algorithm that accomplishes controlled-precision volume ray-casting at low computational cost. This algorithm changes resampling intervals based on numerical errors of the volume rendering integral so that the number of resampling points is minimized for a given error bound. In addition, to reduce computational costs for resampling, a simple interpolation method is applied to resampling points in regions where intensities and opacities are constant. To suppress the overhead of precision control, information on the numerical errors and the constant regions is obtained for each voxel in pre-processing and then attached to the volume data as voxel attributes. The experimental results demonstrate that the proposed algorithm outperforms conventional ray-casting algorithms without precision control for accurate visualization in terms of accuracy/processing-time performance.

  356. Developing a Practical Parallel Multi-pass Render in Java and C --- Toward a Grande Application in Java Peer-reviewed

    Hitoshi Yamauchi, Atsusi Maeda, Hiroaki Kobayashi

    Proceedings of the ACM 2000 Java Grande Conference 126-133 2000

  357. A Scheduling Method for Instruction-Level Parallel Processing of Vector and Scalar Instructions Peer-reviewed

    Takuya Nakaike, Takehito Sasaki, Masayuki Katahira, Hiroaki Kobayashi, Tadao Nakamura

    Systems and Computers in Japan 30 (13) 23-33 1999/11/30

    Publisher: John Wiley and Sons Inc.

    DOI: 10.1002/(SICI)1520-684X(19991130)30:13<23::AID-SCJ3>3.0.CO;2-3  

    ISSN: 0882-1666

  358. A topology preserving neural network for nonstationary distributions Peer-reviewed

    T Nakajima, H Takizawa, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E82D (7) 1131-1135 1999/07

    ISSN: 0916-8532

  359. Acceleration techniques for the network inversion algorithm Peer-reviewed

    H Takizawa, T Nakajima, M Nishi, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E82D (2) 508-511 1999/02

    ISSN: 0916-8532

  360. Time stamp invalidation of TLB-unified cache and its performance evaluation Peer-reviewed

    Ken-Ichi Suzuki, Nobuyuki Oba, Shigenori Shimizu, Hiroaki Kobayashi, Tadao Nakamura

    Systems and Computers in Japan 30 (11) 94-106 1999

    Publisher: John Wiley and Sons Inc.

    DOI: 10.1002/(SICI)1520-684X(199910)30:11<94::AID-SCJ11>3.0.CO;2-S  

    ISSN: 0882-1666

  361. MULHI Cache : An Instruction Cache Mechanism for VLIW Processors Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of Information Processing Society of Japan 40 (5) 1996-2007 1999

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1882-7764

    VLIW (Very Long Instruction Word) processors, which are expected to be a next generation high performance microprocessor architecture, need a high-bandwidth, high-hit-rate instruction cache to fetch VLIWs and issue operations of each VLIW to function units quickly. However, when VLIWs including many nops (no operations) are stored in a conventional instruction cache, the cache utilization is not high, resulting in the performance degradation of VLIW processors. In this paper, a new instruction cache mechanism for VLIW processors, named MULHI (MULtiple HIt) cache, is proposed and evaluated using several programs in the SPEC95 benchmark suite. The experimental results indicate that the MULHI cache achieves 1.68 times higher performance than a conventional instruction cache that stores VLIWs with nops.

  362. A Self-organizing network system forming memory from nonstationary probability distributions Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the International Joint Conference on Neural Networks 99 1999

  363. An Architecture of the Reconfigurable Synchronous Dataflow Computer and its Software Development Environment Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of The Seventh Japanese FPGA/PLD Design Conference and Exhibit 9-14 1999

  364. Kohonen learning with a mechanism, the law of the jungle, capable of dealing with nonstationary probability distribution functions Peer-reviewed

    T Nakajima, H Takizawa, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E81D (6) 584-591 1998/06

    ISSN: 0916-8532

  365. Facial image processing using wavelet transform

    K. Iimura, H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

    Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers 1998

  366. A Scheduling method for instruction level parallel processing of vector and scalar instructions Peer-reviewed

    KOBAYASHI Hiroaki

    The Transactions of the Institute of Electronics, Information and Communication Engineers D-I J81-D-I (7) 910-920 1998

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1915

  367. Automated design of wave pipelined multiport register files Peer-reviewed

    K Takano, T Sasaki, N Oba, H Kobayashi, T Nakamura

    PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98 197-202 1998

  368. Performance Evaluation of a Parallel Multi-Pass Rendering Algorithm Based on the Object-Space Parallel Processing Model Peer-reviewed

    Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of JSPP 98 98 (7) 175-182 1998

  369. Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the 2nd Eurographics Workshop on Parallel Rendering 133-144 1998

  370. Implementation and Evaluation of an Object-Space-Subdivision Parallel Ray-Tracing Method on General-Purpose Computers Peer-reviewed

    前田隆之, 徳永麻由美, 山内 斉, 小林広明, 中村維男

    Visual Computing/グラフィックスとCAD合同シンポジウム98論文集 55-60 1998

  371. Static Load Balancing Schemes for the Object-Space Parallel Multi-Pass Rendering Method on a Distributed-Memory Multiprocessor System Peer-reviewed

    Hiroaki Kobayashi, Hitoshi Yamauchi, Takayuki Maeda, Mayumi Tokunaga, Tadao Nakamura

    Proceedings of the 2nd Eurographics Workshop on Parallel Rendering 133-144 1998

  372. The object-space parallel processing of the multipass rendering method on the (M pi)(2) with a distributed-frame buffer system Peer-reviewed

    H Yamauchi, T Maeda, H Kobayashi, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E80D (9) 909-918 1997/09

    ISSN: 0916-8532

  373. Decoupled modified-bit cache Peer-reviewed

    Masafumi Takahashi, Nobuyuki Oba, Hiroaki Kobayashi, Tadao Nakamura

    Systems and Computers in Japan 28 (6) 49-59 1997/06/15

    Publisher: John Wiley and Sons Inc.

    DOI: 10.1002/(SICI)1520-684X(19970615)28:6<49::AID-SCJ6>3.0.CO;2-M  

    ISSN: 0882-1666

  374. The Object-Space Parallel Processing of the Multipass Rendering Method on the (Mπ)^2 with a Distributed-Frame Buffer System

    Hitoshi Yamauchi, Takayuki Maeda, Hiroaki Kobayashi, Tadao Nakamura

    IEICE Transactions on Information and Systems E80-D (9) 899-908 1997

    Publisher: Institute of Electronics, Information and Communication Engineers (IEICE)

    ISSN: 0916-8532

  375. A Hardware Cache Evaluation System : RICE Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Institute of Electronics, Information and Communication Engineers J80-D-I (1) 121-123 1997

  376. A method for improving classification capability of multilayer perceptrons Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Institute of Electronics, Information and Communication Engineers J80-D-II (1) 390-393 1997

  377. Time-Division Pseudo Multi-Port Register File with Wave Pipelining Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Institute of Electronics, Information and Communication Engineers J80-D-I (3) 223-226 1997

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1915

  378. Performance Evaluation of Level-2 Cache by Using RICE Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Institute of Electronics, Information and Communication Engineers J80-D-I (10) 793-802 1997

  379. Memory hierarchy design for jetpipeline: To execute scalar and vector instructions in parallel Peer-reviewed

    T Sasaki, T Nakaike, K Takano, M Katahira, H Kobayashi, T Nakamura

    SECOND AIZU INTERNATIONAL SYMPOSIUM ON PARALLEL ALGORITHMS/ARCHITECTURE SYNTHESIS, PROCEEDINGS 66-73 1997

  380. A cached frame buffer system for object-space parallel processing systems Peer-reviewed

    H Kobayashi, T Maeda, H Yamauchi, T Nakamura

    COMPUTER GRAPHICS INTERNATIONAL, PROCEEDINGS 146-+ 1997

  381. Multiport Register File Using Wave Pipelining Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of ACM/IEEE International Workshop on Logic Synthesis'97 1997

  382. Parallel processing of the shear-warp factorization with the binary-swap method on a distributed-memory multiprocessor system Peer-reviewed

    K Sano, HH Kitajima, H Kobayashi, T Nakamura

    1997 IEEE SYMPOSIUM ON PARALLEL RENDERING (PRS '97), PROCEEDINGS 87-+ 1997

  383. Performance Evaluation of (Mp)2, a Massively Parallel Processing System for Image Synthesis with a Distributed Frame Buffer System Peer-reviewed

    KOBAYASHI Hiroaki

    電子情報通信学会コンピュータシステム研究会資料 96 (503) 25-32 1997

    Publisher: The Institute of Electronics, Information and Communication Engineers

    The object-space parallel processing for global illumination models is one of the most promising approaches to fast photo-realistic image synthesis. However, there is a potential bottleneck between processing elements and a frame buffer in massively parallel processing systems based on the object-space parallel processing, and this factor may restrict their scalable performance. To solve this problem, this paper presents a novel frame buffer system, named a distributed frame buffer system. By adopting the distributed frame buffer system into the object-space parallel processing systems, the overhead of the frame buffer access due to conflicts and long latency can be reduced, and the potential of the object-space parallel processing system with a large number of processing elements will be fully exploited.

  384. Time Stamp Invalidation of TLB-Unified Cache and Its Performance Evaluation Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Institute of Electronics, Information and Communication Engineers J80-D-I (12) 941-953 1997

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1915

  385. (M pi)(2): A hierarchical parallel processing system for the multipass rendering method Peer-reviewed

    H Kobayashi, H Yamauchi, Y Toh, T Nakamura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E79D (8) 1055-1064 1996/08

    ISSN: 0916-8532

  386. A study of optimal learning methods in neural networks

    H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

    IPSJ Regional Symposium in Tohoku 1996

  387. Decoupled Modified-bit Cache Peer-reviewed

    KOBAYASHI Hiroaki

    The Transactions of the Institute of Electronics, Information and Communication Engineers 1996

  388. A Memory Access Protocol for Interconnection Networks with Message Losses Peer-reviewed

    KOBAYASHI Hiroaki

    The Transactions of the Institute of Electronics, Information and Communication Engineers 79 (9) 567-571 1996

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1915

  389. Decoupled modified-bit cache Peer-reviewed

    M Takahashi, N Oba, H Kobayashi, T Nakamura

    CONFERENCE PROCEEDINGS OF THE 1996 IEEE FIFTEENTH ANNUAL INTERNATIONAL PHOENIX CONFERENCE ON COMPUTERS AND COMMUNICATIONS 136-143 1996

  390. A hierarchical parallel processing system for the multipass-rendering method Peer-reviewed

    H Kobayashi, H Yamauchi, Y Toh, T Nakamura

    10TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM - PROCEEDINGS OF IPPS '96 62-67 1996

  391. Facial Expression Recognition Using Neural Networks Capable of Recognizing at an Infant Level Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the Sixth World Congress of World Association for Infant Mental Health 1996

  392. Task Scheduling Strategies and Their Locality Evaluation of Memory References on a Parallel Graph Reduction System Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of Information Processing Society of Japan 37 (11) 2020-2029 1996

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1882-7764

    Functional programming languages have many appealing properties such as referential transparency and high programming productivity. On the other hand, the inefficiency of their implementation on conventional computers has prevented their wide acceptance. In this paper, we propose a task scheduling strategy for high-speed processing of functional programs on a shared-memory multiprocessor system. To reduce shared-memory accesses in parallel graph reduction, the proposed task scheduling strategy allocates tasks to processors by taking the locality of data references among the tasks into account dynamically. Software simulation experiments on a multiprocessor system with the proposed strategy show that speedups of program processing in proportion to the number of processors can be achieved by making good use of local and cluster cache memories. As a result, the effectiveness of the proposed scheduling strategy with locality consideration is revealed.

  393. A Memory Access Buffering Mechanism for a Processor Cluster Peer-reviewed

    高橋雅史, 大庭信之, 小林広明, 中村維男

    The Transactions of the Institute of Electronics, Information and Communication Engineers J78-D-I (10) 861-864 1995/10

  394. Facial image recognition using neural networks

    H. Takizawa, T. Nakajima, H. Kobayashi, T. Nakamura

    Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers 1995

  395. Task Scheduling with Locality Consideration for a Clustered Parallel FL Reduction System Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the Aizu International Symposium on Parallel Algorithm/Architecture Synthesis 234-240 1995

  396. Design and performance measurements of an execution model for the parallel processing of Prolog programs Peer-reviewed

    D Wang, H Kobayashi, T Nakamura

    IEEE FIRST ICA3PP - IEEE FIRST INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, VOLS 1 AND 2 650-658 1995

  397. Mechanical-Design-Oriented Description Language : MODEL Peer-reviewed

    KOBAYASHI Hiroaki

    Japanese Journal of Advanced Automation Technology 7 (1) 29-34 1995

  398. Adaptive Subdivision for the Point-Matching Method Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Japan Society of Mechanical Engineers 60 (570) 543-548 1994

    Publisher: The Japan Society of Mechanical Engineers

    DOI: 10.1299/kikaia.60.543  

    ISSN: 0387-5008

    The contact stress analysis of elastic bodies is important for mechanical engineering in areas such as friction, wear and fatigue. The point-matching method is a well-known analytical model that satisfies Hertzian contact theory. However, the point-matching method has critical problems, i.e., large amounts of computation time and memory are required as the number of cells increases. Although there have been many studies on its accuracy to date, there are few studies on efficient processing of the point-matching method. This paper proposes an efficient discretization method for the contact region to reduce processing time and save memory space in the point-matching method.

  399. Mechanical-Design-Oriented Description Language : MODEL Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Japan Society of Mechanical Engineers 60 (570) 715-720 1994

    Publisher: The Japan Society of Mechanical Engineers

    DOI: 10.1299/kikaic.60.715  

    ISSN: 0387-5024

    Designing mechanical systems by means of special-purpose languages is very effective because they can define objects precisely. However, this causes serious problems. First, the amount of description is very large in the case of designing complex systems. Second, those languages are not suited for modeling objects at higher abstraction levels. To solve these problems, this paper presents a novel description language for mechanical design called MODEL (Mechanical-design-Oriented DEscription Language). MODEL is designed in order that the designer's intentions can be efficiently reflected in the specifications of mechanical systems. We introduce a new concept, design granularity, so that designers can model objects of a mechanical system at different abstraction levels. Moreover, to reduce the amount of description, we use knowledge bases for mechanical design as a library for MODEL. The design process with MODEL is discussed in detail to clarify the capabilities of the language.

  400. A TLB-Unified Cache Management Scheme Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of Information Processing Society of Japan 35 (6) 1149-1152 1994

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1882-7764

    This paper proposes the TLB-Unified Cache (TUC), which integrates the management of a cache and a translation-lookaside buffer (TLB). In the TUC, a pointer to an entry of the TLB is stored as a cache tag instead of an address. Therefore, cached data and its address are indirectly related, and the space for cache tags is drastically reduced. This paper also proposes Black and White Invalidation for the fast invalidation of the cache entries pointing to a missed TLB entry. Simulation results show that, in spite of the space saving, the TUC has the same performance in terms of cache miss ratio as conventional caches.

  401. STARCORE - A HIGH-SPEED ATM SWITCHING SYSTEM Peer-reviewed

    N OBA, K SUZUKI, H KOBAYASHI, T NAKAMURA

    1994 IEEE GLOBECOM - CONFERENCE RECORD, VOLS 1-3, AND COMMUNICATIONS THEORY MINI-CONFERENCE RECORD 139-143 1994

  402. Breadth-first Parallel Processing of Sequential Prolog Programs Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems 86-89 1994

  403. A Hierarchical System for Parallel Processing of Prolog Programs Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the Sixth IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems. 90-93 1994

  404. Jetpipeline : A Hybrid Pipeline Architecture for Instruction-Level Parallelism Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of High Performance Computing Conference'94 317-323 1994

  405. A Hierarchical Parallel Reduction System for the Functional Language FL Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of High Performance Computing Conference'94 270-278 1994

  406. Software Pipelining for JetPipeline Architecture Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks 127-134 1994

  407. (Mp)^2 : A Hierarchical Parallel Processing System for a Global Illumination Model Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks 157-164 1994

  408. LOAD BALANCING BASED ON LOAD COHERENCE BETWEEN CONTINUOUS IMAGES FOR AN OBJECT-SPACE PARALLEL RAY-TRACING SYSTEM Peer-reviewed

    H KOBAYASHI, H KUBOTA, S HORIGUCHI, T NAKAMURA

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E76D (12) 1490-1499 1993/12

    ISSN: 0916-8532

  409. Ants routing: An adaptive packet flow control scheme in multimedia communication

    Emad Rashid, Hiroaki Kobayashi, Tadao Nakamura

    Proceedings of 2nd IEEE International Conference on Universal Personal Communications: Gateway to the 21st Century, ICUPC 1993 1 228-234 1993

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/ICUPC.1993.528382  

  410. Integrated Computer-Aided Mechanical Design System using MODEL Peer-reviewed

    KOBAYASHI Hiroaki

    Transactions of the Japan Society of Mechanical Engineers(section C) 59 (567) 3597-3602 1993

    Publisher: The Japan Society of Mechanical Engineers

    DOI: 10.1299/kikaic.59.3597  

    ISSN: 0387-5024

    Recently, for the purpose of production rationalization, demand for CAD (computer-aided design) systems has been rapidly increasing. However, most of the CAD systems in mechanical design have mainly performed graphical processing, such as drawing. In this paper, we propose an integrated computer-aided mechanical design system to support the design process as well as the drawing process. The system employs a mechanical-design-oriented description language called MODEL to design mechanical systems. To reduce the amount of description in MODEL, we introduce knowledge bases for mechanical design. With these knowledge bases, the system can infer final designs from insufficient descriptions of objects at higher abstraction levels and complete them. Inference and knowledge representation schemes are discussed in detail. We also construct a prototype system and examine the effectiveness of our system.

  411. AN ADAPTIVE NETWORK ROUTING METHOD BY ELECTRICAL-CIRCUIT MODELING Peer-reviewed

    N OBA, H KOBAYASHI, T NAKAMURA

    IEEE INFOCOM 93 : THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3 586-592 1993

  412. INCORPORATING THE PARALLEL-PROCESSING TECHNIQUES WITH THE DEMAND-DRIVEN MODEL OF FUNCTIONAL PROGRAMMING-LANGUAGES Peer-reviewed

    H SHEN, H KOBAYASHI, T NAKAMURA

    TENCON '93: 1993 IEEE REGION 10 CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND POWER ENGINEERING, VOL 1 146-149 1993

  413. Developing the Lambda Calculus for FL-oriented Parallel Reductions Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS 6.49-6.50 1993

  414. Expression Recognition Using the Reformed Back-propagation Network Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of 3RD INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS 3.27-3.30 1993

  415. A Massively Parallel Processing Approach to Fast Photo-Realistic Image Synthesis Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of Computer Graphics International'93 497-507 1993

  416. EXPRESSION RECOGNITION USING NEURAL NETWORKS Peer-reviewed

    J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

    WCNN'93 - PORTLAND, WORLD CONGRESS ON NEURAL NETWORKS, VOL IV IV-231-IV-234 1993

  417. Ants Routing : An Adaptive Packets Flow Control Scheme in Multimedia Networks Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of IEEE 2nd International Conference on Universal Personal Communications 228-234 1993

  418. KNOWLEDGE REPRESENTATION FOR ADAPTIVE OVERLOAD PACKETS CONTROL IN MULTIMEDIA NETWORKS Peer-reviewed

    E RASHID, H KOBAYASHI, T NAKAMURA

    GLOBECOM '93 COMMUNICATIONS FOR A CHANGING WORLD, CONFERENCE RECORD 1516-1520 1993

  419. NEURAL-NETWORK STRUCTURES FOR EXPRESSION RECOGNITION Peer-reviewed

    J DING, M SHIMAMURA, H KOBAYASHI, T NAKAMURA

    IJCNN '93-NAGOYA : PROCEEDINGS OF 1993 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3 1430-1433 1993

  420. An Architecture of a Knowledge-base System to Support Mechanical Design Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of IPSJ Graphics and CAD Symposium 1991

  421. A Study on a Mechanical-Design-oriented Description Language Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of IPSJ Graphics and CAD Symposium 1991

  422. A Proposal on Integrated Computer-Aided Mechanical Design Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of IPSJ Graphics and CAD Symposium 1990

  423. Effective Parallel Processing for synthesizing Continuous Images Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of Computer Graphics International 89 343-352 1989

  424. Load balancing strategies for a parallel ray-tracing system based on constant subdivision Peer-reviewed

    Hiroaki Kobayashi, Satoshi Nishimura, Hideyuki Kubota, Tadao Nakamura, Yoshiharu Shigei

    The Visual Computer 4 (4) 197-209 1988/07

    Publisher: Springer-Verlag

    DOI: 10.1007/BF01887592  

    ISSN: 0178-2789

  425. A Strategy for Mapping Parallel Ray-Tracing into a Hypercube Multiprocessor System Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of Computer Graphics International 88 1988

  426. Parallel processing of an object space for image synthesis using ray tracing Peer-reviewed

    Hiroaki Kobayashi, Tadao Nakamura, Yoshiharu Shigei

    The Visual Computer 3 (1) 13-22 1987/02

    Publisher: Springer-Verlag

    DOI: 10.1007/BF02153647  

    ISSN: 0178-2789

  427. Performance Evaluation of a General Purpose Pipeline System Peer-reviewed

    KOBAYASHI Hiroaki

    The Transactions of the Institute of Electronics and Communication Eng J68-D (10) 1985

  428. A Language Processor of an Intelligent Link System Peer-reviewed

    KOBAYASHI Hiroaki

    Proceedings of the IEEE International Conference on Communications 1984

  429. Organization and Evaluation of a General Purpose Pipeline System Peer-reviewed

    KOBAYASHI Hiroaki

    The Transactions of the Institute of Electronics and Communication Eng J67-D (12) 1984

Misc. 117

  1. リアルタイム津波浸水被害推計シミュレーションの性能評価

    撫佐 昭裕, 岸谷 拓海, 阿部 孝志, 佐藤 佳彦, 田野 邊睦, 鈴木 崇之, 村嶋 陽一, 佐藤 雅之, 小松 一彦, 伊達 進, 越村 俊一, 小林 広明

    SENAC : 東北大学大型計算機センター広報 53 (2) 10-18 2020/04

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  2. リアルタイム津波浸水被害予測の全国展開に向けた検討

    越村 俊一, 阿部 孝志, 井上 拓也, 撫佐 昭裕, 村嶋 陽一, 鈴木 崇之, 太田 雄策, 日野 亮太, 佐藤 佳彦, 加地 正明, 小林 広明

    SENAC : 東北大学大型計算機センター広報 52 (2) 2-8 2019/04

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  3. スーパーコンピュータによるリアルタイム津波浸水被害予測

    越村 俊一, 阿部 孝志, 撫佐 昭裕, 村嶋 陽一, 鈴木 崇之, 井上 拓也, 太田 雄策, 日野 亮太, 佐藤 佳彦, 加地 正明, 小林 広明

    SENAC : 東北大学大型計算機センター広報 51 (1) 30-34 2018/01

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  4. HPCMG-FVを用いたSX-ACEの性能評価

    江川隆輔, 磯部洋子, 加藤季広, 小松一彦, 滝沢寛之, 小林広明, 撫佐昭裕

    東北大学情報シナジーセンター大規模科学計算機システム広報SENAC 50 (3) 15-18 2017/07

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  5. 『銅酸化物の有効モデルに対する揺らぎ交換近似』コードのSX-ACE 向け最適化

    山下 毅, 山崎 国人, 江川 隆輔, 吉岡 匠哉, 土浦 宏紀, 小林 広明, 曽根 秀昭

    SENAC : 東北大学大型計算機センター広報 50 (1) 25-30 2017/01

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  6. 防災減災に資するUrgent Computingへの挑戦(防災・減災に貢献するスーパーコンピュータの開発を目指して/東日本大震災の教訓と津波減災に向けてのシミュレーションの課題と展望/防災減災のための可視化と情報通信システム/JAMSTECのHPCシステムを利用した海溝型巨大地震の防災・減災への取り組み)

    小林 広明, 越村 俊一, 下條 真司, 有吉 慶介

    ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 (2016) 128-129 2016/05/30

  7. リアルタイム津波浸水被害予測技術の実証

    越村俊一, 井上拓也, 日野亮太, 太田雄策, 小林広明, 撫佐昭裕, 村嶋陽一, 目黒公郎

    地域安全学会梗概集(CD-ROM) (38) ROMBUNNO.C‐15 2016/05

  8. SX-ACEにおけるHPCG ベンチマークの性能評価

    小松 一彦, 江川 隆輔, 磯部 洋子, 緒方 隆盛, 滝沢 寛之, 小林 広明

    SENAC : 東北大学大型計算機センター広報 48 (3) 14-19 2015/07

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  9. ベクトルコンピュータにおける高速化

    小林 広明, 江川 隆輔, 小松 一彦, 岡部 公起, 大泉 健治, 小野 敏, 山下 毅, 佐々木 大輔, 森谷 友映, 齋藤 敦子, 撫佐 昭裕, 松岡 浩司, 渡部 修, 曽我 隆, 山口 健太

    SENAC : 東北大学大型計算機センター広報 48 (3) 20-51 2015/07

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  10. 東北大学サイバーサイエンスセンター高速化推進研究活動報告書(第6号)

    小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 小松一彦, 大泉健治, 小野 敏, 山下毅, 佐々木大輔, 森谷友映, 齋藤敦子, 撫佐昭裕, 松岡浩司, 渡部修 他

    2015/04

  11. リアルタイム津波浸水・被害予測シミュレーションシステム開発の取り組み

    大泉 健治, 阿部 孝志, 佐藤 佳彦, 松岡 浩司, 撫佐 昭裕, 小林 広明

    SENAC : 東北大学大型計算機センター広報 48 (1) 54-57 2015/01

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  12. 東北大学サイバーサイエンスセンターにおける分子動力学シミュレーションコードの高速化支援について

    森谷 友映, 佐々木 大輔, 山下 毅, 小野 敏, 大泉 健治, 小松 一彦, 江川 隆輔, 小林 広明

    SENAC : 東北大学大型計算機センター広報 47 (1) 51-56 2014/01

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  13. Heuristic Data Partitioning for Social Networking Service

    2013 (34) 1-8 2013/12/09

  14. 複合システムにおけるチェックポイントリスタート

    滝沢寛之, 佐藤雅之, 江川隆輔, 小林広明

    日本信頼性学会誌 35 (12) 2013/12

    DOI: 10.11348/reajshinrai.35.8_515  

  15. 三次元LSIの課題と高信頼化

    小柳光正, 小林広明, 末吉敏則, 鎌田忠

    日本信頼性学会誌 35 (12) 2013/12

    DOI: 10.11348/reajshinrai.35.8_471  

  16. マルチプラットフォームにおける最適化手法の効果に関する一検討

    小松一彦, 佐々木俊英, 江川隆輔, 滝沢寛之, 小林広明

    研究報告ハイパフォーマンスコンピューティング(HPC) 2013 (24) 1-7 2013/07/24

    Publisher: 一般社団法人情報処理学会

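    In recent years, HPC systems have become increasingly diverse, and there is strong demand for performance-portable HPC codes that can achieve high performance on multiple kinds of HPC systems with different characteristics. This study aims to analyze in detail how optimization techniques for various HPC systems affect the performance of HPC codes and, based on the findings, to develop HPC codes with high performance portability. This report analyzes the performance portability of HPC codes when different manual optimizations are combined with each other or with automatic optimizations. For each HPC system, we evaluate the synergistic effects of these combinations and discuss optimizations that may degrade performance portability.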

  17. チューニング対象の限定による効率の良い性能可搬性向上手法

    平澤将一, 秋葉諒, 滝沢寛之, 小林広明

    研究報告ハイパフォーマンスコンピューティング(HPC) 2013 (19) 1-8 2013/05/22

    Publisher: 一般社団法人情報処理学会

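    As computing systems diversify, existing scientific computing programs frequently need to be ported to new computing systems and have their performance optimized. However, porting and optimizing large-scale scientific computing programs requires enormous effort, which has become a problem. This study proposes a method that efficiently improves the performance portability of an entire application by limiting the parts of the source code that should be optimized preferentially when the goal is to improve performance portability. Evaluation results using benchmark programs and real applications show that the proposed method can limit the parts of the source code to be optimized so as to efficiently improve the performance portability of the entire application.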

  18. 大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

    安田一平, 小松一彦, 江川隆輔, 小林広明

    研究報告計算機アーキテクチャ(ARC) 2012 (7) 1-6 2012/12/06

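    In recent years, as the number of nodes in large-scale parallel systems has increased, exploiting their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. A method that can easily analyze application performance on large-scale systems is therefore needed. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, performance models that take inter-node communication into account, and analysis and optimization methods based on such models, have not been established. This report proposes a system performance model that takes inter-node communication into account and investigates the validity of the proposed model using five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.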

  19. 大規模並列システムのノード間通信を考慮した性能モデルに関する一検討

    安田一平, 小松一彦, 江川隆輔, 小林広明

    研究報告ハイパフォーマンスコンピューティング(HPC) 2012 (7) 1-6 2012/12/06

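    In recent years, as the number of nodes in large-scale parallel systems has increased, exploiting their high computational performance requires considering not only the computational performance of each node but also the communication performance between nodes. A method that can easily analyze application performance on large-scale systems is therefore needed. Bottleneck analysis using a performance model is one way to analyze application performance and provide optimization guidelines. However, performance models that take inter-node communication into account, and analysis and optimization methods based on such models, have not been established. This report proposes a system performance model that takes inter-node communication into account and investigates the validity of the proposed model using five large-scale parallel systems: SX-9, a Nehalem EX cluster, FX1, FX10, and SR16000.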

  20. A History-Based Job Scheduling Mechanism for the Vector Meta Computing

    MURATA YOSHITOMO, EGAWA Ryusuke, KOBAYASHI Hiroaki

    IEICE technical report. Internet Architecture 112 (236) 15-19 2012/10/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A wide-area vector meta computing infrastructure named vector computing cloud has been proposed as a next generation high-performance computing infrastructure. However, in the vector computing cloud, the difference in site policies between organizations causes inefficient usage of vector computing resources. To execute a parallel job such as an MPI application on the vector computing cloud, this paper presents a history-based job scheduling mechanism. Firstly, the proposed job scheduler estimates the time to start the job execution from the history of job-execution on vector supercomputers. Next, based on the estimation, the job scheduling mechanism automatically allocates the parallel job to appropriate sites. The simulation results show that the proposed job scheduling mechanism improves the utilization efficiency of vector computing resources, compared to the conventional round-robin scheduling mechanism.

  21. 統合開発環境と連携するポータブルなビルドシステム

    平澤将一, 滝沢寛之, 小林広明

    研究報告ハイパフォーマンスコンピューティング(HPC) 2012 (28) 1-8 2012/09/26

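    This study develops a portable build system as a step toward a framework for developing applications while maintaining performance portability. The configurations of current high-performance computing (HPC) systems are becoming more complex, and it is difficult to predict an application's effective performance without running it. We therefore envision supporting the maintenance of performance portability by periodically executing the application under development, implicitly collecting its performance profile, identifying parts with low performance portability, and presenting them interactively to the programmer. Realizing such an application development aid requires the ability to implicitly build and run the application under development on various systems. This study develops a build system with such portability and discusses the functions needed for an application development support environment.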

  22. Implementation and Evaluation of the Nanopowder Growth Simulation with OpenACC

    2012 (10) 1-7 2012/09/26

  23. 大規模計算システムにおけるBCMの性能評価

    小松一彦, 曽我 隆, 江川隆輔, 滝沢寛之, 小林広明

    SENAC 45 (3) 17-25 2012/07

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  24. ベクトル型スーパーコンピュータ広域連携基盤の性能評価

    山下 毅, 村田善智, 江川隆輔, 小野 敏, 大泉健治, 小林広明

    SENAC 45 (1) 42-45 2012/01

  25. A Circuit Partitioning Strategy for 3-D Integrated Floating-point Multipliers

    Kawai Kazushige, Tada Jubee, Egawa Ryusuke, Kobayashi Hiroaki, Goto Gensuke

    IEICE technical report. Component parts and materials 111 (326) 67-72 2011/11/28

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Three-dimensional (3-D) integration technologies are attractive for enhancing the speed of arithmetic circuits. To implement 3-D stacked arithmetic units, effective circuit-partitioning strategies should be applied to exploit the potential of 3-D integration technologies. In this paper, we target single-precision and double-precision floating-point multipliers and speed up the circuits by using 3-D integration. Our partitioning strategy is to implement the parts of the critical-path circuits for the multiplication, the normalizer, and the rounder on the same layer, avoiding the use of TSVs. The simulation analysis shows that the delay time is reduced to 92% for the single-precision multiplier and 83% for the double-precision multiplier, compared with those of conventional 2-D floating-point multipliers.

  26. Evaluation of GPU Computing Based on An Automatic Program Generation Technology

    2011 (18) 1-7 2011/07/20

  27. A Client-Level Deadline Scheduling Strategy for Volunteer Computing Systems

    2011 45-54 2011/05/18

  28. A Performance Tuning Strategy Based on the Roofline Model for Vector Processors

    4 (3) 77-87 2011/05/12

    ISSN: 1882-7829

  29. チップマルチベクトルプロセッサのためのプログラム最適化技術

    佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

    東北大学情報シナジーセンター大規模科学計算機システム広報SENAC 44 (2) 29-36 2011/04

  30. 東北大学サイバーサイエンスセンター高速化推進研究活動報告書(第5号)

    小林広明, 岡部公起, 滝沢寛之, 江川隆輔, 伊藤英一, 大泉健治, 小野 敏, 小久保達信, 橋本ユキ子, 磯部洋子, 撫佐昭裕, 神山 典, 金野浩伸

    2011/04

  31. A Circuit Partitioning Strategy for 3-D Integrated Multipliers

    SAKAI Kazuhito, TADA Jubee, EGAWA Ryusuke, KOBAYASHI Hiroaki, GOTO Gensuke

    IEICE technical report 110 (344) 153-158 2010/12/09

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Three-dimensional (3-D) integration technologies attract a lot of attention as a way to further enhance the performance of LSIs. To implement 3-D stacked arithmetic units, appropriate circuit partitioning strategies should be applied to exploit the potential of 3-D integration technologies. In this paper, we propose a circuit partitioning strategy that can improve the performance of arithmetic units with small overheads of vertical interconnects. To clarify the effectiveness of the proposed partitioning strategy, 3-D stacked parallel multipliers are designed and evaluated. The multipliers designed with the proposed circuit partitioning strategy achieve a 20% delay reduction compared to multipliers designed based on conventional 2-D implementations.

  32. Energy Consumption of a Chip Multi-Vector Processor Using Real Applications

    2010 (3) 1-8 2010/12/09

    Publisher: 情報処理学会

    ISSN: 1884-0930

  33. An Out-of-order Vector Processing Mechanism for Multimedia Applications

    GAO YE, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

    2010 (24) 1-10 2010/07/27

    Publisher: 情報処理学会

    ISSN: 0919-6072

  34. 広域ベクトルコンピュータ連携による次世代HPC基盤の構築(3.2 第8回情報シナジー研究会, 3. 研究活動報告)

    村田 善智, 江川 隆輔, 東田 学, 小林 広明

    年報 9 94-98 2010/07

    Publisher: 東北大学サイバーサイエンスセンター

  35. Performance Evaluation of GPU Computing with OpenCL

    ARAI YUSUKE, SATO KATSUTO, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

    2010 (11) 1-7 2010/02/15

    Publisher: 情報処理学会

    ISSN: 0919-6072

  36. High performance computing on vector systems 2009

    Michael Resch, Sabine Roller, Katharina Benkert, Martin Galle, Wolfgang Bez, Hiroaki Kobayashi

    High Performance Computing on Vector Systems 2009 1-250 2010

    Publisher: Springer Berlin Heidelberg

    DOI: 10.1007/978-3-642-03913-3  

  37. Implementation and Evaluation of a Checkpoint/Restart Tool for CUDA Applications

    TAKIZAWA HIROYUKI, SATO KATSUTO, KOMATSU KAZUHIKO, KOBAYASHI HIROAKI

    122 (7) G1-G7 2009/10/09

    Publisher: 情報処理学会

    ISSN: 0919-6072

  38. RC-008 Client-Level Task Scheduling for Effective Volunteer Computing

    Murata Yoshitomo, Endo Toshiaki, Takizawa Hiroyuki, Kobayashi Hiroaki

    8 (1) 165-172 2009/08/20

    Publisher: Forum on Information Technology

  39. C-024 An Auction based Resource Allocation Considering Multifaceted Utilities in a Peer to Peer Environment

    Satayapiwat Chainan, Komatsu Kazuhiko, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

    8 (1) 491-494 2009/08/20

    Publisher: Forum on Information Technology

    Recently, many market-based approaches have been studied as promising alternatives for the resource allocation problem. In particular, auction-based approaches are widely chosen due to their distributed nature and relatively low complexity. However, employing an auction to allocate jobs is only suitable for homogeneous resource environments. This paper proposes an auction-based resource allocation mechanism that enables resource allocation in a heterogeneous environment while minimizing the user's inputs. Our preliminary results show that the proposed mechanism improves the performance of important jobs under high load.

  40. C-023 Performance Evaluation towards BLAS with Automatic Processor Selection

    Komatsu Kazuhiko, Koyama Kentaro, Sato Katsuto, Takizawa Hiroyuki, Kobayashi Hiroaki

    8 (1) 485-490 2009/08/20

    Publisher: Forum on Information Technology

  41. Performance Optimization Techniques for Vector Processors with Cache Memory

    SATO YOSHIEI, NAGAOKA RYUICHI, MUSA AKIHIRO, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, OKABE KOKI, KOBAYASHI HIROAKI

    2009 (6) 1-10 2009/07/28

    Publisher: 情報処理学会

    ISSN: 0919-6072

  42. SX-9による大規模並列シミュレーション(3.2 第7回情報シナジー研究会, 3. 研究活動報告)

    曽我 隆, 下村 陽一, 撫佐 昭裕, 江川 隆輔, 滝沢 寛之, 岡部 公起, 小林 広明, 高橋 俊, 中橋 和博

    年報 8 88-93 2009/07

    Publisher: 東北大学サイバーサイエンスセンター

  43. 創造工学研修の実施報告 ― スパコンを使って計算科学・計算機科学のおもしろさを体験 ―

    滝沢 寛之, 江川 隆輔, 笹尾 泰洋, 佐野健太郎, 山本 悟, 小林 広明

    東北大学サイバーサイエンスセンター 大規模科学計算システム広報SENAC 42 (2) 87-90 2009/02

  44. 大規模非圧縮性流体シミュレーションの工学問題への応用

    高橋 俊, 石田 崇, 中橋 和博, 小林 広明, 岡部 公起, 下村 陽一, 曽我 隆, 撫佐 昭裕

    SENAC : 東北大学大型計算機センター広報 42 (1) 107-114 2009/01

    Publisher: 東北大学サイバーサイエンスセンター

    ISSN: 0286-7419

  45. 624 A study of energy-aware GPU computing

    Takizawa Hiroyuki, Sato Katuto, Kobayashi Hiroaki

    The Computational Mechanics Conference 2008 (21) 558-559 2008/11/01

    Publisher: The Japan Society of Mechanical Engineers

    ISSN: 1348-026X

  46. 東北大学サイバーサイエンスセンターの取り組みとSX-9の性能評価 (スーパーコンピュータSX-9特集)

    小林 広明, 江川 隆輔, 岡部 公起

    NEC技報 61 (4) 58-65 2008/10

    Publisher: 日本電気

    ISSN: 0285-4139

  47. RC-006 Hardware Design of A Way-Allocatable Shared Cache Mechanism

    Abe Kenta, Kotera Isao, Egawa Ryusuke, Takizawa Hiroyuki, Kobayashi Hiroaki

    7 (1) 35-38 2008/08/20

    Publisher: Forum on Information Technology

  48. A programming language extension and its automatic optimization techniques for exploiting the potential of GPUs

    SATO KATUTO, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

    IPSJ SIG Notes 2008 (74) 199-204 2008/07/29

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    GPUs have great potential for high-performance computing and have been used in various applications in addition to graphics processing. To achieve high performance with GPUs, we have to carry out architecture-aware optimizations because of their unique architecture. We have proposed SPRAT, a programming language for hybrid systems of CPUs and GPUs, to realize both program portability and high computational efficiency. This paper proposes some automatic optimization techniques based on memory access adjustments. The results show significant performance improvements in the execution of edge detection and LU decomposition.

  49. On-Chip Cache Memory Systems for Next Vector Architectures

    7 89-93 2008/07

    Publisher: 東北大学サイバーサイエンスセンター

  50. Early Performance Evaluations of SX-9 Supercomputer Systems

    7 85-88 2008/07

    Publisher: 東北大学サイバーサイエンスセンター

  51. A Stream Programming Language for GPU Computing

    TAKIZAWA Hiroyuki, SATO Katuto, KOBAYASHI Hiroaki

    Journal of the Visualization Society of Japan 28 (1) 271-274 2008/07/01

    Publisher: 可視化情報学会

    ISSN: 0916-4731

  52. A Fast Ray Frustum-Triangle Intersection Algorithm with Precomputation and Early Termination

    Kazuhiko Komatsu, Yoshiyuki Kaeriyama, Kenichi Suzuki, Hiroyuki Takizawa, Hiroaki Kobayashi

    1 (1) 85-95 2008/06/26

    Publisher: 情報処理学会

    ISSN: 1882-7829

    Although ray tracing is the best approach to high-quality image synthesis, much time is required to generate images due to its huge amount of computation. In particular, ray-primitive intersection tests still dominate the execution time required for ray tracing, and faster ray-primitive intersection algorithms are strongly required to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm utilizes the features whereby the rays in a bundle share the same origin and have massive coherence. By reducing the redundant calculations in the innermost intersection tests for the bundles by precomputation and early termination, the proposed algorithm accelerates the intersection tests. Experimental results show that the proposed algorithm achieves 1.43 times faster intersection tests compared with Möller's algorithm by exploiting the features of the bundles of rays.

  53. ベクトルプロセッサ用キャッシュメモリの性能評価

    佐藤義永, 撫佐昭裕, 江川隆輔, 滝沢寛之, 岡部公起, 小林広明

    情報処理学会シンポジウム論文集 2008 (2) 55 2008/01/17

    ISSN: 1344-0640

  54. High-Density Computation in Building-Cube Method by Vector Super-computer

    高橋俊, 石田崇, 中橋和博, 小林広明, 岡部公起, 下村陽一, 曽我隆, 撫佐昭裕

    流体力学講演会/航空宇宙数値シミュレーション技術シンポジウム講演集 40th-2008 433-434 2008

  55. MPIプログラミング入門

    野口 孝明, 曽我 隆, 金野 浩伸, 撫佐 昭裕, 大泉 健治, 小野 敏, 伊藤 英一, 岡部 公起, 江川 隆輔, 小林 広明

    SENAC : 東北大学大型計算機センター広報 40 (4) 69-94 2007/10

    Publisher: Super-Computing System Information Synergy Center, Tohoku University

    ISSN: 0286-7419

  56. I-004 A Parallel Image Generation Algorithm based on Partitioning of Photon Maps

    Tamura Masahide, Takizawa Hiroyuki, Kobayashi Hiroaki

    6 (3) 203-206 2007/08/22

    Publisher: Forum on Information Technology

  57. A Study on Dynamic Task Assignment to CPU and GPU Based on Runtime Performance Prediction

    SHIRATORI Hiroki, TAKIZAWA Hiroyuki, KOBAYASHI Hiroaki

    IEICE technical report 107 (175) 37-42 2007/08/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Recent studies of general-purpose computation on graphics processing units (GPUs) have shown that a PC equipped with a high-performance CPU and GPU can be regarded as a heterogeneous parallel processing system. On the other hand, programming for such a system has become complicated. In order to exploit the potential of the system, unified programming models for the CPU and GPU have been studied. However, the selection of the CPU or GPU that executes a program must be made manually and statically in most of the existing development tools for GPGPU applications. Because the appropriate selection depends on some information determined at runtime, the processing efficiency improves if the appropriate processor can be dynamically selected based on performance prediction at runtime. This paper examines the effectiveness of dynamically selecting the appropriate processor based on the execution time estimation and the processor switching cost. The experimental results show that the cost of processor switching, excluding the data transfer, is negligible, and hence processor switching can improve the performance if the execution time is long compared to the prediction error.

  58. The Evaluation of A Way-Allocatable Shared Cache Mechanism

    KOTERA ISAO, EGAWA RYUSUKE, TAKIZAWA HIROYUKI, KOBAYASHI HIROAKI

    IPSJ SIG Notes 2007 (79) 31-36 2007/08/01

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    We have proposed a way-allocatable shared cache mechanism for chip multiprocessors, which can reduce power consumption while maintaining performance by employing cache partitioning and power gating. In the proposed mechanism, a metric of cache access locality is defined and used for the cache partitioning and the power gating. Based on the metric, the proposed mechanism can flexibly change the configuration to be either performance-oriented or power-oriented. This paper evaluates the validity of the proposed mechanism, using some benchmarks with different cache access behaviors. The evaluation results show that the proposed mechanism can appropriately partition the shared cache for applications with high locality. In addition, our proposal in the performance-oriented mode can reduce energy consumption by 28% while improving the performance by 0.3%.

  59. SC|06調査報告(3.2 第5回情報シナジー研究会, 3. 研究活動報告)

    小野 敏, 滝沢 寛之, 小林 広明

    年報 6 83-87 2007/07

    Publisher: 東北大学情報シナジーセンター

  60. SC|05調査報告(3.2 第4回情報シナジー研究会, 3. 研究活動)

    大泉 健治, 伊藤 英一, 滝沢 寛之, 小林 広明

    年報 5 71-74 2006/06

    Publisher: 東北大学情報シナジーセンター

  61. A Runtime Optimization Method for Redundant Task Dispatch on P2P Computing Platforms.(3.2 第4回情報シナジー研究会, 3. 研究活動)

    Wang Hong, Takizawa Hiroyuki, Kobayashi Hiroaki

    年報 5 100-105 2006/06

    Publisher: 東北大学情報シナジーセンター

  62. 実シミュレーションコードによる大規模科学計算システムの性能評価(3.2 第4回情報シナジー研究会, 3. 研究活動)

    滝沢 寛之, 岡部 公起, 伊藤 英一, 撫佐 昭裕, 曽我 隆, 伊藤 学, 小林 広明

    年報 5 78-83 2006/06

    Publisher: 東北大学情報シナジーセンター

  63. 世界一の評価を受けた東北大学のスーパーコンピュータSX-7

    小林広明

    仙台市医師会報 (504) 8-10 2006

  64. 安全・安心な社会の構築に貢献する世界一のスーパーコンピュータSX-7

    小林広明

    まなびの杜<東北大学>知的探検のすすめ 3 32-33 2006

  65. A Weighted Voting Method for Combining Multiple Character Recognition Engines

    KANEKO Shoichiro, GOTO Hideaki, KOBAYASHI Hiroaki

    IEICE technical report 105 (477) 13-18 2005/12/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    It is known that combining multiple character recognition engines by majority logic is useful for improving the accuracy of character recognition. In the previous work, however, the suitability of each recognition engine for the input characters was not taken into account. On the other hand, several methods for automatic selection of recognition engines using the suitability of the engines have been proposed. We combine the above two approaches and propose a new method for combining multiple recognition engines. Experimental results show that the recognition accuracy improves from 97.8% to 98.2% using 14 Japanese character sets with 3195 characters each.

  66. 実シミュレーションコードによる大規模科学計算システムの性能評価

    小林 広明, 岡部 公起, 撫佐 昭裕, 曽我 隆, 松村 佳昭, 伊藤 学

    SENAC : 東北大学大型計算機センター広報 38 (4) 39-59 2005/10

    Publisher: Super-Computing System Information Synergy Center, Tohoku University

    ISSN: 0286-7419

  67. HPCチャレンジでのSXシステムの性能評価(3.2 第3回情報シナジー研究会, 3. 研究活動)

    小林 広明, 滝沢 寛之, 小久保 達信, 岡部 公起, 伊藤 英一, 小林 義昭, 浅見 暁, 小林 一夫, 後藤 記一, 片海 健亮, 深田 大輔

    年報 4 98-116 2005/05

    Publisher: 東北大学情報シナジーセンター

  68. HPC チャレンジでのSX システムの性能評価

    小林広明, 滝沢寛之, 小久保達信, 岡部公起, 伊藤英一, 小林義昭, 浅見暁, 小林一夫, 後藤記一, 片海健亮, 深田大輔

    東北大学情報シナジーセンター大規模科学計算機システム広報SENAC 38 (1) 5-28 2005/01

  69. A new dynamical domain decomposition method for parallel molecular dynamics simulation

    V. Zhakhovskii, K. Nishihara, Y. Fukuda, S. Shimojo, T. Akiyama, S. Miyanaga, H. Sone, H. Kobayashi, E. Ito, Y. Seo, M. Tamura, Y. Ueshima

    2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005 2 848-854 2005

    DOI: 10.1109/CCGRID.2005.1558650  

  70. Analysis and comparison of frequency features for scene text detection

    SAITOH Seiji, GOTO Hideaki, KOBAYASHI Hiroaki

    Technical report of IEICE. PRMU 104 (523) 31-36 2004/12/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Several methods using features in frequency domains obtained by the Discrete Cosine Transformation (DCT) and the Wavelet Transformation have been proposed so far for text region detection in images. The performance of the methods in the previous work was evaluated mainly by the final precision of text region extraction. However, analyses and comparisons of the quality of the features themselves have been insufficient. This report proposes an analysis and comparison method that uses Fisher's discriminant analysis to obtain better features, and an unsupervised thresholding method to segment text and non-text. Better features can be obtained by choosing DCT coefficients in an appropriate frequency range. Experimental results indicate that the final precision of text region extraction is improved by using the optimized features.

  71. スーパーSINET を利用した大規模遠隔可視化処理の評価

    滝沢寛之, 小林広明

    東北大学情報シナジーセンター大規模科学計算機システム広報SENAC 37 (2) 5-10 2004/04

  72. Performance Analysis of a Parallel Law-of-the-Jungle Algorithm for Generating Codebooks of Vector Quantization

    MOMOSE Shintaro, SANO Kentaro, TAKIZAWA Hiroyuki, NAKAJIMA Taira, KOBAYASHI Hiroaki, NAKAMURA Tadao

    IEICE technical report. Neurocomputing 103 (92) 25-30 2003/05/22

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Vector quantization is an attractive technique for lossy data compression, which has been a key technology for efficient data storage and/or transfer. So far, various algorithms have been proposed to design optimal codebooks presenting quantization with minimized errors. In particular, the Law-of-the-Jungle (LOJ) learning algorithm has been proposed to achieve rapid codebook design by algorithmic improvements. However, its acceleration is still required when large data sets are processed on a single computer. In order to achieve faster codebook design, we have proposed a scalable parallel codebook design algorithm for parallel computers. This paper analyzes and evaluates the performance of the parallel LOJ learning algorithm on three types of parallel computers: an IBM SP2, an NEC AzusA and a PC cluster.

  73. A Study of Characters Recognition for Auditory Computer-Utilization Support Systems

    Kikuchi Hiroto, Shen Hong, Kawashima Tomoyoshi, Kobayashi Hiroaki, Nakamura Tadao

    Proceedings of the IEICE General Conference 2003 369-369 2003/03/03

    Publisher: The Institute of Electronics, Information and Communication Engineers

  74. Parallel Codebook Generation for Optimal Vector Quantizer

    MOMOSE Shintaro, SANO Kentaro, TAKIZAWA Hiroyuki, NAKAJIMA Taira, LIMA Clecio Donizete, KOBAYASHI Hiroaki, NAKAMURA Tadao

    IPSJ SIG Notes 2002 (80) 67-72 2002/08/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Vector quantization is an attractive technique for lossy data compression, which has been a key technology for data storage and/or transfer. So far, various algorithms have been proposed to design optimal codebooks presenting quantization with minimized errors. In particular, the Law-of-the-Jungle (LOJ) learning algorithm has been proposed to achieve rapid codebook design by algorithmic improvements. However, its acceleration is still required when large data sets are processed on a single computer. Therefore, a scalable parallel codebook design algorithm for parallel computers is required. This paper presents a parallel algorithm for the LOJ learning, suitable for distributed-memory parallel computers with a message-passing mechanism. Experimental results indicate a high scalability of the proposed parallel algorithm on the IBM SP2 parallel computer with 32 processing elements.

  75. ベクトル量子化のための並列コードブック生成アルゴリズムの性能評価(2.<特集>第1回情報シナジー研究会)

    百瀬 真太郎, 佐野 健太郎, 滝沢 寛之, 中島 平, 小林 広明, 中村 維男, Clecio Donizete Lima, 東北大学大学院情報科学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院工学研究科, 東北大学大学院情報科学研究科, 東北大学情報シナジーセンター, 東北大学大学院情報科学研究科

    年報 2 33-42 2002/07/01

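    Vector quantization is a highly efficient data compression technique and a core technology for storing and transferring data. Various methods have been proposed to generate optimal codebooks for quantization with small errors; among them, the Law-of-the-Jungle (LOJ) algorithm, which shortens codebook generation time through algorithmic improvements, has attracted attention. However, when a large data set is processed on a single CPU, there is a limit to how much algorithmic improvements can shorten the processing time, and further speedup through parallel processing is required. This paper proposes a parallel LOJ algorithm suited to distributed-memory parallel computers. A performance evaluation of the parallel LOJ algorithm on an IBM SP2, an NEC AzusA, and a PC cluster showed a high speedup relative to the number of processors on all of them.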

  76. A STUDY ON PRECISION OF INTERSECTION CALCULATION FOR RAY-TRACING HARDWARE

    Shimakura Takamitsu, Saida Yasumasa, Sano Kentaro, Suzuki Ken-ichi, Nakada Takeo, Oba Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

    Proceedings of the Society Conference of IEICE 2001 158-158 2001/08/29

    Publisher: The Institute of Electronics, Information and Communication Engineers

  77. The Design of an Instruction Fetch Unit for VLIW Processors Supporting Speculative Execution

    HARADA HUGO KENJI PEREIRA, NAKAIKE TAKUYA, KOBAYASHI HIROAKI, NAKAMURA TADAO

    IPSJ SIG Notes 1999 (100) 63-68 1999/11/26

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper presents an instruction fetch scheme capable of speculatively executing instructions in VLIW processors. This is achieved with the compiler and the underlying hardware working together in a scheme called Dynamic Boosting (DB). In dynamic boosting, the compiler is responsible for finding instruction level parallelism (ILP) beyond the boundaries of basic blocks. It then schedules and labels the independent instructions belonging to different basic blocks in such a way that the hardware is able to detect and execute these instructions in parallel at run time. The software simulation results show that a speed-up of at most 20% was achieved in the SPECint 95 benchmarks. In addition, the preliminary results on hardware cost and gate level speed show that the hardware complexity and cost are reasonable considering the obtained speed-ups.

  78. A Study of Acceleration of Ray-Tracing by Using Reference Images

    59 149-150 1999/09/28

  79. A Study of A Global Illumination Model for Rendering Gaseous Objects

    59 145-146 1999/09/28

  80. An Active Contour Model with Consideration to the Shape of a Region-of-Interest

    59 257-258 1999/09/28

  81. A Study on a Reconfigurable Synchronous Dataflow Computer

    SASAKI Hiroshi, TSUKIOKA Hideaki, SHOJI Nobuyoshi, KOBAYASHI Hiroaki, NAKAMURA Tadao

    Technical report of IEICE. VLD 98 (446) 17-22 1998/12/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    This report proposes a synchronous dataflow computer, which constructs hardware to represent dataflow graphs of applications and then processes data in the dataflow fashion. We implemented a JPEG encoder on the system and measured the amount of required hardware resources. The experimental results show that computations can naturally be expressed in dataflow graphs using units only for accessing the shared memory. The exploitable features of applications are discussed and a software development environment is also presented.

  82. Adaptive Volume-Subdivision for Efficient Data-Parallel Volume Rendering

    SANO Kentaro, KITAJIMA Hiroyuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

    IPSJ SIG Notes 1998 (93) 7-12 1998/10/09

    Publisher: Information Processing Society of Japan (IPSJ)

    Parallel processing on a general-purpose parallel computer is one of the promising strategies for fast volume rendering, and we have proposed a data-parallel volume rendering algorithm based on the image composition method. Although the algorithm achieves real-time rendering, a constant processing time of image composition lowers the efficiency of parallel processing as the number of processing elements increases. To solve this problem, this study proposes an adaptive subdividing method of volume data and discusses its performance through some experiments. The experimental results show that the method reduces the image-compositing time as the number of processing elements increases.

  83. TLB-Assisted Cache

    SUZUKI KEN-ICHI, OBA NOBUYUKI, KOBAYASHI HIROAKI, NAKAMURA TADAO

    IPSJ SIG Notes 1997 (61) 7-12 1997/06/27

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This report proposes a new on-chip cache system named "TLB Assisted Cache (TAC)." The TAC determines a cache hit/miss by referring to the TLB and the small assist tag comparisons that are faster than a conventional cache tag comparison. Therefore, it is possible to initiate a cache data array access before a cache tag comparison. Consequently, the TAC achieves an access time as short as a V-V cache. Moreover, the TAC logically acts as a V-P cache so it does not suffer from the V-V cache's shortcomings, such as the synonym problem.

  84. Implementing Functional Programs Based on the SPMD Model

    NAKAIZUMI MITSUHIRO, SHEN HONG, KOBAYASHI HIROAKI, NAKAMURA TADAO

    IPSJ SIG Notes 1997 (61) 25-30 1997/06/27

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Functional languages, unlike imperative ones, are characterized by referential transparency, high programming productivity, and ease of program verification. However, they have been prevented from wide acceptance by the inefficiency of their implementation on conventional computers. Parallel execution of functional programs utilizing their potential parallelism is a promising way to solve this problem. This paper studies the parallel execution of functional programs based on the SPMD model. We realize the parallel execution of functional programs on the IBM SP2 parallel computer. The experimental results of benchmark programs show the potential of the execution model.

  85. A Parallel Volume Rendering Algorithm for Distributed-Memory Multiprocessor Systems

    Reports of Toyoda Physical and Chemical Research Institute. (50) 41-54 1997/05

    Publisher: Toyoda Physical and Chemical Research Institute

    ISSN: 0372-039X

  86. A Study on Parallelizing Scheduling for Jetpipeline

    NAKAIKE TAKUYA, SASAKI TAKEHITO, KATAHIRA MASAYUKI, KOBAYASHI HIROAKI, NAKAMURA TADAO

    IPSJ SIG Notes 1996 (106) 25-30 1996/10/31

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Jetpipeline is an architecture based on instruction-level parallelism (ILP), which utilizes vector and scalar processing to achieve high performance. Therefore, the compiler for Jetpipeline must parallelize vector and scalar instructions of programs. However, since vector instructions take more cycles to complete their execution than scalar instructions, it is not suitable to use parallelizing methods used in VLIW machines. In this paper, we propose a parallelizing method for Jetpipeline by improving the dispatch stack method to parallelize the vector and scalar instructions. We show the effectiveness of the proposed parallelizing method for Jetpipeline through simulation experiments.

  87. Design of asynchronous vector calculator using delay element.

    高野光司, 佐々木毅人, 片平昌幸, 小林広明, 中村維男

    Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan 1996 1996

  88. Shared Memory System for the Hierarchical Parallel Reduction System of FL

    MORI Noriaki, KITAJIMA Hiroyuki, SHEN Hong, KOBAYASHI Hiroaki, NAKAMURA Tadao

    Proceedings of the Society Conference of IEICE 1995 34-34 1995/09/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

  89. Performance Evaluation of a Distributed Shared Memory Multiprocessor System using Network with Message Losses

    KURIYAMA Kazunari, TAKAHASHI Kazunari, OBA Nobuyuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

    Proceedings of the Society Conference of IEICE 1995 35-35 1995/09/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

  90. An Automatic System for Facial Expression Recognition Using Neural Networks

    Nakajima Taira, Takizawa Hiroyuki, Shimamura Mieko, Kobayashi Hiroaki, Nakamura Tadao

    Proceedings of the Society Conference of IEICE 1995 173-173 1995/09/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

  91. Performance Studies of the FL Hierarchical Parallel Reduction System

    KITAJIMA Hiroyuki, SHEN Hong, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

    IPSJ SIG Notes 1995 (56) 1-8 1995/06/01

    Publisher: Information Processing Society of Japan (IPSJ)

    Functional programming languages differ from traditional imperative ones in many appealing properties such as referential transparency and high programming productivity. However, the inefficiency of their implementation on conventional computers has prevented them from wide acceptance. We have proposed a hierarchical parallel reduction system by combining multiprocessor processing and pipeline processing in our earlier work. In this paper, we investigate the task scheduling strategy with locality consideration suitable for enhancing the system performance, and carry out software simulation experiments. The simulation results reveal the effectiveness of the proposed system with the scheduling strategy.

  92. A Study on A Compile Technique for Jetpipeline

    SASAKI Takehito, NAKAIKE Takuya, KATAHIRA Masayuki, SHEN Hong, KOBAYASHI Hiroaki, NAKAMURA Tadao

    IPSJ SIG Notes 1995 (56) 9-16 1995/06/01

    Publisher: Information Processing Society of Japan (IPSJ)

    To achieve high computation power, we have proposed the Jetpipeline architecture that utilizes ILP (Instruction Level Parallelism) including vector operations in addition to scalar operations. In the Jetpipeline architecture, a compiler has an important role because it exploits ILP from operations. In this paper, we present a compile technique for Jetpipeline based on both parallelization for scalar operations and vectorization for vector operations. The proposed compile technique is examined through simulation experiments.

  93. SuperTAINS: Tohoku University Network realizes multimedia applications through sub-giga network

    Yukiyoshi Kameyama, Akinori Ito, Hiroaki Kobayashi

    Computer and Network LAN 13 (6) 114-120 1995/06

    Publisher: Ohmsha

  94. A Hierarchical Processing System for Prolog Programs and Its Performance Evaluation

    Wang Dong, Kobayashi Hiroaki, Nakamura Tadao

    95 (2) 1-8 1995/01/13

    Publisher: Information Processing Society of Japan (IPSJ)

    This paper presents a hierarchical parallel execution model for Prolog programs based on Or-parallelism/And-parallelism as coarse-grain parallelism, and parallel unification as fine-grain parallelism. In the hierarchical model, we proposed an extended And-Or tree for its high level (coarse-grain), and used parallel unification at the low level. Thus, the execution model can exploit high degree of parallelism in Prolog programs. Moreover, the execution model is implemented on a hierarchical processing system which is a shared memory multiprocessor system with a mesh plus tree network devoted to control. Finally, the performance evaluation of this system is also carried out.

  95. Pipelined Execution of OR-Parallel Prolog

    Inaba Tsutomu, Shen Hong, Katahira Masayuki, Kobayashi Hiroaki, Nakamura Tadao

    95 (2) 9-16 1995/01/13

    Publisher: Information Processing Society of Japan (IPSJ)

    In this study, we propose an OR-parallel Prolog execution model on a PE-pipeline architecture. This is an extension of J. Beer's idea in "Pipelined Execution of Sequential Prolog." In our model, we adopt global shared memory and a crossbar network. The global shared memory consists of a Choice-Point Stack Module and several Environment Frame Modules that store the environments of each Choice-Point. In this paper, the system organization and simulation results are described. Based on the simulation results, we obtain a LIPS 2.5 times that of Beer's model.

  96. Performance evaluation of Jetpipeline fusing vector and scalar instructions.

    仲池卓也, 佐々木毅人, 片平昌幸, 沈紅, 小林広明, 中村維男

    Proceedings of the Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan 1995 1995

  97. Performance Evaluation of (Mπ)^2

    TOH Yuichiro, YAMAUCHI Hitoshi, KOBAYASHI Hiroaki, NAKAMURA Tadao

    1994 378-378 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  98. A Memory Access Protocol of A Distributed Processing Systems with An ATM Network

    Kuriyama Kazunari, Takahashi Masafumi, Oba Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

    1994 79-79 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  99. A Study on Load Balancing on a Pipelined Prolog Architecture

    INABA Tsutomu, SHEN Hong, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

    1994 84-84 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  100. An Inference Method with Word-Impression and a Feeling Model

    Igata Nobuyuki, Kobayashi Hiroaki, Nakamura Tadao

    1994 71-71 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  101. A Study on Automatic Emotion Recognition with Neural Networks

    Sasaki Kou, Osada Toshiaki, Kobayashi Hiroaki, Nakamura Tadao

    1994 132-132 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  102. Performance Evaluation of The TLB-Unified Cache

    Suzuki Ken-ichi, Kobayashi Hiroaki, Nakamura Tadao

    1994 88-88 1994/09/26

    Publisher: The Institute of Electronics, Information and Communication Engineers

  103. A Memory Access Queuing Mechanism for a Clustered Multiprocessor System

    1994 (66) 225-232 1994/07/21

  104. A Study of a Parallelizing Compiler for the Jet Pipeline.

    佐々木毅人, 片平昌幸, 小林広明, 中村維男

    Proceedings of the JSME Tohoku Branch General Meeting and Lecture Meeting 29th 1994

  105. A Study on a Shading Method for Volume Rendering.

    佐藤大輔, 片平昌幸, 小林広明, 中村維男

    Proceedings of the IEICE Conference 1994 (Shuki Pt 6) 1994

    ISSN: 1349-1369

  106. A Study on an Instruction Scheduling Strategy for Jetpipeline

    SASAKI Takehito, KATAHIRA Masayuki, KOBAYASHI Hiroaki, NAKAMURA Tadao

    83-83 1994

    Publisher: The Institute of Electronics, Information and Communication Engineers

  107. A New Control Scheme for Token Ring LANs

    1993 (2) 231-237 1993/11/17

  108. A Distributed Shared-Memory System Using ATM Networks

    1993 (2) 277-286 1993/11/17

  109. An Intelligent Self-Routing Algorithm for B-ISDN

    Rashid Emad, Kobayashi Hiroaki, Nakamura Tadao

    IEICE technical report. Artificial intelligence and knowledge-based processing 93 (240) 39-46 1993/09/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    This paper presents a new self-routing algorithm for broadband ISDN's asynchronous transfer mode (ATM) switching networks. The routing algorithm is ambuscade in a switch for congestion control called Ants Routing. The congestion is controlled through regulating the input traffic rate to the switch element that has congestion on one of its output ports. High throughput and low packet loss probability can be achieved by rerouting packets' arrival due to the presence of bursty traffic on a switch's output port. The rerouting algorithm is based on the information of congestion status of each switch, which can be distributed among neighboring switches. Mathematical analysis based on the queuing model shows that our algorithm has capability of congestion avoidance on the interconnection network and packet loss improvement, especially when traffic is bursty.

  110. A study on parallel processing system for volume rendering

    46 477-478 1993/03/01

  111. A Study on A Parallel Processing System for Photo-Realistic Image Synthesis

    46 369-370 1993/03/01

  112. The Computer Architecture Description Language: CARD-L

    1993 (6) 121-128 1993/01/21

  113. Expert System Aid in Networks' Flow Control

    1992 (76) 33-40 1992/09/24

  114. AN ADAPTIVE NETWORK ROUTING METHOD - POTENTIAL ROUTING

    1992 (64) 65-72 1992/08/19

  115. Knowledge Representation by Using Position-Display-Map

    42 220-221 1991/02/25

  116. A Discussion on the knowledge-base for mechanical designs based on the hierarchy of the mechanical structure

    42 325-326 1991/02/25

  117. A Study of Object Space Parallel Processing for Fast Ray Tracing

    1987 (78) 9-16 1987/11/12

Books and Other Publications 17

  1. Sustained Simulation Performance 2019 and 2020

    Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

    2021

  2. Sustained Simulation Performance 2018 and 2019

    Michael Resch, Manuela Wossough, Wolfgang Bez, Erich Focht, Hiroaki Kobayashi

    2020

  3. Sustained Simulation Performance 2017

    Michael Resch, Wolfgang Bez, Erich Focht, Michael Gienger, Hiroaki Kobayashi

    2017

  4. Sustained Simulation Performance 2016

    Michael M. Resch, Wolfgang Bez, Erich Focht, Nisarg Patel, Hiroaki Kobayashi (Editors)

    2016

    ISBN: 9783319467344

  5. Introduction to Computer Engineering

    鏡慎吾, 佐野健太郎, 滝沢寛之, 岡谷貴之

    Corona Publishing 2015/03

    ISBN: 9784339024920

  6. Sustained Simulation Performance 2015

    Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Qi, J, Roller, S

    Springer 2015

    ISBN: 9783319203409

  7. Sustained Simulation Performance 2014

    Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Patel, N

    Springer 2014

    ISBN: 9783319106267

  8. Sustained Simulation Performance 2013

    Resch, M.M, Bez, W, Focht, E, Kobayashi, H, Kovalenko, Y

    Springer 2013

    ISBN: 9783319014395

  9. Sustained Simulation Performance 2012

    Resch, M.M, Wang, X, Bez, W, Focht, E, Kobayashi, H

    Springer 2012

    ISBN: 9783642324543

  10. High Performance Computing on Vector Systems 2011

    Resch, M, Wang, X, Focht, E, Kobayashi, H, Roller, S

    Springer 2011

    ISBN: 9783642222436

  11. Cloud, Grid and High Performance Computing: Emerging Applications

    Hong Wang, Yoshitomo Murata, Hiroyuki Takizawa, Hiroaki Kobayashi

    IGI Global 2011

    ISBN: 9781609606039

  12. High Performance Computing on Vector Systems 2010

    M.Resch, K.Benkert, X.Wang, M.Galle, W.Bez, H.Kobayashi, S.Roller

    Springer 2010/11

    ISBN: 9783642118500

  13. Software Automatic Tuning: From Concepts to State-of-the-Art Results

    Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

    Springer 2010

    ISBN: 9781441969347

  14. High Performance Computing on Vector Systems 2009

    Resch, M, Roller, S, Benkert, K, Galle, M, Bez, W, Kobayashi, H

    Springer-Verlag 2009/11

    ISBN: 9783642039126

  15. High Performance Computing on Vector Systems 2008

    M.Resch, M. Galle, H.Kobayashi, T.Hirayama

    Springer-Verlag 2008/11

  16. High Performance Computing on Vector Systems 2007

    Hiroaki Kobayashi

    Springer-Verlag 2007/11

    ISBN: 9783540743835

  17. High Performance Computing on Vector Systems 2006

    Hiroaki Kobayashi

    Springer-Verlag 2006/01

Presentations 74

  1. Collaboration between HPC and Quantum Computing and Its Applications

    Hiroaki Kobayashi

    AI Chip Design Center Forum 2024/10/25

  2. QA-HPC Hybrid Computing Infrastructure for Quantum Transformation of Simulation-Data Analysis Combined Applications Invited

    Hiroaki Kobayashi

    IEEE Quantum Week 2024/09/19

  3. R&D of QA-HPC Hybrid Computing Infrastructure and Quantum Transformation of Simulation-Data Science Combined Applications Invited

    Hiroaki Kobayashi

    Tohoku-Chicago Quantum Interaction 2024/06/29

  4. Performance Evaluation of Vector Annealing on NEC Vector Processor SX-Aurora TSUBASA

    Hiroaki Kobayashi

    HPC2024 2024/06/27

  5. Accelerating Quantum Innovation & Startup Creation at Tohoku University Invited

    Hiroaki Kobayashi

    Chicago-Tohoku Quantum Alliance Symposium 2024/02/14

  6. NEC SX-ACE's Operations and Applications Development for the Future

    24th Workshop on Sustained Simulation Performance 2016/12/04

  7. Overview of Vector Supercomputer SX-ACE and Its Applications International-presentation

    Russian Supercomputing Days 2016 2016/09/26

  8. Toward the Development of Supercomputers That Contribute to Disaster Prevention and Mitigation

    High Performance Computing and Computational Science Symposium 2016 2016/06/06

  9. The Tohoku University Large-Scale Scientific Computing System and Its User Support

    25th Tohoku CAE Forum 2016/05/13

  10. Highly-Productive Computing on Modern and Future Vector Platforms

    The 23rd Workshop on Sustained Simulation Performance 2016/03/16

  11. One-year experience with SX-ACE International-presentation

    22nd Workshop on Sustained Simulation Performance 2015/12/17

  12. Highly-Productive HPC on Modern Vector Supercomputers: present and future International-presentation

    Russia Supercomputing Days 2015/09/28

  13. The Amazing Power of Supercomputers

    116th Tohoku University Science Cafe 2015/05/29

  14. Real-Time Tsunami Inundation Forecasting and Damage Estimation on SX-ACE: An HPC System as a Social Infrastructure for Tsunami Disaster Prevention and Mitigation International-presentation

    NUG XXVII 2015/05/11

  15. R&D Activities on High-Performance Computing at the Tohoku University Cyberscience Center: Toward a Supercomputer Center for Ordinary People

    25th TOPIC General Meeting Lecture 2015/04/20

  16. Toward a Supercomputer Center for Ordinary People

    CyberHPC Symposium 2015/03/20

  17. An SX-ACE-based New Computer System of Tohoku University and Its Early Evaluation by using Real Applications International-presentation

    20th Workshop on Sustained Simulation Performance (WSSP20) 2014/12/15

  18. Overview of the New Supercomputer System at the Tohoku University Cyberscience Center and R&D Activities on High-Performance Computing

    133rd NEC C&C System SP Workshop 2014/11/11

  19. Tohoku Univ.’s New Supercomputer System and R&D on Highly-Productive HPC for Memory Intensive Applications International-presentation

    NUG2014 2014/05/12

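  20. Toward the Development of Next-Generation Supercomputers for Disaster Prevention and Mitigation: Real-Time Tsunami Forecasting by Supercomputing

    3rd Symposium of the Study Group on Next-Generation Disaster Prevention and Disaster-Area Support Systems Using G-Spatial Information 2014/03/12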

  21. A Feasibility Study on Future HPCI Systems Suitable for High-Bandwidth Applications

    11th Workshop on Strategic Development of High Performance Computing Systems 2014/03/10

  22. Efforts in the Feasibility Study on Future HPCI Systems Suitable for High-Bandwidth Applications

    132nd NEC C&C System SP Workshop 2014/01/23

  23. Feasibility study of the next generation vector system architecture for memory intensive applications International-presentation

    18th workshop on Sustained Simulation Performance, Stuttgart Germany 2013/10/28

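  24. Operation of the Tohoku University Large-Scale Scientific Computing System and R&D on Next-Generation Vector Computing

    Science Council of Japan, Committee on Electrical and Electronic Engineering, URSI Subcommittee, Radiocommunication Systems and Signal Processing Sub-subcommittee (URSI-C) Meeting 2013/09/26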

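  25. A Feasibility Study on Future HPCI Systems Suitable for High-Bandwidth Applications

    MEXT "Construction of the Innovative High Performance Computing Infrastructure (HPCI)", HPCI Strategic Field 2 "New Materials and Energy Creation", Computational Materials Science Initiative (CMSI), Computational Molecular Science Research Hub, 4th Workshop 2013/09/10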

  26. A Feasibility Study on Future HPCI Systems Suitable for High-Bandwidth Applications

    10th Workshop on Strategic Development of High Performance Computing Systems 2013/07/30

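  27. Toward the Development of Next-Generation Supercomputers for Disaster Prevention and Mitigation

    Tohoku University Research Organization of Electrical Communication Symposium: Toward the Recovery of Tohoku through Disaster-Resilient ICT 2013/07/23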

  28. The Future Opened Up by Supercomputers

    Tohoku Revitalization UniverScience, Niigata Prefectural Tokamachi High School Career Education Lecture 2013/07/05

  29. Early evaluation of NGV and feasibility study of the next generation vector system architecture for memory intensive applications International-presentation

    NUG2013 2013/06/23

  30. Feasibility study of future HPC systems for memory-intensive applications International-presentation

    1st International Workshop on Strategic Development of High Performance Computers 2013/03/18

  31. Feasibility study of future HPC systems for memory-intensive applications International-presentation

    17th Workshop on Sustained Simulation Performance 2013/03/12

  32. Event Session "High-Performance Computing Supporting Safe and Secure Living: Toward Disaster Prevention and Mitigation"

    75th National Convention of IPSJ 2013/03/08

  33. Potentials of the vector architecture in the post-peta era International-presentation

    Workshop on Sustained Simulation Performance 2012/12/10

  34. Design Space Exploration of the Vector Processor Architecture using 3D Die-Stacking Technology

    University of Tsukuba Center for Computational Sciences 20th Anniversary Symposium 2012/09/07

  35. High-End Computing Systems: Past, Present and Future International-presentation

    SICE2012 SICE Annual Conference 2012/08/20

  36. Capability and Potential of Vector Processors: Present and Future International-presentation

    NUG2012 2012/06/12

  37. Capability of Vector-Parallel Computing Platforms International-presentation

    the HPC Workshop in Singapore 2012/05/07

  38. R&D on Highly-Productive, High-Performance Computing and New-Generation Vector Computing International-presentation

    SP Workshop, SC10 Lecture Meeting 2010/11/17

  39. Activities for Highly-Productive Computing and R&D on New-Generation Vector Computing International-presentation

    JAEA SC10 Workshop 2010/11/16

  40. Performance Discussion on Scalar and Vector Systems and R&D on New-Generation Vector Computing International-presentation

    the 13th Teraflop Workshop 2010/10/21

  41. Performance Discussion on Scalar and Vector Systems and R&D for New-Generation Vector Computing at Tohoku University International-presentation

    NUG2010 2010/06/29

  42. Operation of the Tohoku University Large-Scale Scientific Computing System and R&D on Vector Computing

    9th PC Cluster Symposium 2009/12/10

  43. Supercomputers and Supercomputing in Tohoku University International-presentation

    JAEA SC09-Workshop 2009/11/18

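  44. Toward Bridging Lab Computing and Peta Computing: The Role of Information Infrastructure Centers in a New Era as Joint Usage/Research Centers

    Keynote, 4th Meeting of Directors of National University Corporation Information Centers 2009/10/23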

  45. Is the 21st Century the Era of Vector Computing!?

    Special Session, 8th Forum on Information Technology 2009/09/03

  46. Lessons Learned from 1-Year SX-9 Experiences and Toward the Next Generation Vector Computing International-presentation

    20th CCSE Workshop on Advanced Computing Technologies toward PetaFLOPS 2009/04/24

  47. Tohoku University View to Supercomputing International-presentation

    10th Teraflop Workshop 2009/03/16

  48. On-chip Caching for vector architectures International-presentation

    JAEA -Symposium at SC08 2008/11/20

  49. The new era of the vector architecture: experiences with the early adaption of SX-9 International-presentation

    NEC HPC Workshop at SC08 2008/11/19

  50. A news update of Cyberscience Center International-presentation

    the 9th Teraflop workshop 2008/11/12

  51. Performance Evaluation of SX-9 Using Real Applications

    Osaka University Cybermedia Center FY2008 Supercomputer Symposium 2008/10/24

  52. HPC Activities at Tohoku University: Experiences with the early adaption of SX-9 International-presentation

    DWD (German Weather Service) Special Lecture 2008/10/02

  53. HPC Activities at Tohoku University International-presentation

    Barcelona Supercomputer Center Seminar 2008/09/30

  54. New Supercomputer System SX-9 and its Early Evaluation

    IEEE EMC Sendai Chapter Lecture and Seminar 2008/05/14

  55. The New Supercomputer System SX-9 and Its Evaluation

    SP Workshop 2008/05/09

  56. New Supercomputer System SX-9 and its Early Evaluation International-presentation

    the 18th CCSE Workshop on Computational Technologies Supporting Development of Future Applications 2008/04/22

  57. Experiences with SX-9 International-presentation

    the 8th Teraflop workshop 2008/04/10

  58. Experiences with SX-9 International-presentation

    Worldwide NEC Users’ Meeting 2008/04/06

  59. High-Performance Computing with Media Processors

    IEICE Specialized Seminar 2008/02/22

  60. New System Design and Its Early Evaluation International-presentation

    The Seventh Teraflop Workshop 2007/11/21

  61. The Potential of On-Chip Memory Systems for Future Vector Architectures International-presentation

    the 16th CCSE Workshop on High-Performance Computing on Vector Based Architectures – Recent Achievements and Future Directions- 2007/04/23

  62. ISC Plans and Update International-presentation

    The Sixth Teraflop Workshop 2007/03/26

  63. HPC Activities at Information Synergy Center International-presentation

    The Fifth Teraflop Workshop 2006/11/20

  64. Implication of Memory Performance in HEC Systems International-presentation

    The Fourth Teraflop Workshop 2006/03/30

  65. Performance Evaluation of SX-7 using HPCC and Real Application Codes International-presentation

    3rd Teraflop Workshop 2005/11/11

  66. HPC Research Activities at the Information Synergy Center and the Role of the Center in the Petaflops Era

    NEC HPC Workshop 2005/11/09

  67. Fallacies and Pitfalls Concerning Supercomputers

    Tohoku University Graduate School of Information Sciences Colloquium 2005/07/26

  68. Technology Trends in Large-Scale Scientific Computing Systems

    NUA Tohoku Regional User Training Seminar 2003/06/05

  69. High-Performance Photo-Realistic Graphics on the 3DCGiRAM Architecture International-presentation

    2002 International Conference on Optical Communication and Multimedia 2002/11/14

  70. Prospects for Fundamental Technologies Supporting a High-Performance, Highly Functional Network Society

    NetOne Tohoku Seminar 2000 2000/10/17

  71. Computers That Make Machines Intelligent

    JSME Special Forum "Machines and Intelligence" 1998/10/11

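  72. A Fast Volume Rendering Method Using Parallel Processing and an Automatic Region-of-Interest Extraction Method for Medical Images

    Akita Research Institute for Brain and Blood Vessels Lecture Meeting 1997/02/05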

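  73. The Multimedia Environment of the Tohoku University Graduate School of Information Sciences

    Seminar Jointly Hosted by K.K. Ashisuto and Nihon Sun Microsystems K.K. 1996/03/07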

  74. Supercomputers and Computational Fluid Dynamics

    Osaka University Welding Research Institute Research Meeting 1991/03/29

Industrial Property Rights 13

  1. Reference Image Cache, Method for Determining Deletion Target, and Computer Program

    Hiroaki Kobayashi

    Japanese Patent No. 7416380

    Property Type: Patent

  2. Reference Image Cache Memory, Data Request Method, and Computer Program

    Hiroaki Kobayashi et al.

    Japanese Patent No. 7425446

    Property Type: Patent

  3. Tsunami Inundation Prediction System, Control Device, Control Method for Parallel Computing System, and Program

    越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

    Japanese Patent No. 6362178

    Property Type: Patent

  4. Tsunami Inundation Prediction System, Data Processing Server, Method for Requesting Tsunami Inundation Prediction, and Program

    越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

    Japanese Patent No. 6323880

    Property Type: Patent

  5. Tsunami Inundation Prediction System, Control Device, Method for Providing Tsunami Inundation Prediction, and Program

    越村俊一, 小林広明, 日野亮太, 太田雄策, 撫佐昭裕, 佐藤佳彦, 村嶋陽一, 鈴木崇之, 井上拓也, 村田泰洋, 加地正明

    Japanese Patent No. 6161130

    Property Type: Patent

  6. Cache Memory and Cache Control Method

    小林広明, 斎田泰昌

    No. 3834323

    Property Type: Patent

  7. Usage-Pattern-Oriented P2P Network System and Computer Program

    小林広明, 滝沢寛之, 稲葉勉

    No. 4170285

    Property Type: Patent

  8. Grid Computing System and Method for Collecting Computing Resources in a Grid Computing System

    小林広明, 稲葉勉, 松村龍太郎

    No. 3857258

    Property Type: Patent

  9. Grid Computing System

    小林広明, 稲葉勉, 松村龍太郎

    No. 3977298

    Property Type: Patent

  10. Physical Property Map Image Generation Device, Control Method, and Program

    鍬守 直樹, 撫佐 昭裕, 瀧川 陽平, 風間 悠加, 佐藤 佳彦, 小林 広明, 菊川 豪太, 岡部朋永, 小松 一彦

    Property Type: Patent

  11. Singular Material Detection Device, Control Method, and Program

    鍬守 直樹, 撫佐 昭裕, 瀧川 陽平, 風間 悠加, 佐藤 佳彦, 小林 広明, 菊川 豪太, 岡部朋永, 小松 一彦

    Property Type: Patent

  12. Map Image Generation Device, Control Method, and Program

    鍬守 直樹, 撫佐 昭裕, 瀧川 陽平, 風間 悠加, 佐藤 佳彦, 小林 広明, 菊川 豪太, 岡部朋永, 小松 一彦

    Property Type: Patent

  13. Recommendation Data Generation Device, Control Method, and Program

    鍬守 直樹, 撫佐 昭裕, 瀧川 陽平, 風間 悠加, 佐藤 佳彦, 小林 広明, 菊川 豪太, 岡部朋永, 小松 一彦

    Property Type: Patent

Research Projects 51

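  1. Creation of a Digital Twin for Soft Material R&D through Quantum-Classical Hybrid Computing

    小林 広明, 撫佐 昭裕, 阿部 圭晃, 佐藤 雅之, 小松 一彦, 菊川 豪太

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University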

    2024/04/01 - 2028/03/31

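  2. Creation of a Computing Infrastructure Based on New Computing Principles through Large-Scale Quantum Computing

    小松 一彦, 小林 広明, 佐藤 雅之, 百瀬 真太郎

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University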

    2023/04 - 2028/03

  3. Digital twin computing for enhancing resilience of disaster medical system

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (S)

    Institution: Tohoku University

    2021/07/05 - 2026/03/31

  4. Real-time video coding technology using the latest coding VVC/H.266 and its applications

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo University of Agriculture and Technology

    2022/04/01 - 2025/03/31

  5. Expanding Industrial Use of Innovative Technology for Transportation Equipment Design Using Microdevices Through Large-Scale Simulation

    Offer Organization: Tohoku University Cyber Science Center

    System: JHPCN:Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures

    Institution: Tohoku University

    2017 - 2024

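  6. Materials Revolution through an Integrated Materials Development System

    小林広明,小松一彦,佐藤雅之

    Offer Organization: Cabinet Office

    System: Strategic Innovation Program (SIP)

    Category: High-Speed Implementation and Evaluation of a Materials Integration (MI) System for CFRP

    Institution: Tohoku University, Toray Industries, Inc., University of Hyogo, Kyoto University, Kanazawa Institute of Technology, National Institute for Materials Science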

    2020/05 - 2023/03

  7. Quantum-Annealing Assisted Innovative Material Informatics Infrastructure

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2019/04 - 2023/03

  8. Development of a Quantum-Annealing-Assisted Next-Generation Supercomputing Infrastructure

    小林 広明, 滝沢 寛之, 山口 健太, 撫佐 昭裕, 曽我 隆, 渡部 修, 横川 三津夫, 江川 隆輔, 下村 陽一, 中田 一人, 越村 俊一, 小松 一彦, 佐藤 雅之, 愛野 茂幸, 磯部 洋子, 政岡 靖久, 百瀬 真太郎, 藤本 壮也, 山本 悟, 古澤 卓, 荒木 拓也, 村嶋 陽一, 大関 真之, 觀山 正道, 太田 雄策, マス エリック, 星 宗王, 萩原 孝

    Offer Organization: Ministry of Education, Culture, Sports, Science and Technology (MEXT)

    System: Next-Generation Area Research and Development

    2018/04 - 2023/03

  9. Fusion of sensing and simulation of tsunami damage assessment towards innovation of disaster medical system

    KOSHIMURA Shunichi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

    Category: Grant-in-Aid for Scientific Research (S)

    Institution: Tohoku University

    2017/05 - 2022/03

    The project promoted collaborative research among science, engineering, and disaster medicine with the goal of enhancing the resilience of disaster medical systems by integrating real-time simulation and sensing. Considering the catastrophic tsunami disasters regarded as future risks in Japan, we achieved three outcomes: 1) quantitative and rapid estimation of human and physical damage caused by the tsunami, 2) immediate estimation of medical demands in disaster-affected areas, and 3) a methodology for planning and updating disaster medical activities through multi-agent simulation. Through the project, we examined the required specifications for an innovative medical support system in the anticipated disaster process of a future Nankai Trough earthquake and tsunami disaster that is expected to occur within the next 30 years.

  10. Fusion of sensing and simulation towards enhancing disaster medical system

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2017/04 - 2021/03

  11. Theory and Practice of Vector Data Processing at Extreme Scale: Back to the Future

    2018/04 - 2020/03

  12. Supporting performance-aware programming with machine learning techniques

    Hiroyuki Takizawa, Kobayashi Hiroaki, Suda Reiji, Okatani Takayuki, Egawa Ryusuke, Ohshima Satoshi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2016/04/01 - 2019/03/31

    This work has demonstrated some case studies of effectively using machine learning techniques for supporting High-Performance Computing (HPC) programming. Various problems in code optimization can be solved by converting them to problems that have already been proven to be solvable by machine learning. Moreover, this work clarified the importance of analyzing the target problems in advance of machine learning, because it is unlikely that a sufficient amount of training data is available in code optimization problems. In addition, like HPC programming, machine learning also needs the knowledge and experience of human experts. However, in machine learning, the problem is already parameterized, and hence can be solved if sufficiently high performance is available.

  13. Design Space Exploration of Future Microprocessors using the post CMOS devices

    EGAWA Ryusuke, Kobayashi Hiroaki, Takizawa Hiroyuki, Tada Jubee, Sato Masayuki, Uno Wataru, Toyoshima Takuya, Sakai Zentaro, Ogasawara Daisuke

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2015/04 - 2018/03

    In this research, aiming to realize highly energy-efficient microprocessors using novel device technologies of the post-Moore era, which are expected to become practical around 2025, we worked on circuit and memory subsystem design. For circuit design, we worked on a design method for wave-pipelined circuits using CNFETs. For the memory subsystem, we focused on die-stacking and STT-RAM technologies. We examined a cache-bypass mechanism, an energy-efficient data allocation method for multi-bank memory, and a power-aware control mechanism for STT-RAM last-level caches.

  14. Research and Development of High-Density Self-Assembled Wiring Technology for Low-Power Stacked Semiconductors

    小柳 光正, 東, 和幸, 元吉 真, 知京, 豊裕, 川喜多, 仁, 田中 徹, 福島, 誉史, 李, 康旭, 池田 誠, 小林 広明, 岡谷, 貴之, 清山 浩司

    Offer Organization: New Energy and Industrial Technology Development Organization (NEDO)

    System: Advanced Research Program for Energy and Environmental Technologies

    2015/04 - 2017/03

  15. A Green Microarchitecture in 5.5D-Design Era

    EGAWA RYUSUKE, Kobayashi Hiroaki, Takizawa Hiroyuki, Sato Masayuki, Uno Wataru, Nishimura Shin, Hosokawa Mikio, Toyoshima Takuya

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2014/04 - 2017/03

    To clarify the design space of future microprocessors after the end of Moore's law, this research project focuses on vertical integration technologies such as 2.5D and 3D integration using through-silicon vias (TSVs). Since TSVs have high potential for shortening latency and reducing power consumption in microprocessors and computing systems, these technologies are expected to overcome the limits of technology scaling. In this research, we explore the design space of future microprocessors by aggressively using TSVs at various stacking granularities. The evaluation results show that appropriate use of TSVs, taking into account the trade-off among performance, power, and cost, can drastically improve the energy efficiency of microprocessors and computer systems.

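  16. Demonstration Project for Building a "Tsunami L-Alert" by Linking a Real-Time Tsunami Prediction System with L-Alert and for Advancing Disaster Response

    越村俊一, 小林広明, et al.

    Offer Organization: Ministry of Internal Affairs and Communications

    System: Project for Advancing L-Alert Using G-Space Information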

    2015/04 - 2016/03

  17. Checkpoint restart technologies for hierarchical storage

    Hiroyuki Takizawa, Uno Atsuya, Kobayashi Hiroaki, Egawa Ryusuke, Sato Yukinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2014/04 - 2016/03

    Assuming that the state of an application is periodically saved during its execution, we considered an automatic tuning method for the frequency of saving the state to a hierarchical storage system, and also discussed a way to reduce the time needed to write the state to storage. A promising approach to this reduction is to speculatively write data that will, with high probability, be written in the future. One technical issue is therefore how to predict such data, which requires analyzing the memory access patterns of the target application. We developed a performance analysis tool for this purpose. The validity and effectiveness of the proposed methods were evaluated through job scheduling simulation of a large-scale computing system.
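
    The abstract does not state how the checkpoint frequency is tuned. As a rough illustration of why the best frequency differs between storage levels, the sketch below uses Young's first-order approximation of the optimal checkpoint interval. This formula is swapped in here as an assumption and is not taken from the project; the function name, the storage level names, and all numeric values are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation: sqrt(2 * cost * MTBF), in seconds."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical write costs (seconds) for two storage levels and a made-up MTBF.
levels = {"node-local": 2.0, "parallel-fs": 50.0}
mtbf_s = 10_000.0

# A cheaper storage level tolerates more frequent checkpoints.
intervals = {name: young_interval(cost, mtbf_s) for name, cost in levels.items()}
```

    With these made-up inputs, the faster level would be checkpointed more often than the slower one, which is the kind of trade-off an automatic tuner has to balance.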

  18. A 3D Processor Architecture Co-Designed with Dependable Processing

    Kobayashi Hiroaki, TAKIZAWA HIROYUKI, EGAWA RYUSUKE

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2014/04 - 2016/03

    The objective of this study is to establish a novel processor architecture that realizes both high performance and high dependability in executing a wide variety of applications by using 3D die-stacking technology toward the post-Moore era. In particular, we developed a 3D die-stacked memory subsystem architecture integrated with processor cores, along with its data management mechanism, for a highly power-efficient and high-throughput memory hierarchy. In addition, we developed an online checkpoint/restart mechanism that uses 3D die-stacked on-chip memory to increase the dependability of the processor. The proposed architecture was evaluated quantitatively using a wide variety of applications, and its effectiveness and limitations were clarified and discussed.

  19. Infrastructures for accelerating the synergy effect of software-hardware co-design

    Hiroyuki Takizawa, Kobayashi Hiroaki, Aoki Takafumi, Sano Kentaro, Egawa Ryusuke, Tada Jube, Ito Koichi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2013/04 - 2016/03

    Assuming OpenCL as a standard environment for accelerator programming, we pointed out features that are missing for supporting a wider variety of accelerator architectures, and proposed OpenCL extensions. Although OpenCL has gradually come to be used for hardware description, OpenCL C is not necessarily appropriate for describing OpenCL kernels. Hence, we designed and implemented high-productivity languages for typical computations in image processing and high-performance computing. In addition, we proposed an automatic tuning method for performance parameters, which need to be adjusted for individual accelerators. The proposed method was implemented to evaluate its performance impact.

  20. A Universal Memory Architecture Based on Device-Architecture Co-Design

    Kobayashi Hiroaki, TAKIZAWA HIROYUKI, EGAWA RYUSUKE

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2013/04 - 2016/03

    The objective of this study is to establish a smart memory subsystem architecture that considers the memory access behavior of applications and effectively manages data in the memory hierarchy in terms of performance and power efficiency. In particular, we developed 1) a low-power, high-bandwidth cache architecture, 2) a cache management policy that evaluates the memory request behavior of an application online to reduce its working set in the memory hierarchy, 3) a cache partitioning mechanism that protects performance-sensitive shared data on chip multicore processors, and 4) a memory address mapping mechanism with performance/power optimization based on online estimation of memory access behavior.

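  21. Demonstration Project for Strengthening the Disaster Mitigation Capability of Local Governments through Real-Time Tsunami Inundation and Damage Forecasting and Disaster Information Delivery

    越村俊一, 小林広明, et al.

    Offer Organization: Ministry of Internal Affairs and Communications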

    2014/04 - 2015/03

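  22. Feasibility Study on Future HPCI Systems Suited to High-Memory-Bandwidth Applications

    小林 広明 金田 義行 橋本 ユキ子

    Offer Organization: Ministry of Education, Culture, Sports, Science and Technology (MEXT)

    System: Feasibility Study on Future HPCI Systems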

    2012/04 - 2014/03

  23. Application-Aware Highly Hierarchical Memory Architecture

    KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki, EGAWA Ryusuke

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2012/04 - 2014/03

    The objective of this study is to establish a novel on-chip memory architecture that provides the necessary memory resources to running applications, taking into account their behavior and their requirements on the memory subsystem of a multi-core processor. In this study, we developed a cache-resource management mechanism that realizes energy-efficient, high-performance execution of multi-threaded applications on a multi-core processor. In cooperation with hardware functions developed for cache resizing and partitioning, which reduce cache conflicts and maximize the efficiency of cache utilization, this mechanism can extract the potential of multi-core processors with low power consumption.

  24. Study of Next-Generation CFD toward Petaflops Computers

    NAKAHASHI Kazuhiro, YAMAMOTO Satoru, OBAYASHI Shigeru, KOBAYASHI Hiroaki, YAMAMOTO Kazuomi, SASAKI Daisuke, JEONG Shinkyu, TAKIZAWA Hiroyuki, EGAWA Ryusuke, KUROTAKI Takuji, ENOMOTO Shunji, IMAMURA Taro, TAKAHASHI Shun

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

    Category: Grant-in-Aid for Scientific Research (S)

    2009/05 - 2014/03

    This study aimed to solve problems of current CFD in the aerodynamic design of aircraft, such as the dependence of computational results on the physical model and the increasing workload of treating complex geometries. The Building-Cube Method was proposed with further improvements in computer performance in mind, and various algorithmic studies for practical use were conducted. One of the achievements was demonstrated by a world-leading large-scale flow computation around a car using the K computer. Significantly, the proposed CFD approach can directly treat extremely complicated and incomplete CAD data in the simulation. This can be a game-changing technology for the aerodynamic design process of aircraft and automobiles.

  25. Creation of Three-Dimensional VLSI Systems with Self-Repair Functions

    小柳 光正 小林 広明 青木 孝文 末吉 敏則 鎌田 忠 元吉 真

    Offer Organization: Japan Science and Technology Agency (JST)

    System: Strategic Basic Research Programs

    2009/04 - 2013/03

  26. Innovative 3D Design for the New Generation Vector Microarchitecture

    KOBAYASHI Hiroaki, TAKIZAWA Hiroyuki, EGAWA Ryusuke

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2010 - 2012

    This study discusses a new design methodology for the microarchitecture of next-generation, low-power, high-performance vector processors using 3D die-stacking technology. We also proposed a strategy for mixing conventional 2D design and TSV (through-silicon via)-based 3D design that achieves a good trade-off between them at all levels of on-chip unit design. Through performance evaluation of a prototype 3D vector processor, the effectiveness of 3D design with respect to power consumption and performance was clarified.

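  27. Ultra-High-Accuracy Biological Function Measurement System Based on Ultrasonic-Measurement-Integrated Analysis

    早瀬 敏幸 小杉 隆司 小林 広明 小玉 哲也

    Offer Organization: Japan Science and Technology Agency (JST)

    System: Development of Systems and Technology for Advanced Measurement and Analysis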

    2007/04 - 2011/03

  28. Instruction Steering Based on Static Data Dependency

    SUZUKI Ken-Ichi, KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

    Category: Grant-in-Aid for Scientific Research (C)

    Institution: Tohoku Institute of Technology

    2008 - 2010

    Modern microprocessors achieve high performance by executing multiple instructions in parallel. In this research, we introduced a new execution model in which instructions on a local critical path are found statically at compile time, and only instruction steering is performed dynamically at execution time. Performance evaluations showed that the IPC of our execution model is comparable to that of existing models, even without dynamic steering.

  29. Design and Development of Advanced IT Research Platform for Information Explosion Era

    ADACHI Jun, TANAKA Katsumi, NISHIDA Toyoaki, KUNIYOSHI Yasuo, SUDOH Osamu, KUROHASHI Sadao, HARA Takahiro, MATSUOKA Satoshi, TAURA Kenjiro, TATEBE Osami, MUNETOMO Masaharu, HIROTSU Toshio, MATSUBARA Jin, SHIMOJYO Shinji, CHIBA Shigeru, YUASA Taichi, MATSUYAMA Takashi, CHIKAYAMA Takashi, KONDO Toru, KONO Kenji, OKAMOTO Masahiro, AIDA Kento, KAMADA Tomio, KITSUREGAWA Masaru, YAMANA Hayato, NAKAMURA Yutaka, KOBAYASHI Hiroaki, NAKAJIMA Hiroshi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

    Category: Grant-in-Aid for Scientific Research on Priority Areas

    Institution: National Institute of Informatics

    2006 - 2010

    This project implemented a common research infrastructure for all the research groups participating in this priority-area research initiative, and thereby supported all research activities in the initiative. By providing this infrastructure, we succeeded in accelerating the shared use of research facilities and resources within the limits of research funding and in strengthening collaboration among research groups. The shared facilities include (a) TSUBAKI: an open search engine for a large-scale corpus, (b) InTrigger: a widely distributed computing test-bed, (c) IMADE: an environment for real-world interaction measurement and analysis, and (d) prototyping for sensor-network-based preventive medicine.

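  30. Research and Development of a Safe, Secure, and Low-Cost Ubiquitous Computing Platform for Creating an ICT Eco-Society

    小林 広明, 堀口 進, 滝沢, 寛之, 福士 将

    Offer Organization: Ministry of Internal Affairs and Communications

    System: Strategic Information and Communications R&D Promotion Programme (SCOPE)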

    2006/04 - 2009/03

  31. Study on Hardware-Software Collaborative Scheduling for Highly Efficient Multithreading

    KOBAYASHI Hiroaki, NAKAMURA Tadao, SUZUKI Kenichi, TAKIZAWA Hiroyuki, EGAWA Ryusuke, SATO Yukinori, KOTERA Isao, FUNAYA Yusuke, SATO Masayuki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2006 - 2009

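  32. Research on the Design of Ultra-High-Bandwidth Vector Processors Using 3D Stacking Technology

    小林 広明

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research

    Category: Grant-in-Aid for Exploratory Research

    Institution: Tohoku University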

    2008 - 2008

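    This research aims to clarify the design constraints on high-performance microprocessor architectures for the coming era of 3D integration, and the optimal architecture design approach under those constraints. In fiscal year 2008, we surveyed and examined research trends in the elemental technologies of 3D stacking and in new architecture designs that use 3D stacking technology. Through this work, we confirmed that 3D stacking dramatically increases the number of transistors available in a chip, and that through-silicon vias (TSVs), which connect the silicon layers stacked in the third dimension, can shorten on-chip wire lengths and wire delays. We also focused on vector processors, for which there is concern that memory bandwidth will decline because of recent limits in I/O pin packaging technology, and proposed a 3D vector processor equipped with a large-capacity on-chip memory built with 3D stacking technology, which can take full advantage of the benefits described above. The proposed 3D vector processor consists of a processor layer and multiple memory layers. The on-chip memory capacity can be increased easily by adding memory layers, and by reducing the number of off-chip memory accesses, the processor suppresses the power consumed by off-chip memory accesses while effectively hiding memory access latency. The evaluation showed that, compared with an existing 2D vector processor, the proposed memory-stacked 3D vector processor reduces energy consumption by up to 14% and execution cycles by up to 63%.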

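  33. Ultra-Large-Scale Data Mining by Safe and Secure Volunteer Computing

    小林 広明, 滝沢 寛之

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

    Category: Grant-in-Aid for Scientific Research on Priority Areas

    Institution: Tohoku University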

    2007 - 2008

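    This research aims to establish the fundamental technologies for realizing large-scale data mining through volunteer computing that exploits the functions and performance of home game consoles. In fiscal year 2008, we built a distributed data mining system that analyzes physical phenomena near a rocket exhaust nozzle, and conducted a demonstration experiment of large-scale data mining in a volunteer computing environment consisting of PLAYSTATION 3 consoles and InTrigger. The results showed that when conventional centralized task scheduling is used for dynamic load balancing, load balancing becomes inefficient as computing resources increase, and the performance expected of a large-scale volunteer computing environment cannot be achieved. In contrast, the distributed cooperative scheduling mechanism proposed in this research was found to perform dynamic load balancing efficiently even as the number of computing resources increases. These evaluation experiments demonstrated that the proposed mechanism is effective for dynamic load balancing in large-scale volunteer computing environments. We also proposed a worker-side scheduling method so that volunteers participating in multiple projects do not waste idle computing capacity. As a means of increasing the reliability of volunteer computing, we also proposed a method for efficiently verifying the validity of computation results. Simulations showed that fraudulent workers can be detected by quantifying the credibility of each worker and changing it according to the evaluation of the validity of its computation results. Furthermore, focusing on the high graphics processing performance of home game consoles, we examined how to use that performance for data mining, and also studied a programming framework that makes such programming easy.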
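
    The abstract mentions quantifying each worker's credibility and adjusting it according to whether its results are judged valid, but gives no update rule. The sketch below is one minimal way such a scheme could look. The `Worker` class, the step sizes, the asymmetric penalty, and the detection threshold are all hypothetical choices made for illustration, not the project's method.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    credibility: float = 0.5  # hypothetical neutral starting value in [0, 1]


def update_credibility(worker: Worker, result_valid: bool, step: float = 0.1) -> None:
    """Raise credibility after a valid result and lower it after an invalid one."""
    # Assumption: invalid results are penalised twice as strongly as valid ones are rewarded.
    delta = step if result_valid else -2 * step
    worker.credibility = min(1.0, max(0.0, worker.credibility + delta))


def suspicious(workers, threshold: float = 0.2):
    """Return the names of workers whose credibility fell below the threshold."""
    return [w.name for w in workers if w.credibility < threshold]


honest, cheater = Worker("honest"), Worker("cheater")
for _ in range(3):
    update_credibility(honest, True)    # results that passed validation
    update_credibility(cheater, False)  # results that failed validation
flagged = suspicious([honest, cheater])
```

    After three rounds in this toy run, only the worker that keeps returning invalid results drops below the threshold.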

  34. Network Architectures for High-speed Photonic Networks

    HORIGUCHI Susumu, KOBAYASHI Hiroaki, JIANG Xiahong, FUKUSHI Masaru, YAMAMORI Kunihito

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2005 - 2007

    Optical networks are expected to be promising high-speed networks for applications that require a high data transmission rate, a low error rate, and low delay. In this research, we proposed a new optical network architecture for high-speed photonic networks implemented entirely with photonic devices. First, we devised a recursive network architecture for non-blocking optical switches and showed that the recursive switch architecture offers good network performance as well as a simple self-routing control strategy. We also devised a multi-stage optical switch architecture that is non-blocking and crosstalk-free. This multi-stage architecture is a good choice for constructing non-blocking optical switch networks with low signal loss and crosstalk. Banyan networks built from optical switches are very attractive as optical switch architectures because of their small depth and absolute signal loss uniformity. We investigated vertically stacked optical banyan networks that combine horizontal expansion and vertical stacking of optical banyan networks, and showed that horizontally expanded and vertically stacked optical banyan networks are usually non-blocking and crosstalk-free. We also analyzed blocking behavior and showed that the proposed method is an effective approach to studying network performance and finding a graceful compromise among hardware cost, network depth, and blocking probability. Finally, we studied highly survivable network systems using a new restoration strategy that achieves higher restoration performance than proactive restoration. We devised an active restoration strategy that compensates for switch node faults and link faults and achieves high survivability in large-scale networks.

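  35. Ultra-Large-Scale Data Mining by Safe and Secure Volunteer Computing

    小林 広明, 滝沢 寛之

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research on Priority Areas

    Institution: Tohoku University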

    2006 - 2006

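    This fiscal year, we focused on data clustering (DC) and neural networks (NN), which demand particularly high computational performance among representative data mining methods, and examined implementation methods for executing them efficiently on home game consoles. Specifically, we studied how to use the Cell Broadband Engine (CBE), a high-performance processor installed in home game consoles, and the graphics processing unit (GPU) effectively for data mining, and carried out implementation and quantitative performance evaluation. As research on large-scale P2P computing, we studied a distributed computing resource management mechanism for efficiently searching the enormous number of idle computing resources spread across a network for those that satisfy a user's request. We found that user requests exhibit temporal and spatial locality, similar to that seen in the memory access behavior of computers, and that exploiting this locality can dramatically improve search efficiency. This fiscal year, we considered resource discovery in heterogeneous environments in particular, and examined a mechanism that automatically adjusts the number of P2P connections according to how frequently a resource is used. As a mechanism for coordinating an enormous number of computers, we also advanced research on a fully distributed dynamic load balancing mechanism and designed its basic control scheme. In preparation for realizing a safe and secure distributed data mining system based on tamper-resistant computation on a volunteer computing platform, we set up a development environment this fiscal year. We also collected related materials and held discussions with the parties concerned.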

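  36. Research on Autonomously Reconfigurable Hardware with Evolutionary Computation Functions

    堀口 進, 小林 広明, 福士 将

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research

    Category: Grant-in-Aid for Exploratory Research

    Institution: Tohoku University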

    2004 - 2006

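    With advances in VLSI technology, research on evolvable hardware, which uses reconfigurable logic array devices to change its functions autonomously according to the operating environment, has been attracting attention. In this research, we studied hardware schemes capable of autonomous reconfiguration by applying evolutionary computation to practical-scale VLSI systems built from programmable logic devices such as static and dynamic FPGAs. In particular, we carried out a detailed performance evaluation of reconfigurable systems based on evolutionary computation. As a result, we demonstrated the usefulness of this configuration scheme by implementing in hardware an evolutionary computation circuit system adapted to fault-compensating reconfigurable hardware for hierarchical neural networks, together with circuit information learned by a genetic algorithm. Next, we examined in detail an autonomously reconfigurable hardware system that can vary its neural network configuration according to the fault situation, and a fault-avoiding degradable reconfiguration system for mesh-connected processors that applies evolutionary computation. Building on the results for the fault-compensating hierarchical neural network hardware system equipped with an FPGA-based evolutionary computation circuit system, we found that the newly devised genetic algorithm learning, circuit information, and fault-compensating neural network can autonomously change the network configuration according to the problem size and operating environment. Furthermore, we proposed and implemented an autonomous degradable reconfiguration scheme for mesh-connected processors based on evolutionary computation and a fault-avoidance coding learning scheme for genetic algorithms, and evaluated their performance. These results clarified the usefulness of the fault-avoiding autonomous degradable reconfiguration scheme for mesh-connected processors based on evolutionary computation.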

  37. An Intelligent Memory Architecture for 3D Graphics

    KOBAYASHI Hiroaki, NAKAMURA Tadao, SUZUKI Ken-ichi, TAKIZAWA Hiroyuki, SANO Kentaro

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2002 - 2004

    We obtained the following achievements. (1) High-performance graphics algorithm and its hardware: We analyzed parallelism and locality of reference in a graphics algorithm based on the global illumination model, and designed a novel rendering pipeline architecture for this algorithm. In addition, we designed and developed prototype hardware based on the architecture. Through performance evaluation of the hardware, we showed its effectiveness for realizing interactive ray tracing. Moreover, we designed a new high-performance algorithm for generating walkthrough animations. (2) Power-efficient memory mechanism: To design an intelligent memory architecture for mobile devices, we designed a low-power mechanism for the on-chip memory system. In this mechanism, memory modules are activated and deactivated based on their activity during program execution. We clarified the relationship between activated memory modules and sustained performance, and showed the effectiveness of power-aware computing for on-chip cache memory. (3) Data compression algorithms for graphics hardware: We applied vector quantization to volume data sets to achieve efficient data compression, and designed a visualization algorithm that can directly visualize the compressed volume data. We also designed a novel data compression algorithm using data clustering for graphics hardware.

  38. Low Power and Ultra High Speed Microprocessor Architectures

    NAKAMURA Tadao, GOTO Gensuke, FUKASE Masaaki, KOBAYASHI Hiroaki, HAGIWARA Masafumi, SUZUKI Ken-ichi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2002 - 2004

    In recent decades, the performance of microprocessors has improved, but at the cost of increased power consumption, which causes serious on-chip thermal problems. Because strong demand for low-power, high-performance processors still exists, a microprocessor architecture that solves this problem is required. In this research, our objective was to establish microprocessor architectures that enable low-power and low-frequency operation by composing their modules reasonably. First, we showed a direction for future microprocessor design by defining the concept of low-power, high-speed operation. This direction is so revolutionary that the head investigator has been, and will be, invited to speak at international conferences. Based on this definition, we proposed and evaluated several architectures for low-power microprocessors. We also showed that fine- and coarse-grain thread parallelism should be extracted from application programs to exploit the features of the proposed architectures, and implemented a method for obtaining this parallelism from programs. In addition, achieving low-power, high-speed microprocessors requires careful datapath design. We designed a datapath using wave pipelining, which enables both high-speed processing and low-power operation, and proposed a new cache mechanism to bridge the speed gap between the microprocessor datapath and main memory. Codebook design for information compression is a well-known application of parallel processing. We tackled this application by investigating the possibility of reducing power consumption from both software and hardware viewpoints, and showed the effectiveness of our architecture by implementing low-power, high-speed dedicated parallel processors.

  39. A Study of a High-Speed and Highly-Functional Instruction Feeding Mechanism for the VLSI Architecture

    SUZUKI Ken-ichi, NAKAMURA Tadao

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

    Category: Grant-in-Aid for Scientific Research (C)

    2000 - 2002

    The VLIW architecture, which is the most promising for implementing next-generation microprocessors, executes many instructions in parallel and therefore requires a high-performance memory system that supplies a huge number of instructions from main memory to its functional units in a short time. We introduce a high-performance instruction cache mechanism dedicated to the VLIW architecture, named the MULHI (MULtiple HIt) cache. The MULHI cache achieves a high hit ratio by eliminating unnecessary "nop" instructions from its cache memory array, which makes a high-bandwidth memory system possible. The MULHI cache shares with the COMPRESS cache and the SILO cache the concept of eliminating nops from the data array. However, only the MULHI cache can apply cache associativity in its cache management policy to obtain a higher hit ratio. Using software simulation, we evaluated the miss ratio of the MULHI cache and showed that it achieves a higher OPC (operations per cycle) than the other cache mechanisms. Moreover, a detailed hardware design showed that the overhead of the MULHI cache control logic is small. Consequently, the MULHI cache architecture is feasible for implementing a high-speed memory system for VLIW processors. Finally, as a new application of cache memory, we evaluated a real-time ray tracing system, which is remarkably powerful for rendering images.
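
    The abstract does not describe how the MULHI cache stores instruction words once nops are removed. As a generic illustration of nop elimination only, the sketch below compresses a VLIW instruction word into a slot mask plus the remaining operations and restores it on read. The function names and the list-of-strings word format are hypothetical, and this is not the MULHI design itself.

```python
NOP = "nop"


def compress(word):
    """Split a VLIW word into a per-slot presence mask and its non-nop operations."""
    mask = [op != NOP for op in word]
    ops = [op for op in word if op != NOP]
    return mask, ops


def decompress(mask, ops):
    """Rebuild the full-width VLIW word by reinserting nops into empty slots."""
    remaining = iter(ops)
    return [next(remaining) if present else NOP for present in mask]


# A hypothetical 4-slot instruction word with two empty slots.
word = ["add", NOP, NOP, "load"]
mask, ops = compress(word)
restored = decompress(mask, ops)
```

    Only the two real operations need to occupy storage in this toy example, which is the intuition behind keeping nops out of the cache array.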

  40. Self-Reconfiguration Architecture of Mesh-Connected Network for Multiprocessor Systems and the Implementation

    HORIGUCHI Susumu, HAYASHI Ryouko, YAMAMORI Kunihito, KOBAYASHI Hiroaki, INOGUCHI Yasushi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Japan Advanced Institute of Science and Technology

    1999 - 2001

    This research deals with reconfiguring the network interconnection of mesh-connected processor arrays (mesh arrays) implemented in VLSI/WSI. For massively parallel systems, it is becoming necessary to develop self-reconfiguration architectures that can automatically reconfigure partially faulty systems. Many reconfiguration algorithms have been proposed to date; however, most of them are not suitable for self-reconfiguration, and little of the literature shows an actual hardware implementation of the architecture. In this research, we propose a hardware-oriented self-reconfiguration architecture based on simple schemes of column bypass and south-directional rerouting, and show a hardware implementation of the proposed architecture using an FPGA. The main feature of the proposed architecture is that faulty processors are avoided by a switching mechanism whose function is determined automatically from the states of neighboring processors. Simulation results show that the proposed architecture achieves higher system yield than previous architectures in rectangular mesh arrays. We also implemented the reconfiguration system in an FPGA and discussed its performance. The hardware overhead of redundant circuits such as switches and control circuits is less than 4%, where the hardware cost of a processor, including a test circuit, is 50 Kgates.

  41. DEVELOPING A PHOTO-REALISTIC COMPUTER GRAPHICS SYSTEM

    KOBAYASHI Hiroaki, KATAHIRA Masayuki, KITAJIMA Hiroyuki, NAKAMURA Tadao, SUZUKI Ken-ichi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B).

    Category: Grant-in-Aid for Scientific Research (B).

    Institution: TOHOKU UNIVERSITY

    1998 - 2000

    In this research project, we carried out the basic design of a graphics hardware architecture for photo-realistic image synthesis. The design is based on the object-space parallel processing model proposed by the main investigator of the project. A prototype, named Thunder, was developed as a printed circuit board with a PCI interface, two FPGAs, each of which can implement a logic circuit of up to 200K gates, and four 256-MB SDRAMs (1 GB in total). We implemented the basic function units of Thunder, namely a 3DDDA unit, an intersection calculation unit (ICU), and a secondary ray generator, on the FPGAs, and an object memory on the SDRAMs. The maximum bandwidth between the object memory and the function units is 512 MB/s. In the design of Thunder, we focused especially on optimizing the ICU. We employed fixed-point calculations instead of floating-point ones to achieve low latency and high throughput in the ICU. To avoid image quality degradation caused by fixed-point calculations, we developed a novel fixed-point intersection calculation algorithm that keeps calculation accuracy as high as possible. Through experiments, we confirmed that the image quality obtained by our algorithm with fixed-point calculations is comparable to that obtained with 64-bit floating-point calculations. In addition, we discussed performance scalability in terms of the number of ICUs. The experimental results show that speedups of 6.4 with 8 ICUs and 11 with 16 ICUs can be obtained. In particular, for 16 ICUs running at 400 MHz, we estimated that the accelerator is 20 times faster than Pentium II-based image synthesis running at the same clock frequency. The accelerator also needs a memory bandwidth of around 100 GB/s. We believe that such a large bandwidth will become available as CMOS technology advances, for example through merged memory-logic technology.
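
    The scalability figures in the abstract can be restated as parallel efficiency, that is, speedup divided by the number of ICUs. The short sketch below does that arithmetic using only the two reported speedups; the function and variable names are illustrative.

```python
def parallel_efficiency(speedup: float, units: int) -> float:
    """Fraction of ideal linear speedup achieved with `units` processing units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return speedup / units


# Speedups reported in the abstract: 6.4 with 8 ICUs and 11 with 16 ICUs.
reported = {8: 6.4, 16: 11.0}
efficiency = {n: parallel_efficiency(s, n) for n, s in reported.items()}
```

    On these figures, efficiency falls from 80% with 8 ICUs to about 69% with 16 ICUs, so doubling the number of ICUs yields less than double the throughput.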

  42. Advanced Architectures for Brain - Structured Supercomputers

    NAKAMURA Tadao, FUKASE Masa-aki, KOYANAGI Mitsumasa, HASEGAWA Katsuo, KOBAYASHI Hiroaki, HAGIWARA Masafumi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B).

    Category: Grant-in-Aid for Scientific Research (B).

    Institution: TOHOKU UNIVERSITY

    1998 - 1999

    In addition to high-speed data processing by supercomputing, introducing functions of the human brain has contributed to building flexible computers. Structural hints from the human brain allowed us to consider more flexible functions that lead to various kinds of data and information processing. In this research we improved the architecture of brain-structured computers, designed an MISD processor, and implemented an example of the design. We call this architecture the SHIFT MACHINE architecture, which can be regarded as an MISD computer. This work builds on several research results, including an analysis of VLIW architectures with a cache analysis and a speculation method. A simulator of the SHIFT MACHINE is available to show its behavior visually. Related to the SHIFT MACHINE architecture, we also developed a reconfigurable synchronous dataflow computer called SOUND, which was implemented as a processor chip and evaluated. As a result, the left-brain function has been implemented, and its speed was judged suitable under low-power conditions. The right-brain function, on the other hand, has been implemented through research on computer graphics and volume rendering. Both were made fast and flexible by improving the algorithms. For computer graphics, the algorithm was evaluated on a commercially available parallel machine and achieved fast rendering; the same was confirmed for volume rendering. By developing suitable algorithms for computer graphics and volume rendering, we showed the potential for fast rendering suited to real-time processing. Neural network research was also advanced further to explore the right-brain function. The results merit a comparison between artificial mechanisms and the structure of the human brain.
    From these results in left-brain and right-brain research, we discussed the integration of the two brain functions in terms of their mutual behavior. We concluded that bringing the two together depends on the processing speed of the computers that carry out the left-brain and right-brain functions. To increase their processing speed, we developed the architecture together with software that includes special speculation. A cache mechanism was also developed for high-speed processing.

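  43. Study on Volume Rendering Algorithms Based on Space-Subdivision Parallel Processing

    KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University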

    1998 - 1999

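    In this research, we studied parallel algorithms that enable real-time visualization of volumes, that is, three-dimensional data. Specifically, we implemented on a parallel computer the parallel shear-warp algorithm designed in fiscal 1998, which uses adaptive partitioning to account for load balance, and evaluated its performance. The evaluation showed that the parallel algorithm achieves a performance improvement proportional to the number of processors, the processing elements of the parallel computer. We also found that introducing adaptive partitioning balances the load among the processors and, at the same time, reduces the communication volume inherent in the parallel algorithm, thereby improving parallel efficiency. We confirmed that a parallel computer consisting of 32 processors can generate more than 10 images of 256×256 pixels per second. In addition, to realize photo-realistic image synthesis for scenes in which volume data and polygon data are mixed, we improved and parallelized the ray tracing and radiosity methods, which are image synthesis methods based on a global illumination model. Specifically, we integrated a model of energy exchange during light propagation within volumes with the illumination models of radiosity and ray tracing, and then parallelized the integrated model based on the object-space-subdivision parallel processing model. With this improved parallel algorithm, photo-realistic image synthesis under a global illumination model can be performed at high speed for scenes in which objects represented by polygons are mixed with clouds, fog, and the like.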
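
    The idea of adaptive partitioning for load balance can be illustrated with a small sketch. It is not the algorithm developed in this project: it assumes that a per-slice cost estimate is available and simply cuts the sequence of slices into contiguous blocks of roughly equal total cost. The function name `adaptive_partition` and the cost values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def adaptive_partition(costs, num_procs):
    """Split consecutive slices into num_procs contiguous blocks of similar cost.

    Returns a list of (start, end) index pairs with end exclusive.
    Assumes non-negative costs.
    """
    prefix = [0] + list(accumulate(costs))  # prefix[k] = cost of the first k slices
    total = prefix[-1]
    bounds = [0]
    for p in range(1, num_procs):
        target = total * p / num_procs  # ideal cumulative cost at this cut
        k = min(bisect_left(prefix, target), len(costs))
        # Choose whichever neighbouring cut lies closer to the ideal share.
        if k > 0 and target - prefix[k - 1] <= prefix[k] - target:
            k -= 1
        bounds.append(max(k, bounds[-1]))  # keep cuts in order
    bounds.append(len(costs))
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical per-slice costs: the middle of the volume is the most expensive.
costs = [1, 1, 2, 6, 6, 2, 1, 1]
blocks = adaptive_partition(costs, 4)
block_costs = [sum(costs[a:b]) for a, b in blocks]
uniform_costs = [sum(costs[i:i + 2]) for i in range(0, len(costs), 2)]
print(blocks, block_costs, uniform_costs)
```

    In this toy case the heaviest block under the adaptive split costs less than the heaviest block under a uniform split of two slices per processor, which is the effect that load-aware partitioning aims for.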

  44. Study on Massively Parallel Simulations and Visualizations

    HORIGUCHI Susumu, ABE Toru, KOBAYASHI Hiroaki, ABE Masato, KAWAZOE Yoshiyuki, TANNO Kuninobu

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: JAPAN ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, Hokuriku

    1995 - 1996

    Computer simulations are important and frequently used in advanced science and technology. Advanced simulations require long computation times and huge memory capacities. In this decade, massively parallel computers have become an attractive alternative to supercomputers for such simulations. Parallelized simulations, however, have not been studied sufficiently, and the visualization of simulation results has also become more important recently. This study on massively parallel simulations and visualizations focuses on advanced scientific fields such as physics, chemistry, materials science, fluid dynamics, and neural networks. We implemented many parallelized simulations on the massively parallel computers CM-5, ncube2, and Parsytec GC, and demonstrated visualization of the simulation results by computer graphics. To achieve high-speed parallel simulations, we developed a dynamic load balancing method for molecular dynamics on the CM-5, a global communication method for room illumination simulation by radiosity on the Parsytec GC, and a packed message passing method for neural networks on the ncube2. We also showed effective visualizations of huge simulation data using 3D computer graphics.

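  45. Study on a TLB-Unified Cache Memory System

    KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University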

    1995 - 1995

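    In this research, we attempted to reduce the area occupied by the TLB and the cache memory, which are implemented separately on a microprocessor chip and take up a large share of the chip area, by integrating them through tag sharing. We also examined the possibility of reducing memory access cycles by reusing the saved area to enlarge the TLB. First, we clarified the organization of the TLB-unified cache memory and its control method, and evaluated its hardware cost in register-bit equivalents. The results showed that introducing the TLB-unified cache memory greatly reduces the hardware cost compared with the conventional organization of a separate cache memory and TLB. We further found that, when the saved hardware is reused to extend the TLB, a 16-entry TLB can be extended by a factor of 2 for a 4 KB cache, 4 for an 8 KB cache, 8 for 16 KB and 32 KB caches, and 16 for a 128 KB cache. Next, we evaluated the performance of the TLB-unified cache memory by trace-driven simulation. We first recorded the memory accesses made when eight practical application programs were executed for 8 million instructions on a workstation, and fed them, as the memory accesses required for instruction execution, into a TLB-unified cache memory simulator and a conventional TLB and cache memory simulator. From the memory accesses through the cache and the TLB on each simulator, we obtained the respective miss rates, and from the miss rates we derived the average number of memory cycles required to execute one instruction. The simulation-based evaluation showed that reusing the hardware area saved by integrating the TLB and the cache memory to extend the TLB makes it possible to reduce the number of memory cycles compared with a conventional organization requiring the same amount of hardware.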
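
    The step from miss rates to average memory cycles per instruction can be sketched as follows. The abstract does not state the cost model that was used, so the simple additive model below, the function name, and every numeric value (latencies, miss rates, accesses per instruction) are assumptions for illustration only, not results of the study.

```python
def avg_memory_cycles_per_instruction(
    accesses_per_instr,
    cache_miss_rate,
    tlb_miss_rate,
    hit_cycles=1,            # assumed cost of an access that hits
    cache_miss_penalty=10,   # assumed extra cycles per cache miss
    tlb_miss_penalty=20,     # assumed extra cycles per TLB miss
):
    """Average memory cycles per instruction under a simple additive model."""
    cycles_per_access = (
        hit_cycles
        + cache_miss_rate * cache_miss_penalty
        + tlb_miss_rate * tlb_miss_penalty
    )
    return accesses_per_instr * cycles_per_access


# Hypothetical miss rates: the enlarged TLB is assumed to miss less often,
# while the cache miss rate is assumed unchanged.
conventional = avg_memory_cycles_per_instruction(1.3, 0.05, 0.02)
unified = avg_memory_cycles_per_instruction(1.3, 0.05, 0.005)
print(f"conventional: {conventional:.2f}, unified: {unified:.2f}")
```

    Under this model, any reduction in the TLB miss rate obtained from the larger TLB translates directly into fewer memory cycles per instruction.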

  46. Self-Reconfigurable Massively Parallel Computer on Stacked Wafers

    HORIGUCHI Susumu, NUMATA Issei, ABE Touru, TANNO Kuninobu, KOBAYASHI Hiroaki, ASO Hirotomo, JAIN Vijay k., KIM Jung h., TAKETA Hiroshi, SHIMODAIRA Hiroshi, THOMAS Knight jr., FABRIZIO Lambardi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

    Category: Grant-in-Aid for international Scientific Research

    Institution: JAPAN ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, Hokuriku

    1993 - 1995

    This research deals with a 3D-mesh array on stacked wafers and its fault-tolerant architecture. The 3D-mesh array architecture provides self-reconfiguration of interconnections using a recursive shift scheme. Anuj Chandra et al. also proposed a reconfiguration algorithm for the 3D 1½-track model based on a compensation path scheme originally proposed by S. Y. Kung et al. The 3D 1½-track model was, however, discussed only from the theoretical viewpoint of extending the 2D 1½-track model. This paper examines its fault-tolerant performance to obtain the system yield of a 3D-mesh array using a self-reconfiguration scheme. First, we review recent WSI devices for constructing massively parallel computers and summarize the merits of WSI parallel computers. Next, we deal with the mesh-connected multiprocessor architecture and reconfiguration strategies to enhance the array yield for WSI implementation. The reconfiguration performance of a mesh-connected parallel computer is discussed in comparison with previous work. WSI implementation of cube-connected cycles (CCC) is addressed, and its yield performance is discussed taking into account the chip area of the PEs, switches, and links. We also propose a new interconnection network, HCQ, based on a crossed-cube interconnection, to reduce the diameter and the average distance of the interconnection network. The excellent network properties of HCQ are investigated theoretically. Finally, we discuss a 3D-mesh array on stacked wafers for massively parallel computers. A reconfiguration algorithm based on a recursive shift scheme is proposed. Applying the recursive shift scheme to a 3D-mesh array shows that the reconfiguration performance becomes high, which opens the possibility of constructing a massively parallel computer on stacked wafers such as the 3D-mesh array.

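  47. Study on a Massively Parallel System for Photo-Realistic Image Synthesis

    KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University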

    1994 - 1994

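    In this research, toward the realization of a massively parallel system for photo-realistic image synthesis, we proposed a new global illumination model as its foundation and examined a system organization and control scheme based on this model. Specifically, to realize a new massively parallel photo-realistic image synthesis scheme on a memory model in which object information is distributed among the processors, we first focused on the object-space-subdivision parallel processing scheme and applied it to a global illumination model that integrates ray tracing and radiosity, thereby devising a new massively parallel photo-realistic image synthesis algorithm. Next, we examined a massively parallel computer architecture suited to this algorithm and worked out the system organization and its control method in detail. Finally, to evaluate the performance of the system, we developed a simulator capable of simulating it at the register-transfer level and evaluated the performance by generating several test images. The evaluation showed that, up to about 256 processors, the processing time decreases in proportion to the number of processors, so that the expected speedup is obtained. When we examined system utilization, high utilization was achieved with 256 or fewer processors, but a marked drop in utilization was observed for systems with more processors. The reason is that the parallel algorithm devised in this research employs static load balancing, in which the object definition space is statically subdivided and the subspaces are assigned uniformly to the processors; when the number of processors is increased without a correspondingly fine space subdivision, load imbalance arises and, as a result, processor utilization becomes uneven. To avoid this, it appears necessary either to subdivide the space more finely or to perform dynamic load balancing, which reassigns tasks according to processor utilization at run time. This is the most important issue for future work.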
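
    The load imbalance described in this abstract can be reproduced qualitatively with a toy model. The sketch below is not the register-transfer-level simulator of the project: it assumes a one-dimensional object space split into equal-width cells, one per processor, and a hypothetical scene whose objects cluster toward one end. Utilization is taken here as mean load divided by maximum load.

```python
def utilization(positions, num_procs):
    """Mean processor load divided by the maximum load under a uniform static split."""
    loads = [0] * num_procs
    for x in positions:
        # Each processor owns one equal-width cell of the unit interval [0, 1).
        cell = min(int(x * num_procs), num_procs - 1)
        loads[cell] += 1
    return (len(positions) / num_procs) / max(loads)


# Hypothetical scene: 1000 objects clustered toward one end of the space.
positions = [(i / 1000) ** 2 for i in range(1000)]

for p in (4, 16, 64):
    print(f"{p} processors: utilization {utilization(positions, p):.3f}")
```

    In this toy setting utilization falls as processors are added, because the busiest cell shrinks more slowly than the average load. A finer subdivision or run-time task reassignment, as suggested in the abstract, would be the corresponding remedies.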

  48. Studies in Brain-Structured Supercomputers

    NAKAMURA Tadao, SUGIMOTO Osamu, KOBAYASHI Hiroaki, HAGIWARA Masafumi, GOTOH Eisuke, FUKASE Masa-aki, HASEGAWA Katsuo, FLYNN Michael j.

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

    Category: Grant-in-Aid for international Scientific Research

    Institution: TOHOKU UNIVERSITY

    1993 - 1994

    We planned this joint research project between Tohoku University and Stanford University with the aim of analyzing, synthesizing, and evaluating the performance of brain-structured supercomputers. During the fourth academic year of the project, ten meetings were held. The meetings were very profitable for the progress of the project, and we published many papers on brain-structured supercomputers this year. The following are the main subjects discussed extensively and in depth at these meetings. 1. Synthesis of brain-structured supercomputers: The model of a brain-structured supercomputer has been diagrammed using mind computer, expression recognition associative memory, wave cybernetics, artificial cochlea, sparse adaptive memory, wave pipeline, jet pipeline, logical architecture, symbolic architecture, functional architecture, distributed shared memory multiprocessor, computer graphics, etc. 2. Architecture of the sparse adaptive memory: The wave pipeline has been designed using a CMOS VLSI vector unit. Advances and problems in high-speed processor design have been made clear. 3. Analysis of the RIGHT computer: The methodologies developed so far have been further investigated to implement possible brain functionalities on supercomputers from various aspects such as the memory system, the conversion system from input to memory reference frame, and mechanisms of heuristics. 4. Architecture of superparallel symbol processing: A book has been published that describes the basis of the VLSI architecture for superparallel symbol processing. 5. Performance evaluation of the LEFT and RIGHT computers: The whole system of computer graphics has been investigated using multiprocessing techniques for ray tracing and multipass rendering. 6. Study of neural networks toward the RIGHT computer: Four papers have been published on the combination of neural networks and fuzzy inference and on knowledge processing by distributed representation.

  49. Basic Studies in Supercomputer Organization and Performance

    NAKAMURA Tadao, SUGIMOTO Osamu, KOBAYASHI Hiroaki, HAGIWARA Masafumi, GOTOH Eisuke, FUKASE Masa-aki, HASEGAWA Katsuo, FLYNN Michael J

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research Grant-in-Aid for international Scientific Research

    Category: Grant-in-Aid for international Scientific Research

    Institution: TOHOKU UNIVERSITY

    1991 - 1992

    Through 1991-1992, we continued the international joint research project entitled "Basic Studies in Supercomputer Organization and Performance", supported by the Ministry of Education of Japan. By virtue of this support, we have made progress in research on supercomputers and have extended our research from the fundamental field of neuroscience to computer science and technology. In neuroscience in particular, our group aims at a next-generation computer based on neural networks and their applications that includes flexibility of thinking. We name this image (meaning)-oriented computer the RIGHT computer, after the right brain of the human being. The research field of the RIGHT computer covers the input/output devices of usual computers in addition to the original functions of the right brain. Here, an original function means, for example, creating a concept for something, which is a step toward realizing real artificial intelligence, or an artificial brain. To develop these fields, we have studied the concept of a mind-oriented computer; sparse distributed memory for pattern recognition and a variable-resolution, nonlinear silicon cochlea for speech recognition in the input device category; computer graphics in the output device category; and expression recognition using a trained neural network. In numerical calculations for scientific applications, the function of the left brain is expected to operate at the highest processing rate. Usual von Neumann computers are LEFT computers from the viewpoint of the left brain. Wave pipelining, which increases the clock frequency in practical circuits without increasing the number of storage elements, has been proposed to speed up calculations. A novel supercomputing architecture called the Jet Pipeline, whose feature is to integrate all the possible features used in usual computers, has been proposed and advanced. Finally, in terms of theoretical models of computation, a functional programming language has been examined.

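  50. Study on Parallel Modeling in Object-Oriented Ray Tracing

    KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University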

    1990 - 1990

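  51. Study on an Object-Oriented Parallel Ray Tracing System

    KOBAYASHI Hiroaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University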

    1989 - 1989

Social Activities 10

  1. 7th Teraflop Workshop

    2007/11/21 - 2007/11/22

    International academic conference on supercomputers and their applications

  2. 5th Teraflop Workshop

    2006/11/20 - 2006/11/21

    International academic conference on supercomputers and their applications

  3. Used for Tsunami Damage Prediction / The Diverse Roles of Supercomputers

    2015/06/06 -

  4. Japan Concludes Exascale Feasibility Study

    2014/12/03 -

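  5. Tsunami Inundation Areas Predicted in 20 Minutes: Tohoku University and Others Use a Supercomputer

    2014/08/03 -

  6. Tohoku University and NEC: A Joint Research Organization for Next-Generation Supercomputer Technology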

    2014/06/29 -

  7. Feasibility Study of Advanced Vector Architecture System toward Exascale at Cyberscience Center, Tohoku University, Japan

    2013/05/13 -

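  8. The Future Targeted by Tohoku University's Supercomputer After Overcoming the Earthquake Disaster

    2011/10/28 -

  9. Lecture at Sendai Ikuei Gakuen Shukoh Secondary School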

    2006/12/14 -

    Visiting lecture at a high school

  10. Sendai Medical Association Academic Lecture

    2006/04/19 -

    Technical lecture for physicians

Media Coverage 7

  1. Kagaku no Izumi (Fountain of Science): "Supercomputers That Open Up the Future (1)-(9)"

    Kahoku Shimpo

    2015/05

    Type: Newspaper, magazine

  2. Visualizing Disasters in 3D for Use in Tsunami Inundation Prediction, Tohoku University

    Kahoku Shimpo, NHK

    2014/06/29

    Type: Newspaper, magazine

  3. The "New Industrial Revolution" Brought About by Ultra-High-Speed Computing: The Future Opened Up by the K Supercomputer

    NHK

    2013/01/08

    Type: TV or radio program

  4. A Ray of Hope for the Comeback of Vector Machines

    Nikkei Sangyo Shimbun

    2007/12/25

    Type: Newspaper, magazine

  5. The World's Top-Performing Supercomputer: Tohoku University's SX-7

    Asahi Shimbun

    2005/02/24

    Type: Newspaper, magazine

  6. Supercomputer: Tohoku University Ranks First in the World in Performance

    NHK General TV

    2005/02/09

    Type: TV or radio program

  7. Measured Performance Is the World's Best: Tohoku University Supercomputer

    Kahoku Shimpo

    2005/01/24

    Type: Newspaper, magazine

Other 8

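  1. Demonstration Project for Building a "Tsunami L-Alert" by Linking a Real-Time Tsunami Prediction System with L-Alert and for Advancing Disaster Response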

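    For large-scale earthquakes, real-time tsunami simulations on remotely located supercomputers are operated so that they complement one another. We researched and developed a real-time tsunami inundation damage prediction system covering all of Japan, and made it possible to distribute the simulation results to local governments nationwide by providing them through L-Alert.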

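  2. Demonstration Project for Strengthening the Disaster Mitigation Capability of Local Governments through Real-Time Tsunami Inundation and Damage Prediction and Disaster Information Distribution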

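    We research and develop a system that links seismic observation data with real-time simulation on a supercomputer to deliver tsunami inundation damage prediction information to the relevant local governments within 20 minutes of an earthquake.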

  3. Investigative Study on Future HPCI Systems Suited to High-Memory-Bandwidth Applications

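    This project aims to acquire the technical knowledge needed to realize a future supercomputer system that is required around 2018 and that supports both the building of a safe and secure society in Japan and the advanced manufacturing essential to strengthening the international competitiveness of industry. To that end, we clarify the technical challenges in applications, system architecture, system software, and device technology, examine the elemental technologies needed to solve them, carry out system design research, and conduct an investigative study on what future HPCI systems should be.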

  4. "Expansion of Facilities to Support Industrial Use and Broaden the User Base of the HPCI Centered on the K Computer"

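    We work on expanding the advanced computing facilities that support the HPCI and on research and development to advance their usage environment.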

  5. Application Performance Evaluation of a Vector Architecture with a Programmable Cache

    As a technique for accelerating simulation programs, we co-design an on-chip memory mechanism and the software techniques for using it.

  6. Creation of a 3D VLSI System with Self-Repair Capability

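    This research project addresses the dependability of in-vehicle image processing systems, based on the idea of improving dependability from the architecture and OS levels. We pursue the research from the following aspects: the image processing and recognition capabilities required to realize a dependable image processing system; overall system design that takes the requirements into account; hardware technologies such as reconfigurable logic with diagnosis and repair functions and other reconfigurable hardware; and dependable software technology based on VMs. The research as a whole is divided into three areas, namely image processing systems, software technology, and hardware technology, and we proceed while building a division of research responsibilities that allows close collaboration among the areas.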

  7. Ultra-High-Precision Biological Function Measurement System Based on Coupled Analysis with Ultrasonic Measurement

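    In the research and development of a system that performs high-precision biological function measurement at high speed by fusing supercomputer simulation analysis with data from ultrasonic measurement equipment, responsible for designing and developing the interface between the supercomputer and the measurement equipment.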

  8. Research and Development of a Safe, Secure, and Inexpensive Ubiquitous Computing Platform for Creating an ICT Eco-Society

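    Aiming to establish an ecology model for the information and communications field, we research and develop a secure, safe, and inexpensive volunteer computing infrastructure for the ubiquitous era that makes use of computing resources distributed throughout society. In particular, we study higher efficiency and higher reliability for volunteer computing, as well as incentive models that encourage participation, and establish the fundamental technologies for inexpensively providing a large-scale computing infrastructure that can also be used for highly confidential computations and whose scale is difficult to achieve with conventional implementation technologies.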
