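Participation in the Techno Festival

Our first-year master's students took part in the Techno Festival, hosted by the Tohoku University mechanical engineering division and held from December 7 to 9.
Using posters, they explained the research carried out in our laboratory to alumni of the mechanical engineering division.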

 

Booth Exhibition @ SC18 Dallas

Our group exhibited its research activities at SC18 together with IFS (Institute of Fluid Science) and IMR (Institute for Materials Research). Thank you for visiting our booth!

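Booth exhibition at SC18

Every year, the Supercomputing Research Division of the Cyberscience Center runs a joint booth with the Institute of Fluid Science and the Institute for Materials Research at SC, the international conference on high-performance computing. This year's booth was again a great success!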

Poster presentation @ SC18

M. Agung (D3) presented his research effort at SC18!
M. Agung, M. A. Amrizal, R. Egawa, and H. Takizawa, “A Locality and Memory Congestion-aware Thread Mapping Method for Modern NUMA Systems,” Poster Presentation at SC18, 13 Nov. 2018, Dallas.
 

 
 
 

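D3 student Agung gave a poster presentation at SC18

Agung, a third-year doctoral student, gave a poster presentation at SC18, the international conference on high-performance computing held in Dallas, USA.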
M. Agung, M. A. Amrizal, R. Egawa, and H. Takizawa, “A Locality and Memory Congestion-aware Thread Mapping Method for Modern NUMA Systems,” Poster Presentation at SC18, 13 Nov. 2018, Dallas.
 

Dr. Teranishi will visit our lab on November 22

Dr. Teranishi of Sandia National Laboratories will visit our lab on November 22 and give a talk.
Dr. Keita Teranishi is a principal member of technical staff at Sandia National Laboratories, California, USA. He received his BS and MS degrees from the University of Tennessee, Knoxville, in 1998 and 2000, respectively, and his PhD degree from The Pennsylvania State University in 2004. His research interests are parallel programming models, fault tolerance, numerical algorithms, and data analytics for high performance computing systems.
The abstract of the talk is as follows.
Abstract: Tensors have found utility in a wide range of applications, such as chemometrics, network traffic analysis, neuroscience, and signal processing. Many of these data science applications have increasingly large amounts of data to process and require high-performance methods to provide a reasonable turnaround time for analysts. Sparse tensor decomposition is a tool that allows analysts to explore a compact representation (low-rank models) of high-dimensional data sets, expose patterns that may not be apparent in the raw data, and extract useful information from the large amount of initial data. In this work, we consider decomposition of sparse count data using CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR).
Unlike the Alternating Least Squares (ALS) version, the CP-APR algorithm involves non-trivial constrained optimization of a nonlinear and nonconvex function, which contributes to its slow adaptation to high performance computing (HPC) systems. Recent studies by Kolda et al. suggest multiple variants of CP-APR algorithms amenable to data and task parallelism together, but their parallel implementation involves several challenges due to the continuing trend toward a wide variety of HPC system architectures and their programming models.
To this end, we have implemented a production-quality sparse tensor decomposition code, named SparTen, in C++ using Kokkos as a hardware abstraction layer. By using Kokkos, we have been able to develop a single code base and achieve good performance on each architecture. Additionally, SparTen is templated on several data types that allow for the use of mixed precision to allow the user to tune performance and accuracy for specific applications. In this presentation, we will use SparTen as a case study to document the performance gains, performance/accuracy tradeoffs of mixed precision in this application, development effort, and discuss the level of performance portability achieved. Performance profiling results from each of these architectures will be shared to highlight difficulties of efficiently processing sparse, unstructured data. By combining these results with an analysis of each hardware architecture, we will discuss some insights for improved use of the available cache hierarchy, potential costs/benefits of analyzing the underlying sparsity pattern of the input data as a preprocessing step, critical aspects of these hardware architectures that allow for improved performance in sparse tensor applications, and where remaining performance may still have been left on the table due to having single algorithm implementations on diverging hardware architectures.
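As a rough illustration of what "templated on several data types" can mean for mixed precision, the sketch below evaluates the Poisson objective that CP-APR minimizes (the sum of all model entries minus the sum of x log m over the nonzeros) for a 3-way sparse count tensor. It is not SparTen code and does not use Kokkos; the names `Nonzero` and `poisson_objective`, the coordinate-list storage, and the row-major factor layout are all assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One nonzero of a 3-way sparse count tensor in coordinate form (hypothetical layout).
struct Nonzero {
    std::array<std::size_t, 3> index;  // (i, j, k)
    unsigned count;                    // observed count x_ijk > 0
};

// Poisson objective of a rank-R CP model: sum_all m - sum_nonzeros x * log(m).
// factors[n][i * rank + r] holds entry (i, r) of the mode-n factor matrix.
// Storage is the precision the factors are stored in; Accum is the precision
// used for accumulation, so the two can be tuned independently.
template <typename Storage, typename Accum>
Accum poisson_objective(const std::vector<Nonzero>& nonzeros,
                        const std::array<std::vector<Storage>, 3>& factors,
                        std::size_t rank) {
    assert(rank > 0);

    // Dense part: the sum of every model entry equals
    // sum_r (column sum of A) * (column sum of B) * (column sum of C).
    Accum model_sum = 0;
    for (std::size_t r = 0; r < rank; ++r) {
        Accum prod = 1;
        for (const auto& factor : factors) {
            Accum column_sum = 0;
            for (std::size_t i = r; i < factor.size(); i += rank)
                column_sum += static_cast<Accum>(factor[i]);
            prod *= column_sum;
        }
        model_sum += prod;
    }

    // Sparse part: only nonzero counts contribute x * log(m).
    Accum log_term = 0;
    for (const Nonzero& nz : nonzeros) {
        Accum model = 0;  // m_ijk = sum_r a_ir * b_jr * c_kr
        for (std::size_t r = 0; r < rank; ++r) {
            Accum prod = 1;
            for (std::size_t n = 0; n < 3; ++n)
                prod *= static_cast<Accum>(factors[n][nz.index[n] * rank + r]);
            model += prod;
        }
        log_term += static_cast<Accum>(nz.count) * std::log(model);
    }
    return model_sum - log_term;
}
```

Calling `poisson_objective<float, double>(...)` stores the factors in single precision while accumulating in double, whereas `poisson_objective<float, float>(...)` trades some accuracy for lower memory traffic. This is the kind of performance/accuracy choice the abstract refers to, shown here only in miniature.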

Dr. Keita Teranishi will visit our lab on Nov 22!

Dr. Keita Teranishi will visit our lab and give a talk on Nov 22.

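Prof. Takizawa and Assoc. Prof. Egawa presented their research at the 28th WSSP

Prof. Takizawa and Assoc. Prof. Egawa presented their research results at the 28th Workshop on Sustained Simulation Performance (WSSP), held in 2018 at the High-Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Germany.
The 29th WSSP is scheduled to take place in Sendai over two days, March 19 and 20, 2019!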

M1 student Shiotsuki presented at SWoPP2018

M1 student Shiotsuki gave a presentation at SWoPP2018 (Summer United Workshops on Parallel, Distributed and Cooperative Processing), held at the Kumamoto City International Center (熊本市国際交流会館) from July 30th to August 1st.
SWoPP2018:
https://sites.google.com/site/swoppweb/swopp2018
He made a presentation on “Performance evaluation of inter-process communication of SX-Aurora TSUBASA”.

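First-year master's student Shiotsuki presented at SWoPP2018

Shiotsuki, a first-year master's student, presented at SWoPP2018 (Summer United Workshops on Parallel, Distributed and Cooperative Processing), held at the Kumamoto City International Center from July 30 to August 1.
SWoPP2018:
https://sites.google.com/site/swoppweb/swopp2018
His presentation was titled “Performance evaluation of inter-process communication of SX-Aurora TSUBASA”.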