Best Paper Award!! (AutoML2018)

Our members’ paper received the Best Paper Award at AutoML2018!
Title: Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints
Authors: Zhen Wang, Agung Mulya, Ryusuke Egawa, Reiji Suda, Hiroyuki Takizawa



Best Paper Award @ AutoML2018

Zhen Wang's paper won the Best Paper Award at AutoML2018, an international workshop held in conjunction with IEEE BigData on December 13, 2018!
Paper: Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints
Authors: Zhen Wang, Agung Mulya, Ryusuke Egawa, Reiji Suda, Hiroyuki Takizawa


We participated in the Techno Festival.

Our first-year master's students took part in the Techno Festival hosted by the mechanical engineering division of Tohoku University, held from December 7 to 9.
Using posters, they explained to alumni of the division what kind of research our laboratory conducts.


Booth Exhibition @ SC18 Dallas

Our group exhibited its research activities at SC18 together with the Institute of Fluid Science (IFS) and the Institute for Materials Research (IMR)! Thank you for visiting our booth.

Booth Exhibition @ SC18

Every year, the Supercomputing Research Division of the Cyberscience Center exhibits a booth at SC, the international conference on high-performance computing, jointly with the Institute of Fluid Science and the Institute for Materials Research. This year's booth was another great success!

Poster presentation @ SC18

M. Agung (D3) presented his research at SC18!
M. Agung, M. A. Amrizal, R. Egawa, and H. Takizawa, “A Locality and Memory Congestion-aware Thread Mapping Method for Modern NUMA Systems,” Poster Presentation at SC18, 13 Nov. 2018, Dallas.

M. Agung (D3) gave a poster presentation at SC18.

M. Agung, a third-year doctoral student, gave a poster presentation at SC18, the international conference on high-performance computing held in Dallas, USA.
M. Agung, M. A. Amrizal, R. Egawa, and H. Takizawa, “A Locality and Memory Congestion-aware Thread Mapping Method for Modern NUMA Systems,” Poster Presentation at SC18, 13 Nov. 2018, Dallas.

Dr. Keita Teranishi will visit our lab on November 22

Dr. Keita Teranishi of Sandia National Laboratories will visit our lab and give a talk on November 22.
He is a principal member of technical staff at Sandia National Laboratories, California, USA. He received his BS and MS degrees from the University of Tennessee, Knoxville, in 1998 and 2000, respectively, and his PhD degree from The Pennsylvania State University in 2004. His research interests include parallel programming models, fault tolerance, numerical algorithms, and data analytics for high-performance computing systems.
The abstract of his talk is as follows.
Abstract: Tensors have found utility in a wide range of applications, such as chemometrics, network traffic analysis, neuroscience, and signal processing. Many of these data science applications have increasingly large amounts of data to process and require high-performance methods to provide a reasonable turnaround time for analysts. Sparse tensor decomposition is a tool that allows analysts to explore a compact representation (low-rank models) of high-dimensional data sets, expose patterns that may not be apparent in the raw data, and extract useful information from the large amount of initial data. In this work, we consider decomposition of sparse count data using CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR).
Unlike the Alternating Least Squares (ALS) version, the CP-APR algorithm involves non-trivial constrained optimization of a nonlinear and nonconvex function, which has slowed its adoption on high-performance computing (HPC) systems. Recent studies by Kolda et al. suggest multiple variants of the CP-APR algorithm amenable to both data and task parallelism, but their parallel implementation involves several challenges due to the continuing trend toward a wide variety of HPC system architectures and programming models.
To this end, we have implemented a production-quality sparse tensor decomposition code, named SparTen, in C++ using Kokkos as a hardware abstraction layer. By using Kokkos, we have been able to develop a single code base and achieve good performance on each architecture. Additionally, SparTen is templated on several data types, enabling mixed precision so that users can tune performance and accuracy for specific applications. In this presentation, we will use SparTen as a case study to document the performance gains, the performance/accuracy tradeoffs of mixed precision in this application, and the development effort, and to discuss the level of performance portability achieved. Performance profiling results from each of these architectures will be shared to highlight the difficulties of efficiently processing sparse, unstructured data. By combining these results with an analysis of each hardware architecture, we will discuss insights for improved use of the available cache hierarchy, the potential costs/benefits of analyzing the underlying sparsity pattern of the input data as a preprocessing step, critical aspects of these hardware architectures that allow for improved performance in sparse tensor applications, and where remaining performance may have been left on the table due to having single algorithm implementations on diverging hardware architectures.
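As background to the abstract above: CP-APR fits a nonnegative CP model to count data by minimizing a Poisson (Kullback-Leibler) objective. The toy sketch below illustrates only that core idea with dense NumPy multiplicative updates for a 3-way tensor; it is an assumption-laden simplification (dense arrays, fixed iteration count, no inner iterations or convergence safeguards) and not SparTen's actual implementation.

```python
import numpy as np

def khatri_rao(U, V):
    """Columnwise Kronecker (Khatri-Rao) product; result is (I*J, R)."""
    I, R = U.shape
    J = V.shape[0]
    return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

def poisson_loss(X1, M1):
    """Poisson negative log-likelihood of model M1 for data X1 (up to a constant)."""
    return float(np.sum(M1 - X1 * np.log(np.maximum(M1, 1e-12))))

def cp_apr_mu(X, R, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates minimizing the Poisson (KL) objective of a
    rank-R nonnegative CP model of a 3-way count tensor X.
    Illustrative dense sketch of the idea behind CP-APR only."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, R)) + eps
    B = rng.random((J, R)) + eps
    C = rng.random((K, R)) + eps
    # Mode-n unfoldings (C-order reshapes): X1[i, j*K+k] = X[i, j, k], etc.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each factor update is the KL multiplicative rule with the
        # other two factors held fixed; nonnegativity is preserved.
        P = khatri_rao(B, C)
        A *= ((X1 / np.maximum(A @ P.T, eps)) @ P) / P.sum(axis=0)
        P = khatri_rao(A, C)
        B *= ((X2 / np.maximum(B @ P.T, eps)) @ P) / P.sum(axis=0)
        P = khatri_rao(A, B)
        C *= ((X3 / np.maximum(C @ P.T, eps)) @ P) / P.sum(axis=0)
    return A, B, C
```

In SparTen, the analogous kernels operate only on the nonzeros of the sparse tensor and, as the abstract notes, are written once against Kokkos so that the same code base runs on different HPC architectures.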


Prof. Takizawa and Assoc. Prof. Egawa presented at the 28th WSSP.

Prof. Takizawa and Assoc. Prof. Egawa presented their research results at the 28th Workshop on Sustained Simulation Performance, held in 2018 at the High Performance Computing Center Stuttgart (HLRS) of the University of Stuttgart, Germany.
The 29th WSSP will be held in Sendai over two days, March 19-20, 2019!