Li had a presentation at PDCAT’21

Hello there. This is Li Jiahao.

The 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT’21) was held from December 17th to 19th. PDCAT is a major forum for scientists, engineers, and practitioners throughout the world to present the latest research, results, ideas, developments, and applications in all areas of parallel and distributed computing.

I gave a presentation on the performance of neoSYCL.

Jin and Mike had presentations at HPC Asia 2022

Hi, this is M2 student Jin Yifan.

The 5th International Conference on High-Performance Computing in Asia-Pacific Region (HPC Asia 2022) was held from January 12th to 14th.

HPC Asia is an international conference series in the Asia-Pacific region on HPC technologies, fostering the exchange of ideas, research results, and case studies related to all issues of high-performance computing. The 5th edition, HPC Asia 2022, was held with the motto “Stepping forward to the Post Moore Era together.”

You can find the program here.

From our laboratory, Jin gave a poster presentation in the poster session (#122): Memory-aware Task Mapping for Heterogeneous Multi-Core Systems.

Due to the COVID-19 pandemic, the conference was held in a “fully online” format, and the organizers created a Gather space for the poster session. Given the circumstances, participating in a conference in this form was also an interesting experience.

 


Hi, my name is Mike Zielewski. I am a doctoral student in the Takizawa laboratory, and my research focuses on quantum computing.

Last month I was fortunate enough to be able to present my work at the International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022).

One thing that I thought was challenging when preparing my presentation was that much of my audience would be unfamiliar with not only the background of my work but also quantum annealing in general. This is because quantum annealing has yet to be widely adopted for practical applications. With this in mind, I realized that I would have to take special care in making the presentation accessible to people of various backgrounds.

I hope that some of the attendees found my presentation interesting enough that they will think about using quantum annealing for their own applications.

Currently, some of the most effective ways of using quantum annealing actually combine it with traditional HPC practices. I’m looking forward to seeing mature HPC technologies integrated with quantum annealing to produce results that neither could achieve alone.

 

Minglu had a presentation at PAAP’21

Hello there. This is Minglu.

The 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP’21) was held from December 10th to 12th.
PAAP is an international conference for scientists and engineers in academia and industry to present their research results and development activities in all aspects of parallel architectures, algorithms, and programming techniques.
You can find the details in the program here.

From our lab, Minglu gave a presentation in Track 05: Big Data Processing and Deep Learning.

Assistant Professor Keichi Takahashi has joined Takizawa lab

Hi, I’m Keichi Takahashi. I’m excited to announce that I have joined Takizawa lab this December as an assistant professor.

Prior to joining the Takizawa laboratory, I was an assistant professor at the Nara Institute of Science and Technology (NAIST), where I conducted research on accelerating inter-node communication and storage I/O in large-scale HPC systems. Please visit my website to check out my latest list of publications and software projects.

I am looking forward to working with the students, faculty, and staff at Takizawa lab in the future!

About attending SC21

The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC21) was held from Nov. 14th to 19th this year.

Similar to last year, Tohoku University had a booth page including posters and videos about our research.

The detailed booth information is shown here.

Although no lab members presented this year, students in our lab still actively attended the conference and shared interesting topics and presentations at the lab seminar.

Interesting Topics
A high-performance tensor-based simulator for random quantum circuits [1]
An extension of the Message Passing Interface to enable high-performance implementations of distributed quantum algorithms [2]
A unified programming model for constraint satisfaction problems that can be mapped to both quantum circuit and annealing devices through QUBOs [3]
A scalable performance prediction toolkit for GPUs [4]
In-depth analyses of unified virtual memory system for GPU accelerated computing [5]

We hope that in the future, our lab members will have opportunities to present research at such top-level conferences.

References

  1. Liu, Yong, et al. “Closing the ‘quantum supremacy’ gap: achieving real-time simulation of a random quantum circuit using a new Sunway supercomputer.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2021.
  2. Häner, Thomas, et al. “Distributed Quantum Computing with QMPI.” arXiv preprint arXiv:2105.01109 (2021).
  3. Wilson, Ellis, Frank Mueller, and Scott Pakin. “Mapping Constraint Problems onto Quantum Gate and Annealing Devices.”
  4. Arafa, Yehia, et al. “Hybrid, scalable, trace-driven performance modeling of GPGPUs.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2021.
  5. Allen, Tyler, and Rong Ge. “In-depth analyses of unified virtual memory system for GPU accelerated computing.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2021.

The access control system is complete!

I am Shuhei Sugawara, a first-year trainee in the Takizawa lab.
In light of the recent situation, we have created a system that records when students enter and leave the laboratory, and it recently went into operation.
Previously, we had to write down each person’s entry and exit times in a spreadsheet, but now we can simply use the touch keys in the lab to fill in this information automatically, which I think is much easier and less time-consuming.
Getting the system up and running was more difficult than I expected, with various problems to solve and improvements to make, but it now works successfully thanks to the cooperation of the lab members. Thank you very much!
For now, only lab members can use it, but we are planning to extend it so that guests can also use it, so please try it out when you visit our lab!

Entry/exit recording system
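The core idea of replacing the spreadsheet with a single touch can be sketched as a toggle: the first touch by a member records an entry, the next touch records an exit. This is a hypothetical sketch only; the class name `AttendanceLog`, the member IDs, and the record layout are assumptions for illustration, not the actual implementation of our lab’s system.

```python
from datetime import datetime


class AttendanceLog:
    """Hypothetical toggle-style entry/exit log (not the lab's real code).

    Each touch flips a member between 'inside' and 'outside' the lab and
    appends a row similar to what used to be typed into the spreadsheet.
    """

    def __init__(self):
        self.inside = set()   # member IDs currently in the lab
        self.records = []     # (member_id, event, timestamp) rows

    def touch(self, member_id, when=None):
        when = when or datetime.now()
        if member_id in self.inside:
            self.inside.remove(member_id)
            event = "exit"
        else:
            self.inside.add(member_id)
            event = "entry"
        self.records.append((member_id, event, when.isoformat(timespec="minutes")))
        return event


if __name__ == "__main__":
    log = AttendanceLog()
    log.touch("member01", datetime(2021, 11, 1, 9, 0))
    log.touch("member01", datetime(2021, 11, 1, 18, 30))
    print(log.records)
```

Because the state is a simple toggle, a forgotten exit touch would show up as an unmatched entry; a real system would need some rule for handling that case.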

A book co-authored by Prof. Takizawa will be released soon!

The book “Software Auto-Tuning: Code Optimization Techniques for Scientific and Technical Computing,” co-authored by Prof. Hiroyuki Takizawa, will be released soon! The book introduces the principles and usage of various tools for software auto-tuning. We recommend it for your research and lectures!
https://www.morikita.co.jp/index.php/books/mid/087221
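For readers new to the topic, the basic idea of empirical auto-tuning is to run a kernel with each candidate value of a tuning parameter, measure the time, and keep the fastest. The sketch below is a generic illustration under that assumption; `blocked_sum` and its `block` parameter are hypothetical stand-ins, and this is not one of the tools described in the book.

```python
import time


def blocked_sum(data, block):
    """Hypothetical kernel: sum `data` in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(run, candidates, repeats=3):
    """Exhaustive empirical search over `candidates`.

    `run(param)` executes the kernel with one parameter value. Each value is
    timed `repeats` times and the minimum is kept to reduce timing noise.
    Returns the fastest parameter and the timing table.
    """
    timings = {}
    for param in candidates:
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run(param)
            samples.append(time.perf_counter() - t0)
        timings[param] = min(samples)
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    data = list(range(100_000))
    best, timings = autotune(lambda b: blocked_sum(data, b), [16, 256, 4096])
    print("best block size:", best)
```

Exhaustive search is the simplest strategy; with many parameters the search space grows quickly, which is why practical auto-tuners use smarter search methods.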

Congratulations to B4 students for passing the entrance exam

The entrance exam for graduate school was held at the end of August.
Thanks to all B4 students for your hard work.
In our lab, students were given three months to prepare for this entrance exam, and many past exam questions are available from our seniors.
B4 students are now working hard on their graduation research!

We had a celebration for B4 students

Hello, this is Ishii, a 4th-year undergraduate (B4) student.

Recently, three B4 students in our lab successfully passed the entrance exam for graduate school, congratulations!
To celebrate the good news, our lab had a party on September 15th.
Although the party was held virtually due to COVID-19, we really enjoyed it and had a great time.

Having passed the exam, I will keep doing my best to be well prepared for becoming a graduate student.

A note about attending HOT CHIPS 33 by Liu

HOT CHIPS is one of the semiconductor industry’s leading conferences on high-performance microprocessors and related integrated circuits. This year, engineers and chip designers from well-known companies and national laboratories presented their latest technologies and products. The conference was held virtually from August 22nd to 24th.

It had several sessions, covering CPUs, data processors, machine learning platforms, and more. Many innovations and technical improvements were introduced in high-quality presentations.

This year, several major companies presented their new CPUs. Here, I am going to share some topics I am interested in and my impressions.

1. Intel Alder Lake

Alder Lake is the newest generation of Intel Core processors, and its core design is very different from previous generations. It is a hybrid architecture that combines two kinds of cores with different microarchitectures: Performance-cores (P-cores) and Efficient-cores (E-cores).

Fig.1 P Core and E Core (source: Intel)

P-cores deliver higher performance on single-threaded and lightly threaded applications, while E-cores provide better throughput on scalable multi-threaded applications. For scheduling, Intel uses Thread Director to put the right workload on the right core at the right time. Based on the IPC differences between P-cores and E-cores, workloads are classified into four classes. Information on the energy efficiency and performance of each core is periodically written into the EHFI (Enhanced Hardware Feedback Interface) table, and the OS scheduler uses it to select the best core allocation. Fig. 2 shows the Thread Director architecture.

Fig.2 Thread Director (source: Intel)
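To make the scheduling flow above concrete, here is a heavily simplified model: hardware reports a class (0 to 3) for each thread, a feedback table holds a performance score and an efficiency score for every core and class, and the scheduler picks the idle core with the best score. The `Core` class, `pick_core` function, and all table values are hypothetical; this is not Intel’s actual EHFI layout or any OS scheduler’s real algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    name: str
    kind: str          # "P" or "E"
    perf: List[int]    # hypothetical performance score per thread class 0..3
    eff: List[int]     # hypothetical energy-efficiency score per thread class 0..3
    busy: bool = False


def pick_core(cores: List[Core], thread_class: int,
              prefer_efficiency: bool = False) -> Optional[Core]:
    """Return the idle core with the best table score for this class, or None."""
    idle = [c for c in cores if not c.busy]
    if not idle:
        return None
    if prefer_efficiency:
        return max(idle, key=lambda c: c.eff[thread_class])
    return max(idle, key=lambda c: c.perf[thread_class])


if __name__ == "__main__":
    cores = [
        Core("P0", "P", perf=[100, 120, 110, 90], eff=[40, 35, 38, 45]),
        Core("E0", "E", perf=[60, 50, 55, 80], eff=[70, 60, 65, 50]),
    ]
    print(pick_core(cores, 1).name)                          # performance-first
    print(pick_core(cores, 1, prefer_efficiency=True).name)  # efficiency-first
```

The point of the hardware feedback is that the table can change at run time (for example with power or thermal limits), so the same thread class may be steered to different cores over time.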

2. AMD Zen3

Compared with Zen2, the new generation achieves a 19% IPC improvement, which is impressive. Fig. 3 shows the major changes in Zen3.

Fig.3 Comparison between Zen3 and Zen2 (source: AMD)

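The most interesting part for me is the Zen3 cache hierarchy. Although the total L3 size per CCD is unchanged at 32 MB, the amount of L3 that each core can access directly has doubled, because a single 32 MB L3 is now shared by all eight cores instead of being split into two 16 MB slices shared by four cores each. This reduces the effective memory latency. The number of outstanding L2 and L3 misses is also impressive. In addition, AMD’s 3D V-Cache technology stacks extra cache on the die and brings the L3 up to 192 MB, which is a surprising capacity.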

Fig.4 Zen3 Cache Hierarchy and 3D V-Cache (source: AMD)
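The “twice” and “192 MB” figures above come down to simple arithmetic. The sketch below spells it out, assuming a desktop Zen2 CCD (two 4-core CCXs with 16 MB of L3 each), a Zen3 CCD (one 8-core CCX with 32 MB of L3), 64 MB of stacked V-Cache per CCD, and a two-CCD part for the 192 MB total.

```python
# Assumed per-CCD configurations (desktop parts).
ZEN2_L3_PER_CCX_MB = 16   # Zen2: two 4-core CCXs per CCD, 16 MB L3 each
ZEN2_CCX_PER_CCD = 2
ZEN3_L3_PER_CCX_MB = 32   # Zen3: one 8-core CCX per CCD sharing 32 MB L3
VCACHE_PER_CCD_MB = 64    # stacked 3D V-Cache SRAM per CCD
CCDS = 2                  # assumed two-CCD part

zen2_l3_per_ccd = ZEN2_L3_PER_CCX_MB * ZEN2_CCX_PER_CCD   # total L3 per CCD
zen3_l3_per_ccd = ZEN3_L3_PER_CCX_MB

# A core can only hit directly in the L3 of its own CCX.
direct_l3_ratio = ZEN3_L3_PER_CCX_MB / ZEN2_L3_PER_CCX_MB

l3_with_vcache_per_ccd = zen3_l3_per_ccd + VCACHE_PER_CCD_MB
total_l3_with_vcache = CCDS * l3_with_vcache_per_ccd

print(f"L3 per CCD: Zen2 {zen2_l3_per_ccd} MB, Zen3 {zen3_l3_per_ccd} MB")
print(f"directly accessible L3 per core: x{direct_l3_ratio:.0f}")
print(f"L3 with V-Cache: {l3_with_vcache_per_ccd} MB per CCD, {total_l3_with_vcache} MB total")
```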

3. IBM Telum

The new IBM Z processor, Telum, is quite different from the z15, especially in its cache hierarchy. Each core has a private 32 MB L2 cache, which is 8 times larger than the L2 cache in the z15. However, Telum has no physical L3 or L4 cache; instead, IBM builds virtual L3 and L4 caches out of the L2 caches. This inspiring design saves considerable chip area and reduces L3 and L4 latency while preserving their function, and it further increases the cache capacity available per core. With this implementation, the system achieves over 40% higher per-socket performance.

Fig.5 IBM Telum Cache Hierarchy (source: IBM)
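A toy model may help show how private L2 caches can act as a shared virtual L3: when a core evicts a line from its own L2, the line is moved into another core’s L2 instead of being dropped, so a later access can still find it on chip. The class `ToyVirtualCache`, the capacities, and the placement rule (move the victim to the peer with the most free space, or drop it if no peer has room) are assumptions for illustration, not IBM’s actual policy.

```python
from collections import OrderedDict


class ToyVirtualCache:
    """Toy model: private per-core L2 caches that double as a 'virtual L3'."""

    def __init__(self, n_cores, lines_per_l2):
        self.cap = lines_per_l2
        # One LRU-ordered L2 per core (oldest line first).
        self.l2 = [OrderedDict() for _ in range(n_cores)]

    def access(self, core, addr):
        own = self.l2[core]
        if addr in own:
            own.move_to_end(addr)          # refresh LRU position
            return "L2 hit"
        for peer, cache in enumerate(self.l2):
            if peer != core and addr in cache:
                del cache[addr]            # migrate the line to the requester
                self._insert(core, addr)
                return "virtual L3 hit"
        self._insert(core, addr)           # fetched from memory
        return "miss"

    def _insert(self, core, addr):
        own = self.l2[core]
        if len(own) >= self.cap:
            victim, _ = own.popitem(last=False)   # evict this core's LRU line
            peers = [p for p in range(len(self.l2)) if p != core]
            target = max(peers, key=lambda p: self.cap - len(self.l2[p]), default=None)
            # Keep the victim on chip if some peer L2 has free space.
            if target is not None and len(self.l2[target]) < self.cap:
                self.l2[target][victim] = True
        own[addr] = True


if __name__ == "__main__":
    c = ToyVirtualCache(n_cores=2, lines_per_l2=2)
    for addr in ["A", "B", "C", "A"]:
        print(addr, c.access(0, addr))
```

In this toy run, line A is evicted from core 0 when C arrives, lands in core 1’s L2, and the second access to A becomes a virtual L3 hit rather than a trip to memory.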

4. Intel Xeon Sapphire Rapids

Recently, modular architectures have become popular in processor design, because smaller dies bring better yield in chip fabrication. Accordingly, Sapphire Rapids uses EMIB (Embedded Multi-die Interconnect Bridge) technology to build a multi-tile design. One important part of each tile is its set of acceleration engines, which include the Data Streaming Accelerator, QuickAssist Technology, and the Dynamic Load Balancer, and which offload common-mode tasks from the cores.

Fig.6 Sapphire Rapids (source: Intel)

In addition, the shared LLC has been enlarged, and HBM support with two modes (flat and caching) was also presented.
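The yield argument for the multi-tile design can be illustrated with the simple Poisson yield model, where the fraction of defect-free dies is Y = exp(-A * D0) for die area A and defect density D0. The numbers below (D0 = 0.1 defects/cm², 8 cm² of total silicon, four tiles) are hypothetical and not Intel’s figures; the model also ignores packaging yield.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical numbers: same total silicon, one large die vs. four tiles.
D0 = 0.1           # defects per cm^2 (assumed)
TOTAL_AREA = 8.0   # cm^2 of silicon in total (assumed)
N_TILES = 4

mono = poisson_yield(TOTAL_AREA, D0)
tile = poisson_yield(TOTAL_AREA / N_TILES, D0)

# With tiles, a defect only scraps one small tile, and good tiles can be
# tested and combined, so the usable fraction of silicon is the tile yield.
print(f"monolithic die yield: {mono:.1%}")
print(f"per-tile yield:       {tile:.1%}")
```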

There were many more brilliant presentations than I can introduce here, for example Samsung’s HBM2-PIM and the talks on chiplets and 3D packaging. I learned a great number of inspiring ideas and technologies from this conference, and I am impressed by the creativity and effort behind these improvements. These new architectures and ideas show me the current trends in chip design and will be very helpful for my own research.