Prof. Takizawa gave a talk at Cyber HPC Symposium 2021

Prof. Takizawa gave a talk at the Cyber HPC Symposium 2021 on 16 March.
Details of the Cyber HPC Symposium 2021 at Osaka University are here.

One week has passed since the powerful magnitude 7.1 earthquake

Hi there. This is Minglu, who was really scared by the terrible earthquake even though this is my fifth year in Japan.
At around 11 pm on the 13th, a powerful magnitude 7.1 earthquake jolted the Tohoku area.
The next morning, Prof. Takizawa sent us some photos of our lab after the earthquake via Slack.

(upper photos) In the student room, books fell off the shelves, monitors toppled over on the desks, and drawers were pulled out.
(lower photos) The experiment room was really chaotic. Keyboards fell off the shelf and their keycaps came off, and even the server racks were moved.
It is hard to have such a big earthquake near the graduation season, especially so close to the thesis deadline.
Fortunately, all of our servers are fine thanks to our anti-earthquake measures, and no data was lost.
Taking this occasion, I would like to introduce the main anti-earthquake measures we have taken so far.

(upper photos) First, our student room:
– The top two shelves of the bookshelf have fall-prevention rods that keep books from falling off and hitting the students’ desks.
– Since the earthquake happened on a Saturday night, no one was in the lab. In case someone is in the lab during an earthquake, we have placed helmets near everyone’s desk so that students can put them on and evacuate.
– To prevent fire, all electrical outlets are organized carefully.
(lower photos) Next is our experiment room.
– All shelves are fixed to the wall with furniture wall brackets so that they will not fall over.
– There are many servers in our experiment room. To prevent fire here as well, all outlets are managed carefully by numbering them and calculating the current limit of each.
– A lot of equipment, such as USB cables, is stored on the shelf. The top layer of the shelf also has an anti-earthquake strap.
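The outlet bookkeeping described above (numbering each outlet and checking its load against a current limit) could be sketched roughly as follows. This is only an illustration: the outlet numbers, device names, current draws, and the 15 A limit are hypothetical values, not the lab's actual figures.

```python
from dataclasses import dataclass, field


@dataclass
class Outlet:
    """One numbered outlet with a current limit and the devices plugged into it."""
    number: int
    limit_amps: float  # hypothetical rated limit of this outlet
    devices: dict = field(default_factory=dict)  # device name -> current draw in amps

    def total_amps(self) -> float:
        # Sum of the current drawn by every device on this outlet.
        return sum(self.devices.values())

    def headroom_amps(self) -> float:
        # Remaining current before the limit is reached (negative if exceeded).
        return self.limit_amps - self.total_amps()

    def is_overloaded(self) -> bool:
        return self.total_amps() > self.limit_amps


def overloaded_outlets(outlets):
    """Return the numbers of all outlets whose load exceeds their limit."""
    return [o.number for o in outlets if o.is_overloaded()]


# Hypothetical example data.
outlets = [
    Outlet(1, 15.0, {"server-a": 6.0, "server-b": 5.5}),
    Outlet(2, 15.0, {"server-c": 9.0, "server-d": 8.0}),
]
print(overloaded_outlets(outlets))  # prints [2]
```

Keeping the limit and the devices together per numbered outlet makes it easy to check the headroom before plugging in a new server.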
In addition, our lab uses both cloud storage and a local NAS to store and manage all research data.
The NAS in our lab has redundancy, so all data can be recovered even if one or two HDDs fail.
The most significant damage from this earthquake was to the monitors, which fell off the desks; a few are broken. To prevent this, we plan to buy anti-earthquake mats to fix the monitors to each desk.
All in all, it is fortunate that no labmate was injured in the earthquake.
Although one week has passed and some aftershocks still come, our daily lives are returning to normal. Everyone is enjoying campus life.

We joined Khronos Group

Khronos Group is a non-profit organization that manages open standards for software development, such as OpenGL, OpenCL, and SYCL.
Led by our lab, Tohoku University has joined Khronos Group as an academic member.
Our lab will actively join the discussions on standard programming models for future HPC systems.
A SYCL implementation we are developing, named neoSYCL, is introduced on the Khronos website.
It is also mentioned in an article on HPCwire.

Welcome A100

Hello there~ This is Minglu, a first-year master’s student who still cannot forget the excitement we felt when we heard our professor was planning to buy an A100 GPU!
Finally, welcome, A100! It was very shiny when we took it out of the box.
We temporarily installed the A100 in a server that had been used for an RTX 2080 Ti.

 
Okay, I believe you must also be interested in its performance! Let’s find out by training a CNN on CIFAR-100.
The training times for 20 and 100 epochs are shown in the left figure. As expected, the A100 is clearly faster than the CPU.
The A100 is also faster than the 2080 Ti for 20 epochs, just as I expected, but when it comes to 100 epochs, the A100 is slower instead.
To investigate the reason, we checked the execution time of each step. The results are shown in the right figure. For the first few tens of epochs, the A100 is fast, but it slows down later.
To our surprise, the reason is the temperature 🙁
The RTX 2080 Ti’s temperature stayed between about 26 and 40 degrees, while the A100’s temperature shot up from 34 degrees to over 80 degrees in a moment. (AMAZING!!😨)
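For readers who want to try a similar check, here is a minimal sketch of logging the time of each epoch together with the GPU temperature. It is not the script used for the figures above. The function names (`parse_temperatures`, `read_gpu_temperatures`, `profile_epochs`) and the `train_one_epoch` callback are hypothetical; the temperature is read with the `nvidia-smi --query-gpu=temperature.gpu` query, which requires an NVIDIA driver to be installed.

```python
import subprocess
import time


def parse_temperatures(text):
    """Parse nvidia-smi CSV output: one integer (degrees C) per line, one line per GPU."""
    return [int(line) for line in text.splitlines() if line.strip()]


def read_gpu_temperatures():
    """Ask nvidia-smi for the current temperature of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def profile_epochs(train_one_epoch, num_epochs, read_temps=read_gpu_temperatures):
    """Run the training callback per epoch and record (epoch, seconds, temperatures)."""
    records = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch(epoch)  # hypothetical callback that trains for one epoch
        elapsed = time.perf_counter() - start
        records.append((epoch, elapsed, read_temps()))
    return records


# Demo with stand-ins so it runs without a GPU: a fake training step and a
# fixed temperature reading. Replace both with real ones on a GPU machine.
records = profile_epochs(lambda epoch: time.sleep(0.01), 3, read_temps=lambda: [34])
for epoch, seconds, temps in records:
    print(f"epoch {epoch}: {seconds:.3f} s, temperatures {temps}")
```

If the training framework launches GPU work asynchronously, the callback should wait for the GPU to finish before returning; otherwise the measured times will be too short.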

In conclusion:
Although the A100’s performance is excellent, we cannot fully use it without a good cooling environment. 🙁

Ke made a presentation at HPC Asia 2021

Hello, this is Ke from Takizawa Lab.
The 2021 International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2021) was held from 20 to 22 January.
Due to COVID-19, HPC Asia 2021 was held as a virtual event.
I gave a presentation in Session 2. The details of the presentation are as follows.

  • Ke, Yinan, Mulya Agung, and Hiroyuki Takizawa. “neoSYCL: a SYCL implementation for SX-Aurora TSUBASA.” The International Conference on High Performance Computing in Asia-Pacific Region. 2021. [Program]