Undergrad Research Project - Creating an HDF5 plugin for DeltaFS

Spring 2018

Weiqiao Chen
George Amvrosiadis
Project description

With the advent of exascale computing, we are increasingly hard-pressed to alleviate the performance bottleneck of I/O operations and to help scientists use their time on supercomputers more efficiently. DeltaFS is a distributed file system that aims to do this by allowing each scientific application to carefully control the amount of resources it allocates to the file system. Using this fixed amount of resources, DeltaFS can guarantee both fast writes and fast reads of scientific data through its Indexed Massive Directories (IMDs). Data stored in IMDs is indexed as it is written to storage, eliminating the need for expensive post-processing to sort the data for quick reading.

Scientists, however, are used to storing their data in the Hierarchical Data Format (HDF5) rather than in DeltaFS's current format. HDF5 is an open-source technology tailored to managing scientific data. It allows users to specify complex data relationships and dependencies, and it serves as a standard format for encapsulating scientific data across labs.

The goal of this project is therefore to enable DeltaFS to support the HDF5 format, making it easier for scientists to adopt DeltaFS in practice. To do that, we plan to design and implement a plugin for HDF5 that uses DeltaFS to store, read, and write large data sets efficiently. As part of this process, we plan to analyze scientific codes to understand how scientists use HDF5, so that our plugin is easier for them to adopt.
