The advent of more powerful compute systems has increased the capacity to generate data at a fantastic rate. To solve the associated data-management issues, a combination of grid technology and other storage components is currently being deployed. Many solutions have been designed to address these petabyte-scale data management problems, including new software, NAS/NFS products and parallel storage solutions from SanCluster, IBM, Intel and others. The task involves handling and storing very large data sets accessed simultaneously by thousands of compute clients.
In HPC there is a strong demand for parallel storage from users in the fields of computational physics, CFD, crash analysis, climate modelling, oceanography, seismic processing and interpretation, bioinformatics, cosmology, computational chemistry and materials sciences. The parallel storage requirement is being driven by the growing size of data sets, more complex analysis, the need to run more jobs and simulations with more iterations, and the fact that HPC solutions (Linux clusters) are using multicore processors and more nodes. The systems and applications are inherently becoming more parallel; hence the requirement for parallel I/O increases.
To deliver a “best-in-class” solution, the compute server and data handling are decoupled. They are highly complementary, but need to be scaled together for balance to handle several petabytes of active data. Although data patterns vary, the system needs to be designed from the ground up for multiple petabyte capability and several millions, or even billions, of files. It is therefore imperative that the data handling systems scale and the network bandwidth does not become a bottleneck.
In the rich digital content environment of today, the limitations of traditional NAS/SAN storage (scalability, performance bottlenecks and cost) are driving the industry to find new solutions. The industry's response has been the evolution of clustered storage. Vendors claim that clustered NFS storage provides customers with enormous benefits in this digital content environment, including massive scalability, a 100X larger file system, unmatched performance, 20X higher total throughput and industry-leading reliability. They also claim it is as easy to manage a 10-petabyte file system as a 1-terabyte file system. Clustered NFS solutions are fine for most large Web sites, but they simply don't handle the kind of large files typical of most HPC applications very well.
A typical cluster computing architecture consists of a software stack of applications and middleware, tens of thousands of processors/clients, a high-speed interconnect using, say, 10GigE or InfiniBand, thousands of direct network connections and hundreds of connections to physical storage.
Storage clusters, similar to compute clusters, transparently aggregate a large number of independent storage nodes so that they appear as a single entity. They typically use the same network technology as the compute cluster (InfiniBand or 10GigE), along with substantial processing power (multicore SMP CPUs), large amounts of globally coherent cache, and disk drives (up to 1 TB each).
A cluster file system creates one giant drive, mounted across a fully symmetric cluster. Such a system is massively scalable to multiple petabytes, easy to manage and has plenty of growth potential. The management of LUNs, volumes or RAID is taken care of by the storage cluster management system and is normally hidden from the user.
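What the storage cluster management system hides from the user is, in essence, a layout function that spreads each file across the independent storage nodes. As an illustrative sketch only (not any particular vendor's actual layout algorithm), a round-robin striping scheme that maps a logical file offset to a storage target might look like this:

```python
def locate(offset, stripe_size=1 << 20, stripe_count=8):
    """Map a logical file offset to (storage target, offset on that target)
    for a simple round-robin striping layout. Illustrative only; the names
    and defaults here are assumptions, not a real file system's interface."""
    stripe_index = offset // stripe_size          # which stripe the byte falls in
    target = stripe_index % stripe_count          # round-robin over storage targets
    rounds_completed = stripe_index // stripe_count
    local_offset = rounds_completed * stripe_size + offset % stripe_size
    return target, local_offset

# With 1 MiB stripes over 8 targets, the first 1 MiB of a file lands on
# target 0, the next 1 MiB on target 1, and so on; a parallel client can
# therefore drive all 8 targets at once instead of queuing behind one server.
print(locate(0))              # target 0, local offset 0
print(locate(1 << 20))        # target 1, local offset 0
print(locate(8 * (1 << 20)))  # wraps back to target 0, second stripe
```

Because this mapping is deterministic, every client can compute it independently, which is what lets thousands of compute nodes read and write the same file concurrently without funnelling through a single server.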
The future of HPC is tied to larger data sets, more CPUs applied to each problem, and a requirement for parallel storage. Today's high-density 1U servers have increased the number of processing cores per node, but I/O bandwidth has not evolved at the same rate. The number of cores per node is still increasing; however, scientific and technical analysis requires a system that balances compute cores against I/O bandwidth.
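To make that balance concrete, a back-of-the-envelope calculation shows how quickly aggregate bandwidth demand outruns a single server (the per-core rate below is an illustrative assumption, not a benchmark):

```python
def aggregate_io_requirement(nodes, cores_per_node, mb_per_sec_per_core):
    """Aggregate storage bandwidth (MB/s) needed to keep every core fed.
    All parameters are hypothetical inputs for illustration."""
    return nodes * cores_per_node * mb_per_sec_per_core

# Example: a 256-node cluster of 8-core nodes, each core streaming a
# modest 1 MB/s, needs roughly 2 GB/s of sustained bandwidth.
print(aggregate_io_requirement(256, 8, 1.0))  # 2048.0 MB/s
```

Even at this modest per-core rate, the demand is far beyond what one NFS server can sustain, and it scales linearly as nodes and cores are added; this is the arithmetic behind the shift to parallel storage.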
With this increase in compute nodes, traditional single-server NFS solutions have quickly become a bottleneck. A first approach to solving this problem came in the form of clustered NFS. This, however, falls short of HPC requirements. Major HPC sites are therefore not significantly deploying clustered NFS, but are instead moving directly from NFS to parallel storage (SanCluster and IBM GPFS are two popular options).
Government and academia users are already heavily deploying parallel storage and this is likely to become a requirement for all simulation and modelling applications deployed on clusters. Simply put, parallel compute clusters require parallel storage!
“What is driving the need for ‘parallel storage’ in HPC is the combination of multiple factors: 1) Explosion of data sets due to the need to run large and more accurate models. 2) The massive use of x86 clusters and multicore CPUs, where users are applying 100s and 1000s of CPUs to simulation and modelling problems. 3) Currently deployed I/O and file systems based on NFS, and even clustered NFS, cannot handle the I/O requirements,” continued Rosenthal.
This standards effort is admirable and should be supported across the storage industry. Potential benefits for users include improved sustained performance, accelerated time to results (solution) and standards-based parallel storage with highly reliable performance. It offers more choice of parallel I/O capabilities from multiple storage vendors, freedom to access parallel storage from any client, and the ability to mix and match best-of-breed vendor offerings. It also carries lower risk for the user community, since client constructs are tested and optimized by the operating-system vendors while the customer is free from vendor lock-in concerns. In short, it extends the benefits of the investment in storage systems.
In summary, vendors and users are recognizing that the future of high-end file storage is parallel. The early adopters like government and academia have adopted it, but anyone in the HPC space who is building clusters with 100s of CPU-cores and generating terabytes of data will require parallel storage.
Customers can rely on our experience and knowledge of parallel storage design and deployment to reduce the complexity and cost of their HPC systems.