Friday, November 13, 2009

System Fabric Works Makes Key Contributions to Sandia’s Latest Institutional Cluster Called RedSky

4000 Node Intel® Xeon® Processor 5500 Series-based Sun Blade(TM) X6275 Server Modules with Torus QDR InfiniBand(TM)

Austin, TX (PRWEB) November 14, 2009 -- System Fabric Works (SFW) announces that, during the installation and deployment of Sandia’s latest institutional cluster, known as RedSky, its technical team made significant contributions to the routing, resiliency and scaling capabilities of the OpenFabrics OFED software, which Sun Microsystems, Inc.™ and Sandia chose for integrating the machine’s compute, network and storage infrastructure over 40 gigabit per second InfiniBand (IB).

“During deployment, System Fabric Works developed extensions to the LASH routing engine in the OpenFabrics Alliance subnet manager (OpenSM) that allowed us to successfully manage RedSky’s 6x6x8 3D torus mesh InfiniBand fabric,” said John Naegle, Senior Scientist/Engineer at Sandia National Laboratories. “Now that the machine is running, SFW is continuing to work with Sandia to develop a completely new routing engine for OpenSM that will provide improved features and performance compared to the existing routing engine. This will be critical for the very large IB mesh fabrics that will enable the performance and cost targets of extreme scale toroidal clusters. We are especially pleased that RedSky has only one high-speed interconnect, InfiniBand, and that we did not have to invest in multiple high-speed networks as most conventional Top500 machines need today.”

The Sun Blade(TM) 6048 Modular System-based RedSky includes 4000 compute nodes connected by Quad Data Rate (QDR) InfiniBand in a 3-D toroidal mesh architecture. Each Sun Blade X6275 server module compute node is based on the Intel® Xeon® processor 5500 series, and the nodes are arranged in groups of 12, with each group feeding one vertex in the 3-D torus.
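For readers unfamiliar with torus fabrics, the following Python sketch illustrates the wraparound addressing that a 6x6x8 3-D torus implies. It is an illustration only, not the LASH algorithm or any OpenSM code; the function names and the representation of each vertex as a coordinate tuple are assumptions made for this example.

```python
from itertools import product

# Torus dimensions quoted for RedSky's fabric (6x6x8).
DIMS = (6, 6, 8)


def neighbors(coord, dims=DIMS):
    """Return the six wraparound neighbors of a vertex in a 3-D torus."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            # Modulo arithmetic closes each dimension into a ring.
            n[axis] = (n[axis] + step) % size
            result.append(tuple(n))
    return result


def hop_distance(a, b, dims=DIMS):
    """Minimum hop count between two vertices, going the shorter
    way around the ring in each dimension."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


def diameter(dims=DIMS):
    """Largest minimum hop count from the origin to any vertex.
    By symmetry this is the diameter of the whole torus."""
    origin = (0,) * len(dims)
    all_vertices = product(*(range(size) for size in dims))
    return max(hop_distance(origin, v, dims) for v in all_vertices)
```

Under these assumptions every vertex has six neighbors, and the longest shortest path in a 6x6x8 torus is 3 + 3 + 4 = 10 hops, which is what diameter() computes by brute force.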

“Several aspects of RedSky mark a new pinnacle in HPC architectural development, from the 12x QDR InfiniBand Network Express technology that enables the torus fabric, the computational blade components and the highly efficient power distribution system to the room-neutral cooling technology. These technology breakthroughs, coupled with the contribution by System Fabric Works throughout the project, made our solution the best possible choice for the deployment of RedSky,” said Michael Stevens, Sun Federal’s Chief Technologist for HPC. “We have been very pleased with the deployment of all of the new technologies, and key among them is the routing engine development by System Fabric Works.”

“A major requirement of the Sandia procurement of RedSky was to acquire an institutional cluster that would deliver rapid boot, high resiliency and extreme scaling of nodes in production,” said Bob Pearson, CTO and CEO of SFW. “We made significant changes and improvements to the OFED stack, OpenMPI and MVAPICH to improve the resiliency to errors and scaling behavior. OpenMPI was enhanced to enable the successful running of full scale jobs on the torus. SFW has contributed all the resiliency and scaling improvements to the OFED Linux software repository at www.openfabrics.org.”

Another significant innovation in RedSky is its fast booting and management over IB, without a separate Ethernet network. During deployment and preparation for acceptance testing, SFW decreased boot time by two orders of magnitude and is aiming for less than ten minutes for all 4000 nodes, including the compute and storage segments of the machine.

About System Fabric Works

System Fabric Works (“SFW”), a small business based in Austin, TX, specializes in delivering engineering, integration and strategic consulting services to organizations that seek to implement high performance computing and storage systems, low latency fabrics and the necessary related software. Drawing on its 7 years of experience, SFW also offers custom integration and deployment of commodity servers and storage systems at extreme levels of performance, scale and cost effectiveness that are not available from mainstream suppliers. SFW personnel are widely recognized experts in the fields of high performance computing, networking and storage systems, particularly with respect to OpenFabrics Software, InfiniBand, Ethernet and energy-saving, efficient computing technologies such as RDMA. Detailed information describing SFW’s areas of expertise and corporate capabilities can be found at www.systemfabricworks.com.

###



Contact Information: Bill Boas

System Fabric Works

http://www.systemfabricworks.com

510-375-8840


