Wednesday, August 27, 2008

High Performance Computing for Research

In the past, I've written about the Harvard infrastructure I oversee in support of the research community.

High Performance Computing clusters are a collaboration between the research community and IT to create a shared infrastructure for the benefit of all. We've learned a great deal about how to balance central and local IT offerings and how to build highly reliable, low-cost research IT infrastructure. Last year we organized a summit to share lessons learned. Over 100 leaders in high performance computing (HPC) gathered at Harvard Medical School to share ideas and hear presentations from their colleagues.

The Summit was a great success: over 90% of attendees rated it as extremely valuable and said they learned something that would be of immediate use to them. And this was the point of the Summit -- to bring together the leaders to share ideas and approaches. Sharing stories and best practices (and worst practices!) allows the field to grow more rapidly and efficiently.

Biomedical Informatics is at an exciting crossroads: the computational challenges facing researchers, clinicians and public health professionals now exceed the computational power typically available in an academic biomedical setting. This is exciting because it means that the advances in high performance computing from other disciplines (e.g. physics) can be brought to bear on the great challenges of life sciences, health and medical research. The opportunities to develop new therapies, monitor trends in ambulatory and hospital data, and catch and avert drug-related mishaps (e.g. Vioxx) are truly astounding. With the advent of the $1,000 "ome" (genotype, phenotype, labs), the capacity to analyze and predict longitudinally and in real time, as well as the ability to test hypotheses retrospectively, will challenge the computational boundaries of all biomedical research organizations. Computational power is now at the very core of our ability to rapidly advance the state of clinical care and healthcare. In fact, some new labs at Harvard Medical School do not even have a wet lab component, but do all of their work through "in silico" simulations and modeling. All of this leads to an exciting time for people who need to build and provide HPC infrastructure to the research community!

One unique part of the summit is the use of audience participation devices which allow the organizers and participants to poll all the attendees. Here are a few of the results of last year's audience participation surveys:

* While 43% of biomed HPC facilities are building new data centers, 35% are leasing commercial data center space instead!
* Around 50% of biomed HPC clusters now use some sort of parallel or distributed filesystem -- with 95% of leaders planning to have one implemented or in production within the next two years
* Most facilities (63%) still rely on Gigabit Ethernet as their primary interconnect, followed by 10G Ethernet (17%) and InfiniBand (12%)
* Around half of biomed HPC shops use virtualization in their production environment, and when they do, they use VMware (66%) and Xen (23%)
* Platform LSF is the primary job scheduler, used in nearly half of all biomed HPC shops -- the remainder are divided among Sun Grid Engine, OpenPBS and Univa

The Summit has four focus areas this year:

* Managing biomedical storage growth -- with many institutions seeing storage growth measured in petabytes, the challenges of managing, archiving and retrieving data accelerate
* Bringing HPC to the users -- many scientists need access to HPC resources but don't have the technical skills to adapt to command lines and shell scripts. New frameworks allow researchers to do common tasks through a web GUI or via automated web services.
* Connecting to the grid -- many small and medium biomedical HPC shops would like to participate in the national grid efforts, but knowing where to start and how to participate is an ongoing challenge
* Trends in Biomedical HPC -- hear from the experts about the latest trends in Biomedical HPC including the new Centers for Translational Science and the growing demand for large scale HPC services in Biomedical Research

I'd like to invite anyone interested in High Performance Computing in the biomedical fields to join their colleagues and other leaders in this rapidly evolving field at Harvard Medical School in October. Registration is open at

I hope to see you there!