Accelerated Computing Lab Projects

Power Aware HPC

We are currently developing tools to make the operation of HPC hardware more energy efficient by modeling power consumption and applying power-aware scheduling and provisioning policies.

Power Modeling:

One of the projects undertaken by our research lab is to design and implement a power model based on monitored system events. The major goal of this project is to implement a low-overhead online monitoring tool that gathers vital system performance metrics and then uses an efficient model to predict the instantaneous power consumption of the system at any given time. The project scope includes power prediction for both types of cluster nodes: non-virtualized and virtualized. The model design is being tested on a variety of server hardware architectures, such as Intel and AMD. This tool will assist our development of power-aware scheduling policies and provisioning decisions, such as the deployment of new jobs or workload consolidation.
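As a rough illustration of event-based power modeling, the sketch below fits a linear model (power as an intercept plus a weighted sum of monitored metrics) by least squares. The linear form, the metric names in the usage note, and the function names are assumptions made for this example; they are not the lab's actual model or tool.

```python
def fit_power_model(samples, watts):
    """Least-squares fit of watts ~ b0 + b1*x1 + ... via the normal equations.

    samples: list of tuples of monitored metrics (hypothetical, e.g. CPU and
             memory utilization); watts: measured power for each sample.
    """
    rows = [[1.0] + list(s) for s in samples]  # prepend the intercept term
    n = len(rows[0])
    # Build A = X^T X and b = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * w for r, w in zip(rows, watts)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def predict_power(coef, sample):
    """Predicted instantaneous power for one vector of monitored metrics."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], sample))
```

With training pairs such as (CPU utilization, memory utilization) and measured watts, `predict_power(fit_power_model(samples, watts), (0.8, 0.5))` returns an estimate for a new observation. A production model would also need feature selection and validation against a power meter.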

Power Aware Provisioning:

The main objective of this project is to design and implement provisioning techniques that help the cluster provisioning node make power-aware decisions, including workload consolidation and the deployment of new incoming jobs on the cluster nodes. The primary goal is to reduce the total power consumption of the cluster while maintaining the desired Quality of Service.
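Workload consolidation can be viewed as a bin-packing problem: place jobs on as few nodes as possible so that the remaining nodes can be idled or powered down. The sketch below uses the standard first-fit decreasing heuristic. It is an illustration only; the single load value per job and the uniform node capacity are simplifying assumptions, and this is not the lab's provisioning policy.

```python
def consolidate(job_loads, node_capacity):
    """First-fit decreasing: pack jobs onto as few nodes as possible.

    job_loads: load of each job (hypothetical unit, e.g. percent of a node).
    Returns a list of nodes, each a list of the job loads placed on it.
    """
    nodes = []
    for load in sorted(job_loads, reverse=True):
        if load > node_capacity:
            raise ValueError("job does not fit on any node")
        for node in nodes:
            if sum(node) + load <= node_capacity:
                node.append(load)  # fits on an already active node
                break
        else:
            nodes.append([load])   # no active node has room: wake another
    return nodes
```

For example, `consolidate([50, 70, 30, 20, 30], 100)` places the five jobs on two nodes. A real policy would also have to check Quality-of-Service constraints before co-locating jobs.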

Power Aware Scheduling Simulation:

In order to design the techniques and policies necessary for power aware provisioning, we are currently developing a scheduling simulator in which both the throughput and power performance can be evaluated for various scheduling parameters. Our goal is to enable the modeling of arbitrary virtualized and non-virtualized clusters in order to develop power-efficient scheduling policies to fit the needs of different user bases.
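The kind of trade-off such a simulator exposes can be shown with a very small model. The sketch below runs jobs first-come-first-served on identical nodes and reports both makespan and energy. The two-level power model (fixed idle and busy wattage), the assumption that all nodes stay powered on, and all names are hypothetical; this is not the lab's simulator.

```python
import heapq


def simulate_fcfs(job_runtimes, num_nodes, idle_watts, busy_watts):
    """Return (makespan in seconds, energy in joules) for an FCFS schedule."""
    free_at = [0.0] * num_nodes  # min-heap: time at which each node is free
    heapq.heapify(free_at)
    busy_time = 0.0
    makespan = 0.0
    for runtime in job_runtimes:
        start = heapq.heappop(free_at)  # earliest available node
        end = start + runtime
        heapq.heappush(free_at, end)
        busy_time += runtime
        makespan = max(makespan, end)
    # Every node is assumed to stay powered on until the last job finishes.
    idle_time = num_nodes * makespan - busy_time
    return makespan, busy_watts * busy_time + idle_watts * idle_time
```

Running the same three jobs (10 s, 10 s, 20 s) at 100 W idle and 200 W busy on two nodes finishes sooner but spends energy on idle time, while a single node takes longer and wastes none. This is the throughput versus power tension a scheduling policy has to balance.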


MPI-HMMER is an open source MPI implementation of the HMMER protein sequence analysis suite. The main search algorithms, hmmpfam and hmmsearch, have been ported to MPI in order to provide high throughput HMMER searches on modern computational clusters. We improve on HMMER through sophisticated I/O, a self-contained coordinator/worker model, and the easy inclusion of accelerated architectures. This results in better scalability while still maintaining the familiar user interface.

MPI-HMMER Features:

- Improved database chunking strategy
- Portable across any POSIX operating system
- MPI implementation independent
- Vastly reduced computation times
- Improved query throughput
- Output nearly identical to standard HMMER
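The coordinator/worker pattern with database chunking can be sketched without MPI. In the example below a thread pool stands in for the MPI workers, and a trivial motif count stands in for HMM scoring. Both are placeholders chosen for illustration, and the fixed-size chunking is an assumption rather than MPI-HMMER's actual strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_database(sequences, chunk_size):
    """Split the sequence database into fixed-size chunks of work."""
    return [sequences[i:i + chunk_size]
            for i in range(0, len(sequences), chunk_size)]


def toy_score(sequence, motif="ACG"):
    """Placeholder for an HMM score: counts non-overlapping motif hits."""
    return sequence.count(motif)


def score_chunk(chunk):
    """Work done by one worker on one chunk."""
    return [(seq, toy_score(seq)) for seq in chunk]


def search(sequences, chunk_size=2, workers=2):
    """Coordinator: hand out chunks, gather scores, and rank the hits."""
    chunks = chunk_database(sequences, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(score_chunk, chunks))
    hits = [hit for chunk in per_chunk for hit in chunk]
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Smaller chunks balance load better across workers at the cost of more coordinator traffic, which is why the chunking strategy matters for scalability.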

All information regarding this project, along with downloads and FAQs, can be found at www.mpihmmer.org.


Sequencing and protein docking are very compute-intensive tasks that see a large performance benefit by using a CUDA-enabled GPU.


GPU-HMMER implements the hmmsearch portion of the HMMER sequence analysis suite. All other tools (hmmpfam, standard hmmsearch, etc.) remain available to the user and are unmodified. The GPU portion consists of cuda_hmmsearch and its helper utility hmmsort.

The GPU-HMMER project results are also cited at NVIDIA. Information regarding this project can be found here.


Checkpointing and Fault-Tolerance

As computational clusters increase in size, their mean time to failure decreases drastically. Typically, checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. We propose a scalable replication-based MPI checkpointing facility. Our reference implementation is based on LAM/MPI; however, it is directly applicable to any MPI implementation. We extend the existing state of fault-tolerant MPI with asynchronous replication, eliminating the need for central or network storage. We evaluate centralized storage, a Sun X4500-based solution, an EMC SAN, and the Ibrix commercial parallel file system and show that they are not scalable, particularly after 64 CPUs. We demonstrate the low overhead of our checkpointing and replication scheme with the NAS Parallel Benchmarks and the High Performance LINPACK benchmark in tests of up to 256 nodes, showing that checkpointing and replication can be achieved with much lower overhead than that provided by current techniques. Finally, we show that the monetary cost of our solution is as low as 25% of that of a typical SAN/parallel file system-equipped storage system.
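The idea of replacing central storage with peer replication can be illustrated with a simple placement rule. In the sketch below each rank copies its checkpoint to the next few ranks in a ring, and a helper checks whether a given set of failures is still recoverable. The ring placement and the function names are assumptions for illustration; they do not describe the placement used in our LAM/MPI implementation.

```python
def replica_targets(rank, num_ranks, replicas):
    """Ranks that hold copies of `rank`'s checkpoint (ring neighbours).

    Assumes replicas < num_ranks so a rank never replicates to itself.
    """
    return [(rank + offset) % num_ranks for offset in range(1, replicas + 1)]


def recoverable(failed, num_ranks, replicas):
    """True if every failed rank has a checkpoint copy on a surviving rank."""
    failed = set(failed)
    return all(
        any(t not in failed for t in replica_targets(r, num_ranks, replicas))
        for r in failed
    )
```

With 8 ranks and 2 replicas, losing ranks 3 and 4 is recoverable because rank 5 still holds a copy of rank 3's checkpoint, whereas losing ranks 3, 4, and 5 together is not. The number of replicas therefore trades storage and network traffic against tolerance to simultaneous failures.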

Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited despite its potential for both improving resource utilization and providing resource guarantees to its users. In this article we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMware Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMware), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization provided by OpenVZ provides the best overall performance, particularly for MPI scalability. With the knowledge gained by our VM evaluation, we extend OpenVZ to include support for checkpointing and fault-tolerance for MPI-based virtual server distributed computing.



Molecular Dynamics:

Molecular dynamics simulations are known to run for many days or weeks before completion. In the paper "Accelerating Molecular Dynamics Simulations with GPUs" (ISCA-PDCCS'08), we explore the use of GPUs to accelerate a Lennard-Jones-based molecular dynamics simulation of up to 27000 atoms. We demonstrate speedups that exceed 100x on commodity Nvidia GPUs and discuss the strategies that allow for such exceptional speedups. We show that traditional molecular dynamics simulations can be greatly improved from a runtime of over 1 day to 18 minutes.
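For readers unfamiliar with the Lennard-Jones model, the pair potential is V(r) = 4ε[(σ/r)^12 − (σ/r)^6], and a naive simulation evaluates it for every pair of atoms (about 364 million pairs for 27000 atoms). The serial Python sketch below shows that all-pairs structure, which is the part that maps well onto a GPU. It uses reduced units (ε = σ = 1) as an assumption and is not the CUDA code from the paper.

```python
import math


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Magnitude of the radial force, -dV/dr (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def total_energy(positions, epsilon=1.0, sigma=1.0):
    """Sum the pair potential over all unique atom pairs, O(N^2)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += lj_potential(r, epsilon, sigma)
    return energy
```

Each pair interaction is independent of the others, so the double loop can be distributed across thousands of GPU threads with a reduction at the end.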


In "Improving MPI-HMMER's Scalability With Parallel I/O" (IPDPS 2009), we present PIO-HMMER, an enhanced version of MPI-HMMER. PIO-HMMER improves on MPI-HMMER's scalability through the use of parallel I/O and a parallel file system. In addition, we describe several enhancements, including a new load balancing scheme, enhanced post-processing, improved double-buffering support, and asynchronous I/O for returning scores to the master node. Our enhancements to the core HMMER search tools, hmmsearch and hmmpfam, allow for scalability up to 256 nodes where MPI-HMMER previously did not scale beyond 64 nodes. We show that our performance enhancements allow hmmsearch to achieve between 48x and 221x speedup using 256 nodes, depending on the size of the input HMM and the database. Further, we show that by integrating database caching with PIO-HMMER's hmmpfam tool we can achieve up to 328x performance using only 256 nodes.
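To put these figures in context, parallel efficiency is simply speedup divided by node count. The small helper below (a hypothetical name, added for illustration) applies that definition to the numbers quoted above.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup achieved on `nodes` nodes."""
    return speedup / nodes


# Figures quoted in the text, all on 256 nodes.
for label, speedup in [("hmmsearch, low", 48),
                       ("hmmsearch, high", 221),
                       ("hmmpfam with caching", 328)]:
    print(f"{label}: {parallel_efficiency(speedup, 256):.2f}")
```

A value above 1.0, as in the hmmpfam case, indicates superlinear speedup, which is the kind of effect caching can produce when the aggregate memory of many nodes holds data that a single node would have to re-read from disk.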


Data Intensive Computing

Microarray Correlation:

One data intensive application that we have investigated in depth is the problem of computing the correlation between all gene locations within DNA microarray (or Array Comparative Genomic Hybridization) data. The newest microarray hardware, which measures gene copy number, a strong indicator of the level of expression of a particular gene in a patient, can sample hundreds of thousands of locations within the genome. Computing, archiving, and analyzing the correlation of every gene with every other necessitates HPC resources, and requires scalability in computation, storage, and I/O. Our work in enabling this analysis represents a case study in data intensive computing on various HPC architectures. We have thoroughly compared the performance and scalability of a cluster with several file systems, a Hadoop-enabled cluster, and a Netezza Data Warehousing appliance.
The paper "Comparing the performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations" is set to appear in HiPC 2009. We are currently expanding our experimentation into the performance of other Netezza products, and refining our testing of Hadoop's performance.
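The computation itself is easy to state, which is what makes its scale instructive. The sketch below computes the Pearson correlation for every pair of locations. The choice of Pearson correlation, the dictionary layout, and the names are assumptions for illustration; the paper's implementations on clusters, Hadoop, and Netezza are not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length value lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def all_pairs_correlation(profiles):
    """profiles: dict mapping a location name to its values across samples."""
    return {(g, h): pearson(profiles[g], profiles[h])
            for g, h in combinations(sorted(profiles), 2)}
```

The number of pairs grows as n(n-1)/2, so 100,000 locations already produce about 5 billion correlations. At that size the output, not just the arithmetic, becomes a storage and I/O problem.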


Virtual Surgery Training Systems

This project initiates research to enhance current training modalities in orthopaedic surgery by using computerized training and assessment systems. Simulations will be designed such that residents can experience surgical procedures hands-on before operating on patients. Ex-vivo training on virtual systems will familiarize residents with real-life scenarios, minimize the risk to patients and allow for competence based advancement of residents. We will provide an inexpensive, multi-use solution to complement current training methods in Orthopaedic Surgery.

This project is a collaboration between the Dept. of Orthopaedics and the Dept. of Computer Science and Engineering. Dr. Lawrence Bone is an experienced orthopaedic surgeon, Chair of the Dept. of Orthopaedic Surgery, a full professor and the Program Director of Residency Education. Dr. Vipin Chaudhary is an associate professor with the Dept. of Computer Science and Engineering, and has experience in designing augmented surgical systems for neurological procedures, and in high performance computing. Dr. Chaudhary is the director of Computer Aided Diagnostics and Imaging, a research team with experience in computing areas that are critical to this project: high performance computing, haptics, and 3D visualization.

Virtual training systems have been designed for minimally invasive procedures like endoscopy and arthroscopic surgery. However, a realistic training environment for orthopaedic surgery is limited by the huge computing requirement (>540 TFlops) that rivals the performance of supercomputers (>$1 million). We are working to develop novel simulation algorithms and architectures which can divide the different computing tasks to dedicated processors and meet the high-speed requirements at a reasonable cost. To ensure realistic feedback of orthopaedic surgical devices, we will use haptic interfaces that provide greater degrees of freedom, high speed, and smooth response, allowing a resident to perform the entire range of motions required during surgery. The initial focus will be to develop a simulation of surgery which can be recorded for evaluation. This recording will serve as the assessment system providing a means to demonstrate proficiency prior to advancement. The goal is to enhance current training modalities in orthopaedic surgery while improving patient outcomes by providing objective measures of training assessment.

By taking advantage of advances in computer technology, we would like to change the status of medical education while decreasing the risk to patients and ideally improving patient outcomes. Training systems using computer simulations have achieved maturity and recognition in minimally invasive surgery, for example endoscopic gastro-intestinal procedures or arthroscopic knee surgery.

The model will allow a surgeon to complete the operation, have a recorded copy of the procedure and if desired have an assessment completed by one or a series of raters. Our assessment tool will allow for the demonstration of mastery at all stages of the operation in one or repeated trials and evaluations. This model will allow for an outcomes assessment of different surgical techniques, errors, and new procedures or implants.


Computer Assisted Surgery

CADI has developed an image guided neurosurgery toolkit to produce optimum plans resulting in minimally invasive surgeries. The Computer Assisted Surgery (CAS) engine covers several research and engineering solutions.

Finite Element Modeling (FEM) to predict brain shift:

FEM is used to predict intraoperative brain shift during neurosurgery; the system uses a three-dimensional (3D) patient-specific finite element (FE) brain model with detailed anatomical structures using quadrilateral and hexahedral elements. A template-based algorithm was developed to build a 3D patient-specific FE brain model. The template model is a 50th percentile male FE brain model with gray and white matter, ventricles, pia mater, dura mater, falx cerebri, tentorium cerebelli, brainstem and cerebellum. Two patient-specific models were constructed to demonstrate the robustness of this method. Gravity-induced brain shift after dura opening was simulated based on one clinical case of computer assisted neurosurgery for model validation. The pre-operative MR images were updated by the FE results, and displayed as intraoperative MR images easily recognizable by surgeons. A set of algorithms for developing a 3D patient-specific FE brain model has been developed. Gravity-induced brain shift can be predicted by this model and displayed as high resolution MR images. Such a strategy can be used not only for intraoperative MRI updating, but also for pre-surgical planning.


We developed the DICOMBox tool, based on the DICOM processing algorithm in the Eview project, which can view and edit DICOM images on handheld devices. This work shows the promise of moving the non-compute-intensive functionality of the CAS Engine to handheld platforms. For secure access to the CAS Engine, a location-based access control model is proposed as a comprehensive solution for meeting the HIPAA standard.


The CAS Database set up in a secure Client/Server architecture allows users to upload case information, image data, planning and annotation information. The system supports several types of navigational queries that assist a surgeon in decision making.

Identify/design and develop advanced (3D) interfaces for navigational queries:

The surgical interface will also allow users to navigate possible surgical trajectories even before entering the OR. This is accomplished using a new indexing structure developed over the course of the CAS program. Called the target tree, this index is a variable-height tree that recursively decomposes the search space around a single target point. The index allows for insertion and deletion operations to be intermixed with searches. The target point of the index is the end goal of a surgical procedure, usually a tumor that must be removed.

Augmented Reality:

We have successfully developed and implemented a prototype Augmented Reality (AR) system that visualizes otherwise invisible critical structures of the brain in the real view of a patient phantom.

Landmark-based Patient and Atlas Co-Registration:

The transfer of anatomical knowledge from 3D atlases to patient images via image-atlas co-registration is a very helpful tool in applications such as diagnosis, therapy planning, and simulation. However, there are anatomical differences among individual patients that make registration difficult; accurate voxel-wise fusion of different individuals is an open problem. For planning and simulation applications accuracy is essential, because any geometrical deviation may be harmful to a patient.

Landmark-based registration is one of the most popular algorithms in atlas-based applications. We have implemented landmark-based registration as our first atlas registration algorithm. Here, AC, PC, L, and R were chosen as our control points.


The CADI group has worked mainly on five rigid registration algorithms and a deformable registration technique. The registration techniques include:

  • Multi-Resolution Mutual Information
  • Mutual Information
  • Landmark based rigid registration
  • Landmark with Mutual Information.

The focus has been on achieving the best results with minimal time taken for registration or fusion of multi-modality data.
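Mutual information, the similarity measure behind several of the techniques listed above, can be computed from a joint histogram of the two images. The sketch below shows only that metric, on images given as flat lists of already-binned intensities. The input layout and the function name are assumptions, and the optimizer that searches over transforms to maximize the metric is not shown. This is not the CADI implementation.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b):
    """Mutual information in bits between two equally sized images.

    Each image is a flat list of intensity bins, aligned voxel by voxel.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of voxels")
    n = len(image_a)
    joint = Counter(zip(image_a, image_b))  # joint histogram
    count_a = Counter(image_a)              # marginal histograms
    count_b = Counter(image_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

The measure is high when the intensity in one image predicts the intensity in the other, even if the two modalities map tissue to different values. That property is what makes it suitable for multi-modality fusion.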

Figure shows the registration result using algorithm “Landmark + Mutual information” and a simple image fusion.


