Accelerated Computing Lab Projects
Power Aware HPC
We are currently developing tools to make the operation of HPC hardware more
energy efficient by modeling power consumption and informing scheduling
and provisioning policies.
One of the projects undertaken by our research lab is to design and implement a power model
based on monitored system events. The major goal of this project is to implement a low-overhead
online monitoring tool that gathers vital system performance metrics and then
uses an efficient model to predict the instantaneous power consumption of the system at any
instant in time. The project scope includes power prediction for both types of cluster nodes:
non-virtualized as well as virtualized. The model design is being tested on a variety
of server hardware architectures, such as Intel and AMD. This tool will assist our development
of power-aware scheduling policies and provisioning decisions for the deployment of new jobs or workloads.
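As a sketch of the modeling step, the metric-to-power mapping can be fit with ordinary least squares; the counter names, sample values, and wattages below are illustrative, not measurements from our tool:

```python
import numpy as np

# Hypothetical per-sample performance metrics gathered by the monitoring
# tool: CPU utilization, memory bandwidth fraction, disk I/O rate.
samples = np.array([
    [0.10, 0.05, 0.01],
    [0.50, 0.30, 0.10],
    [0.90, 0.60, 0.20],
    [0.30, 0.20, 0.05],
])
measured_watts = np.array([112.0, 168.0, 225.0, 139.0])  # illustrative

# Fit a linear power model: P = w0 + w1*cpu + w2*mem + w3*io
X = np.hstack([np.ones((samples.shape[0], 1)), samples])
coeffs, *_ = np.linalg.lstsq(X, measured_watts, rcond=None)

def predict_power(cpu, mem, io):
    """Predict instantaneous power draw (watts) from live metrics."""
    return float(coeffs @ np.array([1.0, cpu, mem, io]))
```

In practice the feature set, nonlinearity, and per-architecture calibration matter far more than the regression itself; this only shows the shape of the online prediction step.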
Power Aware Provisioning:
The main objective of this project at our research lab is to design and implement provisioning techniques
that assist the cluster provisioning node in making power-aware decisions, including workload consolidation
and the placement of new incoming jobs on cluster nodes. The primary goal is to reduce the total
power consumption of the cluster while maintaining the desired quality of service.
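One simple heuristic of the kind such a provisioning node might apply is first-fit-decreasing bin packing, which gathers workloads onto as few powered-on nodes as possible so idle nodes can be shut down; this is a sketch of the general idea, not our actual policy (which also weighs QoS constraints):

```python
def consolidate(workloads, capacity):
    """Pack workloads (fractional CPU demands) onto as few nodes as
    possible via first-fit decreasing; returns the number of nodes
    that must stay powered on."""
    nodes = []  # remaining capacity on each powered-on node
    for demand in sorted(workloads, reverse=True):
        for i, free in enumerate(nodes):
            if demand <= free:
                nodes[i] = free - demand
                break
        else:
            nodes.append(capacity - demand)  # power on a new node
    return len(nodes)
```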
Power Aware Scheduling Simulation:
In order to design the techniques and policies necessary for power aware provisioning, we are currently
developing a scheduling simulator in which both the throughput and power performance can be evaluated for
various scheduling parameters. Our goal is to enable the modeling of arbitrary virtualized and
non-virtualized clusters in order to develop power-efficient scheduling policies to fit the needs of different user bases.
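A toy version of such a simulator, with illustrative idle/busy wattages, can track both makespan (throughput) and an energy estimate for a given scheduling run:

```python
def simulate(jobs, nodes, idle_watts=100.0, busy_watts=250.0):
    """Toy event-based simulation: run (arrival, duration) jobs FIFO on a
    fixed node pool; return makespan and total energy in joules under a
    constant idle/busy power assumption. All parameters are illustrative."""
    free_at = [0.0] * nodes          # time each node becomes free
    busy_seconds = 0.0
    finish = 0.0
    for arrival, duration in sorted(jobs):
        node = min(range(nodes), key=lambda i: free_at[i])
        start = max(arrival, free_at[node])
        free_at[node] = start + duration
        busy_seconds += duration
        finish = max(finish, free_at[node])
    energy = busy_watts * busy_seconds + idle_watts * (nodes * finish - busy_seconds)
    return finish, energy
```

Swapping the FIFO policy for a power-aware one and comparing the two returned metrics is exactly the kind of experiment the simulator is meant to support.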
MPI-HMMER is an open source MPI implementation
of the HMMER protein sequence analysis suite. The main search algorithms, hmmpfam
and hmmsearch, have been ported to MPI in order to provide high throughput HMMER
searches on modern computational clusters. We improve on HMMER through sophisticated
I/O, a self-contained coordinator/worker model, and the easy inclusion of accelerated
architectures. This results in better scalability while still maintaining the familiar HMMER interface:
- Improved database chunking strategy
- Portable across any POSIX operating system
- MPI implementation independent
- Vastly reduced computation times
- Improved query throughput
- Output nearly identical to standard HMMER
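The chunking idea can be sketched as follows; the over-decomposition factor and splitting scheme here are illustrative rather than MPI-HMMER's exact strategy:

```python
def chunk_database(num_sequences, num_workers, chunk_factor=4):
    """Split the sequence database into more chunks than workers so the
    coordinator can hand extra chunks to fast workers, improving load
    balance. Returns (start, end) index pairs over the database."""
    total_chunks = num_workers * chunk_factor
    base, extra = divmod(num_sequences, total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        chunks.append((start, start + size))
        start += size
    return chunks
```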
All information regarding this project, along with downloads and FAQs can be found at
Sequencing and protein docking are very compute-intensive tasks
that see a large performance benefit from using a CUDA-enabled GPU.
GPU-HMMER implements the hmmsearch portion of the
HMMER sequence analysis suite. All other tools (hmmpfam, standard hmmsearch, etc.)
remain available to the user and are unmodified. The GPU portion consists of cuda_hmmsearch
and its helper utility hmmsort.
The GPU-HMMER project results are also cited at
NVIDIA. Information regarding this project can be found
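At the core of hmmsearch is a Viterbi dynamic-programming scan of each database sequence against a profile HMM; cuda_hmmsearch parallelizes many such scans across GPU threads. The sketch below is generic Viterbi, not HMMER's Plan7 profile recursion:

```python
import numpy as np

def viterbi(log_trans, log_emit, obs):
    """Generic Viterbi: find the most likely state path for an observed
    symbol sequence given log transition and log emission matrices
    (uniform initial distribution assumed)."""
    score = log_emit[:, obs[0]].copy()
    back = []
    for symbol in obs[1:]:
        cand = score[:, None] + log_trans      # prev state x next state
        back.append(np.argmax(cand, axis=0))
        score = np.max(cand, axis=0) + log_emit[:, symbol]
    path = [int(np.argmax(score))]
    for bp in reversed(back):                  # backtrack best path
        path.append(int(bp[path[-1]]))
    return float(np.max(score)), path[::-1]
```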
Checkpointing and Fault-Tolerance
As computational clusters increase in size,
their mean time to failure decreases drastically. Typically, checkpointing is used to
minimize the loss of computation. Most checkpointing techniques, however, require
central storage for the checkpoints. This creates a bottleneck that
severely limits the scalability of checkpointing, while dedicated checkpointing
networks and storage systems prove too expensive. We propose a scalable replication-based MPI
checkpointing facility. Our reference implementation is based on LAM/MPI; however,
the approach is directly applicable to any MPI implementation. We extend the existing
state of the art in fault-tolerant MPI with asynchronous replication,
eliminating the need for central or network storage. We evaluate centralized storage, a Sun X4500-based solution, an EMC SAN,
and the Ibrix commercial parallel file system and show that they are not scalable, particularly beyond 64 CPUs. We demonstrate the
low overhead of our checkpointing and replication scheme with the NAS Parallel Benchmarks and the High Performance LINPACK
benchmark in tests of up to 256 nodes, showing that checkpointing and replication can be achieved with much lower
overhead than current techniques provide. Finally, we show that the monetary cost of our solution is as low as 25% of that of
a typical SAN/parallel file system-equipped storage system.
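The replication idea can be sketched as follows: each rank pairs with a peer in a ring and asynchronously ships a digest-protected copy of its checkpoint to that peer's local disk, so no central storage is needed. This illustrates the concept only, not our LAM/MPI implementation:

```python
import hashlib
import pickle

def replica_partner(rank, size, offset=1):
    """Ring partnering: rank r replicates its checkpoint to rank
    (r + offset) mod size, which holds it on local disk."""
    return (rank + offset) % size

def make_checkpoint(state):
    """Serialize process state with a digest so a later restore
    from the partner's replica can verify integrity."""
    blob = pickle.dumps(state)
    return blob, hashlib.sha256(blob).hexdigest()

def restore(blob, digest):
    """Rebuild state from a replica, refusing a corrupt copy."""
    assert hashlib.sha256(blob).hexdigest() == digest, "corrupt replica"
    return pickle.loads(blob)
```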
Virtualization is a common strategy for improving the utilization of existing
computing resources, particularly within data centers. However, its use for high
performance computing (HPC) applications is currently limited despite its potential
for both improving resource utilization as well as providing resource guarantees to
its users. In this article we systematically evaluate three major virtual machine implementations
for computationally intensive HPC applications using various standard
benchmarks. Using VMware Server, Xen, and OpenVZ we examine the suitability
of full virtualization (VMware), paravirtualization (Xen), and operating system-level
virtualization (OpenVZ) in terms of network utilization, SMP performance, file system
performance, and MPI scalability. We show that the operating system-level virtualization
provided by OpenVZ delivers the best overall performance, particularly
for MPI scalability. With the knowledge gained by our VM evaluation, we extend
OpenVZ to include support for checkpointing and fault-tolerance for MPI-based virtual
server distributed computing.
GPGPU and PS3
Molecular Dynamics:
Molecular dynamics simulations are known to run
for many days or weeks before completion. In the paper
"Accelerating Molecular Dynamics Simulations with GPUs"
(ISCA-PDCCS'08) we explore the use of GPUs to accelerate a Lennard-
Jones-based molecular dynamics simulation of up to 27,000
atoms. We demonstrate speedups exceeding 100x on commodity
Nvidia GPUs and discuss the strategies that allow for such
exceptional speedups. We show that traditional molecular
dynamics simulations can be greatly improved, reducing a runtime
of over a day to 18 minutes.
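The heart of such a simulation is the all-pairs Lennard-Jones force evaluation, whose data-parallel structure (one atom or pair per thread) is exactly what maps well to the GPU. A vectorized sketch in reduced LJ units:

```python
import numpy as np

def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces on N atoms, vectorized over the
    full N x N pair matrix (reduced units, no cutoff or cell lists)."""
    disp = pos[:, None, :] - pos[None, :, :]    # (N, N, 3): r_i - r_j
    r2 = np.sum(disp * disp, axis=-1)
    np.fill_diagonal(r2, np.inf)                # no self-interaction
    inv_r2 = sigma * sigma / r2
    inv_r6 = inv_r2 ** 3
    # F_ij = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6)/r^2 * (r_i - r_j)
    fmag = 24.0 * epsilon * (2.0 * inv_r6 ** 2 - inv_r6) / r2
    return np.sum(fmag[:, :, None] * disp, axis=1)
```

Production runs add neighbor lists and cutoffs; the point here is that every pair interaction is independent, which is what allows one-thread-per-pair GPU kernels.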
In "Improving MPI-HMMER's Scalability With Parallel I/O" (IPDPS 2009),
we present PIO-HMMER, an enhanced version
of MPI-HMMER. PIO-HMMER improves on MPI-HMMER's
scalability through the use of parallel I/O
and a parallel file system. In addition, we describe
several enhancements, including a new load-balancing
scheme, enhanced post-processing, improved double-buffering
support, and asynchronous I/O for returning
scores to the master node. Our enhancements to the core
HMMER search tools, hmmsearch and hmmpfam, allow
for scalability up to 256 nodes where MPI-HMMER previously
did not scale beyond 64 nodes. We show that our
performance enhancements allow hmmsearch to achieve
between 48x and 221x speedup using 256 nodes, depending
on the size of the input HMM and the database.
Further, we show that by integrating database caching
with PIO-HMMER's hmmpfam tool we can achieve up
to 328x speedup using only 256 nodes.
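The double-buffering idea can be sketched with a prefetching reader thread that overlaps the next chunk's I/O with computation on the current one; this is an illustration of the overlap pattern, not PIO-HMMER's actual code:

```python
import queue
import threading

def double_buffered_read(chunks, process):
    """Process database chunks while a reader thread prefetches the
    next ones, so I/O overlaps with computation. 'chunks' stands in
    for file reads; 'process' stands in for HMM scoring."""
    buf = queue.Queue(maxsize=2)     # two buffers in flight at once

    def reader():
        for chunk in chunks:
            buf.put(chunk)           # blocks when both buffers are full
        buf.put(None)                # end-of-stream sentinel

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (chunk := buf.get()) is not None:
        results.append(process(chunk))
    return results
```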
Data Intensive Computing
One particular data-intensive application that we have investigated in depth is the problem
of computing the correlation between all gene locations within DNA microarray (or Array
Comparative Genomic Hybridization) data. The newest microarray hardware, which measures gene
copy number, a strong indicator of the level of expression of a particular gene in a patient,
can sample hundreds of thousands of locations within the genome. Computing, archiving, and analyzing
the correlation of every gene with every other necessitates HPC resources, and requires scalability
in computation, storage, and I/O. Our work in enabling this analysis represents a case study in data-intensive
computing on various HPC architectures. We have thoroughly compared the performance and
scalability of a cluster with several file systems, a
Hadoop-enabled cluster, and an Active Disks system.
The paper "Comparing the Performance of Clusters, Hadoop, and
Active Disks on Microarray Correlation Computations" is set to appear in HiPC 2009.
We are currently expanding our experimentation into performance on
other Netezza products, and refining our testing of Hadoop's performance.
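The computational kernel being compared is all-pairs Pearson correlation, which can be expressed as a normalized matrix product; the sketch below illustrates the kernel itself, not our production pipeline:

```python
import numpy as np

def all_pairs_correlation(expr):
    """Pearson correlation of every gene with every other gene.
    'expr' is a (genes x samples) copy-number matrix; the result's
    entry (i, j) is corr(gene_i, gene_j). Center and normalize each
    row, then a single matrix product yields the whole matrix."""
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T
```

For n genes this is O(n^2) in both computation and output size, which is exactly why the analysis demands scalable computation, storage, and I/O.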
Virtual Surgery Training Systems
This project initiates research to enhance current training modalities in
orthopaedic surgery by using computerized training and assessment systems.
Simulations will be designed such that residents can experience surgical
procedures hands-on before operating on patients. Ex-vivo training on
virtual systems will familiarize residents with real-life scenarios,
minimize the risk to patients and allow for competence-based advancement
of residents. We will provide an inexpensive, multi-use solution to complement
current training methods in Orthopaedic Surgery.
This project is a collaboration between the Dept. of Orthopaedics,
and Dept. of Computer Science and Engineering. Dr. Lawrence Bone is an experienced
orthopaedic surgeon, Chair of the Dept. of Orthopaedic Surgery, a full professor
and the Program Director of Residency Education. Dr. Vipin Chaudhary is an
associate professor with the Dept. of Computer Science and Engineering, and has
experience in designing augmented surgical systems for neurological procedures,
and in high performance computing. Dr. Chaudhary is the director of Computer Aided
Diagnostics and Imaging, a research team with experience in computing areas that
are critical to this project: high performance computing, haptics, and 3D visualization.
Virtual training systems have been designed for minimally invasive
procedures like endoscopy and arthroscopic surgery. However, a realistic training
environment for orthopaedic surgery is limited by the huge computing requirement
(>540 TFlops) that rivals the performance of supercomputers (>$1 million).
We are working to develop novel simulation algorithms and architectures which
can divide the different computing tasks to dedicated processors and meet the
high-speed requirements at a reasonable cost. To ensure realistic feedback of
orthopaedic surgical devices, we will use haptic interfaces that provide greater
degrees of freedom, high speed, and smooth response, allowing a resident to perform
the entire range of motions required during surgery. The initial focus will be to
develop a simulation of surgery which can be recorded for evaluation. This recording
will serve as the assessment system providing a means to demonstrate proficiency
prior to advancement. The goal is to enhance current training modalities in orthopaedic
surgery while improving patient outcomes by providing objective measures of training.
By taking advantage of the advances in computer technology, we would like
to transform medical education while decreasing the risk to patients
and ideally improving patient outcomes. Training systems using computer simulations
have achieved maturity and recognition in minimally invasive surgery, for example
endoscopic gastro-intestinal procedures, or arthroscopic knee surgery.
The model will allow a surgeon to complete the operation, have a recorded
copy of the procedure and if desired have an assessment completed by one or a series
of raters. Our assessment tool will allow for the demonstration of mastery at all
stages of the operation in one or repeated trials and evaluations. This model will
allow for an outcomes assessment of different surgical techniques, errors, and new
procedures or implants.
Computer Assisted Surgery
CADI has developed an image guided neurosurgery
toolkit to produce optimum plans resulting in minimally invasive surgeries.
The Computer Assisted Surgery (CAS) engine covers several research and engineering areas:
Finite Element Modeling (FEM) to predict brain shift:
FEM is used to predict intraoperative brain shift during neurosurgery;
the system uses a three-dimensional (3D) patient-specific finite element (FE)
brain model with detailed anatomical structures using quadrilateral and hexahedral elements.
Methods: A template-based algorithm was developed to build the 3D patient-specific FE
brain model. The template model is a 50th-percentile male FE brain model with gray
and white matter, ventricles, pia mater, dura mater, falx cerebri, tentorium cerebelli,
brainstem, and cerebellum. Two patient-specific models were constructed to demonstrate
the robustness of this method. Gravity-induced brain shift after dura opening was
simulated based on one clinical case of computer assisted neurosurgery for model
validation. The pre-operative MR images were updated with the FE results and displayed
as intraoperative MR images easily recognizable by surgeons.
A set of algorithms for building a 3D patient-specific FE brain model has been developed.
Gravity-induced brain shift can be predicted by this model and displayed as high-resolution
MR images. Such a strategy can be used not only for intraoperative MRI
updating, but also for pre-surgical planning.
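A minimal 1D analogue of such FEM displacement prediction, far simpler than the 3D hexahedral brain model above, is a fixed-top elastic column settling under a uniform body force:

```python
import numpy as np

def gravity_settle_1d(n_elems, length, stiffness, load):
    """1D FEM toy: a column fixed at node 0 under a uniform body force
    'load' per unit length, assembled from linear two-node elements
    with axial stiffness 'stiffness' (EA). Returns nodal displacements."""
    h = length / n_elems
    k = stiffness / h
    n = n_elems + 1
    K = np.zeros((n, n))
    f = np.full(n, load * h)
    f[[0, -1]] = load * h / 2          # half loads at the end nodes
    for e in range(n_elems):           # assemble element stiffness
        K[e:e + 2, e:e + 2] += k * np.array([[1, -1], [-1, 1]])
    u = np.zeros(n)                    # clamp node 0, solve the rest
    u[1:] = np.linalg.solve(K[1:, 1:], f[1:])
    return u
```

For this loading the linear elements reproduce the analytic tip displacement load * length^2 / (2 * stiffness) exactly at the nodes, which makes the sketch easy to check.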
We developed the DICOMBox tool, based on the DICOM processing algorithm from the Eview
project, which can view and edit DICOM images on handheld devices.
This work shows the promise of moving the computationally non-intensive functionality
of the CAS Engine to handheld platforms. For secure access to the CAS
Engine, a location-based access control model is proposed as a comprehensive
solution for meeting the HIPAA standard.
The CAS Database set up in a secure Client/Server architecture allows users
to upload case information, image data, planning and annotation information.
The system supports several types of navigational queries that assist a surgeon
in decision making.
Identify/design and develop advanced (3D) interfaces
for navigational queries:
The surgical interface will also allow users to navigate a possible surgical
trajectory even before entering the OR. This is accomplished using a new
indexing structure developed over the course of the CAS program. Called
the target tree, this index is a variable height tree that recursively
decomposes the search space around a single target point. The index allows
for insertion and deletion operations to be intermixed with searches.
The target point of the index is the end goal of a surgical procedure,
usually a tumor that must be removed.
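The idea behind the target tree can be sketched as a tree of shrinking shells around the target point, with deeper levels holding points nearer the target; the details below are illustrative, not the published structure:

```python
import math

class TargetTree:
    """Sketch of a target tree: the space around one target point
    (e.g. a tumor) is recursively halved, so a point's depth grows
    as it gets closer to the target."""

    def __init__(self, target, radius=256.0):
        self.target = target
        self.radius = radius
        self.levels = {}               # depth -> list of points

    def _depth(self, point):
        d = math.dist(self.target, point)
        depth, r = 0, self.radius
        while d < r / 2 and depth < 32:  # descend through inner shells
            r /= 2
            depth += 1
        return depth

    def insert(self, point):
        self.levels.setdefault(self._depth(point), []).append(point)

    def delete(self, point):
        self.levels.get(self._depth(point), []).remove(point)

    def nearest_first(self):
        """Yield stored points from the innermost shell outward."""
        for depth in sorted(self.levels, reverse=True):
            yield from self.levels[depth]
```

Because a point's shell is computed from its own coordinates, inserts and deletes can be freely intermixed with searches, matching the behavior described above.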
We have successfully developed and implemented a prototype Augmented
Reality (AR) system to visualize otherwise invisible critical structures of
the brain in a real view of a patient phantom.
Landmark-based Patient and Atlas Co-Registration:
The transfer of anatomical knowledge from 3D atlases to patient images via
image-atlas co-registration is a very helpful tool in applications such as
diagnosis, therapy planning, and simulation. However, there are anatomical
differences among individual patients that make registration difficult;
accurate voxel-wise fusion of different individuals is an open problem.
For planning and simulation applications accuracy is essential, because
any geometrical deviation may be harmful to a patient.
Landmark-based registration is one of the most popular algorithms
in atlas-based applications. We have implemented landmark-based registration
as our first atlas registration algorithm. Here, AC, PC, L, and R were chosen
as our control points.
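Given paired control points such as AC, PC, L, and R, the least-squares rigid transform can be recovered with the standard Kabsch algorithm; this is a textbook sketch, not CADI's exact implementation:

```python
import numpy as np

def landmark_rigid_register(src, dst):
    """Least-squares rigid registration (rotation R, translation t)
    from paired landmark rows in src and dst, so that dst ~= R@src + t.
    Uses the SVD-based Kabsch solution with a reflection guard."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                 # proper rotation, det(R) = +1
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

Four non-coplanar landmarks, as above, are enough to determine the transform uniquely.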
The CADI group has worked mainly on five rigid
registration algorithms and a deformable registration
technique. The rigid registration techniques include:
- Multi-Resolution Mutual Information
- Mutual Information
- Landmark based rigid registration
- Landmark with Mutual Information.
The concentration has been on achieving the best
results with minimal time taken for registration or fusion of the images.
The figure shows the registration result using the
"Landmark + Mutual Information" algorithm on a simple image.
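The mutual-information measure maximized by the MI-based methods above can be estimated from the joint intensity histogram of the two images; a textbook sketch:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Estimate mutual information (in nats) between two equally sized
    images from their joint intensity histogram. Higher values indicate
    better alignment, which is what an MI-based registration optimizes."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                              # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

A registration loop would repeatedly transform one image and keep the pose that maximizes this score; the multi-resolution variant listed above does so from coarse to fine scales.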