Refereed Conference & Workshop Papers
Jigsaw: A High-Utilization, Interference-Free Job Scheduler for Fat-Tree Clusters
Best Paper Award
30th ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2021
The Case of Performance Variability on Dragonfly-based Systems
34th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), May 2020
Mitigating Inter-Job Interference via Process-Level Quality-of-Service
IEEE Conference on Cluster Computing (CLUSTER '19), September 2019
Mitigating Inter-Job Interference Using Adaptive Flow-Aware Routing
Best Student Paper Nominee
IEEE/ACM Supercomputing 2018 (SC '18), November 2018
A Study of Network Quality of Service in Many-Core MPI Applications
6th Workshop on Runtime and Operating Systems for the Many-core Era, May 2018
I/O Aware Power Shifting
30th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), May 2016
Finding the Limits of Power-Constrained Application Performance
IEEE/ACM Supercomputing 2015 (SC '15), November 2015
Analyzing and Mitigating the Impact of Manufacturing Variability in Power-Constrained Supercomputing
IEEE/ACM Supercomputing 2015 (SC '15), November 2015
Practical Resource Management in Power-Constrained, High Performance Computing
24th ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2015
A Run-Time System for Power-Constrained HPC Applications
International Supercomputer Conference (ISC), July 2015
Adaptive Configuration Selection for Power-Constrained Heterogeneous Systems
43rd IEEE International Conference on Parallel Processing (ICPP), September 2014
Exploiting Redundancy for Cost-Effective, Time-Constrained Execution of HPC Applications on Amazon EC2
23rd ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2014
A Comparative Study of High-Performance Computing on the Cloud
22nd ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2013
Exploring Hardware Overprovisioning in Power-Constrained High Performance Computing
27th ACM International Conference on Supercomputing (ICS), June 2013
Comet: Decentralized Complex Event Detection in Mobile Delay Tolerant Networks
13th IEEE International Conference on Mobile Data Management, July 2012
Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound
8th Workshop on High-Performance, Power-Aware Computing, May 2012
Practical Performance Prediction Under Dynamic Voltage Frequency Scaling
2nd International Green Computing Conference, July 2011
CAEVA: A Customizable and Adaptive Event Aggregation Framework for Collaborative Broker Overlays
6th International Conference on Collaborative Computing, October 2010
Using Focused Regression for Accurate Time-Constrained Scaling of Scientific Applications
24th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), April 2010
Towards Efficient Event Aggregation in a Decentralized Publish-Subscribe System
Third ACM International Conference on Distributed Event-Based Systems, July 2009
Adagio: Making DVS Practical for Complex HPC Applications
23rd International Conference on Supercomputing (ICS), June 2009
A Regression-Based Approach to Scalability Prediction
22nd International Conference on Supercomputing (ICS), June 2008
Bounding Energy Consumption in Large-Scale MPI Programs
IEEE/ACM Supercomputing 2007 (SC '07), November 2007
Adaptive, Transparent Frequency and Voltage Scaling of Communication Phases in MPI Programs
IEEE/ACM Supercomputing 2006 (SC '06), November 2006
A Parallel, Out-of-Core Algorithm for RNA Secondary Structure Prediction
35th IEEE International Conference on Parallel Processing (ICPP), August 2006
STAR-MPI: Self Tuned Adaptive Routines for MPI Collective Operations
20th ACM International Conference on Supercomputing (ICS), June 2006
Minimizing Execution Time in MPI Programs on an Energy-Constrained, Power-Scalable Cluster
11th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), March 2006
Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs
IEEE/ACM Supercomputing 2005 (SC '05), November 2005
The MHETA Execution Model for Heterogeneous Clusters
IEEE/ACM Supercomputing 2005 (SC '05), November 2005
ACE: An Active, Client-Directed Technique for Reducing WNIC Energy During Web Browsing
15th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2005
Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster
10th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005
Exploring the Energy-Time Tradeoff in MPI Programs on a Power-Scalable Cluster
19th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), April 2005
New Methods for Passive Estimation of Round-Trip Times Using TCP Timestamps
6th Workshop on Passive and Active Measurement (PAM), March 2005
TCP-RC: A Receiver-Centered TCP Protocol for Delay-Sensitive Applications
12th SPIE/ACM Multimedia Computing and Networking Conference (MMCN), January 2005
Dynamic, Power-Aware Scheduling for Mobile Clients Using a Transparent Proxy
33rd International Conference on Parallel Processing (ICPP), August 2004
Implicit Java Array Bounds Checking on 64-bit Architectures
18th ACM International Conference on Supercomputing (ICS), June 2004
Client-Centered Energy Savings for Concurrent HTTP Connections
14th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2004
Client-Centered Energy and Delay Analysis for TCP Downloads
12th IEEE International Workshop on Quality of Service (IWQoS), June 2004
Dyn-MPI: Supporting MPI on a Nondedicated Cluster of Workstations
IEEE/ACM Supercomputing 2003 (SC '03), November 2003
CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
9th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2003
Efficient Support for Two-Dimensional Data Distributions in Distributed Shared Memory Systems
16th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), April 2002
Accurate Data Redistribution Cost Estimation in Distributed Shared Memory Systems
8th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2001
An Integrated Compiler/Run-Time System for Global Data Distribution in Distributed Shared Memory Systems
2nd Workshop on Software Distributed Shared Memory, May 2000
Run-Time Selection of Block Size in Pipelined Parallel Programs
13th IEEE/ACM International Parallel Processing Symposium (IPPS), April 1999
Adaptive Data Placement for Distributed-Memory Machines
10th IEEE/ACM International Parallel Processing Symposium (IPPS), April 1996
Distributed Filaments: Efficient Fine-Grain Parallelism on a Cluster of Workstations
1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), November 1994
Refereed Journal Papers
Protocol Customization for Improving MPI Performance on RDMA-enabled Clusters
International Journal of Parallel Programming, 41(5): 682–703 (2013)
Parallelizing Heavyweight Debugging Tools with MPIecho
Parallel Computing, 39(3): 156–166 (2013)
Adaptive, Transparent CPU Scaling Algorithms Leveraging MPI Communication Regions
Parallel Computing, 37(10–11): 667–683 (2011)
Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs
Journal of Parallel and Distributed Computing, 68(9): 1175–1185 (2008)
Analyzing the Energy-Time Tradeoff in High Performance Computing Applications
IEEE Transactions on Parallel and Distributed Systems, 18(6): 835–848 (June 2007)
Implicit Array Bounds Checking on 64-bit Architectures
ACM Transactions on Architecture and Code Optimization, 3(4): 502–527 (2006)
Client-Centered, Energy-Efficient Wireless Communication on IEEE 802.11b Networks
IEEE Transactions on Mobile Computing, 5(11): 1575–1590 (2006)
Dyn-MPI: Supporting MPI on a Nondedicated Cluster of Workstations
Journal of Parallel and Distributed Computing, 66(6): 822–838 (2006)
CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
Journal of Parallel and Distributed Computing, 65(10): 1123–1133 (2005)
A Comparative Analysis of Fine-Grain Threads Packages
Journal of Parallel and Distributed Computing, 63(11): 1050–1063 (2003)
HyFi: Architecture-Independent Parallelism on Networks of Multiprocessors
International Journal of Parallel and Distributed Systems and Networks, 25(4): 272–282 (2003)
Efficient Support for Pipelining in Distributed Shared Memory Systems
Parallel and Distributed Computing Practices, 4(2) (2001)
Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program
Astrophysical Journal, 134: 323–329 (2001)
Accurately Selecting Block Size at Run-Time in Pipelined Parallel Programs
International Journal of Parallel Programming, 28(3): 245–274 (2000)
Architecture-Independent Parallelism for Both Shared- and Distributed-Memory Machines Using the Filaments Package
Parallel Computing, 26: 1297–1323 (2000)
Efficient Fine-Grain Parallelism on Shared-Memory Multiprocessors
Concurrency: Practice and Experience, 10(3): 157–173 (1998)
Using Fine-Grain Threads and Run-Time Decision Making in Parallel Computing
Journal of Parallel and Distributed Computing, 37: 41–54 (1996)