Selected Publications

While early public cloud IoT success stories have focused on smaller-scale scenarios such as connected homes, it is unclear to what extent these new public cloud mechanisms and abstractions are suitable and effective for larger-scale and/or scientific scenarios, which often have different constraints and requirements. This paper addresses the challenge of implementing a scalable IoT infrastructure testbed in the public cloud for scientific experimentation. The system we created performs dynamic vehicle traffic control based on vehicle volumes/patterns and weather conditions. We find that while AWS IoT performance and performance scalability are likely to meet the requirements of many next-generation scientific IoT use cases, managing and modifying a scientific IoT scenario can be challenging for moderate- to large-scale deployments.
In Proceedings of the 9th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2016), Tongji University, Shanghai, China, December 6-9, 2016.
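
A minimal sketch of how a device in such a testbed might publish simulated traffic-sensor readings to AWS IoT Core over MQTT/TLS using the paho-mqtt library. This is illustrative only, not the paper's testbed: the endpoint, certificate paths, topic name, and message schema are all placeholder assumptions.

    # Publish simulated traffic-sensor readings to AWS IoT Core over MQTT/TLS.
    # All identifiers below (endpoint, cert paths, topic) are placeholders.
    import json
    import random
    import ssl
    import time

    import paho.mqtt.client as mqtt

    ENDPOINT = "example-ats.iot.us-east-1.amazonaws.com"  # hypothetical endpoint
    TOPIC = "traffic/intersection/42/volume"              # hypothetical topic

    client = mqtt.Client(client_id="sensor-42")  # paho-mqtt 1.x constructor
    client.tls_set(ca_certs="AmazonRootCA1.pem",
                   certfile="device.pem.crt",
                   keyfile="private.pem.key",
                   tls_version=ssl.PROTOCOL_TLSv1_2)
    client.connect(ENDPOINT, port=8883)
    client.loop_start()

    for _ in range(10):
        reading = {"vehicles_per_min": random.randint(0, 120),
                   "timestamp": time.time()}
        client.publish(TOPIC, json.dumps(reading), qos=1)
        time.sleep(1)

    client.loop_stop()
    client.disconnect()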

It is extremely difficult for cloud users to know which, if any, existing workload predictor will work best for their particular cloud activity, especially given highly variable workload patterns, non-trivial billing models, and the variety of resources that can be added or removed. To address this problem, we conduct a comprehensive evaluation of a variety of workload predictors in real-world cloud configurations.
In Proceedings of the 9th IEEE International Conference on Cloud Computing (Cloud 2016), San Francisco, CA, June 27-July 2, 2016.
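
To make the evaluation setup concrete, here is a minimal sketch, with simple made-up predictors and a synthetic trace rather than the paper's actual forecasting techniques or workloads, of comparing forecasters by mean absolute error on a request-rate series:

    # Compare three simple workload forecasters by mean absolute error (MAE)
    # on a synthetic diurnal request-rate trace. Illustrative only; the paper
    # evaluates real predictors on real-world cloud configurations.
    import math
    import random

    def last_value(history):            # naive: repeat the last observation
        return history[-1]

    def moving_average(history, w=5):   # mean of the last w observations
        window = history[-w:]
        return sum(window) / len(window)

    def linear_trend(history, w=5):     # least-squares line over last w points
        window = history[-w:]
        n = len(window)
        x_mean = (n - 1) / 2
        y_mean = sum(window) / n
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(window))
        den = sum((x - x_mean) ** 2 for x in range(n)) or 1
        return y_mean + (num / den) * (n - x_mean)  # extrapolate one step ahead

    trace = [100 + 50 * math.sin(t / 12) + random.gauss(0, 5) for t in range(200)]

    for name, predict in [("last-value", last_value),
                          ("moving-avg", moving_average),
                          ("linear-trend", linear_trend)]:
        errors = [abs(predict(trace[:t]) - trace[t]) for t in range(10, len(trace))]
        print(f"{name}: MAE = {sum(errors) / len(errors):.2f}")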

While it is natural and desirable to assume that new VMs start up quickly and predictably in IaaS clouds, we found that this is not always the case. In this paper, we present results from studying the startup time of cloud VMs across three real-world cloud providers: Amazon EC2, Windows Azure, and Rackspace. We analyze the relationship between VM startup time and factors such as time of day, OS image size, instance type, data center location, and the number of instances acquired at the same time.
In Proceedings of the 5th IEEE International Conference on Cloud Computing (Cloud 2012), Honolulu, Hawaii, June 24-29, 2012.
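
For readers who want to reproduce a single data point of this kind of study, a minimal sketch of timing an EC2 instance launch with boto3 follows; it assumes configured AWS credentials, and the AMI ID and instance type are placeholders (the paper also measured Windows Azure and Rackspace, which this sketch does not cover):

    # Time how long a new EC2 instance takes to reach the 'running' state.
    # Assumes configured AWS credentials; the AMI ID below is a placeholder.
    import time

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    t0 = time.time()
    resp = ec2.run_instances(ImageId="ami-0123456789abcdef0",  # placeholder
                             InstanceType="t3.micro",
                             MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]

    # Block until EC2 reports the instance as running, then report elapsed time.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    print(f"startup time: {time.time() - t0:.1f} s")

    ec2.terminate_instances(InstanceIds=[instance_id])

Note that "running" is only one possible definition of "started"; time until the VM is reachable over SSH is another, and is typically longer.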

We present an approach whereby the basic computing elements are virtual machines (VMs) of various sizes and costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure that all jobs finish within their deadlines at minimum financial cost. We accomplish this by dynamically allocating and deallocating VMs and scheduling tasks on the most cost-efficient instances. We evaluate our approach under four representative cloud workload patterns and show cost savings of 9.8% to 40.4% compared to other approaches.
In Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing 2011), Seattle, WA, November 15-20, 2011.
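
The core scheduling decision can be illustrated with a minimal sketch: pick the cheapest instance type whose estimated runtime still meets the job's deadline. The instance names, hourly prices, and speedups below are illustrative assumptions, not the paper's model or algorithm:

    # Choose the cheapest instance type that still meets a job's deadline.
    # Names, hourly prices, and speedups are illustrative placeholders.
    INSTANCE_TYPES = [
        # (name, price per hour in $, speedup relative to the baseline type)
        ("small",  0.10, 1.0),
        ("medium", 0.20, 1.9),
        ("large",  0.40, 3.6),
    ]

    def cheapest_feasible(base_runtime_h, deadline_h):
        """Cheapest type finishing base_runtime_h of baseline work by deadline_h."""
        feasible = [(price, name) for name, price, speedup in INSTANCE_TYPES
                    if base_runtime_h / speedup <= deadline_h]
        if not feasible:
            return None  # no single instance suffices; scale out across VMs instead
        return min(feasible)[1]

    print(cheapest_feasible(base_runtime_h=3.0, deadline_h=1.0))  # -> large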

We present a storage abstraction layer that enables applications both to utilize the highly-available and scalable storage services provided by cloud vendors and to be portable across platforms. The abstraction layer, called CSAL, provides Blob, Table, and Queue abstractions across multiple providers and presents applications with an integrated namespace, thereby relieving applications of having to manage storage entity location information and access credentials.
In Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Indianapolis, Indiana, November 30-December 3, 2010.
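
A minimal sketch of the abstraction-layer idea, with invented names rather than CSAL's real API (which also covers Tables and Queues and wraps real provider backends): one Blob interface, interchangeable backends, and a single namespace that routes names to the right provider.

    # One Blob interface, pluggable backends, and a unified namespace that
    # routes "provider://container/blob" URIs. Names are illustrative only.
    from abc import ABC, abstractmethod

    class BlobStore(ABC):
        @abstractmethod
        def put(self, name: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, name: str) -> bytes: ...

    class InMemoryStore(BlobStore):
        """Stand-in backend; a real one would wrap S3, Azure Blob Storage, etc."""
        def __init__(self):
            self._blobs = {}
        def put(self, name, data):
            self._blobs[name] = data
        def get(self, name):
            return self._blobs[name]

    class Namespace:
        """Routes provider-prefixed names to the matching backend."""
        def __init__(self, backends):
            self._backends = backends
        def _resolve(self, uri):
            provider, _, path = uri.partition("://")
            return self._backends[provider], path
        def put(self, uri, data):
            backend, path = self._resolve(uri)
            backend.put(path, data)
        def get(self, uri):
            backend, path = self._resolve(uri)
            return backend.get(path)

    ns = Namespace({"aws": InMemoryStore(), "azure": InMemoryStore()})
    ns.put("aws://bucket/results.csv", b"1,2,3")
    print(ns.get("aws://bucket/results.csv"))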

How does EC2 compare with the smaller departmental and lab-sized commodity clusters that are often the primary computational resource for scientists? To answer this question, we execute MPI and memory bandwidth benchmarks on EC2 clusters of each of the 64-bit instance types, comparing the performance of a 16-node cluster of each type to a dedicated, locally-owned commodity cluster based on Gigabit Ethernet. Our analysis shows that while EC2 does experience reduced performance, it is still viable for smaller-scale applications.
In Proceedings of the 10th IEEE/ACM International Conference on Grid Computing (Grid 2009), Banff, Alberta, Canada, October 13-15, 2009.
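
For a flavor of the memory-bandwidth side of such benchmarking, here is a minimal STREAM-style microbenchmark sketch in Python/NumPy; the paper used established MPI and memory-bandwidth benchmark suites on 16-node clusters, not this script, and the array size is an assumption chosen to exceed cache capacity.

    # STREAM-style add kernel: estimate sustained memory bandwidth in GB/s.
    import time

    import numpy as np

    N = 50_000_000           # ~400 MB per float64 array, well beyond cache
    a = np.ones(N)
    b = np.ones(N)
    c = np.empty(N)

    t0 = time.time()
    np.add(a, b, out=c)      # reads a and b, writes c
    elapsed = time.time() - t0

    bytes_moved = 3 * N * 8  # two 8-byte reads plus one 8-byte write per element
    print(f"bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")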

The emerging class of adaptive, real-time, data-driven applications poses a significant challenge for today's HPC systems. In general, it is extremely difficult for queuing-system-controlled HPC resources to make and guarantee a tightly-bounded prediction of when a newly-submitted application will execute. While a reservation-based approach partially addresses the problem, it can create severe resource under-utilization (unused reservations, scheduled idle slots, underutilized reservations, etc.) that resource providers are eager to avoid. In contrast, this paper presents a fundamentally different approach to guaranteeing predictable execution. By creating a virtualized application layer called the performance container, and opportunistically multiplexing concurrent performance containers using formal feedback control theory, we regulate a job's progress so that it meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected disturbances. Our evaluation using two widely-used applications, WRF and BLAST, on an 8-core server shows that our approach is predictable and meets deadlines with an average error of 3.4% while achieving high overall utilization.
In Proceedings of the 2008 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing 2008), Austin, TX, November 15-21, 2008.
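
To illustrate the control-theoretic idea only (not the paper's actual controller design), here is a minimal sketch of a proportional feedback loop that adjusts a job's CPU share each interval so that its measured progress rate tracks the rate needed to meet the deadline; the gain and the toy plant model are assumptions.

    # Proportional controller: nudge a job's CPU share so its progress rate
    # tracks the rate required to finish by the deadline. Toy model only.
    def required_rate(remaining_work, time_left):
        return remaining_work / max(time_left, 1e-9)

    def control_loop(total_work=100.0, deadline=50.0, dt=1.0, kp=0.5):
        done, t, share = 0.0, 0.0, 0.5           # start at a 50% CPU share
        while t < deadline and done < total_work:
            progress = share * 2.4 * dt           # toy plant: work done this tick
            done += progress
            t += dt
            # Error = needed rate minus achieved rate; adjust share accordingly.
            error = required_rate(total_work - done, deadline - t) - progress / dt
            share = min(1.0, max(0.05, share + kp * error))
        print(f"finished {done:.1f}/{total_work} units at t={t:.0f} "
              f"(deadline {deadline:.0f})")

    control_loop()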

Recent Publications

  • Experiences Creating a Framework for Smart Traffic Control using AWS IoT (UCC 2016)

  • Empirical Evaluation of Cloud Workload Forecasting Techniques (Cloud 2016)

  • WDCloud: An End to End System for Large-Scale Watershed Delineation on Cloud (Big Data-Geosciences 2015)

  • PICS: A Public IaaS Cloud Simulator (Cloud 2015)

  • Toward Optimal Resource Provisioning for Cloud MapReduce and Hybrid Cloud Applications (Cloud 2015)

  • Calibration of SWAT models using the Cloud (J. Env Model/SW 2014)

  • Comprehensive Elastic Resource Management to Ensure Predictable Performance for Scientific Applications on Public IaaS Clouds (UCC 2014)

  • CloudDRN: A Lightweight, End-to-End System for Sharing Distributed Research Data in the Cloud (eScience 2013)

  • Scaling and Scheduling to Maximize Application Performance within Budget Constraints in Cloud Workflows (IPDPS 2013)

  • Calibration of Watershed Models using Cloud Computing (eScience 2012)

  • A Performance Study on the VM Startup Time in the Cloud (Cloud 2012)

  • A Model and Decision Procedure for Data Storage in Cloud Computing (CCGrid 2012)

  • Assessing the Value of Cloudbursting: A Case Study of Satellite Image Processing on Windows Azure (eScience 2011)

  • Auto-scaling to minimize cost and meet application deadlines in cloud workflows (Supercomputing 2011)

  • Data-intensive science: The Terapixel and MODISAzure projects (Int J HPCA 2011)

  • An automated approach to cloud storage service selection (Science Cloud 2011)

  • Predictable high-performance computing using feedback control and admission control (J IEEE Trans Par Dis 2011)

  • Early observations on the performance of Windows Azure (J Sci Prog 2011)

  • Fault tolerance and scaling in e-Science cloud applications: observations from the continuing development of MODISAzure (eScience 2010)

  • CSAL: A Cloud Storage Abstraction Layer to Enable Portable Cloud Applications (CloudCom 2010)

  • Cloud auto-scaling with deadline and budget constraints (Grid 2010)

  • A data-centered collaboration portal to support global carbon-flux analysis (J Con Comp:Prac Exp 2010)

  • eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform (IPDPS 2010)

  • Fluxdata.org: Publication and Curation of Shared Scientific Climate and Earth Sciences Data (eScience 2009)

  • A Quantitative Analysis of High Performance Computing with Amazon’s EC2 Infrastructure: The Death of the Local Cluster? (Grid 2009)

  • Self-Tuning Virtual Machines for Predictable eScience (CCGrid 2009)

  • Feedback-controlled resource sharing for predictable eScience (Supercomputing 2008)

Projects

  • AmeriFlux

    The AmeriFlux network is a community of sites and scientists measuring ecosystem carbon, water, and energy fluxes across the Americas, committed to producing and sharing high-quality eddy covariance data. AmeriFlux investigators and modelers work together to generate understanding of terrestrial ecosystems in a changing world. We are part of the data team, which is led by Deb Agarwal of LBL.

  • Cloud Auto-scaling

    Public cloud computing offers almost every capability one could ask for; how can we automatically choose, provision, and manage cloud resources to match our requirements?

  • Fluxnet

    Today, eddy covariance flux measurements of carbon, water vapor, and energy exchange are made routinely across a confederation of regional networks in North, Central, and South America, Europe, Asia, Africa, and Australia, forming a global network called FLUXNET. We are part of the infrastructure team, which is led by Dennis Baldocchi of UC-Berkeley.

Teaching

I teach the following courses at the University of Virginia:

  • CS6501: Cloud Computing (graduate seminar) (Fall 2017)
  • CS4740: Cloud Computing (Spring 2017)
  • CS4740: Cloud Computing (Fall 2016)
  • CS4740: Cloud Computing (Spring 2016)
  • CS6501: Cloud and Big Data (graduate seminar) (Fall 2015)
  • CS4740: Cloud Computing (Spring 2015)
  • CS6456: Operating Systems (Fall 2014)

Contact