WDCloud: An End to End System for Large-Scale Watershed Delineation on Cloud (Big Data-Geosciences 2015)


Watershed delineation is the process of computing the drainage area for a point on the land surface, and it is a critical step in hydrologic and water resources analysis. However, existing watershed delineation tools remain insufficient for hydrologists and watershed researchers because they lack essential capabilities, such as fully leveraging scalable, high-performance computing infrastructure (the public cloud) and providing predictable performance for delineation tasks. To address these problems, this paper reports on WDCloud, a system for large-scale watershed delineation on the public cloud. In the design and implementation of WDCloud, we employ three main approaches: 1) an automated catchment search mechanism for a public data set, 2) three performance improvement strategies (data-reuse, parallel-union, and MapReduce), and 3) a local linear regression-based execution time estimator for watershed delineation. Moreover, WDCloud makes extensive use of several compute and storage capabilities from Amazon Web Services to maximize the performance, scalability, and elasticity of the watershed delineation system. Our evaluation focuses on two main aspects of WDCloud: the performance improvement for watershed delineation achieved by the three strategies, and the accuracy of the local linear regression in estimating watershed delineation time. The evaluation results show that WDCloud achieves speed-ups of 18x–111x for delineating watersheds of any scale in the contiguous United States compared to commodity laptop environments, and predicts watershed delineation execution time with 85.6% accuracy, which is 23%–43% higher than other state-of-the-art approaches.
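
To make the idea of a local linear regression-based execution time estimator more concrete, the following is a minimal sketch and not the WDCloud implementation. It assumes, purely for illustration, that each past delineation is recorded as a pair (number of catchments, execution time in seconds) and that a prediction is made by fitting an ordinary least-squares line to the k nearest past runs. The function name `local_linear_predict`, the single feature, and the choice of k are all hypothetical; the abstract does not specify them.

```python
from typing import List, Tuple


def local_linear_predict(
    history: List[Tuple[float, float]], query: float, k: int = 3
) -> float:
    """Estimate a value at `query` from a line fitted to the k nearest points.

    `history` holds (feature, observed_time) pairs. Both the feature
    (assumed here to be a catchment count) and k are illustrative choices.
    """
    if len(history) < 2:
        raise ValueError("need at least two past observations")

    # Keep only the k observations whose feature is closest to the query.
    neighbors = sorted(history, key=lambda p: abs(p[0] - query))[: max(2, k)]

    n = len(neighbors)
    mean_x = sum(x for x, _ in neighbors) / n
    mean_y = sum(y for _, y in neighbors) / n

    # Ordinary least squares on the local neighborhood.
    sxx = sum((x - mean_x) ** 2 for x, _ in neighbors)
    if sxx == 0:
        # All neighbors share the same feature value; fall back to their mean.
        return mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in neighbors)
    slope = sxy / sxx

    return mean_y + slope * (query - mean_x)


# Hypothetical history: small and large watersheds scale differently.
past_runs = [
    (10, 2.0), (20, 4.0), (30, 6.0),
    (1000, 500.0), (1100, 550.0), (1200, 600.0),
]
small_estimate = local_linear_predict(past_runs, 25)
large_estimate = local_linear_predict(past_runs, 1050)
```

Fitting only on nearby observations lets the estimate follow different trends for small and large watersheds, which a single global line over all past runs would average away.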

IEEE Big Data in the Geosciences Workshop