New Insight Into California's Drought Through Open Data
Dr. Emily Read, Data Scientist, U.S. Geological Survey, et al.
Historically unprecedented drought in California has brought water issues to the forefront of the nation’s attention. Crucial investigations that concern water policy, management, and research, in turn, require extensive information about the quality and quantity of California’s water. Unfortunately, key sources of pertinent data are unevenly distributed and frequently hard to find. Thankfully, the vital importance of integrating water data across federal, state, tribal, academic, and private entities has recently been recognized and addressed through federal initiatives such as the Climate Data Initiative of President Obama’s Climate Action Plan and the Advisory Committee on Water Information’s Open Water Data Initiative. Here, we demonstrate an application of integrated open water data, visualized and made available online using open source software, for the purpose of exploring the impact of the current California drought. Our collaborative approach and technical tools enabled a rapid, distributed development process. Many positive outcomes have resulted: the application received recognition within and outside of the Federal Government, inspired others to visualize open water data, spurred new collaborations for our group, and strengthened the collaborative relationships within the team of developers. In this article, we describe the technical tools and collaborative process that enabled the success of the application.
The project was initiated as a team-building activity for the cross-disciplinary Center for Integrated Data Analytics (CIDA) within the U.S. Geological Survey Office of Water Information (http://cida.usgs.gov/). An open invitation to CIDA employees brought in 15 (of 35) team members, who participated voluntarily and in addition to their regular job responsibilities.
We set out to tell a story about water issues in the U.S. through a visualization of water data from multiple sources and disciplines. Given the collective team expertise in development of web applications and data warehouses (see http://cida.usgs.gov/products.html), we had a broad knowledge of water data sources and the technical expertise to bring disparate water information together in a standardized and automated fashion. From the outset, we limited ourselves to using only publicly available, web-accessible data (“open data”, hereafter), and sought to develop and release software (or “code”) openly. The visual impact of the application was also a high priority. To encourage data exploration and reveal additional information, we sought to integrate data in simple, visually appealing graphics and to optimize interactivity with the user.
Our core team was intentionally interdisciplinary, including experts in water resources, software engineering, computer science, data science, and geography. The majority of the 16 contributors were employees of the U.S. Geological Survey’s CIDA. As development progressed, additional contributors were recruited to meet specific project needs. For example, after the bulk of development was complete, an external reviewer recommended including remotely sensed reservoir data. Subsequently, a geographer joined the development team to contribute Landsat-derived images. Engagement by team members was based on individuals’ availability and project needs. For example, the system administrator set up servers at the outset of development, but was involved only periodically and as needed after that.
The website was designed to be automatically updated with pre-generated graphics and text depicting the most current observations available. The site was fully encompassed within a web-based source code repository and required no external dependencies, with the exception of the OpenStreetMap background cartography (© OpenStreetMap contributors). We used static content that is served directly by a web server, without the need for complex server-side scripting or database support.
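The pre-generated, static approach described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names, the file name latest.json, and the row layout (site, ISO date, value) are all hypothetical, chosen only to show how a scheduled script might write a file that a plain web server can then serve without any database.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Hypothetical row layout: (site identifier, ISO 8601 date string, observed value)
Row = Tuple[str, str, float]


def latest_per_site(rows: List[Row]) -> Dict[str, Dict[str, Any]]:
    """Keep the most recent observation for each site.

    ISO 8601 date strings (YYYY-MM-DD) sort chronologically as plain text,
    so a string comparison is enough here.
    """
    latest: Dict[str, Dict[str, Any]] = {}
    for site, iso_date, value in rows:
        if site not in latest or iso_date > latest[site]["date"]:
            latest[site] = {"date": iso_date, "value": value}
    return latest


def write_static_json(rows: List[Row], out_dir: Path) -> Path:
    """Write the latest observations to a static JSON file.

    The output file name (latest.json) is an assumption made for this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "latest.json"
    target.write_text(json.dumps(latest_per_site(rows), indent=2, sort_keys=True))
    return target
```

Run on a schedule, a script like this would overwrite the file with the newest observations, and the page's client-side code would simply fetch it as a static resource.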
The website’s content was based entirely on open data, generated using scientific computing tools in the R and Python languages. We wrote a collection of scripts to automate data downloading, processing, and visualization from various sources. Datasets available in machine-readable formats (e.g., comma-separated values, geospatial vector data, and raster images) included: drought data from the U.S. Drought Monitor (http://droughtatlas.unl.edu), snowpack data from SNOTEL (http://www.wcc.nrcs.usda.gov/snow/), streamflow data from USGS NWIS (http://waterdata.usgs.gov/nwis/sw), and satellite imagery from Landsat (http://earthexplorer.usgs.gov). We were unable to access machine-readable sources for reservoir storage levels, so R code was written to extract, or ‘scrape’, data from the California Data Exchange Center website (http://cdec.water.ca.gov/). Only minimal data processing was required to prepare for visualization: subsetting the drought layer to California, removing stream gage sites with missing data, and normalizing reservoir storage volumes to percent of total capacity. In addition, we processed USGS/NASA Landsat-5 and -8 satellite images acquired in August 2011 and 2014 into binary water/non-water maps at Trinity Lake and Shasta Reservoir. At each site, we measured and visualized the difference in water surface area between the two dates.
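As a rough illustration of the processing steps just described, the following Python sketch shows one way they could be expressed. It is not taken from the project's repository: the function names and data layouts are hypothetical, and the 30 m pixel size is an assumption based on the nominal resolution of Landsat multispectral bands.

```python
from typing import Dict, List, Optional


def percent_of_capacity(storage: float, capacity: float) -> float:
    """Normalize a reservoir's storage volume to percent of total capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * storage / capacity


def drop_incomplete_gages(
    records: Dict[str, List[Optional[float]]]
) -> Dict[str, List[Optional[float]]]:
    """Keep only stream gage sites whose series has no missing (None) values."""
    return {
        site: values
        for site, values in records.items()
        if all(v is not None for v in values)
    }


def water_area_km2(mask: List[List[int]], pixel_size_m: float = 30.0) -> float:
    """Surface area of water in a binary grid (1 = water, 0 = non-water).

    pixel_size_m is the side length of one square pixel; 30 m is assumed here.
    """
    water_pixels = sum(sum(row) for row in mask)
    return water_pixels * pixel_size_m ** 2 / 1e6


def area_change_km2(mask_before: List[List[int]], mask_after: List[List[int]]) -> float:
    """Change in water surface area between two dates (negative = shrinkage)."""
    return water_area_km2(mask_after) - water_area_km2(mask_before)
```

For example, a hypothetical reservoir holding 250 units of a 1,000-unit capacity would be reported at 25 percent, and comparing binary masks for two dates yields the change in water surface area, negative where the reservoir shrank.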
Our team was inspired by a number of infographic styles, including Hans Rosling's Health and Wealth of Nations figure (http://www.gapminder.org/world/#;example=75). We sought to keep the figures as simple as possible, and used interactivity to reveal additional information that would otherwise have made the figures harder for the user to interpret.
TIMELINE OF DEVELOPMENT
We developed the application over several months during the late summer and fall of 2014 (Figure 1). All team members contributed to this effort outside of routine job responsibilities and were motivated by a visualization contest with a deadline approximately three weeks from the start of the collaboration. Two months after we entered ‘The Vizzies’ contest, co-sponsored by the National Science Foundation and Popular Science magazine (https://www.nsf.gov/news/special_reports/scivis/index.jsp), the application was formally released, accompanied by a U.S. Geological Survey press release (http://www.usgs.gov/newsroom/article.asp?ID=4069#.VViBaNNVhBc) and Science Feature (http://www.usgs.gov/blogs/features/usgs_top_story/data-for-climate-resilience/).
Figure 1. Time series of unique website visitors per day (top panel; note broken axis) and histogram of GitHub code contributions per week (bottom panel) from September 2014 through May 2015. Significant project milestones are noted for reference.
COLLABORATIVE TOOLS AND PROCESS
Our team used an iterative, collaborative process to develop the topic, theme, and components of the California Drought visualization. After several brainstorming sessions on relevant water issues, we achieved consensus on drought in the southwest U.S. as a topic. We then brainstormed interesting ways to integrate water data to convey the effect of drought in the region. We translated these ideas to storyboards, and then encoded storyboards into software.
Once software development began, we used web-based tools to assist in project management and collaboration. We used GitHub (https://github.com), a web-based code repository hosting service that relies on the Git (http://git-scm.com/) version control system, for both project management-related communication and for version control and software management. GitHub is designed specifically for collaborative software development, and we met several project needs by using Git’s distributed version control (https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control#Distributed-Version-Control-Systems). Here, we use the term ‘version’ to refer to individual contributions by a team member resulting in revisions to the group’s source repository. Revision history, tracked through the application’s GitHub repository (https://github.com/USGS-CIDA/CIDA-Viz), allows team members and the public to access current and previous versions of the software, and to revert to prior versions, if needed. A challenge of any collaborative coding exercise is managing simultaneous software edits without losing functionality. GitHub tracks such ‘conflicts’ between software versions modified by different team members, and allows contributors to review these conflicts prior to including modifications within the master copy of the software. We intentionally developed and maintained the California Drought visualization code in an open manner by allowing anyone outside of the development team to copy, or ‘fork’, the repository for other uses. Finally, because GitHub is web-based, all team members were able to contribute to the effort in real time despite being distributed across the country.
GitHub’s ‘Issue’ tracking functionality supported both within-team project management and external, public-facing communication. Because our software repository was public, anyone with a web connection could view the repository and its contents, and anyone with a GitHub account could use ‘Issues’ to ask questions or suggest changes. Once identified by a team member, ‘Issues’ were used to assign, communicate about, and resolve tasks, bugs, and improvements. Likewise for external communication, ‘Issues’ were used to receive and respond to questions related to data sources or methods. An example of an external inquiry about the data analysis methods can be found at https://github.com/USGS-CIDA/CIDA-Viz/issues/358. Analytical methods, data attribution, and other ancillary information associated with specific components of the visualization were documented in text files associated with the software repository.
The California Drought Visualized with Open Data increased the visibility of the effects of long-term drought conditions on streams and reservoirs. The visualization inspired other open water data visualizations (e.g., by Harvard University computer science students: http://vliuatenphasedotcom.github.io/Process_Book_Steineman_Liu.pdf) and, perhaps most importantly, demonstrated the power of bringing together disparate water datasets and disciplinary skills. Tens of thousands of unique visitors viewed the website, more than 10 new GitHub ‘forks’ were created, and the website was mentioned in scientific (e.g., The GIS Lounge: http://www.gislounge.com/mapping-california-drought-open-data/), federal (e.g., NASA Landsat: http://landsat.gsfc.nasa.gov/?p=9413), and public interest articles and blogs (e.g., Landscape Architecture Magazine: http://landscapearchitecturemagazine.org/tag/california-drought/, and California State Library: http://www.library.ca.gov/sitn/crb/docs/20141217.pdf). In March of 2015, the application was recognized as a finalist entry in the international visualization competition, the Vizzies (https://review.wizehive.com/voting/nsfvizziesgallery/27428). The project catalyzed new collaborations for our team: several of us are now working on another interagency open water data collaboration related to drought and water use in the Lower Colorado River Basin. We demonstrated the importance of the Federal Open Data Initiative, expanded professional connections, and strengthened collaborative bonds within the core development team.
Ensuring access to clean and ample freshwater is one of the most pressing environmental challenges California faces in the 21st century. The ability to collect, aggregate, and synthesize disparate water information is essential to conserving, protecting, and remediating water resources. Free and open data and open-access code support scientific transparency, allow reproducibility, and encourage public engagement. This project shows the value of using open data and open-access code to gain insight into California’s drought, and provides a framework for future collaborative research and visualization work. Without the openness of the data and code, many of the positive outcomes of this project might not have been achieved. The technological tools now exist to make software and data easily open and accessible. We suggest a similarly open and collaborative framework for future visualization work because of the many benefits to the community that result.
Please check out the Summer 2015 BAAMA Journal map gallery for snapshots of the drought maps.
Emily Read, Mary Bucknell, Megan Hines, Jim Kreft, Jessica Lucido, Jordan Read, Carl Schroedl, Dave Sibley, Shirley Stephan, Ivan Suftin, Phethala Thongsavanh, Jamon Van Den Hoek, Jordan Walker, Marty Wernimont, Luke Winslow, and Andrew Yan
Summer 2015 Volume 8 Issue 1