Over the past three months, as part of Google Summer of Code 2017 (GSoC), I worked with the Environmental Data & Governance Initiative (EDGI). EDGI is a network of scientists, professors, and non-profits that came together in response to the Trump administration and the threat it posed to several environmental agencies and public resources. EDGI took that threat as a call to action and has since been working diligently as a team on developing tools and resources to help preserve environmental data.
Over the summer, I became a part of that team.
My project focused on data visualization. I was tasked with creating interactive graphs and models that would help users better visualize the data they were scraping and archiving from the internet. Using D3, I helped create a Coverage Map and a DataRescue Map.
EDGI has been archiving data from public resources since December of last year. Previously, they had been using a tree to show coverage, with each node delineating the URL path and then the file structure. While this was fine at the time, as the data count went from tens to hundreds of thousands, it became messy. There was a disconnect between the information conveyed in the tree and what a user wanted to see.
My first couple of weeks were spent prototyping different models and obtaining feedback. I went through several variations of Bilevel Partitions, Icicle Trees, and whatever this thing is before finally settling on a Sunburst Diagram.
Sunburst Diagrams use concentric rings divided into arcs to show groups or categories, with each ring denoting a level of the hierarchy. Digging further, I found a variation on the diagram called a Sequence Sunburst, built in D3 by Kerry Rodden, which became a solid base for the coverage map.
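To make the idea concrete, here is a minimal sketch of how a list of archived URL paths could be turned into sunburst arcs. It is not the actual EDGI code and does not use D3's own layout. The `CoverageNode` and `Arc` shapes, the function names, and the sample paths are all made up for illustration. Each path segment becomes a node, and each node gets a slice of its parent's angle proportional to how many archived URLs sit at or below it.

```typescript
// Hypothetical shapes for illustration; not EDGI's real coverage schema.
interface CoverageNode {
  name: string;
  count: number; // archived URLs at or below this node
  children: CoverageNode[];
}

interface Arc {
  name: string;
  depth: number;      // ring index, 1 = innermost ring
  startAngle: number; // radians
  endAngle: number;   // radians
}

// Linear search keeps the sketch dependency-free.
function findChild(node: CoverageNode, name: string): CoverageNode | undefined {
  for (const child of node.children) {
    if (child.name === name) return child;
  }
  return undefined;
}

// Build a tree from URL-like paths such as "epa.gov/climate/data.csv".
function buildTree(paths: string[]): CoverageNode {
  const root: CoverageNode = { name: "root", count: 0, children: [] };
  for (const path of paths) {
    let node = root;
    node.count += 1;
    for (const segment of path.split("/").filter((s) => s.length > 0)) {
      let child = findChild(node, segment);
      if (child === undefined) {
        child = { name: segment, count: 0, children: [] };
        node.children.push(child);
      }
      child.count += 1;
      node = child;
    }
  }
  return root;
}

// Give each child a share of its parent's angular span proportional to its count.
function layoutArcs(
  node: CoverageNode,
  start: number,
  end: number,
  depth: number,
  out: Arc[] = []
): Arc[] {
  let cursor = start;
  for (const child of node.children) {
    const span = node.count === 0 ? 0 : ((end - start) * child.count) / node.count;
    out.push({ name: child.name, depth: depth + 1, startAngle: cursor, endAngle: cursor + span });
    layoutArcs(child, cursor, cursor + span, depth + 1, out);
    cursor += span;
  }
  return out;
}

// Made-up sample data: three URLs under epa.gov, one under noaa.gov.
const tree = buildTree([
  "epa.gov/climate/data.csv",
  "epa.gov/climate/report.pdf",
  "epa.gov/air/index.html",
  "noaa.gov/ocean/tides.json",
]);
const arcs = layoutArcs(tree, 0, 2 * Math.PI, 0);
```

With this sample, `epa.gov` covers three quarters of the innermost ring and `noaa.gov` the remaining quarter. In a real D3 setup the angles would typically come from D3's partition layout, and an arc generator would turn them into SVG paths.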
After getting it to correctly convert and process a sample JSON dump of the coverage data, I started polishing the model and turning it into a standalone ReactJS component. Several issues came up during this time that slowed progress, including:
While web monitoring and archiving are a large part of EDGI’s focus, the organization also hosts several DataRescue events around the country with the goal of identifying and preserving different datasets. I always imagined it as a hackathon for archiving data.
Anyway, noticing that there was no effective way to view past events, I created a projection map of the United States with interactive points marking where the events took place. It is also directly integrated with the EDGI Airtable, so the map updates live as new events come up. Working on the model came with its fair share of obstacles as well, including: