Problem: An environmental consulting firm in Santa Monica, CA needed over a million rows of data per year, spanning two decades, for a water contamination analysis. The State of Louisiana expected the firm to retrieve the required data from its website at (sample
data for one location).
Solution: After a brief data analysis, I did the following:
- Developed the required relational table structure in SQL Server 2012 to store the downloaded data.
- Developed a data-scraping module in ASP.NET/C# that uses XPath to collect the drinking water data.
- Cleaned up the downloaded data and loaded it into the tables.
The module could readily be reused to handle the same download process for the remaining 49 states.
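
The scrape, clean, and load steps can be illustrated with a minimal sketch. It is not the original module: that was written in ASP.NET/C# against SQL Server 2012, while this sketch uses Python's standard library, with an in-memory SQLite database standing in for SQL Server. The sample page markup, the table name sample_result, the column names, and the "ND" marker for non-numeric results are all hypothetical, since the source does not show the real page layout or schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample of a results page (assumption: the real
# page layout is not shown in the source).
SAMPLE_PAGE = """
<html><body>
<table id="results">
  <tr><th>Location</th><th>Date</th><th>Analyte</th><th>Result</th></tr>
  <tr><td> Site A </td><td>2012-01-15</td><td>Arsenic</td><td> 0.004 </td></tr>
  <tr><td>Site A</td><td>2012-02-15</td><td>Arsenic</td><td>ND</td></tr>
  <tr><td>Site A</td><td>2012-03-15</td><td>Lead</td><td>0.010</td></tr>
</table>
</body></html>
"""


def scrape_rows(page):
    """Select the data rows of the results table with an XPath expression."""
    root = ET.fromstring(page.strip())
    rows = []
    for tr in root.findall(".//table[@id='results']/tr"):
        cells = [(td.text or "") for td in tr.findall("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows


def clean_row(cells):
    """Trim whitespace and convert the result to a number where possible."""
    location, sample_date, analyte, result = (c.strip() for c in cells)
    try:
        value = float(result)
    except ValueError:
        value = None  # non-numeric entries such as "ND" are stored as NULL
    return (location, sample_date, analyte, value)


def load(rows):
    """Create the (hypothetical) table and insert the cleaned rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for SQL Server 2012
    conn.execute(
        "CREATE TABLE sample_result ("
        "location TEXT NOT NULL, sample_date TEXT NOT NULL, "
        "analyte TEXT NOT NULL, result REAL)"
    )
    conn.executemany("INSERT INTO sample_result VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    cleaned = [clean_row(r) for r in scrape_rows(SAMPLE_PAGE)]
    conn = load(cleaned)
    print(conn.execute("SELECT COUNT(*) FROM sample_result").fetchone()[0])
```

Keeping the three steps as separate functions reflects the reuse point above: for another state's site, only the XPath expression and the cleanup rules would need to change, while the loading step stays the same.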