Exploring Open Data Sets

It’s always fascinating to take a look at the data visualizations and in-depth reports widely available on the web. As an aspiring (or active) data scientist, however, one of the best things you can do to learn about a particular field is to get your own hands dirty.

You never know what kind of information you’ll find; some of our favorite visualizations cover everything from Internet usage to the Human Capital Index, and even Marvel’s Avengers! No topic is off-limits.

To get started with your own analysis, take a look at the sites below, which host open and free data sets for anyone to use.

Some general caveats for working with public data:

  • Always review a data set before analyzing it. Because these data sets are openly shared, some may contain errors or inconsistencies that need to be cleaned up first.
  • Make sure to credit the site as your source of information, and then see what you can contribute to the data science field yourself.
  • Consider the source: Is the hosting site an officially recognized agency, company, or aggregator (think data.gov, UNICEF, or Facebook)? Do other scientists use the site as a source for their research? Is the information up to date?

Use your best judgment. The open accessibility of "open data" is both a blessing and a curse: it allows anyone to take a look, but it also invites accidental data corruption or duplication.
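A quick first pass can catch the duplication and gaps mentioned above before you start analyzing. The following is a minimal sketch, not a complete cleaning workflow. It assumes you have downloaded a data set as a CSV file with a header row. The function name review_csv and the sample rows are hypothetical and exist only for illustration. It counts exact duplicate rows and blank cells per column using only Python's standard library.

```python
import csv
import io
from collections import Counter


def review_csv(text):
    """Report row count, exact duplicate rows, and blank cells per column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    # Represent each row as a tuple in header order so identical rows compare equal
    rows = [tuple(row.get(column) for column in columns) for row in reader]

    # Every extra copy of an identical row counts as one duplicate
    counts = Counter(rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)

    # A cell counts as missing if it is absent (None) or empty/whitespace-only
    missing = {column: 0 for column in columns}
    for row in rows:
        for column, value in zip(columns, row):
            if value is None or value.strip() == "":
                missing[column] += 1

    return {"rows": len(rows), "duplicate_rows": duplicate_rows, "missing_by_column": missing}


# Made-up sample data: one repeated row and one blank value
sample = "region,year,value\nA,2020,10\nA,2020,10\nB,2020,\n"
print(review_csv(sample))
# {'rows': 3, 'duplicate_rows': 1, 'missing_by_column': {'region': 0, 'year': 0, 'value': 1}}
```

For a real download, you might call it as `review_csv(open("dataset.csv", newline="", encoding="utf-8").read())`, where "dataset.csv" is a placeholder file name. Nonzero counts don't mean the data is wrong. They tell you where to look more closely before drawing conclusions.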

With these things in mind, feel free to browse the sites below to see if any particular topic catches your eye. If weather data intrigues you, the NOAA database might be the first place to look. Interested in the NFL? Pro Football Reference has you covered. So select an interesting data set, roll up your sleeves, and see what conclusions you can draw from it using just your curiosity and the tools of data science. Who knows? This may be the project that helps you make your mark.

United States government and demographics:

International government and demographics:

Health:

Science:

Technology and APIs:

Sports and entertainment:

General aggregation sites:

These sites play host to some truly fascinating data sets. We hope you’re able to find one that piques your interest. If you end up creating a data visualization or another condensed representation of your newfound knowledge, let us know. We just might feature your work on our blog!

Are there any key open data sites we’re missing? Let us know in the comments below!