Research, Tools & Tips

Open Data 101

By Katie Wolf, Science and Technology Librarian

What is Open Data?

Open Data, as a concept, is the idea that some data should be available to everyone and free to use without restrictions – whether from patents, copyright, or other mechanisms of control. It is part of the larger open movement in technology, which also includes open source software/hardware, open educational resources (check out our post on OERs!), open access content, and the open web.

Data has to meet a few specifications to be considered “open”:

  • It has to be accessible, usable, and shareable by everyone – this can mean that there are no restrictions at all for use, or that the highest restriction is an attribution or share-alike requirement. 
  • It has to be easy to access and use – this means using easy-to-understand formats, and making sure that it’s easy to find out where the data came from.
  • Open data CAN be big data, but it doesn’t have to be! Any dataset, no matter how small, can contribute to the world of open data.
  • Open data is NOT shared data – just because data is readily shared between private companies doesn’t mean it’s open.
  • Open data is NOT private data – there is a lot of data out there that contains private information. The Open Data Movement specifically excludes this type of data from its goals for open data.
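
For readers who think in code, the checklist above can be sketched as a small Python function. This is only an illustration: the Dataset fields, the license categories, and the two example datasets are hypothetical names invented for this sketch, not part of any official open data standard.

```python
from dataclasses import dataclass

# Hypothetical license categories, based on the criteria listed above:
# no restrictions at all, or at most an attribution or share-alike requirement.
OPEN_LICENSE_TERMS = {"none", "attribution", "share-alike"}


@dataclass
class Dataset:
    """Illustrative description of a dataset (all field names are assumptions)."""
    name: str
    license_terms: str           # strictest requirement placed on reuse
    publicly_accessible: bool    # anyone can get it, not just partner organizations
    source_documented: bool      # it is easy to find out where the data came from
    contains_private_info: bool  # private data is excluded from open data


def is_open(ds: Dataset) -> bool:
    """Return True only if the dataset meets every criterion in the checklist."""
    return (
        ds.license_terms in OPEN_LICENSE_TERMS
        and ds.publicly_accessible
        and ds.source_documented
        and not ds.contains_private_info
    )


# Two made-up examples: a small public survey, and a feed shared only between companies.
park_survey = Dataset("park survey", "attribution", True, True, False)
partner_feed = Dataset("partner feed", "none", False, True, False)

print(is_open(park_survey))
print(is_open(partner_feed))
```

Note that size never appears in the check: a tiny dataset passes as long as it is accessible, documented, and free of private information, while a dataset that is merely shared between companies does not.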

Beyond these guidelines, open data can be anything: geographic and mapping data, mathematical data, chemical compounds, and text data, for example.

Where is Open Data Used?

Open data is useful across a wide range of fields. It’s often used for public health and planning projects. Data science benefits greatly from open data, including for testing machine learning algorithms, and work on data preservation and reproducibility relies on it.

Projects like the Human Genome Project (which led to the Bermuda Principles) have greatly advanced efforts to make data public and shareable, and institutions worldwide have committed to making their data accessible and usable. There has been a large-scale push for government data to be made available – which has led to government data being open from the federal level all the way down to the highly localized level of Central Park data (check out the Squirrel Census for some fun).

How to Find Open Data

Open data exists at a lot of different levels, and not all data is good data. 

Government entities are some of the most reliable places to check. The U.S. Census Bureau publishes its data, as does the Centers for Disease Control and Prevention. Federal institutions aren’t the only ones who publish data – New York City has a very active open data community. Be sure to check out ongoing projects using NYC data for some inspiration. If you’re interested in a certain area, whether it’s a country or a city, check to see if there’s open data being published – there’s a good chance there is!

National science institutions are also great places to search for open data, especially if they receive public funding. The NSF maintains a repository of all the data gathered using NSF funding. The Sloan Digital Sky Survey releases its data, as does CERN.

If the humanities are more your speed, museums and libraries are often at the forefront of sharing data. The Metropolitan Museum of Art makes its collection and exhibition data available to the public, as do MoMA and the Rijksmuseum (and many more!). The New York Public Library has a public API that lets users access metadata from its digital collections. Looking at public institutions such as these is a safe way to get good, reliable data.

You can also check out Fordham’s Open Dataset Research Guide for a quick list of reliable open datasets.
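
Since not all data is good data, it can help to take a quick first look at a dataset after downloading it. Many open datasets are offered as CSV files, so here is a minimal Python sketch of that first look. It uses only the standard library; the summarize_csv function name and the sample rows are made-up placeholders for illustration and do not come from any real dataset.

```python
import csv
import io


def summarize_csv(text: str) -> dict:
    """Report the column names, the row count, and the blank cells per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    blanks = {col: 0 for col in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for col in columns:
            # Treat missing or whitespace-only values as blank.
            if not (row.get(col) or "").strip():
                blanks[col] += 1
    return {"columns": columns, "rows": row_count, "blank_cells": blanks}


# Made-up placeholder rows, standing in for the contents of a downloaded file.
sample = "id,borough,count\n1,Bronx,12\n2,,7\n3,Queens,\n"
summary = summarize_csv(sample)
print(summary)
```

For a real file, you could read it with open(path, newline="") and pass the text to the same function. A column with many blank cells, or a row count far from what the publisher describes, is a sign to read the dataset’s documentation before relying on it.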

Get Support from the Library

Open data is becoming more prominent in many fields, and the ideas behind open data (and open content in general) are becoming more important to the conversations surrounding research. Take some time to explore the world of open data, whether you need an interesting dataset for a project or want to become part of the open content discussion at large!

If you have any questions about open data, please feel free to reach out to the library. Katie Wolf, the Science & Technology Librarian (kwolf9@fordham.edu), is happy to help!