
Data Science: Find Datasets

Resources for scientific systems, methods, and processes for data scientists.

General

[Image: sample text of UFO dataset]

How to cite a dataset

APA

O’Donohue, W. (2017). Content analysis of undergraduate psychology textbooks (ICPSR 21600; Version V1) [Data set]. ICPSR. https://doi.org/10.3886/ICPSR36966.v1

Parenthetical citation: (O’Donohue, 2017)
Narrative citation: O’Donohue (2017)

Learn more at the APA site.
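The reference and in-text patterns above can be assembled from a dataset's metadata fields. The following is a minimal Python sketch, not an official APA tool: the helper names (format_apa_dataset, parenthetical, narrative) and their parameters are hypothetical, and the values are copied from the example above. Plain text cannot show the italics that APA applies to the dataset title.

```python
def format_apa_dataset(author, year, title, identifier, version, publisher, doi_url):
    """Assemble an APA-style dataset reference (hypothetical helper)."""
    return (
        f"{author} ({year}). {title} ({identifier}; Version {version}) "
        f"[Data set]. {publisher}. {doi_url}"
    )


def parenthetical(surname, year):
    """In-text citation placed inside parentheses."""
    return f"({surname}, {year})"


def narrative(surname, year):
    """In-text citation where the author is part of the sentence."""
    return f"{surname} ({year})"


# Values taken from the example reference in this guide.
reference = format_apa_dataset(
    author="O’Donohue, W.",
    year=2017,
    title="Content analysis of undergraduate psychology textbooks",
    identifier="ICPSR 21600",
    version="V1",
    publisher="ICPSR",
    doi_url="https://doi.org/10.3886/ICPSR36966.v1",
)

print(reference)
print(parenthetical("O’Donohue", 2017))
print(narrative("O’Donohue", 2017))
```

Always check the output against the APA guidance for the specific dataset, since some records lack a version or identifier.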

Chicago Style

Chicago style does not discuss datasets, but it does cover citing statistics:
https://libguides.webster.edu/data/chicago

MLA

MLA style does not discuss datasets or statistics.

Learn more at the “Quick Guide to Data Citation.”

APIs

API = Application Programming Interface

An API is a software intermediary that allows two applications to talk to each other.
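In practice, "talking" to a data API usually means sending a request URL with query parameters and reading back a structured response, often JSON. The sketch below illustrates that exchange in Python under stated assumptions: the endpoint https://api.example.org/v1/datasets, the parameter names, and the sample response are all hypothetical and do not describe any real service. No network call is made; the response is a hard-coded string.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/v1/datasets"


def build_request_url(base_url, **params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


# Step 1: our application describes what it wants as a URL.
url = build_request_url(BASE_URL, topic="ufo", year=2017)

# Step 2: the other application answers with structured text.
# This JSON string is a made-up stand-in for a real response body.
sample_response = (
    '{"count": 2, "results": '
    '[{"title": "Sightings A"}, {"title": "Sightings B"}]}'
)

# Step 3: our application parses the answer into Python objects.
data = json.loads(sample_response)
titles = [item["title"] for item in data["results"]]

print(url)
print(titles)
```

With a real service, step 2 would be an actual HTTP request (for example with urllib.request.urlopen from the standard library), and the provider's documentation would define the URL, parameters, and any required API key.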

 

Business / Economics

Education

Evaluating Data

Factors to Consider When Evaluating Statistics

Source

Who collected it?
Was it an individual or organization or agency? 
The data source and the reporter or citer are not always the same. For example, advocacy organizations often publish data that were produced by some other organization. When feasible, it is best to go to the original source (or at least know and evaluate the source).
If the data are repackaged, is there proper documentation to lead you to the primary source? Would it be useful to get more information from the primary source? Could there be anything missing from the secondary version?

Authority

How widely known or cited is the producer? Who else uses these data?
Is the measure or producer contested?
What are the credentials of the data producer?
If an individual, are they an expert on the subject?
If an individual, what organizations are they associated with? Could that association affect the work?

Objectivity & Purpose

Who sponsored the production of these data?
What was the purpose of the collection/study?

Who was the intended audience for or users of the data?

Was it collected as part of the mission of an organization? Or for advocacy? Or for business purposes?

Currency

When were the data collected? Collection is not always close to release or publication -- there is often a time lag between collection and reporting because of the time required to analyze the data.
Are these the newest figures? Sometimes the newest available figures are a few years old. That is okay, as long as you can verify that there isn't something newer.

Collection Methods & Completeness

How were the data collected: by count, measurement, or estimation?
Even a reputable source and collection method can introduce bias. For example, crime data come from many sources, from victim reports to arrest records, and each source captures a different slice of events.

If a survey, what was the total sample size -- how does that compare to the size of the population it is supposed to represent?

If a survey, what methods were used to select the population included? How was the total population sampled?

If a survey, what was the response rate?

What populations were included? Excluded?
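Two of the survey questions above come down to simple arithmetic. The Python sketch below uses one common, simplified definition of each measure (completed responses divided by people invited, and people invited divided by the population size). The function names and all of the numbers are invented for illustration; survey organizations often use more detailed definitions of response rate.

```python
def response_rate(completed, invited):
    """Share of invited people who completed the survey."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return completed / invited


def sampling_fraction(sample_size, population_size):
    """Share of the target population that was sampled."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return sample_size / population_size


# Made-up figures for a hypothetical survey.
population = 100_000   # people the survey is meant to represent
invited = 2_000        # people selected and contacted
completed = 500        # people who actually answered

rate = response_rate(completed, invited)
fraction = sampling_fraction(invited, population)

print(f"Response rate: {rate:.1%}")
print(f"Sampling fraction: {fraction:.1%}")
```

A low response rate does not by itself make the data unusable, but it is a prompt to ask whether the people who answered differ from those who did not.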

Consistency & Verification

Do other sources provide similar numbers?
Can the numbers be verified?
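One rough way to check whether two sources "provide similar numbers" is to compare their relative difference against a tolerance. The Python sketch below is an illustration only: the function names, the 5% default tolerance, and the sample figures are assumptions, and the right threshold depends on the subject and on how each figure was produced.

```python
def relative_difference(a, b):
    """Absolute difference between two values, relative to their mean."""
    mean = (a + b) / 2
    if mean == 0:
        raise ValueError("cannot compare two values whose mean is zero")
    return abs(a - b) / mean


def roughly_agree(a, b, tolerance=0.05):
    """True if the two values differ by no more than the tolerance."""
    return relative_difference(a, b) <= tolerance


# Made-up figures reported by two hypothetical sources.
close_pair = roughly_agree(1050, 1000)   # differ by about 4.9%
far_pair = roughly_agree(1200, 1000)     # differ by about 18.2%

print(close_pair, far_pair)
```

When two sources disagree, compare their definitions, time periods, and populations before concluding that either one is wrong.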

 

Requesting Datasets

Requesting Datasets from Centre

Need data? We can help. Faculty at Centre College are encouraged to request or recommend datasets for purchase. 

To make a request, contact the DIV III Librarian, Jazmine Wilson.

Requests

Start by telling us who you are: 

  • Name
  • Your Department or Program
  • Some details about what you plan to use the data for (Research? Your dissertation? Teaching?)

Then, tell us as much as you can about the data: 

  • The publisher / producer of the data
  • The data product name or title
  • Years available and years you want
  • If there are different geographies available, what they are and which you particularly want
  • Price
  • Where you found out about the dataset

The Process

Once you send that information, we'll look into the details and figure out if we can buy the data. We will do our best to get you what you need, but as we make our decision we consider:

  • Price, our budget, and whether it’s an outright purchase or if access is time-limited or ‘leased’
  • Whether the terms of purchase allow multiple users to access the data
  • How many researchers are likely to use the data product
  • Technical support required

For example, data that cost $20,000 for one year’s access and can only be used by one person are unlikely to be purchased.

Government

 

Health

Religion

Social Media

Social media data can be difficult to collect because of security measures put in place by individual companies. In addition to technological barriers, there are ethical considerations in using social media data. You can read more about these issues here: Social Media Research: Ethical Guidance for Researchers at the University of Edinburgh

Below are datasets and tools to help you mine social media for raw data.

Physics

Biology

Social Science

Transportation

Unusual

Sports