
Research Data Management

Welcome

This guide lays out practical considerations and information to help you manage your research data throughout its life cycle, including the steps you will take to collect, safeguard, archive, and make available the data used for the research in question.

Many major funding organizations, such as NSF, NIH, and NEH, now require applicants to include a Data Management Plan as part of their application. These plans outline the data management practices that you will apply throughout the course of your grant. You can see some more background on this issue, or get started by selecting a tab at the left of the page.

What is research data?

Research data is any information that has been collected, observed, generated or created to validate original research findings.

Although usually digital, research data also includes non-digital formats such as laboratory notebooks and diaries.

Research data can take many forms. It might be:

  • documents, spreadsheets
  • laboratory notebooks, field notebooks, diaries
  • questionnaires, transcripts, codebooks
  • audiotapes, videotapes
  • photographs, films
  • test responses
  • slides, artifacts, specimens, samples
  • collections of digital outputs
  • data files
  • database contents (video, audio, text, images)
  • models, algorithms, scripts
  • contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
  • methodologies and workflows
  • standard operating procedures and protocols

Research data can be generated for different purposes and through different processes.

  • Observational data is captured in real time and is usually irreplaceable. Examples include sensor data, survey data, sample data, and neuro-images.
  • Experimental data is captured from lab equipment. It is often reproducible, but this can be expensive. Examples of experimental data are gene sequences, chromatograms, and toroid magnetic field data.
  • Simulation data is generated from test models, where the model and its metadata are more important than the output data. Examples include climate models and economic models.
  • Derived or compiled data has been transformed from pre-existing data points. It is reproducible if lost, but this would be expensive. Examples are data mining, compiled databases, and 3D models.
  • Reference or canonical data is a static or organic collection of smaller (peer-reviewed) datasets, usually published and curated. Examples include gene sequence databanks, chemical structures, and spatial data portals.

What is research data management?

Research data management (or RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the everyday management of research data during the lifetime of a research project (for example, using consistent file naming conventions). It also involves decisions about how data will be preserved and shared after the project is completed (for example, depositing the data in a repository for long-term archiving and access). A data management plan (or DMP) is a formal document that outlines how data will be handled during and after a research project.
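
As an illustration of the file naming example mentioned above, the following is a minimal Python sketch of one possible convention. The pattern used here (project, description, date, and version, separated by underscores) and the function names build_filename and follows_convention are assumptions made for illustration only; they are not a standard required by any funder. Adapt the parts and their order to whatever your project agrees on.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not a funder requirement):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
# Words inside the project and description parts are joined with hyphens.
FILENAME_PATTERN = re.compile(
    r"^[a-z0-9]+(?:-[a-z0-9]+)*"   # project
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"   # description
    r"_\d{8}"                      # date as YYYYMMDD
    r"_v\d{2}"                     # two-digit version number
    r"\.[a-z0-9]+$"                # file extension
)


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Assemble a file name that follows the hypothetical convention."""
    return (
        f"{slugify(project)}_{slugify(description)}_"
        f"{created:%Y%m%d}_v{version:02d}.{extension.lower().lstrip('.')}"
    )


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the hypothetical convention."""
    return FILENAME_PATTERN.match(filename) is not None


if __name__ == "__main__":
    name = build_filename("Soil Study", "Field Notes", date(2024, 3, 15), 2, ".CSV")
    print(name, follows_convention(name))
    print(follows_convention("Final data (new).xlsx"))
```

Putting the date in year-month-day order means that files sort chronologically by name, and a fixed-width version number keeps versions in order as well. A check like follows_convention could be run over a project folder from time to time to spot files that drift from the agreed pattern.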

There are a host of reasons why research data management is important:

  • Data, like journal articles and books, is a scholarly product.
  • Data (especially digital data) is fragile and easily lost.
  • There are growing research data requirements imposed by funders and publishers.
  • Research data management saves time and resources in the long run.
  • Good management helps to prevent errors and increases the quality of your analyses.
  • Well-managed and accessible data allows others to validate and replicate findings.
  • Research data management facilitates sharing of research data and, when shared, data can lead to valuable discoveries by others outside of the original research team.