Elements of a Data Management Plan
Because some funding agencies do not provide specific guidelines, below is an abbreviated compilation of data management plan elements from several sources including example text. You should review specific guidelines for data management planning from the funding agency with which you are working. See the DMPTool for example templates for various funding sources and to create a DMP. Please see full text example DMPs at the bottom of this page.
- Roles and Responsibilities
- List staff/organizational roles and their responsibilities for carrying out the data management plan (DMP); name specific people where possible. Include a description of time allocations, training requirements, and contributions of non-project staff, as appropriate.
- Indicate when and/or how often adherence to your DMP will be checked and/or demonstrated. Include the names of the person(s) responsible for adherence to your DMP.
- Indicate who/which roles will assume responsibility for carrying out the DMP should personnel changes occur or if the PI leaves the institution. Describe the process for transferring responsibility for the data.
- Indicate who will have primary responsibility for how the data will persist over time when the original personnel are no longer associated with the project.
- "The project will assign a qualified data manager certified in disclosure risk management to act as steward for the data while they are being collected, processed, and analyzed."
- "All research data collected as part of this project is owned by the University. The Principal Investigator of this project will take responsibility for the collection, management, and sharing of the research data."
- "Day-to-day quality assessment will be the responsibility of the Lab Director who in turn is overseen by the Project Director."
- Types of Data
- Provide a short description of the data that will be generated in the research project (e.g., samples, physical collections, software, curriculum materials, and other materials to be produced during the course of the project). Include an estimate of the amount of data and content of the data (if possible).
- In the case of software, contact NC State's Office of Research Commercialization to review considerations for software that you develop as part of your research.
- Describe the data types you will be creating or capturing (e.g., experimental measures, observational or qualitative, model simulation, processed, etc.).
- If data will be created or captured, describe your process.
- If you will be using existing data, describe the source of that data and the relationship between the data you are collecting and the existing data that you are integrating into the project.
- "The associated data types will be captured using X survey software and analyzed using X data analytics tool."
- "Over the course of the project, data will be collected and entered into two relational databases."
- "Over the course of the project, data will be generated from sensors and recorded in X format."
- "This project will produce public-use nationally representative survey data for the United States covering Americans' social backgrounds, enduring political predispositions, social and political values, perceptions and evaluations of groups and candidates, opinions on questions of public policy, and participation in political life."
- "This project will generate data designed to study the prevalence and correlates of DSM III-R psychiatric disorders and patterns and correlates of service utilization for these disorders in a nationally representative sample of over 8000 respondents. The sensitive nature of these data will require that the data be released through a restricted use contract."
- "Few datasets exist that focus on this population in the United States and how their attitudes toward assimilation differ from those of others. The primary resource on this population, [give dataset title here], is inadequate because..."
- "Data have been collected on this topic previously (for example: [add example(s)]). The data collected as part of this project reflect the current time period and historical context. It is possible that several of these datasets, including the data collected here, could be combined to better understand how social processes have unfolded over time."
- "For quantitative data files, the [repository] ensures that missing data codes are defined, that actual data values fall within the range of expected values and that the data are free from wild codes. Processed data files are reviewed by a supervisory staff member before release."
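The quality checks described in the last example (defined missing data codes, values within expected ranges, no wild codes) can be sketched in a few lines of Python. This is a minimal illustration only: the `CODEBOOK` structure, the variable names, and the code values are hypothetical and are not drawn from any particular repository's procedures.

```python
# Hypothetical codebook: for each variable, the expected range of valid
# values and the codes that have been declared to mean "missing".
CODEBOOK = {
    "age": {"valid_range": (18, 99), "missing_codes": {-9}},
    "party_id": {"valid_range": (1, 7), "missing_codes": {-9, -8}},
}


def find_wild_codes(rows, codebook):
    """Return (row_index, variable, value) for every value that is neither
    a declared missing code nor inside the expected range."""
    problems = []
    for i, row in enumerate(rows):
        for var, spec in codebook.items():
            value = row.get(var)
            if value in spec["missing_codes"]:
                continue  # declared missing code, acceptable
            low, high = spec["valid_range"]
            if value is None or not (low <= value <= high):
                problems.append((i, var, value))
    return problems


rows = [
    {"age": 34, "party_id": 3},
    {"age": -9, "party_id": 9},    # -9 is a declared missing code; 9 is a wild code
    {"age": 140, "party_id": -8},  # 140 falls outside the expected range
]

problems = find_wild_codes(rows, CODEBOOK)
print(problems)  # [(1, 'party_id', 9), (2, 'age', 140)]
```

A report like this could be handed to the supervisory reviewer mentioned above before a processed file is released.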
- Data Formats and Metadata
- Indicate which file formats you will use for your data, and why you will use those formats.
- Describe any contextual details (metadata) that are necessary to make the data you capture or collect meaningful to you and others, including details on how you will create or capture this information.
- Describe the form that the metadata will take (i.e., which metadata standards, if any, will be used). Explain why you have chosen particular standards and approaches for metadata and contextual documentation (e.g., recourse to staff expertise, Open Source, accepted domain-local standards, widespread usage). Where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies.
- Describe the set of conventions that you will use for naming data files and folders, and how you intend to manage multiple versions of files.
- See Describing Your Data for more information.
- "Research data will be stored using X file formats. Related files in different formats will be linked by file naming conventions, e.g.,..."
- "Metadata will be generated to describe the data generated in X format and will be stored alongside the data. X metadata standards will be applied during the creation of the metadata."
- "Data will be stored in a CVS system and checked in and out for purposes of versioning. Variables will use a standardized naming convention consisting of a prefix, root, suffix system. Separate files will be managed for the two kinds of records produced: one file for respondents and another file for children with merging routines specified."
- "Data will conform to best practices and standards from the X community." "Internal calibration (for geophysical data), instrument calibrations, duplicate samples and field blanks (for hydrochemical data) will be recorded and tested against collected/recorded data to ensure their validity. Qualitative descriptions (lithological data) will be validated through comparative descriptions of collected materials."
- "The clinical data collected from this project will be documented using CDISC metadata standards." "Digital video data files generated will be processed and submitted to the [repository] in MPEG-4 (.mp4) format."
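As an illustration of the file naming and versioning conventions this section asks you to describe, the sketch below builds and checks file names in Python. The pattern `<project>_<description>_<YYYYMMDD>_v<NN>.<ext>` and the example project name are assumptions made for this example, not a required standard; substitute whatever convention your project or community uses.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, description, day, version, ext):
    """Assemble a file name that follows the convention above."""
    return f"{project}_{description}_{day:%Y%m%d}_v{version:02d}.{ext}"


def parse_name(filename):
    """Return the name's components as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


name = build_name("soilsurvey", "site-a-moisture", date(2024, 3, 5), 2, "csv")
print(name)                                # soilsurvey_site-a-moisture_20240305_v02.csv
print(parse_name(name))
print(parse_name("final FINAL data.csv"))  # None
```

Writing the date as YYYYMMDD and zero-padding the version number means an ordinary alphabetical sort lists files in chronological and version order.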
- Access, Sharing, and Privacy
- Describe which data will be shared and how you will make the data available, including any resources needed such as equipment, systems, expertise, etc.
- Indicate when you will make the data available (including any factors such as embargo periods for political, commercial, patent reasons, or complying with publishing policies).
- Describe the process for gaining access to the data. Be aware that some funders and some journals have specific policies related to data sharing, including specific repositories and embargo limits.
- Indicate if you will use a repository, if the data will be available as supplementary files with the publication, or if the data will be available upon request.
- Indicate if you anticipate that the original data collector, creator, or principal investigator retains the right to use the data before opening it up to wider use.
- Indicate if there are any ethical and privacy issues related to sharing the data. If so, describe how those will be resolved if the data is shared (e.g., removing any personally identifying information in the data, working with institutional ethical committees, resolving potential conflicts by way of formal consent agreements).
- If applicable, describe what you have done to comply with your obligations in your IRB Protocol.
- Describe how the dataset will be licensed if rights exist (e.g., list any restrictions or delays on data sharing needed to protect intellectual property, copyright or patentable data).
- Indicate if any permission restrictions need to be placed on the data.
- Describe which communities/groups are likely to be interested in the data.
- Describe the intended or foreseeable uses and users of the data.
- Indicate if there are any reasons not to share or re-use data (e.g., ethical, non-disclosure, etc.). Please refer to "Sharing Data" for more details.
- If you need help with intellectual property and copyright issues, contact NC State Libraries' Open Knowledge Center. In the case of software, contact NC State's Office of Research Commercialization to review considerations for software that you develop as part of your research.
- "Data will be posted on a website within three months of the grant closing. Data will be contributed to X public database. Data will be submitted to supplementary materials sections of peer-reviewed journals."
- "Data will be available and cited in publication. Researchers will be able to contact the PI for access to data. Data will be maintained in an open XML format to enable open re-use of the data."
- "The main output from this project is field data. We recognize that these data are the property of X and hence we will be asking their permission to licence these data to Y for use in their exploration program."
- "Our project will generate a large volume of data, some of which may not be appropriate for sharing since it involves a small sample that is not representative. The investigators will work with staff of the [repository] to determine what to archive and how long the deposited data should be retained."
- "X and third party copyright will be protected. The PI will be responsible for ensuring that all project members are aware as to the ownership of data and who may access them and under what conditions. Online access to the data will be password protected."
- "There is an agreement regarding the right of the original data collector, creator, or PI for first use of the data. The specified embargo period associated with the data being submitted extends from [date] until [date]. The embargo will be lifted by [date]."
- "This project will generate data linked to administrative records, so the data will be distributed through a restricted data use agreement managed by [repository]. Through this mechanism, users will apply to use these files, create data security plans, and agree to other access controls."
- "The principal investigators on the project and their institutions will hold the intellectual property rights for the research data they generate but will grant redistribution rights to [repository] for purposes of data sharing."
- "Our research group has been trained in human subjects protection and only trained project staff operating under the IRB approval for the project will have access to the confidential individually identifiable data, and all data will be aggregated or anonymized for publication."
- "The following language will be used in the informed consent: The information in this study will only be used in ways that will not reveal who you are. You will not be identified in any publication from this study or in any data files shared with other researchers. Your participation in this study is confidential. Federal or state laws may require us to show information to university or government officials [or sponsors], who are responsible for monitoring the safety of this study."
- "For this project, the principal investigators will request expedited IRB review compliant with procedures established by the [University] campus IRB. Research activities envisioned present no more than minimal risk to human subjects." "During data analysis, the data will be accessible only by certified members of the project team. The research project will remove any direct identifiers in the data before deposit with [repository]."
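Removing direct identifiers before deposit, as the last example describes, can be sketched as follows. The field names in `DIRECT_IDENTIFIERS` and the sample record are hypothetical. Note that dropping direct identifiers does not by itself rule out re-identification through combinations of the remaining fields, so treat this as one step in a disclosure review rather than a complete procedure.

```python
# Hypothetical set of fields treated as direct identifiers in this project.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address"}


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with every direct-identifier field removed.
    The original records are left unchanged."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


records = [
    {
        "respondent_id": 101,  # study-assigned ID, not a personal identifier
        "name": "Example Person",
        "email": "person@example.org",
        "age_group": "25-34",
        "response": 4,
    }
]

deposit_ready = strip_direct_identifiers(records)
print(deposit_ready)  # [{'respondent_id': 101, 'age_group': '25-34', 'response': 4}]
```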
- Policies and Provisions for Re-use & Re-distribution
- Indicate if any permission restrictions need to be placed on the data.
- Describe which communities/groups are likely to be interested in the data.
- Describe the intended or foreseeable uses and users of the data. Indicate if there are any reasons not to share or re-use data (e.g., ethical, non-disclosure, etc.). Please refer to "Sharing Data" for more details.
- "The data gathered will use a copyrighted instrument for some questions. A reproduction of the instrument will be provided to [repository] as documentation for the data deposited with the intention that the instrument be distributed under "fair use" to permit data sharing, but it may not be redisseminated by users."
- "The project team will create a dedicated Web site to manage and distribute the data because the audience for the data is small and has a tradition of interacting as a community. The site will be established using a content management system like Drupal or Joomla so that data users can participate in adding site content over time, making the site self-sustaining. The site will be available at a .org location. For preservation, we will supply periodic copies of the data to [repository]. That repository will be the ultimate home for the data."
- "Users of field data should acknowledge and/or offer co-authorship to the investigators who collected the data."
- "The data to be produced will be of interest to demographers studying family formation practices in early adulthood across different racial and ethnic groups."
- "In addition to the research community, we expect these data will be used by practitioners and policymakers."
- Data Storage and Preservation
- Indicate how long the data will/should be kept beyond the life of the project. Many grant funders specify a minimum retention period for research data, so be certain to check your funder's requirements if you have one.
- Describe the long-term strategy for storing and preserving the data. Describe the procedures that your intended long-term data storage facility has in place for preservation and backup.
- Learn more about storage options at NC State.
- "The research data from this project will be deposited with the institutional repository on the grantees' campus."
- "The research data from this project will be deposited with [repository] to ensure that the research community has long-term access to the data."
- "By depositing data with [repository], our project will ensure that the research data are migrated to new formats, platforms, and storage media as required by good practice." "In addition to distributing the data from a project Web site, future long-term use of the data will be ensured by placing a copy of the data into [repository], ensuring that best practices in digital preservation will safeguard the files."
- "[Repository] will place a master copy of each digital file (i.e., research data files, documentation, and other related files) in Archival Storage, with several copies stored at designated locations and synchronized with the master through the Storage Resource Broker."
- "The data will be processed and managed in a secure non-networked environment using virtual desktop technology." "The data files from this study will be managed, processed, and stored in a secure environment (e.g., lockable computer systems with passwords, firewall system in place, power surge protection, virus/malicious intruder protection) and by controlling access to digital files with encryption and/or password protection. De-identified files will be deposited with [repository] whose security policy has been written according to best practices."
- "Our research project will generate data from a large national sample. These data will be retained by [repository] as part of their permanent collection."
- Budget
- Check with the funding agency to determine where in the proposal to include costs related to data management.
- Include any anticipated income from licensing data.
- Include any costs for managing data during the course of the project as well as after the project is complete.
- "Staff time has been allocated in the proposed budget to cover the costs of preparing data and documentation for archiving. The [repository] has estimated their additional cost to archive the data at [insert dollar amount]. This fee appears in the budget for this application as well."
- "The cost model is twice the current cost of storage. At $1,850/usable TB, costs are estimated at $3,700/usable TB for the storage hardware for indefinite data retention."
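The arithmetic in the last example (twice the current storage cost, so $1,850 becomes $3,700 per usable TB) can be applied to a project's own storage estimate. The sketch below assumes the same rate and multiplier as the example text; the 2.5 TB volume is a hypothetical figure chosen for illustration.

```python
def indefinite_retention_cost(usable_tb, current_cost_per_tb=1850.0, multiplier=2.0):
    """Estimate storage hardware cost for indefinite retention as a fixed
    multiple of the current cost per usable TB (defaults from the example text)."""
    return usable_tb * current_cost_per_tb * multiplier


print(indefinite_retention_cost(1))    # 3700.0
print(indefinite_retention_cost(2.5))  # 9250.0
```

Keeping the rate and multiplier as parameters makes it simple to rerun the estimate if your storage provider's pricing or your funder's retention requirement changes.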
Examples of Full Data Management Plan Documents
- Public Data Management Plans from the DMPTool
- ICPSR Data Management Plan Examples (wide variety of disciplines)
- NIH Examples of Data Sharing Plans
- Rice University Data Management Plan Examples
- NSF Engineering Data Management Plan Template (University of Michigan)
- NEH Data Management Plans From Successful Grant Applications (2011 - 2014) (zip file)