Thing 7: Data Management
Good data management planning can increase your efficiency and effectiveness as a researcher. Thing 7 shares some practical data management tips to save you time now and inconvenience at the end of your project.
Every research area generates its own kind of data – it could be physical specimens, paper survey forms, digital images or tabular numerical data. Even your research notes, drafts, emails and various jottings are part of the research data you amass. All these kinds of data need to be managed and the best time to start is at the beginning of a research project. You really don’t want to find yourself 12 months into a project with folders full of tables, images, notes and analyses and not know what they all are or how they relate to each other.
What Others Have Tried
Below are some general principles that other researchers have found helpful for managing their research data.
Keep master data separate
This is a fundamental concept in data management. Any raw unprocessed data, be that images, data off an instrument, recordings or data that has been provided by someone else, needs to be quarantined into a place where it will never be touched. You should always work on a copy of your original data. That way there is no chance that the original data will be altered and you can always get back to your starting point if you need to.
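One way to put this into practice is to let a short script make the working copy for you. The sketch below is only an illustration: the function name and the idea of separate "raw" and "working" folders are assumptions, not a prescribed layout. It copies each raw file into a working folder and then marks the original as read-only.

```python
import shutil
import stat
from pathlib import Path

# Read-only for owner, group and others (assumes plain data files, not programs).
READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def make_working_copy(raw_dir: Path, work_dir: Path) -> list[Path]:
    """Copy files from raw_dir into work_dir and write-protect the originals.

    Files that already exist in work_dir are left alone so that work in
    progress is never overwritten. Returns the copies created by this call.
    """
    created = []
    for src in sorted(raw_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = work_dir / src.relative_to(raw_dir)
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            # Make sure the copy is writable even if the original was not.
            dest.chmod(dest.stat().st_mode | stat.S_IWUSR)
            created.append(dest)
        src.chmod(READ_ONLY)
    return created
```

Calling `make_working_copy(Path("data/raw"), Path("data/working"))` (hypothetical folder names) would leave the originals protected against accidental edits. Read-only flags are a safety net rather than a backup, so a separate copy stored elsewhere is still worth having.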
Automate everything you can
In this post-truth world, it is becoming increasingly important to show how you did something and not just what you found. To the greatest extent possible you should be able to reproduce what you did or at the very least describe your methods in a way that your peers can understand. If you can’t validate and reproduce your research you could be in trouble.
One of the best ways to be able to reproduce research is to remove any error-prone manual tasks from the mix. For example, you may regularly download some data from the web, load it into an Excel spreadsheet, apply some filters to get a subset, and then export that subset to a stats program to run some analysis. The problem is that you have to do this every time new data is released, and what happens if you haven’t had enough coffee and make a mistake?
Tools such as R and Python are being used more and more by researchers to automate tasks. There is a learning curve to these tools, but there are people to help you and the payoff is that you can run the exact same process again and again with the same result. You will also be documenting your research process as you work. You can even share your code with others so that they can check your results. While the STEMM disciplines are big users of these tools, digital humanities is a growing area, and digital tools can offer humanities researchers real efficiency gains and new capabilities. Our SCIP team are experts in helping HASS researchers with their data.
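To give a flavour of what this looks like, here is a minimal Python sketch of the "apply some filters and export a subset" step. The function name, the column name and the threshold are all hypothetical; the point is only that the filter is written down once and runs the same way every time.

```python
import csv
from pathlib import Path


def filter_csv(src: Path, dest: Path, column: str, minimum: float) -> int:
    """Copy rows from src to dest where `column` is at least `minimum`.

    Assumes src has a header row and that `column` holds numbers.
    Returns the number of rows kept.
    """
    kept = 0
    with src.open(newline="") as f_in, dest.open("w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if float(row[column]) >= minimum:
                writer.writerow(row)
                kept += 1
    return kept
```

When new data is released, rerunning something like `filter_csv(Path("latest.csv"), Path("subset.csv"), "co2", 400)` repeats exactly the same steps, and the script itself becomes a record of what you did.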
Organise your files
Think carefully about how you are going to organise the files you collect and analyse. There is no one ‘best way’ to organise folders, but there are some general guidelines. You might want to organise folders by date, experiment, subject or file type. Create a folder structure and stick to it, but don’t let those folders get out of hand: you don’t want too many levels. If you find yourself having to dig down more than four folders deep, then have a think about your structure.
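If you like, a few lines of Python can set up the same structure for every new project and flag folders that have become too deeply nested. The folder names below are only an example layout, and the limit of four levels simply mirrors the rule of thumb above.

```python
from pathlib import Path

# Example layout only; choose names that suit your own project.
FOLDERS = ["data/raw", "data/working", "analysis", "docs", "outputs"]
MAX_DEPTH = 4


def create_structure(root: Path, folders: list[str] = FOLDERS) -> None:
    """Create each folder under root, skipping any that already exist."""
    for rel in folders:
        (root / rel).mkdir(parents=True, exist_ok=True)


def too_deep(root: Path, max_depth: int = MAX_DEPTH) -> list[Path]:
    """Return folders nested more than max_depth levels below root."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_dir() and len(p.relative_to(root).parts) > max_depth
    )
```

Running `too_deep(Path("my_project"))` (a hypothetical project folder) every so often is a quick way to notice when the hierarchy needs a rethink.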
You can also look at tagging files as an alternative or in addition to a folder hierarchy. On a Mac you can add tags through Finder and on Windows through File Explorer.
Name your files consistently
File names should be meaningful. Choose a format and use it consistently. You might want to use some of the following elements in your filename:
- Project or experiment name or acronym
- Researcher initials
For example, a file containing raw data from site 27 of the CO2core project collected on 12 July 2016 might look like: 2016-07-12_co2core_site027_raw.csv
Note that by using this particular date format (year, then month, then day) the files can easily be sorted chronologically. Another good idea is to include a readme.txt file that explains your naming format along with any abbreviations or codes you have used. For more ideas, the University of Edinburgh has a list of 13 rules for file names.
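The naming pattern in the CO2core example can also be generated by a small helper, so that every file name follows it without anyone having to remember the rules. This is a sketch only: the function and its parameters are made up for illustration, and you would adapt the elements to your own format.

```python
from datetime import date


def build_filename(collected: date, project: str, site: int, kind: str,
                   ext: str = "csv") -> str:
    """Build a name like 2016-07-12_co2core_site027_raw.csv.

    The date is written year-month-day so that names sort chronologically,
    and the site number is zero-padded to three digits so that site027
    sorts before site103.
    """
    return f"{collected.isoformat()}_{project.lower()}_site{site:03d}_{kind}.{ext}"


print(build_filename(date(2016, 7, 12), "CO2core", 27, "raw"))
# prints: 2016-07-12_co2core_site027_raw.csv
```

Because the date comes first and always has the same width, an ordinary alphabetical sort of the file names is also a sort by collection date.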
If you don’t pay attention to data management you can get yourself into a mess. You will be less efficient and you may have to redo work unnecessarily. You might also be unable to publish, as many journals now require data to accompany publications.
Managing Data @Melbourne has been developed to introduce researchers to some of the concepts and tools for data management. The online modules will also guide you through the process of creating a Data Management Plan. As a graduate researcher, you may already be enrolled – for more information check out the online modules.
This post was written by Peter Neish, Research Data Curator and Acting Manager – Digital Scholarship (Research & Collections).