Thing 21: Data Management
So you’ve found your next project and can’t wait to get started – but have you thought about how you’ll manage your research data? Good data management is important for your own research practice and the broader research community. In this post Peter Neish shares some tips on how to wrangle your research data.
All researchers generate data – it could be physical specimens, paper survey forms, digital images, recordings, system configurations, or tabular numerical data. Even your research notes, drafts, emails and various jottings can be important records of your research and need to be managed. The best time to start your data management is at the beginning of a research project; you really don’t want to find yourself 12 months into a project with folders full of tables, images, notes, and analyses, and not know what they all are, or how they relate to each other.
That Thing you do: integration into practice
Below are some ideas that other researchers have found helpful for managing their research data.
Create a Data Management Plan
A Data Management Plan (DMP) is a great way to think about and plan for the data you will collect as part of your research project. It also helps you think about how the data will be analysed, published, shared, re-used or archived throughout your project. In fact, many funders now require you to create a DMP as a condition of funding, and DMPs are becoming standard practice. You can start creating a DMP using the DMPMelbourne online tool, or learn more about DMPs in the Managing Data @Melbourne training modules.
Keep a copy of your raw data
This is a fundamental concept in data management. Any raw, unprocessed data – be that images, data generated by an instrument, recordings, or data that has been provided by someone else – needs to be quarantined in a place where it will never be modified. You should always work on a copy of your original data. That way the original data is protected from accidental changes, and you can always get back to your starting point if you need to.
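If you are comfortable with a little scripting, the "work on a copy" habit can be enforced by the computer rather than by memory. The following is a minimal sketch in Python, not a prescribed method: the function names (sha256_of, make_working_copy) and the folder layout are made up for illustration. It copies a raw file into a working folder, checks that the copy is identical, and then removes write permission from the original.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(raw_file: Path, working_dir: Path) -> Path:
    """Copy raw_file into working_dir, then mark the original read-only.

    Hypothetical helper for illustration. It refuses to overwrite an
    existing working copy so that edits in progress are not lost.
    """
    working_dir.mkdir(parents=True, exist_ok=True)
    copy_path = working_dir / raw_file.name
    if copy_path.exists():
        raise FileExistsError(f"working copy already exists: {copy_path}")

    # copyfile copies contents only, so the copy keeps normal permissions
    shutil.copyfile(raw_file, copy_path)

    # Confirm the copy is byte-for-byte identical to the original
    if sha256_of(raw_file) != sha256_of(copy_path):
        raise OSError(f"copy of {raw_file} does not match the original")

    # Remove write permission from the original raw file
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return copy_path
```

Calling make_working_copy(Path("raw/readings.csv"), Path("working")) (both paths are placeholders) would leave the original read-only and give you an editable copy in the working folder. Read-only permissions guard against accidents rather than deliberate changes, so a backup of the raw data is still worth keeping.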
Automate if you can
In this post-truth world, it is becoming increasingly important to show how you did something, and not just what you found. To the greatest extent possible, you should be able to reproduce what you did, or at the very least describe your methods in a way that your peers can understand. If you can’t validate and reproduce your research, you could be in trouble.
One of the best ways to make your research reproducible is to remove any error-prone manual tasks from the mix. For example, you may regularly download some data from the web, which you then load into an Excel spreadsheet. You might then apply some filters to get a subset of data which you then export to a statistical program to run some analysis. The problem is that you have to do this every time new data is released – and what happens if you haven’t had enough coffee and make a mistake?
Tools such as R and Python are being used more and more by researchers to automate tasks. There is a learning curve to these tools, but there are people and training that can help you. The payoff is that you can run or modify your analysis without redoing time-consuming tasks again and again – for example, if you change one of the parameters of your research or analysis. You will also be documenting your research process as you work. You can even share your code with others so that they can check your results. While the STEMM disciplines are big users of these tools, the digital humanities are a growing area, and there are real efficiency gains and capabilities that digital tools can offer humanities researchers.
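To make this concrete, here is a rough sketch of how the download, filter and export routine described above might look as a Python script, using only the standard library. It is an illustration under stated assumptions rather than a recommended workflow: the function names (fetch, filter_rows), the column name and the threshold are all hypothetical, and the data is assumed to be a CSV file with a header row and a numeric column to filter on.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def fetch(url: str, destination: Path) -> Path:
    """Download the file at url to destination (the caller supplies the URL)."""
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination


def filter_rows(source: Path, output: Path, column: str, minimum: float) -> int:
    """Write rows where column >= minimum to output and return how many were kept.

    Assumes source is a CSV with a header row and that column holds numbers.
    """
    kept = 0
    with source.open(newline="") as src, output.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row[column]) >= minimum:
                writer.writerow(row)
                kept += 1
    return kept
```

Each time new data is released, the same two calls repeat exactly the same steps, for example fetch(url, Path("latest.csv")) followed by filter_rows(Path("latest.csv"), Path("subset.csv"), "rainfall_mm", 10.0), where the file names, column and threshold are placeholders. If you later decide the threshold should change, you edit one number and rerun the script, and the script itself becomes a record of what you did.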
If you don’t pay attention to data management, you can get yourself into a mess. You will be less efficient, and you may have to redo work unnecessarily. You might also find it harder to get published, as many journals are now requiring data to accompany publications.
Managing Data @Melbourne has been developed to introduce researchers to some of the concepts and tools for data management. The online modules will also guide you through the process of creating a Data Management Plan. As a graduate researcher, you may already be enrolled – for more information check out the online modules.
About the author
Peter Neish is the Program Manager, Stewardship and Open Research at the University of Melbourne. He works across the University in partnership with researchers and service providers on a wide range of data management projects.