Thing 22: Managing research data: file management and version control
Managing data is a key part of the research process and the University Library’s Digital Scholarship Program offers a range of support and resources to help researchers—from new RHD students to long-established academics—to manage their data effectively. If your research project has required Ethics approval, you’ll almost certainly have been required to provide a data management plan as part of the approval process. Some form of data management plan is recommended for all research projects and, although not yet mandated, funding bodies are increasingly in favour of such plans. Indeed, the Australian Research Council ‘considers data management planning an important part of the responsible conduct of research and strongly encourages the depositing of data arising from a Project in an appropriate publically accessible subject and/or institutional repository’ (ARC Discovery Project Funding Rules, 2014-15, A11.5.2).
Research data can come in a myriad of forms: biological samples; survey data; medical records; photographs; audio files; business records; personal diaries; ad infinitum. Even your research notes, drafts and various jottings are part of the research data you amass. The Digital Scholarship Program has an excellent website to guide you through the process of managing your data and also provides a helpful checklist and a management plan template. In this week’s post, we’ll look at one particular aspect of research data management that is essential for all researchers at any stage of their career and at any stage of their project: file management and version control. Thing 22 was written by Dr Leo Konstantelos (Research Data Curator, Digital Scholarship, University Library).
File Management refers to methods for storing, organising, naming, discovering and retrieving files in a structured, consistent manner. In the digital world, computer systems use hierarchical file systems in order to control how data is stored and retrieved. Mastering file management methods can help you make the most of your audio and video materials.
Maintaining Master files and Derivative copies
Whether you’re managing audio, video, images, texts, or numerical data, one major consideration is how you will manage your master files (i.e., the originals that you created yourself or acquired from a source) and any derivative files made from them.
For instance, you might have recorded an hour-long interview that you then proceeded to edit, crop, and save in a different file format and upload online. The hour-long video from your camcorder is your Master file, from which a number of derivatives might have been created: a file saved in your video editing software’s native format; a file with the final (edited) video saved as AVI; and another video file using high compression for online delivery. Alternatively, you may have written a draft of a thesis chapter, revised it after comments from your supervisor—removing information that might not be ideally placed here but that may be useful in a future context—and then prepared a further version for a conference paper or a journal article. Your first draft might be your Master file, with the subsequent derivatives clearly and separately maintained.
It’s recommended that you maintain only one Master file and that it’s stored separately from any derivatives. This helps to ensure that the Master file does not get accidentally deleted or over-written while working on derivatives.
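As a rough illustration of keeping a Master file apart from its derivatives, the Python sketch below copies a file into a dedicated folder and removes write permission from the copy. The function name `protect_master`, the `masters` folder and the file name are all hypothetical choices made for this example, not part of any recommended tool.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def protect_master(source: Path, masters_dir: Path) -> Path:
    """Copy a file into a dedicated masters folder and make the copy read-only."""
    masters_dir.mkdir(parents=True, exist_ok=True)
    target = masters_dir / source.name
    if target.exists():
        # Refuse to replace an existing master: there should only ever be one.
        raise FileExistsError(f"Master already exists: {target}")
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    # Clear the write bits so the master is harder to overwrite by accident.
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    target.chmod(target.stat().st_mode & ~write_bits)
    return target


# Demo in a throwaway directory; the folder and file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    recording = Path(tmp) / "interview_raw.mov"
    recording.write_bytes(b"raw footage")
    master = protect_master(recording, Path(tmp) / "masters")
    master_is_read_only = not (master.stat().st_mode & stat.S_IWUSR)
    print(master.name, "read-only:", master_is_read_only)
```

You would then do all editing on the file left in the working folder, or on further copies of it, and leave the read-only copy untouched. A read-only flag guards only against accidental changes; it does not replace a backup.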
Storing and Organising
Have you ever come across a case when—try as you might—you just can’t remember where you stored a file and you end up spending more than fifteen minutes trying to locate it? And when you did, you were bemused by the location where it was stored: ‘how did it end up in that folder?’ you might have wondered.
What seems reasonable when you create or save a digital file might become a mystery in the future. This researcher’s experience is a good example: Dave Anderson from the National Climatic Data Center in the U.S. on what not to do with file management.
Digital audio and visual materials present additional challenges, in that the files tend to be large, costly to create/acquire and their content is non-textual, so full-text searches might not locate the file. By taking some time to consider how you store and organise your data in a structured manner, you can save time and frustration in the future.
Adopting a File Structure System
The most basic file structure system is to use your operating system’s organisation of files into a hierarchy of folders and sub-folders. There is generally “no right or wrong way” to create a file structure (sometimes referred to as a directory structure). However, the way you store your material in folders and sub-folders must be meaningful to you and your collaborators: stable, scalable and consistent. The following resources offer some good advice on how to create and maintain your file structure hierarchies:
- ‘How to Create a Logical and Manageable Folder Structure’ from the Digital Asset Management Learning Center.
- ‘Directory Structure’ from the American Society of Media Photographers.
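If you set up the same hierarchy for every project, it can help to write the layout down once and create it with a short script. The Python sketch below is one possible way to do this; the folder names in `LAYOUT` and the function `create_structure` are assumptions made for illustration, so substitute whatever is meaningful to you and your collaborators.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: masters and derivatives kept apart, grouped by material type.
LAYOUT = [
    "masters/audio",
    "masters/video",
    "derivatives/audio",
    "derivatives/video",
    "documents/drafts",
    "documents/notes",
]


def create_structure(project_root: Path, layout: list[str]) -> list[Path]:
    """Create every folder in the layout under project_root and return the paths."""
    created = []
    for relative in layout:
        folder = project_root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a throwaway directory; "thesis-project" is a made-up project name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "thesis-project"
    created = create_structure(root, LAYOUT)
    folder_names = sorted(p.relative_to(root).as_posix() for p in created)
    all_exist = all(p.is_dir() for p in created)
    for name in folder_names:
        print(name)
```

Keeping the layout in a single list also documents the structure, so a new collaborator can see it at a glance.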
Using Tags and Embedding Metadata
Most modern Operating Systems and Digital Asset Management systems offer ways to organise files, including audio and video material, by using tags. With these, you can organise the same file in more than one category, but the same principles apply as for file structures. This Library Research Guide has some good tips for tagging your files.
Many audio and video file formats allow users to embed metadata into the file, either at the time of creation, after capture, or both. Here are a couple of good resources to get you started with creating metadata for audio and video:
- ‘Technical metadata’ from JISC Digital Media.
- ‘How to Capture Metadata and Documentation’ from Archivists’ Guide to Archiving Video.
Using a file naming convention in a consistent and systematic manner is key in maintaining research data, whatever the format. Naming your files and folders is the most fundamental element of any file structure and provides the basic information to identify, locate and retrieve your data. An excellent guide to file naming has been prepared by the JISC Digital Media group (here).
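One way to keep a naming convention consistent is to generate file names rather than type them by hand. The Python sketch below assumes a made-up convention of date, project and description joined by underscores (`YYYYMMDD_project_description.ext`); this pattern and the function `build_filename` are illustrative only and are not taken from the JISC guide.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date, extension: str) -> str:
    """Build a name of the form YYYYMMDD_project_description.ext (assumed convention)."""

    def clean(text: str) -> str:
        # Replace runs of anything other than letters and digits with a hyphen,
        # trim stray hyphens from the ends, and lower-case the result.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()

    suffix = extension.lstrip(".").lower()
    return f"{created:%Y%m%d}_{clean(project)}_{clean(description)}.{suffix}"


# Hypothetical example values.
example_name = build_filename("Oral History", "Interview 3 (edited)", date(2014, 10, 6), ".AVI")
print(example_name)  # 20141006_oral-history_interview-3-edited.avi
```

Putting the date first in year-month-day order means that an alphabetical sort of the folder is also a chronological one.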
Simple version control
In its simplest form, version control is the process whereby every iteration of a file is saved under a new filename. You never know when you might need to return to an earlier version: perhaps to trace an error, or to reuse in a new context ideas that you had rejected. It’s advisable that you create a new version every time you make major changes or progress towards your final product. Instead of saving over the same file (normally using the ‘Save’ command), try to ‘Save As’ a new file and append a version number at the end. You can choose a naming convention for version control, but an example could be:
You can find further guidance here.
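The ‘Save As’ habit described above can also be supported by a small script that works out the next version number for you. The Python sketch below assumes a hypothetical suffix of the form `_v01`, `_v02` and so on placed before the file extension; both the suffix pattern and the function `next_version_name` are assumptions for this illustration, and any consistent scheme would serve equally well.

```python
import re
from pathlib import Path

# Matches a stem that ends in "_v" followed by digits, e.g. "chapter2_v03".
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<number>\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name to use for the next saved version."""
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match:
        stem = match.group("stem")
        number = int(match.group("number")) + 1
    else:
        # A name without a version suffix gets its first numbered version.
        stem = path.stem
        number = 1
    # Zero-pad to two digits so that names sort in version order up to v99.
    return f"{stem}_v{number:02d}{path.suffix}"


print(next_version_name("chapter2_v03.docx"))  # chapter2_v04.docx
print(next_version_name("chapter2.docx"))      # chapter2_v01.docx
```

The function only computes a name; copying or saving the file under that name is still up to you, which keeps every earlier version in place.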
Consistency, Consistency, Consistency
In order to make the most of your file management methods, you need to apply them consistently. Make sure that you document your file management rules and follow them systematically. If you keep changing the rules randomly, then your file management system will soon become unmanageable.
Finally, Katherine McNeill and Helen Bailey from MIT Libraries have put together this detailed but very clear guide to file management and version control; it is well worth bookmarking.