Thing 13: Manipulating Images
Previous posts have considered storing and finding images, but what if you’d like to take it a step further and manipulate images? The University Digitisation Centre (UDC) processes over 1 million images per year, and this post introduces some of the tools and techniques UDC staff use for automating tasks when manipulating images.
In a blog post published in 2014 we considered managing and manipulating images but focussed on tools (such as photogrammetry) and the types of manipulations you might like to carry out. This time we deal entirely with command line scripts for Windows, using the freely available tools listed on UDC’s Command line tools for digitisation page. The examples below deal primarily with EXIFTool and ImageMagick. The scripts will need some editing in a text editor to point to the directories where you have installed the tools and to any temporary directories the scripts use.
Writing metadata into files is widely recognised as a good thing to do, but in practice it is still not widely done. The Embedded Metadata Manifesto outlines five guiding principles for the use of metadata in images; in short, embedding is the simplest way of transmitting descriptive information about a digital file. Reuse of that metadata can then be automated by reading it directly from the file and reformatting it for the intended use.
The most practical format for writing metadata into files is Adobe’s Extensible Metadata Platform (XMP), which is also an ISO standard (ISO 16684-1:2012). In crude terms, this simply means attaching an XML fragment to a file. XMP supports a wide range of metadata schemas for a wide range of uses, including:
- International Press Telecommunications Council (IPTC)
Descriptions, tagging, geotagging, rights management etc…
- Publishing Requirements for Industry Standard Metadata (PRISM)
- Dublin Core (basic)
Basic descriptive and rights metadata
- Creative Commons (CC)
An extension of XMP:Rights for CC
- Metadata Working Group (MWG)
Hierarchical keywords, collection metadata
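To make the "XML fragment" idea concrete, here is a minimal Python sketch that builds a small XMP fragment with two Dublin Core fields. It is an illustration only, not how UDC or EXIFTool produce XMP: the function name `build_xmp` and the sample values are made up for this example, and the `<?xpacket?>` wrapper that normally surrounds an embedded packet is left out. In practice a tool such as EXIFTool writes the packet into the file for you.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return an ElementTree qualified name such as {uri}title."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp(title, creators):
    """Build a bare XMP fragment (hypothetical helper, no xpacket wrapper)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: one entry per language.
    title_el = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title_el, q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names.
    creator_el = ET.SubElement(desc, q("dc", "creator"))
    seq = ET.SubElement(creator_el, q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_xmp("Sample title", ["First Creator", "Second Creator"]))
```

The other schemas in the list follow the same pattern: each adds its own namespace and its own set of properties inside the same `rdf:Description` element.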
EXIFTool is one of the major utilities used to read and write metadata in files, and to sort files based on their metadata (e.g. by month/year).
Reading metadata is the first step to reusing it. EXIFTool is UDC’s tool of choice for both reading and writing because it will read not only the standard metadata fields but also any structured metadata that it can find in a file.
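As a rough illustration of the three jobs mentioned above (reading, writing and sorting), the sketch below assembles EXIFTool command lines in Python. The function names, folder names and the choice of tags are assumptions made for this example and are not taken from UDC's scripts; it also assumes that `exiftool` can be found on the PATH.

```python
import subprocess

# Assumption: exiftool is on the PATH; otherwise put the full install path here.
EXIFTOOL = "exiftool"


def read_tags_cmd(folder, tags):
    """Read the named tags from every file under `folder` as CSV (-r recurses)."""
    return [EXIFTOOL, "-csv", "-r", *(f"-{tag}" for tag in tags), folder]


def write_title_cmd(image, title):
    """Write a Dublin Core title into the file's XMP.

    By default EXIFTool keeps a backup copy with _original added to the name.
    """
    return [EXIFTOOL, f"-XMP-dc:Title={title}", image]


def sort_by_month_cmd(folder):
    """Move files into year/month folders based on DateTimeOriginal."""
    return [EXIFTOOL, "-d", "%Y/%m", "-Directory<DateTimeOriginal", folder]


def run(cmd):
    """Run a command and return what it printed (needs EXIFTool installed)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is modified.
    print(read_tags_cmd("scans", ["FileName", "XMP-dc:Title"]))
    print(write_title_cmd("scans/item_0001.tif", "Sample title"))
    print(sort_by_month_cmd("scans"))
```

Passing the arguments as a list avoids most quoting problems. If the sorting command is typed into a Windows batch file instead, each `%` in the date format has to be doubled (`%%Y/%%m`).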
One good example of reusing metadata is geotagging and/or reverse geotagging. A track log from a GPS unit can be used to add latitude and longitude metadata to photographs, using the timestamps in the images to estimate the position of the camera. The reverse example is perhaps more useful, as it demonstrates how easy it is to reformat metadata into any other text-based data format, be it a tab-separated “spreadsheet” or a CSV file (e.g. with column headings matching the specifications for uploading into Omeka).
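The two directions can be sketched as follows, taking "reverse" here to mean reading the coordinates back out of the images, which is an assumption about the intended workflow. The first two functions build EXIFTool commands using its `-geotag` and `-csv` options; the third reshapes EXIFTool's CSV output under new column headings. The headings in `COLUMN_MAP` are hypothetical placeholders, not Omeka's actual import specification, so substitute whatever your target system expects.

```python
import csv
import io

EXIFTOOL = "exiftool"  # assumption: exiftool is on the PATH

# Hypothetical mapping from EXIFTool column names to target column headings.
COLUMN_MAP = {
    "FileName": "Dublin Core:Identifier",
    "GPSLatitude": "latitude",
    "GPSLongitude": "longitude",
}


def geotag_cmd(track_log, folder):
    """Add GPS positions to images by matching their timestamps to a track log."""
    return [EXIFTOOL, "-geotag", track_log, folder]


def export_gps_cmd(folder):
    """Export file names and coordinates as CSV; -n asks for plain numbers."""
    return [EXIFTOOL, "-csv", "-n", "-FileName", "-GPSLatitude", "-GPSLongitude", folder]


def remap_csv(exiftool_csv, column_map):
    """Rewrite EXIFTool CSV text, keeping only mapped columns under new headings."""
    reader = csv.DictReader(io.StringIO(exiftool_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Columns missing from the input are written as empty cells.
        writer.writerow({new: row.get(old, "") for old, new in column_map.items()})
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample shaped like EXIFTool's CSV output (SourceFile comes first).
    sample = "SourceFile,FileName,GPSLatitude,GPSLongitude\nscans/a.jpg,a.jpg,-37.8,144.96\n"
    print(remap_csv(sample, COLUMN_MAP))
```

Because the output is plain text, the same approach works for tab-separated files by passing `delimiter="\t"` to the writer.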
ImageMagick is a powerful image processing tool that can outperform GUI applications in terms of speed. Writing a command line script from scratch every time you want to process images is not practical, but there are other ways:
- Create a batch file for an often-repeated process to run via “drag and drop” for individual files or folders.
- Add loops to these scripts to process several subfolders.
- Customise your data management tool to write the commands for you and export and run a batch file… it is just text after all.
The last of these approaches is how most of UDC’s workflows operate. Creating ImageMagick commands from scratch requires a certain level of imaging expertise, as the order of commands is critical to achieving the correct outcome.
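The idea of having a data management tool write the commands for you can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than UDC's actual workflow: the record structure, the paths and the function names are hypothetical, the `magick` command assumes ImageMagick 7 (version 6 uses `convert`), and the resize and quality settings are arbitrary examples.

```python
from pathlib import PureWindowsPath


def magick_command(src, dst, max_px=2000, quality=90, magick="magick"):
    """Build one ImageMagick command line for a Windows batch file.

    Order matters: the input is read first, operations are applied in the
    order given, and the output file comes last. The geometry is quoted
    because a bare > would be treated as redirection by the command prompt.
    """
    return f'"{magick}" "{src}" -resize "{max_px}x{max_px}>" -quality {quality} "{dst}"'


def build_batch(records, out_dir, **options):
    """Turn a list of records (each with a 'path' key) into batch file text."""
    lines = ["@echo off"]
    for rec in records:
        src = PureWindowsPath(rec["path"])
        # Write a JPEG derivative with the same base name into out_dir.
        dst = PureWindowsPath(out_dir) / (src.stem + ".jpg")
        lines.append(magick_command(src, dst, **options))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical records, e.g. exported from a spreadsheet or database.
    records = [
        {"path": "D:/scans/box 1/item_0001.tif"},
        {"path": "D:/scans/box 1/item_0002.tif"},
    ]
    print(build_batch(records, "D:/derivatives"))
```

The resulting text can be saved as a `.bat` file and run. The `"2000x2000>"` geometry only shrinks images larger than the given size, and file names containing `%` would need extra escaping that this sketch does not handle.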
As you look through XMP metadata schemas you’ll notice that many of them overlap for basic fields such as title, author/creator, keywords etc… Whether you should duplicate metadata into each schema that you use will depend partly on how other applications interpret them. Adobe products will populate the “File Info” window with a compilation of the metadata they find in the file, which leads to duplication in “File Info” displays. How different vendors handle metadata is very problematic, and the Metadata Working Group was formed to try to provide some consensus for both reading and writing. Aside from adding a few useful fields for XMP metadata, all members of the MWG have been guilty of doing their own thing since forming the group, so there is still no easy answer to this question.
Embedding metadata in files in a research context is often a chicken-and-egg situation. You can’t write metadata if you don’t have it, and entering metadata manually takes “too much time”. Attempting to build metadata embedding into an image management process can have positive spin-offs, as it requires very close scrutiny of the metadata management processes. The earlier you collect metadata the more you can reuse it, and the easier the entire research data management (or digital collection management) process can be.
- To see image processing tasks “in action”, have a look at this series of posts on the Digitisation Lab blog, which shows complete breakdowns of specific examples for image processing tasks from the UDC’s own workflows and research consultations.
- Locate the research/technical support staff with the expertise to help you create scripts for your own workflows.
- The University Digitisation Centre provides consultations for Melbourne University staff and postgrads, and covers other digitisation topics in its Digitisation Lab blog.
- Both EXIFTool and ImageMagick have excellent user forums for answering technical questions.
This post was written by Ben Kreunen (Digitisation Technical Support Officer, University Digitisation Centre).