Cooking with DOS: Writing metadata
Embedding metadata into digitised files is a 3 stage process… collecting metadata, preparing it and then writing it into the files. In this post I’ll just be looking at different methods for the last step so that you can’t use the excuse that it’s too difficult to do. In later posts I’ll expand on this and look at how embedding metadata can be seamlessly included into a digitisation workflow.
Before we get into our first script you might want to go and get the tools that we’ll be using if you don’t already have them… EXIFTool will be the only one that you’ll need here. For the purposes of demonstration, all metadata will be removed from the sample images prior to running the scripts.
TLDR 😉 Skip to the script.
GUI vs Command Line
There are a number of tools specifically designed for embedding/editing metadata in files but those with GUIs all suffer from the same problem… they group the last two steps (preparing and writing metadata) into the same application. This complicates the design process for a GUI with a few likely outcomes.
- The field you want isn’t included (IPTC tags are common but try and find a tool that also includes PRISM tags)
- So many fields are included that it’s a nightmare to navigate
- Getting the info from your collected metadata into the GUI is inefficient
Using the command line is also inefficient if you have to type everything in… but there are other ways of using command line tools. Steps 1 and 2 are data management problems best solved with data management tools, i.e. a database. Databases are good at collecting metadata and transforming data into different forms, so that just leaves the last stage as a technical problem to solve.
EXIFTool basics
EXIFTool is a veritable Swiss Army knife that can read, write and repurpose metadata in more ways than you can poke a stick at. For writing metadata the basic command is simply “EXIFTool -TAGNAME="Metadata" InputFile”, e.g.
EXIFTool.exe -creator="Ben Kreunen" -title="An awesome photo" AwesomePhoto.jpg
- To view the embedded metadata you can use Jeffrey Friedl’s online EXIF viewer: just copy the image URL and paste it into the URL field of the form.
Needless to say that will become a very long line of text if you were to embed a lot of metadata and you can easily exceed the maximum length allowed for a single command line. The list of common tags that you can write is indeed very long. A better option is to put all of the tags and their values into a text file (referred to as an “ARGFILE” in the documentation) and then point EXIFTool at that file.
EXIFTool.exe -@ ARGFILE INFILE
At this point you now have a mechanism for joining the collection and preparation of metadata to the process of writing the metadata to a file. How easy is that?
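If the collected metadata already lives in a database or spreadsheet export, producing the argfile is a small text-generation job. The following is a minimal Python sketch of that idea, not a definitive implementation: the function names and the sample tag values are hypothetical, and it assumes one "-TAG=VALUE" argument per line, saved as UTF-8.

```python
from pathlib import Path


def argfile_lines(tags):
    """Turn a {tag name: value} mapping into ExifTool argfile lines."""
    lines = []
    for name, value in tags.items():
        # An argfile holds one argument per line, so collapse any whitespace
        # runs (including line breaks) in the value into single spaces.
        flat = " ".join(str(value).split())
        lines.append(f"-{name}={flat}")
    return lines


def write_argfile(path, tags):
    """Write the argfile to disk (UTF-8 is an assumption of this sketch)."""
    Path(path).write_text("\n".join(argfile_lines(tags)) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical record, as it might come out of a database query.
    record = {"creator": "Ben Kreunen", "title": "An awesome photo"}
    for line in argfile_lines(record):
        print(line)
```

The resulting file can then be passed to EXIFTool with the “-@” option exactly as shown above.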
You can also include all of the command line options in this text file but for practical reasons I restrict this to just the metadata and include the other options in the command line.
Option | Description |
---|---|
-overwrite_original | EXIFTool’s default operation saves the original file as a backup with “_original” appended to the file name. This option overwrites the original file instead, so no backup is kept. |
-E | This tells EXIFTool to treat the metadata values as HTML-escaped, so any HTML entities that it encounters are decoded before they are written. A surprising amount of metadata has been entered into Library catalogues by copying and pasting from web pages with the result that a lot of encoded entities can appear in the metadata. |
-m | Ignore minor errors. Some metadata fields require specific data formats, especially dates. This option forces EXIFTool to write the metadata and only throw up a warning message. Let’s not start a debate on date formatting here… |
-q | Quiet. It’s easier to see errors on screen when there are fewer messages being displayed. |
-P | Preserve date/time of the original file |
Write metadata to a single file
Now that we have a basic command line it’s time to tweak it so that you don’t have to type the commands in. We move all of the metadata tags and values into a text file (the “argfile”), point the command at that file using the “-@” option, and then add “%1” to the end of the command. This is a variable that substitutes for the full path of a file dragged onto the batch file.
Script: Write_Metadata_to_File.bat
Category: Drag and drop (file)
Code
EXIFTool.exe -m -q -overwrite_original -k -@ ARGFILE %1
- Breakdown
-m: ignore minor errors and keep processing
-q: quiet mode
-overwrite_original: overwrite the file
-k: pause after processing (useful for checking if an error occurred). Don't use this option when processing multiple files
-@ ARGFILE: read other options and tags from ARGFILE
%1: Substitutes the file path of the file dragged onto the script
Sample metadata: Download sample file
#Basic metadata
-title=The Spot and Law, The University of Melbourne
-creator=Ben Kreunen
-keywords=architecture;vedutismo;university;Melbourne
-date=2011-07-17
#Copyright
-copyright=© 2011 Ben Kreunen
-Marked=True
-UsageTerms=This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
-Rights=This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
-AttributionURL=https://www.flickr.com/photos/ben-kreunen/5945136250/in/album-72157613081102224/
-AttributionName=Ben Kreunen
-License=http://creativecommons.org/licenses/by-nc-sa/4.0/
#Location
-OrganisationInImageName=The University of Melbourne
-LocationShown={City=Melbourne,Sublocation=Carlton,ProvinceState=Victoria,CountryName=Australia,CountryCode=AU,WorldRegion=Oceania}
#ROI
-RegionInfo={AppliedToDimensions ={W = 300,H = 253,Unit = pixel,},RegionList =[{Area ={W = 0.39, H = 0.84, X = 0.09, Y = 0.10,Unit = normalized,},Description = Building number 110|, 198 Berkeley St,Name = The Spot,Extensions = {GPSLatitude=-37.801599,GPSLongitude=144.958799,GPSAltitude=33.2,}},{Area ={W = 0.38, H = 0.42, X = 0.58, Y = 0.68,Unit = normalized,},Description = Building number 106|, 185 Pelham St,Name = Law, Extensions = {GPSLatitude=-37.802273,GPSLongitude=144.960067,GPSAltitude=30.5,}}],}
#GPS
-GPSLatitude=37.801897
-GPSLatitudeRef=S
-GPSLongitude=144.958360
-GPSLongitudeRef=E
-GPSAltitude=33.0
-GPSImgDirection=98
-GPSImgDirectionRef=T
#Collection
-Collections={CollectionName=Photographs in and of The University of Melbourne,CollectionURI=https://www.flickr.com/photos/ben-kreunen/albums/72157613081102224}
As well as some basic descriptive metadata the image also contains:
- geotagging info (where the photo was taken and which direction the camera was facing)
- who owns the image
- where you can find the original image this was made from
- what you’re allowed to do with this image
- the name and location of the collection this image belongs to
- two regions of interest, each with their own metadata
  - title and description
  - geographic coordinates of the buildings
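Structured tags such as LocationShown and RegionInfo are the fiddliest part of the sample to type by hand, because of the nested braces and the “|” escape in front of commas inside values. The following is a minimal Python sketch of how such values could be generated from nested data. The function names are hypothetical, and the escaping rule (a leading “|” before “|”, “,”, “}” and “]”, and before a “{” or “[” at the start of a value) reflects my understanding of EXIFTool’s structured-information syntax, so check it against the documentation before relying on it.

```python
def escape_value(text):
    """Escape characters that would otherwise be read as structure syntax."""
    out = []
    for i, ch in enumerate(text):
        # Assumed rule: '|', ',', '}' and ']' are escaped anywhere,
        # '{' and '[' only when they start the value.
        if ch in "|,}]" or (i == 0 and ch in "{["):
            out.append("|")
        out.append(ch)
    return "".join(out)


def to_struct(value):
    """Serialise nested dicts and lists into {Field=Value,...} / [a,b] form."""
    if isinstance(value, dict):
        fields = ",".join(f"{key}={to_struct(val)}" for key, val in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(to_struct(item) for item in value) + "]"
    return escape_value(str(value))


if __name__ == "__main__":
    location = {"City": "Melbourne", "Sublocation": "Carlton"}
    print("-LocationShown=" + to_struct(location))
    # prints: -LocationShown={City=Melbourne,Sublocation=Carlton}
```

Keeping the escaping in one place means a description like “Building number 110, 198 Berkeley St” can be stored unescaped in the database and only converted when the argfile is written.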
Writing the metadata into the image is easy. Collecting and preparing all of that metadata is another story altogether. In this example I used Photoshop to collect the coordinates for the ROIs, Nearmap to collect the GPS coordinates and altitudes, Google Earth to estimate the image direction and the Creative Commons license selector to get an XMP file of my copyright license (converted to EXIFTool arguments using EXIFTool). Improving the efficiency of gathering and managing metadata is a topic for another post though… back to writing metadata.
Write metadata for all images in a directory
There may be situations where you want to add some common metadata to a whole bunch of files e.g. adding copyright information to all of the files that you have created. One way to do this is to extend the previous example by wrapping it up into a For loop. This creates a two stage process of specifying a set of files to work on and then running a process on each of those files.
Script: Write_Metadata_to_Directory.bat
Category: Drag and drop (directory)
Code
For %%a in ("%~1\*.*") do ( EXIFTool.exe -m -q -overwrite_original -@ ARGFILE "%%a" )
Breakdown: For each file (%%a) matching the name criteria in the directory (%~1\*.*), write the metadata in the “ARGFILE”, ignoring minor errors (-m) and overwriting the original rather than keeping a backup (-overwrite_original). The “%~1” form strips the quotes that Windows puts around a dragged path containing spaces, so that the pattern can be safely quoted again.
If you only want to embed metadata in some files you can specify them by file name and/or extension, e.g. to write metadata into only TIF and JPEG images use ("%~1\*.tif" "%~1\*.jpg")
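The same two-stage pattern (select the files, then run one command per file) can be written in other scripting languages if a batch file becomes limiting. The following is a minimal Python sketch, assuming an “exiftool” executable on the PATH; the function names and the extension list are hypothetical choices for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical filter: only TIF and JPEG files are processed.
EXTENSIONS = {".tif", ".jpg"}


def select_files(directory, extensions=EXTENSIONS):
    """Stage 1: list the files in the directory whose extension matches."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


def build_command(argfile, target):
    """Stage 2: the same options as the batch script, as an argument list."""
    return ["exiftool", "-m", "-q", "-overwrite_original",
            "-@", str(argfile), str(target)]


# Usage (assumed): python write_metadata.py ARGFILE DIRECTORY
if __name__ == "__main__" and len(sys.argv) == 3:
    for target in select_files(sys.argv[2]):
        subprocess.run(build_command(sys.argv[1], target), check=False)
```

Passing the command as a list avoids any quoting problems with spaces in file names, which is the same issue the quotes around "%%a" address in the batch version.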
Writing metadata to PDFs
Writing metadata with EXIFTool into a linearised PDF will break linearisation. There are a number of PDF utilities that can linearise PDFs but unfortunately some of them will remove the metadata in the process. At the time of writing I am trialling QPDF to replace VeryPDF’s “PDF Toolbox Command Line” in our workflows.
Which method is right for you?
Drag and drop scripts are OK for automatically adding generic metadata such as copyright information to large numbers of files at once but this is only a small subset of metadata that should be added. At UDC we automate most of this process using a database to collect and prepare the metadata prior to scanning, and then generate the required command line for each file. Using the “-o” option to specify an output directory and file name allows us to embed metadata into every file in the same write operation that relocates the file to our network storage. This step creates any directories in the path if they don’t exist. More importantly, this will not overwrite an existing file making it a useful failsafe if an error is made in the file naming.
EXIFTool.exe -overwrite_original -@ ARGFILE -E -P -q -o OUTFILE INFILE
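To illustrate the idea of generating one command per file from database records, here is a minimal Python sketch. It is not the UDC workflow itself: the record keys (“infile”, “outfile”, “argfile”) and function names are hypothetical, and an “exiftool” executable on the PATH is assumed. It also checks for existing destination files up front, so that naming clashes can be reported before anything is written.

```python
from pathlib import Path


def build_job(infile, outfile, argfile):
    """Argument list mirroring the command above for a single file."""
    return ["exiftool", "-overwrite_original", "-@", str(argfile),
            "-E", "-P", "-q", "-o", str(outfile), str(infile)]


def plan_jobs(records):
    """Split records into runnable jobs and destinations that already exist."""
    jobs, skipped = [], []
    for record in records:
        if Path(record["outfile"]).exists():
            # Never plan a write over an existing file; report it instead.
            skipped.append(record["outfile"])
        else:
            jobs.append(build_job(record["infile"], record["outfile"],
                                  record["argfile"]))
    return jobs, skipped


if __name__ == "__main__":
    # Hypothetical records, as they might be exported from a database.
    records = [
        {"infile": "scan_0001.tif", "outfile": "store/item_0001.tif",
         "argfile": "item_0001.args"},
    ]
    jobs, skipped = plan_jobs(records)
    for job in jobs:
        print(" ".join(job))
```

Each job list can be handed to subprocess.run() once the plan has been reviewed.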
Apart from the argfiles that we use, EXIFTool can also transfer metadata from a variety of sidecar files (or other images/PDFs) using the “-TagsFromFile” switch. There is a wide range of possibilities to suit different ways of managing metadata. If you are managing metadata at all, then there is really no excuse for not embedding it in your files.
Great article – thanks!
I did notice two words spelled incorrectly in this sentence:
“Datatasbes are good at collecting metadata and transofrming data into different forms, so that just leaves the last stage as a technical problem to solve.”
(Datatasbes -> Databases, transofrming -> transforming).
Thanks for a) reading the post and b) taking the time to respond on the errors. Fixed now. 🙂
Excellent lesson! Thanks for putting this together and making it available to the public. I’m just getting into the knowledge of metadata and Exiftool commands, and your article really does help tremendously. All the way from Raleigh, Mississippi, I pray you good spirits and great health during the current health crisis. Thanks again!
Thanks for the feedback, I’m glad it was helpful to you.