
Processing image files with MIDAS metadata

Tools and methods for extracting and using XMP-based metadata from image files.

Types of image files

The XMP specification is not tied to any particular file or encoding type. In practice, however, the choice is limited both by the available applications and by the characteristics of the formats themselves. The Adobe XMP API supports a variety of image formats including TIFF, JPEG, PNG, and GIF. On the other hand, Adobe's own Photoshop CS3 image processing software limits the choice to TIFF and JPEG, as it fails to embed XMP metadata in PNG and GIF files. Considering that most Web browsers cannot display TIFF files natively, we are currently left with JPEG as the only interoperable image format with sufficient support for XMP metadata.


Extracting XMP metadata from image files

XMP defines a container format for statements written in RDF/XML syntax. In JPEG (more precisely: JFIF) files, blocks of XML metadata reside in segments introduced by an APP1 marker. A tool for extracting all XMP metadata from JPEG files can be downloaded here:

  http://www.w3.org/People/Bos/JPEG-XMP/
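
To illustrate where the XMP block sits inside a JPEG file, the following is a minimal Python sketch that walks the segments of a JPEG byte stream and collects the payload of every APP1 segment that begins with the XMP namespace identifier. It is not the tool linked above; the function name extract_xmp_packets is made up for this example, and the sketch assumes a well-formed file in which every segment before the image data carries a length field. It also ignores the "extended XMP" mechanism used for packets larger than one segment.

```python
import struct

# APP1 segments that carry XMP start with this null-terminated identifier.
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def extract_xmp_packets(data):
    """Return the XMP payloads (as bytes) found in a JPEG byte string.

    Hypothetical helper for illustration only.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    packets = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before the real marker code: skip it.
            pos += 1
            continue
        if marker in (0xDA, 0xD9):
            # SOS (start of image data) or EOI: no metadata beyond here.
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_HEADER):
            packets.append(payload[len(XMP_HEADER):])
        pos += 2 + length
    return packets
```

A typical call would read a file in binary mode, for example extract_xmp_packets(open("image.jpg", "rb").read()), and decode each returned packet as UTF-8 before handing it to an XML parser.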

The XMP metadata block as extracted by tools like rdjpgxmp (contained in the toolset above) consists of two nested wrappers, of which the inner one (the xmpmeta element) contains the RDF/XML-encoded metadata. When parsing RDF/XML, namespaces should be identified only by their full namespace URI, never by the namespace prefix, because there is no guarantee that all XML- or RDF-aware tools will retain the original prefix for a namespace. Therefore, RDF/XML should always be parsed with a namespace-aware parser before individual metadata elements are accessed. A suitable parser for RDF can be found at:

   http://librdf.org/raptor/
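
The point about prefixes can be sketched with Python's standard xml.etree.ElementTree module, which resolves every element name to its namespace URI. This is only an XML-level illustration, not a full RDF parser such as Raptor. The sample packet is invented for this example and deliberately uses the unusual prefixes r and d; the Dublin Core format property stands in for whatever metadata elements are actually of interest, and the function name dc_formats is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample packet; note the non-standard prefixes "r" and "d".
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <r:Description r:about="" xmlns:d="http://purl.org/dc/elements/1.1/">
      <d:format>image/jpeg</d:format>
    </r:Description>
  </r:RDF>
</x:xmpmeta>"""


def dc_formats(packet):
    """Return the text of every Dublin Core 'format' property element.

    Elements are matched by namespace URI ("{uri}localname"), so the
    prefixes chosen by the writing application do not matter.
    """
    root = ET.fromstring(packet)
    values = []
    for description in root.iter("{%s}Description" % RDF_NS):
        for element in description.findall("{%s}format" % DC_NS):
            values.append(element.text)
    return values
```

Calling dc_formats(SAMPLE) finds the format element even though the packet never uses the customary rdf and dc prefixes; a lookup based on the literal string "dc:format" would have missed it.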

If the Raptor stand-alone parser, rapper, is invoked with the -s option, it ignores the XMP wrapper elements. A set of sample images and binaries for i386 Linux is available here:

   http://thesauri.dbalzer.net/stuff/midas-xmp-extract.tar.gz

This is not a ready-to-use toolset. It only serves to illustrate one possible way of extracting MIDAS image metadata using XMP, RDF, and standard Unix tools in combination.
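
As a rough sketch of the rapper step mentioned above, the following Python snippet builds and runs a rapper command line with the -s option on a file that holds an extracted XMP packet. The function names and the file name packet.xml are placeholders for this example, the choice of N-Triples as output format is an assumption, and the code requires that rapper is installed and on the PATH.

```python
import shutil
import subprocess


def rapper_command(path, output_format="ntriples"):
    """Build the argument list for a rapper call (hypothetical helper).

    -s  scan the input for the rdf:RDF element, skipping the XMP wrappers
    -i  input syntax (RDF/XML)
    -o  output syntax (N-Triples by default, one statement per line)
    """
    return ["rapper", "-s", "-i", "rdfxml", "-o", output_format, path]


def run_rapper(path):
    """Run rapper on an extracted XMP packet and return its output text."""
    if shutil.which("rapper") is None:
        raise FileNotFoundError("rapper not found on PATH")
    result = subprocess.run(
        rapper_command(path), check=True, capture_output=True, text=True
    )
    return result.stdout
```

The line-oriented output of run_rapper("packet.xml") can then be filtered with standard text tools, since each statement carries full URIs rather than prefixed names.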