PDF Metadata
The primary metadata in a PDF is stored in an XMP (Extensible Metadata Platform) metadata stream, where XMP is a metadata specification in XML format. For full information on XMP, see Adobe’s XMP Developer Center. XMP supersedes the older Document Info dictionaries, which are deprecated in the PDF 2.0 specification. The XMP metadata stream is optional and does not appear in all PDFs.
The XMP Specification also provides useful information.
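To make the "XML format" point concrete, here is a minimal sketch of what a small XMP packet can look like and how it could be read with Python's standard library alone. The packet contents (the title text in particular) are made up for illustration, and real packets usually carry many more properties. Note that dc:title is a compound value (an rdf:Alt list of language alternatives), not a plain string.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMP packet, trimmed to a single Dublin Core title property.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Example title</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

# Namespace prefixes used in the search path below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(XMP_PACKET)
# The title text sits inside the first language alternative of dc:title.
title = root.find(".//dc:title/rdf:Alt/rdf:li", NS).text
print(title)
```

Walking the RDF structure by hand like this gets tedious quickly, which is the motivation for a higher-level interface.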
pikepdf provides an interface to simplify viewing and making minor edits to XMP. In particular, compound quantities may be read, but only scalar quantities can be modified.
For more complex changes, consider using the python-xmp-toolkit library and its libexempi dependency, but note that it is not capable of synchronizing changes to the older DocumentInfo metadata.
Accessing metadata
The XMP metadata stream is attached to the PDF’s root object, but to simplify management of this, use pikepdf.Pdf.open_metadata(). The returned pikepdf.models.PdfMetadata object may be used for reading, or entered with a with block to modify and commit changes. If you use this interface, pikepdf will synchronize changes to new and old metadata.
A PDF must still be saved after metadata is changed.
In [1]: import pikepdf
   ...: pdf = pikepdf.open('../tests/resources/sandwich.pdf')
In [2]: meta = pdf.open_metadata()