Copying and updating pages

You can rearrange or duplicate pages within a PDF, with an important caveat:

Warning

pdf.pages[0] = pdf.pages[42] will create a shallow copy of pdf.pages[42], unlike the usual behavior in Python.

Assigning one page to another within the same PDF creates a shallow copy of the source page. This differs from the usual Python semantics, where assigning one list element to another element of the same list merely creates two references to the same object. (Normally, after setting list[0] = list[1], list[0] is list[1].) We break this convention with the shallow copy, and only guarantee pdf.pages[0] == pdf.pages[1].

There is one important reason we have to do it this way: suppose a table of contents entry points to pdf.pages[42]. After we assign that page to pdf.pages[0], where should the table of contents entry point? We leave it pointing at pdf.pages[42].

What if a table of contents entry referenced pdf.pages[0]? (In PDFs, the table of contents references a page object, not a page number.) Is that entry still valid after reassignment? As the library, we don’t know. As the application developer, you have to decide. (pikepdf does not currently have support code for managing table of contents objects, but you can manipulate them.)

Updating a page in place

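Use pikepdf.Object.emplace() to write the contents of one PDF page over another while preserving all references to the original page. emplace() sets the keys and values of the target page to be equal to those of the source page, so the target remains the same object and anything that referenced it still points to it.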