Installing additional language packs

OCRmyPDF uses Tesseract for OCR, and relies on its language packs for languages other than English.

Tesseract provides language packs for more than 100 languages.

On Debian and Ubuntu, language packs are usually available as system packages:

# Display a list of all Tesseract language packs
apt-cache search tesseract-ocr

# Install Chinese Simplified language pack
apt-get install tesseract-ocr-chi-sim

You can then pass the -l LANG argument to OCRmyPDF to tell it which languages to expect in the document, where LANG is a Tesseract language code. Multiple languages can be requested using either -l eng+fra (English and French) or -l eng -l fra.
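
As a rough sketch of how these pieces fit together, the script below builds the value for -l from a list of language codes, lists the installed language packs, and then runs OCRmyPDF. The join_langs helper and the file names input.pdf and output.pdf are illustrative assumptions, not part of OCRmyPDF. Note that Tesseract language codes use an underscore (chi_sim) where the Debian package name uses a hyphen (tesseract-ocr-chi-sim).

```shell
#!/bin/sh
# Hypothetical helper: join language codes with "+" to form the -l value.
join_langs() {
    (IFS='+'; printf '%s\n' "$*")
}

langs=$(join_langs eng fra)

# Show which language packs Tesseract can see, if it is on the PATH.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs
fi

# input.pdf and output.pdf are placeholder names; the command runs only
# when OCRmyPDF is installed and the input file actually exists.
if command -v ocrmypdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    ocrmypdf -l "$langs" input.pdf output.pdf
fi
```

If a requested language does not appear in the list printed by tesseract --list-langs, install the matching language pack first.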