loongson/pypi/: pypdfium2-5.7.0 metadata and description

Python bindings to PDFium

author pypdfium2-team
author_email geisserml@gmail.com
classifiers
  • Development Status :: 5 - Production/Stable
  • Intended Audience :: Developers
  • Intended Audience :: Information Technology
  • Programming Language :: Python :: 3
  • Programming Language :: Python :: 3 :: Only
  • Programming Language :: Python :: Implementation :: CPython
  • Topic :: Multimedia :: Graphics
  • Topic :: Software Development :: Libraries
description_content_type text/markdown
keywords pdf,pdfium
license BSD-3-Clause, Apache-2.0, dependency licenses
project_urls
  • Source, https://github.com/pypdfium2-team/pypdfium2
  • Tracker, https://github.com/pypdfium2-team/pypdfium2/issues
  • Documentation, https://pypdfium2.readthedocs.io
  • Changelog, https://pypdfium2.readthedocs.io/en/stable/changelog.html
requires_python >= 3.6
Files
pypdfium2-5.7.0-py3-none-manylinux_2_38_loongarch64.whl
  • Size: 3 MB
  • Type: Python Wheel
  • Python: 3
  • History: Replaced 1 time(s)
  • History: Uploaded to loongson/pypi by loongson 2026-04-15 02:45:58
pypdfium2-5.7.0-py3-none-musllinux_1_2_loongarch64.whl
  • Type: Python Wheel
  • Python: 3

pypdfium2

pypdfium2 is an ABI-level Python 3 binding to PDFium, a powerful and liberal-licensed library for PDF rendering, inspection, manipulation and creation.

It is built with ctypesgen and external PDFium binaries. The custom setup infrastructure provides a seamless packaging and installation process. A wide range of platforms is supported with pre-built packages.

pypdfium2 includes helpers to simplify common use cases, while the raw PDFium API (ctypes) remains accessible as well.

Installation

From PyPI (recommended)

python -m pip install -U pypdfium2

If a pre-built wheel is available for your platform, pip will use it, which is the easiest way to install pypdfium2. Otherwise, setup code will run: it will look for system pdfium, or attempt to build pdfium from source.

JavaScript/XFA builds

pdfium-binaries also offer V8 (JavaScript) / XFA enabled builds. If you need them, do e.g.:

PDFIUM_PLATFORM=auto-v8 pip install -v pypdfium2 --no-binary pypdfium2

This will bypass wheels and run setup, while requesting use of V8 builds through the PDFIUM_PLATFORM=auto-v8 environment setting. See below for more info.

Optional runtime dependencies

As of this writing, pypdfium2 has no mandatory runtime dependencies apart from Python and PDFium itself (which is commonly bundled).

However, some optional support model / CLI features need additional packages:

pypdfium2 tries to defer imports of optional dependencies until they are actually needed, so there should be no startup overhead if you don't use them.

From the repository / With setup

Note that, unlike the helpers, pypdfium2's setup is not bound by API stability promises, so it may change at any time.

Setup Dependencies

System

Python

Python dependencies should be automatically installed, unless --no-build-isolation is passed to pip.

[!NOTE] pypdfium2 and its ctypesgen fork are developed in sync, i.e. each pypdfium2 commit ought to be coupled with the then HEAD of pypdfium2-ctypesgen.
Our release sdists, and latest pypdfium2 from git, will automatically use matching ctypesgen.
However, when using a non-latest commit, you'll have to set up the right ctypesgen version on your own, and install pypdfium2 without build isolation.

Get the code

git clone "https://github.com/pypdfium2-team/pypdfium2.git"
cd pypdfium2/

Default setup

# In the pypdfium2/ directory
python -m pip install -v .

This will invoke pypdfium2's setup.py. Typically, this means a binary will be downloaded from pdfium-binaries and bundled into pypdfium2, and ctypesgen will be called on pdfium headers to produce the bindings interface.

pdfium-binaries offer GitHub build provenance attestations, so it is highly recommended that you install the gh CLI for our setup to verify authenticity of the binaries.

If no pre-built binaries are available for your platform, setup will look for system pdfium, or attempt to build pdfium from source.

pip options of interest

With system pdfium

PDFIUM_PLATFORM="system-search" python -m pip install -v .

Look for a system-provided pdfium shared library, and bind against it.

The standard, portable ctypes.util.find_library() mechanism is used to probe for system pdfium at setup time, and the result is hardcoded into the bindings. Alternatively, set $PDFIUM_BINARY to the path of the out-of-tree DLL to use.
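
As a rough illustration of this probing step, the sketch below combines ctypes.util.find_library() with a $PDFIUM_BINARY override. The function name locate_pdfium is hypothetical, and giving the environment variable precedence over the search is an assumption made for the example; pypdfium2's actual setup code may behave differently.

```python
import ctypes.util
import os


def locate_pdfium(env=os.environ):
    """Return a path or library name for pdfium, or None if nothing was found.

    Hypothetical helper for illustration, not part of pypdfium2.
    """
    # Assumption: an explicit $PDFIUM_BINARY wins over the library search.
    explicit = env.get("PDFIUM_BINARY")
    if explicit:
        return explicit
    # find_library() returns None when no matching library is found.
    return ctypes.util.find_library("pdfium")


if __name__ == "__main__":
    print(locate_pdfium())
```

On a host without system pdfium and without $PDFIUM_BINARY set, this prints None.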

If system pdfium was found, we will look for pdfium headers from which to generate the bindings (e.g. in /usr/include). If the headers are in a location not recognized by our code, set $PDFIUM_HEADERS to the directory in question.

Also, we try to determine the pdfium version, either from the library filename itself, or via pkg-config. If this fails, you can pass the version alongside the setup target, e.g. PDFIUM_PLATFORM=system-search:XXXX, where XXXX is the pdfium build version. If the version is not known in the end, NaN placeholders will be set.
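
To make the filename-based detection concrete, here is a minimal sketch that extracts a four-part version from a versioned library name. The helper version_from_filename and its regular expression are assumptions for illustration only; the heuristics in pypdfium2's setup may be more involved.

```python
import re

# Four dot-separated integers, e.g. the "140.0.7269.0" in libpdfium.so.140.0.7269.0
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def version_from_filename(name):
    """Return (major, minor, build, patch) parsed from a filename, or None.

    Hypothetical helper for illustration, not part of pypdfium2.
    """
    match = _VERSION_RE.search(name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(version_from_filename("libpdfium.so.140.0.7269.0"))
    print(version_from_filename("libpdfium.so"))
```

An unversioned name such as libpdfium.so yields None, which corresponds to the case where the version has to come from pkg-config or be passed in by the caller.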

If the version is known but no headers were found, they will be downloaded from upstream. If neither headers nor version are known (or ctypesgen is not installed), the reference bindings will be used as a last resort. This is ABI-unsafe and thus discouraged.

If find_library() fails to find pdfium, we may perform an additional, custom search, such as checking for a pdfium shared library included with LibreOffice and, if available, determining its version.
Our search heuristics currently expect a Linux-like filesystem hierarchy (e.g. /usr), but contributions for other systems are welcome.

[!IMPORTANT] When pypdfium2 is installed with system pdfium, the bindings ought to be re-generated with the new headers whenever the out-of-tree pdfium DLL is updated, for ABI safety reasons.[^upstream_abi_policy]
For distributors, we highly recommend the use of versioned libraries (e.g. libpdfium.so.140.0.7269.0) or similar concepts that enforce binary/bindings version match, so outdated bindings will safely stop working with a meaningful error, rather than silently continue unsafely, at risk of hard crashes.

[!TIP] If you mind pypdfium2's setup making a web request to resolve the full version, you may pass it in manually via GIVEN_FULLVER=$major.$minor.$build.$patch (colon-separated if there are multiple versions), or less ideally, set IGNORE_FULLVER=1 to use NaN placeholders. This applies to other setup targets as well.
For distributors, we recommend that you use the full version in binary filename or pkgconfig info, so pypdfium2's setup will not need to resolve it in the first place.

[^upstream_abi_policy]: Luckily, upstream tend to be careful not to change the ABI of existing stable APIs, but they don't mind ABI-breaking changes to APIs that have not been promoted to stable tier yet, and pypdfium2 uses many of them, so it is still prudent to care about downstream ABI safety as well (it always is). You can read more about upstream's policy here.

Related targets

There is also a system-generate:$VERSION target, to produce system pdfium bindings in a host-independent fashion. This will call find_library() at runtime, and may be useful for packaging.

Further, you can set just system to consume pre-generated files from the data/system staging directory. See the section on caller-provided data files for more info.

With self-built pdfium

You can also install pypdfium2 with a self-compiled pdfium shared library, by placing it in data/sourcebuild/ along with a bindings interface and version info, and setting the PDFIUM_PLATFORM="sourcebuild" directive to use these files on setup.

This project comes with two scripts to automate the build process: build_toolchained.py and build_native.py (in setupsrc/).

[!TIP] The native sourcebuild can either use system libraries, or pdfium's vendored libraries. When invoked directly, by default, system libraries need to be installed. However, when invoked through fallback setup (PDFIUM_PLATFORM=fallback), vendored libraries will be used.
The --vendor ... and --no-vendor ... options can be used to control vendoring on a per-library basis. See build_native.py --help for details.

You can also set PDFIUM_PLATFORM to sourcebuild-native or sourcebuild-toolchained to trigger either build script through setup, and pass command-line flags with $BUILD_PARAMS. However, for simplicity, both scripts/subtargets share just sourcebuild as staging directory.

Dependencies:

To do the toolchained build, you'd run something like:

# call build script with --help to list options
python setupsrc/build_toolchained.py
PDFIUM_PLATFORM="sourcebuild" python -m pip install -v .

Or for the native build, on Ubuntu 24.04, you could do e.g.:

# Install dependencies
sudo apt-get install generate-ninja ninja-build libfreetype-dev liblcms2-dev libjpeg-dev libopenjp2-7-dev libpng-dev libtiff-dev zlib1g-dev libicu-dev libglib2.0-dev
# Build with GCC
python ./setupsrc/build_native.py --compiler gcc
# Alternatively, build with Clang
sudo apt-get install llvm lld
VERSION=18
ARCH=$(uname -m)
sudo ln -s "/usr/lib/clang/$VERSION/lib/linux" "/usr/lib/clang/$VERSION/lib/$ARCH-unknown-linux-gnu"
sudo ln -s "/usr/lib/clang/$VERSION/lib/linux/libclang_rt.builtins-$ARCH.a" "/usr/lib/clang/$VERSION/lib/linux/libclang_rt.builtins.a"
python ./setupsrc/build_native.py --compiler clang
# Install
PDFIUM_PLATFORM="sourcebuild" python -m pip install -v .

[!NOTE] The native sourcebuild currently supports Linux (or similar). macOS and Windows are not handled, as we do not have access to these systems, and working over CI did not turn out to be feasible; use the toolchain-based build for now. Community help / pull requests to extend platform support would be welcome.

Android (Termux)

The native build may also work on Android with Termux in principle.

First, make sure git can work in your checkout of pypdfium2:

# set $PROJECTS_FOLDER accordingly
# double quotes, so the shell expands $PROJECTS_FOLDER (the * is passed through to git literally)
git config --global --add safe.directory "$PROJECTS_FOLDER/*"

To install the dependencies, you'll need something like

pkg install gn ninja freetype littlecms libjpeg-turbo openjpeg libpng zlib libicu libtiff glib

Then apply the clang symlinks as described above, but use ARCH=$(uname -m)-android and substitute /usr with $PREFIX (/data/data/com.termux/files/usr).

Last time we tested build_native on Android, there were some bugs with freetype/openjpeg includes. A quick & dirty workaround with symlinks is:

# freetype
ln -s "$PREFIX/include/freetype2/ft2build.h" "$PREFIX/include/ft2build.h"
ln -s "$PREFIX/include/freetype2/freetype" "$PREFIX/include/freetype"

# openjpeg
OPJ_VER="2.5"  # adapt this to your setup
ln -s "$PREFIX/include/openjpeg-$OPJ_VER/openjpeg.h" "$PREFIX/include/openjpeg.h"
ln -s "$PREFIX/include/openjpeg-$OPJ_VER/opj_config.h" "$PREFIX/include/opj_config.h"

Now, you should be ready to run the build.

On Android, PDFium's build system outputs libpdfium.cr.so by default, thus you'll want to rename the binary so pypdfium2's library search can find it:

mv data/sourcebuild/libpdfium.cr.so data/sourcebuild/libpdfium.so

Then install with PDFIUM_PLATFORM=sourcebuild.

In case dependency libraries were built separately, you may also need to set the OS library search path, e.g.:

PY_VERSION="3.12"  # adapt this to your setup
# export, so that child processes (e.g. python) see the setting
export LD_LIBRARY_PATH="$PREFIX/lib/python$PY_VERSION/site-packages/pypdfium2_raw"

By default, our build script currently bundles everything into a single DLL, though.

cibuildwheel

Sourcebuild can be run through cibuildwheel. For targets configured in our pyproject.toml, the basic invocation is as simple as, e.g.:

CIBW_BUILD="cp311-manylinux_x86_64" cibuildwheel

A more involved use case could look like this:

CIBW_BUILD="cp310-musllinux_s390x" CIBW_ARCHS=s390x CIBW_CONTAINER_ENGINE=podman TEST_PDFIUM=1 cibuildwheel

See also our cibuildwheel workflow. For more options, see the upstream documentation.

On Linux, this will use the native sourcebuild with vendored dependency libraries. On Windows and macOS, the toolchained sourcebuild is used.

Note that on Linux, cibuildwheel requires Docker or Podman. On the author's version of Fedora, Docker can be installed as follows:

sudo dnf in moby-engine  # this provides the docker command
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER
# then reboot (re-login might also suffice)

For other ways of installing Docker, refer to the cibuildwheel docs (Setup, Platforms) and the links therein.

[!WARNING] cibuildwheel copies the project directory into a container, not taking .gitignore rules into account. Thus, it is advisable to make a fresh checkout of pypdfium2 before running cibuildwheel. In particular, a toolchained checkout of pdfium within pypdfium2 is problematic, and will cause a halt on the Copying project into container... step. For development, make sure the fresh checkout is in sync with the working copy.

[!TIP] pdfium itself has first-class cross-compilation support. In particular, for Linux architectures supported by upstream's toolchain but not available natively on CI, we recommend forgoing cibuildwheel and instead cross-building pdfium using its own toolchain, e.g.:

# assuming cross-compilation dependencies are installed
python setupsrc/build_toolchained.py --target-cpu arm
PDFIUM_PLATFORM=sourcebuild CROSS_TAG="manylinux_2_17_armv7l" python -m build -wxn

This typically achieves a lower glibc requirement than we can with cibuildwheel.

With caller-provided data files

pypdfium2 is like any other Python project in essentials, except that it needs some data files: a pdfium DLL (either bundled or out-of-tree), a bindings interface (generated via ctypesgen), and pdfium version info (JSON).

The main point of pypdfium2's custom setup is to automate deployment of these files, in a way that suits end users / contributors, and our PyPI packaging.

However, if you want to (or have to) forgo this automation, you can also supply these files yourself, as shown below. This allows you to largely sidestep pypdfium2's own setup code.
The idea is basically to put your data files in a staging directory, data/sourcebuild or data/system (depending on whether you want to bundle or use system pdfium), and set the matching $PDFIUM_PLATFORM target to consume from that directory on setup.

This setup strategy should be inherently free of web requests. Mind though, we don't support the result. If you bring your own files, it's your own responsibility, and it's quite possible your pypdfium2 might turn out subtly different from ours.

# First, ask yourself: Do you want to bundle pdfium (in-tree), or use system
# pdfium (out-of-tree)? For bundling, set "sourcebuild", else set "system".
TARGET="sourcebuild"  # or "system"
STAGING_DIR="data/$TARGET"

# If you have decided for bundling, copy over the pdfium DLL in question.
# Otherwise, skip this step.
cp "$MY_BINARY_PATH" "$STAGING_DIR/libpdfium.so"

# Now, we will call ctypesgen to generate the bindings interface.
# Reminder: You'll want to use the pypdfium2-team fork of ctypesgen.
# It generates much cleaner bindings, and it's what our source expects
# (there may be subtle API differences in terms of output).
# How exactly you do this is down to you.
# See ctypesgen --help or base.py::run_ctypesgen() for further options.
ctypesgen --library pdfium --rt-libpaths $MY_RT_LIBPATHS --ct-libpaths $MY_CT_LIBPATHS \
--headers $MY_INCLUDE_DIR/fpdf*.h -o $STAGING_DIR/bindings.py [-D $MY_RAW_FLAGS]

# Then write the version file (fill the placeholders).
# Note, this is not a mature interface yet and might change any time!
# See also https://pypdfium2.readthedocs.io/en/stable/python_api.html#pypdfium2.version.PDFIUM_INFO
# major/minor/build/patch: integers forming the pdfium version being packaged
# n_commits/hash: git describe like post-tag info (0/null for release commit)
# origin: a string to identify the build
# flags: a comma-delimited list of pdfium feature flag strings
#        (e.g. "V8", "XFA") - may be empty for default build
cat > "$STAGING_DIR/version.json" <<END
{
  "major": $PDFIUM_MAJOR,
  "minor": $PDFIUM_MINOR,
  "build": $PDFIUM_BUILD,
  "patch": $PDFIUM_PATCH,
  "n_commits": $POST_TAG_COMMIT_COUNT,
  "hash": $POST_TAG_HASH,
  "origin": "$TARGET-$MY_ORIGIN",
  "flags": [$MY_SHORT_FLAGS]
}
END

# Finally, run setup (through pip, pyproject-build or whatever).
# The PDFIUM_PLATFORM value will instruct pypdfium2's setup to use the files
# we supplied, rather than to generate its own.
PDFIUM_PLATFORM=$TARGET python -m pip install --no-build-isolation -v .
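
For callers who prefer to produce the version file from Python rather than a shell heredoc, the following sketch writes the same fields with the json module. The function name write_version_file, its parameter names and the default origin string are hypothetical; only the JSON keys are taken from the recipe above. Using json.dumps() means a missing hash is serialized as null and flag strings are quoted without manual escaping.

```python
import json
from pathlib import Path


def write_version_file(staging_dir, major, minor, build, patch,
                       n_commits=0, commit_hash=None,
                       origin="sourcebuild-custom", flags=()):
    """Write version.json into staging_dir and return its path.

    Hypothetical helper for illustration; the keys mirror the shell recipe.
    """
    info = {
        "major": major,
        "minor": minor,
        "build": build,
        "patch": patch,
        "n_commits": n_commits,   # 0 for a release commit
        "hash": commit_hash,      # None is serialized as JSON null
        "origin": origin,
        "flags": list(flags),     # e.g. ["V8", "XFA"], empty for a default build
    }
    path = Path(staging_dir) / "version.json"
    path.write_text(json.dumps(info, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    Path("data/sourcebuild").mkdir(parents=True, exist_ok=True)
    print(write_version_file("data/sourcebuild", 140, 0, 7269, 0))
```

As stated above, the version file is not a mature interface, so the set of keys may change.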

Further setup info (formal summary)

This is a somewhat formal description of pypdfium2's setup capabilities. It is meant to sum up and complement the above documentation on specific sub-targets.

Disclaimer: As it is hard to keep up with constantly evolving setup code, it is possible this documentation may be outdated/incomplete. Also keep in mind that these APIs could change any time, and may be mainly of internal interest.

[^platform_ids]: Intended for packaging, so that wheels can be crafted for any platform without access to a native host.

From Conda

[!WARNING] Beware: Any conda packages/recipes of pypdfium2 or pdfium-binaries that might be provided by other distributors, including anaconda/main or conda-forge default channels, are unofficial.

[!NOTE] Wait a moment: Do you really need this? pypdfium2 is best installed from PyPI (e.g. via pip),[^pypi_reasons] which you can also do in a conda env. Rather than asking your users to add custom channels, consider making pypdfium2 optional at install time, and ask them to install it via pip instead.
This library has no hard runtime dependencies, so you don't need to worry about breaking the conda env.

[^pypi_reasons]: To name some reasons: + pypdfium2 from PyPI covers platforms that we cannot cover on conda. + pypdfium2 from PyPI has extensive fallback setup, while conda does not provide an opportunity to run custom setup code. + With conda, in-project publishing / custom channels are second class. + With conda, it seems there is no way to create platform-specific but interpreter-independent python packages, so we cannot reasonably bundle pdfium. Thus, we have to use external pdfium, which is more complex and has some pitfalls.

Note: Conda packages are normally managed using recipe feedstocks driven by third parties, in a Linux repository like fashion. However, with some quirks it is also possible to do conda packaging within the original project and publish to a custom channel, which is what pypdfium2-team does, and the above instructions are referring to.

Unofficial packages

The authors of this project have no control over and are not responsible for possible third-party builds of pypdfium2, and we do not support them. Please use our official packages where possible. If you have an issue with a third-party build, either contact your distributor, or try to reproduce with our official builds.

Do not expect us to add/change code for downstream-specific setup tasks. Related issues or PRs may be closed without further notice if we do not consider them a fit for upstream. Enhancements of general value that are maintainable and align well with the idea of our setup code are welcome, though.

[!IMPORTANT] If you are a third-party distributor, please point out in the description that your package is unofficial, i.e. not affiliated with or endorsed by the pypdfium2 authors.
In particular, if you feel like you need patches to package pypdfium2, please submit them on the Discussions page so we can figure out if there isn't a better way (there usually is).

Usage

Support model

Here are some examples of using the support model API.

Raw PDFium API

While helper classes conveniently wrap the raw PDFium API, it may still be accessed directly and is available in the namespace pypdfium2.raw. Lower-level utilities that may aid with using the raw API are provided in pypdfium2.internal.

import pypdfium2.raw as pdfium_c
import pypdfium2.internal as pdfium_i

Since PDFium is a large library, many components are not covered by helpers yet. However, as helpers expose their underlying raw objects, you may seamlessly integrate raw APIs while using helpers as available. When passed as ctypes function parameter, helpers automatically resolve to the raw object handle (but you may still access it explicitly if desired):

permission_flags = pdfium_c.FPDF_GetDocPermission(pdf.raw)  # explicit
permission_flags = pdfium_c.FPDF_GetDocPermission(pdf)      # implicit
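
ctypes supports this kind of implicit resolution through its documented _as_parameter_ protocol. The sketch below shows the protocol in isolation, without PDFium: Handle is a hypothetical stand-in for a helper object, and double_it is a ctypes function backed by a Python callable so the example runs anywhere. It is meant to illustrate the idea only and should not be read as pypdfium2's actual implementation.

```python
import ctypes


class Handle:
    """Hypothetical wrapper around a raw value, standing in for a helper object."""

    def __init__(self, raw):
        self.raw = raw

    @property
    def _as_parameter_(self):
        # ctypes looks up this attribute when the object is passed as an argument.
        return self.raw


# A ctypes function taking and returning a C int. It is backed by a Python
# callable here, so that no external shared library is needed for the demo.
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
double_it = PROTOTYPE(lambda value: value * 2)

handle = Handle(21)
explicit = double_it(handle.raw)  # pass the raw value explicitly
implicit = double_it(handle)      # ctypes resolves it via _as_parameter_

if __name__ == "__main__":
    print(explicit, implicit)
```

Both calls receive the same underlying value, which mirrors the explicit and implicit forms shown above.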

For PDFium docs, please look at the comments in its public header files.[^pdfium_docs] A variety of examples on how to interface with the raw API using ctypes is already provided with support model source code. Nonetheless, the following guide may be helpful to get started with the raw API, if you are not familiar with ctypes yet.

[^pdfium_docs]: Unfortunately, no recent HTML-rendered docs are available for PDFium at the moment.

[^bindings_decl]: From the auto-generated bindings file. We maintain a reference copy at autorelease/bindings.py. Or if you have an editable install, there will also be src/pypdfium2_raw/bindings.py.

Command-line Interface

pypdfium2 also ships with a simple command-line interface, providing access to key features of the support model in a shell environment (e.g. rendering, content extraction, document inspection, page rearranging, ...).

The primary motivation behind this is to have a nice testing interface, but it may be helpful in a variety of other situations as well. Usage should be largely self-explanatory, assuming some familiarity with the command-line. See pypdfium2 --help or pypdfium2 $SUBCOMMAND --help for available commands and options.

If you wish to call pypdfium2's CLI through python -m, note that the module name is pypdfium2_cli (not just pypdfium2), unlike the entrypoint script.

Licensing

[!NOTE] Disclaimer: This project is provided on an "as-is" basis. This is not legal advice, and there is ABSOLUTELY NO WARRANTY for any information provided in this document or elsewhere in the pypdfium2 project, including earlier revisions. We disclaim liability for any possible damages resulting from using this license information. It is the embedder's responsibility to check on licensing. See also GitHub's disclaimer.

pypdfium2 itself is available under the terms and conditions of Apache-2.0 / BSD-3-Clause. Documentation and examples of pypdfium2 are licensed under CC-BY-4.0. pypdfium2 includes SPDX headers in source files. License information for data files is provided in REUSE.toml as per the reuse standard.

PDFium is available under "a BSD-style license that can be found in [its] LICENSE file".
Various other open-source licenses apply to dependencies included with PDFium. PDFium's license as well as dependency licenses have to be shipped with binary distributions.
See the BUILD_LICENSES/ directory, or the licenses shipped with our wheel builds.

PDFium's dependencies might change over time. Please notify us if you think a relevant license is missing.

To the author's knowledge, pypdfium2 is one of the rare Python libraries capable of PDF rendering while not being covered by strong-copyleft licenses.[^liberal_pdf_renderlibs]

[!IMPORTANT] The exact licensing situation depends on how the builds were made.
Note that a subset of pypdfium2 builds might link with the libgcc runtime library. Check the builds you use and, if affected, libgcc's license to see if that's OK for your use.

[^liberal_pdf_renderlibs]: The only other liberal-licensed PDF rendering libraries known to the author are pdf.js (JavaScript) and Apache PDFBox (Java), but python bindings packages don't exist yet or are unsatisfactory. However, we wrote some gists that show it'd be possible in principle: pdfbox (+ setup), pdfjs.

Issues / Contributions

While using pypdfium2, you might encounter bugs or missing features. In this case, feel free to open an issue or discussion thread. If applicable, include details such as tracebacks, OS and CPU type, as well as the versions of pypdfium2 and of the dependencies in use.

Roadmap:

Response policy

Given this is a volunteer open-source project, it is possible you may not get a response to your issue, or it may be closed without much feedback. Conversations may be locked if we feel like our attention is getting DDOSed. We may not have time to provide usage support.

The same applies to Pull Requests. We will accept contributions only if we find them suitable. Do not reach out with a strong expectation to get your change merged; it is solely up to the repository owner to decide if and when a PR will be merged, and we are free to silently reject PRs we do not like.

Known limitations

Incompatibility with Threading

PDFium is inherently not thread-safe. See the API docs for more information.
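
One common way to cope with a library that is not thread-safe is to funnel every call through a single lock. The sketch below demonstrates the pattern with a stand-in function instead of real PDFium calls; the names call_serialized and demo are hypothetical, and whether serializing calls this way is sufficient for a particular use of PDFium is an assumption that should be checked against the API docs.

```python
import threading
import time

# One process-wide lock; every call into the library goes through it.
_PDFIUM_LOCK = threading.Lock()


def call_serialized(func, *args, **kwargs):
    """Run func while holding the lock, so that calls never overlap."""
    with _PDFIUM_LOCK:
        return func(*args, **kwargs)


def demo(n_threads=8):
    """Return the highest number of overlapping calls that was observed."""
    state = {"active": 0, "peak": 0}

    def fake_library_call():
        # Stand-in for real library work; records how many calls overlap.
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.001)
        state["active"] -= 1

    threads = [
        threading.Thread(target=call_serialized, args=(fake_library_call,))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrent calls:", demo())
```

The trade-off is that threads no longer run library work in parallel; the lock only prevents simultaneous calls.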

Risk of unknown object lifetime violations

As outlined in the raw API section, it is essential that Python-managed resources remain available as long as they are needed by PDFium.

The problem is that the Python interpreter may garbage collect objects with reference count zero at any time, so it can happen that an unreferenced but still required object by chance stays around long enough before it is garbage collected. However, it could also disappear too soon and cause breakage. Such dangling objects result in non-deterministic memory issues that are hard to debug. If the timeframe between reaching reference count zero and removal is sufficiently large and roughly consistent across different runs, it is even possible that mistakes regarding object lifetime remain unnoticed for a long time.

Although we intend to develop helpers carefully, it cannot be fully excluded that unknown object lifetime violations might still be lurking around somewhere, especially if unexpected requirements were not documented by the time the code was written.
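
The usual defence against such lifetime problems is to store the Python-managed resource on the object that hands its address to native code. The sketch below shows this with plain ctypes; OwnedBuffer is a hypothetical class written for illustration, not a pypdfium2 helper. Note that the integer returned by ctypes.addressof() does not keep the array alive, which is why the array itself is kept as an attribute.

```python
import ctypes


class OwnedBuffer:
    """Hypothetical holder that ties a buffer's lifetime to the object using it."""

    def __init__(self, data: bytes):
        # Keep the ctypes array as an attribute: as long as this object is
        # referenced, the memory behind self.address stays valid.
        self._array = (ctypes.c_ubyte * len(data)).from_buffer_copy(data)
        # A plain integer address holds no reference to the array.
        self.address = ctypes.addressof(self._array)
        self.size = len(data)

    def read(self):
        """Read the bytes back through the raw address."""
        return ctypes.string_at(self.address, self.size)


if __name__ == "__main__":
    buffer = OwnedBuffer(b"%PDF-1.7")
    print(buffer.read())
```

If only the address were kept and the array were allowed to go out of scope, reading through the address would be exactly the kind of non-deterministic memory error described above.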

Missing raw PDF access

As of this writing, PDFium's public interface does not provide access to the raw PDF data structure (see issue 1694). It does not expose APIs to read/write PDF dictionaries, streams, name/number trees, etc. Instead, it merely offers a predefined set of abstracted functions. This considerably limits the library's potential, compared to other products such as pikepdf.

Limitations of ABI bindings

PDFium's non-public backend would provide extended capabilities, including raw access, but it is written in C++, which (unlike pure C) does not result in a stable ABI, so we cannot use it with ctypes. This means it's out of scope for this project.

Also, while ABI bindings tend to be more convenient, they have some technical drawbacks compared to API bindings (see e.g. 1, 2)

Development

Long lines

The pypdfium2 codebase does not hard wrap long lines. It is recommended to set up automatic word wrap in your text editor, e.g. VS Code:

editor.wordWrap = bounded
editor.wordWrapColumn = 100

Command recipes

The pypdfium2 project uses the just command runner, which can be seen as a more modern, more flexible alternative to make. In particular, there's no good way to pass through positional arguments with make.

Run just -l (or open the justfile) to view the available commands.

Docs

pypdfium2 provides API documentation using Sphinx, which can be rendered to various formats, including HTML:

sphinx-build -b html ./docs/source ./docs/build/html/
just docs-build  # short alias

Built docs are primarily hosted on readthedocs.org (RTD). Hosting can be configured using a .readthedocs.yaml file (see instructions) and the administration page on the web interface. RTD theoretically supports hosting multiple versions, but currently, we only host one build for the latest release through the stable branch. New builds are automatically triggered by a webhook whenever a linked branch is pushed.

Additionally, one doc build can also be hosted on GitHub Pages. It is implemented with a CI workflow, which is supposed to be triggered automatically on release. This provides us with full control over build env and used commands, whereas RTD may be less liberal in this regard.

Testing

pypdfium2 contains a small test suite to verify the library's functionality. It is written with pytest:

python -m pytest tests/  # or `just test`

Pass -sv to get more detailed output. Environment variables used by the CLI are also honored in the test suite: PYPDFIUM_LOGLEVEL, DEBUG_AUTOCLOSE, DEBUG_SYSFONTS, DEBUG_UNSUPPORTED. See pypdfium2 --help for description.

To get code coverage statistics, you may call

just coverage

Sometimes, it can also be helpful to test code on many PDFs.[^testing_corpora] In this case, the command-line interface and find come in handy:

# Example A: Analyse PDF images (in the current working directory)
find . -name '*.pdf' -exec bash -c "echo \"{}\" && pypdfium2 pageobjects \"{}\" --filter image" \;
# Example B: Parse PDF table of contents
find . -name '*.pdf' -exec bash -c "echo \"{}\" && pypdfium2 toc \"{}\"" \;
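
The same batch run can be driven from Python, which avoids shell quoting problems with unusual file names. This is a sketch under a few assumptions: the function names toc_commands and run_all are hypothetical, and the CLI is invoked through python -m pypdfium2_cli with the toc subcommand from Example B.

```python
import subprocess
import sys
from pathlib import Path


def toc_commands(root="."):
    """Yield one CLI invocation per PDF below root, in sorted order."""
    for path in sorted(Path(root).rglob("*.pdf")):
        # Arguments are passed as a list, so no shell quoting is involved.
        yield [sys.executable, "-m", "pypdfium2_cli", "toc", str(path)]


def run_all(root="."):
    """Run the toc subcommand on every PDF and keep going on failures."""
    for command in toc_commands(root):
        print(command[-1])
        subprocess.run(command, check=False)


if __name__ == "__main__":
    run_all(".")
```

Passing check=False lets the loop continue past files that make the CLI exit with an error, which is usually what is wanted when scanning a corpus.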

[^testing_corpora]: For instance, one could use the testing corpora of open-source PDF libraries (pdfium, pikepdf/ocrmypdf, mupdf/ghostscript, tika/pdfbox, pdfjs, ...)

Release workflow

The release process is fully automated using Python scripts and scheduled release workflows. You may also trigger the workflow manually from the GitHub Actions panel or similar.

Python release scripts are located in the folder setupsrc, along with custom setup code:

The autorelease script has some peculiarities maintainers should know about:

In case of necessity, you may also forego CI and do the release locally, which would roughly work like this (though ideally it should never be needed):

If something went wrong with commit or tag, you can still revert the changes:

# perform an interactive rebase to change history (substitute $N_COMMITS with the number of commits to drop or modify)
git rebase -i HEAD~$N_COMMITS
git push --force
# delete remote tag (substitute $TAGNAME accordingly)
git push --delete origin $TAGNAME
# delete local tag
git tag -d $TAGNAME

Faulty PyPI releases may be yanked using the web interface.

Popular dependents

pypdfium2 is used by popular packages such as langchain, dify, docling, nougat, pdfplumber, doctr, and nv-ingest.

This results in pypdfium2 being part of a large dependency tree.

Thanks to[^thanks_to]

... and further code contributors (GitHub stats).

If you have contributed to this project but are not mentioned here yet, please let us know.

[^thanks_to]: People listed in this section may not necessarily have contributed any copyrightable code to the repository. Many have rather helped with ideas, or contributions to dependencies of pypdfium2.

History

PDFium

The PDFium code base was originally developed as part of the commercial Foxit SDK, before being acquired and open-sourced by Google, who have maintained PDFium independently ever since, while Foxit continue to develop their SDK closed-source.

pypdfium2

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

Inspired by wowpng, the first known proof of concept Python binding to PDFium using ctypesgen, the initial pypdfium package was created. It had to be updated manually, which did not happen frequently. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels and consolidate the setup scripts. Further additions include ...