Collaborative Working Sessions - Native Python repro

"Native": Python code with compiled extensions and/or non-Python processed assets.

Backgrounds of participants:
- the oss-rebuild project
- C code compiled into platform-dependent wheels
- the PyPI registry not having enough traceability
- experience with reproducibility of the 700 dependencies of Airflow
- whether it is necessary to rebuild wheels for a custom build of Python
- a sysadmin perspective of not knowing the provenance of installed software
- Fedora packaging experience

Stuff causing reproducibility problems:
- bytecode files (Fedora's add-det tooling addresses this; see the sketch below)
- shared libraries, e.g. libxml, expected to be preinstalled on deployed systems
- the Postgres client driver, libpq
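A minimal sketch of the bytecode point above, assuming nothing beyond stock CPython; the example.py module and the output file names are made up for illustration:

```python
# Minimal sketch (not from the session): why .pyc files are non-deterministic
# by default. CPython embeds the source file's mtime in the bytecode header,
# so identical sources checked out at different times yield different .pyc
# files. PEP 552 hash-based invalidation replaces the timestamp with a hash
# of the source, which is stable across builds.
from pathlib import Path
import py_compile

Path("example.py").write_text("print('hello')\n")  # throwaway demo module

# Default mode: header embeds the source mtime -> differs between builds
# whenever file timestamps differ.
py_compile.compile("example.py", cfile="example_ts.pyc",
                   invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)

# Hash-based mode (PEP 552): header embeds a hash of the source -> stable as
# long as the source bytes are identical.
py_compile.compile("example.py", cfile="example_hash.pyc",
                   invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)

# Setting SOURCE_DATE_EPOCH in the build environment makes py_compile and
# compileall default to hash-based pycs (CPython 3.7+).
```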

Is Python metadata a separate problem from build reproducibility? The metadata in pyproject.toml does not contain enough information to describe all the details needed to rebuild PyTorch reproducibly. conda rebuilds packages for different architectures.
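A minimal sketch of what a built wheel's metadata actually records, to illustrate the gap: the WHEEL and METADATA files carry platform tags and dependency ranges, but nothing about the compiler, build flags, or system libraries used. The wheel filename below is a hypothetical example:

```python
# Minimal sketch (an assumption, not from the session): dump the metadata a
# wheel ships with. Note the absence of any record of the build environment.
import zipfile

WHEEL_PATH = "torch-2.3.0-cp311-cp311-manylinux_2_17_x86_64.whl"  # hypothetical

with zipfile.ZipFile(WHEEL_PATH) as whl:
    for name in whl.namelist():
        # Every wheel contains a *.dist-info/WHEEL and *.dist-info/METADATA file.
        if name.endswith((".dist-info/WHEEL", ".dist-info/METADATA")):
            print(f"--- {name} ---")
            print(whl.read(name).decode("utf-8", errors="replace"))
```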

To reproduce PyPI wheels, maybe build on five popular distros and compare the outputs? (A comparison sketch follows.)
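A minimal sketch of the comparison step, assuming two wheels built from the same source on different distros; the paths are hypothetical, and a real workflow would likely use diffoscope for richer diffs:

```python
# Minimal sketch (an assumption, not from the session): hash every member of
# two wheels and report which files differ between the builds.
import hashlib
import zipfile

def wheel_digests(path: str) -> dict[str, str]:
    """Map each file inside the wheel to the SHA-256 of its contents."""
    with zipfile.ZipFile(path) as whl:
        return {name: hashlib.sha256(whl.read(name)).hexdigest()
                for name in whl.namelist()}

a = wheel_digests("dist-debian/pkg-1.0-cp311-cp311-linux_x86_64.whl")  # hypothetical
b = wheel_digests("dist-fedora/pkg-1.0-cp311-cp311-linux_x86_64.whl")  # hypothetical

for name in sorted(set(a) | set(b)):
    if a.get(name) != b.get(name):
        print(f"DIFFERS: {name}: {a.get(name)} vs {b.get(name)}")
```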

PyPI is moving from PGP/GPG signatures to trusted publishing. This implies building on public cloud CI infrastructure, so the environment is known. Could we describe this in metadata if only a few common build environments are used?

Trusted publishing is about publishing, not build reproducibility.

pyx – a registry where packages are rebuilt when appropriate. Reproducibility is easier to achieve within that ecosystem. Google has a similar product.

Should PyPI itself do rebuilds of packages? The resource requirements would be huge. There is no interest at this point.

PyPI sdists might not match the corresponding binary wheels. There is no checking in place, nor even a requirement that they match. We may need source reproducibility before build reproducibility. PyPI should do a better job of tracking source provenance, e.g. with a structured field. (A sketch of a basic sdist-vs-wheel check follows.)
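A minimal sketch of what such a check might look like, assuming the question is only whether the .py files shipped in a wheel also appear byte-for-byte in the sdist; file names are hypothetical, and generated files or compiled extensions are out of scope here:

```python
# Minimal sketch (an assumption, not from the session): flag .py files in a
# wheel whose bytes never occur anywhere in the sdist tarball.
import hashlib
import tarfile
import zipfile

def sdist_py_hashes(sdist_path: str) -> set[str]:
    """SHA-256 digests of every .py file in the sdist tarball."""
    digests = set()
    with tarfile.open(sdist_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".py"):
                digests.add(hashlib.sha256(tar.extractfile(member).read()).hexdigest())
    return digests

def wheel_py_mismatches(wheel_path: str, sdist_digests: set[str]) -> list[str]:
    """Names of .py files in the wheel whose bytes never occur in the sdist."""
    mismatches = []
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if name.endswith(".py"):
                if hashlib.sha256(whl.read(name)).hexdigest() not in sdist_digests:
                    mismatches.append(name)
    return mismatches

digests = sdist_py_hashes("pkg-1.0.tar.gz")                      # hypothetical sdist
print(wheel_py_mismatches("pkg-1.0-py3-none-any.whl", digests))  # hypothetical wheel
```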

On PyPI, the top 4000 packages → 98% of downloads.