Collaborative Working Sessions - Filtering diffoscope output

Goal: add patterns that filter out parts of the output, or filters that show only selected parts of the output

Requirements:

  • print a notice that parts of the output are being ignored
  • indicate in return code that files are not identical

A number of options exist:

  • --exclude
  • --exclude-command=REGEXP: skips commands matching REGEXP (e.g. --exclude-command '^readelf.*gdb_index'), but diffoscope then tries the next command, possibly falling back to a hexdump comparison

  • output formats: --json, --html, --html-dir. Multiple output formats can be used together.

  • --load-existing-diff FILE. Diffoscope reads a diff previously saved as JSON and can produce any of its output formats from it. This can be combined with ‘jq’ or some other tool to filter the JSON in between.
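
As an alternative to ‘jq’, the filtering step could be sketched in Python. This is only a sketch: it assumes the JSON report is a tree of nodes that carry "source1", "source2" and an optional "details" list of child nodes. That layout and the helper name prune_details are assumptions for illustration, not a documented interface. The helper drops child nodes whose "source1" matches a pattern and returns how many were dropped, so that a notice can be printed.

```python
import re
import sys

def prune_details(node, pattern):
    """Drop child nodes whose "source1" matches pattern; return the number dropped."""
    regex = re.compile(pattern)
    dropped = 0
    kept = []
    for child in node.get("details", []):
        if regex.search(child.get("source1", "")):
            dropped += 1          # the whole subtree of this child is ignored
            continue
        dropped += prune_details(child, pattern)
        kept.append(child)
    if "details" in node:
        node["details"] = kept
    return dropped

# Hypothetical report; in practice this would come from json.load() on a saved file.
report = {
    "source1": "a.rpm", "source2": "b.rpm",
    "details": [
        {"source1": "header", "source2": "header", "unified_diff": "-x\n+y\n"},
        {"source1": "content", "source2": "content", "details": [
            {"source1": "readelf --debug-dump=gdb_index {}",
             "source2": "readelf --debug-dump=gdb_index {}",
             "unified_diff": "-a\n+b\n"},
            {"source1": "file list", "source2": "file list",
             "unified_diff": "-c\n+d\n"},
        ]},
    ],
}

dropped = prune_details(report, r"^readelf.*gdb_index")
if dropped:
    print(f"note: {dropped} part(s) of the output ignored", file=sys.stderr)
```

The filtered dictionary could then be written back with json.dump and passed to --load-existing-diff to render what remains.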

Internally, the state is a series of deeply nested dictionaries. The comparator is called with a path of keys.

Issues about --exclude* already exist: https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/130 https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/53 https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/52

Filtering by “output level” is not enough. For example, in an RPM header, some specific fields should be ignored, but only those.

Idea: provide a command to filter the output using a jq-like path.
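
One way such a path filter might look, sketched in Python over nested dictionaries. Everything here is hypothetical: the function names, the tuple-of-keys pattern syntax with "*" as a single-key wildcard, and the sample data (including the RPM field names) are made up for illustration and do not reflect diffoscope's actual internals. The sketch returns both the filtered tree and the list of ignored paths, so the caller can report what was skipped.

```python
def match_path(path, pattern):
    """True if the key path matches the pattern; "*" matches any single key."""
    return len(path) == len(pattern) and all(
        p == "*" or p == k for k, p in zip(path, pattern)
    )

def filter_tree(tree, patterns, path=()):
    """Return (copy of tree without matching paths, list of ignored paths)."""
    ignored = []
    result = {}
    for key, value in tree.items():
        here = path + (key,)
        if any(match_path(here, p) for p in patterns):
            ignored.append(here)      # remember what was skipped
            continue
        if isinstance(value, dict):
            value, sub = filter_tree(value, patterns, here)
            ignored.extend(sub)
        result[key] = value
    return result, ignored

# Hypothetical state: ignore two RPM header fields, but only those.
state = {
    "a.rpm": {
        "header": {"BUILDTIME": "differs", "BUILDHOST": "differs", "NAME": "same"},
        "content": {"usr/bin/tool": "differs"},
    }
}
patterns = [("*", "header", "BUILDTIME"), ("*", "header", "BUILDHOST")]
filtered, ignored = filter_tree(state, patterns)

for p in ignored:
    print("ignored:", "/".join(p))
```

Returning the ignored paths separately keeps both requirements reachable: the notice can list what was skipped, and the remaining tree still shows whether the files differ.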