Johan van der Knijff · @bitsgalore
363 followers · 619 posts · Server digipres.club

ICYMI, I ran some experiments to see if VeraPDF's parse status can be used to predict rendering problems, using an existing dataset of synthetic PDFs as ground truth. I also looked at how this compares against the occurrence of JHOVE validation errors.

Details in this blog post:

bitsgalore.org/2023/06/29/vera

#JHOVE #pdf #veraPDF

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
362 followers · 616 posts · Server digipres.club

New blog post - VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset:

bitsgalore.org/2023/06/29/vera

#pdf #veraPDF

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
360 followers · 606 posts · Server digipres.club

Oh, this looks good - the Open Preservation Foundation and the PDF Association have released a first development preview of a VeraPDF-powered checker. The software is based on the Arlington PDF model, and analyses PDF files against the full PDF 2.0 specification:

openpreservation.org/news/deve

Haven't tried it yet, but based on what I'm reading this looks like the future of validation to me!

#arlington #pdf #veraPDF #pdfassociation #OPF #OpenPreservationFoundation

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
356 followers · 591 posts · Server digipres.club

Out of curiosity I ran both JHOVE and VeraPDF on the "Synthetic Testset for File Format Validation" by @mickylindlar et al. (link: radar-service.eu/radar/en/data).

Then did a quick comparison between validation errors as reported by JHOVE, and parse errors and logged warnings by VeraPDF.

Main result so far is that the majority of PDFs for which JHOVE reports validation errors also result in either a parser error or a warning in VeraPDF. Sneak peek here:

github.com/KBNLresearch/pdf-ch

#pdf #veraPDF #JHOVE

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
353 followers · 565 posts · Server digipres.club

I explored to what extent VeraPDF and JHOVE can be used to identify PDF features that are potential preservation risks. Check out this (massive!) blog post for the full lowdown:

bitsgalore.org/2023/05/25/iden

#wtfPDF #pdf #JHOVE #veraPDF

Last updated 1 year ago

Bertrand Caron · @BertrandCaron
159 followers · 99 posts · Server digipres.club

Thomas Ledoux advocates for a standard edition tool to enforce institutional policies on JHOVE, veraPDF & jpylyzer outputs.

#jpylyzer #veraPDF #JHOVE #schematron #OPFOAG

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 91 posts · Server digipres.club

@mickylindlar
Carl: "PDF is a huge tree of objects linked one to another." Which makes interpreting errors far from intuitive!

But veraPDF, and soon JHOVE, should be able to associate an error with the problematic zone in the PDF.

#JHOVE #veraPDF

Last updated 2 years ago