FedSearch - Federated network search engine

Johan van der Knijff · @bitsgalore

363 followers · 619 posts · Server digipres.club

ICYMI, I ran some experiments to see if #VeraPDF’s parse status can be used to predict #PDF rendering problems, using an existing dataset of synthetic PDFs as ground truth. I also looked at how this compares against the occurrence of #JHOVE validation errors.

Details in this blog post:

https://www.bitsgalore.org/2023/06/29/verapdf-parse-status-as-a-proxy-for-rendering

#JHOVE #pdf #veraPDF

Last updated 2 years ago

Johan van der Knijff · @bitsgalore

362 followers · 616 posts · Server digipres.club

New blog post - #VeraPDF parse status as a proxy for #PDF rendering: experiments with the Synthetic PDF Testset:

https://www.bitsgalore.org/2023/06/29/verapdf-parse-status-as-a-proxy-for-rendering

#pdf #veraPDF

Last updated 2 years ago

Johan van der Knijff · @bitsgalore

360 followers · 606 posts · Server digipres.club

Oh, this looks good - #OpenPreservationFoundation #OPF and the #PDFAssociation have released a first development preview of a #VeraPDF-powered #PDF checker. The software is based on the #Arlington PDF model, and analyses PDF files against the full PDF 2.0 specification:

https://openpreservation.org/news/development-preview-pdf-file-checker-based-on-the-arlington-pdf-model/

Haven't tried it yet, but based on what I'm reading this looks like the future of #PDF validation to me!

#arlington #pdf #veraPDF #pdfassociation #OPF #OpenPreservationFoundation

Last updated 2 years ago

Johan van der Knijff · @bitsgalore

356 followers · 591 posts · Server digipres.club

Out of curiosity I ran both #JHOVE and #VeraPDF on the "Synthetic #PDF Testset for File Format Validation" by @mickylindlar et al. (link: https://www.radar-service.eu/radar/en/dataset/JtlOdwQquZWDqQdq).

Then I did a quick comparison between the validation errors reported by JHOVE and the parse errors and logged warnings reported by VeraPDF.

The main result so far is that the majority of PDFs for which JHOVE reports validation errors also result in either a parser error or a warning in VeraPDF. Sneak peek here:

https://github.com/KBNLresearch/pdf-characterisation/blob/main/output/lindlar-tunnat-wilson/jhove-vera-status-errors-warnings.csv
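
For readers who want to try a comparison like this themselves, here is a minimal sketch of the kind of cross-tabulation described above. It is not the analysis behind the linked CSV: the records and the field names (`jhove_errors`, `vera_parse_error`, `vera_warnings`) are made up for illustration and are not that file's actual columns.

```python
from collections import Counter

# Hypothetical per-file results. The field names and values are assumptions
# for illustration only, not the columns or contents of the linked CSV.
records = [
    {"file": "a.pdf", "jhove_errors": 2, "vera_parse_error": True, "vera_warnings": 0},
    {"file": "b.pdf", "jhove_errors": 1, "vera_parse_error": False, "vera_warnings": 3},
    {"file": "c.pdf", "jhove_errors": 1, "vera_parse_error": False, "vera_warnings": 0},
    {"file": "d.pdf", "jhove_errors": 0, "vera_parse_error": False, "vera_warnings": 0},
]


def cross_tabulate(rows):
    """Count files by (JHOVE reported errors, VeraPDF flagged the file)."""
    table = Counter()
    for row in rows:
        jhove_flag = row["jhove_errors"] > 0
        # A file counts as flagged by VeraPDF if parsing failed or warnings were logged.
        vera_flag = row["vera_parse_error"] or row["vera_warnings"] > 0
        table[(jhove_flag, vera_flag)] += 1
    return table


def overlap_fraction(table):
    """Share of JHOVE-flagged files that VeraPDF also flags (0.0 if none are flagged)."""
    jhove_total = table[(True, True)] + table[(True, False)]
    return table[(True, True)] / jhove_total if jhove_total else 0.0


table = cross_tabulate(records)
print(table)
print(f"Overlap: {overlap_fraction(table):.2f}")
```

With real tool output, the records would be built from the JHOVE and VeraPDF reports for each file instead of being hard-coded.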

#pdf #veraPDF #JHOVE

Last updated 2 years ago

Johan van der Knijff · @bitsgalore

353 followers · 565 posts · Server digipres.club

I explored to what extent #VeraPDF and #JHOVE can be used to identify #PDF features that are potential preservation risks. Check out this (massive!) blog post for the full lowdown #wtfPDF:

https://www.bitsgalore.org/2023/05/25/identification-of-pdf-preservation-risks-with-verapdf-and-jhove

#wtfPDF #pdf #JHOVE #veraPDF

Last updated 2 years ago
