ICYMI, I ran some experiments to see if #VeraPDF’s parse status can be used to predict #PDF rendering problems, using an existing dataset of synthetic PDFs as ground truth. I also looked at how this compares against the occurrence of #JHOVE validation errors.
Details in this blog post:
https://www.bitsgalore.org/2023/06/29/verapdf-parse-status-as-a-proxy-for-rendering
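For readers wondering what "predicting rendering problems from parse status" could look like in practice, here is a minimal sketch. It is not the code from the blog post. The file names, flags and the `confusion` helper are made up for illustration. It assumes each file has already been reduced to two booleans: whether VeraPDF reported a parse problem, and whether the ground truth marks a rendering problem.

```python
from collections import Counter

# Hypothetical records: (file name, VeraPDF parse problem?, known rendering problem?)
# These values are invented for illustration; they are not results from the blog post.
records = [
    ("a.pdf", True, True),
    ("b.pdf", False, False),
    ("c.pdf", True, False),
    ("d.pdf", False, True),
    ("e.pdf", True, True),
]

def confusion(records):
    """Count outcomes, treating a parse problem as a positive prediction."""
    counts = Counter()
    for _name, parse_problem, rendering_problem in records:
        if parse_problem and rendering_problem:
            counts["tp"] += 1   # flagged, and it really renders badly
        elif parse_problem:
            counts["fp"] += 1   # flagged, but renders fine
        elif rendering_problem:
            counts["fn"] += 1   # not flagged, but renders badly
        else:
            counts["tn"] += 1   # not flagged, renders fine
    return counts

def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

counts = confusion(records)
# Share of files with rendering problems that the parse status catches.
sensitivity = ratio(counts["tp"], counts["tp"] + counts["fn"])
# Share of files without rendering problems that the parse status leaves alone.
specificity = ratio(counts["tn"], counts["tn"] + counts["fp"])
print(dict(counts), sensitivity, specificity)
```

Sensitivity and specificity are only one way to summarise such a table; the blog post itself is the place to look for the actual method and numbers.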
New blog post - #VeraPDF parse status as a proxy for #PDF rendering: experiments with the Synthetic PDF Testset:
https://www.bitsgalore.org/2023/06/29/verapdf-parse-status-as-a-proxy-for-rendering
Oh, this looks good - #OpenPreservationFoundation #OPF and the #PDFAssociation have released a first development preview of a #VeraPDF-powered #PDF checker. The software is based on the #Arlington PDF model, and analyses PDF files against the full PDF 2.0 specification:
Haven't tried it yet, but based on what I'm reading this looks like the future of #PDF validation to me!
Out of curiosity I ran both #JHOVE and #VeraPDF on the "Synthetic #PDF Testset for File Format Validation" by @mickylindlar et al. (link: https://www.radar-service.eu/radar/en/dataset/JtlOdwQquZWDqQdq).
Then I did a quick comparison between the validation errors reported by JHOVE and the parse errors and logged warnings from VeraPDF.
Main result so far is that the majority of PDFs for which JHOVE reports validation errors also result in either a parser error or a warning in VeraPDF. Sneak peek here:
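A rough sketch of this kind of overlap check, for illustration only. It is not the actual analysis, and the file names and the `overlap_share` helper are hypothetical. It assumes the output of each tool has already been boiled down to a set of file names: those with JHOVE validation errors, and those with a VeraPDF parse error or logged warning.

```python
# Hypothetical inputs, invented for illustration (not data from the test set).
jhove_errors = {"a.pdf", "b.pdf", "c.pdf", "d.pdf"}      # JHOVE reported validation errors
verapdf_flagged = {"a.pdf", "b.pdf", "c.pdf", "x.pdf"}   # VeraPDF parse error or warning

def overlap_share(reference, other):
    """Fraction of files in `reference` that also appear in `other`."""
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)

# Share of JHOVE-error files that VeraPDF also flags.
share = overlap_share(jhove_errors, verapdf_flagged)

# Files that only one of the two tools complains about are worth a closer look.
only_jhove = sorted(jhove_errors - verapdf_flagged)
only_verapdf = sorted(verapdf_flagged - jhove_errors)

print(f"{share:.0%} of JHOVE-error files are also flagged by VeraPDF")
print("JHOVE only:", only_jhove)
print("VeraPDF only:", only_verapdf)
```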
#OPFOAG Thomas Ledoux advocates for a standard #schematron editing tool to enforce institutional policies on #JHOVE, #veraPDF & #jpylyzer outputs.
@mickylindlar
Carl: "PDF is a huge tree of objects linked one to another." Which makes interpreting errors far from intuitive!
But #veraPDF, and soon #JHOVE, should be able to associate an error with the problematic zone in the PDF.