Johan van der Knijff · @bitsgalore
363 followers · 619 posts · Server digipres.club

ICYMI, I ran some experiments to see if veraPDF's parse status can be used to predict rendering problems, using an existing dataset of synthetic PDFs as ground truth. I also looked at how this compares against the occurrence of validation errors.

Details in this blog post:

bitsgalore.org/2023/06/29/vera

#JHOVE #pdf #veraPDF

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
356 followers · 591 posts · Server digipres.club

Out of curiosity I ran both JHOVE and veraPDF on the "Synthetic Testset for File Format Validation" by @mickylindlar et al. (link: radar-service.eu/radar/en/data).

I then did a quick comparison between the validation errors reported by JHOVE and the parse errors and logged warnings reported by veraPDF.

Main result so far is that the majority of PDFs for which JHOVE reports validation errors also result in either a parser error or a warning in veraPDF. Sneak peek here:

github.com/KBNLresearch/pdf-ch

#pdf #veraPDF #JHOVE

Last updated 1 year ago

Johan van der Knijff · @bitsgalore
353 followers · 565 posts · Server digipres.club

I explored to what extent veraPDF and JHOVE can be used to identify PDF features that are potential preservation risks. Check out this (massive!) blog post for the full lowdown:

bitsgalore.org/2023/05/25/iden

#wtfPDF #pdf #JHOVE #veraPDF

Last updated 1 year ago

Micky · @mickylindlar
285 followers · 701 posts · Server digipres.club

ICYMI: JHOVE 1.28 was fully released last week.

Download and info:
jhove.openpreservation.org/

Release notes:
github.com/openpreserve/jhove/

#oag3 #JHOVE

Last updated 1 year ago

Micky · @mickylindlar
285 followers · 700 posts · Server digipres.club

Conformance levels of JHOVE (and other validators):

(1) well-formed = meets the purely syntactic requirements (i.e., what's in the standard)

(2) valid = well-formed and meets the higher-level semantic requirements (i.e., what's in the schema)

(3) consistent = valid, and internally extracted info is consistent with externally supplied information (i.e., what's in your policy)
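
The three levels nest: each one implies the level below it. A minimal sketch of that ordering, assuming three boolean checks as inputs; the names `Conformance` and `conformance` are hypothetical and not part of any validator's API:

```python
from enum import IntEnum
from typing import Optional


class Conformance(IntEnum):
    """Ordered levels: a higher level implies every level below it."""
    NOT_WELL_FORMED = 0
    WELL_FORMED = 1   # syntactic requirements met (the standard)
    VALID = 2         # plus semantic requirements met (the schema)
    CONSISTENT = 3    # plus agreement with external information (your policy)


def conformance(well_formed: bool, valid: bool,
                matches_policy: Optional[bool] = None) -> Conformance:
    """Return the highest level reached; later checks only count if earlier ones pass."""
    if not well_formed:
        return Conformance.NOT_WELL_FORMED
    if not valid:
        return Conformance.WELL_FORMED
    if matches_policy:
        return Conformance.CONSISTENT
    return Conformance.VALID
```

Because the enum is ordered, a policy threshold can be expressed as a simple comparison, e.g. `conformance(True, True) >= Conformance.VALID`.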

#oag3 #JHOVE

Last updated 1 year ago

Micky · @mickylindlar
285 followers · 699 posts · Server digipres.club

Carl Wilson is now giving an update on JHOVE, starting out with a short background story of the tool. JHOVE "identifies, characterizes and validates file formats". I'm wondering if we should stop claiming that JHOVE identifies file formats ....

#oag3 #JHOVE

Last updated 1 year ago

Georgia Moppett · @Georgia
32 followers · 41 posts · Server digipres.club

@mickylindlar i like this, and i definitely think there would be an appetite (JHOVE users are hungry!) i fully agree that a group raising issues like this would be really productive, and like that it would be just for users. should i put some feelers out?

#JHOVE

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
195 followers · 207 posts · Server digipres.club

Hi @MediaArea! I have two WAVE files (among 16 "regular" files) that JHOVE identifies as PCMWAVEFORMAT and MediaInfo identifies as WAVE with a DTS encoding. These come from a transferred audio CD (ark.bnf.fr/ark:/12148/cb435422). Do you have any idea why? For further investigation, should we share the files? Thanks for your help!!

#JHOVE

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
180 followers · 155 posts · Server digipres.club

@marhop @bitsgalore @archivist_Liz
IFDs can be used to store a thumbnail or EXIF metadata, but unlike , seems to return information only for IFDs that contain images with significant content (though nothing prevents you from embedding a thumbnail that is just a small image with no relation with the main one!).

We use JHOVE for such a task: we parse the XML output and count the IFDs of type "TIFF" whose "Newsubfiletype" = "0".

#exiftool #JHOVE

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
180 followers · 155 posts · Server digipres.club

@marhop @bitsgalore @archivist_Liz
Yup, but in this case, JHOVE will return "reduced-resolution image of another image in this file" in its "NewSubFileType" element. If it's an image with different content, JHOVE should return "0".

#hove #JHOVE

Last updated 2 years ago

Archivist Liz · @archivist_Liz
278 followers · 97 posts · Server digipres.club

I just encountered a multi-page TIFF out in the wild for the first time. Digipres friends, do you have a tool of choice for easily identifying multi-page TIFFs? I find it with , but would be better. Thoughts?

#exiftool #JHOVE #digipres #wtfTIF

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 99 posts · Server digipres.club

Thomas Ledoux advocates for a standard edition tool to enforce institutional policies on jpylyzer, veraPDF & JHOVE outputs.

#jpylyzer #veraPDF #JHOVE #schematron #OPFOAG

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 98 posts · Server digipres.club

OPF is considering creating a new JHOVE module for validating spreadsheets.

#JHOVE #OPF #OPFOAG

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 96 posts · Server digipres.club

@mickylindlar shows how JHOVE output is mapped to Rosetta properties, so that files can be queried by a specific error. Wow!

#rosetta #JHOVE

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 95 posts · Server digipres.club

Remember that when trying to validate / characterize a file with JHOVE, you should specify the module to use. Otherwise, if the file does not comply with its specification, it will be treated as a plain octet stream...
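
A minimal sketch of forcing the module from a script, assuming the `jhove` launcher is on the PATH and using the command-line options `-m` (module) and `-h` (output handler); the function names are hypothetical:

```python
import shutil
import subprocess


def jhove_command(path, module, handler="XML"):
    """Build a JHOVE command line that names the module explicitly."""
    return ["jhove", "-m", module, "-h", handler, path]


def run_jhove(path, module):
    """Run JHOVE with an explicit module and return its standard output."""
    if shutil.which("jhove") is None:
        raise FileNotFoundError("jhove executable not found on PATH")
    result = subprocess.run(jhove_command(path, module),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

For example, `run_jhove("report.pdf", "PDF-hul")` asks the PDF module for a verdict on the file instead of letting JHOVE fall back to another module.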

#JHOVE #OPFOAG

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 91 posts · Server digipres.club


Hey, people interested in digital preservation: JHOVE also exists as an online version for single-file analyses...
openpreservation.org/tools/jho

(A bit embarrassed to be discovering this only today, but oh well.)

#PINFormats #DigiPres_FR #JHOVE #OPFOAG

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 91 posts · Server digipres.club

@mickylindlar
Carl: "PDF is a huge tree of objects linked one to another." Which makes interpreting errors far from intuitive!

But , and soon , should be able to associate an error to the problematic zone in the PDF.

#JHOVE #veraPDF

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
159 followers · 86 posts · Server digipres.club

JHOVE tutorial: Carl Wilson reminds us that the software is extensible: it's pretty simple to plug in a module that you have developed yourself for a format.

#JHOVE

Last updated 2 years ago

Bertrand Caron · @BertrandCaron
154 followers · 58 posts · Server digipres.club

Interested in digitisation and digital preservation? In Paris on 7 December? The Open Preservation Foundation is at the BnF. In the morning, a workshop on the JHOVE tool (openpreservation.org/events/jh) is planned, and in the afternoon the organisation holds an assembly.

Registration carries a fee for non-member organisations. But it's an opportunity not to be missed to meet eminent and very open members of the community!

#PINFormats #DigiPres_FR #JHOVE

Last updated 2 years ago
