📣 119,085 digitised newspaper articles added to #Trove last week. Once again they're mostly (112,604) from the Sydney Daily Mirror, 1944-45. But there are also 6,481 articles from 1954 added to the Kyabram Free Press and Rodney and Deakin Shire Advocate.
See the Trove Data Dashboard: https://wragge.github.io/trove-newspaper-totals/ #GLAM #histodons
So #Trove has already used machine learning to improve the OCR of at least 10 million newspaper articles: http://nla-overproof.projectcomputing.com/
Today in the #Trove Data Guide: I think the 'getting data from newspaper pages' section is nearly finished: https://wragge.github.io/trove-data-guide/accessing-data/newspapers-and-gazettes-pages.html
Documentation is hard. Every time I work on a section in the #Trove Data Guide I realise I need to update/create several other sections. It just keeps getting bigger. Anyway, nearly finished this 'HOW TO' on harvesting a complete set of search results using the Trove API: https://wragge.github.io/trove-data-guide/how-to/harvest-complete-results.html #digitalHumanities #GLAM
#trove #digitalhumanities #glam
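A rough sketch of the idea behind harvesting a complete set of results: keep requesting pages, passing the cursor from each response back in as the next `s` value, until no cursor comes back. This is not the code from the how-to. The parameter names (`bulkHarvest`, `s`, `n`), the `X-API-KEY` header and the response shape (`category[0].records.nextStart`) are my assumptions about the v3 API, and `harvest_all` and `make_fetcher` are hypothetical helpers.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://api.trove.nla.gov.au/v3/result"


def make_fetcher(api_key: str) -> Callable[[dict], dict]:
    """Return a function that sends one API request and decodes the JSON.

    Assumes the key is accepted in an X-API-KEY header.
    """
    def fetch_json(params: dict) -> dict:
        request = Request(
            f"{API_URL}?{urlencode(params)}",
            headers={"X-API-KEY": api_key},
        )
        with urlopen(request) as response:
            return json.load(response)

    return fetch_json


def harvest_all(
    fetch_json: Callable[[dict], dict],
    query: str,
    category: str = "newspaper",
) -> Iterator[dict]:
    """Yield every article in a result set by following the cursor."""
    params = {
        "q": query,
        "category": category,
        "encoding": "json",
        "bulkHarvest": "true",  # assumed: keeps result order stable while paging
        "n": 100,               # assumed: records per request
        "s": "*",               # assumed: "*" marks the first page
    }
    while True:
        data = fetch_json(dict(params))
        # Assumed response shape: one category holding a "records" object.
        records = data["category"][0]["records"]
        yield from records.get("article", [])
        next_start = records.get("nextStart")
        if not next_start:
            break  # no cursor means this was the last page
        params["s"] = next_start
```

With a real key this would be used as `for article in harvest_all(make_fetcher("YOUR_API_KEY"), "wragge"): ...`. Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so it can be tried out on canned responses first.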
I'm continuing to log bugs in the #Trove v3 API here: https://github.com/GLAM-Workbench/trove-api-intro/issues/49 (Trove itself doesn't have any public list of issues/bugs)
@warpedtime If you can bear to use FB, there is an unofficial #Trove user group: https://www.facebook.com/groups/troveusergroup
📣 Just like last week, the only change to #Trove's digitised newspapers in the past week has been the addition of more articles from the Sydney Daily Mirror: 123,565 articles from 1944-45.
See the Trove Newspaper Data Dashboard for more: https://wragge.github.io/trove-newspaper-totals/
Just resubmitted a #Trove bug report from 2021 as it's still not fixed -- affects advanced search when filtering by holding organisation.
Looks like there might have been a #Trove update yesterday. The following bugs reported in the last couple of months have been fixed:
https://github.com/GLAM-Workbench/trove-api-intro/issues/49#issuecomment-1652789707
https://github.com/GLAM-Workbench/trove-api-intro/issues/49#issuecomment-1652791988
https://github.com/GLAM-Workbench/trove-api-intro/issues/49#issuecomment-1652797423
#trove #glam #digitalhumanities
I've been playing around a lot with RO-Crate lately. It's a way of describing & packaging research data. Here's a post about how I've updated the #Trove Newspaper & Gazette Harvester to automatically document every harvest it creates using RO-Crate: https://updates.timsherratt.org/2023/08/31/some-important-updates.html #researchInfrastructure #rocrate #glam #digitalHumanities
#trove #researchinfrastructure #rocrate #glam #digitalhumanities
Not really much in this webinar to help people undertake new forms of digital research (which I thought was the point of the ARDC investment). But anyway, in a few more months the #Trove Data Guide will cover all of that and more (still much to do... 😬): https://wragge.github.io/trove-data-guide/home.html
Now talking about citations... Guess what? It would be a hell of a lot easier to capture and manage citations if #Trove hadn't broken the Zotero translator with the 2020 update... 😡 (though Zotero still works with individual newspaper articles)
Tuning in to the "How to research on #Trove" webinar. Includes an update on some recent ARDC-funded improvements to the API and web interface. Chat is disabled... so maybe I'll drop some comments here.
There's a new 'How to research' page on #Trove, but I have to say it's a bit disappointing: https://trove.nla.gov.au/blog/2023/08/31/how-research-trove Hopefully, I can fill in some gaps with the Trove Data Guide.
New version of the #Trove Newspaper Harvester section of the #GLAMWorkbench (v2.0.0). Now using v3 of the Trove API. https://glam-workbench.net/trove-harvester/ #GLAM #digitalHumanities
#trove #glamworkbench #glam #digitalhumanities
There's a new version of the #Trove Newspaper & Gazette Harvester Python package, now using v3 of the Trove API, and automatically generating an RO-Crate file to capture the details of each harvest. Use it as a library or a command line tool to harvest metadata, text, images & PDFs from thousands (even millions) of digitised newspaper articles.
Release details: https://github.com/wragge/trove-newspaper-harvester/releases/tag/v0.7.1
Full documentation: https://wragge.github.io/trove-newspaper-harvester/ #GLAM #digitalHumanities #histodons
#trove #glam #digitalhumanities #histodons
Aaand I've updated the #GLAMWorkbench's list of breaking changes in the #Trove API v3 with today's discoveries: https://glam-workbench.net/trove-api-v3/ #GLAM #digitalHumanities
#glamworkbench #trove #glam #digitalhumanities
So today's unexpected updates...
Trove Query Parser now v0.2.1: https://github.com/wragge/trove_query_parser/releases/tag/v0.2.1
Trove API Console updated to use the changed v3 facets `wordCount` and `illustrationType`, eg: https://troveconsole.herokuapp.com/v3/?url=https%3A//api.trove.nla.gov.au/v3/result%3Fq%3Dwragge%26category%3Dnewspaper%26encoding%3Djson%26l-illustrated%3Dtrue%26l-illustrationType%3DPhoto
#trove #glam #digitalhumanities
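For anyone wanting to build console links like the one above, here's a minimal sketch using only the standard library. It assumes the console just expects the full API request, percent-encoded, in its `url` parameter (which is what the example link looks like). `console_link` is a hypothetical helper, not part of the console itself.

```python
from urllib.parse import quote, urlencode

CONSOLE_URL = "https://troveconsole.herokuapp.com/v3/"
API_URL = "https://api.trove.nla.gov.au/v3/result"


def console_link(params: dict) -> str:
    """Wrap an API request in a Trove API Console link."""
    # Build the plain API request first...
    api_request = f"{API_URL}?{urlencode(params)}"
    # ...then percent-encode it so it survives as a single `url` value.
    # quote() leaves "/" alone by default, matching the example link.
    return f"{CONSOLE_URL}?url={quote(api_request)}"


link = console_link(
    {
        "q": "wragge",
        "category": "newspaper",
        "encoding": "json",
        "l-illustrated": "true",
        "l-illustrationType": "Photo",
    }
)
print(link)
```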
Accidentally typed `arse_query` instead of `parse_query` and that's about how I'm feeling about the #Trove API update at the moment...
😡 Another undocumented, breaking change in v3 of the #Trove API. The `illtype` facet has been renamed `illustrationType`. Excuse me while I now go waste an hour or so updating the Trove API Console, the trove-query-parser etc... #digitalHumanities
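If you have old query parameters lying around, a rename like this can be patched with a small lookup table. A minimal sketch, assuming facet limits are passed as `l-<facet>` parameters. Only the `illtype` rename is included because that's the one confirmed here. `RENAMED_FACETS` and `upgrade_params` are hypothetical names, not part of trove-query-parser.

```python
# Facets renamed between API versions. Add more as they turn up.
RENAMED_FACETS = {"illtype": "illustrationType"}


def upgrade_params(params: dict) -> dict:
    """Return a copy of params with renamed facet limits updated."""
    upgraded = {}
    for key, value in params.items():
        prefix, sep, name = key.partition("-")
        # Only touch facet limits ("l-<facet>"); leave q, category etc. alone.
        if prefix == "l" and sep and name in RENAMED_FACETS:
            key = f"l-{RENAMED_FACETS[name]}"
        upgraded[key] = value
    return upgraded


old = {"q": "wragge", "l-illustrated": "true", "l-illtype": "Photo"}
new = upgrade_params(old)
print(new)
```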