Have added test coverage for #StormCrawler
https://coveralls.io/github/DigitalPebble/storm-crawler?branch=master
As expected, coverage is pretty low on average, partly because writing tests for Bolts is not trivial, but at least we can now see where new tests should be added.
BTW #tests are great #opensource #contributions
@krisfreedain
2.7.0 now used in #stormcrawler
https://github.com/DigitalPebble/storm-crawler/pull/1064
@davidshq
Definitely. To give an example, one of the top EU online retailers uses #StormCrawler but won't publicise (or sponsor) it. Their legal department advised them not to because it would expose the way they use it, and that is seen as a risk.
We're super excited about #StormCrawler being used by the #OpenWebSearch project.
Should we support tracing in #stormcrawler? Anyone using tools like Datadog when crawling to track slow URLs and bottlenecks?
Missing Link: Open web index aims to make Europe independent in search
With the EU-funded development of an Open Web Index, researchers want to break the dominance of Google & Co. and broaden human knowledge.
#SearchEngine #eu #OWI #EuropeanOpenWebIndex #OWSAI #OpenWebSearchAndAnalysisInfrastructure #OpenWebSearch #Suma #OSF #Serci #StormCrawler
Also #Gigablast #FindX #Quaero #Theseus #CommonCrawls
We are pleased to announce that DigitalPebble Ltd is a partner of the OpenSearch Project.
In case you missed it, #StormCrawler has had a module for #OpenSearch since its latest release, and hopefully there will be more good things to come!
Call to all #StormCrawler users: we will release a new version shortly so that people can benefit from the latest additions (#Opensearch) and improvements (#WARC). Any chance you could test some crawls with the latest code in the main branch and report any issues? Thanks
Just committed a Maven #archetype for crawling with the #OpenSearch module of #StormCrawler.
Just opened a PR to port the content of the #Elasticsearch module of #StormCrawler to #OpenSearch
includes simple #dashboards
Feedback welcome as usual
A very nice contribution to #StormCrawler improving the generation of #WARC files
#StormCrawler #warc #webarchiving
#StormCrawler 2.6 released
https://github.com/DigitalPebble/storm-crawler/releases/tag/2.6
Thanks to our contributors and users
#StormCrawler #opensource #webcrawl
There is a paradox with the sponsoring of #StormCrawler: the only organisations who have financially supported our work are very small, typically with fewer than 5 employees. Meanwhile, larger ones (some of which have multi-million $£€ budgets and use SC on a large scale) neither donate nor contribute any code. Most of them are also very reluctant to publicly acknowledge their use of it. Is it down to the bureaucratic hassle of convincing people up the decision ladder? What do you think?
Fancy trying the new version of the #StormCrawler archetype which uses #URLFrontier as a backend?