What are your favorite / the best #WebCrawlers for broad / #WebScale #crawling?
I've built a list but am looking for anything I missed: https://github.com/davidshq/awesome-search-engines/blob/main/WebCrawlers.md
The main options I've found include #Apache #Nutch, #StormCrawler, #Scrapy, #Norconex, #PulsarR, #Heritrix, and #sparkler.
#WebCrawlers #webscale #crawling #apache #nutch #stormcrawler #scrapy #norconex #pulsarr #heritrix #sparkler #question #search #searchengines
The latest update to Awesome Search Engines is here:
https://github.com/davidshq/awesome-search-engines
Biggest news is I've added a page for #BuildingSearchEngines - it's very partial at the moment but includes sections on #SearchEngines (open source), #WebCrawlers, and #CommonCrawl.
Know of other web-scale search engines, crawlers, etc. I should be aware of?
#buildingsearchengines #searchengines #WebCrawlers #commoncrawl
Katana is a crawling and spidering framework.
#WebCrawlers #tools #scrapers #apps