Technical SEO Tip: The New York Times uses a "Today" sitemap to link to their most recent articles. This strategy ensures that Googlebot always has a path to crawl their highest priority content:
#newyorktimes #seotips #sitemap #seostrategy #googlebot #technicalseo #crawling #content #contentstrategy
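A minimal sketch of how a crawler might read such a sitemap and pick out the newest URLs first. The XML below is made-up sample data on example.com, not the actual New York Times sitemap, and the helper name `newest_first` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (sitemaps.org protocol).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample data standing in for a "today"-style sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/2023/08/01/a.html</loc><lastmod>2023-08-01T09:00:00Z</lastmod></url>
  <url><loc>https://example.com/2023/08/01/b.html</loc><lastmod>2023-08-01T12:30:00Z</lastmod></url>
</urlset>"""


def newest_first(xml_text):
    """Return (loc, lastmod) pairs sorted by lastmod, most recent first."""
    root = ET.fromstring(xml_text)
    pairs = [
        (url.findtext("sm:loc", namespaces=NS), url.findtext("sm:lastmod", namespaces=NS))
        for url in root.findall("sm:url", NS)
    ]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for loc, lastmod in newest_first(SAMPLE):
    print(lastmod, loc)
```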
Who's blocking OpenAI's GPTBot?
🔵 Use the #advertools robotstxt_to_df function to fetch robots files in bulk (one, five, ten thousand...) in one go.
🔵 Run as many times as you want, for as many domains
🔵 Top domains list obtained from the Majestic (Majestic.com) Million dataset (thank you)
🔵 This was run for 10k domains (7.3k successful)
🔵 Get the code and data (and answer to the poll question):
#advertools #datascience #ai #generativeAI #chatgpt #seo #crawling
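For a single file, the same question can be asked with Python's standard library alone. This is only a sketch and not the advertools workflow from the post: the robots.txt texts are made-up samples, and `blocks_agent` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt, agent, url="https://example.com/"):
    """Return True if the given robots.txt text disallows `agent` from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Made-up sample files for illustration.
SAMPLE_BLOCKING = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
SAMPLE_OPEN = "User-agent: *\nDisallow: /private/\n"

print(blocks_agent(SAMPLE_BLOCKING, "GPTBot"))
print(blocks_agent(SAMPLE_OPEN, "GPTBot"))
```

Running this over thousands of domains would still need the bulk fetching step, which is what robotstxt_to_df is for.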
"I'm coming for you. You can't run forever.~"
Based on this https://twitter.com/LEEDLEMANN/status/1685687948803715072
#nsfw #creepy #posing #spiderpose #crawlingpose #crawling #tongueout #freaky
🕷🕸🕷🕸🕷🕸🕷
#JSON-LD errors on webpages:
#advertools reports those errors in the "jsonld_errors" column, and provides detailed error messages. For example:
Expecting ',' delimiter: line 11 column 437 (char 665)
Invalid control character at: line 27 column 450 (char 1728)
Invalid \escape: line 27 column 466 (char 2096)
Simply filter for the columns "url" and "jsonld_errors" to get them.
python3 -m pip install advertools
#json #advertools #datascience #seo #DigitalMarketing #crawling
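The messages above have the same shape as the ones Python's built-in json module produces, so here is a small sketch that reproduces each kind of error. The broken snippets are made up for illustration, and `jsonld_error` is a hypothetical helper, not an advertools function.

```python
import json


def jsonld_error(raw):
    """Return the parser's error message for a JSON-LD string, or None if it is valid."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Made-up broken snippets, one per error type.
missing_comma = '{"@type": "Article" "name": "x"}'
control_char = '{"name": "line one\nline two"}'  # raw newline inside a JSON string
bad_escape = '{"name": "50\\% off"}'             # the JSON text contains \%
valid = '{"@type": "Article", "name": "x"}'

for snippet in (missing_comma, control_char, bad_escape, valid):
    print(jsonld_error(snippet))
```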
Q: How many lines of code does it take to analyze segments of a website by any available metric?
A: 3
1. Open the crawl file
2. Split URLs into segments (path, dir_1, dir_2, ..)
3. Summarize segments by any metric (page size, latency, etc.)
Code and more examples here:
#advertools #pandas #datascience #crawling
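The three steps can be sketched with the standard library alone. This is not the advertools/pandas code the post links to: the rows below are made-up stand-ins for an opened crawl file, the "size" and "download_latency" keys are assumed column names, and `first_dir` and `summarize` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit

# Step 1 stand-in: rows as they might look after opening a crawl file (made-up data).
rows = [
    {"url": "https://example.com/blog/post-1", "size": 100, "download_latency": 0.2},
    {"url": "https://example.com/blog/post-2", "size": 200, "download_latency": 0.4},
    {"url": "https://example.com/shop/item-9", "size": 300, "download_latency": 0.9},
]


def first_dir(url):
    """Step 2: return the first directory of the URL path ('' for the home page)."""
    parts = [part for part in urlsplit(url).path.split("/") if part]
    return parts[0] if parts else ""


def summarize(rows, metric):
    """Step 3: average the chosen metric per first-level directory."""
    groups = defaultdict(list)
    for row in rows:
        groups[first_dir(row["url"])].append(row[metric])
    return {segment: mean(values) for segment, values in groups.items()}


print(summarize(rows, "size"))
print(summarize(rows, "download_latency"))
```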
🕸️🕷️🕸️🕷️🕸️🕷️🕸️
Here is a list of custom extraction XPath selectors to take your crawling to the next level.
This can be expanded to include other extractors and/or ones for popular sites/CMSes
Amazon, WP, Shopify etc.
If you have a favorite list that you would like to contribute or create please let me know.
#datascience #advertools #seo #crawling #python
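To show the idea of a named selector list, here is a sketch using the limited XPath subset in Python's standard library. The page, the selector names, and the `extract` helper are all made up for illustration. A real crawl would use a full XPath engine, where selectors usually end in something like /text() or /@href, which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page for illustration.
PAGE = """<html><body>
<h1>Blue widget</h1>
<span class="price">19.99</span>
<div class="author"><a href="/team/ana">Ana</a></div>
</body></html>"""

# Hypothetical selector list: column name -> XPath expression.
SELECTORS = {
    "title": ".//h1",
    "price": ".//span[@class='price']",
    "author": ".//div[@class='author']/a",
}


def extract(page, selectors):
    """Return the text of every element matched by each named selector."""
    root = ET.fromstring(page)
    return {
        name: [element.text for element in root.findall(path)]
        for name, path in selectors.items()
    }


print(extract(PAGE, SELECTORS))
```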