Lyrical Garfield 🎢 · @LyricalGarfield
814 followers · 4407 posts · Server masto.ai
Muhammad Junaid · @M_Junaid
4 followers · 25 posts · Server seo.chat

Technical SEO Tip: The New York Times uses a "Today" sitemap to link to its most recent articles. This ensures that Googlebot always has a crawl path to the site's highest-priority content.
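
The post does not show how such a sitemap is generated. As a rough illustration only, the Python sketch below builds a sitemap listing articles published in the last 24 hours, newest first, in the sitemaps.org `urlset` format. The helper name `build_today_sitemap`, the 24-hour window, and the `(url, published)` input shape are assumptions, not The New York Times' implementation.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_today_sitemap(articles, now, window=timedelta(days=1)):
    """Return sitemap XML listing articles published within `window` before `now`.

    Hypothetical helper. `articles` is an iterable of (url, published) pairs,
    where `published` is a timezone-aware datetime.
    """
    # Keep only articles inside the window; future timestamps are ignored.
    recent = [
        (url, published)
        for url, published in articles
        if timedelta(0) <= now - published <= window
    ]
    # Newest first, so the freshest content sits at the top of the file.
    recent.sort(key=lambda item: item[1], reverse=True)

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, published in recent:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # ISO 8601 / W3C datetime, the format the protocol uses for <lastmod>.
        ET.SubElement(entry, "lastmod").text = published.isoformat()
    return ET.tostring(urlset, encoding="unicode")
```

Regenerating this file on a schedule and referencing it from the sitemap index would keep a short, always-current path to new articles; the window length is a tuning choice.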

#newyorktimes #seotips #sitemap #seostrategy #googlebot #TechnicalSEO #crawling #content #contentstrategy

Last updated 1 year ago

Elias Dabbas :verified: · @elias
63 followers · 105 posts · Server seocommunity.social

Who's blocking OpenAI's GPTBot?

🔵 Use the robotstxt_to_df function to fetch robots.txt files in bulk (one, five, ten thousand...) in one go.
🔵 Run it as many times as you want, for as many domains as you want
🔵 The top domains list was obtained from the Majestic Million dataset (Majestic.com, thank you)
🔵 This was run for 10k domains (7.3k successful)
🔵 Get the code and data (and the answer to the poll question):

bit.ly/45It98N
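
The post's own code uses advertools' `robotstxt_to_df` to fetch the files. To illustrate only the checking step, here is a stdlib sketch that swaps in Python's `urllib.robotparser` to decide whether already-downloaded robots.txt text disallows GPTBot at the site root. The helper names `blocks_gptbot` and `share_blocking`, and the choice to test only the root path `/`, are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_gptbot(robots_txt, user_agent="GPTBot"):
    """Return True if the robots.txt text disallows `user_agent` from "/".

    Hypothetical helper: only the root path is checked, so rules that block
    just part of a site are not reported.
    """
    parser = RobotFileParser()
    # parse() also marks the rules as loaded, so can_fetch() can answer.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def share_blocking(robots_by_domain, user_agent="GPTBot"):
    """Fraction of domains (dict of domain -> robots.txt text) blocking the agent."""
    if not robots_by_domain:
        return 0.0
    blocked = sum(blocks_gptbot(text, user_agent) for text in robots_by_domain.values())
    return blocked / len(robots_by_domain)
```

Passing the robots.txt contents in as strings keeps the check offline; in a bulk run they would come from whatever fetching step produced the data.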

#advertools #datascience #ai #generativeAI #chatgpt #seo #crawling

Elias Dabbas :verified: · @elias
55 followers · 90 posts · Server seocommunity.social

🕷🕸🕷🕸🕷🕸🕷
JSON-LD errors on webpages:

advertools reports those errors in the "jsonld_errors" column and provides detailed error messages. For example:

Expecting ',' delimiter: line 11 column 437 (char 665)
Invalid control character at: line 27 column 450 (char 1728)
Invalid \escape: line 27 column 466 (char 2096)

Simply filter for the columns "url" and "json_ld" to get them.

python3 -m pip install advertools
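
To show where messages like these come from, here is a stdlib sketch (not advertools' actual implementation) that pulls every `<script type="application/ld+json">` block out of a page with `html.parser` and runs it through `json.loads`, keeping the `JSONDecodeError` message for any block that fails. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # text chunks while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def jsonld_errors(html):
    """Return (block_index, error_message) pairs for JSON-LD blocks that fail to parse."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    errors = []
    for i, block in enumerate(collector.blocks):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            # str(exc) has the same shape as the examples: "<msg>: line L column C (char N)"
            errors.append((i, str(exc)))
    return errors
```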

#json #advertools #datascience #seo #DigitalMarketing #crawling

Elias Dabbas :verified: · @elias
55 followers · 83 posts · Server seocommunity.social

Q: How many lines of code does it take to analyze segments of a website by any available metric?

A: 3

1. Open the crawl file
2. Split URLs into segments (path, dir_1, dir_2, ...)
3. Summarize segments by any metric (page size, latency, etc.)

Code and more examples here:

bit.ly/3OlaDwH
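
The three steps can also be approximated without third-party packages. The post's linked code uses advertools and pandas; the stdlib version below is only a sketch, and it assumes the crawl file is JSON Lines with a "url" field and a numeric metric field (a hypothetical "size" column here). The function names are illustrative.

```python
import json
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit


def load_crawl(path):
    """Step 1: open a JSON Lines crawl file (one JSON object per line)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def url_segments(url):
    """Step 2: split a URL into its path and dir_1, dir_2, ... segments."""
    path = urlsplit(url).path
    segments = {"path": path}
    for i, part in enumerate((p for p in path.split("/") if p), start=1):
        segments[f"dir_{i}"] = part
    return segments


def summarize(rows, segment="dir_1", metric="size"):
    """Step 3: average `metric` per value of `segment` (None = segment absent)."""
    groups = defaultdict(list)
    for row in rows:
        key = url_segments(row["url"]).get(segment)
        groups[key].append(row[metric])
    return {key: mean(values) for key, values in groups.items()}
```

Changing `segment` (e.g. to "dir_2") or `metric` (e.g. to a latency field) gives other breakdowns without touching the code.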

#advertools #pandas #datascience #crawling

Elias Dabbas :verified: · @elias
55 followers · 81 posts · Server seocommunity.social

🕸️🕷️🕸️🕷️🕸️🕷️🕸️

Here is a list of custom extraction XPath selectors to take your crawling to the next level.

bit.ly/3Di5TBO

This can be expanded to include other extractors and/or ones for popular sites and CMSes, such as Amazon, WordPress, Shopify, etc.

If you have a favorite list that you would like to contribute or create, please let me know.
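
The linked list is not reproduced here. As a minimal sketch of how such a list can drive custom extraction, the code below keeps selectors in a dictionary and applies them with the limited XPath subset in Python's `xml.etree.ElementTree`. Assumptions: the page is well-formed markup with no XML namespace, and each selector is a hypothetical (path, attribute) pair, because ElementTree cannot select attributes or `text()` directly. Full XPath 1.0 expressions, like those advertools' `crawl` takes through its `xpath_selectors` parameter, need a complete XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical selector list: name -> (ElementTree path, attribute to read or None for text)
SELECTORS = {
    "title": (".//title", None),
    "meta_desc": (".//meta[@name='description']", "content"),
    "canonical": (".//link[@rel='canonical']", "href"),
    "h2": (".//h2", None),
}


def extract(markup, selectors=SELECTORS):
    """Apply each selector to well-formed, namespace-free markup; return name -> values."""
    root = ET.fromstring(markup)
    results = {}
    for name, (path, attr) in selectors.items():
        values = []
        for el in root.findall(path):
            # Read the attribute if one is named, otherwise the element's full text.
            value = el.get(attr) if attr else "".join(el.itertext()).strip()
            if value:
                values.append(value)
        results[name] = values
    return results
```

Keeping the selectors as data makes it easy to add per-site sets (for example, one dictionary per CMS) without changing the extraction loop.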

#datascience #advertools #seo #crawling #python
