Elias Dabbas :verified: · @elias
64 followers · 109 posts · Server seocommunity.social

The recording of yesterday's discussion is available: scaling your use of ChatGPT using two techniques

1. Bulk prompts: creating prompt templates using rich structured data

2. Fine-tuning: creating a very specific functionality by training the model to do one particular task by learning from hundreds/thousands of examples. Example: an entity extraction app that also provides Wikipedia URLs of extracted entities.

bit.ly/486bb1m

#chatgpt #datascience #python #generativeAI #advertools #seo #llm

Last updated 1 year ago

Elias Dabbas :verified: · @elias
63 followers · 108 posts · Server seocommunity.social

Happy to announce a new cohort for my course:

Data Science with Python for SEO 🎉 🎉 🎉

🔵 For absolute beginners
🔵 Run, automate, and scale many SEO tasks with Python like crawling, analyzing XML sitemaps, text/keyword analysis
🔵 Intro to data manipulation and visualization skills
🔵 Get started with pandas and plotly
🔵 Make the transition from Excel to Python
🔵 Online, live, cohort-based, interactive
🔵 Spans three days in one week

bit.ly/dsseo-course

#advertools #pandas #plotly #excel #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
63 followers · 107 posts · Server seocommunity.social

This week: Crawl with , scale with

Two techniques to scale your prompts

1. Generating prompts on a large scale by creating prompt templates + structured data (e.g. creating many product descriptions)

2. Using fine-tuning to train ChatGPT to perform a highly specialized task, using hundreds/thousands of training examples. I'll share details on my entity extraction app.
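The first technique (prompt templates + structured data) can be sketched with the standard library alone. This is a minimal illustration, not the code from the session: the product records and the template wording are made up, and sending the prompts to a model is left out.

```python
from string import Template

# Hypothetical product data; in practice this would come from a CSV or database.
products = [
    {"name": "Trail Runner 2", "category": "running shoes", "feature": "waterproof upper"},
    {"name": "City Pack 20L", "category": "backpacks", "feature": "padded laptop sleeve"},
]

# One template, many prompts: placeholders are filled from each record.
template = Template(
    "Write a 50-word product description for $name, "
    "a product in the $category category. Highlight its $feature."
)

prompts = [template.substitute(row) for row in products]

for prompt in prompts:
    print(prompt)
```

Adding a row to the data adds a prompt, so the same template scales from two products to thousands.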

Join us Thursday:
lnkd.in/d2uyr_6U

#advertools #chatgpt #datascience #DigitalMarketing #python #structureddata #seo

Last updated 1 year ago

Elias Dabbas :verified: · @elias
63 followers · 105 posts · Server seocommunity.social

Who's blocking OpenAI's GPTBot?

🔵 Use the robotstxt_to_df function to fetch robots files in bulk (one, five, ten thousand...) in one go.
🔵 Run as many times as you want, for as many domains
🔵 Top domains list obtained from the Majestic (Majestic.com) Million dataset (thank you)
🔵 This was run for 10k domains (7.3k successful)
🔵 Get the code and data (and answer to the poll question):
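
Once the robots files are fetched, the check itself is simple. The sketch below is not the linked notebook: it uses Python's built-in urllib.robotparser on two made-up robots.txt files (the domains and rules are hypothetical) to show how "is GPTBot blocked?" can be answered per domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents keyed by domain (normally fetched over HTTP).
robots_files = {
    "example-news.com": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n",
    "example-shop.com": "User-agent: *\nDisallow: /cart/\n",
}

def blocks_gptbot(robots_text):
    """Return True if the rules disallow GPTBot from fetching the home page."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch("GPTBot", "/")

blocked = {domain: blocks_gptbot(text) for domain, text in robots_files.items()}
print(blocked)
```

This only tests the home page; a site can also block GPTBot from specific directories, which would need a per-path check.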

bit.ly/45It98N

#advertools #datascience #ai #generativeAI #chatgpt #seo #crawling

Last updated 1 year ago

Elias Dabbas :verified: · @elias
61 followers · 100 posts · Server seocommunity.social

Analyzing SERPs on a large scale with and

The recording is now available

🔵 Creating a large set of queries in an industry
🔵 Creating query variants
🔵 Running the requests in bulk
🔵 Running the requests across various dimensions (country, language, etc)
🔵 Visualizing the results with a heatmap

bit.ly/3E321Fi

#python #advertools #datascience #seo #datavisualization

Last updated 1 year ago

Elias Dabbas :verified: · @elias
57 followers · 98 posts · Server seocommunity.social

What's the longest regular expression that I wrote?

140,820 characters (one hundred and forty thousand)

It's a regex for finding emojis (any of them).

Here's how to create it, with general explanations of regex:

bit.ly/3qpuy4t
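
One common way to build such a pattern is to join every emoji with the alternation operator. This sketch is not necessarily how the 140,820-character regex was built, and it uses a tiny made-up sample instead of the full emoji list:

```python
import re

# A tiny hypothetical sample; a full list would cover every emoji sequence.
emoji_list = ["👍", "👍🏽", "🇯🇵", "🕷", "🎉"]

# Longest first, so multi-code-point emoji match before their shorter prefixes.
ordered = sorted(emoji_list, key=len, reverse=True)
emoji_regex = re.compile("|".join(re.escape(e) for e in ordered))

text = "Great crawl 👍🏽 🕷 see you in 🇯🇵 🎉"
found = emoji_regex.findall(text)
print(found)
```

The ordering matters: without it, the plain thumbs-up would match first and the skin-tone modifier would be left behind as a separate character.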

We'll discuss more text processing and analysis techniques in the office hours tomorrow if you'd like to join.

#advertools #datascience #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
57 followers · 98 posts · Server seocommunity.social

Log file analysis

🔵 Parse file fields: IP, datetime, request, method, status, size, referer, user-agent
🔵 Compress to parquet
🔵 Bulk reverse DNS lookup for IPs
🔵 Split request & referer URLs into their components
🔵 Parse user-agents into their components (OS, version, device name, etc)
🔵 7-8 fields become hundreds of columns
🔵 Generate any report, ask any question about any combination of those elements
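
The first steps (parse the fields, then split URLs into components) can be sketched with the standard library. The log line is made up and the pattern assumes the common "combined" access log format; this is not the code from the linked example.

```python
import re
from urllib.parse import urlsplit

# Pattern for the "combined" access log format (an assumption about the input).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical log line for illustration.
line = (
    '203.0.113.7 - - [10/Jul/2023:14:02:11 +0000] '
    '"GET /blog/post-1?utm_source=x HTTP/1.1" 200 5123 '
    '"https://example.com/blog/" "Mozilla/5.0 (X11; Linux x86_64)"'
)

record = LOG_PATTERN.match(line).groupdict()

# Split the request and referer URLs into their components (new "columns").
request_parts = urlsplit(record["request"])
record["request_path"] = request_parts.path
record["request_query"] = request_parts.query
record["referer_netloc"] = urlsplit(record["referer"]).netloc

print(record)
```

Each derived component becomes its own field, which is how a handful of raw fields can expand into many columns.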

Example
bit.ly/3qnfLr5


#advertools #seo #datascience #digitalanalytics #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 91 posts · Server seocommunity.social

1/2

Happy to announce my course:

Data Science with Python for SEO 🎉 🎉 🎉

🔵 For absolute beginners
🔵 Make a leap in your data skills
🔵 Run, automate, and scale many SEO tasks with Python like crawling, analyzing XML sitemaps, text/keyword analysis
🔵 In-depth intro to data manipulation and visualization skills
🔵 Get started with pandas and plotly
🔵 Make the transition from Excel to Python
🔵 Online, live, cohort-based, interactive

bit.ly/dsseo-course

#advertools #pandas #plotly

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 90 posts · Server seocommunity.social

🕷🕸🕷🕸🕷🕸🕷
JSON-LD errors on webpages:

advertools reports those errors in the "jsonld_errors" column, and provides detailed error messages. For example:

Expecting ',' delimiter: line 11 column 437 (char 665)
Invalid control character at: line 27 column 450 (char 1728)
Invalid \escape: line 27 column 466 (char 2096)

Simply filter for the columns "url" and "json_ld" to get them.
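
Messages in this style are what Python's json module produces when parsing fails. A minimal sketch of collecting them per URL, using two made-up JSON-LD snippets (this illustrates the idea and is not the crawler's internal code):

```python
import json

# Hypothetical JSON-LD snippets as they might be extracted from pages.
pages = {
    "https://example.com/ok": '{"@context": "https://schema.org", "@type": "Article"}',
    "https://example.com/broken": '{"@context": "https://schema.org" "@type": "Article"}',
}

jsonld_errors = {}
for url, raw in pages.items():
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        # str(err) includes the message plus line, column, and character offset.
        jsonld_errors[url] = str(err)

print(jsonld_errors)
```

The second snippet is missing a comma between its two keys, so only that URL ends up in the error report.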

python3 -m pip install advertools

#json #advertools #datascience #seo #DigitalMarketing #crawling

Last updated 1 year ago

Elias Dabbas :verified: · @elias
54 followers · 88 posts · Server seocommunity.social

🕸🕷🕸🕷🕸🕷🕸
+
@JupyterNaas
= Cloud

🔵 Low code
🔵 Save crawl templates to re-run multiple times
🔵 Create a separate template for each website
🔵 Run multiple crawls at the same time
🔵 Enjoy!

bit.ly/42YuOFC

#advertools #seo #crawler #datascience #python #DigitalMarketing #digitalanalytics

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 85 posts · Server seocommunity.social

🕷🕸🕷🕸🕷
My website has ten pages:

Title tag lengths: [10, 10, 10, 10, 10, 130, 130, 130, 130, 130]
Average title length: 70 characters
Good, right?
Wrong.

🔵 Show length distributions
🔵 Show counts per bin [0, 10], [11, 20], etc...
🔵 Interactive, downloadable, emailable, HTML chart
🔵 Show shortest/longest desired lengths with vertical guides
🔵 Hover to see URL and title
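
The ten-page example can be checked in a few lines. This sketch covers only the arithmetic (the average versus counts per bin), not the interactive chart; the bin_label helper is a made-up name for illustration.

```python
from collections import Counter
from statistics import mean

title_lengths = [10, 10, 10, 10, 10, 130, 130, 130, 130, 130]

# The average looks healthy...
average = mean(title_lengths)

# ...but counting titles per 10-character bin shows none are near it.
def bin_label(length, width=10):
    # Bins follow the post's scheme: [0, 10], [11, 20], [21, 30], ...
    if length <= width:
        return f"[0, {width}]"
    low = ((length - 1) // width) * width + 1
    return f"[{low}, {low + width - 1}]"

counts = Counter(bin_label(n) for n in title_lengths)
print(average)
print(counts)
```

Half the titles land in [0, 10] and half in [121, 130], so not a single page has a title near the 70-character average.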

bit.ly/3OnAOmB

Suggestions?

#datascience #advertools #seo #DigitalMarketing #digitalanalytics #datavisualization

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 85 posts · Server seocommunity.social

advertools + Naas.ai = Automated bulk status code checker & email notifier

🔵 Runs in bulk fast & light
🔵 Runs on Naas.ai (zero setup)
🔵 Low code: start with the notebook we created, configure URLs, email notification settings, how often to run the checker, where to get URLs from, etc.
🔵 Get response headers
🔵 Improve: report bugs, issues, suggest changes

bit.ly/42YuOFC

Use notebook: Advertools_Check_status_code_and_Send_notifications

#advertools #datascience #seo #automation #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 83 posts · Server seocommunity.social

Q: How many lines of code does it take to analyze segments of a website by any available metric?

A: 3

1. Open the crawl file
2. Split URLs into segments (path, dir_1, dir_2, ..)
3. Summarize segments by any metric (page size, latency, etc.)
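
The same three steps, spelled out with the standard library so the logic is visible. This is a longer sketch than the three-line version in the link, and the crawl rows and page sizes are made up:

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit

# Hypothetical crawl rows: (URL, page size in bytes).
crawl = [
    ("https://example.com/blog/post-1", 52_000),
    ("https://example.com/blog/post-2", 48_000),
    ("https://example.com/shop/shoes/item-9", 120_000),
    ("https://example.com/shop/bags/item-3", 100_000),
]

# Group page sizes by the first directory of the URL path ("dir_1").
sizes_by_segment = defaultdict(list)
for url, size in crawl:
    directories = [part for part in urlsplit(url).path.split("/") if part]
    dir_1 = directories[0] if directories else "(root)"
    sizes_by_segment[dir_1].append(size)

# Summarize each segment by page count and mean page size.
summary = {
    seg: {"pages": len(sizes), "mean_size": mean(sizes)}
    for seg, sizes in sizes_by_segment.items()
}
print(summary)
```

Swapping page size for latency, status code, or any other crawl column gives a different report from the same grouping.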

Code and more examples here:

bit.ly/3OlaDwH

#advertools #pandas #datascience #crawling

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 82 posts · Server seocommunity.social

office hours - 3

Thursday, same time, same link:

Using the parquet file format to
1. Reduce the size of crawl files
2. Speed up the analysis process

Join if you're interested:
bit.ly/adv-office-hours

#advertools #datascience #seo #DigitalMarketing #digitalanalytics #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 81 posts · Server seocommunity.social

πŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈ

Here is a list of custom extraction XPath selectors to take your crawling to the next level.

bit.ly/3Di5TBO

This can be expanded to include other extractors and/or ones for popular sites/CMSes
Amazon, WP, Shopify etc.

If you have a favorite list that you would like to contribute or create please let me know.

#datascience #advertools #seo #crawling #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 78 posts · Server seocommunity.social

πŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈπŸ•·οΈπŸ•ΈοΈ

Analyzing links of a crawled website begins with organizing them in a "tidy" (long form) DataFrame, allowing you to:

🔵 Get link URL, anchor text, & nofollow tag
🔵 Split internal/external links to easily get inlinks & outlinks
🔵 Run network analysis on internal links (pagerank, betweenness centrality, etc)
🔵 Analyze anchor text

This function takes the links from an advertools crawl DataFrame and organizes them for easier analysis
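
What "tidy" (long form) means here: one row per link instead of one row per page. The sketch below is not the function from the link. It assumes each page's links are stored as delimited strings, and the "@@" separator, column names, and sample row are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical crawl row; the "@@" delimiter between values is an assumption.
crawl_rows = [
    {
        "url": "https://example.com/",
        "links_url": "https://example.com/about@@https://other.org/",
        "links_text": "About us@@Partner",
        "links_nofollow": "False@@True",
    },
]

def tidy_links(rows, sep="@@"):
    """Return one dict per link (long form) from delimited crawl columns."""
    tidy = []
    for row in rows:
        site = urlsplit(row["url"]).netloc
        columns = zip(
            row["links_url"].split(sep),
            row["links_text"].split(sep),
            row["links_nofollow"].split(sep),
        )
        for link, text, nofollow in columns:
            tidy.append({
                "url": row["url"],
                "link": link,
                "text": text,
                "nofollow": nofollow == "True",
                # Internal means the link points to the same host as the page.
                "internal": urlsplit(link).netloc == site,
            })
    return tidy

links = tidy_links(crawl_rows)
for link in links:
    print(link)
```

With one row per link, filtering internal links or counting anchor texts becomes a simple filter or group-by.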

bit.ly/3Dd3F6D

#advertools #datascience #seo #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 77 posts · Server seocommunity.social

office hours - episode 2

Today at 14:00 GMT

We'll discuss redirects, and how to get and analyze them.

Join here if you're interested:

bit.ly/adv-office-hours

#advertools #datascience #seo #digitalanalytics #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
55 followers · 73 posts · Server seocommunity.social

office hours - 2

Same time (Thursday), same link. Sign up here if you haven't

bit.ly/adv-office-hours

A better way to analyze redirects on a website

with full redirect chains, status codes & the logic behind them.
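
A rough sketch of what a "full redirect chain" looks like as data, using a made-up lookup table of responses instead of live requests (the function name and URLs are hypothetical):

```python
# Hypothetical responses: URL -> (status code, redirect target or None).
responses = {
    "http://example.com/old": (301, "https://example.com/old"),
    "https://example.com/old": (302, "https://example.com/new"),
    "https://example.com/new": (200, None),
}

def redirect_chain(url, responses, max_hops=10):
    """Follow redirects from url, returning [(url, status), ...] for each hop."""
    chain = []
    seen = set()  # guards against redirect loops
    while url in responses and url not in seen and len(chain) < max_hops:
        seen.add(url)
        status, target = responses[url]
        chain.append((url, status))
        url = target
    return chain

chain = redirect_chain("http://example.com/old", responses)
print(chain)
```

Keeping every hop with its status code shows not just where a URL ends up, but how many redirects it takes and of which type.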

(nudged by Nitin Manchanda)

#advertools #datascience #seo #python

Last updated 1 year ago

Elias Dabbas :verified: · @elias
52 followers · 70 posts · Server seocommunity.social

Country flags can make your charts/reports easier to read, & can give more space vs full country names.

Just released a simple new function, flag(), which converts a 2- or 3-letter country code or a country name to its respective flag.

python3 -m pip install --upgrade adviz
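
For the curious, a 2-letter code maps to a flag emoji through Unicode regional indicator symbols. This is a sketch of that mapping only, not adviz's implementation: the country_flag name is made up, and 3-letter codes and country names are not handled.

```python
def country_flag(code):
    """Convert a 2-letter ISO country code to its flag emoji."""
    code = code.strip().upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a 2-letter country code, got {code!r}")
    # Regional indicator symbols start at U+1F1E6 for the letter "A".
    return "".join(chr(0x1F1E6 + ord(letter) - ord("A")) for letter in code)

print(country_flag("jp"), country_flag("BR"))
```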

bit.ly/adviz-flag

#adviz #advertools #datascience #datavisualization #python #plotly

Last updated 1 year ago

Elias Dabbas :verified: · @elias
52 followers · 69 posts · Server seocommunity.social

Happy to announce
office hours

Free
Live coding (you'll also code, make charts, analyze data)
For beginners (advanced users more than welcome)
No recording
1st episode - Crawling: July 6th

bit.ly/adv-office-hours

#advertools #datascience #seo #python

Last updated 1 year ago