The recording of yesterday's discussion is available: scaling your use of #ChatGPT using two techniques:
1. Bulk prompts: creating prompt templates using rich structured data
2. Fine-tuning: creating very specific functionality by training the model to do one particular task, learning from hundreds or thousands of examples. Example: an entity extraction app that also provides Wikipedia URLs for the extracted entities.
#chatgpt #datascience #python #generativeAI #advertools #seo #llm
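A minimal sketch of technique 1 (bulk prompts): fill one template from rows of structured data. The template and product fields below are invented for illustration, not from the talk.

```python
# Sketch of "bulk prompts": one template + structured data = many prompts.
# The template and the product rows are made-up examples.
TEMPLATE = (
    "Write a 50-word product description for {name}, "
    "a {category} priced at ${price}, highlighting: {features}."
)

products = [
    {"name": "TrailRunner X", "category": "running shoe", "price": 89,
     "features": "breathable mesh, carbon plate"},
    {"name": "AquaFlask 1L", "category": "water bottle", "price": 25,
     "features": "insulated, leak-proof lid"},
]

# One prompt per row; with a real dataset this scales to thousands.
prompts = [TEMPLATE.format(**row) for row in products]
```

In practice the rows would come from a product feed or a DataFrame rather than a hard-coded list.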
Happy to announce a new cohort for my course:
Data Science with Python for SEO 🐍 🐍 🐍
🔵 For absolute beginners
🔵 Run, automate, and scale many SEO tasks with Python: crawling, analyzing XML sitemaps, text/keyword analysis
🔵 Intro to data manipulation and visualization skills
🔵 Get started with #advertools #pandas and #plotly
🔵 Make the transition from #Excel to #Python
🔵 Online, live, cohort-based, interactive
🔵 Spans three days in one week
#advertools #pandas #plotly #excel #python
This week: Crawl with #advertools, scale with #ChatGPT
Two techniques to scale your prompts
1. Generating prompts on a large scale by creating prompt templates + structured data (e.g. creating many product descriptions)
2. Using fine-tuning to train ChatGPT to perform a highly specialized task, using hundreds/thousands of training examples. I'll share details on my entity extraction app.
Join us Thursday:
https://lnkd.in/d2uyr_6U
#advertools #chatgpt #datascience #DigitalMarketing #python #structureddata #seo
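For technique 2, fine-tuning data is typically prepared as JSONL. A hedged sketch, assuming OpenAI's chat-style fine-tuning format; the entity examples themselves are invented:

```python
import json

# Hypothetical training examples for an entity-extraction fine-tune.
# The {"messages": [...]} shape follows OpenAI's chat fine-tuning format;
# the sentences and URLs are illustrative only.
examples = [
    {"messages": [
        {"role": "system", "content": "Extract named entities with Wikipedia URLs."},
        {"role": "user", "content": "Apple opened a new store in Berlin."},
        {"role": "assistant", "content":
            '[{"entity": "Apple", "url": "https://en.wikipedia.org/wiki/Apple_Inc."},'
            ' {"entity": "Berlin", "url": "https://en.wikipedia.org/wiki/Berlin"}]'},
    ]},
]

# Write one JSON object per line (JSONL) for upload as a training file.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A real training set would contain hundreds or thousands of such examples, not one.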
Who's blocking OpenAI's GPTBot?
🔵 Use the #advertools robotstxt_to_df function to fetch robots files in bulk (one, five, ten thousand...) in one go
🔵 Run it as many times as you want, for as many domains
🔵 Top domains list obtained from the Majestic (Majestic.com) Million dataset (thank you)
🔵 This was run for 10k domains (7.3k successful)
🔵 Get the code and data (and the answer to the poll question):
#advertools #datascience #ai #generativeAI #chatgpt #seo #crawling
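The bulk fetch and the GPTBot check could look roughly like this. The advertools call is real (commented out because it needs network access); the sample rows and the simplified check are illustrative:

```python
# Real bulk fetch (requires advertools and network access):
#   import advertools as adv
#   adv.robotstxt_to_df(robots_urls, output_file="robots.jl")
# Below, the GPTBot check is sketched on rows shaped like
# robotstxt_to_df output (directive / content / robotstxt_url columns);
# the sample sites and rules are invented.
rows = [
    {"robotstxt_url": "https://a.com/robots.txt", "directive": "user-agent", "content": "GPTBot"},
    {"robotstxt_url": "https://a.com/robots.txt", "directive": "disallow", "content": "/"},
    {"robotstxt_url": "https://b.com/robots.txt", "directive": "user-agent", "content": "*"},
]

# First-pass check: which sites address GPTBot at all? A complete check
# would also inspect the Disallow rules inside that user-agent group.
mentions_gptbot = sorted({
    r["robotstxt_url"] for r in rows
    if r["directive"].lower() == "user-agent" and r["content"] == "GPTBot"
})
```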
Analyzing SERPs on a large scale with #Python and #advertools
The recording is now available
🔵 Creating a large set of queries in an industry
🔵 Creating query variants
🔵 Running the requests in bulk
🔵 Running the requests across various dimensions (country, language, etc.)
🔵 Visualizing the results with a heatmap
#python #advertools #datascience #seo #datavisualization
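Generating query variants at scale is essentially a cartesian product of seed lists. A sketch with invented seed terms; in a real run the resulting queries would feed a bulk SERP function:

```python
from itertools import product

# Hypothetical seed lists for one industry; swap in your own terms.
seed_products = ["running shoes", "hiking boots"]
modifiers = ["best", "cheap", "reviews"]

# Every modifier x product combination becomes one query.
queries = [f"{m} {p}" for m, p in product(modifiers, seed_products)]
```

Adding a dimension (country, language) is just another list in the product.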
What's the longest regular expression that I wrote?
140,820 characters (one hundred and forty thousand)
It's a regex for finding emojis (any of them).
Here's how to create it, with a general explanation of regular expressions along the way:
We'll discuss more text processing and analysis techniques in the #advertools office hours tomorrow if you'd like to join.
#advertools #datascience #python
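The general recipe for such a giant regex: escape every literal and join with `|`, longest first so multi-character sequences win over their prefixes. A tiny sketch (the emoji sample here is illustrative, not the full set behind the 140,820-character version):

```python
import re

# Build a "match any of these literals" regex: escape each one and
# join with "|", sorted longest-first so flag sequences (two code
# points) match before single-code-point emoji.
emoji = ["\U0001F600", "\U0001F601", "\U0001F1EB\U0001F1F7"]  # grinning, beaming, FR flag
pattern = re.compile(
    "|".join(re.escape(e) for e in sorted(emoji, key=len, reverse=True))
)

found = pattern.findall("Bonjour \U0001F1EB\U0001F1F7 \U0001F600")
```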
Log file analysis
🔵 Parse log file fields: IP, datetime, request, method, status, size, referer, user-agent
🔵 Compress to parquet
🔵 Bulk reverse DNS lookup for IPs
🔵 Split request & referer URLs into their components
🔵 Parse user agents into their components (OS, version, device name, etc.)
🔵 7-8 fields become hundreds of columns
🔵 Generate any report, ask any question about any combination of those elements
Example
https://bit.ly/3qnfLr5
#advertools #seo #datascience #digitalanalytics #python
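advertools' logs_to_df does the heavy lifting here; as a stdlib sketch, this is roughly what the parsing step looks like for one line in the common "combined" log format:

```python
import re

# Named groups for the fields listed above, in combined log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# A made-up sample line in combined format.
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"https://example.com/" "Mozilla/5.0"')

fields = COMBINED.match(line).groupdict()
```

From here, each field can be split further (URLs into components, user agents into OS/device) to get the "hundreds of columns".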
1/2
Happy to announce my course:
Data Science with Python for SEO 🐍 🐍 🐍
🔵 For absolute beginners
🔵 Make a leap in your data skills
🔵 Run, automate, and scale many SEO tasks with Python: crawling, analyzing XML sitemaps, text/keyword analysis
🔵 In-depth intro to data manipulation and visualization skills
🔵 Get started with #advertools #pandas and #plotly
🔵 Make the transition from Excel to Python
🔵 Online, live, cohort-based, interactive
🕷🕸🕷🕸🕷🕸🕷
#JSON-LD errors on webpages:
#advertools reports those errors in the "jsonld_errors" column, and provides detailed error messages. For example:
Expecting ',' delimiter: line 11 column 437 (char 665)
Invalid control character at: line 27 column 450 (char 1728)
Invalid \escape: line 27 column 466 (char 2096)
Simply filter for the columns "url" and "jsonld_errors" to get them.
python3 -m pip install advertools
#json #advertools #datascience #seo #DigitalMarketing #crawling
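Those messages are standard json.JSONDecodeError strings; one can be reproduced on a deliberately broken JSON-LD snippet:

```python
import json

# A JSON-LD fragment with a missing comma between two keys.
broken = '{"@type": "Product" "name": "Shoe"}'

try:
    json.loads(broken)
except json.JSONDecodeError as e:
    # e.g. "Expecting ',' delimiter: line 1 column 21 (char 20)"
    message = str(e)
```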
🕸🕷🕸🕷🕸🕷🕸
#advertools +
@JupyterNaas
= Cloud #SEO #Crawler
🔵 Low code
🔵 Save crawl templates to re-run multiple times
🔵 Create a separate template for each website
🔵 Run multiple crawls at the same time
🔵 Enjoy!
#advertools #seo #crawler #datascience #python #DigitalMarketing #digitalanalytics
🕷🕸🕷🕸🕷
My website has ten pages:
Title tag lengths: [10, 10, 10, 10, 10, 130, 130, 130, 130, 130]
Average title length: 70 characters
Good, right?
Wrong.
🔵 Show length distributions
🔵 Show counts per bin: [0, 10], [11, 20], etc.
🔵 Interactive, downloadable, emailable HTML chart
🔵 Show shortest/longest desired lengths with vertical guides
🔵 Hover to see URL and title
Suggestions?
#DataScience #advertools #SEO #DigitalMarketing #DigitalAnalytics #DataVisualization
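The binning idea can be sketched in a few lines, using the title lengths from the post:

```python
from collections import Counter

# The ten title lengths from the example above.
lengths = [10, 10, 10, 10, 10, 130, 130, 130, 130, 130]

# The mean says 70 characters; the bins tell the real story:
# two clusters, nothing near 70.
average = sum(lengths) / len(lengths)
bins = Counter(
    f"{(l - 1) // 10 * 10 + 1}-{(l - 1) // 10 * 10 + 10}"  # 1-10, 11-20, ...
    for l in lengths
)
```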
#advertools + Naas.ai = Automated bulk status code checker & email notifier
🔵 Runs in bulk, fast & light
🔵 Runs on Naas.ai (zero setup)
🔵 Low code: start with the notebook we created, then configure the URLs, email notification settings, how often to run the checker, where to get URLs from, etc.
🔵 Get response headers
🔵 Improve it: report bugs and issues, suggest changes
Use notebook: Advertools_Check_status_code_and_Send_notifications
#advertools #datascience #seo #automation #python
Q: How many lines of code does it take to analyze segments of a website by any available metric?
A: 3
1. Open the crawl file
2. Split URLs into segments (path, dir_1, dir_2, ..)
3. Summarize segments by any metric (page size, latency, etc.)
Code and more examples here:
#advertools #pandas #datascience #crawling
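A stdlib sketch of the three steps. A real crawl file would be opened with pandas, and the url/latency field names here are assumptions for illustration:

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit

# 1. "Open the crawl file" -- simulated here with invented rows.
crawl = [
    {"url": "https://site.com/blog/post-1", "latency": 0.4},
    {"url": "https://site.com/blog/post-2", "latency": 0.6},
    {"url": "https://site.com/shop/item-9", "latency": 1.0},
]

# 2. Split URLs into segments: the first path directory (dir_1).
by_dir1 = defaultdict(list)
for row in crawl:
    dir_1 = urlsplit(row["url"]).path.strip("/").split("/")[0]
    by_dir1[dir_1].append(row["latency"])

# 3. Summarize each segment by the chosen metric (mean latency here).
summary = {segment: mean(values) for segment, values in by_dir1.items()}
```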
#advertools office hours - 3
Thursday, same time, same link:
Using the parquet file format to
1. Reduce the size of crawl files
2. Speed up the analysis process
Join if you're interested:
https://bit.ly/adv-office-hours
#DataScience #SEO #DigitalMarketing #DigitalAnalytics #Python
#advertools #datascience #seo #DigitalMarketing #digitalanalytics #python
🕸️🕷️🕸️🕷️🕸️🕷️🕸️
Here is a list of custom extraction XPath selectors to take your crawling to the next level.
This can be expanded to include other extractors and/or ones for popular sites/CMSes: Amazon, WordPress, Shopify, etc.
If you have a favorite list that you would like to contribute or create, please let me know.
#datascience #advertools #seo #crawling #python
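What such selectors look like in use, sketched with the stdlib ElementTree's limited XPath subset on an invented page snippet (a real crawler would support full XPath expressions):

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page fragment; real pages need an HTML parser.
html = """<html><body>
  <h1>Product page</h1>
  <span class="price">19.99</span>
  <span class="sku">AB-123</span>
</body></html>"""

tree = ET.fromstring(html)

# Custom extraction: name -> XPath selector (illustrative examples).
selectors = {
    "heading": ".//h1",
    "price": ".//span[@class='price']",
}
extracted = {name: tree.find(xp).text for name, xp in selectors.items()}
```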
🕸️🕷️🕸️🕷️🕸️🕷️🕸️
Analyzing links of a crawled website begins with organizing them in a "tidy" (long form) DataFrame, allowing you to:
🔵 Get link URL, anchor text, & nofollow attribute
🔵 Split internal/external links to easily get inlinks & outlinks
🔵 Run network analysis on internal links (PageRank, betweenness centrality, etc.)
🔵 Analyze anchor text
This function takes the links from an #advertools crawl DataFrame and organizes them for easier analysis
#advertools #datascience #seo #python
#advertools office hours - episode 2
Today at 14:00 GMT
We'll discuss redirects, and how to get and analyze them.
Join here if you're interested:
#advertools #datascience #seo #digitalanalytics #python
#advertools office hours - 2
Same time (Thursday), same link. Sign up here if you haven't already:
https://bit.ly/adv-office-hours
A better way to analyze redirects on a website
with full redirect chains, status codes & the logic behind them.
(nudged by Nitin Manchanda)
#advertools #datascience #seo #python
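The chain-following logic can be sketched without live requests, simulating the status code + Location response for each URL with a dict (the URLs and hops below are invented):

```python
# Simulated responses: url -> (status_code, Location header or None).
redirects = {
    "http://site.com/old": (301, "https://site.com/old"),
    "https://site.com/old": (301, "https://site.com/new"),
    "https://site.com/new": (200, None),
}

def redirect_chain(url, max_hops=10):
    """Follow Location headers, recording (url, status) at each hop."""
    chain = []
    for _ in range(max_hops):
        status, location = redirects[url]
        chain.append((url, status))
        if location is None:  # final destination reached
            return chain
        url = location
    return chain  # stopped early: likely a redirect loop

chain = redirect_chain("http://site.com/old")
```

The max_hops cap is what catches redirect loops in practice.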
Country flags can make your charts/reports easier to read, and take up less space than full country names.
Just released a simple new #adviz function, flag(), which converts a two- or three-letter country code or a country name to its respective flag:
python3 -m pip install --upgrade adviz
#adviz #advertools #datascience #datavisualization #python #plotly
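flag() handles codes and country names; the underlying idea for the two-letter case can be sketched in one function: each letter maps to a Unicode "regional indicator" symbol, and two of them in a row render as a flag.

```python
def flag_sketch(alpha2: str) -> str:
    """Minimal sketch (not the adviz implementation): map each letter
    of a two-letter ISO code to its regional indicator symbol."""
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in alpha2.upper())

fr = flag_sketch("fr")  # renders as the French flag
```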
Happy to announce
#advertools office hours
Free
Live coding (you'll also code, make charts, analyze data)
For beginners (advanced users more than welcome)
No recording
1st episode - Crawling: July 6th
#advertools #datascience #seo #python