One of my favorite views of the stock market is the sector/sub-sector daily/weekly/etc. returns visualization at https://finviz.com/map.ashx. Many bad "just-so" stories about a stock's behavior can be dispelled by looking at what that stock's peers did (see https://octodon.social/@22/109327333967001573 about that LLY insulin fake tweet nonsense).
But #FinViz also has the only public open-source database I've found that breaks down 5000+ US publicly-traded companies into sectors and sub-sectors, which I didn't discover until I created a tiny #gitscraping project for it (thanks to @simon for being a passionate promoter of this trick).
https://github.com/fasiha/finviz-git-scraper#all-us-sector-and-subsector-breakdowns has the collapsible list of sectors, sub-sectors, and stocks. This is the first time I've seen all of these in one view, and I love it because I've always been interested in the variety of things people make and jobs people do. It's fun to see the consumer sector broken down into alcohol, non-alcoholic beverages, candy-makers ("confectioners", so fancy!), discount stores, education, farm products… Reminds me a lot of that Grain Into Gold booklet for role-playing gamers, except updated to modern life.
In November, the most recent month of data, parole was granted in 14.1% of cases, higher than this year's 13.9% average but lower than the 14.8% granted in October.
8,206 parole hearings have occurred this year and 1,143 have resulted in parole. In all of last year, there were 8,739 hearings and 1,419 were granted.
#gitscraping #parole #opendata
New monthly data was released by the #California Board of Parole Hearings and scraped by a bot I maintain.
It's every parole hearing scheduled with the board, along with the outcome, so it can help us understand whether parole is granted at different rates over time.
I was tweeting when the new data dropped, but I guess that'll happen on Mastodon now :)
So what does it say?
#California #gitscraping #parole #opendata
And I have a few #gitscraping projects where I want the deploy to be triggered by the scraping workflow, which GitHub doesn't enable by default in order to avoid recursive workflow runs.
This is in the GitHub docs but it took me until this project to internalize it: you have to push with a personal access token stored as a secret named anything other than `GITHUB_TOKEN`: https://github.com/jeremiak/motley-fool-earning-transcripts/blob/main/.github/workflows/scrape.yml#L14
That way whenever the scraper runs and finds new data (which it commits), a deploy will be triggered. Woot!
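For anyone else who hits this: the trick is in the checkout step. A rough sketch of what that looks like (not the exact file linked above, and `SCRAPER_PAT` is just a placeholder name for the secret):

```yaml
name: scrape
on:
  schedule:
    - cron: "0 12 * * *"
  workflow_dispatch:
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # a personal access token saved as a repo secret -- the name can be
          # anything except GITHUB_TOKEN (secret names starting with GITHUB_ are reserved)
          token: ${{ secrets.SCRAPER_PAT }}
      # ...later steps scrape, commit, and `git push`. Because the push happens
      # with the PAT instead of the default GITHUB_TOKEN, it counts as a normal
      # push event and the deploy workflow gets triggered.
```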
And I've got a #GitScraping project to create an archive of these Forest Service rasters since I can't seem to find any historical versions available online.
Would be super stoked to be wrong about that though.
For #Caturday I’m going to share the #GitScraping project I did to find my cats! Every day I ran a short code snippet that got an updated list of cats at my local shelter, which then made a “commit” that told me which cats were added and removed that day. When Toast and Fondue were added, we knew they were the perfect cats for us and contacted the organization very quickly! https://www.elizabethviera.com/catfind
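The real code is linked above, but the daily step is roughly this shape, written here as it might look inside a scheduled GitHub Actions workflow (the shelter URL and JSON fields are made up):

```yaml
- name: Refresh the cat list and commit the diff
  run: |
    # hypothetical endpoint and fields -- the real project is linked above
    curl -s "https://example-shelter.org/adoptable-cats.json" \
      | jq 'map({name, breed, age}) | sort_by(.name)' > cats.json
    git config user.name "catfind-bot"
    git config user.email "actions@users.noreply.github.com"
    git add cats.json
    # the commit's diff is the notification: + lines are new cats,
    # - lines are cats who (hopefully) went home with someone
    git commit -m "Cats as of $(date -u +%F)" || echo "No changes today"
    git push
```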
But I'm not entirely sure what the best way to store the data in a #GitScraping project is.
In the NWS scraper the same file is overwritten each time it runs.
But sometimes I've had the scraper append any new data to an existing JSON file, like this one with CA's board of parole hearings:
https://github.com/jeremiak/ca-bph-hearing-results
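The append flavor is basically: scrape into a temporary file, then merge it into the running archive instead of replacing it. Something like this step in the workflow (the file names and the `.id` key are placeholders, not necessarily what that repo actually does):

```yaml
- name: Merge the latest scrape into the archive
  run: |
    # latest.json = this run's results, hearings.json = the growing archive.
    # Concatenate the two arrays and de-dupe on a stable key so rows that
    # get re-scraped don't pile up.
    jq -s '.[0] + .[1] | unique_by(.id)' hearings.json latest.json > merged.json
    mv merged.json hearings.json
```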
I've been on board with #GitScraping since reading @simon's blog posts on the subject, and I've found it really useful for creating quick, self-updating archives of government datasets.
The technique is simple enough that I was able to set one up yesterday almost entirely w/ the GitHub web interface:
https://github.com/jeremiak/nws-hazards-warnings
It scrapes the National Weather Service hazards & warnings and stores them in a JSON file.
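The whole thing is more or less a cron-scheduled workflow that fetches a URL and commits whatever comes back. A sketch (the real workflow is in the repo above; the alerts endpoint and file name here are my guesses):

```yaml
name: scrape
on:
  schedule:
    - cron: "23 */2 * * *"   # every couple of hours
  workflow_dispatch:
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch the current hazards & warnings
        run: |
          # endpoint is an assumption; NWS asks for an identifying User-Agent
          curl -s -H "User-Agent: nws-git-scraper (me@example.com)" \
            "https://api.weather.gov/alerts/active" -o alerts.json
      - name: Commit if anything changed
        run: |
          git config user.name "nws-scraper-bot"
          git config user.email "actions@users.noreply.github.com"
          git add alerts.json
          git commit -m "Latest alerts: $(date -u)" || echo "No changes"
          git push
```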
I'm working on having it use `git-history` to create a datasette instance of the data.
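My rough plan for that part, as extra steps in a follow-on job (the `--id` column and file name are placeholders, and depending on the JSON's shape it may need git-history's `--convert` option):

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # git-history walks the file's full commit history, so no shallow clone
- name: Build a SQLite database from every committed version of the file
  run: |
    pip install git-history datasette
    # each past commit of alerts.json becomes versioned rows in nws.db;
    # git-history expects the file to be a list of JSON objects by default,
    # and --id names the column that identifies a row across versions
    git-history file nws.db alerts.json --id id
    # then explore locally with `datasette nws.db`, or push it somewhere
    # with one of the `datasette publish` targets
```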