julia ferraioli :cc_by: · @juliaferraioli
3460 followers · 830 posts · Server floss.social

I dig into the recently published "Name-based demographic inference and the unequal distribution of misrecognition", by Lockhart, King, and Munsch.

As the paper highlights, what seem like reasonable overall error rates of name-based demographic inference tools quickly degrade into unacceptable (and unpublicized) error rates when looking at intersections of demographics.

juliaferraioli.com/blog/2023/i

#dataScience #socialScience #systemicbias

Last updated 1 year ago

Berlin Buzzwords · @berlinbuzzwords
244 followers · 176 posts · Server floss.social

The second part of the event report by @anna__geller – find out what her impressions and highlights were (and which session recordings you should watch now):
2023.berlinbuzzwords.de/2023/0

#conference #Berlin #search #llm #machineLearning #dataScience #bbuzz

Last updated 1 year ago

Berlin Buzzwords · @berlinbuzzwords
240 followers · 173 posts · Server floss.social

This year @anna__geller wrote event reports for each conference day of Berlin Buzzwords 2023, summarizing her personal highlights. Read her review of day one now:
2023.berlinbuzzwords.de/2023/0

#bbuzz #search #conference #dataScience #machineLearning #llm #Berlin #OpenSource

Last updated 1 year ago

Berlin Buzzwords · @berlinbuzzwords
240 followers · 172 posts · Server floss.social

The first reviews and recaps are in! See what other participants thought of Berlin Buzzwords 2023 and get their recommendations for talk recordings to watch:
2023.berlinbuzzwords.de/2023/0

#bbuzz #conference #OpenSource #search #dataScience #Berlin

Last updated 1 year ago

julia ferraioli :cc_by: · @juliaferraioli
3383 followers · 800 posts · Server floss.social

How we collect information matters, as does how we analyze, share, and build upon it.

When people voluntarily give us data, they are giving us their trust.

juliaferraioli.com/blog/2023/i

#data #dataScience #ethics #socialJustice #equality

Last updated 1 year ago

Jan :rust: :ferris: · @janriemer
460 followers · 1737 posts · Server floss.social

Yay! Sorting the diff result by columns has just been merged into qsv! 🥳

github.com/jqnatividad/qsv/pul

#data #CLI #dataScience #CSVDiff #QSV #csv

Last updated 2 years ago

Jan :rust: :ferris: · @janriemer
456 followers · 1708 posts · Server floss.social

New release of qsv, the CSV toolkit, is out! 🎉

The `diff` command now sorts by line when no other sort option is given (previously, the order of the diff result was not stable across runs). 🧮 📃

This release also introduces a new command, `joinp`, the first command that is powered by pola.rs! 🚀

Check the full release notes here:
github.com/jqnatividad/qsv/rel

#polars #dataScience #data #OpenSource #rustlang #Rust #terminal #CLI #csv #QSV

Last updated 2 years ago

Jan :rust: :ferris: · @janriemer
453 followers · 1625 posts · Server floss.social

A new version of csv-diff is out (v0.1.0-beta.2) 🎉

lib.rs/crates/csv-diff

This version adds a method, which allows you to sort your diff result by columns (it was already possible to sort by lines).

See the changelog for an example:
gitlab.com/janriemer/csv-diff/

Sorting by columns will soon be integrated into qsv, the CSV toolkit:
github.com/jqnatividad/qsv/iss

Thank you @jqnatividad for the idea of this feature! 💚

#CLI #QSV #dataScience #CSVDiff #rustlang #Rust #csv

Last updated 2 years ago

Jan :rust: :ferris: · @janriemer
412 followers · 915 posts · Server floss.social

Announcement 🎉 🥳

csv-diff will be integrated into qsv, the CSV toolkit, soon! 🎉 :ferris:

PR:
github.com/jqnatividad/qsv/pul

Comparing the majestic million dataset with 1,000,000 rows x 12 columns takes less than 800 ms and only about 150 MB of RAM!
With this, it is the fastest differ in the world! 🚀

See the following svg recording for a demo:

gist.githubusercontent.com/jan

csv-diff:
gitlab.com/janriemer/csv-diff

#oxidization #dataScience #performance #difference #CSVDiff #diff #data #rustlang #Rust #csv

Last updated 2 years ago

Jan :rust: :ferris: · @janriemer
398 followers · 573 posts · Server floss.social

🥳 A new version of csv-diff has just been released! 🚀

docs.rs/csv-diff/latest/csv_di

csv-diff is the fastest CSV-diffing library in the world, written in Rust.

It can compare two 1,000,000 rows x 9 columns CSVs in < 600 ms!

Note that this is still a beta release and the library itself is still very young.

#crate #OpenSource #difference #diff #data #dataScience #performance #CSVDiff #csv #release #rustlang #Rust

Last updated 2 years ago

Jan :rust: :ferris: · @janriemer
395 followers · 532 posts · Server floss.social

@kdwarn Oh wow, this looks awesome!

I definitely have a use for this right now.

Some similar tools that are also pretty neat and all written in Rust:

csvlens
github.com/YS-L/csvlens

csview
github.com/wfxr/csview

Thank you for sharing.

#rustlang #dataScience #CLITool #CLI #csv #Rust

Last updated 2 years ago

Matia𝕤 · @sigsegv
354 followers · 553 posts · Server floss.social

Did a very basic implementation of image segmentation, using K-means clustering: codeberg.org/matiaslavik/KMean

How it works:
1. Convert pixels to 6-dimensional points of colour (RGB) and position (XY)
2. Create N clusters at random positions in this 6D space
3. Find nearest cluster of each pixel
4. Recalculate cluster centre positions as average of points within each cluster
5. Repeat from step 3 many times (until it "converges").

Note: RGB-space might not be the best choice here.
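
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the linked repository: the function name `kmeans_segment` and its parameters are made up for this example, the feature vector has five components (R, G, B, x, y), so the original implementation may lay out its space differently, and the initial centres are drawn from randomly chosen pixels rather than from arbitrary coordinates.

```python
import random


def kmeans_segment(pixels, width, n_clusters, max_iterations=50, seed=0):
    """Cluster a flat, row-major list of (r, g, b) pixels; return one label per pixel."""
    rng = random.Random(seed)

    # Step 1: turn each pixel into a point combining colour and position.
    points = [
        (float(r), float(g), float(b), float(i % width), float(i // width))
        for i, (r, g, b) in enumerate(pixels)
    ]
    dims = len(points[0])

    # Step 2: pick the initial cluster centres at random.
    # Assumption: centres start on randomly chosen pixels, not arbitrary coordinates.
    centres = rng.sample(points, n_clusters)

    def nearest(point):
        # Index of the centre with the smallest squared Euclidean distance.
        return min(
            range(n_clusters),
            key=lambda c: sum((point[d] - centres[c][d]) ** 2 for d in range(dims)),
        )

    labels = None
    for _ in range(max_iterations):
        # Step 3: assign every pixel to its nearest centre.
        new_labels = [nearest(p) for p in points]

        # Step 5: stop once the assignment no longer changes ("converges").
        if new_labels == labels:
            break
        labels = new_labels

        # Step 4: move each centre to the mean of its member points.
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dims)
                )
    return labels


# A 4x1 image: two black pixels followed by two white ones.
image = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)]
labels = kmeans_segment(image, width=4, n_clusters=2)
```

Because colour values span 0 to 255 while positions span only the image size, colour dominates the distance in this sketch; scaling the position components would change how spatially compact the segments are.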

#dataScience #graphics

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
202 followers · 6428 posts · Server floss.social

RT @milos_agathon
I mapped % of people employed in science and tech in Europe

#maps #dataViz #dataScience #Rstats #IT #tech #science

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
202 followers · 6428 posts · Server floss.social

RT @erinbugbee
Check out this new interactive article from Amazon's Machine Learning University: Logistic Regression! Written and developed by myself and @jdwlbr as part of my internship at @AmazonScience @awscloud.

mlu-explain.github.io/logistic

#dataVisualization #dataScience #machineLearning

Last updated 2 years ago

Fabian N. T. 🦆 · @fabian
204 followers · 911 posts · Server floss.social

🔖 Paul Tol's Notes on colourblind- and contrast-friendly colour schemes, palettes, gradients etc. personal.sron.nl/~pault/

#dataScience #visualization #webdev #colors #colourblind

Last updated 2 years ago

DebugPoint - Linux & Dev Portal · @debugpoint
208 followers · 737 posts · Server floss.social

Meet DAT Linux: Ubuntu LTS Spin for Data Science Projects
debugpoint.com/dat-linux-revie

#dataScience #OpenSource #Ubuntu #Linux

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
203 followers · 6428 posts · Server floss.social

RT @EduardoBonet
Available now on GitLab.com and soon on 15.1, images on @ProjectJupyter Notebooks are rendered on diffs! With this, we DSs can finally use code reviews to discuss the report in its entirety: both the code and the conclusions

#dataScience #Python

Last updated 3 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
204 followers · 6428 posts · Server floss.social

RT @PeerJCompSci
GrimoireLab: A toolset for software development analytics - published in @PeerJCompSci bit.ly/3xjq2D1

#softwareEngineering #dataScience #grimoirelab

Last updated 3 years ago