I dig into the recently published "Name-based demographic inference and the unequal distribution of misrecognition", by Lockhart, King, and Munsch.
As the paper highlights, what seem like reasonable overall error rates of name-based demographic inference tools quickly degrade into unacceptable (and unpublicized) error rates when looking at intersections of demographics.
#DataScience #SocialScience #SystemicBias
https://www.juliaferraioli.com/blog/2023/influential-articles-jun/
Save the Date! Berlin Buzzwords is coming back in 2024 on June 9th to 11th:
https://2023.berlinbuzzwords.de/2023/06/28/save-the-date-for-berlin-buzzwords-2024/ #conference #berlin #opensource #search #llm #machinelearning #datascience #bbuzz
The second part of the event report by @anna__geller: find out what her impressions and highlights were (and which session recordings you should watch now):
https://2023.berlinbuzzwords.de/2023/06/22/event-report-1st-day-highlights-takeaways-2/ #conference #berlin #search #llm #machinelearning #datascience #bbuzz
This year @anna__geller wrote event reports for each conference day of #bbuzz 2023 summarizing her personal highlights. Read her review of day one now:
https://2023.berlinbuzzwords.de/2023/06/22/event-report-1st-day-highlights-takeaways/ #search #conference #datascience #machinelearning #llm #berlin #opensource
The first reviews and recaps are in! See what other participants thought of #bbuzz 2023 and get their recommendations for talk recordings to watch:
https://2023.berlinbuzzwords.de/2023/06/30/recaps-voices/ #conference #opensource #search #datascience #berlin
Berlin Buzzwords was a huge success and a lot of fun. Have a look at our wrap-up post:
https://2023.berlinbuzzwords.de/2023/06/29/this-was-berlin-buzzwords-2023/ #bbuzz #conference #datascience #search #opensource #berlin
How we collect information matters, as does how we analyze, share, and build upon it.
When people voluntarily give us data, they are giving us their trust.
#Data #DataScience #Ethics #SocialJustice #Equality
https://www.juliaferraioli.com/blog/2023/influential-articles-may/
Yay! Sorting the #csv diff result by columns has just been merged into #qsv! 🥳
#data #CLI #dataScience #CSVDiff #QSV #csv
New release of #qsv, the CSV toolkit, is out!
The `diff` command now sorts by line when no other sort option is given (previously, the order of the diff result was not stable across runs). 🧮
This release also introduces a new command, `joinp`, the first command powered by pola.rs!
Check the full release notes here:
https://github.com/jqnatividad/qsv/releases/tag/0.90.0
#CSV #CLI #Terminal #Rust #RustLang #OpenSource #Data #DataScience #Polars
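Why sorting a diff result by line makes it stable can be illustrated with a small Python sketch. This is not how qsv or csv-diff implement it; the function names `read_rows` and `diff_csv`, the key-based matching, and the sample data are all assumptions made for illustration. Changes are collected by iterating over a set of keys, whose order should not be relied on, and are then sorted by line number so the output is the same on every run.

```python
import csv
import io


def read_rows(text, key_column=0):
    """Map each row's key to (line_number, row).

    Assumes a header on line 1, unique keys, and no multi-line fields.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[key_column]: (line, row) for line, row in enumerate(reader, start=2)}


def diff_csv(left_text, right_text, key_column=0):
    """Return (line, kind, row) tuples sorted by line number."""
    left = read_rows(left_text, key_column)
    right = read_rows(right_text, key_column)
    changes = []
    # Iterating over a set of strings gives no guaranteed order.
    for key in left.keys() | right.keys():
        if key not in right:
            line, row = left[key]
            changes.append((line, "delete", row))
        elif key not in left:
            line, row = right[key]
            changes.append((line, "add", row))
        elif left[key][1] != right[key][1]:
            line, row = right[key]
            changes.append((line, "modify", row))
    # Sorting by (line, kind) makes the result deterministic.
    return sorted(changes, key=lambda change: (change[0], change[1]))


# Hypothetical sample data.
left = "id,name\n1,apple\n2,banana\n3,cherry\n"
right = "id,name\n1,apple\n2,blueberry\n4,date\n"
result = diff_csv(left, right)
for line, kind, row in result:
    print(line, kind, row)
```

Without the final `sorted` call, the three changes could come out in a different order from one run to the next.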
A new version of csv-diff is out (v0.1.0-beta.2)
https://lib.rs/crates/csv-diff
This version adds a method that allows you to sort your diff result by columns (it was already possible to sort by lines).
See the changelog for an example:
https://gitlab.com/janriemer/csv-diff/-/blob/8642a8a7ba14e22d076cee8c3f690c17f41d7528/CHANGELOG.md#010-beta2-19-february-2023
Sorting by columns will soon be integrated into qsv, the #CSV toolkit:
https://github.com/jqnatividad/qsv/issues/714
Thank you @jqnatividad for the idea of this feature!
#CLI #QSV #dataScience #CSVDiff #rustlang #Rust #csv
Announcement 🥳
csv-diff will soon be integrated into qsv, the CSV toolkit! :ferris:
PR:
https://github.com/jqnatividad/qsv/pull/711
Comparing the Majestic Million dataset (1,000,000 rows x 12 columns) takes less than 800 ms and only about 150 MB of RAM!
With this, it is the fastest #CSV differ in the world!
See the following svg recording for a demo:
csv-diff:
https://gitlab.com/janriemer/csv-diff
#Rust #RustLang #Data #Diff #CsvDiff #Difference #Performance #DataScience #Oxidization
🥳 A new version of csv-diff has just been released!
https://docs.rs/csv-diff/latest/csv_diff/
csv-diff is the fastest CSV-diffing library in the world, written in #Rust.
It can compare two 1,000,000 rows x 9 columns CSVs in < 600ms!
Note that this is still a beta release and the library itself is still very young.
#RustLang #Release #CSV #CSVDiff #Performance #DataScience #Data #Diff #Difference #OpenSource #Crate
@kdwarn Oh wow, this looks awesome!
I definitely have a use for this right now.
Some similar tools that are also pretty neat and all written in #Rust:
csvlens
https://github.com/YS-L/csvlens
csview
https://github.com/wfxr/csview
Thank you for sharing.
#rustlang #dataScience #CLITool #CLI #csv #Rust
Did a very basic implementation of image segmentation, using K-means clustering: https://codeberg.org/matiaslavik/KMeansImageSegmentation
How it works:
1. Convert pixels to 5-dimensional points of colour (RGB) and position (XY)
2. Create N clusters at random positions in this 5D space
3. Find nearest cluster of each pixel
4. Recalculate cluster centre positions as average of points within each cluster
5. Repeat from step 3 many times (until it "converges").
Note: RGB-space might not be the best choice here.
#Graphics #DataScience
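The steps in the post above can be sketched in plain Python. This is a minimal illustration, not the code from the linked repository: the function names (`pixels_to_points`, `kmeans`), the choice to place initial centres uniformly inside the data's bounding box, the stopping rule (assignments no longer change), and the tiny sample image are all assumptions.

```python
import random


def pixels_to_points(image):
    """Step 1: flatten a grid of (r, g, b) pixels into (r, g, b, x, y) points."""
    return [
        (r, g, b, x, y)
        for y, row in enumerate(image)
        for x, (r, g, b) in enumerate(row)
    ]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to the given point."""
    return min(range(len(centres)), key=lambda i: squared_distance(point, centres[i]))


def kmeans(points, n_clusters, max_iterations=100, seed=0):
    rng = random.Random(seed)
    dims = len(points[0])
    # Step 2: random centres inside the bounding box of the data (an assumption).
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    centres = [
        tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        for _ in range(n_clusters)
    ]
    labels = None
    for _ in range(max_iterations):
        # Step 3: assign every point to its nearest centre.
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: treat as converged
        labels = new_labels
        # Step 4: move each centre to the mean of its members.
        for i in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centre
                centres[i] = tuple(
                    sum(p[d] for p in members) / len(members) for d in range(dims)
                )
    return labels, centres


# Hypothetical 3x2 image: reddish pixels and bluish pixels.
image = [
    [(255, 0, 0), (250, 5, 5), (0, 0, 255)],
    [(245, 10, 0), (5, 0, 250), (0, 10, 245)],
]
points = pixels_to_points(image)
labels, centres = kmeans(points, n_clusters=2)
print(labels)
```

Colour and position are used unscaled here, so whichever has the larger numeric range dominates the distance; rescaling the features is a design choice the sketch leaves open.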
RT @erinbugbee
Check out this new interactive article from Amazon's Machine Learning University: Logistic Regression! Written and developed by myself and @jdwlbr as part of my internship at @AmazonScience @awscloud.
#dataVisualization #dataScience #machineLearning
Paul Tol's Notes on #colourblind- and contrast-friendly colour schemes, palettes, gradients etc. https://personal.sron.nl/~pault/
#colors #webdev #visualization #datascience
Meet DAT Linux: Ubuntu LTS Spin for Data Science Projects
https://www.debugpoint.com/dat-linux-review/
#linux #ubuntu #opensource #datascience
RT @EduardoBonet
Available now on http://GitLab.com and soon in 15.1: images in @ProjectJupyter Notebooks are rendered on diffs! With this, we DSs can finally use code reviews to discuss the report in its entirety: both the code and the conclusions
RT @PeerJCompSci
#GrimoireLab: A toolset for software development analytics - published in @PeerJCompSci https://bit.ly/3xjq2D1
#DataScience #SoftwareEngineering