Pierre Monnin · @piermonn
29 followers · 23 posts · Server sigmoid.social

PyGraft will help you generate new and tailored benchmark KGs useful in various fields including but not limited to neuro-symbolic AI, link prediction, node classification, node clustering, ontology repairing, pattern mining, reasoning, scalability studies, etc.

Feel free to download, star, fork, share and tell us about any usage you foresee! We welcome all contributions or ideas to improve PyGraft! Looking forward to feedback from and other communities!

#datasets #NeuroSymbolicAI #linkprediction #nodeclassification #nodeclustering #ontology #semanticweb #machinelearning

Last updated 1 year ago

Jo Tiffe · @Jo_designart
7 followers · 80 posts · Server digitalcourage.social

The finissage of techturk.form-f.art is today at 17:30. Come by: Lausitzer Str. 10, Sonnenhof.
There are new things to see and hear.

#datamining #datasets #algorithms #machinelearing #ml #foto #techturk

New Submissions to TMLR · @tmlrsub
208 followers · 773 posts · Server sigmoid.social

Unlocking Unlabeled Data: Ensemble Learning with the Hui-Walter Paradigm for Performance Estimation in Online and Static Settings

openreview.net/forum?id=s8TsIP

#ensemble #datasets #dataset

Jo Tiffe · @Jo_designart
7 followers · 80 posts · Server digitalcourage.social

It was a really lovely vernissage. If anyone else from Berlin feels like coming by: the finissage is on 7 Sept. at 17:30, with even more works.

Photo: © Volker Hoffmann

#stablediffusion #berlin #dataminig #datasets #algorithms #maerchenvon5eeigeln #fairytaleof5seaurchins #techturk

Miguel Afonso Caetano · @remixtures
727 followers · 2825 posts · Server tldr.nettime.org

"Ultimately, when we think about what a more ethical development of this technology could look like, I think it looks a lot like machine learning before 2015, or before the last 10 years. It looks like building systems for specific purposes with specific scopes and with specific goals. Then you can start to ask questions like “What values do we want the system to reinforce?” “What data makes sense to use here?” It’s the opposite of the “move fast and break things” philosophy. It’s really hard to advocate for that kind of work because it’s not flashy. It’s frustrating because I think none of this was inevitable. We’ve been talking about it in the field for ages, these issues with general or universal AI systems. I think one of the biggest arguments that we make at DAIR is that we don’t need to be building systems this way, we do not need to be making general purpose AI, we don’t need to be making these kinds of generative AI systems. It’s really hard to go down this line of work without independent funding, because that’s not where the money is right now."

cchange.xyz/dylan-baker/

#ai #generativeAI #ml #algorithms #datasets

Published papers at TMLR · @tmlrpub
564 followers · 596 posts · Server sigmoid.social

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series

Jean-Christophe Gagnon-Audet, Kartik Ahuja, Mohammad Javad Darvishi Bayazi et al.

Action editor: Antoni Chan.

openreview.net/forum?id=mvftzo

#generalization #generalize #datasets

IOB Friday fresh ink
GoodFibes: An R Package for The Detection of Muscle Fibers from diceCT Scans
J H Arbour

doi.org/10.1093/iob/obad030

"We hope that this will increase the number of comparative and evolutionary studies incorporating these rich and functionally important datasets."

#muscle #dicect #evolutionary #datasets

Tarun Gupta · @tarungupta
5 followers · 125 posts · Server me.dm

🧠💡 Level up your data skills! Dive into the Curse of Dimensionality and learn how high dimensionality affects data analysis. 📈📊 Don't let the complexity stop you from becoming a great Data Scientist or ML Engineer! 🚀💪

youtu.be/Br-RfOdopR8

#datascientist #ml #datasets #curseofdimensionality

NFDI4Microbiota · @NFDI4Microbiota
42 followers · 19 posts · Server nfdi.social

❓Confused about FAIR, ontologies, and metadata and how they can help you and your datasets?
❗️Begin with our hands-on workshop about the description of biological data.
🗓️16-17 October in Aachen. Register by 20 September: tinyurl.com/biome01

#fair #ontologies #metadata #datasets #microbiology #nfdirocks

New Submissions to TMLR · @tmlrsub
203 followers · 731 posts · Server sigmoid.social

Learning domain-specific causal discovery from time series

openreview.net/forum?id=JFaZ94

#causal #causality #datasets

SPAAM-community · @spaam_community
16 followers · 11 posts · Server genomic.social

💻 AMDirT facilitates automated metadata curation and data validation, as well as rapid data filtering and downloading. Together, both standardised metadata and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses. 📊

#ancient #metagenomic #datasets

Miguel Afonso Caetano · @remixtures
670 followers · 2632 posts · Server tldr.nettime.org

"Flickr Faces High-Quality (FFHQ) is a dataset of Flickr face photos originally created for face generation research by NVIDIA in 2019. It includes 70,000 total face images from 67,646 unique Flickr photos. Since its release the dataset has become one of the most widely used face datasets for a wide variety of research and commercial applications ranging from face recognition to oral region gender recognition. The images in FFHQ were taken from Flickr users without explicit consent and were selected because they contained high quality face images with a permissive Creative Commons license. Many of the images contain infants and children and over 10% of the dataset no longer exists on the original source yet NVIDIA, a $1T company, continues to use and benefit from the 70,000 face images taken on Flickr.com to develop commercial AI technologies.
(...)
Even though the main dataset and its derivatives mention the Creative Commons licenses associated with the media, of which many require attribution, no human readable attribution was provided for any photo in any dataset. Attribution is only provided in a 256MB JSON file that could not be opened on a standard laptop computer using Sublime text editor, let alone parsed to understand author attribution. This may amount to a large-scale breach of the Creative Commons attribution requirement. For further reading on the exploitation of Creative Commons licensing scheme, read "Creative Commons and The Face Recognition Problem". To further complicate the issue, it may not be possible at all to use non-consensual face images for AI/ML when attribution is required because including the subject or author name can force the face photo to become PII (personally identifiable information), a protected class of data."

exposing.ai/ffhq/

#ai #datasets #facialrecognition #ml #cc #creativecommons #flickr #nvidi

Paul R. Pival (he/him) · @ppival
145 followers · 290 posts · Server glammr.us

Data from U.S. 2020 Presidential Election Facebook and Instagram Study Now Available at ICPSR icpsr.umich.edu/web/about/cms/

#socialmedia #meta #facebook #instagram #datasets #election2020

Picanúmeros · @Picanumeros
1772 followers · 776 posts · Server mathstodon.xyz

I had a very long day last night, sorry.

#datasets #estadistica

Miguel Afonso Caetano · @remixtures
671 followers · 2621 posts · Server tldr.nettime.org

"The Google memo points to the dawning realization that improvements in AI will require putting a lot more care and thought into how data is collected and curated. Even OpenAI, which relies on gargantuan datasets to make its products, is now pointing to this issue. A close engagement with datasets has been deeply undervalued in the AI field, and this neglect has had serious consequences downstream, from technical failures to human rights violations.

This is why investigating datasets is so important. Not because companies want an edge in the current AI wars, but to understand the ideologies, viewpoints, and harms that are being ingested, concentrated, and reproduced by AI systems. The new internet-scale datasets require new investigative methods, new research questions. What political and cultural inflections are baked into training sets? Who and what is represented? What is rendered invisible and unintelligible? Who profits from all this data, and at whose expense? What legal issues does the mass extraction of data raise for copyright, privacy, moral rights, and the right to publicity? What about the people whose creative work and livelihoods are impacted? How could these practices change? And as the accelerating machines of scrape-generate-publish-repeat begin to ingest their own material, what logics, perspectives, and aesthetics will be reinforced in this recursive loop?"

knowingmachines.org/9-ways-to-

#ai #datasets #aiethics

George Macgregor · @g3om4c
429 followers · 356 posts · Server code4lib.social

A useful study as it provides direction for corrective action; but in the meantime its findings reinforce a reality (rather than perception) that responsible open research can have few academic incentives.

"the vast majority of datasets have no recorded citations, & most cited datasets only have a single citation... data sharing will rarely lead to replication or new knowledge production that can be identified through a formal citation"

arxiv.org/abs/2308.04379

#openresearch #datasets #data #citation #fairdata #opendata

New Submissions to TMLR · @tmlrsub
200 followers · 717 posts · Server sigmoid.social

On the Efficacy of Differentially Private Few-shot Image Classification

openreview.net/forum?id=hFsr59

#privacy #private #datasets

LaPingvino 🟙 :ir: · @lapingvino
302 followers · 2085 posts · Server neurodifferent.me

Through a look into the things people were saying about the upcoming Go 1.21 release I discovered DoltDB and DoltHub today. I have been looking for something like that for ages:

A MySQL-compatible database system that works like Git for data, and a GitHub-like environment for datasets, free for public datasets.

For people here struggling with healthcare in the USA: one of the projects they are working on there is a dataset of pricing per hospital, along with data-based work to push for more and better policy reform around this. I think we can do MUCH better if we have good data, and now we have a platform to share and work together on data, so let's use it!

#doltdb #dolthub #mysql #git #GitHub #datasets #healthcare #USA #data

Leonardo Grando · @lgrando123
119 followers · 437 posts · Server sciencemastodon.com

List of datasets for data science:

kdnuggets.com/datasets/index.h

#datasets #data #science

Published papers at TMLR · @tmlrpub
551 followers · 541 posts · Server sigmoid.social

Mitigating Real-World Distribution Shifts in the Fourier Domain

Kiran Krishnamachari, See-Kiong Ng, Chuan-Sheng Foo

Action editor: Hanie Sedghi.

openreview.net/forum?id=lu4oAq

#fourier #adaptation #datasets