The Stack: 3 TB of permissively licensed source code
Denis Kocetkov, Raymond Li, Loubna Ben Allal et al.
Action editor: Swarat Chaudhuri.
A new report from @Sourcegraph warns that the #BigCode problem will reach a crisis point if companies don't get a handle on how their #developers use #AI at work. https://venturebeat.com/ai/developers-embrace-ai-tools-but-face-big-code-challenges-survey-finds/ #press
#HuggingFace has just released the #SantaCoder models for the holiday season. Part of the #BigCode project, these 1.1B-parameter models are trained on #Python, #Java, and #JavaScript code selected with filtering criteria such as near-deduplication and comment-to-code ratio.
https://huggingface.co/bigcode/santacoder
#AI #DeepLearning 🤗
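The post names two data-filtering criteria without explaining them. The Python below is a rough sketch under stated assumptions, not the BigCode pipeline. It shows MinHash-based near-deduplication, where files whose estimated Jaccard similarity exceeds a threshold are treated as duplicates. It also shows a simplified comment-to-code ratio. The shingle size, the number of hash functions, the 0.85 threshold, and all function names are hypothetical choices made for illustration.

```python
import hashlib
import re

# Hypothetical parameters for illustration; not the BigCode settings.
NUM_PERM = 64      # number of seeded hash functions in each MinHash signature
SHINGLE_SIZE = 5   # consecutive tokens per shingle


def shingles(text: str, k: int = SHINGLE_SIZE) -> set[str]:
    """Split source code into overlapping k-token shingles."""
    tokens = re.findall(r"\w+", text)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(text: str, num_perm: int = NUM_PERM) -> list[int]:
    """For each seed, keep the minimum 64-bit hash over all shingles."""
    grams = shingles(text)
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(),
                "little",
            )
            for g in grams
        )
        for seed in range(num_perm)
    ]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def near_dedup(files: list[str], threshold: float = 0.85) -> list[str]:
    """Keep a file only if it is not a near-duplicate of a file already kept."""
    kept: list[str] = []
    kept_sigs: list[list[int]] = []
    for source in files:
        sig = minhash(source)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(source)
            kept_sigs.append(sig)
    return kept


def comment_to_code_ratio(source: str) -> float:
    """Share of non-blank characters on full-line '#' comments (simplified)."""
    comment_chars = total_chars = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        total_chars += len(stripped)
        if stripped.startswith("#"):
            comment_chars += len(stripped)
    return comment_chars / total_chars if total_chars else 0.0
```

The pairwise loop in `near_dedup` is quadratic in the number of files. At the scale of The Stack, a pipeline would instead bucket signatures with locality-sensitive hashing; the datasketch library offers `MinHash` and `MinHashLSH` for that purpose. The comment ratio here ignores inline comments and docstrings. A real filter would likely need a language-aware parser and would keep files whose ratio falls inside a chosen range.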
137 million #OpenSource repositories and 92 terabytes of source code from #GitHub: it is very impressive how much code is being processed for the @huggingface #BigCode project! https://huggingface.co/bigcode Presented by #LeandroVonWerra at #DINAcon22 #dinacon in Bern: https://dinacon.ch