I am developing a research application that requires very fast analysis of very large tabular data from sequencing experiments. While I eventually settled on data.table in R, someone kindly suggested I check out what porn-site IT does.
The porn servers handle a massive amount of data in real time, executing complex queries in response to what users are watching. That is a problem at least 3 orders of magnitude larger than mine. Here is what the pros do:
news.ycombinator.com/item?id=3
davidwalsh.name/pornhub-interv

#rstats #datatable #porn

Last updated 1 year ago

Taras Novak 🇺🇦 · @dataSamurai
176 followers · 318 posts · Server vis.social

Going back to our new 📓 🛠️ development, and updating our generic 🈸 for Notebooks 📚 this week.

File your new feature requests and enhancements in our VS Code ⊞ repo for now:

🗃️ github.com/RandomFractals/vsco

#datatable #vscode #datatablerenderers #prodatatools #datanotebook

Last updated 1 year ago

Imagine you have a bunch of data points and you want to know how many belong to different categories. This is where grouped counting comes in. We've got three fantastic methods for you to explore, each with its own flair: **`aggregate()`**, **`dplyr`**, and **`data.table`**.
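
As a rough sketch of what grouped counting looks like with each of the three approaches (the toy data frame and its column names `category` and `value` are made up for illustration and are not taken from the linked post):

```r
# Toy data: the column names are hypothetical
df <- data.frame(
  category = c("a", "b", "a", "c", "b", "a"),
  value    = 1:6
)

# 1. Base R: aggregate() with length() as the summary function
base_counts <- aggregate(value ~ category, data = df, FUN = length)
names(base_counts)[2] <- "n"

# 2. dplyr: count() groups and tallies in one call (skipped if not installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_counts <- dplyr::count(df, category)
}

# 3. data.table: .N is the number of rows in each group (skipped if not installed)
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  dt_counts <- dt[, .N, by = category]
}
```

All three should report the same counts per category; they differ mainly in syntax and in the class of the object they return.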

Happy counting, fellow data explorer! 🎉🔍

Post: spsanderson.com/steveondata/po

#datatable #RStats #r #baser #aggregate #dplyr #exploredata #rprogramming #dataanalysis

Last updated 1 year ago

magljo · @magljo
15 followers · 18 posts · Server fosstodon.org

Spent 3 hours this evening trying to parse a deeply nested JSON file and convert it to an R data.table. Thought I'd encountered enough JSON data to be able to handle anything thrown at me but had to admit temporary defeat - I'll try again tomorrow. Maybe I'm just rusty with parsing JSON, or the two beers I had after work addled my brain. Anyone know of any good resources for handling deeply nested JSON?
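
One possible starting point, sketched under assumptions: the JSON below is invented (the poster's file is not shown), and the approach assumes each record can be flattened independently. It parses with jsonlite without simplification, flattens each record with `unlist()`, and stacks the results with `rbindlist()`.

```r
library(jsonlite)
library(data.table)

# Made-up example document; a real file will need its own handling
txt <- '[
  {"id": 1, "user": {"name": "ann", "address": {"city": "Oslo"}}},
  {"id": 2, "user": {"name": "bob"}}
]'

# Parse without simplification so every record stays a plain nested list
records <- fromJSON(txt, simplifyVector = FALSE)

# unlist() flattens each record into a named vector with dotted names
# (e.g. "user.address.city"); it coerces all values to character here
flat <- lapply(records, function(rec) as.list(unlist(rec)))

# fill = TRUE pads records that lack a given field with NA
dt <- rbindlist(flat, fill = TRUE)
```

Arrays of varying length need more care, because `unlist()` numbers repeated elements (`langs1`, `langs2`, ...) and the resulting column names will not line up across records.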

#rstats #json #datatable

Last updated 1 year ago

devSJR :python: :rstats: · @devSJR
160 followers · 302 posts · Server fosstodon.org

Occasionally, I think about how to work effectively with R. Currently, I am teaching my bioinformatics courses with RKWard again. I try to do most of it with packages from the base installation. data.table is an exception. But otherwise, I like to use within() (very fast) instead of mutate().
But there are more approaches, which are often simpler, faster, or more stable:

- github.com/matloff/TidyverseSk
- davidhughjones.medium.com/dont
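
As a small illustration of the base-installation route, here is `within()` adding a derived column, a step that is often written with dplyr's `mutate()`. The data frame and its column names are hypothetical.

```r
# Made-up measurements; column names are hypothetical
samples <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  reads     = c(1200, 800, 1500),
  mapped    = c(1100, 600, 1450)
)

# within() evaluates the block with the columns in scope and
# returns a modified copy of the data frame
samples <- within(samples, {
  mapped_frac <- mapped / reads
})
```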

#rstats #bioinformatics #rkward #datatable #within #mutate

Last updated 2 years ago

Taras Novak 🇺🇦 · @dataSamurai
155 followers · 201 posts · Server vis.social

Running queries in our new 📓 @code extension, rendering results with simple , 🈷️ & 中 from our 🈸 + CSV from VSCode Notebook cell output all in one go. We doubt is as flexible. 😎 🔬 ...

#datatools #MalloyData #dataexport #datatablerenderers #flatdatagrid #datasummary #datatable #datanotebook #sql

Last updated 2 years ago

I recently posted a benchmark of reading in a .csv file, and over the weekend I received an email pointing out that I had omitted compressed files such as .csv.gz.

Functions tested in the benchmark:

✅ read.table
✅ read.csv
✅ fread
✅ vroom with altrep = FALSE
✅ vroom with altrep = TRUE
✅ read_csv
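
A minimal base-R sketch of the idea, not the benchmark from the linked post: it writes the same made-up table as plain and gzip-compressed CSV and times `read.csv()` on each. The table size, the repetition count, and the helper `time_read()` are assumptions for illustration; the other readers listed above could be timed the same way.

```r
# Write the same made-up table as plain and gzip-compressed CSV
set.seed(1)
df <- data.frame(id = 1:1e4, x = rnorm(1e4))
csv_path <- tempfile(fileext = ".csv")
gz_path  <- tempfile(fileext = ".csv.gz")
write.csv(df, csv_path, row.names = FALSE)
gz_con <- gzfile(gz_path, "w")
write.csv(df, gz_con, row.names = FALSE)
close(gz_con)

# Hypothetical helper: total elapsed seconds over a few repeated reads.
# read.csv() detects gzip compression and decompresses transparently.
time_read <- function(path, reps = 5) {
  system.time(for (i in seq_len(reps)) read.csv(path))[["elapsed"]]
}

timings <- c(csv = time_read(csv_path), csv_gz = time_read(gz_path))
```

Timings from a run this small are noisy; a dedicated benchmarking package and larger files give more trustworthy comparisons.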

Post: spsanderson.com/steveondata/po

#benchmarking #Software #Technology #innovation #OpenSource #baser #tidyverse #readr #datatable #vroom #RStats #r #gz #compression #softwaredevelopment #Help #Data

Last updated 2 years ago

Today I wanted to share some out-of-the-box benchmarking for reading in a square matrix in R. The idea behind this was to see how fast the default settings were for reading in these various files.

Post: spsanderson.com/steveondata/po

#innovation #Technology #softwareengineering #Software #opensourcesoftware #OpenSource #datatable #arrow #fst #vroom #RStats #r #Matrix

Last updated 2 years ago

The first data.table function I wrote was slower than the original tidy_bernoulli() solution, but users from Reddit, LinkedIn, and Mastodon helped me make a few great improvements.

🙌 Reddit Help from: reddit.com/user/NewHere_Hi_eve

🙌 LinkedIn Help from: Chris Kypridemos

🙌 Mastodon Help from: @datamaps

Post: spsanderson.com/steveondata/po

#RStats #benchmarking #datatable #Technology #Software #opensourcesoftware #OpenSource #innovation

Last updated 2 years ago

devSJR :python: :rstats: · @devSJR
154 followers · 280 posts · Server fosstodon.org

Everybody knows (hopefully) that data.table is great. Today, I noticed that it comes with its own update mechanism.

data.table::update_dev_pkg()

That is really useful if you want a feature that exists only in the development version but otherwise prefer a tested package.

#rstats #datatable

Last updated 2 years ago

Joris Meys · @JorisMeys
449 followers · 840 posts · Server mstdn.social

Well, that settles it then.

(Joking aside, it's spooky how well it responds to all kinds of questions students would throw at it.)

#chatgpt #datatable #dplyr #RStats

Last updated 2 years ago

Taras Novak 🇺🇦 · @dataSamurai
122 followers · 114 posts · Server vis.social

Our 🈸 for Notebooks 📚 has over 30,000 installs. It's one of the most widely used 📓 extensions in VS marketplace. Extension includes scrollable , & output renderers. Try it!
📥 marketplace.visualstudio.com/i

🛠️ 💎💎💎...

#datatools #datasummary #flatdatagrid #datatable #datanotebook #vscode #datatablerenderers

Last updated 2 years ago

Surya Teja K · @shanmukhateja
15 followers · 58 posts · Server social.linux.pizza

As part of improving documentation and encouraging best practices, I will be updating angular-datatables in the near future. There won't be major changes to the library source code.

Our goal is to shuffle the menu items and update GitHub Support templates to reflect these changes.

I'll share more details on my blog in a few weeks once I've made some progress. (hopefully!!!)

Happy new year (in advance) folks! 🎉

#newplans #opensource #datatable #angular

Last updated 2 years ago

Ewan Donnachie · @ERDonnachie
508 followers · 704 posts · Server mstdn.social

Unpopular opinion: {data.table} syntax is:

1. Confusing in comparison with the indexing notation for base R data.frames

2. Cryptic with all those [, x := fn(a), by=var]

3. Difficult for the uninitiated to understand (unlike SQL and dplyr)

I know {data.table} is a good package with a lot of very happy users, but for some reason these disadvantages are rarely mentioned. Alongside the lack of a database backend, they're the main reason I don't use the package much.
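
For readers who have not met the notation in point 2, a small side-by-side sketch may help. The data are made up, `mean()` stands in for the placeholder `fn`, and the base-R line uses `ave()` as one of several possible equivalents.

```r
library(data.table)

# Made-up data; `a`, `var`, and `x` echo the placeholder names in the post
df <- data.frame(var = c("g1", "g1", "g2"), a = c(1, 2, 10))

# Base R: group means written back as a new column
df$x <- ave(df$a, df$var, FUN = mean)

# data.table: `:=` adds column x by reference, `by` sets the grouping
dt <- data.table(var = c("g1", "g1", "g2"), a = c(1, 2, 10))
dt[, x := mean(a), by = var]
```

Both end up with the same `x` column; the data.table form modifies `dt` in place rather than returning a copy.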

@rstats

#datatable #RStats

Last updated 2 years ago

devSJR :python: :rstats: · @devSJR
136 followers · 162 posts · Server fosstodon.org

Every time I work with data.table I think what a great package. The speed alone is great. What I also like is the tibble-like behavior when displaying data.

#tibble #datatable #rstats

Last updated 2 years ago

There might be times when you want to get some sort of summary statistic, like a quantile or IQR, on your data.

With my {TidyDensity} this is possible given the data comes from a tidy_ distribution function. If you have a vector of data you can use tidy_empirical() as a cheat.

With this function you can get output as or a where is doing the work.

Post: spsanderson.com/steveondata/po

See attached!

#datatable #tibble #lapply #sapply #package #r #distribution #irq #quantile #statistic #summary

Last updated 2 years ago

docfleetwood · @docfleetwood
39 followers · 23 posts · Server fosstodon.org

@chrisadamsecon Definitely faster than dplyr on my laptop. More vectorized functions. Tidyverse-like syntax, and you can mostly mix and match with dplyr as you require. Another package, tidytable, is quite interesting also. You can just type regular dplyr and it will convert to data.table without worrying about any extra steps like dtplyr.

#dplyr #tidyverse #tidytable #datatable #rstats

Last updated 2 years ago