| ▲ | RobinL 15 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I think a lot of this comes down to the question: Why aren't tables first class citizens in programming languages? If you step back, it's kind of weird that there's no mainstream programming language that has tables as first class citizens. Instead, we're stuck learning multiple APIs (polars, pandas) which are effectively programming languages for tables. R is perhaps the closest, because it has data.frame as a 'first class citizen', but most people don't seem to use it, and use e.g. tibbles from dplyr instead. The root cause seems to be that we still haven't figured out the best language to use to manipulate tabular data yet (i.e. the way of expressing this). It feels like there's been some convergence on some common ideas. Polars is kindof similar to dplyr. But no standard, except perhaps SQL. FWIW, I agree that Python is not great, but I think it's also true R is not great. I don't agree with the specific comparisons in the piece. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | RodgerTheGreat 14 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
There are a number of dynamic languages to choose from where tables/dataframes are truly first-class datatypes: perhaps most notably Q[0]. There are also emerging languages like Rye[1] or my own Lil[2]. I suspect that in the fullness of time, mainstream languages will eventually fully incorporate tabular programming in much the same way they have slowly absorbed a variety of idioms traditionally seen as part of functional programming, like map/filter/reduce on collections. [0] https://en.wikipedia.org/wiki/Q_(programming_language_from_K... [1] https://ryelang.org/blog/posts/comparing_tables_to_python/ | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | OkayPhysicist 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
There's a number of structures that I think are missing in our major programming languages. Tables are one. Matrices are another. Graphs, and relatedly, state machines are tools that are grossly underused because of bad language-level support. Finally, not a structure per se, but I think most languages that are batteries-included enough to included a regex engine should have a a full-fledged PEG parsing engines. Most, if not all, Regex horror stories derive from a simple "Regex is built in". What tools are easily available in a language, by default, shape the pretty path, and by extension, the entire feel of the language. An example that we've largely come around on is key-value stores. Today, they're table stakes for a standard library. Go back to 90's, and the most popular languages at best treated them as second-class citizens, more like imported objects than something fundamental like arrays. Sure, you can implement a hash map in any language, or import some else's implementation, but oftentimes you'll instead end up with nightmarish, hopefully-synchronized arrays, because those are built-in, and ready at hand. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | genidoi 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
This is an interesting observation. One possible explanation for a lack of robust first class table manipulation support in mainstream languages could be due to the large variance in real-world table sizes and the mutually exclusive subproblems that come with each respective jump in order-of-magnitude row size. The problems that one might encounter in dealing with a 1m row table are quite different to a 1b row table, and a 1b row table is a rounding error compared to the problems that a 1t row table presents. A standard library needs to support these massive variations at least somewhat gracefully and that's not a trivial API surface to design. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | don-bright 3 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Every copy of Microsoft Excel includes Power Query which is in the M language and has tables as a type. Programs are essentially transformations of table columns and rows. Not sure if its mainstream but is widely available. M language is also included in other tools like PowerBI and Power Automate. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | riskassessment 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
> R is perhaps the closest, because it has data.frame as a 'first class citizen', but most people don't seem to use it, and use e.g. tibbles from dplyr instead. Everyone in R uses data.frame because tibble (and data.table) inherits from data.frame. This means that "first class" (base R) functions work directly on tibble/data.table. It also makes it trivial to convert between tibble, data.table, and data.frames. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | maest 7 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
> Why aren't tables first class citizens in programming languages? They are in q/kdb and it's glorious. Sql expressions are also first class citizens and it makes it very pleasant to write code | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | paddleon 14 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
> R is perhaps the closest, because it has data.frame as a 'first class citizen', but most people don't seem to use it, and use e.g. tibbles from dplyr instead. You're forgetting R's data.table, https://cran.r-project.org/web/packages/data.table/vignettes..., which is amazing. Tibbles only wins because they fought the docs/onboarding battle better, and dplyr ended up getting industry buy-in. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | riidom 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
PyTorch was first only Torch, and in Lua. I didn't follow it too close at the time, but apparently due to popular demand it got redone in Python and voila PyTorch. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | brikym an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
This. I really really want some kind of data frame which has actual compile time typing my LSP/IDE can understand. Kusto query language (Azure Data Explorer) has it and the auto completion and error checking is extremely useful. But kusto query language is really just limited to one cloud product. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | nextos 14 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I don't think this is the real problem. In R and Julia tables are great, and they are libraries. The key is that these languages are very expressive and malleable. Simplifying a lot, R is heavily inspired by Scheme, with some lazy evaluation added on top. Julia is another take at the design space first explored by Dylan. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kelipso 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
People use data.table in R too (my favorite among those but it’s been a few years). data.table compared to dplyr is quite a contrast in terms of language to manipulate tabular data. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jna_sh 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I know the primary data structure in Lua is called a table, but I’m not very familiar with them and if they map to what’s expected from tables in data science. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | IgorPartola 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
SQL is not just about a table but multiple tables and their relationships. If it was just about running queries against a single table then basic ordering, filtering, aggregation, and annotation would be easy to achieve in almost any language. Soon as you start doing things like joins, it gets complicated but in theory you could do something like an API of an ORM to do most things. With using just operators you quickly run into the fact that you have to overload (abuse) operators or write a new language with different operator semantics:
Where * means cross product/outer join and | means filter. Once you add an ordering operator, a group by, etc. you basically get SQL with extra steps.But it would be nice to have it built in so talking to a database would be a bit more native. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | RA_Fisher 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
R’s the best, bc it’s been a statistical analysis language from the beginning in 1974 (and was built and developed for the purpose of analysis / modeling). Also, the tidyverse is marvelous. It provides major productivity in organizing and augmenting the data. Then there’s ggplot, the undisputed best graphical visualization system + built-ins like barplot(), or plot(). But ultimately data analysis is going beyond Python and R into the realm of Stan and PyMC3, probabilistic programming languages. It’s because we want to do nested integrals and those software ecosystems provide the best way to do it (among other probabilistic programming languages). They allow us to understand complex situations and make good / valuable decisions. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kevinhanson 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
this is my biggest complaint about SAS--everything is either a table or text. most procs use tables as both input and output, and you better hope the tables have the correct columns. you want a loop? you either get an implicit loop over rows in a table, write something using syscalls on each row in a table, or you're writing macros (all text). | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | m_mueller 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Fortran gives you that and more, it has first class multidimensional arrays, including matrix operations. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 127 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Because there's no obvious universal optimal data structure for heterogeneous N-dimensional data with varying distributions? You can definitely do that, but it requires an order of magnitude more resource use as baseline. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 6 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| [deleted] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | alexnewman 11 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
APL Is great | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | CivBase 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
What is a table other than an array of structs? | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | ModernMech 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
It makes sense from a historical perspective. Tables are a thing in many languages, just not the ones that mainstream devs use. In fact, if you rank programming languages by usage outside of devs, the top languages all have a table-ish metaphor (SQL, Excel, R, Matlab). The languages devs use are largely Algol derived. Algol is a language that was used to express algorithms, which were largely abstractions over Turing machines, which are based around an infinite 1D tape of memory. This model of 1D memory was built into early computers, and early operating systems and early languages. We call it "mechanical sympathy". Meanwhile, other languages at the same time were invented that weren't tied so closely to the machine, but were more for the purpose of doing science and math. They didn't care as much about this 1D view of the world. Early languages like Fortran and Matlab had notions of 2D data matrices because math and science had notions of 2D data matrices. Languages like C were happy to support these things by using an array of pointers because that mapped nicely to their data model. The same thing can be said for 1-based and 0-based indexing -- languages like Matlab, R, and Excel are 1-based because that's how people index tables; whereas languages like C and Java are 0-based because that's how people index memory. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | constantcrying 12 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
>Why aren't tables first class citizens in programming languages? Matlab has them, in fact it has multiple competing concepts of it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 11 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| [deleted] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dm319 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Dplyr is quite happy with data.frame. R is built around tabular data. Other statistical languages are too, such as Stata. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | getnormality 7 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Saying that SQL is the standard for manipulating tabular data is like saying that COBOL is the standard for financial transactions. It may be true based on current usage, but nobody thinks it's a good idea long term. They're both based on the outdated idea that a programming language should look like pidgin English rather than math. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||