| ▲ | solatic 14 hours ago | |
Shell is the best language for data science. Pick the best tool for each stage (getting, cleaning, transforming, and visualizing data), then stitch them together: text is the universal interoperable protocol, and files are the universal way of saving intermediate stages of data. Best part: write a --help for each tool, and you can load them into LLMs as tools so the LLMs can figure it out for you. Fight me. | ||
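A minimal sketch of the kind of pipeline described above, assuming plain POSIX `awk` for every stage. The file names, the sample data, and the here-document standing in for a real download are all hypothetical, and each stage could be swapped for a more specialized tool.

```shell
#!/bin/sh
# Hypothetical four-stage pipeline: every stage reads text and writes text,
# and every intermediate result is saved to a file so it can be inspected.
set -eu

workdir=$(mktemp -d)

# Stage 1: get. A here-document stands in for curl or a database dump.
cat > "$workdir/raw.csv" <<'EOF'
city,temp_c
Oslo,4
Cairo,31
,12
Lima,19
EOF

# Stage 2: clean. Keep the header and drop rows with an empty city field.
awk -F, 'NR == 1 || $1 != ""' "$workdir/raw.csv" > "$workdir/clean.csv"

# Stage 3: transform. Append a Fahrenheit column.
awk -F, -v OFS=, '
  NR == 1 { print $0, "temp_f"; next }
  { print $0, $2 * 9 / 5 + 32 }
' "$workdir/clean.csv" > "$workdir/transformed.csv"

# Stage 4: visualize. A crude text bar chart, one "#" per 5 degrees Celsius.
awk -F, 'NR > 1 {
  bar = ""
  for (i = 0; i < int($2 / 5); i++) bar = bar "#"
  printf "%-6s %s\n", $1, bar
}' "$workdir/clean.csv" > "$workdir/chart.txt"

cat "$workdir/chart.txt"
```

Because each stage leaves a file behind, any step can be rerun or replaced without touching the others.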
| ▲ | xn 11 hours ago | |
redo[1] with shell scripts has become my go-to method for dealing with multi-step data problems. It makes it easy to review each step of data retrieval, clean-up, transformation, etc. I use mlr, sqlite, rye, souffle, and goawk in the shell scripts, and visidata to interactively review the intermediate files. | ||
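A rough sketch of how such a redo project might be laid out, not the commenter's actual setup. It assumes an apenwarr-style redo, where a target's `.do` script declares dependencies with `redo-ifchange` and writes its output to the temporary path passed as `$3`. The file names and data are hypothetical, and plain `awk` stands in for tools like mlr or goawk.

```shell
#!/bin/sh
# Sketch: write a two-step redo project into a temporary directory.
# All file names and sample data here are hypothetical.
set -eu

proj=$(mktemp -d)
cd "$proj"

# Source data (in a real project this would come from a retrieval step).
printf 'name,score\nann,3\n,9\nbob,7\n' > raw.csv

# clean.csv.do: rebuild clean.csv whenever raw.csv changes.
# Rows with an empty name are dropped; the header is kept.
cat > clean.csv.do <<'EOF'
redo-ifchange raw.csv
awk -F, 'NR == 1 || $1 != ""' raw.csv > "$3"
EOF

# total.txt.do: depends on the cleaned file and sums the score column.
cat > total.txt.do <<'EOF'
redo-ifchange clean.csv
awk -F, 'NR > 1 { sum += $2 } END { print sum }' clean.csv > "$3"
EOF

# Build only if a redo implementation is available on this machine.
if command -v redo >/dev/null 2>&1; then
  redo total.txt
  cat total.txt
else
  echo "redo not installed; .do files written to $proj"
fi
```

Each target (`clean.csv`, `total.txt`) is an ordinary file, so it can be opened in a viewer such as visidata between steps.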