wenc 3 hours ago

MCPs (especially remote MCPs) are like a black box API -- you don't have to install anything, provision any resources, etc. You just call it and get an answer. There's a place for that, but an MCP is ultimately a blunt instrument.

CLI tools on the other hand are like precision instruments. Yes, you have to install them locally once, but after that, they have access to your local environment and can discover things on their own. Two CLIs are particularly powerful for working with large structured data: `jq` and the `duckdb` CLI. I tell the agent to never load large JSON, CSV or Parquet files into context -- instead, introspect them intelligently by sampling the data with said CLI tools. And Opus 4.6 is amazing at this! It figures out the shape of the data on its own within seconds by writing "probing" queries in DuckDB and jq. When it hits a bottleneck, Opus 4.6 figures out what's wrong and tries other query strategies. It's amazing to watch it go down rabbit holes and then recover automatically. This is especially useful for doing exploratory data analysis in ML work. The agent uses these tools to quickly check data edge cases, and does a way more thorough job than me.
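As a rough illustration of the "sample, don't load" idea (written in plain Python rather than `jq` or DuckDB, and assuming the data is newline-delimited JSON), a probe can parse only the first few lines and report which keys show up with which types. The name `sketch_shape` and the default sample size are made up for this sketch:

```python
import io
import json
from collections import defaultdict
from itertools import islice


def sketch_shape(lines, sample_size=100):
    """Map each top-level key to the Python type names seen in a small sample.

    `lines` is any iterable of JSON Lines strings (for example an open file),
    so at most `sample_size` lines are read, never the whole file.
    """
    shape = defaultdict(set)
    for line in islice(lines, sample_size):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            # Non-object records are tallied under a placeholder key.
            shape["<root>"].add(type(record).__name__)
            continue
        for key, value in record.items():
            shape[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in shape.items()}


# Tiny in-memory stand-in for a large file on disk.
demo = io.StringIO(
    '{"id": 1, "score": 0.5, "tags": ["a"]}\n'
    '{"id": 2, "score": null}\n'
)
print(sketch_shape(demo))
```

The point is only that the probe's cost depends on the sample size, not the file size; an agent with `jq` or DuckDB available gets the same effect with a one-line query.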

CLIs also feel "snappier" than MCPs. MCPs often have latency, whereas you can see CLIs do things in real time. There's a certain ergonomic niceness to this.

p.s. other CLIs I use often in conjunction with agents:

`showboat` (Simon Willison) to do linear walkthroughs of code.

`br` (Rust port of Beads) to create epics/stories/tasks to direct Opus in implementing a plan.

`psql` to probe Postgres databases.

`roborev` (Wes McKinney) to do automatic code reviews and fixes.

itintheory an hour ago

> you have to install them locally once

or install Docker and have the agent run CLI commands in docker containers that mount the local directory. That way you essentially never have to install anything. I imagine there's a "skill" that you could set up to describe how to use docker (or podman or whatever) for all CLI interactions, but I haven't tried yet.
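A minimal sketch of that idea, assuming Docker is installed and that some image containing the wanted CLI exists. The image name `example/jq-image` is a placeholder, and `docker_argv` is a hypothetical helper, not part of any existing skill:

```python
import shlex
from pathlib import Path


def docker_argv(image, command, workdir=None):
    """Build a `docker run` command line that runs `command` inside `image`
    with a local directory bind-mounted at /work."""
    host_dir = Path(workdir or ".").resolve()
    return [
        "docker", "run", "--rm",     # remove the container when it exits
        "-v", f"{host_dir}:/work",   # bind-mount the local directory
        "-w", "/work",               # start in the mounted directory
        image,
        *command,
    ]


# "example/jq-image" is a placeholder, not a real published image.
argv = docker_argv("example/jq-image", ["jq", ".", "data.json"])
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Building the argument list separately from running it keeps the command easy to inspect before anything executes.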

leohart 2 hours ago

I have found this as well. A CLI takes text in and puts text out interactively, which is exactly the mode most conducive to a text-based, text-trained LLM.

I do believe that as vision/multi-modal models get to a better state, we will see even crazier interaction surfaces.

RE: duckdb. I have a wonderful time with ChatGPT talking to duckdb but I have kept it to an in-memory db only. Do you set up some system prompt that tells it to keep a duckdb database locally on disk in the current folder?

wenc an hour ago

> RE: duckdb. I have a wonderful time with ChatGPT talking to duckdb but I have kept it to an in-memory db only. Do you set up some system prompt that tells it to keep a duckdb database locally on disk in the current folder?

No, I don't use DuckDB's database format at all. DuckDB for me is more like an engine to work with CSV/Parquet (similar to `jq` for JSON, and `grep` for strings).

Also I don't use web-based chat (you mentioned ChatGPT) -- all these interactions are through agents like Kiro or Claude Code.

I often have CSVs that are 100s of MBs and there's no way they fit in context, so I tell Opus to use DuckDB to sample data from the CSV. DuckDB works way better than any dedicated CSV tool because it packs a full database engine that can return aggregates, explore the limits of your data (max/min), figure out categorical data levels, etc.
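The kind of probing described above might look roughly like the following. This is a sketch, not the exact queries the agent writes: the file name `events.csv`, the column `country`, and the helper names are invented for illustration, while `read_csv_auto`, `DESCRIBE`, `SUMMARIZE`, `USING SAMPLE` and the CLI's `-c` flag are standard DuckDB features. The script only builds and prints the commands, so it runs without DuckDB installed:

```python
import shlex


def sql_string(value):
    """Quote a value as a SQL string literal (doubling single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def sql_ident(name):
    """Quote a column name as a SQL identifier (doubling double quotes)."""
    return '"' + name.replace('"', '""') + '"'


def csv_probe_queries(path, categorical=()):
    """Return DuckDB SQL that inspects a CSV without dumping it into context."""
    src = f"read_csv_auto({sql_string(path)})"
    queries = [
        f"DESCRIBE SELECT * FROM {src};",             # column names and types
        f"SELECT * FROM {src} USING SAMPLE 10 ROWS;",  # a handful of random rows
        f"SUMMARIZE SELECT * FROM {src};",            # per-column min/max, nulls, etc.
    ]
    for col in categorical:
        c = sql_ident(col)
        # Most frequent levels of a categorical column.
        queries.append(
            f"SELECT {c}, count(*) AS n FROM {src} "
            f"GROUP BY {c} ORDER BY n DESC LIMIT 20;"
        )
    return queries


# Hypothetical file and column, printed as shell commands for the duckdb CLI.
for query in csv_probe_queries("events.csv", categorical=["country"]):
    print(shlex.join(["duckdb", "-c", query]))
```

Each query returns a few dozen rows at most, which is what keeps a CSV of hundreds of MBs out of the context window. Swapping `read_csv_auto` for `read_parquet` should give the same pattern for Parquet files.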

For Parquet, I just point DuckDB to the 100s of GBs of Parquet files in S3 (our data lake), and it's blazing fast at introspecting that data. DuckDB is one of the best Parquet query engines on the planet (imo better than Apache Spark) despite being just a tiny little CLI tool.

One of the use cases is debugging results from an ML model artifact (which is more difficult than debugging code).

For instance, let's say a customer points out a weird result in a particular model prediction. I highlight that weird result, and tell Opus to work backwards to figure out how the ML model (I provide the training code and inference code) arrived at that number. Surprisingly, Opus 4.6 does a great job using DuckDB to figure out how the input data produced that one weird output. If necessary, Opus will even write temporary Python code to call the inference part of the ML model on a sample to verify assumptions. If the assumptions turn out to be wrong, Opus will change strategies. It's like watching a really smart junior work through the problem systematically. Even if Opus doesn't end up nailing the actual cause, it gets into the proximity of the real cause and I can figure out the rest (usually it's not the ML model itself, but some anomaly in the input). This has saved me so much time in deep-diving weird results. Not only that, I can have confidence in the deep-dive because I can just run the exact DuckDB SQL to convince myself (and others) of the source of the error, and that it's not something Opus hallucinated. CLI tools are deterministic and transparent that way (unlike MCPs, which are black boxes).