I de-vibed a vibe-coded NLP app (theasymptotic.substack.com)
1 point by tipoffdosage904 6 hours ago | 2 comments

tipoffdosage904 6 hours ago:
Earlier this year I built a local app called How I Prompt that analyzes your AI coding conversations and gives you a prompt/persona breakdown. The first version was very much vibe-coded. It looked good, but once I audited it properly I found several issues:

- the behavioral axes used to generate prompting personas were heavily correlated and mostly measuring prompt length
- one persona was mathematically unreachable
- the pipeline was counting logs, tool noise, and machine-generated text as human prompts (cleaning this up was the highest level of effort for me)
- the whole thing had a stronger appearance of rigor than actual rigor

I wrote up the rebuild here: https://theasymptotic.substack.com/p/how-i-de-vibed-a-vibe-c...

Repo: https://github.com/eeshansrivastava89/howiprompt

What I changed in v2:

- much more aggressive data cleaning
- simpler feature-based scoring using logistic regression instead of embeddings (something I understand better)
- external prompt datasets for broader validation
- a more transparent two-axis system that seems to behave much better than the original

It runs locally and doesn't upload your prompt data anywhere. Point your agent at the repo to validate it yourself.

I would especially love feedback from people who have worked on behavioral measurement, NLP evaluation, or human/AI interaction; I'm definitely not a domain expert. One of the main things I wanted to document here was the difference between "AI helped me ship a prototype fast" and "this is actually a sound measurement system."
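To make the "axes were mostly measuring prompt length" failure concrete, here is a minimal sketch of the kind of audit that exposes it: if each behavioral axis correlates strongly with prompt length, the axes are redundant and you effectively have one dimension. The axis names and simulated data below are hypothetical, not taken from the actual repo.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
lengths = [rng.randint(5, 500) for _ in range(200)]  # prompt lengths in tokens

# Simulated v1-style axes: two secretly derive from prompt length,
# one is genuinely independent. Names are made up for illustration.
axes = {
    "specificity": [l * 0.8 + rng.gauss(0, 20) for l in lengths],
    "context_given": [l * 1.1 + rng.gauss(0, 30) for l in lengths],
    "politeness": [rng.gauss(50, 10) for _ in lengths],
}

for name, values in axes.items():
    r = pearson(lengths, values)
    flag = "mostly measuring length" if abs(r) > 0.9 else "ok"
    print(f"{name}: r={r:+.2f} ({flag})")
```

In this toy run the two length-derived axes come back with |r| near 1 and get flagged, while the independent axis survives; the same check against real per-prompt scores is a cheap way to catch this class of bug before shipping.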
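In the spirit of the v2 change (hand-crafted features plus logistic regression rather than embeddings), here is a minimal sketch of feature-based prompt scoring. The features, weights, and bias are invented for illustration; in a real system the weights would be fit on labeled prompts rather than set by hand.

```python
import math

def features(prompt: str) -> dict:
    """Extract a few interpretable, hand-crafted features (hypothetical set)."""
    words = prompt.split()
    return {
        "log_len": math.log(len(words) + 1),          # dampened prompt length
        "question": 1.0 if "?" in prompt else 0.0,    # is the user asking?
        "has_code_fence": 1.0 if "```" in prompt else 0.0,
    }

# Illustrative weights; a fitted logistic regression would learn these.
WEIGHTS = {"log_len": 0.6, "question": 1.2, "has_code_fence": -0.4}
BIAS = -1.5

def score(prompt: str) -> float:
    """Logistic score in (0, 1): weighted feature sum squashed by a sigmoid."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features(prompt).items())
    return 1 / (1 + math.exp(-z))

print(score("Why does the test fail only on CI?"))  # fuller question scores higher
print(score("fix it"))                              # terse command scores lower
```

The appeal over embeddings is exactly what the post gets at: every score decomposes into named features with inspectable weights, so when the output looks wrong you can see which term pushed it there.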