justanotheratom 4 days ago

This is an awesome direction. A few thoughts:

It would be awesome if the custom rules were generalized on the fly from ongoing reviewer conversations. Imagine two devs quibbling about line length in a PR; in a future PR, the AI reminds them of that convention.

Would this work seamlessly with AI Engineers like Devin? I imagine so.

This will be very handy for solo devs as well. Even those who don't use coding copilots could benefit from an AI reviewer, as long as it doesn't waste their time.

Maybe multiple AI models could review the PR at the same time, and over time we promote the ones whose feedback is accepted most often.

allisonee 4 days ago | parent | next

Appreciate the feedback! We currently auto-suggest custom rules based on your comment history (and .cursorrules); continuing to suggest rules from ongoing review history is now on the roadmap thanks to your suggestion!

On working with Devin: yes. Right now we're focused on code review, so it works with whatever AI IDE you use. In fact, it might even be better with autonomous tools like Devin, since we focus on helping you (as a human) understand the code they've written faster.

Interesting idea on multiple AI models. We were also separately toying with the idea of having different personas (security, code architecture); we'll keep this one in mind!

justanotheratom 4 days ago | parent

Personas sound great!

8organicbits 4 days ago | parent | prev | next

Line length isn't something I'd want reviewed in a PR. Typically I'd set up a linter with the relevant limits and defer to that, ideally enforced as a pre-commit hook or directly in my IDE. Line length isn't an AI feature; it's largely a solved problem.
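For example, a minimal pre-commit sketch, assuming a Python project linted with flake8 (the hook id and --max-line-length flag are standard flake8; the version pin and the 100-character limit are just placeholders):

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/PyCQA/flake8
        rev: 7.1.1                           # pin to whatever release you use
        hooks:
          - id: flake8
            args: ["--max-line-length=100"]  # enforce the limit before code reaches review

With something like that wired up, overlong lines fail the commit locally, so neither a human nor an AI reviewer ever needs to comment on them.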

justanotheratom 4 days ago | parent

bad example, sorry.

pomarie 4 days ago | parent | prev

These are all amazing ideas. We actually already see a lot of solo devs using mrge precisely because they want something to catch bugs before code goes live—they simply don't have another pair of eyes.

And I absolutely love your idea of having multiple AI models review PRs simultaneously. Benchmarking LLMs can be notoriously tricky, so a "wisdom of the crowds" approach across a large user base could genuinely help identify which models perform best for specific codebases or even languages. We could even imagine certain models emerging as specialists for particular types of issues.

Really appreciate these suggestions!