I see the appeal of it. It is a good start, but I don't think it's quite useful yet. Still, this could prove to be a great distribution model for an MCP project.

FWIW, this project creates two tools for a GitHub repo on demand:

  fetch_cosmos_sdk_documentation
  search_cosmos_sdk_documentation

These tools would be available for the MCP client to call when it needs information. The search tool didn't quite work for me, but the fetch did. It pulled the readme and made it available to the MCP client. Like I said before, it's not so helpful at the moment. But I am interested in the possibilities.
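For anyone curious what "available for the MCP client to call" looks like on the wire, here is a rough sketch of the JSON-RPC 2.0 `tools/call` request an MCP client would send for each tool. The two tool names come from the list above; the `query` argument and the sample search text are assumptions on my part, since I did not inspect the schema the server actually advertises.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Fetch tool: assumed here to take no arguments.
fetch_request = build_tool_call(1, "fetch_cosmos_sdk_documentation", {})

# Search tool: the "query" argument name is hypothetical.
search_request = build_tool_call(
    2, "search_cosmos_sdk_documentation", {"query": "how do modules work"}
)

print(json.dumps(fetch_request, indent=2))
print(json.dumps(search_request, indent=2))
```

The real argument names are whatever the server reports when the client lists its tools, so treat the payloads above as the shape of the call only.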

▲ sdesol 3 months ago | parent [-]

Full Disclosure: I built an indexing engine for Git and GitHub that can process repos at scale, so my words should be taken with scepticism.

I think using MCP is an interesting idea, but the heavy lifting that can provide insights is not done by MCP. For fetch and search to work effectively, the MCP will need quality context to know what to consider. I'm biased, and I really looked into chunking documents, but given how the LLM landscape is evolving, I don't think chunking makes a lot of sense anymore (for code at least).

I've committed to generating short and long overviews for directories and files. Short overviews are two to three sentences, and long overviews are two to three paragraphs. Given how effectively newer LLMs can process 100,000 tokens or less, you can feed the model a short overview for all files/directories to determine which files to sub-query with. That is, which long overviews to load into context for the sub-query.
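To make the two-stage idea concrete, here is a minimal sketch and not my actual engine. Every name in it (`Overview`, `build_index_prompt`, `pick_paths`, `build_sub_query_context`) is hypothetical, the token count is a crude four-characters-per-token guess, and `pick_paths` uses plain word overlap as a stand-in for the LLM call that would really choose the files.

```python
from dataclasses import dataclass


@dataclass
class Overview:
    path: str
    short: str  # two to three sentences
    long: str   # two to three paragraphs


def estimate_tokens(text):
    # Crude heuristic (about 4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


def build_index_prompt(overviews, question, budget=100_000):
    """Stage 1 input: every short overview that fits in the token budget."""
    lines = [f"Question: {question}", "Files:"]
    used = sum(estimate_tokens(line) for line in lines)
    for ov in overviews:
        entry = f"- {ov.path}: {ov.short}"
        cost = estimate_tokens(entry)
        if used + cost > budget:
            break
        lines.append(entry)
        used += cost
    return "\n".join(lines)


def pick_paths(overviews, question, top_k=2):
    """Placeholder for the LLM: rank paths by word overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        overviews,
        key=lambda ov: len(words & set(ov.short.lower().split())),
        reverse=True,
    )
    return [ov.path for ov in ranked[:top_k]]


def build_sub_query_context(overviews, paths):
    """Stage 2 input: only the long overviews of the selected paths."""
    by_path = {ov.path: ov for ov in overviews}
    return "\n\n".join(f"## {p}\n{by_path[p].long}" for p in paths if p in by_path)


overviews = [
    Overview("auth/", "Handles account authentication and signature checks.", "Long overview of auth."),
    Overview("bank/", "Handles token transfers between accounts.", "Long overview of bank."),
    Overview("docs/", "Project documentation and guides.", "Long overview of docs."),
]
question = "how are token transfers handled"

index_prompt = build_index_prompt(overviews, question)
selected = pick_paths(overviews, question, top_k=1)
context = build_sub_query_context(overviews, selected)
print(selected)
print(context)
```

In practice the index prompt goes to the model, the model returns the paths, and only then do the long overviews get loaded, so the sub-query sees a small, relevant context instead of chunks.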

I also believe most projects in the future will start to produce READMEs for LLMs that are verbose and not easy for humans to grok, but are rich in detail for LLMs. You may not want the LLM to generate the code for you, but the LLM can certainly help us navigate complex/unfamiliar code in a semantic manner, which can be a game changer for onboarding.

	▲ liadyo 3 months ago | parent [-]
		That sounds really interesting! What got us into this project is the problem with giving the LLM a large llms-full.txt file as context, for example. We wanted to provide the agents an easy way to get the documentation for every repo (be it llms.txt, readme, etc.), but also to search chunks of it using semantic search. We'd be happy to chat more, if you like. It sounds like we can benefit from bouncing ideas and notes off each other.