Show HN: Llms.txt Generator – Turn websites into a text file to feed to any LLM (llmstxt.firecrawl.dev)
28 points by ericciarla 4 days ago | 6 comments
Hey HN! It’s Eric from Firecrawl (https://firecrawl.dev). I just launched llms.txt Generator, a tool that transforms any website into a clean, structured text file optimized for feeding to LLMs. You can learn more about the standard at https://llmstxt.org. Here’s how it works under the hood:
1. We use Firecrawl, our open-source scraper, to fetch the full site, handling JavaScript-heavy pages and complex structures.
2. The markdown content is parsed, and the title and description of each page are extracted using GPT-4o-mini.
3. Everything is then combined into a lightweight llms.txt file that you can paste into any LLM.
Let me know what you think!
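For readers curious what step 3 produces, here is a minimal sketch of assembling an llms.txt file from per-page metadata, following the layout described at llmstxt.org (an H1 title, a blockquote summary, then a linked list of pages). The `format_llms_txt` helper, the section name, and the sample pages are illustrative assumptions, not Firecrawl's actual implementation:

```python
def format_llms_txt(site_title, site_summary, pages):
    """Render extracted page metadata into the llms.txt layout:
    H1 title, blockquote summary, then a section of linked pages."""
    lines = [f"# {site_title}", "", f"> {site_summary}", "", "## Docs", ""]
    for page in pages:
        # One bullet per page: [Title](url): description
        lines.append(f"- [{page['title']}]({page['url']}): {page['description']}")
    return "\n".join(lines) + "\n"

# Hypothetical output of the crawl + extraction steps above
pages = [
    {"url": "https://example.com/quickstart",
     "title": "Quickstart",
     "description": "Get up and running in five minutes."},
    {"url": "https://example.com/api",
     "title": "API Reference",
     "description": "Endpoints, parameters, and response shapes."},
]

llms_txt = format_llms_txt("Example Docs", "Documentation for Example.", pages)
print(llms_txt)
```

The resulting file is plain markdown, which is why it pastes cleanly into any chat interface.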
throwaway314155 3 days ago
For a simple solution, you can just right click -> Save Page As... and upload the resulting `.html` file into Claude/ChatGPT as an attachment. They're both more than capable of parsing the article content from the HTML without needing any pre-processing.
IndieCoder 3 days ago
I like the idea, but Firecrawl and GPT-4o are quite heavy. I use https://github.com/unclecode/crawl4ai in some projects; it works very well without these dependencies and is modular, so you can use LLMs but don't have to :)
jondwillis 3 days ago
Plain HTTP and passing an API key as a URL query parameter? Yikes!
DrillShopper 3 days ago
Thanks for facilitating even more widespread and frictionless copyright violations.