faangguyindia 3 days ago

I'm curious whether any good existing solutions exist for this tool:

```
Tool name: WebFetch
Tool description:
- Fetches content from a specified URL and processes it using an AI model
- Takes a URL and a prompt as input
- Fetches the URL content, converts HTML to markdown
- Processes the content with the prompt using a small, fast model
- Returns the model's response about the content
- Use this tool when you need to retrieve and analyze web content
```
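Stripped down, the description is a three-step pipeline: fetch, convert HTML to Markdown, ask a small model. A minimal sketch, assuming each step is injected as a plain callable so the pieces can be swapped or tested without network access. `WebFetchTool` and its field names are hypothetical, not from any library:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases for the pluggable steps named in the tool description
Fetcher = Callable[[str], str]       # url -> raw HTML
Converter = Callable[[str], str]     # HTML -> Markdown
Model = Callable[[str, str], str]    # (prompt, markdown) -> model response


@dataclass
class WebFetchTool:
    """Sketch of the WebFetch pipeline: fetch, convert to Markdown, ask a small model."""

    fetch: Fetcher
    to_markdown: Converter
    model: Model
    name: str = "WebFetch"

    def __call__(self, url: str, prompt: str) -> str:
        # Assumption: only http(s) URLs are accepted
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL: {url!r}")
        html = self.fetch(url)               # fetch the URL content
        markdown = self.to_markdown(html)    # convert HTML to Markdown
        return self.model(prompt, markdown)  # process with the prompt, return the response


# Usage with stand-in callables (no network or model needed)
demo = WebFetchTool(
    fetch=lambda url: "<h1>Hello</h1>",
    to_markdown=lambda html: "# Hello",
    model=lambda prompt, markdown: f"{prompt} -> {markdown}",
)
print(demo("https://example.com", "Title?"))  # Title? -> # Hello
```

In a real setup, `fetch` could be a headless-browser step, `to_markdown` a readability + markdownify step, and `model` whatever small-model client you use.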

I came up with this one:

```python
import asyncio

from playwright.async_api import async_playwright
from readability import Document  # pip install readability-lxml
from markdownify import markdownify as md


async def web_fetch_robust(url: str, prompt: str) -> str:
    """
    Fetches content from a URL using a headless browser to handle JS-heavy
    sites, processes it, and returns a summary.
    """
    try:
        async with async_playwright() as p:
            # Launch a headless browser (Chromium is a good default)
            browser = await p.chromium.launch()
            try:
                # --- Avoiding Blocks ---
                # Set a realistic User-Agent on the context so it applies to
                # request headers and navigator.userAgent alike
                context = await browser.new_context(
                    user_agent=(
                        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                        '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
                    )
                )
                page = await context.new_page()

                # Navigate to the URL; wait_until='networkidle' lets JS-driven requests
                # settle (it can time out on pages that poll continuously)
                await page.goto(url, wait_until='networkidle', timeout=15000)

                # --- Extracting Content ---
                # Get the fully rendered HTML content
                html_content = await page.content()
            finally:
                # Always release the browser, even if navigation fails
                await browser.close()

        # --- Processing for Token Minimization ---
        # 1. Extract main content with readability-lxml (a Python port of Arc90's Readability)
        doc = Document(html_content)
        main_content_html = doc.summary()

        # 2. Convert to clean Markdown
        markdown_content = md(main_content_html, strip=['a', 'img'])  # Strip links/images to save tokens

        # 3. Use the small, fast model to process the clean content
        # summary = small_model.process(prompt, markdown_content)  # Placeholder for your model call

        # For demonstration, we'll just return a message
        summary = f"A summary of the JS-rendered content from {url} would be generated here."

        return summary

    except Exception as e:
        return f"Error fetching or processing URL with headless browser: {e}"


# To run this async function:
# result = asyncio.run(web_fetch_robust("https://example.com", "Summarize this."))
# print(result)
```
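The model call in step 3 is only a commented-out placeholder, so `prompt` never reaches a model. One way to prepare that input, sketched under the assumption that a character budget is an acceptable stand-in for a token budget: collapse the blank-line runs that HTML-to-Markdown conversion tends to leave, truncate at a paragraph boundary, and wrap the page text and the prompt into a single string. `build_model_input`, the 20,000-character default, and the wrapper wording are illustrative choices, not part of any library:

```python
def build_model_input(prompt: str, markdown: str, max_chars: int = 20_000) -> str:
    """Combine the user's prompt with page Markdown, truncated to a character budget.

    Assumption: characters are a rough, model-agnostic proxy for tokens;
    swap in a real tokenizer if your model provides one.
    """
    # Collapse runs of blank lines (and drop leading blanks) left by HTML-to-Markdown conversion
    compact: list[str] = []
    for line in (raw.rstrip() for raw in markdown.splitlines()):
        if line or (compact and compact[-1]):
            compact.append(line)
    text = "\n".join(compact).strip()

    truncated = len(text) > max_chars
    if truncated:
        # Cut at the last paragraph break inside the budget, else hard-cut at the budget
        cut = text.rfind("\n\n", 0, max_chars)
        text = text[: cut if cut > 0 else max_chars]

    note = "\n\n[Content truncated]" if truncated else ""
    return (
        "Web page content (Markdown):\n"
        f"---\n{text}{note}\n---\n\n"
        f"{prompt}"
    )


# Usage
page_md = "# Title\n\n\n\nPara one.\n\nPara two."
print(build_model_input("Summarize this.", page_md))
```

Inside `web_fetch_robust`, the placeholder could then pass `build_model_input(prompt, markdown_content)` to whatever small-model client stands behind `small_model`, which remains hypothetical here.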