A multi-purpose scraper that turns any webpage into structured data: https://news.ycombinator.com/item?id=45870231
It uses LLMs to generate Python code that scrapes a webpage into whatever Pydantic model you provide:
from hikugen import HikuExtractor
from pydantic import BaseModel
from typing import List

class Article(BaseModel):
    title: str
    author: str
    published_date: str
    content: str

class ArticlePage(BaseModel):
    articles: List[Article]

extractor = HikuExtractor(api_key="your-openrouter-api-key")
result = extractor.extract(
    url="https://example.com/articles",
    schema=ArticlePage
)
for a in result.articles:
    print(a.title, a.author)
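
To make the "generate Python code that fits the model" idea concrete, here is a rough sketch of the kind of extraction function an LLM might write for the ArticlePage schema above. This is not Hikugen's actual output: the sample markup, the CSS class names, and the names SAMPLE_HTML, ArticleParser and extract_articles are all assumptions made up for illustration, and it uses only the standard library's html.parser.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the markup, class names, and values are
# invented for illustration and do not come from any real site.
SAMPLE_HTML = """
<article>
<h2 class="title">First post</h2>
<span class="author">Ada</span>
<time class="published_date">2024-01-01</time>
<p class="content">Hello world.</p>
</article>
<article>
<h2 class="title">Second post</h2>
<span class="author">Grace</span>
<time class="published_date">2024-01-02</time>
<p class="content">Another entry.</p>
</article>
"""


class ArticleParser(HTMLParser):
    """Collects one dict per <article>, keyed by the Article field names."""

    # Assumption: each field is marked by a CSS class with the same name.
    FIELDS = {"title", "author", "published_date", "content"}

    def __init__(self):
        super().__init__()
        self.articles = []     # finished article dicts
        self._current = None   # dict being filled, or None outside <article>
        self._field = None     # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._current = {}
        elif self._current is not None:
            css_class = dict(attrs).get("class")
            if css_class in self.FIELDS:
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a recognised field element.
        if self._current is not None and self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            self.articles.append(self._current)
            self._current = None
        # Any closing tag ends the field that was being read.
        self._field = None


def extract_articles(html: str) -> dict:
    """Return a plain dict shaped like the ArticlePage model."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"articles": parser.articles}


if __name__ == "__main__":
    for article in extract_articles(SAMPLE_HTML)["articles"]:
        print(article["title"], article["author"])
```

With the ArticlePage model in scope, the returned dict could then be checked against the schema with Pydantic v2's ArticlePage.model_validate(extract_articles(SAMPLE_HTML)), which is presumably the sort of validation step that lets a tool like this reject generated code that does not fit the model.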