A multi-purpose scraper that turns any webpage into structured data: https://news.ycombinator.com/item?id=45870231

It uses LLMs to generate Python code that scrapes a webpage into whatever Pydantic model you provide:

  from hikugen import HikuExtractor
  from pydantic import BaseModel
  from typing import List
  
  class Article(BaseModel):
      title: str
      author: str
      published_date: str
      content: str
  
  class ArticlePage(BaseModel):
      articles: List[Article]
  
  extractor = HikuExtractor(api_key="your-openrouter-api-key")
  
  result = extractor.extract(
      url="https://example.com/articles",
      schema=ArticlePage
  )
  
  for a in result.articles:
      print(a.title, a.author)