Posted by timeproofs 5 hours ago
Show HN: Turning noisy webpages into clean JSON for LLMs
HTML mixes real content with navigation, footers, cookie banners, scripts, ads, and layout noise. This makes prompts larger, chunking worse, and RAG pipelines less reliable.
AI2JSON is a small public API that converts any public webpage into a clean, deterministic JSON structure:
- main content only
- ordered sections
- stable output
- SHA-256 hash for change detection
No summary, no interpretation — just a minimal contract between the web and AI systems.
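To illustrate why deterministic output plus a SHA-256 hash is enough for change detection, here is a minimal Python sketch. It is not AI2JSON's actual schema or hashing method: the field names (`url`, `sections`, `heading`, `text`) and the `content_hash` helper are hypothetical, chosen only to show the idea of hashing a canonical serialization.

```python
import hashlib
import json


def content_hash(doc: dict) -> str:
    """Hash a JSON-like document via a canonical serialization.

    Sorting keys and fixing separators means the same content always
    produces the same bytes, and therefore the same SHA-256 digest.
    """
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical structured output for a page (assumed shape, not the real API).
page_v1 = {
    "url": "https://example.com/post",
    "sections": [
        {"heading": "Intro", "text": "Hello world."},
        {"heading": "Details", "text": "Some body text."},
    ],
}

# Same content with keys in a different order: the hash should not change.
page_v1_reordered = {
    "sections": [
        {"text": "Hello world.", "heading": "Intro"},
        {"text": "Some body text.", "heading": "Details"},
    ],
    "url": "https://example.com/post",
}

# A later fetch where one section's text was edited.
page_v2 = {
    "url": "https://example.com/post",
    "sections": [
        {"heading": "Intro", "text": "Hello world."},
        {"heading": "Details", "text": "Some updated body text."},
    ],
}

unchanged = content_hash(page_v1) == content_hash(page_v1_reordered)
changed = content_hash(page_v1) != content_hash(page_v2)
print("stable across key order:", unchanged)
print("detects edited content:", changed)
```

In a RAG pipeline, storing the hash next to each indexed page lets you skip re-embedding when a re-fetch yields the same digest. Note that section order is preserved in the list, so reordering sections would count as a change in this sketch.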
You can paste a URL and instantly compare:
- what an LLM sees with raw HTML
- the same content as structured JSON
Free sandbox, no API key. I’m mainly looking for developer feedback: does this actually improve your AI workflows?