Production build notice

Live AI features are available only in local builds, to avoid billing and additional auth or account overhead in production.

Embeddings and vector search lab

This demo turns approved ARC Raiders sources into a reusable corpus: cached markdown ingestion, normalized source and chunk schemas, AI SDK embeddings through the gateway, Upstash Vector storage, a semantic search API, and a reproducible chunking comparison.

Embedding model

text-embedding-3-small

Generated through the AI Gateway with the repo's existing AI SDK setup.

Vector store

Upstash Vector

Dense custom index with 1536 dimensions and cosine similarity.
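The scoring the index applies can be illustrated with a minimal cosine-similarity function; this is a sketch only, since Upstash Vector computes the similarity server-side:

```typescript
// Cosine similarity between two dense vectors: dot(a, b) / (|a| * |b|).
// Illustrative only; the real scoring happens inside the Upstash Vector index.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```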

Default chunking

Semantic

Validated on the approved Metaforge item catalog benchmark, where semantic chunking beat overlapping chunking on recall@3, 9/10 vs. 8/10.

Ingest the approved ARC Raiders corpus
Scrape the approved ARC Raiders seeds with Firecrawl, cache markdown-only artifacts in durable storage, normalize them into the shared source schema, derive patch records from official updates, chunk with the current default, embed with AI SDK, and upsert the final chunks into Upstash Vector.
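The normalize-then-chunk step above can be sketched as follows. The type and function names here (SourceRecord, ChunkRecord, toChunkRecords) are illustrative assumptions, not the repo's actual schema:

```typescript
// Hypothetical shapes for the shared source and chunk schemas (assumed names).
interface SourceRecord {
  id: string;
  url: string;
  kind: "official-doc" | "official-update" | "community-item" | "patch";
  markdown: string; // cached markdown-only artifact from the scrape
}

interface ChunkRecord {
  id: string; // `${sourceId}#${index}`, usable as the vector id on upsert
  sourceId: string;
  text: string;
  index: number;
}

// Derive chunk records from a source using an injected chunking strategy,
// so the default (semantic) can be swapped for fixed or overlapping.
function toChunkRecords(
  source: SourceRecord,
  chunk: (markdown: string) => string[],
): ChunkRecord[] {
  return chunk(source.markdown).map((text, index) => ({
    id: `${source.id}#${index}`,
    sourceId: source.id,
    text,
    index,
  }));
}
```

Each derived chunk is then embedded and upserted with its id and metadata.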
Sources: official docs, official updates, community items, derived patch records

Ingest is disabled in the public deployment. Run it locally, or call the route with the server-side ingest secret.

Compare chunking strategies
Run the same retrieval-style query set against fixed, overlapping, and semantic chunking on the approved long-form ARC Raiders item catalog benchmark.
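As a sketch of how such a comparison can be scored, here are simplified fixed and overlapping chunkers plus a recall@k helper. All names are illustrative, and the semantic strategy (boundary detection via embeddings) is omitted for brevity:

```typescript
// Fixed-size chunking: non-overlapping windows of `size` characters.
function fixedChunks(text: string, size: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size) out.push(text.slice(i, i + size));
  return out;
}

// Overlapping chunking: `size`-character windows advancing by `size - overlap`.
// Assumes size > overlap.
function overlappingChunks(text: string, size: number, overlap: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    out.push(text.slice(i, i + size));
    if (i + size >= text.length) break;
  }
  return out;
}

// recall@k: fraction of queries whose expected chunk id appears in the top-k results.
function recallAtK(rankings: string[][], expected: string[], k: number): number {
  const hits = expected.filter((id, i) => rankings[i].slice(0, k).includes(id)).length;
  return hits / expected.length;
}
```

Running the same query set through each chunker and comparing recall@3 is what produced the 9/10 vs. 8/10 result above.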
Semantic search API
Embed the query with AI SDK and return the top-ranked chunks, metadata, and similarity scores from the ARC Raiders vector index.
Run an ingest, then search the corpus index to inspect chunk-level results and citation metadata.
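The ranking step behind the search API can be sketched as a pure function: given a query embedding and stored chunk vectors, return the top-k chunks with similarity scores and citation metadata. Names here are illustrative; in the deployed route, Upstash Vector performs this query:

```typescript
interface StoredChunk {
  id: string;
  vector: number[];
  metadata: { sourceUrl: string; text: string }; // citation metadata returned per hit
}

// Rank stored chunks against a query embedding by cosine similarity, highest first.
function rankTopK(query: number[], chunks: StoredChunk[], k: number) {
  const cosine = (a: number[], b: number[]) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vector), metadata: c.metadata }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```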