Build an email RAG pipeline without the plumbing.
InboxParse handles IMAP sync, HTML cleaning, threading, and Markdown conversion - so you can focus on the retrieval layer, not the email parsing layer.
import { embed } from "ai"
import { openai } from "@ai-sdk/openai"
import { supabase } from "@/lib/supabase"
// 1. Fetch recent threads from InboxParse
const res = await fetch(
  "https://inboxparse.com/api/v1/threads?limit=50&format=markdown",
  { headers: { Authorization: "Bearer ip..." } }
)
const { data: threads } = await res.json()

// 2. Chunk and embed each thread
for (const thread of threads) {
  const text = thread.messages
    .map((m) => m.content.markdown)
    .join("\n\n")

  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: text,
  })

  // 3. Upsert into your vector store
  await supabase.from("email_embeddings").upsert({
    thread_id: thread.id,
    subject: thread.subject,
    content: text,
    embedding,
    labels: thread.labels,
  })
}

Thread-level chunks
Emails grouped by RFC 5256 thread - each chunk contains full conversation context, not isolated messages.
Pre-cleaned Markdown
HTML, tracking pixels, and boilerplate removed before you embed. Your vectors are cleaner, your retrieval sharper.
AI labels as metadata
Filter retrieval by category, action, or sentiment labels - built-in metadata for more precise results.
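For instance, label metadata stored alongside each embedding lets you narrow candidates before similarity ranking. A minimal sketch - the row shape mirrors the upsert example above, and the filter helper is illustrative, not part of any InboxParse SDK:

```typescript
// Hypothetical shape of a row in your vector store, mirroring the
// upsert example above (thread_id, labels, content, ...).
type EmailRow = {
  thread_id: string
  labels: string[]
  content: string
}

// Keep only rows carrying a given AI label (e.g. "billing" or
// "action-required") before similarity ranking.
function filterByLabel(rows: EmailRow[], label: string): EmailRow[] {
  return rows.filter((r) => r.labels.includes(label))
}
```

With Supabase/pgvector you can push the same filter into the query itself (e.g. a `contains` filter on the `labels` column) rather than filtering in application code.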
Built-in semantic search
Use InboxParse's own hybrid search endpoint instead of managing a separate vector store.
Webhook-driven ingestion
Receive a webhook on every new email. Trigger your embedding pipeline in real time, not via a polling loop.
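A receiver for that webhook can be kept very thin: decide whether the event warrants re-embedding, then rerun the embed + upsert steps from the pipeline above. A sketch - the event name and payload fields here are assumptions, so check your InboxParse webhook configuration for the real shape:

```typescript
// Assumed webhook payload shape - verify field names against your
// actual InboxParse webhook events before relying on them.
type WebhookEvent = {
  event: string
  thread_id: string
}

// Decide whether an incoming event should trigger re-embedding.
// "message.received" is a placeholder event name.
function shouldReembed(evt: WebhookEvent): boolean {
  return evt.event === "message.received" && evt.thread_id.length > 0
}

// In your HTTP handler you would then fetch the updated thread and
// re-run the embed + upsert steps shown earlier:
//   if (shouldReembed(evt)) await reembedThread(evt.thread_id)
```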
Works with any vector store
Output is plain Markdown + JSON metadata. Works with pgvector, Pinecone, Weaviate, Chroma, or any other store.
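Because the output is plain Markdown plus JSON metadata, you can build one store-agnostic record and adapt it per backend. A sketch of that shape - the `VectorRecord` type is illustrative, not tied to any particular client library:

```typescript
// Store-agnostic record: id + vector + metadata. This shape maps
// directly onto Pinecone-style upserts, pgvector rows, or Chroma
// documents with minimal translation.
type VectorRecord = {
  id: string
  values: number[]
  metadata: { subject: string; labels: string[] }
}

// Build a record from an InboxParse thread and its embedding.
function toVectorRecord(
  thread: { id: string; subject: string; labels: string[] },
  embedding: number[]
): VectorRecord {
  return {
    id: thread.id,
    values: embedding,
    metadata: { subject: thread.subject, labels: thread.labels },
  }
}
```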
Query your email RAG pipeline
Copy-and-paste ready. No boilerplate.
// Option A - use InboxParse's built-in semantic search
const res = await fetch(
  "https://inboxparse.com/api/v1/search" +
    "?q=renewal+offer+pricing&mode=semantic&limit=5",
  { headers: { Authorization: "Bearer ip..." } }
)
const { data: results } = await res.json()
// Option B - bring your own vector store
import { embed, generateText } from "ai"
import { openai } from "@ai-sdk/openai"
import { supabase } from "@/lib/supabase"
const { embedding: queryEmbedding } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: "renewal offer pricing",
})

const { data: rows } = await supabase.rpc("match_email_embeddings", {
  query_embedding: queryEmbedding,
  match_count: 5,
})

// Use results as context for your LLM
const context = rows.map((r) => r.content).join("\n\n---\n\n")
const reply = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: `Based on these emails:\n\n${context}\n\nAnswer: What renewal offers did we make?`,
})

Frequently asked questions
What vector stores work with InboxParse?
InboxParse outputs plain Markdown and JSON metadata, so it works with any vector store - pgvector, Pinecone, Weaviate, Chroma, Qdrant, or Supabase Vector. You can also skip external stores entirely and use InboxParse's built-in semantic search.
How are emails chunked for embeddings?
Emails are grouped into threads using RFC 5256 threading. Each thread becomes a single chunk with full conversation context, rather than isolated messages. This gives your retrieval layer better semantic coherence.
Can I trigger embedding ingestion in real time?
Yes. Configure an InboxParse webhook to fire on every new email. Your embedding pipeline runs on each webhook event, so your vector store stays up to date without polling.
Explore more use cases
Email is your most valuable unstructured data source.
InboxParse makes it RAG-ready. No boilerplate. No parsers. Just vectors.