Most document-parsing tools are output-first: you configure field names, maybe write a sample, and the system tries to extract those fields by name. The problem surfaces quickly in production — the tool returns something for every field whether or not the data was actually present, there is no signal about extraction quality, and the output type is always a flat string regardless of what you asked for.
We took a different path. MailFrame is schema-first: you define the exact shape and types of data you want before any parsing happens, and every extraction attempt is validated and scored against that contract.
Defining a Schema
A MailFrame schema is a standard JSON Schema (draft-07) document stored against your account. You can use the full vocabulary: required fields, type constraints, string formats (date, email, uri), enumerations, nested objects, and arrays.
Here is a schema for parsing shipping notifications:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["carrier", "tracking_number", "status", "estimated_delivery"],
  "properties": {
    "carrier": {
      "type": "string",
      "enum": ["UPS", "FedEx", "DHL", "USPS", "Other"]
    },
    "tracking_number": {
      "type": "string",
      "minLength": 8
    },
    "status": {
      "type": "string",
      "enum": ["label_created", "in_transit", "out_for_delivery", "delivered", "exception"]
    },
    "estimated_delivery": {
      "type": "string",
      "format": "date"
    },
    "origin": { "type": "string" },
    "destination": { "type": "string" },
    "events": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["timestamp", "description"],
        "properties": {
          "timestamp": { "type": "string", "format": "date-time" },
          "description": { "type": "string" },
          "location": { "type": "string" }
        }
      }
    }
  }
}
Notice the enum constraints on carrier and status. These are not hints — they are hard constraints. If the model extracts a value that does not appear in the enum, the field fails schema validation and is surfaced as a low-confidence result requiring review.
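To make the "hard constraint" behaviour concrete, here is a minimal sketch of the same checks applied by hand to one extracted record. It is illustrative only and is not MailFrame's validator: the function name findFailedFields is hypothetical, it covers only the required, enum, and minLength rules from the schema above, and a real integration would rely on a full JSON Schema draft-07 validator instead.

```typescript
// Constraint values copied from the shipping schema above.
const CARRIERS = ["UPS", "FedEx", "DHL", "USPS", "Other"];
const STATUSES = ["label_created", "in_transit", "out_for_delivery", "delivered", "exception"];
const REQUIRED = ["carrier", "tracking_number", "status", "estimated_delivery"];

// Hypothetical helper: returns the names of fields that break the contract.
function findFailedFields(data: Record<string, unknown>): string[] {
  const failed: string[] = [];

  // Required fields must be present.
  for (const name of REQUIRED) {
    if (data[name] === undefined) failed.push(name);
  }

  // Enum membership is an exact, case-sensitive match.
  const carrier = data["carrier"];
  if (carrier !== undefined && !(typeof carrier === "string" && CARRIERS.indexOf(carrier) !== -1)) {
    failed.push("carrier");
  }
  const status = data["status"];
  if (status !== undefined && !(typeof status === "string" && STATUSES.indexOf(status) !== -1)) {
    failed.push("status");
  }

  // tracking_number must be a string of at least 8 characters
  // (.length counts UTF-16 units, which is fine for ASCII tracking numbers).
  const tracking = data["tracking_number"];
  if (tracking !== undefined && !(typeof tracking === "string" && tracking.length >= 8)) {
    failed.push("tracking_number");
  }

  return failed;
}

const failedFields = findFailedFields({
  carrier: "Royal Mail", // not in the enum, so this field fails
  tracking_number: "1Z999AA10123456784",
  status: "in_transit",
  estimated_delivery: "2024-05-17",
});
console.log(failedFields); // [ 'carrier' ]
```

If you expect carriers outside the list, the schema already provides the escape hatch: "Other" is a valid enum member, whereas a free-text carrier name is not.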
The Extraction Pipeline
When an email arrives, the pipeline runs roughly as follows:
- Input preparation — Raw MIME is decoded; the body and any text attachments are separated and normalized so the model sees clean, structured text rather than transport encoding.
- Schema-conditioned prompt construction — The schema is compiled into an extraction prompt that tells the model exactly which fields to extract, their types, and their constraints. The model never sees a generic “extract everything” instruction.
- Schema validation — The model output is validated against the submitted JSON Schema. Fields that fail type checks, format constraints, or enum membership are flagged individually, so a malformed or out-of-contract value never silently reaches your application.
- Confidence scoring — Each field gets an independent confidence score derived from model signals and, where applicable, cross-field consistency checks (e.g., does the extracted total_amount equal the sum of line_items?).
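The cross-field check mentioned in the last step can be sketched as follows. This is an assumption-laden illustration, not the pipeline's actual implementation: only the names total_amount and line_items come from the text above, while the per-item amount field, the InvoiceData shape, and both helper functions are hypothetical. How a failed check feeds into a field's score is not specified here, so the sketch only returns a boolean.

```typescript
// Hypothetical shapes for an invoice-style extraction result.
interface LineItem {
  description: string;
  amount: number; // assumed field name for the line total
}

interface InvoiceData {
  total_amount: number;
  line_items: LineItem[];
}

// Convert a decimal currency amount to integer cents so that summing
// does not accumulate floating-point error (0.1 + 0.2 !== 0.3).
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// True when the extracted total equals the sum of the extracted line items.
function totalsAreConsistent(invoice: InvoiceData): boolean {
  const sumCents = invoice.line_items.reduce((acc, item) => acc + toCents(item.amount), 0);
  return sumCents === toCents(invoice.total_amount);
}

const consistent = totalsAreConsistent({
  total_amount: 0.3,
  line_items: [
    { description: "Widget", amount: 0.1 },
    { description: "Shipping", amount: 0.2 },
  ],
});

const inconsistent = totalsAreConsistent({
  total_amount: 25.0,
  line_items: [{ description: "Widget", amount: 19.99 }],
});

console.log(consistent, inconsistent); // true false
```

Comparing in integer cents is a deliberate choice: a naive floating-point sum would flag the first invoice as inconsistent even though the extraction is correct.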
On the roadmap. Several reliability features are planned rather than shipping today: rendering PDF and image inputs (including scan-quality enhancement) into the same schema-first pipeline, constrained decoding that restricts the output token space to valid JSON during generation, and automatic secondary-model fallback that re-runs a low-confidence extraction on a different model before returning a single result. We call these out explicitly so you can plan against what exists now versus what is coming.
Confidence Scoring in Practice
The top-level confidence value is the geometric mean of per-field scores. Individual scores appear in field_scores and anything below the configured threshold appears in low_confidence_fields:
{
  "confidence": 0.90,
  "field_scores": {
    "carrier": 0.99,
    "tracking_number": 0.99,
    "status": 0.97,
    "estimated_delivery": 0.72,
    "events": 0.88
  },
  "low_confidence_fields": ["estimated_delivery"]
}
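If you want to reproduce the aggregation on your side, for example in a test, a minimal sketch looks like this. The function names are hypothetical, and the 0.85 field threshold is an assumption chosen for illustration; the actual value is whatever you have configured.

```typescript
// Geometric mean of per-field scores, computed in log space.
function geometricMean(scores: number[]): number {
  if (scores.length === 0) return 0; // assumption: no fields means no confidence
  // A zero (or negative) score drives the geometric mean to zero.
  if (scores.some((s) => s <= 0)) return 0;
  const logSum = scores.reduce((acc, s) => acc + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

// Names of the fields whose score falls below the threshold.
function lowConfidenceFields(fieldScores: Record<string, number>, threshold: number): string[] {
  return Object.keys(fieldScores).filter((name) => fieldScores[name] < threshold);
}

const fieldScores: Record<string, number> = {
  carrier: 0.99,
  tracking_number: 0.99,
  status: 0.97,
  estimated_delivery: 0.72,
  events: 0.88,
};

const FIELD_THRESHOLD = 0.85; // assumed value, not a documented default

const overall = geometricMean(Object.keys(fieldScores).map((name) => fieldScores[name]));
const flagged = lowConfidenceFields(fieldScores, FIELD_THRESHOLD);

console.log(overall.toFixed(2), flagged);
```

A single weak field pulls a geometric mean down further than it would an arithmetic mean, which is a reasonable property for a score that gates automatic processing.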
estimated_delivery scored 0.72 because the email said “should arrive by end of week” with no explicit date — the model made a reasonable inference, but reported low certainty. Your application code can branch on this:
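interface ParseResult {
  job_id: string;
  confidence: number; // top-level score across all fields
  field_scores: Record<string, number>;
  low_confidence_fields: string[];
  data: Record<string, unknown>;
}

// Supplied by your application; declared here so the snippet type-checks.
declare function enqueueForReview(jobId: string, fields: string[]): void;
declare function processShipmentUpdate(data: Record<string, unknown>): void;

function handleShipmentWebhook(payload: ParseResult): void {
  const CONFIDENCE_THRESHOLD = 0.85;
  if (payload.confidence < CONFIDENCE_THRESHOLD || payload.low_confidence_fields.length > 0) {
    // Route to a human review queue
    enqueueForReview(payload.job_id, payload.low_confidence_fields);
    return;
  }
  // Safe to auto-process
  processShipmentUpdate(payload.data);
}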
Verifying Webhook Authenticity
Before acting on any payload, verify the HMAC-SHA256 signature in the MailFrame-Signature header. The signature is computed over the raw request body using your webhook signing secret:
import { createHmac, timingSafeEqual } from "node:crypto";

function verifyMailFrameSignature(
  rawBody: Buffer,
  signatureHeader: string,
  secret: string
): boolean {
  try {
    const expected = createHmac("sha256", secret).update(rawBody).digest();
    // Header is "sha256=<hex>": strip the scheme prefix and decode.
    // Buffer.from stops at the first invalid hex character, so a malformed
    // value is truncated; anything that is not 32 bytes fails the check below.
    const received = Buffer.from(signatureHeader.replace(/^sha256=/, ""), "hex");
    // timingSafeEqual throws on a length mismatch, so guard first; the
    // constant-time compare prevents inferring the signature byte by byte.
    return received.length === expected.length && timingSafeEqual(received, expected);
  } catch {
    // Defensive: anything unexpected (e.g. a missing header at runtime)
    // is treated as a failed verification rather than an unhandled error.
    return false;
  }
}
Always use timingSafeEqual rather than ===. Constant-time comparison prevents an attacker from inferring the correct signature one byte at a time through response latency differences.
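A common way for verification to fail is hashing a re-serialized body instead of the raw bytes. The sketch below demonstrates the effect locally. The signBody helper, the placeholder secret, and the sample payload are all hypothetical and exist only so you can generate test signatures in the "sha256=<hex>" form described above.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical test helper: builds a header value in the "sha256=<hex>" form,
// computed over the exact bytes it is given.
function signBody(rawBody: Buffer, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

const secret = "test-secret"; // placeholder, not a real signing secret
const rawBody = Buffer.from('{"job_id": "job_123", "confidence": 0.9}', "utf8");

// Signature over the bytes exactly as they arrived on the wire.
const headerForRawBytes = signBody(rawBody, secret);

// Parsing and re-serializing drops the whitespace, so the bytes differ
// from what the sender signed and the signature changes with them.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(rawBody.toString("utf8"))), "utf8");
const headerForReserialized = signBody(reserialized, secret);

// Plain === is acceptable here because this is a local demonstration,
// not verification of an untrusted header.
console.log(headerForRawBytes === headerForReserialized); // false
```

In practice this means capturing the request body as a Buffer before any JSON middleware touches it, and parsing only after the signature check passes.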
Why Not Just Prompt-Engineer Your Way Through?
Tools like Parseur and Mailparser take a template or rule-based approach. That works well for highly regular document formats from a single predictable source. Schema-first LLM extraction with validation wins in different scenarios:
- Documents from many different senders with varying layouts and terminology
- Nested structures (line items inside invoices, events inside shipment records) that are difficult to express as flat field rules
- Cases where you need fine-grained confidence rather than a binary pass/fail
- Schemas that evolve — add a new required field to your JSON Schema and extraction immediately enforces it without rewriting any rules
The trade-off is latency: schema-validated LLM extraction takes longer than regex or positional template matching. For high-frequency, highly regular documents from a single sender, a template tool may be faster and cheaper. MailFrame targets the long tail of document variety where templates break.
What’s Next
We are actively working on schema versioning with migration helpers, field-level extraction explanations (why did the model score this field low?), and batch webhooks for high-volume ingestion tiers.
If you want to try defining a schema against your own documents, request early access and we will work through the schema design with you directly.