
Medical Insurance Claims

40 hours → 8 hours per week — specialists back to expert work, not data entry

Challenge

The client is a medical insurance organization (name withheld under NDA) processing roughly 700 reimbursement claims per week. Clinics submit through an internal system; private clients submit only by email. Every message was picked up manually by an operator, entered into the internal accounting system, and passed to a specialist who reviewed documents and decided on reimbursement.

Each specialist spent practically the entire 40-hour work week on purely mechanical work: sorting mail, recognizing documents, manually filling fields in the system, and routing the claim to the right colleague. Almost no time remained for expert work, meaning analyzing the case and deciding on reimbursement. This repetitive pipeline work carried a high cognitive load, and it is where most errors accumulated.

Average client response time per claim exceeded one week.

The task was to automate intake and first-pass processing of claims from private clients without losing review quality and without sending personal medical data outside the client's infrastructure.

Approach

The root issue was architectural: specialists were acting as data-entry operators for a document pipeline that could be automated end-to-end up to the decision point.

We built a digital employee: a vision-language model (VLM) and OCR pipeline that carries a claim from incoming email to a completed card in the accounting system. Incoming attachments are highly heterogeneous: phone photos, scanned certificates, PDF statements, and invoices, with varying image quality, crumpled pages, handwritten notes, and blurry stamps. Mistral OCR was selected after evaluation because it performed best on low-quality scans and non-standard document formats.

After recognition, the system validates documents against strict checklists — one per claim type, eight types in total. It extracts required fields, validates completeness, fills forms in the internal system, and routes the claim to the responsible specialist.
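The checklist step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the production code: the claim type names, document names, and field names below are hypothetical, since the case study does not list the eight real checklists.

```python
from dataclasses import dataclass, field

# Hypothetical checklists: the real system has eight claim types whose
# names and required items are not given in this case study.
CHECKLISTS: dict[str, dict[str, list[str]]] = {
    "outpatient_visit": {
        "documents": ["invoice", "medical_report"],
        "fields": ["patient_name", "policy_number", "service_date", "amount"],
    },
    "pharmacy": {
        "documents": ["prescription", "receipt"],
        "fields": ["patient_name", "policy_number", "amount"],
    },
}


@dataclass
class ChecklistResult:
    claim_type: str
    missing_documents: list[str] = field(default_factory=list)
    missing_fields: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # A claim is complete only when nothing is missing.
        return not self.missing_documents and not self.missing_fields


def validate_claim(
    claim_type: str, documents: set[str], fields: dict[str, str]
) -> ChecklistResult:
    """Compare recognized documents and extracted fields with the checklist."""
    checklist = CHECKLISTS[claim_type]  # KeyError signals an unknown claim type
    result = ChecklistResult(claim_type)
    result.missing_documents = [
        d for d in checklist["documents"] if d not in documents
    ]
    # A field counts as missing when it is absent or blank.
    result.missing_fields = [
        f for f in checklist["fields"] if not fields.get(f, "").strip()
    ]
    return result
```

Keeping the checklists as data rather than code means a new claim type is a configuration change, which is one plausible way to support a strict, per-type validation step.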

Human-in-the-loop was designed as architecture, not fallback. The system handles 94% of claims independently. The remaining 6% — procedural violations, unrecognized documents, non-standard situations — are not guessed at. The system alerts a specialist with an explanation of what failed and why, so they go straight to the problem instead of searching for it.

Quality is monitored through Langfuse tracing with binary operator scoring (correct / incorrect) — continuous feedback that improves accuracy over time.
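As a rough illustration of how binary operator verdicts can become a quality signal, the sketch below aggregates correct / incorrect scores into accuracy per claim type. It deliberately does not call the Langfuse SDK: it assumes the verdicts have already been exported as (claim_type, is_correct) pairs, and the function name and claim type labels are hypothetical.

```python
from collections import defaultdict


def accuracy_by_claim_type(scores: list[tuple[str, bool]]) -> dict[str, float]:
    """Turn (claim_type, is_correct) operator verdicts into per-type accuracy."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for claim_type, is_correct in scores:
        totals[claim_type] += 1
        correct[claim_type] += int(is_correct)
    # Every key in totals has at least one verdict, so division is safe.
    return {t: correct[t] / totals[t] for t in totals}
```

Tracking the figure per claim type, rather than as one global number, would show which checklists or document kinds need attention first.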

Security: deployed in the client's private cloud under a signed DPA. No data leaves their infrastructure — critical for medical data and regulator requirements.

Solution

The pipeline runs as an internal tool — specialists and operators use it; end clients still submit by email as before.

An incoming email with attachments triggers the flow. The system runs OCR on each document, classifies the claim type, applies the matching checklist, extracts and validates fields, and creates a structured claim card in the accounting system. Routing rules assign the case to the responsible specialist.
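The flow above can be sketched as a single orchestration function. This is an assumption-laden outline rather than the deployed implementation: every stage (OCR, classification, extraction, validation, routing) is passed in as a callable, and the names ClaimCard and process_email are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClaimCard:
    claim_type: str
    fields: dict[str, str]
    problems: list[str]  # empty when the checklist passed
    assignee: str


def process_email(
    attachments: list[bytes],
    ocr: Callable[[bytes], str],
    classify: Callable[[list[str]], str],
    extract: Callable[[str, list[str]], dict[str, str]],
    validate: Callable[[str, dict[str, str]], list[str]],
    route: Callable[[str], str],
) -> ClaimCard:
    """Run the intake stages in the order described in the text."""
    texts = [ocr(a) for a in attachments]   # 1. OCR each document
    claim_type = classify(texts)            # 2. classify the claim type
    fields = extract(claim_type, texts)     # 3. extract checklist fields
    problems = validate(claim_type, fields) # 4. validate against the checklist
    assignee = route(claim_type)            # 5. apply routing rules
    return ClaimCard(claim_type, fields, problems, assignee)
```

Injecting the stages keeps the orchestration testable with stubs and leaves the choice of OCR engine or model outside the control flow.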

AI prepares; the human decides. The system does not replace the specialist — it removes everything that does not require their expertise. The final reimbursement decision still rests with a person. The difference: instead of sorting mail and typing data, the specialist opens a ready, structured claim card and starts with analysis.

For the 6% edge cases, escalation includes full context — which documents failed, which checklist items were missing, what the system attempted. Specialists engage on genuinely complex cases only.
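An escalation that carries this context might look like the following sketch. The Escalation class, its field names, and the alert wording are hypothetical; the case study only states which kinds of information are included.

```python
from dataclasses import dataclass, field


@dataclass
class Escalation:
    claim_id: str
    failed_documents: list[str] = field(default_factory=list)
    missing_items: list[str] = field(default_factory=list)
    attempts: list[str] = field(default_factory=list)

    def to_alert(self) -> str:
        """Render a plain-text alert a specialist can act on directly."""
        lines = [f"Claim {self.claim_id} needs review"]
        if self.failed_documents:
            lines.append("Unrecognized documents: " + ", ".join(self.failed_documents))
        if self.missing_items:
            lines.append("Missing checklist items: " + ", ".join(self.missing_items))
        if self.attempts:
            lines.append("System attempted: " + "; ".join(self.attempts))
        return "\n".join(lines)
```

Sections with nothing to report are omitted, so the alert stays short and the specialist sees only what actually failed.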

Stack: Mistral OCR, VLM pipeline, Langfuse for tracing and quality control, private cloud deployment, DPA in place.

Results

Specialist time on incoming claim intake dropped from 40 hours per week to 8 hours per week — a 5× reduction. Specialists stopped being data-entry operators and returned to expert work.
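The arithmetic behind that figure is simple enough to check directly. The helper below is a hypothetical illustration using only the two numbers stated above (40 and 8 hours per week).

```python
def time_savings(before_hours: float, after_hours: float) -> dict[str, float]:
    """Summarize a before/after change in weekly hours."""
    freed = before_hours - after_hours
    return {
        "hours_freed": freed,
        "reduction_factor": before_hours / after_hours,
        "percent_saved": 100 * freed / before_hours,
    }


summary = time_savings(40, 8)
```

With these inputs, 32 hours per week are freed, which is the 5× reduction quoted above, or 80% of the original intake time.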

Average client response time went from more than one week to one day. That is the change the end customer of the insurer actually feels.

Processing errors fell by 57% through elimination of manual entry and reduced cognitive load on specialists.

94% of claims are handled automatically: the system closes the overwhelming majority of the inbound flow without human triage.

Production timeline: 3 weeks from kick-off to live deployment.

Economic effect: at a cost of ownership of roughly $3,000 per month per operator on pipeline processing, the system frees the equivalent of several FTEs, without headcount cuts but with a sharp shift of load toward tasks that require expertise rather than typing.

Learnings

The highest-value automation boundary in regulated claims is not "replace the decision" — it is "replace everything before the decision." Specialists trusted the system once they saw it stopped at the reimbursement call and escalated with context instead of guessing.

Human-in-the-loop works when escalation is informative, not when the model silently fails. The 6% escalation path — explicit alert, reason, partial extraction — was what made 94% auto-handling acceptable to operators from day one.

Private cloud deployment was not a nice-to-have; it was the procurement gate. Medical data that never leaves the client's infrastructure under a signed DPA unlocked sign-off that a SaaS OCR tool could not get.

OCR choice mattered more than model size. On the messy real-world attachments that actually arrive from private clients, as opposed to clean lab PDFs, Mistral OCR outperformed the alternatives we evaluated.

