Curate Labs

Research

Document Understanding

Making long, visual, and semi-structured documents usable.

Important business context often lives in PDFs, statements, filings, contracts, scans, and long notes. Document understanding studies how to parse that material while keeping structure, layout, and provenance intact.

Core Questions

  • How should a system represent pages, sections, tables, images, spans, and extracted facts together?

  • What should be preserved from layout and visual structure before a model summarizes or reasons?

  • How do we make document-derived outputs reviewable by operators and advisors?
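One way to make the first and third questions concrete is a record model in which every extracted fact carries a pointer back to the page span that supports it. The sketch below is illustrative only: the names Span, Page, Fact, and Document, the character-offset convention, and the sample invoice text are assumptions made for this example, not the DocUnderstand data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """Hypothetical provenance pointer: a character range on one page."""
    page: int   # 1-based page number
    start: int  # inclusive offset into the page text
    end: int    # exclusive offset into the page text


@dataclass
class Page:
    number: int
    text: str


@dataclass
class Fact:
    """An extracted value plus the span that is offered as its evidence."""
    label: str
    value: str
    evidence: Span


@dataclass
class Document:
    pages: Dict[int, Page] = field(default_factory=dict)
    facts: List[Fact] = field(default_factory=list)

    def add_page(self, number: int, text: str) -> None:
        self.pages[number] = Page(number, text)

    def add_fact(self, label: str, value: str, page: int) -> Fact:
        # Only accept a fact if its value literally appears on the cited page.
        start = self.pages[page].text.find(value)
        if start < 0:
            raise ValueError(f"{value!r} not found on page {page}")
        fact = Fact(label, value, Span(page, start, start + len(value)))
        self.facts.append(fact)
        return fact

    def quote(self, fact: Fact) -> str:
        # Recover the exact source text a reviewer would be shown.
        span = fact.evidence
        return self.pages[span.page].text[span.start:span.end]


doc = Document()
doc.add_page(1, "Invoice total: 1,250.00 USD, due 2024-03-01")
total = doc.add_fact("invoice_total", "1,250.00 USD", page=1)
print(total.evidence.page, doc.quote(total))
```

In this sketch a fact that cannot be located on its cited page is rejected rather than stored, so every stored record stays reviewable against its source. A fuller representation would likely also need sections, tables, images, and layout coordinates, which are omitted here.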

Artifacts

  • DocUnderstand experiments.

  • Document parsing and provenance notes.

  • Research reads on graph-aware document extraction.

What It Means

Where It Shows Up

Evidence-Backed Records

Parse long-form and semi-structured documents into evidence-backed records.

Source-Linked Review

Support source-linked review for compliance, finance, and operating decisions.

Document-To-Graph Inputs

Feed extraction and graph workflows without losing page-level context.

Why It Matters

How This Research Gets Used

Applied

Product direction

Research themes shape product workflows, internal evaluation, and open-source implementation choices.

Evidence

Reviewable decisions

The work emphasizes assumptions, provenance, and feedback loops that humans can inspect.
