Powered by Docling, IBM Research's open-source converter

Compile your files into AI-ready context.

Hand an AI a whole folder of messy documents and trust what it reads. FileDigest turns the packet into one clean, source-labeled Markdown digest for ChatGPT, Claude, Gemini, Cursor, and RAG.

Free · up to 25 files or 100 MB per job · no credit card

filedigest.dev · real public demo run · job df56be01

A messy folder

  • nist-ai-risk-management-framework.pdf: 48 pages · dense tables
  • scanned-field-log.pdf: image-only scan · auto-OCR
  • Earth_Lithograph.pdf: NASA · 2 pages
  • ffc.docx + ffc.pptx: Office formats
  • mdn-beginner-html-index.html: raw HTML
  • good-readme-template.md: Markdown

One digest.md

# FileDigest Context Pack
Generated: 2026-06-10 · Job: df56be01...
Engine: Modal L4 + Docling
Files processed: 7/7

## Source Index
| ID   | File                          | Pages | Tokens |
|------|-------------------------------|------:|-------:|
| S001 | nist-ai-risk-management...pdf |    48 | 20,671 |
| S002 | scanned-field-log.pdf         |     1 |    148 |
| S003 | Earth_Lithograph.pdf          |     2 |  1,563 |
| S004 | ffc.docx                      |     0 |     57 |

## S001: nist-ai-risk-management-framework.pdf
...complete extraction, abstract through
appendix, 144k characters of clean Markdown

This is the actual output of a real production run, not a mockup. Download the full packet.
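To make the digest layout concrete, here is a minimal Python sketch that assembles a pack in the same shape as the sample: a header, a Source Index table, then one `## S00n:` section per file. It is not FileDigest's code. The `Source` record, `build_digest`, and its `total_files` parameter are hypothetical, and the per-source Markdown and token counts are assumed to come from an earlier conversion step.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One converted file; the field names are hypothetical."""
    name: str
    pages: int
    tokens: int
    markdown: str


def build_digest(job_id: str, sources: list[Source], total_files: int) -> str:
    """Assemble a source-labeled digest laid out like the sample above."""
    lines = [
        "# FileDigest Context Pack",
        f"Job: {job_id}",
        f"Files processed: {len(sources)}/{total_files}",
        "",
        "## Source Index",
        "| ID | File | Pages | Tokens |",
        "|----|------|------:|-------:|",
    ]
    # IDs (S001, S002, ...) follow input order.
    ids = [f"S{i:03d}" for i in range(1, len(sources) + 1)]
    for sid, src in zip(ids, sources):
        # Thousands separator matches the "20,671" style in the index.
        lines.append(f"| {sid} | {src.name} | {src.pages} | {src.tokens:,} |")
    # One labeled section per source, in the same order as the index.
    for sid, src in zip(ids, sources):
        lines += ["", f"## {sid}: {src.name}", src.markdown.strip()]
    return "\n".join(lines) + "\n"
```

Numbering IDs by input order means a citation like S003 keeps pointing at the same file, as long as the files are passed in the same order.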

Verify, do not guess

Read the original and the output side by side

Scroll the original document on the left. Read the converted output on the right, in whichever format you need.

Examples are public or permissively licensed; see attribution.

The engine

Why the output is faithful: Docling

The conversion is done by Docling, the open-source document converter from IBM Research, now hosted by the LF AI & Data Foundation. It reads a document's real layout, reading order, tables, and headings, and rebuilds them as structure, so a table stays a table instead of collapsing into copy-paste mush.

No language model rewrites, summarizes, or paraphrases your text along the way. FileDigest runs the real engine on warm GPU workers and wraps it with upload, storage, and the side-by-side viewer. Your files are not used to train any model, and because the converter is open source, Enterprise teams can self-host the exact same engine.


  • Tables stay tables, not a wall of run-together numbers
  • Multi-column pages are read in the right order
  • Headings and document structure are kept
  • Scanned and image-only pages are read with OCR
  • Formulas and code can be recovered as an option
  • Deterministic: the same file gives the same output
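The determinism claim in the last bullet is easy to check yourself: convert the same file more than once and compare fingerprints of the outputs. The sketch below assumes a hypothetical `convert` callable that takes file bytes and returns text (for example, your own wrapper around a converter). It is a test harness under that assumption, not part of FileDigest.

```python
import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of a converted output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_deterministic(convert: Callable[[bytes], str], data: bytes, runs: int = 3) -> bool:
    """Convert the same input `runs` times; True if every output hashes identically."""
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    # A set collapses identical digests, so one element means identical outputs.
    digests = {fingerprint(convert(data)) for _ in range(runs)}
    return len(digests) == 1
```

Hashing instead of storing full outputs keeps the comparison cheap even for a 48-page document.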

What comes back

Clean Markdown. JSON. RAG chunks.

Every job returns two artifacts you can use right away, plus five representations of each source, so you can pick the one your workflow needs. Token counts come from tiktoken, per source and for the whole pack, so the size of what you paste is a measured number, not a guess.

  • digest.md: one readable, source-labeled Markdown pack with a token count on every file
  • manifest.json: structured metadata for sources, status, pages, sizes, warnings, and token estimates
  • Side-by-side viewer to check the original against the converted output
  • Private by default: per-user storage, ownership checks, and short-lived signed links
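As an illustration of per-source metadata plus pack totals, here is a sketch that serializes a manifest with Python's standard `json` module. The field names (`status`, `size_bytes`, `warnings`, and so on) and `build_manifest` are assumptions, not FileDigest's actual schema. Token counts are passed in by the caller; in FileDigest they come from tiktoken.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceEntry:
    """One manifest row; these field names are assumptions, not FileDigest's schema."""
    id: str
    file: str
    status: str  # hypothetical values: "ok" or "failed"
    pages: int
    size_bytes: int
    tokens: int  # supplied by the caller (FileDigest uses tiktoken)
    warnings: list[str] = field(default_factory=list)


def build_manifest(job_id: str, entries: list[SourceEntry]) -> str:
    """Serialize per-source metadata plus pack-level totals as pretty-printed JSON."""
    manifest = {
        "job": job_id,
        "files_total": len(entries),
        # Count only sources that converted successfully.
        "files_processed": sum(1 for e in entries if e.status == "ok"),
        # The pack total is the sum of the per-source token counts.
        "total_tokens": sum(e.tokens for e in entries),
        "sources": [asdict(e) for e in entries],
    }
    return json.dumps(manifest, indent=2)
```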

Five representations per source

  • Markdown: clean, source-labeled text you can paste into any model.
  • HTML: the same content as markup, structure intact.
  • DocTags: Docling's layout tags for fine-grained downstream parsing.
  • JSON: the full structured Docling document for your own tooling.
  • RAG chunks: heading-aware chunks, split on structure, not character count.
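The difference between splitting on structure and splitting on character count can be shown with a deliberately simplified sketch: it cuts Markdown at headings and attaches the heading path to each chunk, so a chunk from a subsection still carries its parent section titles. This is a stand-in technique, not Docling's chunker (Docling ships its own, such as `HybridChunker`, which also takes token limits into account). `chunk_by_headings` is a hypothetical name, and the sketch ignores fenced code blocks, so a `#` line inside a fence would be misread as a heading.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown at headings; each chunk keeps the path of headings above it."""
    chunks: list[dict] = []
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open headings
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": [title for _, title in stack], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()  # close the chunk under the previous heading
        level = len(match.group(1))
        # Close headings at the same or a deeper level, so siblings replace each other.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2).strip()))
    flush()
    return chunks
```

Tracking heading levels on a stack, rather than by list position, keeps siblings correct even when a document skips a level (for example `#` followed directly by `###`).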

Stop pasting. Start compiling.

Free on small jobs, no card: up to 25 files or 100 MB per job, every output format, the side-by-side viewer, and files that auto-delete after 72 hours.