Question 1

Which file types are supported, and what do I get back?

Accepted Answer

In: PDF (digital and scanned), DOCX, PPTX, TXT, Markdown, HTML, and HTM, plus ZIP bundles that mix any of these. Drop a ZIP and FileDigest expands it and processes the files inside as one job. Out: a combined digest.md and manifest.json, plus per-source Markdown, HTML, Docling DocTags, Docling JSON, and heading-aware RAG chunks. Everything is downloadable.

Question 2

How can I check the conversion is accurate?

Accepted Answer

You read it. The viewer puts the original document on the left and the converted output on the right so you can scroll them together, and PDFs render inline. Switch the right pane between Markdown, HTML, chunks, DocTags, and JSON to see exactly what would reach the model. The source index in the pack lists every file with its type, page count, token count, and status.

Question 3

Does a language model rewrite my documents?

Accepted Answer

No. The engine is Docling, a deterministic open-source converter. It recovers structure from the document itself, with no LLM paraphrasing in the pipeline. Because no language model is in the loop, nothing is being trained on your files.

Question 4

How does FileDigest handle scanned PDFs and tables?

Accepted Answer

Accurate-tables mode uses Docling's TableFormer to reconstruct table structure, and OCR reads scanned and image-only pages. OCR is available on Pro. FileDigest also auto-detects a scanned PDF that came back nearly empty and retries it with OCR, so you do not have to flag it yourself. Optional enrichments can also convert formulas to LaTeX, extract code, and describe pictures.
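The auto-retry for scanned PDFs can be pictured roughly as follows. This is a minimal sketch, not FileDigest's actual code: the `ConversionResult` type, the `convert` callable, and the 20-characters-per-page threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ConversionResult:
    """Hypothetical result type: extracted text per page, plus how it was produced."""
    page_texts: List[str]
    used_ocr: bool


# Assumed threshold: an average below this many non-whitespace characters
# per page is treated as "came back nearly empty".
MIN_CHARS_PER_PAGE = 20


def looks_scanned(result: ConversionResult) -> bool:
    """Return True when a text-layer conversion yielded almost no text."""
    if not result.page_texts:
        return True
    total = sum(len("".join(text.split())) for text in result.page_texts)
    return total / len(result.page_texts) < MIN_CHARS_PER_PAGE


def convert_with_ocr_fallback(
    path: str,
    convert: Callable[[str, bool], ConversionResult],
    ocr_allowed: bool = True,
) -> ConversionResult:
    """Convert without OCR first; retry once with OCR if the output is nearly empty.

    `convert(path, use_ocr)` stands in for whatever actually runs the converter.
    `ocr_allowed` models a plan that includes OCR.
    """
    first = convert(path, False)
    if ocr_allowed and looks_scanned(first):
        return convert(path, True)
    return first
```

The point of the design is that the cheap text-layer pass always runs first, and OCR is only paid for when that pass clearly failed.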

Question 5

Is there an API for agents and scripts?

Accepted Answer

Yes. The same Docling pipeline runs behind a REST API: submit a job with POST /v1/parse and poll it with GET /v1/jobs/{id} using a Bearer key. The contract is published as OpenAPI 3.1, errors follow RFC 9457 problem+json, and idempotency keys let retries stay safe. The /v1 API is available to any signed-in account within its plan limits, and it returns the same output as the dashboard.
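A submit-and-poll client might look like the sketch below. Only the two paths, the Bearer key, and the problem+json error format come from the answer above. The host name, the `Idempotency-Key` header name, the `status` field, and its terminal values are assumptions; the published OpenAPI contract is the place to check the real request body and response shape.

```python
import json
import time
import urllib.error
import urllib.request
import uuid

BASE_URL = "https://api.example.com"  # hypothetical host; substitute the real one


def build_parse_request(api_key, payload, idempotency_key=None):
    """Build (but do not send) the POST /v1/parse request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Header name is an assumption. Reusing the same key on a retry is
        # what lets the server recognise the duplicate submission.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/parse", data=body, headers=headers, method="POST"
    )


def build_job_request(api_key, job_id):
    """Build the GET /v1/jobs/{id} request used for polling."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode JSON; surface RFC 9457 problem details on error."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        try:
            problem = json.loads(err.read().decode("utf-8"))
        except ValueError:
            problem = {}
        # "title", "status" and "detail" are standard problem+json members.
        raise RuntimeError(
            f"{problem.get('status', err.code)} "
            f"{problem.get('title', 'request failed')}: {problem.get('detail', '')}"
        ) from err


def wait_for_job(api_key, job_id, transport=None, interval=2.0, max_polls=150,
                 terminal=("succeeded", "failed")):
    """Poll the job until it reaches a terminal status (status names are assumed)."""
    transport = transport or send
    for _ in range(max_polls):
        job = transport(build_job_request(api_key, job_id))
        if job.get("status") in terminal:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In use, you would send `build_parse_request(...)` once, read the job identifier from the response, and hand it to `wait_for_job`. If the submit call times out, resend it with the same idempotency key rather than a fresh one.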

Question 6

Where are my files stored, and how long are they kept?

Accepted Answer

Uploads go to per-user private storage, every request is gated by an ownership check, and downloads are served through short-lived signed links rather than public URLs. Files auto-delete on a schedule set by your plan: 72 hours on Free and 30 days on Pro, with custom retention including zero-retention on Enterprise. Docling does not train on your documents.
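For readers unfamiliar with signed links, here is one common way such a link can work: an HMAC over the path, the owner, and an expiry time. This is a generic illustration, not a description of FileDigest's storage layer; the secret, the query parameter names, and the five-minute lifetime are all assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-server-side-secret"  # hypothetical; never shipped to clients


def _signature(path, user_id, expires):
    message = f"{path}|{user_id}|{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_download(path, user_id, ttl_seconds=300, now=None):
    """Return a link that is only valid for this path and user until it expires."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({
        "user": user_id,
        "expires": expires,
        "sig": _signature(path, user_id, expires),
    })
    return f"{path}?{query}"


def verify_download(url, now=None):
    """Check expiry first, then the signature, using a constant-time comparison."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        user_id = params["user"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, user_id, expires))
```

Because the path and owner are part of the signed message, changing either one in the URL invalidates the link, and an expired link fails even if the signature is intact.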

Question 7

What are the limits and pricing?

Accepted Answer

Free covers up to 25 files or 100 MB per job with all output formats. Pro is $15 a month and adds OCR, the API, and up to 100 files or 1 GB per job. Enterprise adds custom volume and retention, a DPA, SSO, an SLA, and the option to self-host the same Docling engine.
