# FileDigest Full Public Context
FileDigest converts PDFs, DOCX, PPTX, TXT, Markdown, HTML, and ZIP bundles into AI-ready Markdown digests and manifest.json files. This file is a crawlable public context bundle generated from product pages and docs.
# Product Pages
## FileDigest Examples (Real Output Packet)
URL: https://filedigest.dev/examples
Description: Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.
Start here to inspect the actual output FileDigest produces. This public demo packet was generated by uploading public or permissively licensed files through the live FileDigest app. The files below are the stored artifacts from that job.
## Public demo packet
The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, which shows how Docling extracts structure from a real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from a scan with no selectable text.
| Item | Value |
| --- | --- |
| Production job | `df56be0156354d259b5b63b4e08dabd4` |
| Final status | `SUCCEEDED` |
| Files parsed | 7 of 7 |
| Output tokens | 24,017 |
| RAG chunks | 69 |
| Warnings | None |
| Engine | Docling on GPU workers |
## Download the generated outputs
- [Download digest.md](/proof/filedigest-public-demo-2026-04-30/outputs/digest.md): the combined, source-organized Markdown context pack.
- [Download manifest.json](/proof/filedigest-public-demo-2026-04-30/outputs/manifest.json): structured run metadata plus, for each source, the full set of representations (see below).
- [Download provenance.json](/proof/filedigest-public-demo-2026-04-30/provenance.json): source URLs, hashes, and job provenance.
## What is inside each manifest source
The upgraded engine returns more than plain text. For every source file, `manifest.json` includes a `representations` block with:
- `markdown`: clean Markdown for that source.
- `html`: rendered HTML.
- `doctags`: Docling DocTags (structured layout tokens with positions).
- `docling_json`: the full DoclingDocument JSON.
- `chunks`: heading-contextualized chunks ready to embed for retrieval.
In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm tables, headings, and figures landed correctly before using the output.
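As a rough illustration, the representations could be read from a downloaded `manifest.json` along these lines. Only the `representations` block and its five keys come from the description above; the top-level `sources` list and the per-source `name` field are assumptions made for this sketch, so check a real manifest or the OpenAPI spec before relying on them.

```python
from typing import Any, Iterator

# Representation keys named in the list above.
REPRESENTATION_KEYS = ("markdown", "html", "doctags", "docling_json", "chunks")


def iter_chunks(manifest: dict[str, Any]) -> Iterator[tuple[str, Any]]:
    """Yield (source_name, chunk) pairs from a parsed manifest.

    ASSUMPTION: sources sit under a top-level "sources" list and each one
    has a "name" field. Only "representations" and its keys are documented.
    """
    for source in manifest.get("sources", []):
        name = source.get("name", "<unnamed>")
        representations = source.get("representations", {})
        for chunk in representations.get("chunks", []):
            yield name, chunk


def missing_representations(source: dict[str, Any]) -> list[str]:
    """List the documented representation keys absent from one source."""
    present = source.get("representations", {})
    return [key for key in REPRESENTATION_KEYS if key not in present]
```

Checking for missing keys before embedding is a cheap way to notice a source that parsed only partially.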
## Download the original inputs
| File | Source | License / status |
| --- | --- | --- |
| [nist-ai-risk-management-framework.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/nist-ai-risk-management-framework.pdf) | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov) |
| [scanned-field-log.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/scanned-field-log.pdf) | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0 |
| [Earth_Lithograph.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/Earth_Lithograph.pdf) | NASA Earth Lithograph | NASA educational media |
| [ffc.docx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.docx) | file-format-commons DOCX sample | CC0-1.0 |
| [ffc.pptx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.pptx) | file-format-commons PPTX sample | CC0-1.0 |
| [mdn-beginner-html-index.html](/proof/filedigest-public-demo-2026-04-30/inputs/mdn-beginner-html-index.html) | MDN beginner HTML sample | CC0-1.0 |
| [good-readme-template.md](/proof/filedigest-public-demo-2026-04-30/inputs/good-readme-template.md) | Public README template | CC0-1.0 |
## How this packet was produced
1. Seven public or permissively licensed files were collected and archived.
2. The files were uploaded through the app into private storage.
3. The job was processed by the production Docling engine (worker time 21.3 seconds on this run).
4. The generated `digest.md` and `manifest.json` were downloaded from the job detail page.
5. The production job was deleted after the public artifact copies were saved.
This is one public demo packet, not a universal benchmark. It does not prove how every scanned, damaged, encrypted, image-heavy, or unusually formatted file will parse. It does show the exact output contract the live app produces on a mixed public packet.
## How to reproduce it
Download the input files above, open the FileDigest dashboard, then drop, paste, or choose them. Processing starts automatically (there is no separate upload-then-process step) and routes you to a live job view. Keep the default fast extraction mode. The outputs should follow the same contract, though token counts and worker time may vary with engine updates.
## Use it from an agent
The same job runs behind the API. Submit with `POST /v1/parse` (Bearer key) and poll `GET /v1/jobs/{id}`; when the job completes, the response carries the digest plus the per-source representations and chunks. See the [OpenAPI spec](/openapi.json) and [agent docs](/llms.txt).
## Privacy
URL: https://filedigest.dev/privacy
Description: FileDigest privacy overview.
Last updated: April 28, 2026.
FileDigest prepares user-uploaded documents for AI workflows. This notice explains what we collect, how we use it, and how to contact us about privacy or deletion requests.
## What we collect
FileDigest collects account information such as email address, authentication identifiers, subscription metadata, and basic operational logs.
When you create a document job, FileDigest stores metadata about the job, including file names, file sizes, MIME types, job status, processing options, token estimates, artifact metadata, timestamps, and error states.
Uploaded source files and generated artifacts are stored in private object storage paths associated with your user and job. Document processing runs through the Modal Docling engine.
FileDigest also collects privacy-safe analytics and attribution data, such as page views, CTA clicks, UTM parameters, ad click identifiers, landing paths, and referrer host names. We do not send document contents, file names, storage keys, email addresses, or raw referrer URLs in product analytics events.
## How we use data
We use account, billing, storage, and processing data to provide the FileDigest service, enforce plan limits, process documents, generate artifacts, troubleshoot failures, prevent abuse, and improve product reliability.
We use attribution and analytics data to understand which pages and campaigns lead to signup, successful document jobs, and checkout intent.
We do not sell uploaded documents. We do not intentionally train foundation models on user-uploaded files as part of the FileDigest service. Document conversion runs on the deterministic open-source Docling engine rather than a proprietary model, so there is no model that learns from your files.
## Processors
FileDigest relies on third-party infrastructure providers for hosting, storage, and document parsing, including Vercel, Supabase, Modal, Stripe, and optional email, monitoring, and analytics providers. These providers process data only as needed to operate the service.
## Retention
Artifact retention depends on your plan. Free jobs use 72-hour artifact retention, Pro jobs use 30-day artifact retention, and Enterprise jobs use custom retention (including zero-retention). Deleting a job is designed to remove its application metadata and associated private storage objects through the cleanup workflow.
## Security
The browser never receives the Modal engine API key. Downloads require an authenticated ownership check and are served through short-lived signed URLs. Plan limits are enforced before expensive processing begins.
## Contact
For privacy or deletion requests, contact support@filedigest.dev.
## Terms & Conditions
URL: https://filedigest.dev/terms
Description: FileDigest terms and acceptable use overview.
Last updated: April 28, 2026.
These terms describe the operating rules for FileDigest accounts, uploads, processing, billing, and generated artifacts.
## Service
FileDigest is a document preparation SaaS that converts uploaded files into AI-ready artifacts such as Markdown digests and JSON manifests. Processing runs on the Modal Docling engine. The service is not a legal, medical, financial, or compliance adviser.
## Accounts
You are responsible for maintaining access to your account and for activity performed under it. You may only upload files that you have the right to process.
## Acceptable use
Do not use FileDigest to process illegal material, violate third-party rights, bypass security controls, attack the service, or upload content that you are not permitted to handle.
## Billing
Paid plans are billed through Stripe. Plan limits, OCR access, monthly quotas, retention, and pricing are shown on the pricing page and may change for future customers. Existing subscriptions are managed through the billing portal.
## Uploaded content
You keep ownership of your uploaded content. FileDigest stores and processes uploaded files only to provide the service, generate artifacts, enforce limits, and operate the platform.
## Availability
FileDigest depends on third-party infrastructure providers. We aim to keep the service reliable, but processing failures, provider outages, timeouts, and document parsing errors can occur.
## Limitation of liability
Use FileDigest outputs with human review. Document conversion can be incomplete or incorrect, especially for scanned files, complex layouts, tables, figures, or OCR-heavy material.
## Contact
Questions about these terms can be sent to support@filedigest.dev.
# Documentation
## How FileDigest Works
URL: https://filedigest.dev/docs
Description: A user-facing overview of FileDigest document preparation and the pipeline that runs after you upload.
FileDigest turns supported source documents into AI-ready artifacts you can inspect before using them in ChatGPT, Claude, RAG prep, or analyst workflows.
The core output is a readable `digest.md` plus a structured `manifest.json`. The digest is for humans and LLM context windows. The manifest is for file-level review, metadata checks, and repeatable downstream workflows.
### Upload a document packet
Start with PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle containing supported files.
### Choose processing options
Select fast text extraction for normal jobs, accurate tables when structure matters more than speed, or OCR when your plan includes scanned-document processing.
### Review the result
Open the completed job to inspect the digest, manifest, parsed files, warnings, failed files, and token estimates.
### Download private artifacts
Download `digest.md` and `manifest.json` through authenticated, short-lived links tied to your account.
FileDigest is intentionally narrow: it prepares source documents for AI use. It is not a chat app, not a public file host, and not a replacement for human review.
## What happens after you upload
FileDigest separates upload, validation, processing, and artifact download so each step is visible.
### Create a job
The workbench checks file count, job size, estimated output tokens, OCR access, and monthly quota.
### Upload privately
Your browser uploads selected files to private storage paths assigned to your job.
### Register files
After upload, FileDigest confirms the files exist and prepares the packet for processing.
### Generate artifacts
The processing engine converts supported inputs into `digest.md` and `manifest.json`.
### Review the job
The job page shows status, warnings, failed files, digest preview, manifest preview, and private downloads.
Ready to try it? Follow [Create your first digest](/docs/first-digest), or call the same pipeline from code with the [API](/docs/api).
## API Reference
URL: https://filedigest.dev/docs/api
Description: Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.
FileDigest ships a small public API so an agent or a script can do exactly what the dashboard does: send a file, get back clean Markdown, structured per-source representations, and RAG chunks. There are two endpoints, both authenticated with a Bearer key.
## Authentication
Every request needs an API key. Create one in your dashboard under [FileDigest Settings](/dashboard/filedigest/settings), then send it as a Bearer token:
```http
Authorization: Bearer fd_live_...
```
Keys are tied to your account and your plan limits. Calls without a valid key return `401`.
## Submit a parse job
`POST /v1/parse` accepts either a multipart `file` or a JSON `{ source_url }`. It wraps the create, upload, register, and process steps in a single call and returns `202` with a job id to poll.
```bash
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-F "file=@report.pdf" \
-F "mode=accurate_tables"
```
To parse a file by URL instead of uploading bytes:
```bash
curl -X POST https://filedigest.dev/v1/parse \
-H "Authorization: Bearer fd_live_..." \
-H "Content-Type: application/json" \
-d '{ "source_url": "https://example.com/report.pdf", "ocr": true }'
```
Response:
```json
{ "job_id": "abc123", "status": "accepted", "poll": "/v1/jobs/abc123" }
```
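For callers that prefer Python to curl, the URL-based request above might be built roughly as follows. This is a sketch using only the standard library; the function names are illustrative and not part of any FileDigest SDK. The endpoint, headers, and JSON fields mirror the curl example.

```python
import json
import urllib.request

API_BASE = "https://filedigest.dev"  # host used in the curl examples


def build_parse_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/parse request for a URL source."""
    body = json.dumps({"source_url": source_url, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/parse",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic happens.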
### Options
| Field | Values | What it does |
| --- | --- | --- |
| `mode` | `fast_text`, `accurate_tables` | Extraction strategy. Use accurate tables when structure matters. |
| `ocr` | `true`, `false` | Run OCR on scanned or image-only pages (requires a plan with OCR). |
| `quality` | `standard`, `high` | High uses the VLM pipeline for hard layouts (slower). |
| `enrich_formulas` | `true`, `false` | Convert math to LaTeX (slower). |
| `enrich_code` | `true`, `false` | Detect code blocks and language (slower). |
| `describe_pictures` | `true`, `false` | Generate image captions (slower, VLM). |
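A client could check options against the table above before sending a JSON request. The sketch below is a local convenience, not the server's validation logic, and the error wording is made up for illustration.

```python
from typing import Any

# Allowed values copied from the options table.
STRING_OPTIONS = {
    "mode": ("fast_text", "accurate_tables"),
    "quality": ("standard", "high"),
}
BOOLEAN_OPTIONS = ("ocr", "enrich_formulas", "enrich_code", "describe_pictures")


def option_errors(options: dict[str, Any]) -> list[str]:
    """Return a human-readable error for each unknown or invalid option."""
    errors = []
    for field, value in options.items():
        if field in STRING_OPTIONS:
            allowed = STRING_OPTIONS[field]
            if value not in allowed:
                errors.append(f"{field} must be one of {', '.join(allowed)}")
        elif field in BOOLEAN_OPTIONS:
            # Require real booleans so a string like "true" is not accepted by accident.
            if not isinstance(value, bool):
                errors.append(f"{field} must be true or false")
        else:
            errors.append(f"unknown option: {field}")
    return errors
```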
### Idempotency
Send an `Idempotency-Key` header to make retries safe. Replaying the same key returns the original job instead of creating a duplicate.
### Limits and errors
The file size limit is 100 MB per API request. Over-limit, quota, and engine errors are returned as RFC 9457 problem details with a `code` field (for example `QUOTA_EXCEEDED`, `FILE_TOO_LARGE`, `MODAL_UNAVAILABLE`).
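A client might pair a per-job idempotency key with a small parser for the error body. In this sketch, `status`, `title`, and `detail` are standard RFC 9457 members and `code` is the field described above. Using a UUID as the key and treating only `MODAL_UNAVAILABLE` as retryable are assumptions, since the docs do not specify a key format or a retry policy.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional


def new_idempotency_key() -> str:
    """Create one key per logical job and reuse it on every retry of that job.

    ASSUMPTION: any unique string is accepted; a UUID is a common choice.
    """
    return str(uuid.uuid4())


@dataclass
class ApiProblem:
    status: Optional[int]
    title: Optional[str]
    detail: Optional[str]
    code: Optional[str]  # for example QUOTA_EXCEEDED or FILE_TOO_LARGE


def parse_problem(body: bytes) -> ApiProblem:
    """Decode an RFC 9457 problem-details body, tolerating missing members."""
    data = json.loads(body)
    return ApiProblem(
        status=data.get("status"),
        title=data.get("title"),
        detail=data.get("detail"),
        code=data.get("code"),
    )


def is_retryable(problem: ApiProblem) -> bool:
    # ASSUMPTION: quota and size errors will not succeed on retry,
    # while engine unavailability may be temporary.
    return problem.code == "MODAL_UNAVAILABLE"
```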
## Poll for the result
`GET /v1/jobs/{id}` returns the current status. While the job is `pending` or `processing`, poll until it reaches `completed` or `failed`.
```bash
curl https://filedigest.dev/v1/jobs/abc123 \
-H "Authorization: Bearer fd_live_..."
```
A completed job carries the result inline:
```json
{
"job_id": "abc123",
"status": "completed",
"result": {
"tokens": 24017,
"parsed_files": 7,
"failed_files": 0,
"digest": "# report.pdf\n...AI-ready Markdown...",
"manifest": { }
}
}
```
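The polling step can be sketched as a loop with exponential backoff. The status names come from the description above; the intervals and timeout are arbitrary assumptions, and `fetch_status` is a hypothetical callable that performs `GET /v1/jobs/{id}` and returns the decoded JSON.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, doubling the wait each time."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it testable without network access.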
## Output
The `result` block holds everything you need downstream:
- `digest`: the combined, source-organized Markdown context pack (the same `digest.md` you download in the app).
- `manifest`: structured run metadata plus, for each source, a `representations` block with `markdown`, `html`, `doctags`, `docling_json`, and heading-contextualized `chunks` ready to embed.
- `tokens`, `parsed_files`, `failed_files`: counts for the run.
The dashboard also exposes the matching `provenance.json` for source URLs, hashes, and job provenance. See the [Examples](/examples) page for a real packet you can download.
## Machine-readable contract and agent files
- [OpenAPI 3.1 spec](/openapi.json): the full machine contract for both endpoints.
- [llms.txt](/llms.txt): a short agent-discovery file describing the product and its endpoints.
- [llms-full.txt](/llms-full.txt): the expanded agent-discovery file.
These let an agent discover and call FileDigest without reading this page first.
## Dashboard Guide
URL: https://filedigest.dev/docs/dashboard-guide
Description: Main FileDigest dashboard areas and what each one is for.
## FileDigest
Create new jobs, upload source files, choose OCR or table settings, and start processing.
## Job detail
Review the current status, output preview, manifest preview, file list, warnings, and private downloads for one job.
## Usage
Check monthly output-token usage and recent processing activity.
## Billing
Review the active plan and open billing management for paid subscriptions.
## Settings
Review plan limits, retention, OCR availability, and account settings.
## Create Your First Digest
URL: https://filedigest.dev/docs/first-digest
Description: How to upload files and produce your first FileDigest output.
### Open the workbench
Sign in and open the FileDigest workbench from the dashboard.
### Choose files
Upload PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle. The page shows your current plan limits before processing starts.
### Select options
Use fast extraction for most jobs. Choose accurate tables for structure-heavy files. Turn on OCR only when your plan supports it and the file needs image-based text recognition.
### Start processing
FileDigest uploads files to private storage, registers the job, and starts secure document processing.
### Inspect and download
When the job finishes, review `digest.md`, inspect `manifest.json`, copy the digest, or download the private artifacts.
If a job fails, check the file list and warnings first. Most failures come from unsupported file types, oversized jobs, password-protected files, or scans that need OCR.
## Login And Email
URL: https://filedigest.dev/docs/login-email
Description: Account access, email sign-in, and support contact guidance.
FileDigest uses email-based account access so your jobs, plan, and private artifacts stay tied to your user account.
### Sign in
Use the sign-in button and enter the email address you want associated with your FileDigest work.
### Confirm access
Follow the sign-in email from the same browser when possible. If a link expires, request a fresh one.
### Open your dashboard
After sign-in, the dashboard opens the FileDigest workbench and your job history.
### Contact support
Use `support@filedigest.dev` for account access, billing, failed jobs, or retention questions.
## Options And Limits
URL: https://filedigest.dev/docs/options-limits
Description: Processing choices, plan limits, and retention behavior.
FileDigest checks your plan before processing starts.
## Processing options
- Fast text extraction is the default for clean digital PDFs, DOCX, PPTX, text, Markdown, and HTML.
- Accurate tables is useful when table structure matters more than speed.
- OCR is available on paid plans for scanned or image-heavy PDFs.
## Plan limits
| Plan | Files per job | Job size | Monthly output tokens | Retention |
|---|---:|---:|---:|---:|
| Free | 25 | 100 MB | 2M | 72 hours |
| Pro | 100 | 1 GB | 100M | 30 days |
| Enterprise | Custom | Custom | Custom | Custom |
Output token estimates are safeguards. They help prevent a packet from exceeding your monthly quota or producing an artifact too large for practical AI use.
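The limits in the table can be mirrored in a local pre-check before uploading. This is an illustrative sketch, not the service's enforcement code. It assumes 1 MB means 1,048,576 bytes and 1 GB means 1024 MB, which the docs do not specify, and it leaves out Enterprise because those limits are custom.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # ASSUMPTION: binary megabytes


@dataclass(frozen=True)
class PlanLimits:
    files_per_job: int
    job_bytes: int
    monthly_tokens: int


# Values from the plan limits table.
LIMITS = {
    "free": PlanLimits(25, 100 * MB, 2_000_000),
    "pro": PlanLimits(100, 1024 * MB, 100_000_000),
}


def preflight(plan: str, file_sizes: list[int], estimated_tokens: int, tokens_used: int) -> list[str]:
    """Return the limits a job would exceed; raises KeyError for unknown plans."""
    limits = LIMITS[plan.lower()]
    problems = []
    if len(file_sizes) > limits.files_per_job:
        problems.append(f"too many files: {len(file_sizes)} > {limits.files_per_job}")
    if sum(file_sizes) > limits.job_bytes:
        problems.append("job too large")
    if tokens_used + estimated_tokens > limits.monthly_tokens:
        problems.append("monthly token quota exceeded")
    return problems
```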
## Retention
Artifacts are retained according to plan. Download important digests before the retention window closes.
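To plan downloads, the retention windows from the table can be turned into an expiry estimate. The sketch assumes the window starts when the job completes; the docs do not say which timestamp the service uses, so treat the result as approximate.

```python
from datetime import datetime, timedelta
from typing import Optional

# Retention windows from the plan table. Enterprise is custom, so it has no fixed value.
RETENTION = {
    "free": timedelta(hours=72),
    "pro": timedelta(days=30),
}


def artifact_expiry(plan: str, completed_at: datetime) -> Optional[datetime]:
    """Estimate when artifacts expire, or return None for custom retention.

    ASSUMPTION: the retention window is measured from job completion.
    """
    window = RETENTION.get(plan.lower())
    return None if window is None else completed_at + window
```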
## Plans And Billing
URL: https://filedigest.dev/docs/plans-billing
Description: FileDigest plan limits, billing behavior, and subscription changes.
FileDigest plans control file count, job size, OCR access, monthly output tokens, and artifact retention.
| Plan | Price | Core limits |
|---|---:|---|
| Free | $0 | 25 files/job, 100 MB/job, OCR off, 2M output tokens/month, 72-hour retention |
| Pro | $15/month or $144/year | 100 files/job, 1 GB/job, OCR on, 100M output tokens/month, 30-day retention |
| Enterprise | Contact sales | Custom volume and retention (including zero-retention), OCR on, DPA, SSO, and an SLA |
## Upgrades
Choose a paid plan from pricing when you need larger packets, OCR, more monthly output tokens, or longer retention.
## Billing management
Paid users manage plan changes, invoices, cancellation, and renewal details from the billing page.
## Custom needs
Email `support@filedigest.dev` for retention, volume, team workflow, or API roadmap questions.
## Supported Files
URL: https://filedigest.dev/docs/supported-files
Description: File types, bundles, and outputs supported by FileDigest.
## Inputs
Primary formats are PDF, DOCX, and PPTX. FileDigest also accepts TXT, Markdown, HTML, HTM, and ZIP bundles containing supported files.
ZIP bundles are useful for packets that belong together: a paper plus appendix, a client packet plus notes, or a policy document plus supporting text.
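A bundle like this could be assembled locally while leaving out files the service would reject. The sketch below uses only the standard library. The extension set is inferred from the formats listed above, and treating Markdown as `.md` only is an assumption.

```python
import zipfile
from pathlib import Path

# Extensions for the listed formats. ASSUMPTION: Markdown files end in ".md".
SUPPORTED = {".pdf", ".docx", ".pptx", ".txt", ".md", ".html", ".htm"}


def build_bundle(folder: Path, bundle: Path) -> list[Path]:
    """Zip the supported files under `folder` and return the files skipped.

    Write `bundle` outside `folder` so the archive does not scan itself.
    """
    skipped = []
    with zipfile.ZipFile(bundle, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in SUPPORTED:
                # Store paths relative to the folder so the archive stays portable.
                archive.write(path, path.relative_to(folder).as_posix())
            else:
                skipped.append(path)
    return skipped
```

Returning the skipped files gives the caller a chance to convert or drop them before uploading.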
## Outputs
Every successful job is built around two artifacts:
- `digest.md`: readable Markdown designed for AI context windows and human review.
- `manifest.json`: structured metadata for file status, artifact status, sizes, warnings, and token estimates, plus a per-source `representations` block.
## What to avoid
Avoid password-protected documents, unsupported binaries, huge media files, and ZIP bundles that mostly contain unsupported formats.
For scans or image-heavy PDFs, use OCR on a paid plan.
## Troubleshooting
URL: https://filedigest.dev/docs/troubleshooting
Description: Common FileDigest job issues and what to check first.
### Unsupported file
Use PDF, DOCX, PPTX, TXT, Markdown, HTML, HTM, or ZIP bundles. Files inside a ZIP must also use supported extensions.
### Job too large
Reduce the number of files, split the ZIP bundle, or upgrade if your packet exceeds your plan's file, size, or token limits.
### OCR needed
Image-heavy scans may produce poor text until OCR is enabled on a paid plan.
### Partial output
A partial job can still produce a useful digest. Check the file tab for failed files and warnings before deciding whether to re-run the packet.
### Download unavailable
Downloads require the signed-in owner of the job. If artifacts expired under your plan's retention window, create a new job.