# FileDigest Full Public Context

FileDigest converts PDFs, DOCX, PPTX, TXT, Markdown, HTML, and ZIP bundles into AI-ready Markdown digests and manifest.json files. This file is a crawlable public context bundle generated from product pages and docs.

# Product Pages

## FileDigest Examples (Real Output Packet)

URL: https://filedigest.dev/examples
Description: Download a reproducible public demo packet: the original source files plus the real FileDigest outputs, including per-source Markdown, HTML, Docling DocTags, JSON, and RAG chunks.

Start here to inspect the actual output FileDigest produces. This public demo packet was generated by uploading real public or permissively licensed files through the live FileDigest app. The files below are the actual stored artifacts from the job.

## Public demo packet

The featured document is the NIST AI Risk Management Framework (NIST AI 100-1), a born-digital government report dense with tables and structured sections, so you can see Docling extract clean, faithful structure from real layout. The packet also includes an image-only scanned page to show automatic OCR recovering text from an unselectable scan.

| Item | Value |
| --- | --- |
| Production job | `df56be0156354d259b5b63b4e08dabd4` |
| Final status | `SUCCEEDED` |
| Files parsed | 7 of 7 |
| Output tokens | 24,017 |
| RAG chunks | 69 |
| Warnings | None |
| Engine | Docling on GPU workers |

## Download the generated outputs

- [Download digest.md](/proof/filedigest-public-demo-2026-04-30/outputs/digest.md): the combined, source-organized Markdown context pack.
- [Download manifest.json](/proof/filedigest-public-demo-2026-04-30/outputs/manifest.json): structured run metadata plus, for each source, the full set of representations (see below).
- [Download provenance.json](/proof/filedigest-public-demo-2026-04-30/provenance.json): source URLs, hashes, and job provenance.

## What is inside each manifest source

The upgraded engine returns more than plain text. For every source file, `manifest.json` includes a `representations` block with:

- `markdown`: clean Markdown for that source.
- `html`: rendered HTML.
- `doctags`: Docling DocTags (structured layout tokens with positions).
- `docling_json`: the full DoclingDocument JSON.
- `chunks`: heading-contextualized chunks ready to embed for retrieval.

In the app these are shown in a side-by-side viewer: the original file on the left and any representation (Markdown, HTML, Chunks, DocTags, JSON) on the right, so you can confirm tables, headings, and figures landed correctly before using the output.
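As a minimal sketch, the check below reports which of the five representation keys each source in a downloaded `manifest.json` is missing. The key names come from the list above; the top-level `sources` list and the per-source `name` field are assumptions for illustration, so compare them against the real file before relying on this.

```python
import json
from pathlib import Path

# Representation keys listed in the section above.
EXPECTED_REPRESENTATIONS = ("markdown", "html", "doctags", "docling_json", "chunks")


def load_manifest(path: str) -> dict:
    """Read a downloaded manifest.json from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_representations(manifest: dict) -> dict[str, list[str]]:
    """Map each source to the expected representation keys it lacks.

    Assumption: sources live under a top-level "sources" list and each
    entry has a "name" field. Adjust both to match the real manifest.
    """
    report = {}
    for index, source in enumerate(manifest.get("sources", [])):
        name = source.get("name", f"source-{index}")
        present = source.get("representations", {})
        report[name] = [key for key in EXPECTED_REPRESENTATIONS if key not in present]
    return report
```

An empty list for every source means all five representations are present.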

## Download the original inputs

| File | Source | License / status |
| --- | --- | --- |
| [nist-ai-risk-management-framework.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/nist-ai-risk-management-framework.pdf) | NIST AI Risk Management Framework (NIST AI 100-1, January 2023) | Public domain (US Gov) |
| [scanned-field-log.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/scanned-field-log.pdf) | Image-only scan generated for this demo (auto-OCR showcase) | CC0-1.0 |
| [Earth_Lithograph.pdf](/proof/filedigest-public-demo-2026-04-30/inputs/Earth_Lithograph.pdf) | NASA Earth Lithograph | NASA educational media |
| [ffc.docx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.docx) | file-format-commons DOCX sample | CC0-1.0 |
| [ffc.pptx](/proof/filedigest-public-demo-2026-04-30/inputs/ffc.pptx) | file-format-commons PPTX sample | CC0-1.0 |
| [mdn-beginner-html-index.html](/proof/filedigest-public-demo-2026-04-30/inputs/mdn-beginner-html-index.html) | MDN beginner HTML sample | CC0-1.0 |
| [good-readme-template.md](/proof/filedigest-public-demo-2026-04-30/inputs/good-readme-template.md) | Public README template | CC0-1.0 |

## How this packet was produced

1. Seven public or permissively licensed files were collected and archived.
2. The files were uploaded through the app into private storage.
3. The job was processed by the production Docling engine (worker time 21.3 seconds on this run).
4. The generated `digest.md` and `manifest.json` were downloaded from the job detail page.
5. The production job was deleted after the public artifact copies were saved.

This is one public demo packet, not a universal benchmark. It does not prove how every scanned, damaged, encrypted, image-heavy, or unusually formatted file will parse. It does show the exact output contract the live app produces on a mixed public packet.

## How to reproduce it

Download the input files above, open the FileDigest dashboard, then drop, paste, or choose them. Processing starts automatically (there is no separate upload-then-process step) and routes you to a live job view. Keep the default fast extraction mode. The outputs should follow the same contract, though token counts and worker time may vary with engine updates.

## Use it from an agent

The same job runs behind the API. Submit with `POST /v1/parse` (Bearer key) and poll `GET /v1/jobs/{id}`; once the job completes, the response carries the digest plus the per-source representations and chunks. See the [OpenAPI spec](/openapi.json) and [agent docs](/llms.txt).

## Privacy

URL: https://filedigest.dev/privacy
Description: FileDigest privacy overview.

Last updated: April 28, 2026.

FileDigest prepares user-uploaded documents for AI workflows. This notice explains what we collect, how we use it, and how to contact us about privacy or deletion requests.

## What we collect

FileDigest collects account information such as email address, authentication identifiers, subscription metadata, and basic operational logs.

When you create a document job, FileDigest stores metadata about the job, including file names, file sizes, MIME types, job status, processing options, token estimates, artifact metadata, timestamps, and error states.

Uploaded source files and generated artifacts are stored in private object storage paths associated with your user and job. Document processing runs through the Modal Docling engine.

FileDigest also collects privacy-safe analytics and attribution data, such as page views, CTA clicks, UTM parameters, ad click identifiers, landing paths, and referrer host names. We do not send document contents, file names, storage keys, email addresses, or raw referrer URLs in product analytics events.

## How we use data

We use account, billing, storage, and processing data to provide the FileDigest service, enforce plan limits, process documents, generate artifacts, troubleshoot failures, prevent abuse, and improve product reliability.

We use attribution and analytics data to understand which pages and campaigns lead to signup, successful document jobs, and checkout intent.

We do not sell uploaded documents. We do not intentionally train foundation models on user-uploaded files as part of the FileDigest service. Document conversion runs on the deterministic open-source Docling engine rather than a proprietary model, so there is no model that learns from your files.

## Processors

FileDigest relies on third-party infrastructure providers for hosting, storage, and document parsing, including Vercel, Supabase, Modal, Stripe, and optional email, monitoring, and analytics providers. These providers process data only as needed to operate the service.

## Retention

Artifact retention depends on your plan. Free jobs use 72-hour artifact retention, Pro jobs use 30-day artifact retention, and Enterprise jobs use custom retention (including zero-retention). Deleted jobs are designed to remove application metadata and associated private storage objects through the cleanup workflow.

## Security

The browser never receives the Modal engine API key. Downloads are served through authenticated ownership checks and short-lived signed URLs. Plan limits are enforced before expensive processing begins.

## Contact

For privacy or deletion requests, contact support@filedigest.dev.

## Terms & Conditions

URL: https://filedigest.dev/terms
Description: FileDigest terms and acceptable use overview.

Last updated: April 28, 2026.

These terms describe the operating rules for FileDigest accounts, uploads, processing, billing, and generated artifacts.

## Service

FileDigest is a document preparation SaaS that converts uploaded files into AI-ready artifacts such as Markdown digests and JSON manifests. Processing runs on the Modal Docling engine. The service is not a legal, medical, financial, or compliance adviser.

## Accounts

You are responsible for maintaining access to your account and for activity performed under it. You may only upload files that you have the right to process.

## Acceptable use

Do not use FileDigest to process illegal material, violate third-party rights, bypass security controls, attack the service, or upload content that you are not permitted to handle.

## Billing

Paid plans are billed through Stripe. Plan limits, OCR access, monthly quotas, retention, and pricing are shown on the pricing page and may change for future customers. Existing subscriptions are managed through the billing portal.

## Uploaded content

You keep ownership of your uploaded content. FileDigest stores and processes uploaded files only to provide the service, generate artifacts, enforce limits, and operate the platform.

## Availability

FileDigest depends on third-party infrastructure providers. We aim to keep the service reliable, but processing failures, provider outages, timeouts, and document parsing errors can occur.

## Limitation of liability

Use FileDigest outputs with human review. Document conversion can be incomplete or incorrect, especially for scanned files, complex layouts, tables, figures, or OCR-heavy material.

## Contact

Questions about these terms can be sent to support@filedigest.dev.


# Documentation

## How FileDigest Works

URL: https://filedigest.dev/docs
Description: A user-facing overview of FileDigest document preparation and the pipeline that runs after you upload.

FileDigest turns supported source documents into AI-ready artifacts you can inspect before using them in ChatGPT, Claude, RAG prep, or analyst workflows.

The core output is a readable `digest.md` plus a structured `manifest.json`. The digest is for humans and LLM context windows. The manifest is for file-level review, metadata checks, and repeatable downstream workflows.

<Steps>

### Upload a document packet

Start with PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle containing supported files.

### Choose processing options

Select fast text extraction for normal jobs, accurate tables when structure matters more, or OCR when your plan includes scanned-document processing.

### Review the result

Open the completed job to inspect the digest, manifest, parsed files, warnings, failed files, and token estimates.

### Download private artifacts

Download `digest.md` and `manifest.json` through authenticated, short-lived links tied to your account.

</Steps>

FileDigest is intentionally narrow: it prepares source documents for AI use. It is not a chat app, not a public file host, and not a replacement for human review.

## What happens after you upload

FileDigest separates upload, validation, processing, and artifact download so each step is visible.

<Steps>

### Create a job

The workbench checks file count, job size, estimated output tokens, OCR access, and monthly quota.

### Upload privately

Your browser uploads selected files to private storage paths assigned to your job.

### Register files

After upload, FileDigest confirms the files exist and prepares the packet for processing.

### Generate artifacts

The processing engine converts supported inputs into `digest.md` and `manifest.json`.

### Review the job

The job page shows status, warnings, failed files, digest preview, manifest preview, and private downloads.

</Steps>

Ready to try it? Follow [Create your first digest](/docs/first-digest), or call the same pipeline from code with the [API](/docs/api).

## API Reference

URL: https://filedigest.dev/docs/api
Description: Parse any document into AI-ready context from your own code or an agent. One endpoint to submit, one to poll.

FileDigest ships a small public API so an agent or a script can do exactly what the dashboard does: send a file, get back clean Markdown, structured per-source representations, and RAG chunks. There are two endpoints, both authenticated with a Bearer key.

## Authentication

Every request needs an API key. Create one in your dashboard under [FileDigest Settings](/dashboard/filedigest/settings), then send it as a Bearer token:

```http
Authorization: Bearer fd_live_...
```

Keys are tied to your account and your plan limits. Calls without a valid key return `401`.

## Submit a parse job

`POST /v1/parse` accepts either a multipart `file` or a JSON `{ source_url }`. It hides the create, upload, register, and process steps and returns `202` with a job id to poll.

```bash
curl -X POST https://filedigest.dev/v1/parse \
  -H "Authorization: Bearer fd_live_..." \
  -F "file=@report.pdf" \
  -F "mode=accurate_tables"
```

To parse a file by URL instead of uploading bytes:

```bash
curl -X POST https://filedigest.dev/v1/parse \
  -H "Authorization: Bearer fd_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "source_url": "https://example.com/report.pdf", "ocr": true }'
```

Response:

```json
{ "job_id": "abc123", "status": "accepted", "poll": "/v1/jobs/abc123" }
```
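The URL-based submission can also be sketched in Python using only the standard library. The endpoint, headers, and `source_url` body mirror the curl example above; the function names are illustrative, and building the request is kept separate from sending it so the request can be inspected first.

```python
import json
import urllib.request

BASE_URL = "https://filedigest.dev"


def build_parse_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/parse request for a source URL."""
    body = json.dumps({"source_url": source_url, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/parse",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (job_id, status, poll).

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `submit(build_parse_request(key, "https://example.com/report.pdf", ocr=True))` would send the same body as the curl call above.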

### Options

| Field | Values | What it does |
| --- | --- | --- |
| `mode` | `fast_text`, `accurate_tables` | Extraction strategy. Use `accurate_tables` when structure matters more than speed. |
| `ocr` | `true`, `false` | Run OCR on scanned or image-only pages (requires a plan with OCR). |
| `quality` | `standard`, `high` | `high` uses the VLM pipeline for hard layouts (slower). |
| `enrich_formulas` | `true`, `false` | Convert math to LaTeX (slower). |
| `enrich_code` | `true`, `false` | Detect code blocks and language (slower). |
| `describe_pictures` | `true`, `false` | Generate image captions (slower, VLM). |
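A small client-side check can catch typos in these options before a request is sent. This is a sketch built from the table above; whether the server rejects unknown fields or coerces values is not documented here, so treat the check as a local convenience only.

```python
# Allowed values copied from the options table above.
STRING_OPTIONS = {
    "mode": ("fast_text", "accurate_tables"),
    "quality": ("standard", "high"),
}
BOOLEAN_OPTIONS = ("ocr", "enrich_formulas", "enrich_code", "describe_pictures")


def validate_options(options: dict) -> list[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    for field, value in options.items():
        if field in STRING_OPTIONS:
            if value not in STRING_OPTIONS[field]:
                problems.append(f"invalid value for {field}: {value!r}")
        elif field in BOOLEAN_OPTIONS:
            # Require a real boolean so strings like "yes" or ints like 1 are flagged.
            if not isinstance(value, bool):
                problems.append(f"{field} must be true or false, got {value!r}")
        else:
            problems.append(f"unknown option: {field}")
    return problems
```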

### Idempotency

Send an `Idempotency-Key` header to make retries safe. Replaying the same key returns the original job instead of creating a duplicate.
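The important detail is that the key is generated once and reused on every retry. A minimal sketch, assuming a hypothetical `send` callable that performs the `POST /v1/parse` call with the given extra headers and raises `ConnectionError` on a transient failure; using a UUID as the key value is also an assumption, since the required key format is not specified here.

```python
import uuid
from typing import Callable


def submit_with_retries(send: Callable[[dict], dict], max_attempts: int = 3) -> dict:
    """Call send(headers) up to max_attempts times with one shared Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generated once, then reused, so a replay maps to the original job.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as error:
            last_error = error
    raise last_error
```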

### Limits and errors

The file size limit is 100 MB per API request. Over-limit, quota, and engine errors come back as RFC 9457 problem details with a `code` field (for example `QUOTA_EXCEEDED`, `FILE_TOO_LARGE`, `MODAL_UNAVAILABLE`).
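A client might decode an error body along these lines. `title`, `status`, and `detail` are standard RFC 9457 members and `code` is the extension field named above; treating only `MODAL_UNAVAILABLE` as worth retrying is an assumption made for this sketch, not documented behavior.

```python
import json


def parse_problem(body: bytes) -> dict:
    """Extract the common RFC 9457 members plus the `code` extension field."""
    problem = json.loads(body)
    return {
        "code": problem.get("code"),
        "title": problem.get("title"),
        "status": problem.get("status"),
        "detail": problem.get("detail"),
    }


def is_retryable(code) -> bool:
    """Assumption: only an unavailable engine can succeed on an unchanged retry.

    Quota and size errors need a different plan or a smaller file first.
    """
    return code == "MODAL_UNAVAILABLE"
```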

## Poll for the result

`GET /v1/jobs/{id}` returns the current status. While the job is `pending` or `processing`, poll until it reaches `completed` or `failed`.

```bash
curl https://filedigest.dev/v1/jobs/abc123 \
  -H "Authorization: Bearer fd_live_..."
```

A completed job carries the result inline:

```json
{
  "job_id": "abc123",
  "status": "completed",
  "result": {
    "tokens": 24017,
    "parsed_files": 7,
    "failed_files": 0,
    "digest": "# report.pdf\n...AI-ready Markdown...",
    "manifest": { }
  }
}
```
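The polling step can be sketched as a loop that stops on either terminal status. Here `fetch` is a hypothetical callable that performs `GET /v1/jobs/{id}` and returns the decoded JSON body; the 2 second interval and 300 second timeout are arbitrary defaults chosen for illustration, not documented values.

```python
import time
from typing import Callable

# Terminal statuses named in the polling description above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, or raise TimeoutError."""
    waited = 0.0
    while True:
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. Callers still need to check whether the returned status is `completed` or `failed`.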

## Output

The `result` block holds everything you need downstream:

- `digest`: the combined, source-organized Markdown context pack (the same `digest.md` you download in the app).
- `manifest`: structured run metadata plus, for each source, a `representations` block with `markdown`, `html`, `doctags`, `docling_json`, and heading-contextualized `chunks` ready to embed.
- `tokens`, `parsed_files`, `failed_files`: counts for the run.

The dashboard also exposes the matching `provenance.json` for source URLs, hashes, and job provenance. See the [Examples](/examples) page for a real packet you can download.
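As a small illustration of reading the counts, the helper below turns a completed job response into a one-line summary. It uses only the documented `tokens`, `parsed_files`, and `failed_files` fields; treating the total file count as parsed plus failed is an assumption.

```python
def summarize_result(job: dict) -> str:
    """One-line summary of a completed job using the documented result counts."""
    if job.get("status") != "completed":
        raise ValueError(f"job is not completed: {job.get('status')!r}")
    result = job["result"]
    # Assumption: every file in the job is counted as either parsed or failed.
    total = result["parsed_files"] + result["failed_files"]
    return f"{result['parsed_files']} of {total} files parsed, {result['tokens']} output tokens"
```

For the sample response shown earlier, this would read "7 of 7 files parsed, 24017 output tokens".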

## Machine-readable contract and agent files

- [OpenAPI 3.1 spec](/openapi.json): the full machine contract for both endpoints.
- [llms.txt](/llms.txt): a short agent-discovery file describing the product and its endpoints.
- [llms-full.txt](/llms-full.txt): the expanded agent-discovery file.

These let an agent discover and call FileDigest without reading this page first.

## Dashboard Guide

URL: https://filedigest.dev/docs/dashboard-guide
Description: Main FileDigest dashboard areas and what each one is for.

## FileDigest

Create new jobs, upload source files, choose OCR or table settings, and start processing.

## Job detail

Review the current status, output preview, manifest preview, file list, warnings, and private downloads for one job.

## Usage

Check monthly output-token usage and recent processing activity.

## Billing

Review the active plan and open billing management for paid subscriptions.

## Settings

Review plan limits, retention, OCR availability, and account settings.

## Create Your First Digest

URL: https://filedigest.dev/docs/first-digest
Description: How to upload files and produce your first FileDigest output.

<Steps>

### Open the workbench

Sign in and open the FileDigest workbench from the dashboard.

### Choose files

Upload PDFs, DOCX, PPTX, TXT, Markdown, HTML, or a ZIP bundle. The page shows your current plan limits before processing starts.

### Select options

Use fast extraction for most jobs. Choose accurate tables for structure-heavy files. Turn on OCR only when your plan supports it and the file needs image-based text recognition.

### Start processing

FileDigest uploads files to private storage, registers the job, and starts secure document processing.

### Inspect and download

When the job finishes, review `digest.md`, inspect `manifest.json`, copy the digest, or download the private artifacts.

</Steps>

If a job fails, check the file list and warnings first. Most failures come from unsupported file types, oversized jobs, password-protected files, or scans that need OCR.

## Login And Email

URL: https://filedigest.dev/docs/login-email
Description: Account access, email sign-in, and support contact guidance.

FileDigest uses email-based account access so your jobs, plan, and private artifacts stay tied to your user account.

<Steps>

### Sign in

Use the sign-in button and enter the email address you want associated with your FileDigest work.

### Confirm access

Follow the sign-in email from the same browser when possible. If a link expires, request a fresh one.

### Open your dashboard

After sign-in, the dashboard opens the FileDigest workbench and your job history.

### Contact support

Use `support@filedigest.dev` for account access, billing, failed jobs, or retention questions.

</Steps>

## Options And Limits

URL: https://filedigest.dev/docs/options-limits
Description: Processing choices, plan limits, and retention behavior.

FileDigest checks your plan before processing starts.

## Processing options

- Fast text extraction is the default for clean digital PDFs, DOCX, PPTX, text, Markdown, and HTML.
- Accurate tables is useful when table structure matters more than speed.
- OCR is available on paid plans for scanned or image-heavy PDFs.

## Plan limits

| Plan | Files per job | Job size | Monthly output tokens | Retention |
|---|---:|---:|---:|---:|
| Free | 25 | 100 MB | 2M | 72 hours |
| Pro | 100 | 1 GB | 100M | 30 days |
| Enterprise | Custom | Custom | Custom | Custom |

Output token estimates are safeguards. They help prevent a packet from exceeding your monthly quota or producing an artifact too large for practical AI use.
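The same limits can be checked locally before uploading. This is a sketch of such a pre-flight check, not the service's own enforcement logic: the numbers come from the table above, and interpreting MB and GB as decimal units (10^6 and 10^9 bytes) is an assumption.

```python
from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass(frozen=True)
class PlanLimits:
    files_per_job: int
    job_bytes: int
    monthly_tokens: int


# Values from the plan limits table; Enterprise is custom and omitted.
PLANS = {
    "free": PlanLimits(files_per_job=25, job_bytes=100 * MB, monthly_tokens=2_000_000),
    "pro": PlanLimits(files_per_job=100, job_bytes=1000 * MB, monthly_tokens=100_000_000),
}


def check_job(plan: str, file_sizes: list[int], estimated_tokens: int, tokens_used: int) -> list[str]:
    """Return the limits a job would exceed; an empty list means it fits."""
    limits = PLANS[plan]
    problems = []
    if len(file_sizes) > limits.files_per_job:
        problems.append(f"too many files: {len(file_sizes)} > {limits.files_per_job}")
    if sum(file_sizes) > limits.job_bytes:
        problems.append(f"job too large: {sum(file_sizes)} bytes > {limits.job_bytes}")
    if tokens_used + estimated_tokens > limits.monthly_tokens:
        problems.append("estimated output would exceed the monthly token quota")
    return problems
```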

## Retention

Artifacts are retained according to plan. Download important digests before the retention window closes.
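To plan downloads, the expiry time can be estimated from the plan's retention window. A minimal sketch, assuming the window starts when the job completes; the actual start point is not stated here, so treat the result as an estimate.

```python
from datetime import datetime, timedelta

# Retention windows from the plan limits table; Enterprise is custom and omitted.
RETENTION = {
    "free": timedelta(hours=72),
    "pro": timedelta(days=30),
}


def artifacts_expire_at(plan: str, completed_at: datetime) -> datetime:
    """Estimate when artifacts expire (assumes retention starts at completion)."""
    return completed_at + RETENTION[plan]
```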

## Plans And Billing

URL: https://filedigest.dev/docs/plans-billing
Description: FileDigest plan limits, billing behavior, and subscription changes.

FileDigest plans control file count, job size, OCR access, monthly output tokens, and artifact retention.

| Plan | Price | Core limits |
|---|---:|---|
| Free | $0 | 25 files/job, 100 MB/job, OCR off, 2M output tokens/month, 72-hour retention |
| Pro | $15/month or $144/year | 100 files/job, 1 GB/job, OCR on, 100M output tokens/month, 30-day retention |
| Enterprise | Contact sales | Custom volume and retention (including zero-retention), OCR on, DPA, SSO, and an SLA |

## Upgrades

Choose a paid plan from pricing when you need larger packets, OCR, more monthly output tokens, or longer retention.

## Billing management

Paid users manage plan changes, invoices, cancellation, and renewal details from the billing page.

## Custom needs

Email `support@filedigest.dev` for retention, volume, team workflow, or API roadmap questions.

## Supported Files

URL: https://filedigest.dev/docs/supported-files
Description: File types, bundles, and outputs supported by FileDigest.

## Inputs

Primary formats are PDF, DOCX, and PPTX. FileDigest also accepts TXT, Markdown, HTML, HTM, and ZIP bundles containing supported files.

ZIP bundles are useful for packets that belong together: a paper plus appendix, a client packet plus notes, or a policy document plus supporting text.
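Before uploading a bundle, it can help to list any members that fall outside the supported formats. A sketch using Python's standard `zipfile` module; the extension set is inferred from the format names above (`.md` for Markdown is an assumption), and nested ZIP files are treated as unsupported because their handling is not described here.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed extensions for the formats named above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt", ".md", ".html", ".htm"}


def unsupported_members(bundle) -> list[str]:
    """List files in a ZIP whose extension is not in SUPPORTED_EXTENSIONS.

    `bundle` may be a filesystem path or a binary file object.
    """
    with zipfile.ZipFile(bundle) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in SUPPORTED_EXTENSIONS
        ]
```

An empty list means every file in the bundle has a supported extension.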

## Outputs

Every successful job is built around two artifacts:

- `digest.md`: readable Markdown designed for AI context windows and human review.
- `manifest.json`: structured metadata for file status, artifact status, sizes, warnings, and token estimates.

## What to avoid

Avoid password-protected documents, unsupported binaries, huge media files, and ZIP bundles that mostly contain unsupported formats.

For scans or image-heavy PDFs, use OCR on a paid plan.

## Troubleshooting

URL: https://filedigest.dev/docs/troubleshooting
Description: Common FileDigest job issues and what to check first.

<Steps>

### Unsupported file

Use PDF, DOCX, PPTX, TXT, Markdown, HTML, HTM, or ZIP bundles. Files inside a ZIP must also use supported extensions.

### Job too large

Reduce the number of files, split the ZIP bundle, or upgrade if your packet exceeds your plan's file, size, or token limits.

### OCR needed

Image-heavy scans may produce poor text until OCR is enabled on a paid plan.

### Partial output

A partial job can still produce a useful digest. Check the file tab for failed files and warnings before deciding whether to re-run the packet.

### Download unavailable

Downloads require the signed-in owner of the job. If artifacts expired under your plan's retention window, create a new job.

</Steps>