Portable Document Format (PDF) is a widely used format for sharing and publishing fixed-layout documents. Stencila supports both reading from and writing to PDF.
Use the .pdf file extension, or the --to pdf or --from pdf options, when converting to or from PDF, for example:
stencila convert doc.smd doc.pdf
When encoding to PDF, the default rendering method uses a headless browser to convert DOM HTML to PDF. Alternatively, use the --tool latex or --tool xelatex option to render via LaTeX instead:
stencila convert doc.smd doc.pdf --tool latex
Reading (decoding): PDFs are converted to Stencila documents using Mistral OCR (mistral-ocr-2505), which extracts text, structure, and images from PDF pages as Markdown. The extracted Markdown is then parsed into Stencila Schema nodes. For small PDFs (8 pages or fewer), metadata extraction and content extraction are done in a single pass. For larger PDFs, metadata is extracted from the first pages separately and combined with content from the full document. Results are cached based on the PDF's content hash to avoid redundant API calls.
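The caching and page-count behavior described above can be illustrated with a minimal sketch. This is not Stencila's implementation: the function names, the in-memory dictionary cache, and the choice of SHA-256 as the content hash are all assumptions made for illustration. Only the 8-page threshold comes from the text above.

```python
import hashlib
from typing import Callable, Dict

# Threshold stated in the text: PDFs with 8 pages or fewer are handled in one pass.
SMALL_PDF_MAX_PAGES = 8


def content_hash(pdf_bytes: bytes) -> str:
    """Derive a cache key from the PDF's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def choose_strategy(page_count: int) -> str:
    """Pick single-pass extraction for small PDFs, separate metadata extraction otherwise."""
    return "single-pass" if page_count <= SMALL_PDF_MAX_PAGES else "split"


def decode_cached(
    pdf_bytes: bytes,
    ocr: Callable[[bytes], str],
    cache: Dict[str, str],
) -> str:
    """Return Markdown for a PDF, calling the (hypothetical) OCR function only on a cache miss."""
    key = content_hash(pdf_bytes)
    if key not in cache:
        cache[key] = ocr(pdf_bytes)
    return cache[key]
```

Because the key depends only on the file's bytes, renaming or moving a PDF would still hit the cache in this sketch, while any change to its content would trigger a fresh OCR call.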
Writing (encoding): By default, documents are encoded to DOM HTML and converted to PDF using a headless browser. When the --tool latex or --tool xelatex option is used, documents are instead encoded to LaTeX first and then compiled to PDF using a LaTeX engine.