Sidedoc

AI-native document format that separates content from formatting.

Sidedoc lets AI tools work with documents efficiently while preserving rich formatting for human readers. It extracts a docx file into a .sidedoc/ directory containing markdown content and formatting metadata, from which the original docx can be reconstructed. A .sdoc ZIP archive provides a single-file format for sharing.

Project Status: Tables, Track Changes & Hyperlinks

What Works: Extract, sync, diff, and build, including tables, track changes, inline formatting, lists, images, and hyperlinks

Coming Next: Nested lists, headers/footers

The Problem

Current document workflows force a tradeoff between AI efficiency and human usability:

  • Reading documents: Extracting content for AI is expensive (raw OOXML runs 325,000+ tokens per document) and loses the link between content and its formatting
  • Creating documents: Tools like Pandoc generate docx from markdown, but the conversion is one-way and does not preserve existing formatting
  • Iterative collaboration: Repeated extraction and regeneration is lossy and expensive. Each cycle costs orders of magnitude more than necessary and degrades formatting

Cost impact: AI document workflows using traditional extraction methods pay dramatically more in API costs than necessary and lose formatting with every iteration.

The Solution

Documents should have two representations that stay in sync:

  • Markdown - optimized for AI reading and writing
  • Formatted docx - optimized for human consumption

Changes to either propagate to the other.

Key Benefits

Massive Token Savings: 1,524x fewer prompt tokens than raw OOXML, measured across 45 LLM task runs

Format Preservation: Original docx formatting is preserved in metadata and automatically reapplied

Iterative Editing: Edit content repeatedly without formatting degradation — each sync maintains fidelity for supported elements

Human + AI Friendly: AI works with clean markdown; humans get familiar Word documents
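
To get a feel for what the token figures above mean in practice, here is a rough back-of-the-envelope sketch. It applies the 1,524x ratio to the 325,000-token raw OOXML figure purely as an illustration; the two numbers are quoted separately and may not describe the same document. The price per million tokens is a hypothetical placeholder, so substitute your provider's actual rate.

```python
RAW_OOXML_TOKENS = 325_000    # lower bound quoted above for raw OOXML
REDUCTION_FACTOR = 1_524      # prompt-token reduction quoted above
PRICE_PER_MILLION_USD = 3.00  # hypothetical input price, NOT from the project


def reduced_tokens(raw_tokens: float, factor: float = REDUCTION_FACTOR) -> float:
    """Tokens left after applying the reduction factor."""
    return raw_tokens / factor


def prompt_cost_usd(tokens: float, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Prompt cost in USD for a given token count."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    small = reduced_tokens(RAW_OOXML_TOKENS)
    print(f"raw OOXML prompt: {RAW_OOXML_TOKENS:,} tokens, ${prompt_cost_usd(RAW_OOXML_TOKENS):.4f}")
    print(f"reduced prompt:   {small:,.0f} tokens, ${prompt_cost_usd(small):.6f}")
```

The per-prompt difference compounds in iterative workflows, where the same document is sent to the model on every editing cycle.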

Prove It Yourself

Run the Benchmark Suite to measure token efficiency, format fidelity, and cost savings on your own documents. Compare Sidedoc against Pandoc, raw DOCX extraction, and Azure Document Intelligence.

Quick Example

```shell
# Extract a Word document to sidedoc format
sidedoc extract quarterly_report.docx

# AI edits the markdown content...

# Sync changes back, preserving formatting
sidedoc sync quarterly_report.sidedoc/

# Rebuild the formatted Word document
sidedoc build quarterly_report.sidedoc/
```
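
The "AI edits the markdown content" step can be any program that rewrites content.md inside the extracted directory. As a minimal sketch, the helper below performs a plain text replacement; `replace_in_content` is a hypothetical name for illustration, not part of Sidedoc, and a real workflow would typically hand the markdown to an LLM instead.

```python
from pathlib import Path


def replace_in_content(sidedoc_dir: str, old: str, new: str) -> int:
    """Replace `old` with `new` in content.md and return the number of replacements.

    Hypothetical helper: it only touches the markdown file and leaves the
    formatting metadata alone, so `sidedoc sync` still has to be run afterwards.
    """
    content_path = Path(sidedoc_dir) / "content.md"
    text = content_path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        content_path.write_text(text.replace(old, new), encoding="utf-8")
    return count
```

For example, after `sidedoc extract quarterly_report.docx`, calling `replace_in_content("quarterly_report.sidedoc", "Q3", "Q4")` would update the markdown, and the `sidedoc sync` and `sidedoc build` commands would then carry the change back into the Word document.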

What's Supported

Fully supported: Headings, paragraphs, bold/italic, lists, images, hyperlinks, tables (including merged cells and cell styling), and track changes (insertions/deletions with author attribution).

Not yet supported: Nested lists (2+ levels), headers/footers, footnotes, comments, text boxes, shapes, charts.
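
Since nested lists are not yet supported, it can help to check edited markdown for them before syncing. The sketch below is a rough heuristic, not something Sidedoc provides: it assumes nesting appears in content.md as indented list items (standard markdown), uses a hypothetical function name, and does not skip fenced or indented code blocks, so it may report false positives.

```python
import re

# A list marker (-, *, + or "1." / "1)") preceded by two or more spaces or a tab
# is treated as a nested list item.
NESTED_ITEM = re.compile(r"^(?: {2,}|\t+)(?:[-*+]|\d+[.)])\s+")


def find_nested_list_lines(markdown: str) -> list[int]:
    """Return 1-based line numbers that look like nested list items."""
    return [
        number
        for number, line in enumerate(markdown.splitlines(), start=1)
        if NESTED_ITEM.match(line)
    ]
```

Running it over the contents of content.md gives the lines worth reviewing by hand before the next sync.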

What's in a .sidedoc directory?

| File | Purpose |
|------|---------|
| content.md | Clean markdown that AI reads/writes |
| structure.json | Block structure and mappings to docx paragraphs |
| styles.json | Formatting information per block |
| manifest.json | Metadata and version info |
| assets/ | Images and embedded files |
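
The layout above can be sanity-checked with a few lines of standard-library Python. This is a sketch under assumptions: the function names are hypothetical, and it treats every entry in the table as required, although a document without images might not have an assets/ directory. The second helper only lists what a .sdoc archive contains; it makes no claim about how Sidedoc arranges files inside the ZIP.

```python
import zipfile
from pathlib import Path

EXPECTED_FILES = ("content.md", "structure.json", "styles.json", "manifest.json")
EXPECTED_DIRS = ("assets",)  # assumption: treated as required here


def missing_entries(sidedoc_dir: str) -> list[str]:
    """Return the expected files and directories absent from a .sidedoc directory."""
    root = Path(sidedoc_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def sdoc_entries(archive_path: str) -> list[str]:
    """List the entry names stored in a .sdoc ZIP archive, sorted alphabetically."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())
```

An empty list from `missing_entries("quarterly_report.sidedoc")` means every entry from the table is present.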

Get Started

See the Getting Started guide for installation and usage instructions.