Sidedoc
AI-native document format that separates content from formatting.
Sidedoc enables efficient AI interaction with documents while preserving rich formatting for human consumption. A .sidedoc file is a ZIP archive containing markdown content and formatting metadata that can reconstruct the original docx.
Project Status: MVP Complete + Hyperlinks
Current Version: 0.1.0 Status: All MVP features implemented with 188 passing tests What Works: Extract, sync, diff, build — including inline formatting, lists, images, and hyperlinks Coming Next: Tables, nested lists
The Problem
Current document workflows force a tradeoff between AI efficiency and human usability:
- Reading documents: Extracting content for AI is expensive (15,000+ tokens for a 10-page document via XML) and loses formatting connections
- Creating documents: Tools like Pandoc generate docx from markdown, but it's one-way with no formatting preservation
- Iterative collaboration: Repeated extraction and regeneration is lossy and expensive - each cycle costs 10x more than necessary and degrades formatting
Cost impact: AI document workflows using traditional extraction methods pay 10x more in API costs than necessary and lose formatting with every iteration.
The Solution
Documents should have two representations that stay in sync:
- Markdown - optimized for AI reading and writing
- Formatted docx - optimized for human consumption
Changes to either propagate to the other.
Key Benefits
10x Token Efficiency: Sidedoc markdown uses ~1,500 tokens vs. 15,000+ tokens for XML-based extraction
Perfect Format Preservation: Original docx formatting is preserved in metadata and automatically reapplied
Lossless Iteration: Edit content infinitely without formatting degradation - each sync maintains perfect fidelity
Human + AI Friendly: AI works with clean markdown; humans get familiar Word documents
Prove It Yourself
Run the Benchmark Suite to measure token efficiency, format fidelity, and cost savings on your own documents. Compare Sidedoc against Pandoc, raw DOCX extraction, and Azure Document Intelligence.
Quick Example
# Extract a Word document to sidedoc format
sidedoc extract quarterly_report.docx
# AI edits the markdown content...
# Sync changes back, preserving formatting
sidedoc sync quarterly_report.sidedoc
# Rebuild the formatted Word document
sidedoc build quarterly_report.sidedoc
What's in a .sidedoc file?
| File | Purpose |
|---|---|
content.md |
Clean markdown that AI reads/writes |
structure.json |
Block structure and mappings to docx paragraphs |
styles.json |
Formatting information per block |
manifest.json |
Metadata and version info |
assets/ |
Images and embedded files |
Get Started
See the Getting Started guide for installation and usage instructions.