Frequently Asked Questions
What problem does Sidedoc solve?
Current document workflows force a painful tradeoff: AI tools work efficiently with plain text, but humans need rich formatting. The typical solutions are inadequate:
| Approach | Problem |
|---|---|
| Extract content with Document Intelligence | Loses formatting connections; can't update original |
| Convert with Pandoc | One-way; repeated conversions degrade formatting |
| Parse docx XML directly | Expensive (15,000+ tokens); complex; brittle |
| Store everything as markdown | Humans get ugly, unformatted documents |
Sidedoc solves this by maintaining two synchronized representations: clean markdown for AI efficiency and formatted docx for human consumption. Edit either one, and changes propagate to the other without losing formatting.
Why not just use Pandoc?
Pandoc is excellent at one-way conversion, but it wasn't designed for round-trip workflows:
- No formatting memory: Pandoc doesn't remember the original document's fonts, colors, or custom styles
- Lossy round-trips: Each markdown → docx → markdown cycle loses information
- No maintained link: Once you convert, there's no connection between source and output
- Generic output: Generated docx uses default styles, not your corporate templates
Sidedoc differs by storing formatting metadata alongside content. When you edit the markdown and rebuild, Sidedoc reapplies the original formatting—fonts, sizes, colors, alignment—automatically.
Why not store documents as pure markdown?
Pure markdown works for technical documentation, but fails for business documents:
- Formatting matters: Quarterly reports, proposals, and contracts need specific fonts, colors, and layouts
- Brand consistency: Organizations have style guides that plain markdown can't enforce
- Human expectations: Non-technical stakeholders expect Word documents, not raw text files
Sidedoc gives you both: AI works with clean markdown (efficient, diffable, version-controllable) while humans receive properly formatted Word documents (professional, branded, familiar).
Why markdown instead of HTML or plain text?
We chose markdown because it hits the sweet spot:
| Format | Pros | Cons |
|---|---|---|
| Plain text | Simple | No structure; can't represent headings, lists, emphasis |
| HTML | Rich structure | Verbose; mixes content with presentation; harder for AI |
| Markdown | Clean structure; readable; compact | Limited formatting (which is actually fine for content) |
Markdown provides enough structure (headings, lists, emphasis, images) to represent document content while staying clean and AI-efficient. The formatting details that markdown can't express (fonts, colors, exact spacing) are stored separately in styles.json.
Why use a ZIP container instead of a single file?
The ZIP container approach has several advantages:
- Separation of concerns: Content, structure, styles, and assets live in separate files
- Easy debugging: Unzip and inspect any component with standard tools
- Selective access: Read just
content.mdwithout parsing everything - Binary asset handling: Images stored as-is, not base64-encoded inline
- Familiar pattern: docx, xlsx, and epub all use ZIP containers internally
You can inspect a sidedoc with standard tools:
# View the markdown content
unzip -p document.sidedoc content.md
# Extract everything for debugging
unzip document.sidedoc -d ./unpacked/
Why block-level sync instead of character-level?
We chose block-level sync (paragraphs, headings, list items) over character-level diffing because:
- Simpler implementation: Block matching is more robust than character alignment
- Natural edit units: People add/remove/edit paragraphs, not individual characters
- Formatting boundaries: Word formatting is typically paragraph-based anyway
- Predictable behavior: Easier to understand what will happen when you sync
The tradeoff is that merging changes within a single paragraph is less granular—but in practice, AI edits tend to rewrite entire paragraphs rather than tweaking individual words.
How is this different from Google Docs or Office 365?
Cloud collaboration tools solve a different problem:
| Feature | Google Docs / O365 | Sidedoc |
|---|---|---|
| Primary use | Real-time human collaboration | AI-human document workflows |
| AI integration | Requires API, expensive | Native markdown format |
| Offline support | Limited | Fully offline |
| Format control | Platform-dependent | Preserves exact docx formatting |
| Version control | Proprietary history | Git-friendly markdown |
| Privacy | Cloud-hosted | Local files only |
Sidedoc is complementary: You might use Sidedoc to prepare a document with AI assistance, then upload the final docx to Google Docs for human review.
What document elements are NOT supported?
The MVP focuses on the most common elements. Here's what's supported and what's not:
Fully Supported
- Headings (H1-H6)
- Paragraphs
- Bold and italic text
- Bulleted lists (single level)
- Numbered lists (single level)
- Images
- Hyperlinks (
[text](url)syntax)
Preserved but Not Editable
These elements are kept intact but editing them in markdown won't work correctly:
- Nested lists (2+ levels)
- Underlined text
- Custom named styles
- Font colors and highlighting
Not Supported (MVP)
These elements are not preserved:
- Tables
- Headers and footers
- Footnotes and endnotes
- Track changes and comments
- Text boxes and shapes
- Charts
Why was this project created?
Sidedoc emerged from a practical frustration: working with AI coding assistants on business documents was painful. Every iteration required:
- Extract content from docx (expensive, loses formatting)
- AI edits the content
- Manually recreate the document in Word (tedious, error-prone)
- Repeat
The core insight: If documents had a stable, AI-friendly representation that maintained a live connection to formatting, this workflow could be 10x more efficient and completely lossless.
Sidedoc is that representation.