Architecture
Three layers
.claude/ The brain — agents and skills that do the thinking
distillary/ The hands — Python utilities agents call
brain/ The output — Obsidian vault with all knowledge
.claude/ (21 agents, 13 skills)
Agents are individual workers with specific models:
- Haiku (13 agents): extract, dedupe, entities, entity-link, link, doctor, annotate, source-index, brain-index, bridge-builder, verify, analytics, combine
- Opus (8 agents): group, pyramid, concept-mapper, compare, research, explore, review, compose
Skills are orchestration workflows that chain agents:
- distillary-add-source: full pipeline from file to brain (chunks always stored locally)
- distillary-research: deep iterative question-answering with 6 strategies + 11 advanced methods
- distillary-retrieval: self-contained skill for querying published brains
- distillary-publish: Quartz build + agent.json generation
- distillary-brains: manage connected brains (add, remove, list, clone)
- distillary-doctor: find and fix brain issues
- distillary-combine: merge brains or move sources
- distillary-redo: re-extract a source with current prompts
- distillary-decompose: core extraction pipeline
- distillary-use-brain: full navigation guide
- obsidian-bases: analytical .base files
- quartz-rendering: what Quartz can render
- docs-writing: how to write good docs
distillary/ (8 Python files)
extraction/loader.py — extract_text(), split_text(), pdf_to_images(), ScannedPDFError
notes.py — Note dataclass, parse/serialize, wikilink helpers, parent/child traversal
vault_ops.py — fix_vault(), reinforce_links(), build_entity_hubs(),
fix_ghost_links(), write_index()
doctor.py — doctor() — fix + discover + _suggestions.md
cross_vault.py — combine_vaults(), _build_bridges()
agent_index.py — generate agent.json for publishing
sharing.py — init_vault() — README + .gitignore
publish.py — Quartz preview + deploy + auto-generate agent.json
brain/ (the vault)
brain/
_index.md Start here
_suggestions.md Doctor findings
*.base Analytical Bases
sources/ Processed sources (agent-generated)
{source-slug}/
_source.md Metadata (title, author, type, year, publishable)
_index.md Narrative overview of this source
chunks/ Source text chunks (always stored)
chunk_00.txt
chunk_01.txt
claims/
{root-thesis}.md Layer 3: single thesis
structure/ Layer 2: major arguments
clusters/ Layer 1: thematic groups
atoms/ Layer 0: individual claims with backing + passages
entities/
concepts/ Abstract ideas, principles
people/ Scholars, historical figures
shared/ Cross-source connections
concepts/ Bridge entities (source/cross-vault tag)
analytics/ Comparison, mapping, graphs, stats
personal/ Your voice
annotations/ Reactions to claims
notes/ Freeform thinking
questions/ Things to explore
research/ Deep research outputs
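As a rough illustration of the chunks/ layout in the tree above, here is a minimal sketch that writes a list of text chunks to brain/sources/{slug}/chunks/ using the chunk_00.txt naming shown. The function name save_chunks and its signature are assumptions for illustration; the real logic lives in the pipeline and extraction/loader.py and may differ.

```python
from pathlib import Path


def save_chunks(chunks: list[str], brain: Path, slug: str) -> list[Path]:
    """Write chunks to brain/sources/{slug}/chunks/chunk_NN.txt.

    Hypothetical helper: name and signature are not part of distillary.
    """
    chunk_dir = brain / "sources" / slug / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, text in enumerate(chunks):
        # Two-digit, zero-padded index matches chunk_00.txt, chunk_01.txt, ...
        path = chunk_dir / f"chunk_{i:02d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Keeping chunks as plain numbered text files means a passage pointer only needs a file name and a line range to locate its evidence.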
The claim format (v4.0)
Every atom claim can have up to 4 layers of information:
What is claimed → proposition, body text
How it's argued → backing (category, subtype, strength, warrant)
Where it's from → passages (chunk file, lines, snippet)
How certain → confidence (exact / synthesized / inferred)
This makes claims not just assertions but auditable, evidence-graded, source-traceable arguments.
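To make the four layers concrete, here is one way an atom claim could be modeled in Python. This is a sketch only: the class names, field types, and the is_traceable helper are assumptions, and the actual Note dataclass in notes.py may be organized quite differently. Only the field names (proposition, backing, passages, confidence and their sub-fields) come from the format described above.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class Backing:
    """How the claim is argued (types are assumed, names follow the format)."""
    category: str
    subtype: str
    strength: str
    warrant: str


@dataclass
class Passage:
    """Pointer to where the claim comes from; the full text stays in the chunk."""
    chunk: str               # chunk file name, e.g. "chunk_00.txt"
    lines: tuple[int, int]   # assumed: inclusive, 1-based line range
    snippet: str             # short quote from those lines


@dataclass
class AtomClaim:
    """Hypothetical container for the four layers of an atom claim."""
    proposition: str
    body: str = ""
    backing: Optional[Backing] = None
    passages: list[Passage] = field(default_factory=list)
    confidence: Optional[Literal["exact", "synthesized", "inferred"]] = None

    def is_traceable(self) -> bool:
        # A claim can be audited only if it points at some source passage.
        return bool(self.passages)
```

Every layer except the proposition is optional here, which mirrors "can have up to 4 layers".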
Agent pipeline (11 steps)
1. Extract + split text → save chunks to brain/sources/{slug}/chunks/
2. Extract claims with backing + passages
3. Dedupe (merge passages) + entities
4. Entity-link (add wikilinks, preserve passages/backing)
5. Group + pyramid (from linked claims, preserve passages/backing)
6. Link (tensions, patterns)
7. Verify (optional — spot-check claims against chunks)
8. Assemble into brain
9. Post-process (reinforce links, entity hubs, doctor flags missing passages)
10. Auto-bridge (concept-mapper + bridge-builder)
11. Update brain index
All steps are run automatically by /dist:add. The verify step is optional and can also be run standalone:
/dist:verify brain/sources/kiyosaki-rich-dad/
Critical rule: Always use agent definition files (.claude/agents/{name}.md). Never rewrite agent instructions from memory. The definition files contain tested rules that prevent data loss.
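The optional verify step (step 7) spot-checks claims against chunks. Mechanically, one such check might look like the sketch below: given a passage pointer, confirm that the snippet really occurs in the referenced lines. The function name, the 1-based inclusive line convention, and the whitespace and case normalization are all assumptions; the real check is performed by the verify agent, not by this code.

```python
from pathlib import Path


def snippet_in_chunk(chunk_path: Path, start: int, end: int, snippet: str) -> bool:
    """Return True if `snippet` occurs in lines start..end of the chunk.

    Hypothetical helper. Lines are assumed to be 1-based and inclusive.
    """
    lines = chunk_path.read_text(encoding="utf-8").splitlines()
    # Join the referenced lines so a snippet may span a line break.
    window = " ".join(lines[start - 1:end])

    def normalize(s: str) -> str:
        # Collapse runs of whitespace and ignore case differences.
        return " ".join(s.split()).casefold()

    return normalize(snippet) in normalize(window)
```

A check like this only confirms that the pointer is valid. Whether the claim faithfully represents the passage still needs a reader or an agent.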
Design decisions
- Agents are the pipeline, Python is the plumbing. Agents do extraction, grouping, comparison. Python does file I/O, link fixing, validation.
- Haiku for bulk, Opus for reasoning. Parallel Haiku agents = minutes not hours. Opus only where deep reasoning matters.
- Post-processing catches agent mistakes. fix_ghost_links, _wire_parent_links, _split_frontmatter handle format inconsistencies mechanically.
- One vault, not many. Everything in brain/. Sources accumulate. Bridges grow. Your annotations are part of the graph.
- Static publishing. Quartz generates a website. agent.json is auto-generated. No server needed.
- Backlinks are the search engine. Entity pages + backlinks = “what does this brain know about X?” No keyword search needed.
- Entity-link before group. Wikilinks are added to claims BEFORE grouping, so atom files written to the vault already contain entity links.
- Backing captures argumentation. 9 universal categories work across any domain. The same framework handles Islamic jurisprudence, cybersecurity regulation, academic research, and business books.
- Chunks always stored, publish-gated. Chunks are always saved locally during ingestion (they cost nothing, enable full fact-checking). The publishable field in _source.md controls whether chunks are included when publishing. Copyrighted source chunks stay local; public domain chunks get published.
- Passages are lightweight. Claims store only pointers (chunk + lines + ~15-word snippet), not full text. The full context stays in chunk files.
- Scanned PDFs handled automatically. If extract_text() detects a scanned PDF, it raises ScannedPDFError. The pipeline then converts pages to images with pdf_to_images() and OCRs them using parallel Haiku vision agents.
- Source language preserved. Arabic sources produce Arabic claims, entities, and notes. The pipeline matches the source language automatically.
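The "backlinks are the search engine" decision can be illustrated with a small sketch that scans a vault for Obsidian-style wikilinks and builds a reverse index from link target to the notes that mention it. This is not distillary code: build_backlinks and the regular expression are assumptions for illustration, and in practice Obsidian and Quartz compute backlinks themselves.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]];
# group 1 captures only the target note name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_backlinks(vault: Path) -> dict[str, set[str]]:
    """Map each wikilink target to the set of notes that link to it.

    Hypothetical helper; note paths are relative to the vault root.
    """
    backlinks: dict[str, set[str]] = defaultdict(set)
    for note in vault.rglob("*.md"):
        text = note.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            backlinks[target].add(note.relative_to(vault).as_posix())
    return dict(backlinks)
```

Looking up an entity name in the resulting dictionary answers "what does this brain know about X?" without any keyword search, provided the entity-link step has already added the wikilinks.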