Architecture

Three layers

.claude/             The brain — agents and skills that do the thinking
distillary/          The hands — Python utilities agents call
brain/               The output — Obsidian vault with all knowledge

.claude/ (21 agents, 13 skills)

Agents are individual workers with specific models:

Haiku (13 agents): extract, dedupe, entities, entity-link, link, doctor, annotate, source-index, brain-index, bridge-builder, verify, analytics, combine
Opus (8 agents): group, pyramid, concept-mapper, compare, research, explore, review, compose

Skills are orchestration workflows that chain agents:

distillary-add-source: full pipeline from file to brain (chunks always stored locally)
distillary-research: deep iterative question-answering with 6 strategies + 11 advanced methods
distillary-retrieval: self-contained skill for querying published brains
distillary-publish: Quartz build + agent.json generation
distillary-brains: manage connected brains (add, remove, list, clone)
distillary-doctor: find and fix brain issues
distillary-combine: merge brains or move sources
distillary-redo: re-extract a source with current prompts
distillary-decompose: core extraction pipeline
distillary-use-brain: full navigation guide
obsidian-bases: analytical .base files
quartz-rendering: what Quartz can render
docs-writing: how to write good docs

distillary/ (8 Python files)

extraction/loader.py  — extract_text(), split_text(), pdf_to_images(), ScannedPDFError
notes.py              — Note dataclass, parse/serialize, wikilink helpers, parent/child traversal
vault_ops.py          — fix_vault(), reinforce_links(), build_entity_hubs(),
                        fix_ghost_links(), write_index()
doctor.py             — doctor() — fix + discover + _suggestions.md
cross_vault.py        — combine_vaults(), _build_bridges()
agent_index.py        — generate agent.json for publishing
sharing.py            — init_vault() — README + .gitignore
publish.py            — Quartz preview + deploy + auto-generate agent.json

brain/ (the vault)

brain/
  _index.md                        Start here
  _suggestions.md                  Doctor findings
  *.base                           Analytical Bases
  sources/                         Processed sources (agent-generated)
    {source-slug}/
      _source.md                   Metadata (title, author, type, year, publishable)
      _index.md                    Narrative overview of this source
      chunks/                      Source text chunks (always stored)
        chunk_00.txt
        chunk_01.txt
      claims/
        {root-thesis}.md           Layer 3: single thesis
        structure/                 Layer 2: major arguments
        clusters/                  Layer 1: thematic groups
        atoms/                     Layer 0: individual claims with backing + passages
      entities/
        concepts/                  Abstract ideas, principles
        people/                    Scholars, historical figures
  shared/                          Cross-source connections
    concepts/                      Bridge entities (source/cross-vault tag)
    analytics/                     Comparison, mapping, graphs, stats
  personal/                        Your voice
    annotations/                   Reactions to claims
    notes/                         Freeform thinking
    questions/                     Things to explore
    research/                      Deep research outputs

The claim format (v4.0)

Every atom claim can carry up to four kinds of information:

What is claimed     → proposition, body text
How it's argued     → backing (category, subtype, strength, warrant)
Where it's from     → passages (chunk file, lines, snippet)
How certain         → confidence (exact / synthesized / inferred)

This makes claims not just assertions but auditable, evidence-graded, source-traceable arguments.
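The format above can be pictured as a small data structure. This is a minimal sketch, not the project's actual Note dataclass or on-disk format: the class names, field names, and the convention that passage line numbers are 1-based and inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Backing:
    """How the claim is argued (field names taken from the list above)."""
    category: str
    subtype: str
    strength: str
    warrant: str


@dataclass
class Passage:
    """Pointer into a chunk file; the full text stays in the chunk."""
    chunk: str       # chunk file name, e.g. "chunk_00.txt"
    start_line: int  # assumed 1-based, inclusive
    end_line: int    # assumed 1-based, inclusive
    snippet: str     # short quote used for quick verification


@dataclass
class AtomClaim:
    """Hypothetical in-memory view of one atom claim."""
    proposition: str
    body: str = ""
    backing: Optional[Backing] = None
    passages: list[Passage] = field(default_factory=list)
    confidence: Optional[str] = None  # "exact", "synthesized", or "inferred"

    def is_traceable(self) -> bool:
        # A claim is source-traceable when it points at one or more passages.
        return bool(self.passages)


def read_passage(chunks_dir: Path, passage: Passage) -> str:
    """Resolve a passage pointer to the lines it refers to."""
    lines = (chunks_dir / passage.chunk).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[passage.start_line - 1 : passage.end_line])
```

Under these assumptions, auditing a claim means calling read_passage on each of its passages and comparing the returned lines with the stored snippet.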

Agent pipeline (11 steps)

 1. Extract + split text → save chunks to brain/sources/{slug}/chunks/
 2. Extract claims with backing + passages
 3. Dedupe (merge passages) + entities
 4. Entity-link (add wikilinks, preserve passages/backing)
 5. Group + pyramid (from linked claims, preserve passages/backing)
 6. Link (tensions, patterns)
 7. Verify (optional — spot-check claims against chunks)
 8. Assemble into brain
 9. Post-process (reinforce links, entity hubs, doctor flags missing passages)
10. Auto-bridge (concept-mapper + bridge-builder)
11. Update brain index

All steps are run automatically by /dist:add. The verify step is optional and can also be run standalone:

/dist:verify brain/sources/kiyosaki-rich-dad/

Critical rule: Always use agent definition files (.claude/agents/{name}.md). Never rewrite agent instructions from memory. The definition files contain tested rules that prevent data loss.
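Step 1 of the pipeline above can be sketched as follows. Only the target path (brain/sources/{slug}/chunks/) and the chunk_NN.txt naming come from this page. The save_chunks function, the greedy paragraph packing, and the 4000-character limit are assumptions, and the sketch does not reproduce the behavior of the library's own split_text().

```python
from pathlib import Path


def save_chunks(text: str, slug: str, brain_dir: Path, max_chars: int = 4000) -> list[Path]:
    """Split text on blank lines and write chunk_00.txt, chunk_01.txt, ...

    Hypothetical helper. Paragraphs are packed greedily until adding the
    next one would exceed max_chars. A single paragraph longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    chunks_dir = brain_dir / "sources" / slug / "chunks"
    chunks_dir.mkdir(parents=True, exist_ok=True)

    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk
            current = para          # start a new one with this paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)

    paths: list[Path] = []
    for i, chunk in enumerate(chunks):
        path = chunks_dir / f"chunk_{i:02d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Writing chunks to disk before any claim extraction is what lets later steps store passage pointers instead of copying source text into each claim.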

Design decisions

Agents are the pipeline, Python is the plumbing. Agents do extraction, grouping, comparison. Python does file I/O, link fixing, validation.
Haiku for bulk, Opus for reasoning. Parallel Haiku agents finish bulk work in minutes, not hours. Opus runs only where deep reasoning matters.
Post-processing catches agent mistakes. fix_ghost_links, _wire_parent_links, _split_frontmatter handle format inconsistencies mechanically.
One vault, not many. Everything in brain/. Sources accumulate. Bridges grow. Your annotations are part of the graph.
Static publishing. Quartz generates a website. agent.json is auto-generated. No server needed.
Backlinks are the search engine. Entity pages + backlinks = “what does this brain know about X?” No keyword search needed.
Entity-link before group. Wikilinks are added to claims BEFORE grouping, so atom files written to the vault already contain entity links.
Backing captures argumentation. 9 universal categories work across any domain. The same framework handles Islamic jurisprudence, cybersecurity regulation, academic research, and business books.
Chunks always stored, publish-gated. Chunks are always saved locally during ingestion: they cost little to keep and enable full fact-checking. The publishable field in _source.md controls whether chunks are included when publishing, so copyrighted source chunks stay local while public domain chunks get published.
Passages are lightweight. Claims store only pointers (chunk + lines + ~15-word snippet), not full text. The full context stays in chunk files.
Scanned PDFs handled automatically. If extract_text() detects a scanned PDF, it raises ScannedPDFError. The pipeline then converts pages to images with pdf_to_images() and OCRs them using parallel Haiku vision agents.
Source language preserved. Arabic sources produce Arabic claims, entities, and notes. The pipeline matches the source language automatically.
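The "backlinks are the search engine" decision above can be illustrated with a small index built from wikilinks. This is a sketch under stated assumptions, not the project's implementation: build_backlinks is a hypothetical function, notes are assumed to be .md files using Obsidian-style [[Target]], [[Target|alias]], or [[Target#heading]] links, and notes are identified by file name without extension.

```python
import re
from collections import defaultdict
from pathlib import Path

# Capture the link target, stopping at an alias ("|"), heading ("#"), or "]".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_backlinks(brain_dir: Path) -> dict[str, set[str]]:
    """Map each wikilink target to the set of notes that link to it."""
    backlinks: defaultdict[str, set[str]] = defaultdict(set)
    for note in brain_dir.rglob("*.md"):
        text = note.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            backlinks[target].add(note.stem)
    return dict(backlinks)
```

With such an index, asking what the brain knows about an entity reduces to a dictionary lookup on the entity's page name, which returns every claim or note that links to it.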
