Architecture

Three layers

.claude/             The brain — agents and skills that do the thinking
distillary/          The hands — Python utilities agents call
brain/               The output — Obsidian vault with all knowledge

.claude/ (21 agents, 13 skills)

Agents are individual workers with specific models:

  • Haiku (13 agents): extract, dedupe, entities, entity-link, link, doctor, annotate, source-index, brain-index, bridge-builder, verify, analytics, combine
  • Opus (8 agents): group, pyramid, concept-mapper, compare, research, explore, review, compose

Skills are orchestration workflows that chain agents:

  • distillary-add-source: full pipeline from file to brain (chunks always stored locally)
  • distillary-research: deep iterative question-answering with 6 strategies + 11 advanced methods
  • distillary-retrieval: self-contained skill for querying published brains
  • distillary-publish: Quartz build + agent.json generation
  • distillary-brains: manage connected brains (add, remove, list, clone)
  • distillary-doctor: find and fix brain issues
  • distillary-combine: merge brains or move sources
  • distillary-redo: re-extract a source with current prompts
  • distillary-decompose: core extraction pipeline
  • distillary-use-brain: full navigation guide
  • obsidian-bases: analytical .base files
  • quartz-rendering: what Quartz can render
  • docs-writing: how to write good docs

distillary/ (8 Python files)

extraction/loader.py  — extract_text(), split_text(), pdf_to_images(), ScannedPDFError
notes.py              — Note dataclass, parse/serialize, wikilink helpers, parent/child traversal
vault_ops.py          — fix_vault(), reinforce_links(), build_entity_hubs(),
                        fix_ghost_links(), write_index()
doctor.py             — doctor() — fix + discover + _suggestions.md
cross_vault.py        — combine_vaults(), _build_bridges()
agent_index.py        — generate agent.json for publishing
sharing.py            — init_vault() — README + .gitignore
publish.py            — Quartz preview + deploy + auto-generate agent.json

brain/ (the vault)

brain/
  _index.md                        Start here
  _suggestions.md                  Doctor findings
  *.base                           Analytical Bases
  sources/                         Processed sources (agent-generated)
    {source-slug}/
      _source.md                   Metadata (title, author, type, year, publishable)
      _index.md                    Narrative overview of this source
      chunks/                      Source text chunks (always stored)
        chunk_00.txt
        chunk_01.txt
      claims/
        {root-thesis}.md           Layer 3: single thesis
        structure/                 Layer 2: major arguments
        clusters/                  Layer 1: thematic groups
        atoms/                     Layer 0: individual claims with backing + passages
      entities/
        concepts/                  Abstract ideas, principles
        people/                    Scholars, historical figures
  shared/                          Cross-source connections
    concepts/                      Bridge entities (source/cross-vault tag)
    analytics/                     Comparison, mapping, graphs, stats
  personal/                        Your voice
    annotations/                   Reactions to claims
    notes/                         Freeform thinking
    questions/                     Things to explore
    research/                      Deep research outputs
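
The per-source part of this layout can be sketched in a few lines of Python. This is only an illustration of the tree above, not the code the pipeline actually uses; `source_dirs` and `scaffold_source` are hypothetical helper names.

```python
from pathlib import Path
from typing import List


def source_dirs(brain: Path, slug: str) -> List[Path]:
    """List the leaf directories of one source, mirroring the tree above."""
    src = brain / "sources" / slug
    return [
        src / "chunks",                 # source text chunks
        src / "claims" / "structure",   # Layer 2: major arguments
        src / "claims" / "clusters",    # Layer 1: thematic groups
        src / "claims" / "atoms",       # Layer 0: individual claims
        src / "entities" / "concepts",
        src / "entities" / "people",
    ]


def scaffold_source(brain: Path, slug: str) -> Path:
    """Create the empty directory skeleton for a source and return its root."""
    for directory in source_dirs(brain, slug):
        directory.mkdir(parents=True, exist_ok=True)
    return brain / "sources" / slug


dirs = source_dirs(Path("brain"), "my-book")
```

Files such as _source.md, _index.md and the root thesis note are written by agents, so the sketch creates directories only.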

The claim format (v4.0)

Every atom claim can carry up to four kinds of information (not to be confused with the pyramid's Layers 0 to 3):

What is claimed     → proposition, body text
How it's argued     → backing (category, subtype, strength, warrant)
Where it's from     → passages (chunk file, lines, snippet)
How certain         → confidence (exact / synthesized / inferred)

This turns claims from bare assertions into auditable, evidence-graded, source-traceable arguments.
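
As a rough mental model, the four rows above could map onto a small set of dataclasses. This is a sketch under assumptions, not the project's Note dataclass: the class names, the 1-based inclusive line range, and the placeholder backing values are all hypothetical. Only the field names and the three confidence levels come from the format described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CONFIDENCE_LEVELS = ("exact", "synthesized", "inferred")


@dataclass
class Backing:
    """How the claim is argued."""
    category: str
    subtype: str
    strength: str
    warrant: str


@dataclass
class Passage:
    """Pointer into a chunk file; the full text stays in the chunk."""
    chunk: str              # e.g. "chunk_00.txt"
    lines: Tuple[int, int]  # assumed 1-based, inclusive (start, end)
    snippet: str            # short quote for quick spot-checks


@dataclass
class AtomClaim:
    proposition: str
    body: str = ""
    backing: Optional[Backing] = None
    passages: List[Passage] = field(default_factory=list)
    confidence: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject confidence values outside the three documented levels.
        if self.confidence is not None and self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def resolve(passage: Passage, chunk_text: str) -> str:
    """Return the chunk lines that a passage points at."""
    start, end = passage.lines
    return "\n".join(chunk_text.splitlines()[start - 1:end])


chunk = "line one\nline two\nline three"
claim = AtomClaim(
    proposition="Placeholder proposition",
    backing=Backing("placeholder-category", "placeholder-subtype",
                    "placeholder-strength", "placeholder warrant"),
    passages=[Passage("chunk_00.txt", (2, 3), "line two")],
    confidence="exact",
)
context = resolve(claim.passages[0], chunk)
```

Resolving a passage against its chunk is what makes a claim checkable: the snippet should appear inside the lines the pointer returns.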

Agent pipeline (11 steps)

 1. Extract + split text → save chunks to brain/sources/{slug}/chunks/
 2. Extract claims with backing + passages
 3. Dedupe (merge passages) + entities
 4. Entity-link (add wikilinks, preserve passages/backing)
 5. Group + pyramid (from linked claims, preserve passages/backing)
 6. Link (tensions, patterns)
 7. Verify (optional — spot-check claims against chunks)
 8. Assemble into brain
 9. Post-process (reinforce links, entity hubs, doctor flags missing passages)
10. Auto-bridge (concept-mapper + bridge-builder)
11. Update brain index

/dist:add runs the whole pipeline automatically. Verify (step 7) is optional and can also be run standalone:

/dist:verify brain/sources/kiyosaki-rich-dad/
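
The step order above, with verify as the only optional step, can be written down as plain data. This is a hypothetical sketch: the step labels are shorthand invented here, and the real orchestration lives in the skills, not in Python.

```python
from typing import List, Tuple

# (label, optional) pairs in the order listed above; labels are illustrative.
PIPELINE: List[Tuple[str, bool]] = [
    ("extract-split", False),
    ("extract-claims", False),
    ("dedupe-entities", False),
    ("entity-link", False),
    ("group-pyramid", False),
    ("link", False),
    ("verify", True),
    ("assemble", False),
    ("post-process", False),
    ("auto-bridge", False),
    ("update-index", False),
]


def plan(verify: bool = False) -> List[str]:
    """Return the step labels to run, skipping optional steps unless requested."""
    return [label for label, optional in PIPELINE if verify or not optional]


default_plan = plan()
full_plan = plan(verify=True)
```

Keeping the order as data makes one constraint easy to check: entity-link must come before group-pyramid, so that grouped claims already carry their wikilinks.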

Critical rule: Always use agent definition files (.claude/agents/{name}.md). Never rewrite agent instructions from memory. The definition files contain tested rules that prevent data loss.

Design decisions

  • Agents are the pipeline, Python is the plumbing. Agents do extraction, grouping, comparison. Python does file I/O, link fixing, validation.
  • Haiku for bulk, Opus for reasoning. Parallel Haiku agents = minutes not hours. Opus only where deep reasoning matters.
  • Post-processing catches agent mistakes. fix_ghost_links, _wire_parent_links, and _split_frontmatter handle format inconsistencies mechanically.
  • One vault, not many. Everything in brain/. Sources accumulate. Bridges grow. Your annotations are part of the graph.
  • Static publishing. Quartz generates a website. agent.json is auto-generated. No server needed.
  • Backlinks are the search engine. Entity pages + backlinks = “what does this brain know about X?” No keyword search needed.
  • Entity-link before group. Wikilinks are added to claims BEFORE grouping, so atom files written to the vault already contain entity links.
  • Backing captures argumentation. 9 universal categories work across any domain. The same framework handles Islamic jurisprudence, cybersecurity regulation, academic research, and business books.
  • Chunks always stored, publish-gated. Chunks are always saved locally during ingestion (they cost almost nothing to store and enable full fact-checking). The publishable field in _source.md controls whether chunks are included when publishing. Copyrighted source chunks stay local; public-domain chunks get published.
  • Passages are lightweight. Claims store only pointers (chunk + lines + ~15-word snippet), not full text. The full context stays in chunk files.
  • Scanned PDFs handled automatically. If extract_text() detects a scanned PDF, it raises ScannedPDFError. The pipeline then converts pages to images with pdf_to_images() and OCRs them using parallel Haiku vision agents.
  • Source language preserved. Arabic sources produce Arabic claims, entities, and notes. The pipeline matches the source language automatically.
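
The scanned-PDF fallback described in the list above follows a common try-then-fall-back pattern. The sketch below uses local stand-in stubs with the same names as the loader functions; the real signatures in extraction/loader.py may differ, and `ocr_page` and `load_source` are hypothetical names.

```python
from pathlib import Path
from typing import List


class ScannedPDFError(Exception):
    """Stand-in for the error raised when a PDF has no text layer."""


def extract_text(path: Path) -> str:
    # Stub: pretend any file named "scan*.pdf" has no extractable text.
    if path.name.startswith("scan"):
        raise ScannedPDFError(str(path))
    return f"text of {path.name}"


def pdf_to_images(path: Path) -> List[str]:
    # Stub: one image name per page (two pages here).
    return [f"{path.stem}_page_{i}.png" for i in range(2)]


def ocr_page(image: str) -> str:
    # Hypothetical stand-in for a vision agent that OCRs one page image.
    return f"ocr({image})"


def load_source(path: Path) -> str:
    """Try direct text extraction, then fall back to page-by-page OCR."""
    try:
        return extract_text(path)
    except ScannedPDFError:
        images = pdf_to_images(path)
        # The pipeline is described as OCRing pages in parallel;
        # this sketch runs them sequentially for clarity.
        return "\n".join(ocr_page(image) for image in images)


digital = load_source(Path("book.pdf"))
scanned = load_source(Path("scan01.pdf"))
```

Raising a dedicated exception keeps the common path (PDFs with a text layer) cheap and makes the OCR route an explicit, separate branch.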