The note format

Every note in a Distillary brain follows the same shape: YAML frontmatter for machine-readable metadata, followed by a markdown body with wikilinks for human navigation. This uniformity is the foundation that makes everything else work — filtering, linking, bridging, publishing, and agent retrieval all depend on every note having the same structure.

Why YAML frontmatter

Frontmatter is the only metadata format that Obsidian, Quartz, Dataview, and Bases all read natively. It sits between --- fences at the top of the file, parsed as structured data by every tool in the ecosystem. No plugins needed, no custom syntax, no lock-in.

---
tags:
  - type/claim/atom
  - priority/core
  - certainty/argued
  - stance/endorsed
  - domain/methodology
  - role/argument
  - source/ries-lean-startup
kind: claim
layer: 0
proposition: "validated learning → measures progress → better than vanity metrics"
source_ref: "Chapter 7: Measure"
published: 2011
extracted_by: claude-haiku-4.5
---

Every field serves a specific purpose. Nothing is there for decoration.

The fields

tags — filterable dimensions

Tags are a flat list of dimension/value pairs. Obsidian uses them for search, graph view color groups, and Bases filters. Every claim has tags across 6 dimensions (explained fully in Tag taxonomy).

Tags live in YAML because Obsidian’s Properties UI renders them as clickable chips, and Bases can filter by them directly. Putting them in the body text would make them invisible to filtering tools.

kind — claim or entity

Two kinds of notes exist in a brain: claims (things the source argues) and entities (things the source references). This field lets any tool instantly separate the two. Claims have layers and propositions. Entities have types and aliases.

layer — position in the pyramid

An integer from 0 to 3 that indicates where this note sits in the argument hierarchy. Layer 0 is an atomic claim — a single assertion with one reason. Layer 3 is the root thesis of the entire source. Layers in between are groupings that summarize what’s below them.

The layer field exists because agents and Bases need to filter by it. “Show me all layer-2 cluster notes” is a common query for understanding a source’s structure without reading every atom.

proposition — the semantic key

This is the most important field. It encodes the claim in canonical form: subject → relationship → object, lowercase, no qualifiers, no author voice. Two claims about the same fact, written by different authors in different words, should produce the same proposition.

# From The Lean Startup
proposition: "validated learning → measures startup progress → better than shipped features"
 
# From The Mom Test  
proposition: "customer commitment → measures authentic interest → better than compliments"

Propositions enable cross-vault deduplication and set operations. Without them, comparing two brains requires an LLM to judge semantic equivalence for every pair of claims. With them, exact string matching catches most overlaps.

The proposition is for machines, the body is for humans

The proposition is deliberately stripped of nuance — it’s a database key. The full argument, with caveats and evidence, lives in the body text below the frontmatter.

source_ref — traceability

A string indicating where in the source this claim comes from: chapter title, section header, or the nearest heading visible in the text. This field exists for one reason: trust. A reader can trace any claim back to its origin and verify it.

source_ref: "Chapter 5: Commitment and Advancement"

EPUB files don’t have page numbers, but they have chapter titles and section headers. That’s enough for a reader to find the passage.

published — temporal ordering

The year the source was published. Enables temporal operations across brains: tracking how a field’s ideas evolved, finding predictions that were later confirmed or falsified, identifying forgotten ideas.

extracted_by and prompt_version — provenance

Which model produced this note and which version of the extraction prompt was used. Enables metalevel operations: comparing how different models extract differently, detecting prompt drift over time.

The body

Below the frontmatter, the body is plain markdown with two special conventions:

[[Brackets]] around any concept create a link. If a file with that name exists in the vault, Obsidian renders it as a clickable link and adds a backlink on the target page. If no file exists, it becomes a ghost link — visible in the graph as a faded node marking the frontier of explored territory.

A ## Related section at the bottom carries structural relationships:

## Related
 
Parent: [[Parent claim title]]
 
Children:
- [[Child claim 1]]
- [[Child claim 2]]
 
- ⚡ Tension with [[Other claim]] — one sentence why
- 🔁 Same pattern as [[Other claim]] — one sentence why
- 📊 Supported by [[Other claim]] — one sentence why

Parent/children links form the pyramid hierarchy. Tension, pattern, and evidence links form the lateral network. Together they create the navigable graph.

Why this shape works

The format is deliberately simple — YAML metadata that tools can query, plus markdown body that humans can read. No custom syntax, no nested structures, no arrays-of-objects in YAML. Every field is either a flat scalar or a flat list of strings.

This simplicity means:

  • Any YAML parser reads it (Python, JavaScript, Go)
  • Obsidian Properties UI renders it without custom plugins
  • Bases can filter by any field
  • Quartz publishes it as-is
  • Agents can parse it from raw markdown with a regex

The note format is the contract between agents, tools, and humans. It stays simple so nothing breaks.