The Knowledge Pipeline (KPL)
The Knowledge Pipeline (KPL) is the operational backbone of the Strategic Intelligence Engine. It is the system responsible for transforming authored content into two critical outputs: machine-queryable knowledge for AI agents (via vector embeddings) and published content for human audiences (via WordPress). Every other component of the SIE — agents, retrieval, governance, publishing — depends on the KPL functioning correctly.
The KPL enforces a non-negotiable architectural principle: Obsidian is the single source of truth. Content is authored and governed in markdown. All downstream systems — the vector database, WordPress, agent memory — are projections of that canonical source. If there is ever a conflict between what exists in Obsidian and what exists in a downstream system, Obsidian wins.
Pipeline Architecture
The KPL operates as a four-stage pipeline. Each stage transforms the content and pushes it closer to activation.
Stage 1: Authoring and Governance
All content originates in Obsidian as structured markdown files. Each file includes YAML frontmatter containing metadata fields required for both human navigation and machine processing: title, semantic summary, synthetic questions, key concepts, epistemic markers, and SEO fields.
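As an illustration, a file's frontmatter might look like the sketch below. The field names and values here are hypothetical — the source text names the categories of metadata (title, semantic summary, synthetic questions, key concepts, epistemic markers, SEO fields) but not their exact keys:

```yaml
---
title: "Vector Embeddings Explained"
semantic_summary: "What vector embeddings are and why the SIE uses them for retrieval."
synthetic_questions:
  - "What is a vector embedding?"
  - "Why does the SIE embed its knowledge base?"
key_concepts: [embeddings, semantic-search]
epistemic_status: established
seo_title: "Vector Embeddings Explained"
seo_description: "A plain-language introduction to vector embeddings."
status: published
---
```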
Content authored at this stage must conform to the Dual-Readability standard. Every paragraph is written to be semantically complete as a standalone unit — optimized for both human comprehension and vector chunking. This is not optional; content that fails Dual-Readability degrades retrieval precision downstream.
Version control is managed through Git. Every change to the Knowledge Core is committed, providing a full audit trail of what changed, when, and why. This history is critical for the Steady Presence Incident Loop, which traces agent failures back to specific knowledge changes.
Stage 2: Embedding and Indexing
When content is committed and pushed, the synchronization process reads the markdown files, splits them into semantically coherent chunks, and generates vector embeddings. These embeddings are high-dimensional numerical representations of meaning — they encode what the content is about, not just what words it contains.
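Because the Dual-Readability standard makes each paragraph semantically complete, blank-line-delimited paragraphs are a natural chunk boundary. A minimal chunker might look like the sketch below — the heading-grouping behavior is an assumption, and the actual embedding call (whatever model or provider the pipeline uses) is out of scope here:

```python
import re

def chunk_markdown(text: str) -> list[str]:
    """Split a markdown body into paragraph-level chunks.

    Assumes each paragraph is semantically complete (Dual-Readability),
    so blank lines delimit chunks; a heading is kept with the paragraph
    that follows it so the chunk retains its context.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks, pending_heading = [], None
    for block in blocks:
        if block.startswith("#"):  # hold the heading for the next paragraph
            pending_heading = block
            continue
        chunks.append(f"{pending_heading}\n{block}" if pending_heading else block)
        pending_heading = None
    return chunks

doc = "# Embeddings\n\nEmbeddings encode meaning.\n\nThey enable semantic search."
print(chunk_markdown(doc))
```

Each resulting chunk is then passed to the embedding model to produce one vector per chunk.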
The embeddings are indexed in Pinecone, the vector database that powers the SIE’s semantic search. Each vector is stored with metadata extracted from the frontmatter: the source file path, the knowledge topic mapping, key concepts, and the document’s status. This metadata enables filtered retrieval — agents can search within specific topic boundaries rather than querying the entire corpus.
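The record shape below follows Pinecone's documented upsert format (an `id`, a `values` vector, and a `metadata` dict per record); the ID scheme and the `embed` stand-in are assumptions for illustration:

```python
def to_vector_records(path, topic, concepts, status, chunks, embed):
    """Build one Pinecone-style record per chunk, carrying the
    frontmatter-derived metadata that enables filtered retrieval."""
    return [
        {
            "id": f"{path}#chunk-{i}",      # hypothetical ID scheme: path + chunk index
            "values": embed(chunk),
            "metadata": {
                "source_path": path,
                "knowledge_topic": topic,
                "key_concepts": concepts,
                "status": status,
                "text": chunk,
            },
        }
        for i, chunk in enumerate(chunks)
    ]

fake_embed = lambda text: [0.0] * 8  # placeholder for the real embedding model
records = to_vector_records(
    "AI/0_fundamentals/embeddings.md", "ai-fundamentals",
    ["embeddings"], "published", ["chunk one", "chunk two"], fake_embed,
)
print(len(records), records[0]["id"])
```

At query time, filtered retrieval then becomes a metadata filter on the search — for example, restricting results to a single `knowledge_topic` rather than the entire corpus.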
The synchronization process maintains a mapping between Obsidian file paths and knowledge topic taxonomy terms. This mapping ensures that when an agent retrieves a chunk, it can trace the result back to its canonical source and its position in the knowledge hierarchy.
Stage 3: WordPress Synchronization
The KPL’s publishing layer synchronizes content from the Knowledge Core to WordPress as knowledge base posts. This is a one-directional sync: Obsidian pushes to WordPress, never the reverse.
The sync process reads each markdown file, transforms it into WordPress-compatible HTML, maps frontmatter fields to WordPress custom fields and taxonomy terms, and creates or updates the corresponding knowledge_base post via the WordPress REST API. Media assets referenced in markdown are uploaded to the WordPress media library and linked to the post.
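The payload-building step can be sketched as below. This assumes the `knowledge_base` post type is registered with REST support; the custom-field and taxonomy key names are illustrative, not confirmed by the source:

```python
def build_post_payload(frontmatter: dict, html: str) -> dict:
    """Map frontmatter fields onto a WordPress REST payload for a
    knowledge_base post. Meta and taxonomy key names are hypothetical
    and would need to match what the plugin registers in WordPress."""
    return {
        "title": frontmatter["title"],
        "content": html,
        "status": "publish" if frontmatter.get("status") == "published" else "draft",
        "meta": {  # custom fields; must be registered with show_in_rest in WP
            "semantic_summary": frontmatter.get("semantic_summary", ""),
        },
        "knowledge_topic": frontmatter.get("topic_term_ids", []),  # taxonomy term IDs
    }

payload = build_post_payload(
    {"title": "Vector Embeddings", "status": "published",
     "semantic_summary": "What embeddings are and why they matter.",
     "topic_term_ids": [12]},
    "<p>Embeddings encode meaning.</p>",
)
# The sync would POST this to the post type's REST route (create) or to
# the same route with an ID appended (update), with authentication.
print(sorted(payload.keys()))
```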
Taxonomy mapping is handled through the _sie_path_pattern term meta field. Each knowledge_topic term stores the KB folder path it corresponds to (e.g., /AI/0_fundamentals/). The sync process matches file paths against these patterns to automatically assign the correct taxonomy terms — no manual categorization required.
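The matching logic amounts to a prefix test of the file path against each term's stored folder pattern. A minimal sketch, in which the term slugs and the second pattern are hypothetical:

```python
def assign_topics(file_path: str, term_patterns: dict[str, str]) -> list[str]:
    """Return every knowledge_topic term whose _sie_path_pattern folder
    path is a prefix of the given KB file path."""
    normalized = "/" + file_path.strip("/")
    return [
        term for term, pattern in term_patterns.items()
        if normalized.startswith(pattern.rstrip("/") + "/")
    ]

patterns = {
    "ai-fundamentals": "/AI/0_fundamentals/",  # pattern from the example above
    "ai-agents": "/AI/1_agents/",              # hypothetical sibling term
}
print(assign_topics("AI/0_fundamentals/embeddings.md", patterns))
```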
Stage 4: Verification and Monitoring
The final stage ensures pipeline integrity. After synchronization completes, the system verifies that the vector database and WordPress are consistent with the source content. This includes checking that embedding counts match published file counts, that taxonomy assignments are correct, and that no orphaned vectors exist for deleted content.
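At its core, this consistency check is a set comparison between IDs derived from the source tree and IDs actually present in the index. A sketch under the assumption that chunk IDs are comparable strings on both sides:

```python
def audit_sync(source_ids: set[str], indexed_ids: set[str]) -> dict:
    """Compare chunk IDs expected from the source content with IDs
    actually present in the vector index."""
    return {
        "missing": sorted(source_ids - indexed_ids),   # authored but never embedded
        "orphaned": sorted(indexed_ids - source_ids),  # vectors for deleted content
        "consistent": source_ids == indexed_ids,
    }

report = audit_sync(
    {"a.md#chunk-0", "b.md#chunk-0"},
    {"a.md#chunk-0", "deleted.md#chunk-0"},
)
print(report)
```

The same comparison applies to WordPress: posts whose source file no longer exists, or files with no corresponding post, surface as inconsistencies.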
The SIE Site Health integration surfaces pipeline status in the WordPress admin, providing the Fleet Commander with a dashboard view of sync health, last sync timestamp, embedding coverage, and any files that failed processing.
The Freshness Cycle
The KPL is not a one-time migration — it is a continuous cycle. When content is updated in Obsidian, the pipeline detects the change, re-embeds the affected chunks, updates the vector index, and re-syncs the WordPress post. When content is deleted, the pipeline removes the corresponding vectors and optionally trashes the WordPress post.
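Change detection of this kind is commonly done by comparing content hashes recorded at the last sync against the current tree; the sketch below assumes that approach rather than any specific mechanism the KPL uses:

```python
import hashlib

def diff_for_resync(old_hashes: dict[str, str], new_files: dict[str, str]) -> dict:
    """Classify files as added, changed, or deleted by comparing content
    hashes from the last sync against the current working tree."""
    new_hashes = {
        path: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for path, body in new_files.items()
    }
    return {
        "added":   [p for p in new_hashes if p not in old_hashes],
        "changed": [p for p in new_hashes
                    if p in old_hashes and old_hashes[p] != new_hashes[p]],
        "deleted": [p for p in old_hashes if p not in new_hashes],
    }

last_sync = {"a.md": hashlib.sha256(b"old body").hexdigest(),
             "b.md": hashlib.sha256(b"stable").hexdigest()}
now = {"a.md": "new body", "b.md": "stable", "c.md": "brand new"}
print(diff_for_resync(last_sync, now))
```

"Added" and "changed" files are re-embedded and re-synced; "deleted" files trigger vector removal and, optionally, trashing the WordPress post.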
This continuous cycle is what makes the Knowledge Core a living system rather than a static export. It is also what creates the maintenance challenge addressed in the Freshness framework — the KPL can only keep downstream systems current if the source content itself remains current.
Why the Pipeline Matters
Without the KPL, the Knowledge Core is just a folder of markdown files. The pipeline is what activates the knowledge — making it searchable by agents, publishable for audiences, and governable by the Fleet Commander. Every investment in content quality, Dual-Readability, and semantic authoring pays dividends only because the KPL faithfully propagates those qualities to the systems that consume them.