The Unstructured Corpus

For three years I’ve been thinking out loud into AI tools the way I used to think into sketchbooks and production shops. Brainstorming, arguing with myself, changing direction mid-sentence, working through problems at 2 AM when the idea won’t let go. 1,643 ChatGPT sessions. 700+ Claude sessions. Gemini exports. Voice notes dictated while driving. Unfinished thoughts. Arguments with no resolution.

Most prompt advice treats unstructured input as a problem to solve: pre-organize, use few-shot examples, define output formats. The assumption is that messy input produces messy output.

I went the other way. And the unstructured corpus turned out to be the most valuable thing I’ve built.

The substrate

Every tool in the accommodation design framework depends on this corpus. Voice sampling needs unstructured speech to sample from, because published writing is performance and conversation is how someone actually talks. Knowledge traversal needs a body of unfiltered ideation to trace through, because the first time an idea appears in conversation history, it probably wasn’t called by its final name. The interview process needs raw material to mine for real stories and real language, because the stories that matter are rarely the ones you’d put on a resume.

Without the corpus, the tools have nothing to work with.
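
As an illustration only, here is a minimal Python sketch of the knowledge-traversal idea described above: given a set of sessions and the names an idea has gone by, find the earliest session where any of them appears. Everything in it is an assumption made for the example, not the framework's actual implementation. The `Session` record, the `first_appearance` function, the alias list, and the sample sessions are all hypothetical and made up for demonstration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Session:
    """Hypothetical record for one exported conversation or voice note."""
    day: date
    source: str
    text: str


def first_appearance(sessions, aliases):
    """Return the earliest session mentioning any alias, or None if none do.

    Matching is a plain case-insensitive substring check, which is an
    assumption of this sketch; a real tool would likely need fuzzier matching.
    """
    needles = [alias.lower() for alias in aliases]
    hits = [s for s in sessions if any(n in s.text.lower() for n in needles)]
    return min(hits, key=lambda s: s.day, default=None)


# Invented sample data, purely illustrative.
sessions = [
    Session(date(2024, 5, 2), "claude", "Input inversion is the core move here."),
    Session(date(2023, 1, 14), "chatgpt", "what if I stop cleaning up my prompts and let the tools do it"),
    Session(date(2023, 8, 30), "voice-note", "the flip, where the tool organizes instead of me"),
]

# The final name plus earlier, rougher phrasings of the same idea.
aliases = ["input inversion", "stop cleaning up my prompts", "the flip"]

first = first_appearance(sessions, aliases)
```

In this toy data the earliest hit is the 2023 session, where the idea shows up long before it has its final name. The sketch assumes the aliases are already known; discovering them is the harder part, and it is the part that depends on having raw material to search.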

The artifact proves the accommodation works

The site you’re reading was compiled from this material. The source was never polished drafts. It was how I actually talk when I’m working something out. The quality of what the tools produce traces back to properties the corpus has and polished writing lacks: real voice, real sentence structure, real changes of direction, real moments where an idea first appeared under a different name.

The corpus is evidence of bidirectional accommodation in practice. I stayed raw. The tools handled the translation to structured input for the model. The model got what it needed. The result is a site compiled from material that would fail every prompt-engineering best practice.

Three years is not the minimum

The scale matters less than the rawness. A month of voice notes contains real voice, real thinking patterns, real concerns. That is enough for the tools to begin extracting.

For someone starting fresh, an interview works. “Tell me about what you’re building.” Open-ended questions that produce stories, language, instincts. That becomes the initial corpus. Over time, ongoing conversations and voice notes add to it. The corpus grows as the person keeps thinking out loud, and each addition gives the tools more to draw from.

The corpus cannot be assembled retroactively from polished writing. Published work is performance. The properties that make the corpus valuable (unfinished thoughts, contradictions, the moment an idea first appears under the wrong name) only exist in material that was never edited for an audience.

The inversion

Structured input puts the burden on the human. You pre-organize your thoughts to compensate for the model’s limitations. The alternative is building tools that handle the translation.

I stopped organizing my input and started organizing my tools. The tools accommodate the model’s processing reality. My thinking stays raw. Output quality went up, not in spite of the mess but because of it: the tools had richer material to work with.

The full framework is documented in the accommodation design whitepaper. Input inversion is Section 4.6.