At SVA, the critique room worked like this. You hang your work. Your peers and teachers sit around you. Everyone evaluates in good faith, from their own perspective, trying to make the work the best the room can make it. Where they agree, you have a strong signal. Where they disagree, you have a decision to make. The accumulated decisions are yours. The work that comes out belongs to you, not to any single voice in the room.

That room is hard to reconstruct outside of school. In practice, most evaluation collapses to one perspective: the maker’s own instinct, one mentor’s preference, one client’s taste. And one perspective produces imitation. You follow a single influence and the result looks derivative.

I needed a way to build the room and fill it with people whose judgment I trust. Specifically, their evaluative instinct, codified and testable.

The “act as” problem

Everyone using AI for creative evaluation has tried some version of this. “Act as Massimo Vignelli and evaluate my grid.” “Pretend you’re Dieter Rams and critique this interface.” “What would Paula Scher think of this layout?”

The results aren’t specific enough for real evaluation. The AI produces a surface impression: Vignelli becomes “use Helvetica and grids,” Rams becomes “less is more,” Scher becomes “make the type bigger.” These are caricatures built from what’s most commonly written about these people online. They capture the received wisdom, not the evaluative instinct underneath the visible decisions.

The reason is structural. When you tell an AI to “act as” someone, it retrieves associations from training data. For well-known practitioners, those associations cluster around the most-repeated facts and the most-cited quotes. Vignelli’s actual discipline around typographic hierarchy, his specific rules about when a grid should flex and when it should hold, his intolerance for arbitrary decoration that doesn’t serve communication: none of that survives the “act as” compression. You get the Wikipedia version of a practitioner, not the practitioner.

Extraction instead of imitation

I built something different. The process has four steps, and the order matters.

Study the output. The work itself, and enough of it to see the pattern underneath the surface variation. Read their books, look at their projects across decades, listen to how they talk about other people’s work (that reveals more than how they talk about their own).

Extract the framework. What questions does this person consistently ask? What do they always notice first? What do they never tolerate? The framework is the invisible discipline underneath their visible decisions. Vignelli didn’t just “use grids.” He asked specific questions about typographic restraint, color economy, and whether structural limitation was producing clarity or just constraint. Those questions are extractable if you study enough of the work.

Codify as testable criteria. Turn the extracted questions into specific checks that produce clear verdicts when applied to real work. “Does the type system use deliberate limitation to produce clarity?” is evaluable. “Is this Vignelli-like?” is not. Each criterion needs to return a verdict when you run it against a real project.

Validate against their known work. Run the criteria against work the original practitioner produced or praised. If your Vignelli lens doesn’t confirm what Vignelli actually built, the extraction is wrong. This is the step most people skip, and it’s the one that separates a diagnostic tool from a costume.
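To make the last two steps concrete, here’s a minimal sketch in Python. None of it is my actual implementation; every name, shape, and check in it is illustrative. A criterion is an evaluable question plus a check, and validation runs the lens against work the practitioner actually produced or praised.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    id: str        # e.g. "V1"
    question: str  # the codified, evaluable question
    # A real check is a judgment call made by a model or a person;
    # a bool here is a stand-in for whatever verdict scale you use.
    check: Callable[[dict], bool]

def validate_extraction(criteria: list[Criterion], known_work: list[dict]) -> list[str]:
    """Step four: if the lens rejects work the practitioner
    built or praised, the extraction is wrong, not the work."""
    failures = []
    for work in known_work:
        for criterion in criteria:
            if not criterion.check(work):
                failures.append(f"{criterion.id} rejects {work['title']}")
    return failures

# Illustrative usage, with a deliberately crude stand-in check for V1.
v1 = Criterion(
    id="V1",
    question="Does the type system use deliberate limitation?",
    check=lambda work: len(work["typefaces"]) <= 2,
)
print(validate_extraction([v1], [{"title": "Unigrid", "typefaces": ["Helvetica"]}]))
# [] -- the lens confirms the known work, so the extraction survives this test
```

An empty failure list doesn’t prove the extraction is right; it only means it survived the work you tested it against.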

What the lenses actually look like

Each lens exists because the project needed a specific evaluative capability and I identified the practitioner whose body of work is the best available source for extracting it. This isn’t curation by admiration. It’s selection by diagnostic need — the same way a creative director staffs a team by matching specific expertise to specific project requirements.

Two examples from my working system.

The Millman Lens: “Is this person real?”

I needed an authenticity diagnostic: does this portfolio read as a real person with real stakes, or a professional template? Debbie Millman’s body of work is the best source I found for extracting that specific framework — her interviews consistently surface the same questions about vulnerability, personal narrative, and the gap between stated identity and demonstrated identity. Five criteria:

M1. What can only this person do? Is positioning genuinely distinct, or could you swap in twenty other names? Look for specifics that anchor identity to actual history. Red flag: positioning that describes a category, not a person.

M2. Does vulnerability match authority? Does the work show failures, and what broke, alongside what worked? Is difficulty demonstrated through iterations and breakage, or just claimed? Red flag: only polished success stories.

M3. Best-day self or a character? Does the voice sound like a real person with real stakes, or like a brand strategy document? Is there a gap between the stated philosophy and how the thing actually reads?

M4. Does the constellation of work tell a life-arc story? Can you see one person’s specific obsessions producing all of this? Is there a visible throughline from earliest work to most recent?

M5. Where is the courage? Evidence of risk-taking: projects started before knowing they’d succeed, work made public while still developing, unconventional approaches that don’t have a safety net.

Each criterion produces a verdict: STRONG, HOLDS, WEAK, or BROKEN. When I run this against a portfolio page, I get specific findings per criterion, not a general “this feels authentic.” The specificity is what makes it useful. “M2 WEAK: difficulty claimed but not shown. The copy says ‘complex problem’ twice without describing what actually broke” is actionable. “This doesn’t feel authentic enough” is not.
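In code, the shape of that output might look like the sketch below. The four-level verdict scale is the real one; the types, names, and sample run are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    STRONG = 4
    HOLDS = 3
    WEAK = 2
    BROKEN = 1

@dataclass
class Finding:
    criterion: str   # e.g. "M2"
    verdict: Verdict
    note: str        # the specific, actionable observation

# What a Millman-lens run against a portfolio page might return.
# The notes are the point: each one names what to fix,
# not how the page "feels".
millman_run = [
    Finding("M1", Verdict.HOLDS, "Positioning is anchored to actual history."),
    Finding("M2", Verdict.WEAK,
            "Difficulty claimed but not shown: 'complex problem' appears "
            "twice without describing what actually broke."),
]
```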

The Vignelli Lens: “Is the restraint producing clarity?”

I needed a restraint diagnostic: is the design system producing clarity through limitation, or just following convention? Vignelli’s body of work is the clearest source for extracting that framework — his specific intolerance for arbitrary decoration and his insistence that structural limitation should produce order, not just constraint. Five criteria:

V1. Does the type system use deliberate limitation? Are font choices minimal and purposeful? Is the restraint producing clarity, or just constraint? Red flag: typefaces chosen for variety rather than function.

V2. Is the grid earning its keep? Does the underlying structure create order that the viewer can feel, even if they can’t name it? Red flag: a grid that’s technically present but not doing compositional work.

V3. Is color serving communication? Every color choice should have a reason traceable to content, hierarchy, or identity. Red flag: decorative color that could be swapped without losing meaning.

V4. Is decoration justified by function? Every visual element needs to earn its place. If you can remove something without losing information or structure, it shouldn’t be there. Red flag: ornament added for visual interest rather than communication.

V5. Does structural limitation produce coherence? Across the whole project, do the constraints create a system that reads as intentional? The test: does the limitation make the work feel disciplined, or does it feel like the designer ran out of ideas?

Where it gets interesting: convergence

Running one lens gives you one perspective. Running multiple lenses gives you what the critique room gave me at SVA.

When the Millman lens says STRONG on a page and the Vignelli lens says WEAK, that’s information. It means the page reads as authentic and vulnerable (Millman’s criteria) but the design isn’t earning its restraint (Vignelli’s criteria). Both can be true at once, and the tension between them is where the actual decision lives. Do I lean into the raw personal directness at the expense of visual discipline? Or do I tighten the design system, knowing it might polish away some of the roughness that makes the page feel real?

That’s a real creative decision, surfaced by structure: two codified perspectives that disagree about a specific element, for articulable reasons.

Scale that up across a whole room of personas, and those tensions are where the interesting decisions live.

I built a convergence skill that maps these patterns across all lenses. Where five or more lenses agree: high-confidence signal, act on it. Where lenses disagree: decision point, and the system articulates what’s at stake in each direction. The choices I make at those decision points, accumulated across dozens of them, produce work that couldn’t have come from following any single influence. The path through the tensions is mine.
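A sketch of how that mapping might work, with the same caveat as before: the quorum of five comes from the paragraph above, but the shape of the data and the choice to count STRONG and HOLDS as agreement are my illustrative assumptions.

```python
# verdicts maps each page element to the verdict every lens returned
# for it, e.g. {"hero": {"millman": "STRONG", "vignelli": "WEAK"}}.
def converge(verdicts: dict[str, dict[str, str]], quorum: int = 5):
    signals, tensions = [], []
    for element, by_lens in verdicts.items():
        positive = [lens for lens, v in by_lens.items() if v in ("STRONG", "HOLDS")]
        negative = [lens for lens, v in by_lens.items() if v in ("WEAK", "BROKEN")]
        if len(positive) >= quorum or len(negative) >= quorum:
            signals.append((element, by_lens))  # broad agreement: act on it
        elif positive and negative:
            # Lenses disagree for articulable reasons: a decision
            # point to surface, not noise to average away.
            tensions.append((element, positive, negative))
    return signals, tensions

# The Millman/Vignelli split from above lands in tensions, not signals.
signals, tensions = converge({"hero": {"millman": "STRONG", "vignelli": "WEAK"}})
```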

Why this matters beyond this site

The extraction protocol is medium-independent. I developed it against this site as a working laboratory, but the architecture transfers. A restaurant could have a structural plate staffed with service design practitioners and a narrative plate staffed with practitioners whose restaurants have a feeling you remember. A record could have a production plate and a world-feel plate. A curriculum could have a pedagogical plate and a voice plate.

Swap the plates. Swap the personas. The architecture holds because the process holds: study the output, extract the framework, codify as testable criteria, validate against known work. Anyone can build their own room and staff it with the people whose judgment they trust.

I looked for someone else doing this specific thing: extracting evaluative frameworks from real practitioners’ bodies of work into testable, codified diagnostic lenses that run independently and surface their disagreements as decision points. I couldn’t find it. If it exists and I missed it, I’d like to know. If it doesn’t, this is how I built it.