Mimesis Visual Failure Packet - 2026-06-15
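This artifact is not a success story claiming that "Mimesis improved visual design."
Its purpose is the opposite.
It records, as a publicly shareable boundary, that in one local visual workbench run the rule-based visual Mimesis and the retina loop did not pass the human preference gate.
The permitted claim is therefore small.
Digital Factory has local/private workbench evidence that records the failures and limits of visual Mimesis, and this evidence is used to forbid the claim that "rules, checklists, and automated visual loops produce designs that humans prefer."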
External Standards
| source | what it contributes | boundary for this artifact |
|---|---|---|
| NIST Human-Centered Design | Human-centered design involves users, iterative evaluation, and the whole user experience. | A local owner verdict can be a useful direction signal, but not external user validation. |
| Microsoft HAX Playbook and Guidelines for Human-AI Interaction | Human-AI work should proactively explore likely failures and design recovery paths. | A failure packet is a useful design artifact; it is not proof that the method succeeds. |
| Nielsen and Molich, Heuristic Evaluation of User Interfaces plus NN/g heuristic reference list | Interface evaluation can use evaluators and inspection, not only self-checklists. | One owner verdict is weaker than multiple independent evaluators. |
| WCAG 2.2 and WCAG Techniques | Contrast and accessibility techniques create important minimum checks. | Passing contrast or technique checks is not proof of holistic visual quality. |
| Playwright visual comparisons | Screenshot comparison can catch visual differences against references. | Screenshot diffs prove visual change, not human preference or taste fit. |
| BackstopJS | Maintained OSS pattern for visual regression by comparing screenshots over time. | Visual regression infrastructure is not the same as design taste validation. |
| Model Cards for Model Reporting | Public AI documentation should state intended use, evaluation conditions, and limitations. | This artifact uses that spirit for a workbench packet, not as a model-card claim. |
| US20210011592A1 | Prior-art pattern: AI UI generation can use historical rendered screens and user selection. | Historical UI matching plus user selection is a precedent for anchor-based generation, not proof that this workbench succeeded. |
Local Source Snapshot
Observed on 2026-06-15 KST from the private/local Digital Factory workbench.
| local source class | observed evidence | public boundary |
|---|---|---|
| Digital Factory/README.md | The workbench describes itself as a private/local Mimesis Engineering workbench, not public proof, external validation, adoption evidence, or a production product. | Public copy can say the workbench exists behind the next Mimesis iteration; it cannot treat the workbench itself as validation. |
| mimesis-plugin/CLAIMS.md | The claim pack explicitly forbids saying visual design wins, retina improves human-perceived quality, external validation exists, or the plugin is production-ready. | This artifact inherits those forbidden claims. |
| cases/004-real-world-visual/CASE-NOTE.md | A local portfolio hero task recorded repeated naked wins and a checklist-versus-gestalt gap. | This is one local case, not a general law. |
| HUMAN-VERDICT-002.md | The owner chose the naked version over the copy+visual Mimesis version even after visual-designer acceptance criteria passed. | Passing local module criteria is not enough to claim visual quality. |
| EXP-005-PREREG.md and HUMAN-VERDICT-003.md | A condition-blind owner run found example-anchor outputs stronger than full rules, while rules could suppress taste. | This supports a redesign direction: anchors and human gates matter more than rigid visual rules in this case. |
| RETINA-RESULT.md and HUMAN-VERDICT-003.md | The retina loop changed measurable surface properties, but the owner preferred the pre-retina version in the blind comparison. | The retina loop is not public proof of human-perceived improvement. |
| mimesis-source-packet/08-FINDINGS.md | The findings packet records the visual case as a first holistic human-judged Mimesis loss and keeps n=1, owner-judge, single-task limits visible. | This is a learning/failure artifact, not statistical evidence. |
What Actually Happened
| stage | attempted claim | observed result | corrected claim |
|---|---|---|---|
| Copy-only Mimesis | Better copy structure would improve the hero. | The owner preferred the naked version. | Copy structure did not carry the visual first impression in this task. |
| Copy+visual rules | Adding visual-designer rules would beat the naked baseline. | The output passed local acceptance criteria but still lost the owner visual verdict. | Hygiene checks can pass while gestalt quality fails. |
| Truth framing vs visual quality | A more proof-bounded product story might also feel better visually. | The Mimesis output was stronger on status/factual framing but weaker on first-impression visual gestalt. | Honesty and visual taste must be evaluated as separate axes. |
| Artifact-only anchors | Example exposure may preserve taste better than explicit visual rules. | In the condition-blind owner run, anchor outputs beat the full-rule outputs. | Anchor fit is a stronger candidate than rule stuffing, but still unproven beyond this run. |
| Frontier/source-culture rerun | Better source culture and font setup might make the visual module beat naked. | The available owner notes did not provide a per-code ranking strong enough to compute a condition average. | Treat this as weak evidence; do not claim frontier visual Mimesis won. |
| Retina loop | Seeing the render and patching CSS would improve the result. | It changed visible properties, but the owner preferred the pre-retina version. | Automated visual iteration needs target-aesthetic fit and a human gate. |
Allowed Public Claim
The blog and GitHub profile may say:
- Mimesis v.next has local/private failure evidence in visual design.
- In one owner-judged portfolio hero task, rule/checklist visual conditioning did not beat naked or anchor-style variants.
- In that task, truthful status framing and first-impression visual quality separated; better claim boundaries did not automatically create better visual taste.
- This failure sharpened the method: visual Mimesis should not be sold as “rules improve design”; it needs artifact anchors, target-aesthetic fit, and human verdict gates.
- The retina loop produced measurable visual changes, but those changes did not pass the owner preference gate.
- This is a redacted proof-of-learning artifact, not external validation.
Forbidden Public Claim
Do not say or imply:
- Mimesis improves visual design quality.
- The visual workbench is externally validated.
- The result is statistically significant.
- The owner verdict is the same as an external blind panel.
- The retina loop improves human-perceived visual quality.
- The private mimesis-plugin workbench is public proof.
- Digital Factory is a production product, customer proof, adoption evidence, or commercial validation.
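The forbidden list above could be backed by a simple pre-publish check. The following is a minimal sketch, not an existing tool in the workbench: the function name `find_forbidden_claims` and the regular expressions are hypothetical paraphrases of the bullets above, and a keyword match cannot replace a human read of the copy.

```python
import re

# Hypothetical patterns paraphrasing the forbidden claims listed above.
# They are illustrative only and far from exhaustive.
FORBIDDEN_PATTERNS = {
    "visual-quality win": r"\bimproves? (visual )?design\b|\bmakes? ai design better\b",
    "external validation": r"\bexternally validated\b",
    "statistical significance": r"\bstatistically significant\b",
    "production readiness": r"\bproduction[- ]ready\b|\bproduction product\b",
}


def find_forbidden_claims(copy: str) -> list[str]:
    """Return the labels of every forbidden pattern found in the copy."""
    lowered = copy.lower()
    return [
        label
        for label, pattern in FORBIDDEN_PATTERNS.items()
        if re.search(pattern, lowered)
    ]


if __name__ == "__main__":
    draft = "Mimesis improves visual design quality and is externally validated."
    for label in find_forbidden_claims(draft):
        print(f"flagged: {label}")
```

A match only flags copy for review. The check cannot tell a claim from its negation, so a sentence that quotes a banned phrase in order to reject it will also be flagged and needs a human decision.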
Redaction Rules
Do not publish the raw boards, screenshots, HTML, decode files, or private workbench paths as public product proof.
If a later public packet includes images, it must redact or crop:
- real names and handles,
- raw owner quotes,
- portfolio details that are not already public-safe,
- unreconciled project-status facts,
- financial or trading numbers,
- third-party reference screenshots,
- sealed/decode files without chronology,
- any copy implying customer or commercial validation.
The publishable unit is not the raw artifact. The publishable unit is:
source set -> condition labels -> observed verdict -> failure mode -> banned claims -> next gate
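One way to keep the publishable unit complete is to treat the chain above as a record with one field per step. This is a sketch under assumptions: the class name `FailurePacket`, the field names, and the `missing_fields` helper are hypothetical and only mirror the six steps named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePacket:
    """One publishable unit: each field mirrors a step in the chain above."""

    source_set: tuple[str, ...]
    condition_labels: tuple[str, ...]
    observed_verdict: str
    failure_mode: str
    banned_claims: tuple[str, ...]
    next_gate: str

    def missing_fields(self) -> list[str]:
        """Names of fields left empty; a packet with gaps should not be published."""
        return [name for name, value in vars(self).items() if not value]


# Example values taken from the tables in this artifact.
packet = FailurePacket(
    source_set=("HUMAN-VERDICT-002.md",),
    condition_labels=("naked", "copy+visual"),
    observed_verdict="owner preferred the naked version",
    failure_mode="hygiene checks passed while gestalt quality failed",
    banned_claims=("Mimesis improves visual design quality",),
    next_gate="external blind panel with at least five evaluators",
)

if __name__ == "__main__":
    print(packet.missing_fields() or "packet is complete")
```

The record carries labels and verdict summaries only, which fits the redaction rules: raw boards, screenshots, and owner quotes never enter the structure.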
Marketing Use
This is useful marketing only because it lowers the claim ceiling.
Bad marketing:
Mimesis makes AI design better.
Allowed marketing:
Mimesis Engineering is being rebuilt through failure packets. The visual case showed that explicit rules and automated visual loops can lose to better-matched anchors and human taste gates.
Weak Evidence And Reconciliation Notes
These points stay visible so the artifact cannot become a victory story later:
| weak point | why it matters | current handling |
|---|---|---|
| Single owner judge | The owner is relevant for the owner’s own portfolio hero, but this is not an external user panel. | Call it an owner verdict, not external validation. |
| Single task and model family | One hero task cannot prove a general design law. | Treat as directional failure evidence only. |
| EXP-006 lacks per-code ranking | Without a forced ranking, no reliable frontier-vs-naked average can be claimed. | Use only as weak/context evidence. |
| Retina metric conflict | The loop changed measurable surface properties, but later human preference rejected the changed version. | Claim visual change, not human-preference improvement. |
| Internal project-status wording conflict | Some workbench materials use different maturity wording for the underlying product idea. | Do not publish unreconciled status facts or raw boards. |
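The EXP-006 row above depends on a rule that a condition average is only computed when every code has a rank. A minimal sketch of that guard follows; the function name `condition_average` and the code labels in the example are hypothetical, and the actual ranking format used in the workbench is not stated in this artifact.

```python
from statistics import mean
from typing import Optional


def condition_average(ranks: dict[str, Optional[int]]) -> Optional[float]:
    """Mean rank for one condition, or None when any code lacks a rank.

    Returning None instead of averaging the ranks that happen to exist
    keeps incomplete owner notes from turning into a condition score.
    """
    if not ranks or any(rank is None for rank in ranks.values()):
        return None
    return mean(ranks.values())


if __name__ == "__main__":
    # Hypothetical code labels; lower rank means more preferred.
    complete = {"code-a": 1, "code-b": 2}
    partial = {"code-a": 1, "code-b": None}
    print(condition_average(complete))
    print(condition_average(partial))
```

When the function returns `None`, the run stays in the weak/context evidence category described in the table rather than contributing to a frontier-versus-naked comparison.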
Next Proof
The next stronger artifact is not a prettier screenshot.
It is one of:
- a redacted before/after board with condition labels and no sensitive raw context,
- an external blind panel with at least five evaluators,
- a Playwright or BackstopJS visual regression harness that proves visual change while keeping human preference separate,
- or a revised visual Mimesis module that states target-aesthetic fit and human verdict gates before claiming improvement.
Until then, the public claim remains:
visual Mimesis failure evidence exists locally, and the main public value is the banned-claim boundary.
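The separation between visual change and human preference can be sketched as a small gate. Everything here is an assumption for illustration: the names `GateInput` and `strongest_allowed_claim`, the returned strings, and the simple-majority rule are hypothetical. Only the five-evaluator minimum comes from the list above.

```python
from dataclasses import dataclass, field

MIN_EVALUATORS = 5  # "at least five evaluators" from the list above


@dataclass
class GateInput:
    # True when a screenshot comparison reported a difference.
    visual_change_detected: bool
    # One entry per blind evaluator; True means they preferred the new version.
    blind_votes: list[bool] = field(default_factory=list)


def strongest_allowed_claim(gate: GateInput) -> str:
    """Return the strongest claim the evidence supports, never more."""
    if len(gate.blind_votes) >= MIN_EVALUATORS:
        preferred = sum(gate.blind_votes)
        # Simple majority is an assumed threshold, not a rule from the source.
        if preferred * 2 > len(gate.blind_votes):
            return "panel preferred the new version in this task"
        return "panel did not prefer the new version"
    if gate.visual_change_detected:
        return "visual change only"
    return "no claim"


if __name__ == "__main__":
    # A detected diff plus a single verdict stays at "visual change only".
    print(strongest_allowed_claim(GateInput(True, [False])))
```

A detected screenshot diff never upgrades the claim on its own, and a panel smaller than five cannot unlock a preference claim regardless of how it voted.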