Key findings
- Stylometric similarity converged to >0.9 with as little as 50K tokens of target text.
- Decision agreement on held-out moral dilemmas plateaued near 62%, and for two of the three response classes agreement was close to chance.
- Adding explicit value statements to the prompt raised agreement to 71% but also introduced sycophancy.
What it means in practice
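Practitioners should not infer judgment-level fidelity from style-level fidelity: a clone can reproduce a person's writing almost perfectly and still fail to make the choices that person would make. Clones marketed as 'thinking like you' should be evaluated on decision benchmarks, not on text similarity.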
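The summary does not say exactly how decision agreement was scored, so the following is only a minimal sketch of a decision-level check, assuming each dilemma's response is recorded as a single class label for both the target person and the clone. It reports overall agreement, agreement broken down by the target's chosen class (where a near-chance class can hide inside a respectable overall number), and the agreement expected by chance given each side's label frequencies (the same expected-agreement term used in Cohen's kappa). The function names and the labels "act", "refuse" and "defer" are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def decision_agreement(target, clone):
    """Fraction of dilemmas where the clone chose the same response class as the target."""
    if len(target) != len(clone) or not target:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(t == c for t, c in zip(target, clone))
    return matches / len(target)


def per_class_agreement(target, clone):
    """Agreement restricted to the dilemmas where the target chose each class."""
    totals = Counter(target)
    hits = defaultdict(int)
    for t, c in zip(target, clone):
        if t == c:
            hits[t] += 1
    return {cls: hits[cls] / n for cls, n in totals.items()}


def chance_agreement(target, clone):
    """Expected agreement if the clone's labels were independent of the target's,
    keeping both sides' label frequencies fixed (Cohen's kappa p_e term)."""
    n = len(target)
    t_freq = Counter(target)
    c_freq = Counter(clone)
    return sum(t_freq[k] * c_freq[k] for k in t_freq) / (n * n)


# Hypothetical responses to six held-out dilemmas, three response classes.
target = ["act", "refuse", "defer", "act", "refuse", "act"]
clone = ["act", "act", "defer", "act", "act", "refuse"]

print(decision_agreement(target, clone))
print(per_class_agreement(target, clone))
print(chance_agreement(target, clone))
```

In this toy example the overall agreement is 0.5, but the per-class view shows the clone never matches the target on "refuse", which a single aggregate score would not reveal.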
