Fidelity of Style Versus Fidelity of Judgment in Personal Language Models

Key findings

→Stylometric similarity converged to >0.9 with as little as 50K tokens of target text.
→Decision agreement on held-out moral dilemmas plateaued near 62%, close to chance for two of the three response classes.
→Adding explicit value statements to the prompt raised agreement to 71% but introduced sycophancy.

What it means in practice

Practitioners should not infer judgment-level fidelity from style-level fidelity. Clones marketed as 'thinking like you' should be evaluated on decision benchmarks, not text similarity.
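One way to make a decision benchmark concrete is to report raw agreement alongside a chance-corrected score, since a raw figure near 62% can look stronger than it is when one response class dominates. The sketch below is illustrative only: the source does not say how agreement was computed, so the use of Cohen's kappa, the function names, and the A/B/C labels are all assumptions.

```python
from collections import Counter


def agreement_rate(person, clone):
    """Fraction of dilemmas where the clone picks the same option as the person."""
    if len(person) != len(clone) or not person:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == c for p, c in zip(person, clone)) / len(person)


def cohens_kappa(person, clone):
    """Agreement corrected for the overlap expected from label frequencies alone."""
    n = len(person)
    observed = agreement_rate(person, clone)
    person_counts, clone_counts = Counter(person), Counter(clone)
    # Chance agreement: probability both pick class k if choices were independent.
    expected = sum(person_counts[k] * clone_counts[k] for k in person_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # both sides always give the same single label; kappa is undefined
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight held-out dilemmas (not data from the study).
person = ["A", "A", "B", "C", "A", "B", "A", "A"]
clone = ["A", "A", "A", "C", "A", "A", "A", "B"]

raw = agreement_rate(person, clone)
kappa = cohens_kappa(person, clone)
print(f"raw agreement: {raw:.3f}, chance-corrected: {kappa:.3f}")
```

In this made-up example the clone matches on five of eight dilemmas, yet most of that overlap comes from both sides favoring option A, so the chance-corrected score is much lower than the raw rate. Reporting both numbers keeps a style-faithful clone from being mistaken for a judgment-faithful one.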
