Reproducing DE11: evidence plan and limitations

The Distinction Engine results are a starting point for reproduction, not proof that a browser runtime or language model exhibits teleodynamic learning.

The source paper reports competitive results on three tabular datasets and a much weaker DIGITS result. A reproduction should pin the source revision, data splits, preprocessing, seeds, initial energy, structural action costs, phase windows, and all hyperparameters.
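
One way to make that pinning concrete is to record every fixed quantity in a single manifest and store a hash of it next to each result. The sketch below is a minimal illustration, not the source paper's tooling: the class name, field names, and the fingerprint helper are all assumptions introduced here, and any values passed in are placeholders to be replaced with the ones actually used.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReproManifest:
    """Settings fixed before a run. All field names are hypothetical."""
    source_revision: str            # identifier of the exact source revision used
    dataset: str                    # dataset name as used in the run
    split_seed: int                 # seed that determines the train/test split
    run_seed: int                   # seed for the learner itself
    initial_energy: float
    preprocessing: Tuple[str, ...] = ()                       # ordered step names
    action_costs: Dict[str, float] = field(default_factory=dict)
    phase_windows: Dict[str, int] = field(default_factory=dict)
    hyperparameters: Dict[str, float] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 over a key-sorted JSON dump, so equal settings hash equally."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint with each result makes it easy to check that two runs being compared really shared the same settings, and to spot a silent change in any one of them.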

Reproduce more than final accuracy

Retain hypothesis lineages, energy traces, candidate-action scores, transition rate, noop streaks, freeze step, and ablation runs. Final accuracy alone cannot establish the dynamics: two runs can reach the same accuracy through very different structural histories.
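
A per-step trace record is one simple way to keep these quantities together. The following is a sketch under stated assumptions: the record layout, the field names, and the action label "noop" are hypothetical and would need to be mapped onto whatever the reproduced implementation actually emits.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Optional


@dataclass
class StepRecord:
    """One logged learner step. Field names are hypothetical."""
    step: int
    energy: float
    chosen_action: str
    candidate_scores: Dict[str, float]   # score of every candidate action
    hypothesis_id: str
    parent_id: Optional[str]             # lineage link; None for a root hypothesis


def to_jsonl(records: Iterable[StepRecord]) -> str:
    """Serialize records as JSON Lines, one step per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def longest_noop_streak(records: Iterable[StepRecord], noop: str = "noop") -> int:
    """Length of the longest run of consecutive steps whose action equals `noop`."""
    longest = current = 0
    for record in records:
        current = current + 1 if record.chosen_action == noop else 0
        longest = max(longest, current)
    return longest
```

Keeping the raw per-step records, rather than only derived summaries, lets a later reader recompute quantities such as noop streaks under a different definition without rerunning the experiment.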

Stress the listed limitations

Tests should vary feature dimension, class count, parameter correlation, initial energy, score calibration, and action vocabulary. Independent negative results are useful evidence.
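
The variation described above can be enumerated as a grid of configurations. This is a minimal sketch: the helper name, the factor names, and especially the example values are illustrative placeholders chosen here, not settings taken from the source paper.

```python
import itertools
from typing import Dict, Iterator, Sequence


def stress_grid(factors: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield one configuration per combination of factor values."""
    names = list(factors)
    for values in itertools.product(*(factors[name] for name in names)):
        yield dict(zip(names, values))


# Placeholder values for illustration only; real ranges must come from the study design.
EXAMPLE_FACTORS = {
    "feature_dim": [4, 16, 64],
    "class_count": [2, 10],
    "initial_energy": [0.5, 1.0, 2.0],
}

configs = list(stress_grid(EXAMPLE_FACTORS))
```

A full grid grows multiplicatively with each added factor, so with many factors it may be more practical to vary one factor at a time around a pinned baseline and record every configuration, including those that produce negative results.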