if the probability mass is concentrated on a single token, it's a precise answer, like `1 + 1 = ` (the next token is almost certainly `2`)
if the predicted next token shares probability with other tokens, there are multiple valid answers, like `position: `
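a minimal sketch of the distinction above, assuming we already have the next-token distribution as a dict of token -> probability. the distributions below are made-up numbers, and `classify`, `entropy` and the 0.9 threshold are hypothetical choices for illustration, not taken from any specific model or library. top-1 probability and entropy are two common ways to measure how concentrated the mass is.

```python
import math

def entropy(probs):
    """shannon entropy (bits) of a next-token distribution given as token -> probability."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def classify(probs, threshold=0.9):
    """'single answer' if the top token holds at least `threshold` of the mass,
    else 'multiple answers'. the 0.9 threshold is an arbitrary assumption."""
    top_p = max(probs.values())
    return "single answer" if top_p >= threshold else "multiple answers"

# hypothetical next-token distributions (made-up numbers, not real model output)
after_sum = {"2": 0.98, "3": 0.01, "two": 0.01}                  # prompt: "1 + 1 = "
after_position = {"absolute": 0.3, "relative": 0.3, "fixed": 0.2,
                  "sticky": 0.15, "static": 0.05}                 # prompt: "position: "

for prompt, dist in [("1 + 1 = ", after_sum), ("position: ", after_position)]:
    print(repr(prompt), classify(dist), round(entropy(dist), 2))
```

low entropy / high top-1 probability corresponds to the precise-answer case; mass spread across several tokens corresponds to the multiple-answers case.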
you can generate answers, and train on them, by exploring different lengths for the generated code
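one possible reading of the length-exploration idea, sketched under loud assumptions: `TOY_MODEL` is a hypothetical hand-written next-token table standing in for a real language model, and `sample_completion` / `explore_lengths` are hypothetical helpers. the sketch samples completions under several token budgets and keeps the distinct ones as candidate training examples.

```python
import random

# hypothetical toy "model": last token -> {next token: probability}.
# a real setup would query a language model instead; this table is made up.
TOY_MODEL = {
    "<start>": {"x": 0.5, "y": 0.5},
    "x": {"=": 1.0},
    "y": {"=": 1.0},
    "=": {"1": 0.6, "2": 0.4},
    "1": {";": 0.7, "+": 0.3},
    "2": {";": 1.0},
    "+": {"1": 1.0},
    ";": {"<end>": 1.0},
}

def sample_completion(max_tokens, rng):
    """sample tokens until <end> is drawn or the max_tokens budget is used up."""
    tokens, last = [], "<start>"
    while len(tokens) < max_tokens:
        dist = TOY_MODEL[last]
        last = rng.choices(list(dist), weights=list(dist.values()))[0]
        if last == "<end>":
            break
        tokens.append(last)
    return tokens

def explore_lengths(budgets, samples_per_budget=4, seed=0):
    """collect distinct (budget, completion) pairs across several length budgets.
    treating these as training examples is an assumption about the intended use."""
    rng = random.Random(seed)
    seen, dataset = set(), []
    for budget in budgets:
        for _ in range(samples_per_budget):
            completion = " ".join(sample_completion(budget, rng))
            if completion not in seen:
                seen.add(completion)
                dataset.append((budget, completion))
    return dataset

for budget, completion in explore_lengths([2, 4, 8]):
    print(budget, completion)
```

short budgets yield truncated fragments while longer budgets let the sampler reach `<end>`, so varying the budget changes which completions end up in the collected set.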