Hacker News | new | past | comments | ask | show | jobs | submit | login

Diffusion-based reasoning is fascinating - curious how it handles sequential dependencies vs traditional autoregressive decoding. For complex planning tasks where step N heavily depends on steps 1 through N-1, does the parallel generation sometimes struggle with consistency? Or does the model learn to encode those dependencies in a way that works well during parallel sampling?
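A toy numeric sketch of the concern (not a real model; `parallel_refine` here is just a hypothetical stand-in for iterated parallel denoising): with a chain dependency where each position must equal its predecessor plus one, a left-to-right pass satisfies it in one shot, while simultaneous updates propagate information only one hop per refinement step.

```python
def autoregressive(start, length):
    # Step N sees steps 1..N-1, so the chain dependency is
    # satisfied in a single left-to-right pass.
    seq = [start]
    for _ in range(length - 1):
        seq.append(seq[-1] + 1)  # next token conditions on the prefix
    return seq

def parallel_refine(start, length, steps):
    # Diffusion-style caricature: every position updates at once
    # from the PREVIOUS iterate. Information propagates one hop per
    # step, so a chain of length L needs ~L-1 refinement steps to
    # become globally consistent.
    seq = [start] * length
    for _ in range(steps):
        seq = [start] + [seq[i - 1] + 1 for i in range(1, length)]
    return seq

print(autoregressive(0, 5))        # consistent after one pass
print(parallel_refine(0, 5, 1))    # inconsistent after one step
print(parallel_refine(0, 5, 4))    # consistent after length-1 steps
```

Of course a trained diffusion model isn't copying neighbors like this; the open question in the comment is exactly whether it learns to encode such dependencies well enough that few refinement steps suffice.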



