Both approaches are valid, but I would hope they are using a separate model to validate responses, rather than crippling the base model(s).
In OpenAI's case, we don't know for sure, but it seems like a combination of both, which results in lower-quality responses overall.
I imagine LLaMA was fed highly-vetted training data, as opposed to being "fixed" afterwards.