CUDA is more than a C++ dialect, which is a big thing people keep missing with all those "CUDA replacements".
Until CUDA 3.0 it was, like OpenCL, a C dialect; afterwards it became a C and C++ dialect, with PTX as the common infrastructure.
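To make the "C++ dialect" point concrete, here is a minimal sketch (the file name, kernel, and sizes are made up for illustration): a templated kernel, which a plain C dialect could not express, compiled by nvcc; running nvcc -ptx scale.cu dumps the PTX it lowers to.

    // scale.cu: templates in device code, i.e. CUDA as a C++ dialect.
    // Error checking omitted for brevity.
    template <typename T>
    __global__ void scale(T *data, T factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 256;
        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));
        // Template instantiation happens at compile time, as in host C++.
        scale<float><<<(n + 127) / 128, 128>>>(d, 2.0f, n);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }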
PGI targeted PTX with their C, C++, and, very relevantly for HPC, Fortran compilers.
PGI was acquired by NVidia, and its compilers became the main set of CUDA compilers.
Given PTX, many other languages started targeting CUDA as well: Java, .NET, Haskell, and Julia, at the very least.
NVidia is now invested in a Python JIT for CUDA as well.
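That multi-language story works because PTX loading is exposed directly: a frontend (Julia's CUDA.jl, a Python JIT, a JVM compiler) emits PTX text and hands it to the driver API, which JIT-compiles it for whatever GPU is installed. A rough sketch of that path, assuming a kernel.ptx file with an entry point called scale (both hypothetical names):

    // Loading frontend-generated PTX through the CUDA driver API.
    // Error checking omitted for brevity; link with -lcuda.
    #include <cuda.h>
    #include <fstream>
    #include <sstream>
    #include <string>

    int main() {
        cuInit(0);
        CUdevice dev;
        cuDeviceGet(&dev, 0);
        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);

        // PTX text, as a Julia/.NET/Python frontend would have emitted it.
        std::ifstream f("kernel.ptx");          // hypothetical file name
        std::stringstream ss;
        ss << f.rdbuf();
        std::string ptx = ss.str();

        CUmodule mod;
        cuModuleLoadData(&mod, ptx.c_str());    // driver JIT-compiles the PTX
        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "scale"); // hypothetical entry name

        // ... cuMemAlloc buffers, pack a kernelParams array, then:
        // cuLaunchKernel(fn, grid,1,1, block,1,1, 0, 0, kernelParams, 0);

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }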
So yeah, while C++20 is the main language in CUDA, there is also a whole ecosystem of programming languages that the "CUDA replacements" keep ignoring.
In these CUDA vs ROCm comparisons I think they mostly compare the C++ dialects. And it's not even particularly the language implementation where ROCm is weak, but rather the whole toolchain. Am I mistaken?
Well, the C++ segment is important, and from what I gather, ROCm is failing at that. AMD would be a lot better off if the C++ part worked, even if the other parts don't.