You need to install the XDNA and XRT drivers, then install the Vitis AI compiler (it needs a free license). After that you can compile your own custom kernels with mlir-aie, written in plain old C++. There is an easy-to-use API for vectorization. The hardest part is organizing the data transfer between the tiles.
In principle this can accelerate anything you can run on a normal CPU, as long as it is a streaming workload, i.e. a single pass over the data, or your entire data set fits in the memory tiles.
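To make the "streaming workload" constraint concrete, here is a plain C++ model (not the mlir-aie or AIE API) of how such a job gets split up: the input is cut into chunks that fit a tile's local buffer, each chunk is handed to one of 32 tiles round-robin, and every element is touched exactly once. The chunk size of 1024 elements, the round-robin assignment and the function names are hypothetical and only for illustration. On real hardware the loop in `stream_through_tiles` corresponds to the data movement you configure in mlir-aie.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameters for illustration only; real buffer sizes and
// tile counts depend on the NPU generation.
constexpr std::size_t kNumTiles = 32;      // tile count mentioned in the post
constexpr std::size_t kChunkElems = 1024;  // hypothetical per-tile buffer size

// Kernel body one tile would run on one chunk: a single pass, using only
// the data in the chunk it was handed.
void scale_chunk(const float* in, float* out, std::size_t n, float factor) {
    assert(n <= kChunkElems);  // a chunk must fit the tile's local buffer
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

// Host-side model of the data movement: cut the input into chunks and hand
// them to tiles round-robin, recording how many chunks each tile received.
std::vector<float> stream_through_tiles(const std::vector<float>& input,
                                        float factor,
                                        std::vector<std::size_t>& chunks_per_tile) {
    std::vector<float> output(input.size());
    chunks_per_tile.assign(kNumTiles, 0);
    std::size_t chunk_index = 0;
    for (std::size_t offset = 0; offset < input.size(); offset += kChunkElems) {
        const std::size_t n = std::min(kChunkElems, input.size() - offset);
        const std::size_t tile = chunk_index % kNumTiles;
        scale_chunk(input.data() + offset, output.data() + offset, n, factor);
        ++chunks_per_tile[tile];
        ++chunk_index;
    }
    return output;
}
```

A workload that needs random access across the whole data set, or several passes over data that does not fit in the memory tiles, does not decompose this neatly, which is the limitation described above.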
The vector registers are at least 1024 bits wide, and you get 32 tiles/cores.
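As a quick sanity check on what a 1024-bit register means in practice, the lane count is simply the register width divided by the element width. This sketch takes the 1024-bit and 32-tile figures from the post as assumptions; it counts elements held in one register per tile, not operations per cycle, which depend on the specific hardware.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBits = 1024;  // register width quoted in the post
constexpr std::size_t kTiles = 32;         // tile count quoted in the post

// Number of elements of type T that fit in one vector register.
template <typename T>
constexpr std::size_t lanes_per_register() {
    return kVectorBits / (sizeof(T) * CHAR_BIT);
}

// Elements held when every tile fills one register with type T.
template <typename T>
constexpr std::size_t lanes_across_tiles() {
    return lanes_per_register<T>() * kTiles;
}

static_assert(lanes_per_register<float>() == 32, "32 x 32-bit lanes");
static_assert(lanes_per_register<std::int16_t>() == 64, "64 x 16-bit lanes");
static_assert(lanes_per_register<std::int8_t>() == 128, "128 x 8-bit lanes");
```

With int8 data, for example, that is 128 lanes per register and 4096 across all 32 tiles.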