Given that it lives in "platform/bootable/bootloader/legacy", I'm not sure there is much point in getting worked up about performance; presumably something much more efficient is used in the vast majority of cases.
I haven't updated my source tree in a long time, but the ARM devices were using bionic/libc/arch-arm/bionic/memset.S. This is pure ARM assembly optimized for the pipeline and cache.
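My ARM assembly is a little rusty, but it looks like that file implements both memset and bzero, with bzero just handing off to memset with R1=0. Under the ARM calling convention R1 carries memset's fill value, so the length that bzero receives in R1 has to be moved to R2 first.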
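To make the bzero/memset relationship concrete, here is a rough C equivalent of what I think the assembly is doing. This is only a sketch: `sketch_bzero` and `sketch_all_zero` are names I made up for illustration, not bionic symbols, and the real routine is hand-tuned assembly rather than a call into the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, not the bionic symbol.
 * Mirrors the idea in the assembly: fix the fill value at 0
 * and let memset do the actual work. */
void sketch_bzero(void *dst, size_t n)
{
    memset(dst, 0, n);
}

/* Hypothetical helper: returns 1 if the first n bytes of buf
 * are all zero, 0 otherwise. */
int sketch_all_zero(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

The point is just that bzero does not need its own fill loop; all the pipeline and cache tuning can live in memset.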