I wouldn't worry about it. At the scale where I could see it becoming an issue, I think the right answer is simply to store all the transcoded versions on disk and serve them directly. Given how small audio files are, it would in all likelihood be more cost-effective to pay for the storage than to pay for the CPU time to encode everything on the fly.
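
If you want to sanity-check that trade-off for your own setup, here is a rough back-of-the-envelope sketch. Everything in it is an assumption on my part: the function names are made up, the cost model is deliberately simplistic (flat price per GB-month, flat price per CPU-hour, no caching, no bandwidth), and the numbers in the example call are placeholders, not real prices or measurements. Plug in your own.

```python
def monthly_storage_cost(num_tracks, avg_size_mb, num_variants, price_per_gb_month):
    """Cost of keeping every transcoded variant of every track on disk.

    avg_size_mb is the assumed average size of ONE transcoded variant.
    """
    total_gb = num_tracks * avg_size_mb * num_variants / 1024
    return total_gb * price_per_gb_month


def monthly_transcode_cost(plays_per_month, cpu_seconds_per_transcode, price_per_cpu_hour):
    """Cost of transcoding on every request (assumes no caching at all)."""
    cpu_hours = plays_per_month * cpu_seconds_per_transcode / 3600
    return cpu_hours * price_per_cpu_hour


def break_even_plays(storage_cost, cpu_seconds_per_transcode, price_per_cpu_hour):
    """Plays per month above which storing pre-transcoded files is cheaper."""
    return storage_cost * 3600 / (cpu_seconds_per_transcode * price_per_cpu_hour)


if __name__ == "__main__":
    # Placeholder inputs only: replace with your own library size and prices.
    storage = monthly_storage_cost(
        num_tracks=100_000, avg_size_mb=5.0, num_variants=3, price_per_gb_month=0.02
    )
    on_the_fly = monthly_transcode_cost(
        plays_per_month=1_000_000, cpu_seconds_per_transcode=4.0, price_per_cpu_hour=0.05
    )
    print(f"storage:    {storage:.2f} per month")
    print(f"on the fly: {on_the_fly:.2f} per month")
    print(f"break-even: {break_even_plays(storage, 4.0, 0.05):.0f} plays per month")
```

The point of `break_even_plays` is that storage cost is fixed by library size while on-the-fly cost grows with traffic, so past some number of plays the pre-transcoded approach wins. Where that crossover sits depends entirely on your own inputs.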