Do you want to play an audio file, or use the speech synthesis that is available in Chrome and Safari?
Either way, I would use JavaScript if there are more than 5 words, so that the click event on a word starts a small script.
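As a rough sketch of the speech synthesis route, assuming each clickable word sits in an element with a `word` class: the names `speakWord`, `attachWordHandlers` and the `.word` selector are made up for illustration, while `speechSynthesis` and `SpeechSynthesisUtterance` come from the browser's Web Speech API. Something along these lines might work:

```javascript
// Sketch only: speakWord, attachWordHandlers and the ".word" selector
// are hypothetical names, not taken from your page.

// Speak a word with the Web Speech API if the environment provides it.
// Returns true if speech was requested, false if the API is missing.
function speakWord(word, env = globalThis) {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return false; // a caller could fall back to an audio file here
  }
  const utterance = new env.SpeechSynthesisUtterance(word);
  env.speechSynthesis.cancel(); // stop any word that is still being spoken
  env.speechSynthesis.speak(utterance);
  return true;
}

// Attach one click listener to every element matching the selector.
// Returns the number of elements that were wired up.
function attachWordHandlers(root, selector = ".word", env = globalThis) {
  const elements = root.querySelectorAll(selector);
  elements.forEach((el) => {
    el.addEventListener("click", () => speakWord(el.textContent.trim(), env));
  });
  return elements.length;
}

// Only wire things up when running inside a browser page.
if (typeof document !== "undefined") {
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", () => attachWordHandlers(document));
  } else {
    attachWordHandlers(document);
  }
}
```

With markup such as `<span class="word">cat</span>`, clicking the span would ask the browser to speak "cat". If you would rather play audio files, the same click handler could start playback of a file instead of calling `speakWord`.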
Do you have an example file you can upload?
I would also check out this question: "Javascript Audio not working"