Another issue is that the start of a sample does not coincide with the start of a note. For example, the “sa” sample spends a lot of time on the “s” sound, but a note sung with the syllable “sa” is heard as starting closer to the beginning of the “a” sound. Hence, the timing of samples has to be shifted for the sung notes to be in rhythm with the song.
Hmm that sounds a lot like Cz’s otoing theory —
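To make the timing shift concrete, here is a minimal sketch in Python. It is not how VOCALOID actually does it; the `Sample` class, the `vowel_onset_ms` field, and the 120 ms figure are all made up for illustration. The idea is just that playback starts early by the length of the consonant, so the vowel lands on the beat.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    # Hypothetical field: time from the start of the sample
    # to the start of its vowel, in milliseconds.
    vowel_onset_ms: float


def playback_start_ms(note_start_ms: float, sample: Sample) -> float:
    """Start the sample early so that its vowel lands on the note start."""
    return note_start_ms - sample.vowel_onset_ms


# Illustrative numbers only: a "sa" sample with 120 ms of "s" before the "a".
sa = Sample("sa", vowel_onset_ms=120.0)
start = playback_start_ms(1000.0, sa)
print(start)  # the "s" begins before the note so the "a" lands on it
```

A longer consonant simply pushes the playback start further ahead of the note.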
Next in the presentation was a description of how a sound bank is created. The person voicing the sound bank first sings through a specialized script, which sounds a bit like Buddhist chanting according to Kenmochi.
VCV…I KNOW THIS FEELING…
Apparently, once they invited an American singer to record for VOCALOID, and he was happy and enthusiastic at first. However, as recording progressed, he gradually got angrier and angrier and eventually escaped.
the pain of English ragequitting
VOCALOID3 also introduced triphones, where a specific sequence of three phonemes can trigger a special triphone sample instead of a blend of two diphone samples, and consonant length control, which apparently reduces how metallic certain consonants sound.
a little jarred
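The triphone idea can be sketched as a lookup with a diphone fallback. This is only my guess at the general shape, not VOCALOID's actual selection logic; `select_units`, the tuple format, and the example bank are assumptions for illustration.

```python
def select_units(phonemes, triphone_bank):
    """Pick units for a phoneme sequence.

    Use a recorded triphone when the bank has one for the next three
    phonemes; otherwise fall back to a diphone for the next two.
    """
    units = []
    i = 0
    while i < len(phonemes) - 1:
        tri = tuple(phonemes[i:i + 3])
        if len(tri) == 3 and tri in triphone_bank:
            units.append(("triphone", tri))
            i += 2  # one triphone stands in for two overlapping diphones
        else:
            units.append(("diphone", tuple(phonemes[i:i + 2])))
            i += 1
    return units


# Hypothetical bank containing a single recorded triphone.
bank = {("s", "t", "r")}
units = select_units(["s", "t", "r", "a"], bank)
print(units)
```

Here the "s-t-r" cluster gets its dedicated sample, and the remaining "r-a" transition still uses a diphone.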