
[hts-users:03168] Questions from a novice about improving voice quality

Subject: [hts-users:03168] Questions from a novice about improving voice quality
From: Matt Campbell <mattcampbell@xxxxxxxxx>
Date: Wed, 15 Feb 2012 08:52:53 -0600
Delivered-to: hts-users@xxxxxxxxxxxxxxx

Hello:

I've just started learning about HMM-based speech synthesis, and I find it quite interesting. It seems to me that HMM-based synthesis is more flexible than unit selection, and produces a more consistent quality of output. In particular, HMM-based synthesis seems to avoid the glitches that are common in unit selection when the input text doesn't overlap much with the pre-recorded units. So HMM-based synthesis seems like a return to the strengths of formant-based synthesis. But it seems to me that unit selection still does a much better job of reproducing the unique sound of the original speaker's voice.

I noticed that the CMU ARCTIC SLT demo voice is quite small, ~2 MB uncompressed. And as one might expect for such a small voice definition, it doesn't faithfully reproduce the sound of the woman's voice in the original recordings. Indeed, it sounds much like the small-footprint voices that I've heard on mobile devices, e.g. the compact US English female voice on the iPhone (Samantha) or the US English female voice on Android (SVOX Pico).

So, can a larger HMM voice be built, that more accurately reproduces the voice in the original recordings? For example, would it be possible to produce a larger mgc.pdf file to achieve this goal? In the case of the CMU ARCTIC SLT demo package, would increasing the MGCORDER option help? I ask about the MGC data because mgc.pdf is by far the largest file in the binary package, suggesting that it contains the most information about the sound of the voice.
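
To make the size question concrete, here is a rough back-of-the-envelope sketch of how the spectral model size might scale with the MGC order. It assumes (these are assumptions, not details taken from the demo package) that each decision-tree leaf stores one mean and one variance per feature dimension as 4-byte floats, that the feature vector has MGCORDER + 1 coefficients, and that static, delta and delta-delta windows are used. The function name and the leaf count are hypothetical, chosen only for illustration; a real file also carries a small header and the leaf count itself changes with training.

```python
def estimate_mgc_pdf_bytes(mgc_order, num_leaves, num_windows=3, bytes_per_float=4):
    """Rough size estimate for a spectral pdf file (hypothetical helper).

    Assumes each leaf holds a mean and a variance for every dimension of
    the windowed feature vector, stored as fixed-size floats.
    """
    # Static coefficients include the 0th one, hence order + 1.
    vector_len = (mgc_order + 1) * num_windows
    # One mean plus one variance per dimension.
    floats_per_leaf = 2 * vector_len
    return num_leaves * floats_per_leaf * bytes_per_float


if __name__ == "__main__":
    # The leaf count of 1000 is an arbitrary illustrative value.
    for order in (24, 34, 49):
        size = estimate_mgc_pdf_bytes(order, num_leaves=1000)
        print(f"MGC order {order}: about {size / 1024:.0f} KiB")
```

Under these assumptions the size grows linearly with both the MGC order and the number of leaves, so a larger file could come from either one; the sketch says nothing about whether the extra parameters actually improve speaker similarity.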

If not the MGC order, then are there any other numbers that can be tuned to improve voice quality, potentially at the expense of a larger model or longer training time?


Or is this a fundamental limitation of HMM-based synthesis?

Even if the latter is true, I would probably accept the SLT voice in day-to-day use; after all, my current favorite synthesizer is a formant synthesizer. I just want to know more about what's possible with HMM-based synthesis.


Thanks,
Matt
