
[hts-users:03015] Error speed synthesized speech while using 16K data with HTS-2.2


Dear all,
  I recently switched my HTS project from HTS-2.01 to HTS-2.2 in order to use the English
 speaker-dependent training demo from the HTS-2.2 release.
I installed HTS-2.2_for_HTK-3.4.1 without any trouble and also upgraded my HTS_Engine to 1.05.
The whole training process went smoothly, and the synthesized speech sounds good.
But when I switched the waveform data to cmu-bdl (16 kHz), the synthesized speech became very bad:
the voice sounds broken, and the speaking rate is also wrong.

I changed the feature extraction parameters in data/Makefile as follows:

SAMPFREQ    = 16000   # 48000 Sampling frequency (48kHz)
FRAMELEN    = 400     # 1200  Frame length in point (1200 = 48000 * 0.025)
FRAMESHIFT  = 80      # 240   Frame shift in point (240 = 48000 * 0.005)
WINDOWTYPE  = 1       # Window type -> 0: Blackman 1: Hamming 2: Hanning
NORMALIZE   = 1       # Normalization -> 0: none  1: by power  2: by magnitude
FFTLEN      = 1024    # FFT length in point
FREQWARP    = 0.42    # 0.55   # frequency warping factor
GAMMA       = 0       # pole/zero weight for mel-generalized cepstral (MGC) analysis
MGCORDER    = 24      # order of MGC analysis
LNGAIN      = 1       # use logarithmic gain rather than linear gain
LOWERF0     = 40      # lower limit for f0 extraction (Hz)
UPPERF0     = 400     # upper limit for f0 extraction (Hz)
NOISEMASK   = 50      # standard deviation of white noise to mask noises in f0 extraction
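For reference, the point-based values above follow from the 25 ms frame length and 5 ms frame shift implied by the original 48 kHz comments (1200 = 48000 * 0.025, 240 = 48000 * 0.005). A minimal Python sketch that rederives them for any sampling rate; the helper names are hypothetical, and the power-of-two FFT check is an assumption about what the analysis tools expect, not something stated in the HTS demo.

```python
def frame_params(sampfreq, frame_sec=0.025, shift_sec=0.005):
    """Return (FRAMELEN, FRAMESHIFT) in samples for a given sampling rate.

    Assumes the 25 ms window and 5 ms shift used by the 48 kHz demo.
    """
    framelen = round(sampfreq * frame_sec)
    frameshift = round(sampfreq * shift_sec)
    return framelen, frameshift


def check_fftlen(fftlen, framelen):
    """Assumption: FFTLEN should be a power of two no shorter than FRAMELEN."""
    is_pow2 = fftlen > 0 and (fftlen & (fftlen - 1)) == 0
    return is_pow2 and fftlen >= framelen


for sr in (48000, 16000):
    print(sr, frame_params(sr))  # 48000 (1200, 240), then 16000 (400, 80)

print(check_fftlen(1024, 400))  # True
```

Under these assumptions, the 16 kHz values in the Makefile above (400 and 80 points, FFTLEN 1024) are self-consistent.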


and the training parameters in scripts/Config.pm as follows:

# Speech Analysis/Synthesis Setting ==============
# speech analysis
$sr = 16000; #48000;  # sampling rate (Hz)
$fs = 80;    #240;    # frame period (point)
$fw = 0.42;  #0.55;   # frequency warping
$gm = 0;              # pole/zero representation weight
$lg = 1;              # use log gain instead of linear gain
$fr = $fs/$sr;        # frame period (sec)

# speech synthesis
$pf = 1.4;     # postfiltering factor
$fl = 4096;    # length of impulse response
$co = 2047;    # order of cepstrum to approximate mel-generalized cepstrum
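The analysis settings in Config.pm have to agree with the ones used for feature extraction in data/Makefile. A minimal Python sketch of that cross-check follows. The dictionaries are hypothetical stand-ins for the two files, and the pairing of Makefile names with Config.pm variables is an assumption read off the comments in each file.

```python
# Hypothetical copies of the values posted above (not parsed from the real files).
makefile = {"SAMPFREQ": 16000, "FRAMESHIFT": 80, "FREQWARP": 0.42, "GAMMA": 0, "LNGAIN": 1}
config_pm = {"sr": 16000, "fs": 80, "fw": 0.42, "gm": 0, "lg": 1}

# Assumed correspondence between data/Makefile and scripts/Config.pm settings.
PAIRS = [
    ("SAMPFREQ", "sr"),    # sampling rate (Hz)
    ("FRAMESHIFT", "fs"),  # frame shift / frame period (points)
    ("FREQWARP", "fw"),    # frequency warping factor
    ("GAMMA", "gm"),       # pole/zero weight
    ("LNGAIN", "lg"),      # log gain flag
]


def mismatches(mk, cfg):
    """Return the (Makefile, Config.pm) name pairs whose values differ."""
    return [(m, c) for m, c in PAIRS if mk[m] != cfg[c]]


# Mirrors $fr = $fs/$sr: frame period in seconds.
frame_period = config_pm["fs"] / config_pm["sr"]

print(mismatches(makefile, config_pm))  # []
print(frame_period)                     # 0.005
```

With the values as posted, the check finds no mismatch, and the frame period stays at 5 ms.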

The rest of the training parameters are unchanged, but I cannot get correct results from training.
Could anyone tell me where I might have gone wrong?

Thanks in advance!
Sincerely,
Mandy

Follow-Ups
[hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2, Keiichiro Oura