[hts-users:03017] Re: Error speed synthesized speech while using 16K data with HTS-2.2

Dear Keiichiro,

The data in /raw are already prepared as 16KHz-16bit, but I still get bad results. Is there any chance that some setting is incorrect?

Sincerely,
Mandy
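
A headerless .raw file does not record its own sampling rate, so one rough sanity check is to compare the duration implied by the file size with the known length of the utterance. The following is only a minimal sketch, not part of HTS or SPTK; the function name and the 96,000-byte example size are hypothetical, and it assumes mono 16-bit PCM.

```python
def raw_duration_seconds(num_bytes: int, sample_rate_hz: int,
                         bytes_per_sample: int = 2) -> float:
    """Duration implied by a headerless mono PCM file of num_bytes bytes."""
    if num_bytes % bytes_per_sample != 0:
        raise ValueError("file size is not a whole number of samples")
    num_samples = num_bytes // bytes_per_sample
    return num_samples / sample_rate_hz


# Hypothetical file size, chosen only for illustration.
size_in_bytes = 96_000
for rate in (16_000, 48_000):
    print(f"{rate} Hz -> {raw_duration_seconds(size_in_bytes, rate):.2f} s")
```

If the duration implied at 16 kHz is about three times longer than the utterance really is, the file is probably still 48 kHz data. This check cannot prove the rate on its own; it only flags an obvious mismatch.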

> Date: Wed, 7 Sep 2011 12:52:11 +0900
> From: uratec@xxxxxxxxxxxxxxx
> Subject: [hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> To: hts-users@xxxxxxxxxxxxxxx
> CC: uratec@xxxxxxxxxxxx
>
> Hi,
>
> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
>
> x2x +sf < 48kHz_16bit.raw | \
> ds -s 32 | \
> ds -s 21 | \
> x2x +fs > 16kHz_16bit.raw
>
> Regards,
> Keiichiro Oura
>
>
> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> > Dear all,
> > I recently switched my HTS project from HTS-2.01 to HTS-2.2, in order to use
> > the English speaker
> > dependent training demo from the HTS-2.2 project.
> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed my
> > HTS_Engine to 1.05.
> > In fact, the whole training process went smoothly, and the synthesized
> > speech sounds good.
> > But when I changed the wave data to cmu-bdl (16KHz), I got very bad
> > synthesized speech.
> > The voice sounds broken, and the speed of the speech is also weird.
> > I changed the feature extraction parameters in data/Makefile as:
> > SAMPFREQ = 16000 # 48000 Sampling frequency (48kHz)
> > FRAMELEN = 400 # 1200 Frame length in point (1200 = 48000 * 0.025)
> > FRAMESHIFT = 80 # 240 Frame shift in point (240 = 48000 * 0.005)
> > WINDOWTYPE = 1 # Window type -> 0: Blackman 1: Hamming 2: Hanning
> > NORMALIZE = 1 # Normalization -> 0: none 1: by power 2: by magnitude
> > FFTLEN = 1024 # FFT length in point
> > FREQWARP = 0.42 # 0.55 # frequency warping factor
> > GAMMA = 0 # pole/zero weight for mel-generalized cepstral (MGC) analysis
> > MGCORDER = 24 # order of MGC analysis
> > LNGAIN = 1 # use logarithmic gain rather than linear gain
> > LOWERF0 = 40 # lower limit for f0 extraction (Hz)
> > UPPERF0 = 400 # upper limit for f0 extraction (Hz)
> > NOISEMASK = 50 # standard deviation of white noise to mask noises in f0 extraction
> >
> > and the training parameters in scripts/Config.pm as:
> > # Speech Analysis/Synthesis Setting ==============
> > # speech analysis
> > $sr = 16000; #48000; # sampling rate (Hz)
> > $fs = 80; #240; # frame period (point)
> > $fw = 0.42; #0.55; # frequency warping
> > $gm = 0; # pole/zero representation weight
> > $lg = 1; # use log gain instead of linear gain
> > $fr = $fs/$sr; # frame period (sec)
> > # speech synthesis
> > $pf = 1.4; # postfiltering factor
> > $fl = 4096; # length of impulse response
> > $co = 2047; # order of cepstrum to approximate mel-generalized cepstrum
> > The rest of the training parameters remain the same, but I cannot get
> > correct results from training.
> > Could anyone tell me where I could have gone wrong?
> > Thanks in advance!
> > Sincerely,
> > Mandy
>
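
The comments in the quoted Makefile give the frame sizes as 48000 * 0.025 and 48000 * 0.005, i.e. a 25 ms window and a 5 ms shift. As a rough cross-check (a sketch only, not part of the HTS demo scripts; the helper name is hypothetical), the same arithmetic at 16 kHz can be compared against the FRAMELEN, FRAMESHIFT, $fs and $sr values posted above:

```python
def points_for_ms(sample_rate_hz: int, milliseconds: int) -> int:
    """Number of samples (points) covering the given duration."""
    return sample_rate_hz * milliseconds // 1000


WINDOW_MS = 25  # from the Makefile comment: 1200 = 48000 * 0.025
SHIFT_MS = 5    # from the Makefile comment: 240 = 48000 * 0.005

for rate in (48_000, 16_000):
    frame_len = points_for_ms(rate, WINDOW_MS)
    frame_shift = points_for_ms(rate, SHIFT_MS)
    # Same expression as $fr = $fs/$sr in Config.pm.
    frame_period_sec = frame_shift / rate
    print(rate, frame_len, frame_shift, frame_period_sec)
```

At 16 kHz this gives 400 and 80 points, which matches the posted FRAMELEN, FRAMESHIFT and $fs. So the values quoted in the message are at least consistent with each other; the check says nothing about settings that were not posted.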
