
[hts-users:03017] Re: Error speed synthesized speech while using 16K data with HTS-2.2


Dear Keiichiro,
   The data in /raw are already prepared as 16 kHz, 16-bit, but I still get bad results. Is there any chance that some setting is incorrect?

Sincerely,
Mandy
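
One quick way to test the claim that the raw files really are 16 kHz, 16-bit is to compare the duration implied by the file size with the known length of the utterance. The sketch below is only illustrative: the helper names are hypothetical, and it assumes headerless mono PCM with 2 bytes per sample. A file that is still 48 kHz would appear three times too long when interpreted at 16 kHz, and a WAV file renamed to .raw would still start with a RIFF header.

```python
def raw_duration_seconds(num_bytes, sample_rate, bytes_per_sample=2):
    """Duration implied by a headerless mono PCM file of the given size."""
    return num_bytes / (bytes_per_sample * sample_rate)


def looks_like_wav(header):
    """True if the first 12 bytes carry a RIFF/WAVE signature (not headerless)."""
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"


if __name__ == "__main__":
    size = 64000  # hypothetical file size in bytes
    print(raw_duration_seconds(size, 16000))  # duration if the data are 16 kHz
    print(raw_duration_seconds(size, 48000))  # duration if the data are 48 kHz
```

If the duration computed at 16 kHz does not match how long the recording actually lasts, the sampling rate of the file and the rate given in the settings probably disagree.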

> Date: Wed, 7 Sep 2011 12:52:11 +0900
> From: uratec@xxxxxxxxxxxxxxx
> Subject: [hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> To: hts-users@xxxxxxxxxxxxxxx
> CC: uratec@xxxxxxxxxxxx
>
> Hi,
>
> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
>
> x2x +sf < 48kHz_16bit.raw | \
> ds -s 32 | \
> ds -s 21 | \
> x2x +fs > 16kHz_16bit.raw
>
> Regards,
> Keiichiro Oura
>
>
> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> > Dear all,
> >   I recently switched my HTS project from HTS-2.01 to HTS-2.2 in order to use
> > the English speaker-dependent training demo from the HTS-2.2 project.
> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed my
> > HTS_Engine to 1.05.
> > In fact, the whole training process went smoothly, and the synthesized
> > speech sounds good.
> > But when I changed the wave data to cmu-bdl (16KHz), I got very bad
> > synthesized speech.
> > The voice sounds broken, and the speed of the speech is also weird.
> > I changed the feature extraction parameters in data/Makefile as:
> > SAMPFREQ    = 16000   # 48000 Sampling frequency (48kHz)
> > FRAMELEN    = 400     # 1200  Frame length in point (1200 = 48000 * 0.025)
> > FRAMESHIFT  = 80      # 240   Frame shift in point (240 = 48000 * 0.005)
> > WINDOWTYPE  = 1       # Window type -> 0: Blackman 1: Hamming 2: Hanning
> > NORMALIZE   = 1       # Normalization -> 0: none  1: by power  2: by magnitude
> > FFTLEN      = 1024    # FFT length in point
> > FREQWARP    = 0.42    # 0.55   # frequency warping factor
> > GAMMA       = 0       # pole/zero weight for mel-generalized cepstral (MGC) analysis
> > MGCORDER    = 24      # order of MGC analysis
> > LNGAIN      = 1       # use logarithmic gain rather than linear gain
> > LOWERF0     = 40      # lower limit for f0 extraction (Hz)
> > UPPERF0     = 400     # upper limit for f0 extraction (Hz)
> > NOISEMASK   = 50      # standard deviation of white noise to mask noises in f0 extraction
> >
> > and the training parameters in scripts/Config.pm as:
> > # Speech Analysis/Synthesis Setting ==============
> > # speech analysis
> > $sr = 16000; #48000;  # sampling rate (Hz)
> > $fs = 80;    #240;    # frame period (point)
> > $fw = 0.42;  #0.55;   # frequency warping
> > $gm = 0;              # pole/zero representation weight
> > $lg = 1;              # use log gain instead of linear gain
> > $fr = $fs/$sr;        # frame period (sec)
> > # speech synthesis
> > $pf = 1.4;     # postfiltering factor
> > $fl = 4096;    # length of impulse response
> > $co = 2047;    # order of cepstrum to approximate mel-generalized cepstrum
> > The rest of the training parameters remain the same, but I cannot get a
> > correct result from training.
> > Could anyone tell me where I might have gone wrong?
> > Thanks in advance!
> > Sincerely,
> > Mandy
>
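
The frame settings quoted above follow from the sampling rate and the 25 ms window and 5 ms shift given in the Makefile comments (1200 = 48000 * 0.025, 240 = 48000 * 0.005). The following is a minimal sketch of that arithmetic, not part of HTS or SPTK; the function names are hypothetical. It also checks that the two ds stages in the reply (3:2, then 2:1) take 48 kHz down to 16 kHz.

```python
from fractions import Fraction

FRAME_LENGTH_SEC = Fraction(25, 1000)  # 0.025 s, from the Makefile comment
FRAME_SHIFT_SEC = Fraction(5, 1000)    # 0.005 s, from the Makefile comment


def frame_settings(sample_rate):
    """Return SAMPFREQ, FRAMELEN and FRAMESHIFT (in points) for a sampling rate."""
    frame_len = sample_rate * FRAME_LENGTH_SEC
    frame_shift = sample_rate * FRAME_SHIFT_SEC
    if frame_len.denominator != 1 or frame_shift.denominator != 1:
        raise ValueError("sampling rate does not give whole-sample frames")
    return {
        "SAMPFREQ": sample_rate,
        "FRAMELEN": int(frame_len),
        "FRAMESHIFT": int(frame_shift),
    }


def downsampled_rate(sample_rate, ratios):
    """Apply successive in:out down-sampling ratios, e.g. (3, 2) then (2, 1)."""
    rate = Fraction(sample_rate)
    for rate_in, rate_out in ratios:
        rate = rate * rate_out / rate_in
    return rate


if __name__ == "__main__":
    print(frame_settings(16000))
    print(downsampled_rate(48000, [(3, 2), (2, 1)]))
```

For 16000 Hz this gives FRAMELEN = 400 and FRAMESHIFT = 80, which agrees with the values in the quoted Makefile and with $fs = 80 in Config.pm, so these particular settings are at least mutually consistent.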
