
[hts-users:03020] Re: Error speed synthesized speech while using 16K data with HTS-2.2


Hi,
  The raw files are from the previous Adapt project (I may have processed them in Matlab on Windows, but I can't recall), and I have been using them with HTS-2.1 for a while without any trouble.
Yesterday I downsampled 48K_slt.raw to 16K_slt.raw and ran the HTS-2.2 project on it; the synthesized speech sounds fine to me.
So I guess something must be wrong with my bdl-16K raw files.

Thanks for your help!
Mandy
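
One quick way to test that guess is to compare the duration each raw file implies at different sampling rates. The following is a minimal sketch, not part of the HTS demo: it assumes the files are headerless 16-bit mono PCM, and the names raw_duration_seconds and report are hypothetical helpers introduced only for illustration.

```python
import os

BYTES_PER_SAMPLE = 2  # assumption: headerless 16-bit mono PCM


def raw_duration_seconds(num_bytes, sample_rate):
    """Duration implied by a byte count at the given sampling rate."""
    return num_bytes / (BYTES_PER_SAMPLE * sample_rate)


def report(path, rates=(16000, 48000)):
    """Print the duration of one raw file under each candidate rate."""
    size = os.path.getsize(path)
    for rate in rates:
        duration = raw_duration_seconds(size, rate)
        print("%s: %.2f s if sampled at %d Hz" % (path, duration, rate))
```

If the duration computed at 16 kHz comes out roughly three times the known length of the utterance, the file is probably still 48 kHz data, which would be consistent with speech that plays back at the wrong speed.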

> Date: Thu, 8 Sep 2011 23:04:04 +0900
> From: uratec@xxxxxxxxxxxxxxx
> Subject: [hts-users:03019] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> To: hts-users@xxxxxxxxxxxxxxx
> CC: uratec@xxxxxxxxxxxx
>
> Hi,
>
> How did you prepare 16kHz *.raw files?
>
> Regards,
> Keiichiro Oura
>
>
> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> > Dear Keiichiro,
> >    The data in /raw are already prepared as 16 kHz, 16-bit, but I get bad
> > results. Is there any chance that some setting is incorrect?
> > Sincerely,
> > Mandy
> >> Date: Wed, 7 Sep 2011 12:52:11 +0900
> >> From: uratec@xxxxxxxxxxxxxxx
> >> Subject: [hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> >> To: hts-users@xxxxxxxxxxxxxxx
> >> CC: uratec@xxxxxxxxxxxx
> >>
> >> Hi,
> >>
> >> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
> >>
> >> x2x +sf < 48kHz_16bit.raw | \
> >> ds -s 32 | \
> >> ds -s 21 | \
> >> x2x +fs > 16kHz_16bit.raw
> >>
> >> Regards,
> >> Keiichiro Oura
> >>
> >>
> >> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> >> > Dear all,
> >> >   I recently switched my HTS project from HTS-2.01 to HTS-2.2, using
> >> > the English speaker-dependent training demo from the HTS-2.2 project.
> >> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed
> >> > my HTS_Engine to 1.05.
> >> > In fact, the whole training process went smoothly, and the synthesized
> >> > speech sounds good.
> >> > But when I changed the wave data to cmu-bdl (16 kHz), I got very bad
> >> > synthesized speech.
> >> > The voice sounds broken, and the speed of the speech is also weird.
> >> > I changed the feature extraction parameters in data/Makefile as follows:
> >> > SAMPFREQ    = 16000   # 48000 Sampling frequency (48kHz)
> >> > FRAMELEN    = 400     # 1200  Frame length in point (1200 = 48000 * 0.025)
> >> > FRAMESHIFT  = 80      # 240   Frame shift in point (240 = 48000 * 0.005)
> >> > WINDOWTYPE  = 1       # Window type -> 0: Blackman 1: Hamming 2: Hanning
> >> > NORMALIZE   = 1       # Normalization -> 0: none  1: by power  2: by magnitude
> >> > FFTLEN      = 1024    # FFT length in point
> >> > FREQWARP    = 0.42    # 0.55   # frequency warping factor
> >> > GAMMA       = 0       # pole/zero weight for mel-generalized cepstral (MGC) analysis
> >> > MGCORDER    = 24      # order of MGC analysis
> >> > LNGAIN      = 1       # use logarithmic gain rather than linear gain
> >> > LOWERF0     = 40      # lower limit for f0 extraction (Hz)
> >> > UPPERF0     = 400     # upper limit for f0 extraction (Hz)
> >> > NOISEMASK   = 50      # standard deviation of white noise to mask noises in f0 extraction
> >> >
> >> > and the training parameters in scripts/Config.pm as follows:
> >> > # Speech Analysis/Synthesis Setting ==============
> >> > # speech analysis
> >> > $sr = 16000; #48000;  # sampling rate (Hz)
> >> > $fs = 80;    #240;    # frame period (point)
> >> > $fw = 0.42;  #0.55;   # frequency warping
> >> > $gm = 0;              # pole/zero representation weight
> >> > $lg = 1;              # use log gain instead of linear gain
> >> > $fr = $fs/$sr;        # frame period (sec)
> >> > # speech synthesis
> >> > $pf = 1.4;     # postfiltering factor
> >> > $fl = 4096;    # length of impulse response
> >> > $co = 2047;    # order of cepstrum to approximate mel-generalized cepstrum
> >> > The rest of the training parameters remain the same, but I cannot get
> >> > correct results from training.
> >> > Could anyone tell me where I might be going wrong?
> >> > Thanks in advance!
> >> > Sincerely,
> >> > Mandy
> >>
> >
>
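
The frame settings quoted above follow directly from the sampling rate, as the Makefile comments indicate (frame length = rate * 0.025, frame shift = rate * 0.005) and as Config.pm does with $fr = $fs/$sr. A minimal sketch of that arithmetic follows; the function name frame_params is hypothetical, and the frequency warping table contains only the two values that appear in this thread.

```python
# Frequency warping factors exactly as quoted in this thread;
# other sampling rates are deliberately not filled in.
FREQWARP_BY_RATE = {16000: 0.42, 48000: 0.55}


def frame_params(sample_rate, window_sec=0.025, shift_sec=0.005):
    """Return (FRAMELEN, FRAMESHIFT, frame period in seconds).

    Mirrors the comments in data/Makefile (length = rate * 0.025,
    shift = rate * 0.005) and $fr = $fs/$sr in scripts/Config.pm.
    """
    frame_len = round(sample_rate * window_sec)
    frame_shift = round(sample_rate * shift_sec)
    frame_period = frame_shift / sample_rate
    return frame_len, frame_shift, frame_period


for rate in (16000, 48000):
    length, shift, period = frame_params(rate)
    print(rate, length, shift, period, FREQWARP_BY_RATE[rate])
```

For 16000 Hz this gives a frame length of 400 and a frame shift of 80 points, matching the values in the quoted message, so these particular settings are consistent with 16 kHz data.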

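The two-stage ds pipeline suggested in the thread can be checked with simple rate arithmetic. This sketch assumes the SPTK ds -s codes denote ratio pairs (32 meaning 3:2 and 21 meaning 2:1); the table and the function name rate_after_ds are illustrative and do not call SPTK itself.

```python
# Assumption: SPTK "ds -s" codes read as input:output ratio pairs.
DS_RATIOS = {21: (2, 1), 32: (3, 2)}


def rate_after_ds(rate, codes):
    """Apply each down-sampling stage in order and return the final rate."""
    for code in codes:
        num, den = DS_RATIOS[code]
        rate = rate * den // num
    return rate


print(rate_after_ds(48000, [32, 21]))
```

Under that reading, 48 kHz becomes 32 kHz after the first stage and 16 kHz after the second, which is the rate that SAMPFREQ = 16000 expects.
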
References
[hts-users:03015] Error speed synthesized speech while using 16K data with HTS-2.2, Yu-Chieh Chen
[hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2, Keiichiro Oura
[hts-users:03017] Re: Error speed synthesized speech while using 16K data with HTS-2.2, Yu-Chieh Chen
[hts-users:03019] Re: Error speed synthesized speech while using 16K data with HTS-2.2, Keiichiro Oura