
[hts-users:03019] Re: Error speed synthesized speech while using 16K data with HTS-2.2


Hi,

How did you prepare 16kHz *.raw files?

Regards,
Keiichiro Oura


2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> Dear Keiichiro,
>    The data in /raw are already prepared as 16kHz, 16-bit, but I get bad
> results. Is there any chance that some setting is incorrect?
> Sincerely,
> Mandy
>> Date: Wed, 7 Sep 2011 12:52:11 +0900
>> From: uratec@xxxxxxxxxxxxxxx
>> Subject: [hts-users:03016] Re: Error speed synthesized speech while using
>> 16K data with HTS-2.2
>> To: hts-users@xxxxxxxxxxxxxxx
>> CC: uratec@xxxxxxxxxxxx
>>
>> Hi,
>>
>> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
>>
>> x2x +sf < 48kHz_16bit.raw | \
>> ds -s 32 | \
>> ds -s 21 | \
>> x2x +fs > 16kHz_16bit.raw
>>
>> Regards,
>> Keiichiro Oura
>>
>>
>> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
>> > Dear all,
>> >   I'm recently switching my HTS project from HTS-2.01 to HTS-2.2. For
>> > using
>> > the English speaker
>> >  dependent training demo from HTS-2.2 project.
>> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed
>> > my
>> > HTS_Engine to 1.05.
>> > In fact, the whole training process went smoothly, and the
>> > synthesized
>> > speech sounds good.
>> > But when I want to change the wave data to cmu-bdl (16KHz), I got very
>> > bad
>> > synthesized speech.
>> > The voice sounds broken, and the speed of the speech is also weird.
>> > I changed the feature extraction parameters in data/Makefile as:
>> > SAMPFREQ    = 16000   # 48000 Sampling frequency (48kHz)
>> > FRAMELEN    = 400     # 1200  Frame length in point (1200 = 48000 *
>> > 0.025)
>> > FRAMESHIFT  = 80      # 240   Frame shift in point (240 = 48000 * 0.005)
>> > WINDOWTYPE  = 1       # Window type -> 0: Blackman 1: Hamming 2:
>> > Hanning
>> > NORMALIZE   = 1       # Normalization -> 0: none  1: by power  2: by
>> > magnitude
>> > FFTLEN      = 1024    # FFT length in point
>> > FREQWARP    = 0.42    # 0.55   # frequency warping factor
>> > GAMMA       = 0       # pole/zero weight for mel-generalized cepstral
>> > (MGC)
>> > analysis
>> > MGCORDER    = 24      # order of MGC analysis
>> > LNGAIN      = 1       # use logarithmic gain rather than linear gain
>> > LOWERF0     = 40      # lower limit for f0 extraction (Hz)
>> > UPPERF0     = 400     # upper limit for f0 extraction (Hz)
>> > NOISEMASK   = 50      # standard deviation of white noise to mask noises
>> > in
>> > f0 extraction
>> >
>> > and the training parameters in scripts/Config.pm
>> > as
>> > # Speech Analysis/Synthesis Setting ==============
>> > # speech analysis
>> > $sr = 16000; #48000;  # sampling rate (Hz)
>> > $fs = 80;    #240;    # frame period (point)
>> > $fw = 0.42;  #0.55;   # frequency warping
>> > $gm = 0;              # pole/zero representation weight
>> > $lg = 1;              # use log gain instead of linear gain
>> > $fr = $fs/$sr;        # frame period (sec)
>> > # speech synthesis
>> > $pf = 1.4;     # postfiltering factor
>> > $fl = 4096;    # length of impulse response
>> > $co = 2047;    # order of cepstrum to approximate mel-generalized
>> > cepstrum
>> > The rest of the training parameters remain the same, but I cannot get
>> > correct result from training.
>> > Could anyone tell me where I might have gone wrong?
>> > Thanks in advance!
>> > Sincerely,
>> > Mandy
>>
>
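
For anyone checking a similar setup, the frame settings quoted above follow from the sampling rate (25 ms window, 5 ms shift, as the Makefile comments indicate), and a headerless *.raw file carries no rate information, so its real rate can only be cross-checked through its duration. The following is a rough sketch, not part of HTS or SPTK: the function names and the file path are made up for illustration, and it assumes 16-bit mono PCM.

```python
import os


def frame_params(sample_rate_hz, window_sec=0.025, shift_sec=0.005):
    """Return (frame length, frame shift) in samples for a given rate.

    Defaults mirror the 25 ms / 5 ms values in the quoted Makefile comments.
    """
    frame_len = round(sample_rate_hz * window_sec)
    frame_shift = round(sample_rate_hz * shift_sec)
    return frame_len, frame_shift


def raw_duration_sec(num_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Duration of headerless PCM data under an ASSUMED rate and sample format.

    A raw file does not store its sampling rate; if the result disagrees with
    the known length of the recording, the assumed rate is probably wrong.
    """
    return num_bytes / (bytes_per_sample * channels * sample_rate_hz)


if __name__ == "__main__":
    for rate in (48000, 16000):
        length, shift = frame_params(rate)
        print(f"{rate} Hz: FRAMELEN={length} FRAMESHIFT={shift}")

    # Hypothetical path; replace with one of your own files.
    path = "data/raw/example.raw"
    if os.path.exists(path):
        size = os.path.getsize(path)
        print(f"as 16 kHz: {raw_duration_sec(size, 16000):.2f} s")
        print(f"as 48 kHz: {raw_duration_sec(size, 48000):.2f} s")
```

If a file reads as three times longer than the actual utterance when interpreted at 16 kHz, it most likely still contains 48 kHz samples, which would be consistent with speech that comes out at the wrong speed.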
