Hi,
The raw files are from the previous Adapt project (I may have processed them in Matlab on Windows, but I can't recall), and I have been using them with HTS-2.1 for a while without any trouble.

Yesterday I downsampled 48K_slt.raw to 16K_slt.raw and ran the HTS-2.2 project on the result; the synthesized speech sounds fine to me. So I guess something must be wrong with my bdl-16K raw files.

Thanks for your help!
Mandy

> Date: Thu, 8 Sep 2011 23:04:04 +0900
> From: uratec@xxxxxxxxxxxxxxx
> Subject: [hts-users:03019] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> To: hts-users@xxxxxxxxxxxxxxx
> CC: uratec@xxxxxxxxxxxx
>
> Hi,
>
> How did you prepare 16kHz *.raw files?
>
> Regards,
> Keiichiro Oura
>
>
> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> > Dear Keiichiro,
> > The data in /raw are already prepared as 16KHz-16bit, but I get bad
> > results. Is there any chance that some setting is incorrect?
> > Sincerely,
> > Mandy
> >> Date: Wed, 7 Sep 2011 12:52:11 +0900
> >> From: uratec@xxxxxxxxxxxxxxx
> >> Subject: [hts-users:03016] Re: Error speed synthesized speech while using
> >> 16K data with HTS-2.2
> >> To: hts-users@xxxxxxxxxxxxxxx
> >> CC: uratec@xxxxxxxxxxxx
> >>
> >> Hi,
> >>
> >> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
> >>
> >> x2x +sf < 48kHz_16bit.raw | \
> >> ds -s 32 | \
> >> ds -s 21 | \
> >> x2x +fs > 16kHz_16bit.raw
> >>
> >> Regards,
> >> Keiichiro Oura
> >>
> >>
> >> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> >> > Dear all,
> >> > I have recently been switching my HTS project from HTS-2.01 to HTS-2.2,
> >> > using the English speaker-dependent training demo from the HTS-2.2
> >> > project.
> >> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed
> >> > my HTS_Engine to 1.05.
> >> > In fact, the whole training process went smoothly, and the synthesized
> >> > speech sounds good.
> >> > But when I changed the wave data to cmu-bdl (16KHz), I got very bad
> >> > synthesized speech.
> >> > The voice sounds broken, and the speed of the speech is also weird.
> >> > I changed the feature extraction parameters in data/Makefile to:
> >> > SAMPFREQ = 16000 # 48000 Sampling frequency (48kHz)
> >> > FRAMELEN = 400 # 1200 Frame length in point (1200 = 48000 * 0.025)
> >> > FRAMESHIFT = 80 # 240 Frame shift in point (240 = 48000 * 0.005)
> >> > WINDOWTYPE = 1 # Window type -> 0: Blackman 1: Hamming 2: Hanning
> >> > NORMALIZE = 1 # Normalization -> 0: none 1: by power 2: by magnitude
> >> > FFTLEN = 1024 # FFT length in point
> >> > FREQWARP = 0.42 # 0.55 # frequency warping factor
> >> > GAMMA = 0 # pole/zero weight for mel-generalized cepstral (MGC) analysis
> >> > MGCORDER = 24 # order of MGC analysis
> >> > LNGAIN = 1 # use logarithmic gain rather than linear gain
> >> > LOWERF0 = 40 # lower limit for f0 extraction (Hz)
> >> > UPPERF0 = 400 # upper limit for f0 extraction (Hz)
> >> > NOISEMASK = 50 # standard deviation of white noise to mask noises in f0 extraction
> >> >
> >> > and the training parameters in scripts/Config.pm to:
> >> > # Speech Analysis/Synthesis Setting ==============
> >> > # speech analysis
> >> > $sr = 16000; #48000; # sampling rate (Hz)
> >> > $fs = 80; #240; # frame period (point)
> >> > $fw = 0.42; #0.55; # frequency warping
> >> > $gm = 0; # pole/zero representation weight
> >> > $lg = 1; # use log gain instead of linear gain
> >> > $fr = $fs/$sr; # frame period (sec)
> >> > # speech synthesis
> >> > $pf = 1.4; # postfiltering factor
> >> > $fl = 4096; # length of impulse response
> >> > $co = 2047; # order of cepstrum to approximate mel-generalized cepstrum
> >> > The rest of the training parameters remain the same, but I cannot get
> >> > correct results from training.
> >> > Could anyone tell me where I might have gone wrong?
> >> > Thanks in advance!
> >> > Sincerely,
> >> > Mandy
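
The numbers discussed in this thread can be checked with a small shell sketch. It assumes the raw files are headerless 16-bit mono (mono is an assumption; the thread only says 16KHz-16bit) and uses the 25 ms window and 5 ms shift implied by the Makefile comments (1200 = 48000 * 0.025, 240 = 48000 * 0.005). The function names frame_points and raw_duration_ms are hypothetical helpers, not part of HTS or SPTK.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking sampling-rate settings.
# Not part of HTS or SPTK.

# frame_points RATE_HZ LENGTH_MS
# Prints the number of samples covered by LENGTH_MS at RATE_HZ.
frame_points() {
    echo $(( $1 * $2 / 1000 ))
}

# raw_duration_ms FILE RATE_HZ
# Prints the duration in milliseconds of a headerless raw file,
# assuming 16-bit (2 bytes per sample) mono audio.
raw_duration_ms() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / 2 * 1000 / $2 ))
}

# Values for 16 kHz data with a 25 ms window and a 5 ms shift.
echo "FRAMELEN   = $(frame_points 16000 25)"
echo "FRAMESHIFT = $(frame_points 16000 5)"
```

Running raw_duration_ms on one of the bdl files with a rate of 16000 gives the duration the training scripts will assume. If that does not match how long the utterance actually is, the file may not really be 16 kHz, 16-bit mono, which would be one possible explanation for the odd speed.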