Dear Keiichiro,
The data in /raw are already prepared as 16 kHz, 16-bit, but I still get bad results. Is there any chance that some setting is incorrect?

Sincerely,
Mandy

> Date: Wed, 7 Sep 2011 12:52:11 +0900
> From: uratec@xxxxxxxxxxxxxxx
> Subject: [hts-users:03016] Re: Error speed synthesized speech while using 16K data with HTS-2.2
> To: hts-users@xxxxxxxxxxxxxxx
> CC: uratec@xxxxxxxxxxxx
>
> Hi,
>
> data/raw/*.raw should be down-sampled from 48kHz to 16kHz.
>
> x2x +sf < 48kHz_16bit.raw | \
> ds -s 32 | \
> ds -s 21 | \
> x2x +fs > 16kHz_16bit.raw
>
> Regards,
> Keiichiro Oura
>
>
> 2011/9/7 Yu-Chieh Chen <tobysworld@xxxxxxxxxxx>:
> > Dear all,
> > I recently switched my HTS project from HTS-2.01 to HTS-2.2 in order to
> > use the English speaker-dependent training demo from the HTS-2.2 project.
> > I installed HTS-2.2_for_HTK-3.4.1 without any trouble, and also changed my
> > HTS_Engine to 1.05.
> > In fact, the whole training process went smoothly, and the synthesized
> > speech sounds good.
> > But when I changed the wave data to cmu-bdl (16KHz), I got very bad
> > synthesized speech.
> > The voice sounds broken, and the speed of the speech is also weird.
> > I changed the feature extraction parameters in data/Makefile as:
> > SAMPFREQ   = 16000  # 48000 Sampling frequency (48kHz)
> > FRAMELEN   = 400    # 1200 Frame length in point (1200 = 48000 * 0.025)
> > FRAMESHIFT = 80     # 240 Frame shift in point (240 = 48000 * 0.005)
> > WINDOWTYPE = 1      # Window type -> 0: Blackman 1: Hamming 2: Hanning
> > NORMALIZE  = 1      # Normalization -> 0: none 1: by power 2: by magnitude
> > FFTLEN     = 1024   # FFT length in point
> > FREQWARP   = 0.42   # 0.55 # frequency warping factor
> > GAMMA      = 0      # pole/zero weight for mel-generalized cepstral (MGC) analysis
> > MGCORDER   = 24     # order of MGC analysis
> > LNGAIN     = 1      # use logarithmic gain rather than linear gain
> > LOWERF0    = 40     # lower limit for f0 extraction (Hz)
> > UPPERF0    = 400    # upper limit for f0 extraction (Hz)
> > NOISEMASK  = 50     # standard deviation of white noise to mask noises in f0 extraction
> >
> > and the training parameters in scripts/Config.pm as:
> > # Speech Analysis/Synthesis Setting ==============
> > # speech analysis
> > $sr = 16000; #48000; # sampling rate (Hz)
> > $fs = 80; #240; # frame period (point)
> > $fw = 0.42; #0.55; # frequency warping
> > $gm = 0; # pole/zero representation weight
> > $lg = 1; # use log gain instead of linear gain
> > $fr = $fs/$sr; # frame period (sec)
> > # speech synthesis
> > $pf = 1.4; # postfiltering factor
> > $fl = 4096; # length of impulse response
> > $co = 2047; # order of cepstrum to approximate mel-generalized cepstrum
> > The rest of the training parameters remain the same, but I cannot get
> > correct results from training.
> > Could anyone tell me where I might have gone wrong?
> > Thanks in advance!
> > Sincerely,
> > Mandy
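
As a rough cross-check of the numbers discussed in this thread, the frame settings and the duration implied by a headerless raw file can be recomputed from the sampling frequency. The sketch below is only illustrative: the helper names frame_params and raw_duration_sec are hypothetical and not part of HTS or SPTK, and it assumes mono 16-bit PCM with the 25 ms frame length and 5 ms frame shift given in the Makefile comments.

```python
# Illustrative helpers (hypothetical names, not part of HTS or SPTK).

def frame_params(sampfreq_hz, frame_len_sec=0.025, frame_shift_sec=0.005):
    """Return (FRAMELEN, FRAMESHIFT) in points for a sampling frequency in Hz."""
    return round(sampfreq_hz * frame_len_sec), round(sampfreq_hz * frame_shift_sec)


def raw_duration_sec(num_bytes, sampfreq_hz, bytes_per_sample=2):
    """Duration implied by a headerless mono PCM file of num_bytes bytes."""
    return num_bytes / (bytes_per_sample * sampfreq_hz)


if __name__ == "__main__":
    print(frame_params(48000))  # settings quoted for the original 48 kHz demo
    print(frame_params(16000))  # settings quoted for the 16 kHz data
    # The same 96000 bytes read at two assumed sampling rates:
    print(raw_duration_sec(96000, 48000))
    print(raw_duration_sec(96000, 16000))
```

If the duration computed at the assumed 16 kHz rate does not match the known length of an utterance, the raw file was probably written at a different sampling rate or sample format, which would be one possible explanation for speech that sounds too fast or too slow.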