[hts-users:04640] Re: HTS-demo_CMU-ARCTIC-SLT, config SAMPFREQ=16000 to train, use generated .htsvoice model but the synthesised voice is completely wrong.


Is your training data actually downsampled to 16 kHz? HTS doesn't do it automatically.
MGCORDER=39 might be too high for 16 kHz.
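Since HTS does not resample the corpus itself, the recordings have to be converted before features are extracted. A minimal sketch using sox, assuming the audio is stored as headerless 16-bit signed little-endian mono .raw files recorded at 48 kHz (check this against your own data); the function name downsample_raw_dir and any directory names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: resample every headerless .raw file in SRC_DIR to DST_RATE with sox.
# ASSUMPTIONS (verify against your own corpus): signed 16-bit, little-endian,
# mono samples recorded at SRC_RATE Hz. The function name is hypothetical.
downsample_raw_dir() {
  local src_dir=$1 dst_dir=$2
  local src_rate=${3:-48000} dst_rate=${4:-16000}
  local f
  mkdir -p "$dst_dir"
  for f in "$src_dir"/*.raw; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Format options before a file name describe that file; because the
    # input and output rates differ, sox resamples automatically.
    sox -t raw -r "$src_rate" -e signed-integer -b 16 -c 1 -L "$f" \
        -t raw -r "$dst_rate" -e signed-integer -b 16 -c 1 -L \
        "$dst_dir/$(basename "$f")"
  done
}
```

For example, `downsample_raw_dir data/raw data/raw_16k` would write 16 kHz copies of each file; those copies (not the 48 kHz originals) are what feature extraction and training would then have to use. The directory names here are illustrative only.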

On Mon, Dec 31, 2018 at 1:07 AM ljh_dev <ljh_dev@xxxxxxx> wrote:
I had already configured it as follows, but the synthesised voice is also completely wrong:
    ./configure  --with-fest-search-path=/usr/local/festival/examples \
                 --with-sptk-search-path=/usr/local/SPTK-3.9/bin \
                 --with-hts-search-path=/usr/local/HTS-2.3_for_HTK-3.4.1/bin \
                 --with-hts-engine-search-path=/usr/local/hts_engine_API-1.10/bin \
                 SAMPFREQ=16000 FRAMESHIFT=80 FRAMELEN=200 FREQWARP=0.42 GAMMA=0 MGCORDER=39 MGCLSP=0 LNGAIN=1

On 2018-12-31 16:56:24, "Ghost Gloomy" <gloomyghostbit@xxxxxxxxx> wrote:
Seems that

On Mon, Dec 31, 2018 at 4:16 PM, ljh_dev <ljh_dev@xxxxxxx> wrote:
It says "verify STRAIGHT supports 16k...". If I do not use STRAIGHT, is it still OK?

At 2018-12-31 01:53:26, "Oytun Turk" <oytunturk@xxxxxxxxx> wrote:

I don't have a functional installation of the latest HTS here, but from a quick look at configure.ac, I'd suggest trying the following:


AC_ARG_VAR([USESTRAIGHT],[turn on STRAIGHT-based analysis (0:off or 1:on, default=0)]) #--> if this is 'on' in your case, I'd verify STRAIGHT analysis supports 16 kHz out of the box

AC_ARG_VAR([FRAMELEN],[frame length in point (default=1200)]) # should be 400 for 16 kHz

AC_ARG_VAR([FRAMESHIFT],[frame shift in point (default=240)]) # should be 80 for 16 kHz

AC_ARG_VAR([FFTLEN],[FFT length in point (default=2048)]) # can be lower (1024 or even 512) for 16 kHz

AC_ARG_VAR([SAMPFREQ],[sampling frequency in Hz (default=48000)]) # should be 16000 for 16 kHz

AC_ARG_VAR([FREQWARP],[frequency warping factor (default=0.55)]) # not sure what a reasonable 16 kHz setting would be for MGCs; I'd experiment with the default, then change it if that fails

AC_ARG_VAR([GAMMA],[pole/zero weight factor (0: mel-cepstral analysis  1: LPC analysis  2,3,...,N: mel-generalized cepstral (MGC) analysis) (default=0)])

AC_ARG_VAR([MGCORDER],[order of MGC analysis (default=34 for cepstral form, default=12 for LSP form)]) # can be reduced to 24 or even 20. It may break things at the vocoder side if the vocoder you use does not support the order you specified

AC_ARG_VAR([BAPORDER],[order of BAP analysis (default=24)]) # can be reduced. Experiment with 8 or maybe even 5.

AC_ARG_VAR([LOWERF0],[lower limit for F0 extraction in Hz (default=110)]) # Depends on min f0 of speaker. This would be too high for a male voice

AC_ARG_VAR([UPPERF0],[upper limit for F0 extraction in Hz (default=280)]) # Depends on max f0 of speaker.

AC_ARG_VAR([PSTFILTER_MCP],[postfiltering factor for mel-cepstral (default=1.4)]) # Experiment with it if using MGC

AC_ARG_VAR([PSTFILTER_LSP],[postfiltering factor for LSP (default=0.7)]) # Experiment with it if using LSP

AC_ARG_VAR([IMPLEN],[length of impulse response (default=4096 for cepstral form, default=576 for LSP form)]) # If you change FFTLEN above, try 2*FFTLEN here
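Putting the comments above together, one possible starting set of 16 kHz variables can be assembled as below. This is a sketch, not a tested HTS configuration: FRAMELEN=400 and FRAMESHIFT=80 come from the comments; FFTLEN is taken (my assumption) as the smallest power of two that holds one frame, which gives 512, within the suggested range; IMPLEN follows the 2*FFTLEN suggestion; MGCORDER=24 is one of the suggested reduced orders; and FREQWARP=0.42 is the value commonly quoted for 16 kHz mel-cepstral analysis and also the one tried at the top of this thread, so treat it as something to verify. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: assemble 16 kHz configure variables from the suggestions above.
# Starting points only; function names and the 0.42 warp value are assumptions.

# Smallest power of two >= $1, so that one analysis frame fits in the FFT.
next_pow2() {
  local p=1
  while [ "$p" -lt "$1" ]; do p=$((p * 2)); done
  echo "$p"
}

# Frequency-warping factor per sampling rate: 0.55 is the demo default at
# 48 kHz; 0.42 is the value commonly used for 16 kHz (verify for your data).
freqwarp_for() {
  case $1 in
    16000) echo 0.42 ;;
    48000) echo 0.55 ;;
    *) echo "no FREQWARP preset for $1 Hz" >&2; return 1 ;;
  esac
}

hts_16k_vars() {
  local sr=16000 framelen=400 frameshift=80   # 25 ms window, 5 ms shift
  local fftlen warp
  fftlen=$(next_pow2 "$framelen") || return 1
  warp=$(freqwarp_for "$sr") || return 1
  echo "SAMPFREQ=$sr FRAMELEN=$framelen FRAMESHIFT=$frameshift" \
       "FFTLEN=$fftlen IMPLEN=$((2 * fftlen)) FREQWARP=$warp MGCORDER=24"
}

hts_16k_vars
```

The printed variables could then be passed to the demo's configure, e.g. `./configure $(hts_16k_vars)` followed by your existing --with-*-search-path options. LOWERF0/UPPERF0 and the postfilter factors still have to be tuned per speaker, as the comments note.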


I assume that regenerating the HTS training config files from scratch with these configure adjustments would handle feature stream sizes etc. in HTK properly. If not, you'd need to inspect the generated config files for data extraction and training and fix things manually. And, of course, you'll need to start from 16 kHz recordings, extract all features, run training from scratch and test.


Good luck!


On Sun, Dec 30, 2018 at 9:24 AM Ericson Sarmento <ericsonsarmento@xxxxxxxxx> wrote:

On Sun, Dec 30, 2018 at 14:16, ljh_dev <ljh_dev@xxxxxxx> wrote:
Could you give me a correct parameter set for 16K?

At 2018-12-31 01:06:52, "Ericson Sarmento" <ericsonsarmento@xxxxxxxxx> wrote:
You need to change other parameters as well, such as the frequency warping factor. Just review all the variables in the configure script.

On Sun, Dec 30, 2018 at 13:30, ljh_dev <ljh_dev@xxxxxxx> wrote:
In the HTS-demo_CMU-ARCTIC-SLT directory, I added a sampling frequency parameter when running ./configure:
./configure SAMPFREQ=16000 --with-fest-search-path=/usr/local/festival/examples \
                 --with-sptk-search-path=/usr/local/SPTK-3.9/bin \
                 --with-hts-search-path=/usr/local/HTS-2.3_for_HTK-3.4.1/bin \
                 --with-hts-engine-search-path=/usr/local/hts_engine_API-1.10/bin

I only changed the SAMPFREQ=16000 parameter; I don't know how to set the other parameters.
Could you give me an example of a correct parameter set for 16K?

At 2018-12-31 00:24:49, "Oytun Turk" <oytunturk@xxxxxxxxx> wrote:
What are your training settings at 16 kHz?
You'll need to reduce the spectral feature size (MGC or LSF order, depending on what you are using) and make sure the frame shift (skip) and window sizes are adjusted correctly.
You may also need to change other parameters, such as spectral warping and any spectral post-filtering/enhancement settings.
Then, of course, you'll have to extract the corresponding acoustic features and retrain the voice.
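Because FRAMELEN and FRAMESHIFT are counted in samples, keeping the same window and shift durations at a new rate means scaling them by the ratio of the sampling rates. A minimal sketch of that arithmetic, assuming the 48 kHz defaults of 1200 and 240 samples listed in configure.ac; the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: convert frame sizes (in samples) between sampling rates so that the
# window and shift keep the same duration. Helper names are hypothetical.

# scale_samples COUNT FROM_RATE TO_RATE -> integer sample count at TO_RATE
scale_samples() {
  echo $(( $1 * $3 / $2 ))
}

# samples_to_ms COUNT RATE -> duration in milliseconds, one decimal place
samples_to_ms() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 1000 * n / r }'
}

# 48 kHz demo defaults: FRAMELEN=1200 (25 ms) and FRAMESHIFT=240 (5 ms).
for pair in FRAMELEN:1200 FRAMESHIFT:240; do
  name=${pair%%:*}
  n48=${pair##*:}
  n16=$(scale_samples "$n48" 48000 16000)
  echo "$name: $n48 @ 48 kHz -> $n16 @ 16 kHz ($(samples_to_ms "$n16" 16000) ms)"
done
```

This prints 400 samples for FRAMELEN and 80 for FRAMESHIFT at 16 kHz. By the same arithmetic, the FRAMELEN=200 tried elsewhere in this thread is a 12.5 ms window at 16 kHz, half the default 25 ms.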


On Sun, Dec 30, 2018 at 8:14 AM ljh_dev <ljh_dev@xxxxxxx> wrote:
I mean that the default sampling frequency in HTS-demo_CMU-ARCTIC-SLT is 48k, but I intend to train a 16K model.
So how do I train a 16K model? How should I configure the parameters of HTS-demo_CMU-ARCTIC-SLT?

On 2018-12-30 18:35:15, "Ghost Gloomy" <gloomyghostbit@xxxxxxxxx> wrote:
Yes, you need to change it from 48k.

On Sat, Dec 29, 2018 at 1:38 PM, ljh_dev <ljh_dev@xxxxxxx> wrote:
Hi,
Following the HTS documentation, I have successfully trained a 48K model (.htsvoice; flite+hts_engine synthesised a good voice with it),
but when I trained a 16K model, the synthesised voice is completely wrong.
In the HTS-demo_CMU-ARCTIC-SLT directory, I added a sampling frequency parameter when running ./configure:
./configure SAMPFREQ=16000 --with-fest-search-path=/usr/local/festival/examples \
                 --with-sptk-search-path=/usr/local/SPTK-3.9/bin \
                 --with-hts-search-path=/usr/local/HTS-2.3_for_HTK-3.4.1/bin \
                 --with-hts-engine-search-path=/usr/local/hts_engine_API-1.10/bin

Do I need to change other parameters?

Follow-Ups
[hts-users:04641] ljh_dev
References (all messages share this thread's subject)
[hts-users:04627] ljh_dev (original post)
[hts-users:04628] Ghost Gloomy
[hts-users:04629] ljh_dev
[hts-users:04630] Oytun Turk
[hts-users:04631] ljh_dev
[hts-users:04632] Ericson Sarmento
[hts-users:04634] ljh_dev
[hts-users:04635] Ericson Sarmento
[hts-users:04636] Oytun Turk
[hts-users:04637] ljh_dev
[hts-users:04638] Ghost Gloomy
[hts-users:04639] ljh_dev