[hts-users:02667] Re: hts_engine and HMGenS output differences

Subject: [hts-users:02667] Re: hts_engine and HMGenS output differences

Date: Wed, 29 Dec 2010 22:30:04 +0800 (CST)

Delivered-to: hts-users@xxxxxxxxxxxxxxx

Yes, I mean the engine's GV option.

If the pdf and tree are converted from the mmf file without any change, and the parameter generation algorithms are the same, I think you might get a satisfactory result.

If your model doesn't have any sil or pau, there won't be any non-speech frames in your generated voice. In the engine, GV is calculated over the whole utterance, including sil and pau. I am not familiar with the Festival + engine framework, so I couldn't say.
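The effect of silence frames on GV can be illustrated with a small sketch. It assumes GV is simply the per-utterance variance of one parameter trajectory; the trajectory values, the labels, and the helper names are made up for illustration and are not taken from hts_engine or from any trained model.

```python
def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def utterance_gv(trajectory, labels, skip=()):
    """Variance of one parameter over an utterance, ignoring frames
    whose label is listed in `skip` (hypothetical helper)."""
    kept = [v for v, lab in zip(trajectory, labels) if lab not in skip]
    return variance(kept)


# Made-up one-dimensional trajectory: silence, three speech frames, pause, silence.
trajectory = [0.0, 0.0, 1.0, 3.0, 5.0, 0.0, 0.0]
labels = ["sil", "sil", "a", "a", "a", "pau", "sil"]

gv_whole = utterance_gv(trajectory, labels)                        # all frames
gv_speech = utterance_gv(trajectory, labels, skip=("sil", "pau"))  # speech only
print(gv_whole, gv_speech)
```

The two values differ, so a GV estimated on speech-only frames and a GV computed over the whole utterance are not directly comparable.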

Na Xingyu

Beijing Institute of Technology

Thanks Na Xingyu,
Yes, I use the default parameter generation method in HMGenS (Cholesky decomposition-based parameter generation), and I know the engine doesn't have this method. Also, I used "forced alignment using WFST for no-silent GV" in the training part, and my trees and pdfs don't have any sil or pau labels.
I use hts_engine 1.01 in Festival; I think I should use version 1.02, which supports GV without silence and pause. Also, I should change the forward-backward substitution into Cholesky. Do you think any other part might cause the difference?
And I don't know which GV switch you mean. If you mean the hts_engine GV switch, I think Festival turns it on by default.

Thanks a lot,
-Ali

2010/12/29 那兴宇 <nxy-yzqs@xxxxxxx>

In the engine's parameter generation, LU factorization and forward/backward substitution are used to generate the acoustic parameters dimension by dimension. But in HMGenS, as I remember, there are three different parameter generation algorithms. I don't know which one you used, and I wonder if the difference was caused by this.
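As a rough check on whether the choice of solver alone could matter, here is a minimal sketch that solves one small parameter generation system both ways. It is not the hts_engine or HMGenS code: the delta window (-0.5, 0, 0.5), the means, the precisions, and all function names are assumptions made for illustration. Both solvers target the same symmetric positive-definite system, so they should agree up to rounding.

```python
def build_system(means, precisions):
    """Build A = W' P W and b = W' P mu for static + delta features.

    means/precisions: one (static, delta) pair per frame.
    The delta window (-0.5, 0, 0.5) is an assumption of this sketch.
    """
    T = len(means)
    rows = []  # each entry: (window coefficients over frames, mean, precision)
    for t in range(T):
        static = [0.0] * T
        static[t] = 1.0
        rows.append((static, means[t][0], precisions[t][0]))
        delta = [0.0] * T
        if t > 0:
            delta[t - 1] = -0.5
        if t < T - 1:
            delta[t + 1] = 0.5
        rows.append((delta, means[t][1], precisions[t][1]))
    A = [[sum(w[i] * p * w[j] for w, _, p in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(w[i] * p * m for w, m, p in rows) for i in range(T)]
    return A, b


def forward_backward(L, U, b):
    """Solve L U x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


def solve_lu(A, b):
    """Doolittle LU without pivoting (acceptable here because A is SPD)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return forward_backward(L, U, b)


def solve_cholesky(A, b):
    """Cholesky A = L L', then the same two substitution passes."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    Lt = [[L[j][i] for j in range(n)] for i in range(n)]
    return forward_backward(L, Lt, b)


# Made-up (static, delta) means and precisions for five frames.
means = [(1.0, 0.2), (1.5, 0.3), (2.0, 0.0), (1.2, -0.4), (0.8, -0.1)]
precisions = [(4.0, 10.0), (2.0, 8.0), (5.0, 12.0), (3.0, 9.0), (4.0, 10.0)]

A, b = build_system(means, precisions)
c_lu = solve_lu(A, b)
c_chol = solve_cholesky(A, b)
max_diff = max(abs(x - y) for x, y in zip(c_lu, c_chol))
print(max_diff)
```

If the two paths agree like this on the same system, any remaining difference between the outputs would have to come from elsewhere, for example the windows, the GV step, or the durations.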

However, as you said, the durations of the generated waveforms were also different. This has nothing to do with parameter generation. In the engine, duration is determined by the HTS_Engine_create_sstream function. If the duration pdf and tree are the same, the durations should be the same. Does the engine model lack some states, e.g. sil or pau?

BTW, did you turn on GV?

--

Na Xingyu

Beijing Institute of Technology

At 2010-12-29 13:33:47, "ali azimizadeh" <ali.azimizadeh@xxxxxxxxx> wrote:

No, the unseen-model problem isn't my issue. The main problem is output quality. I give the same input (unseen data) to hts_engine and HMGenS. The output parameters are lf0, bap, and mgc files. I use these files to synthesize the waveform with the STRAIGHT method.
Finally, I have two different .wav files, both of which are correct. But the hts_engine output quality is lower than the HMGenS one. The mmf output has 560 frames, but the hts_engine output has 469. The hts_engine output has distortion. Note, however, that I didn't run hts_engine all the way through; I only used it up to HTS_Engine_create_pstream(&engine);
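To compare frame counts like these, a minimal sketch follows. It assumes the parameter files are headerless little-endian float32 with a fixed number of values per frame (one value per frame for lf0); the file names and the dummy files are hypothetical and exist only so the sketch runs.

```python
import os
import struct
import tempfile


def count_frames(path, dim):
    """Number of frames in a headerless float32 file with `dim` values per frame."""
    size = os.path.getsize(path)
    frame_bytes = 4 * dim
    if size % frame_bytes:
        raise ValueError(f"{path}: size {size} is not a multiple of {frame_bytes}")
    return size // frame_bytes


def write_dummy(path, frames, dim):
    """Write zeros in the assumed layout, only so the sketch is runnable."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{frames * dim}f", *([0.0] * (frames * dim))))


tmp = tempfile.mkdtemp()
hmgens_lf0 = os.path.join(tmp, "hmgens.lf0")  # hypothetical file names
engine_lf0 = os.path.join(tmp, "engine.lf0")
write_dummy(hmgens_lf0, 560, 1)
write_dummy(engine_lf0, 469, 1)

n_hmgens = count_frames(hmgens_lf0, 1)
n_engine = count_frames(engine_lf0, 1)
print(n_hmgens, n_engine, n_hmgens - n_engine)
```

Running the same count on every stream from both systems shows whether the gap is identical across streams, which would point at durations rather than at parameter generation.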

I think the differences are small, but they affect the waveform. If you have any suggestions, please send them to me.

Best Regards,
-Ali

2010/12/29 那兴宇 <nxy-yzqs@xxxxxxx>

Hi,

For the mmf model, there might be an unseen-model problem, which could be solved by the decision tree structure in hts_engine.
Is your test utterance from inside or outside the training data, or is it pre-defined generation data?

--

Na Xingyu

Beijing Institute of Technology

At 2010-12-29 06:04:09, "ali azimizadeh" <ali.azimizadeh@xxxxxxxxx> wrote:

Hi,

I have a model in both hts_engine and mmf versions, trained with HTS2.1.1beta Adapt. I tested the models with the same utterance and compared their outputs (sp, lf0 and bap). Unfortunately, the number of frames in the outputs is not the same.

Is this difference normal? Do the initial parameters (like frame rate, ...) affect this difference?

Finally, can I expect the same results from hts_engine models and mmf ones?

Best Regards,

-Ali