
[hts-users:01565] Using multi-dimensional MSD streams - HTS Engine v1.0


Hello,

I am trying to use multi-dimensional MSD streams, that is, streams of dimension 2 or more that also have dynamic features.

I was able to train models with HTS 2.1 for a 17-dimensional stream with deltas and delta-deltas. The model correctly reports 51 coefficients (17 per window, 3 windows), and each pdf is laid out as shown below:

    17 (mean, static)
    17 (var., static)
     2 (weights, static)
    17 (mean, deltas)
    17 (var., deltas)
     2 (weights, deltas)
    17 (mean, ddeltas)
    17 (var., ddeltas)
     2 (weights, ddeltas)
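
To make the layout above concrete, here is a minimal sketch in C that computes where each value sits in one pdf, assuming the blocks are stored back to back in exactly the order listed. The helper names (msd_block_size, msd_pdf_size, msd_mean_offset, msd_var_offset, msd_weight_offset) are hypothetical and are not part of HTS or HTS Engine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of HTS or hts_engine.
 * Assumed layout per window: dim means, dim variances, 2 MSD weights. */
enum { MSD_NUM_WEIGHTS = 2 };

/* Number of values in one window block. */
size_t msd_block_size(size_t dim)
{
    assert(dim > 0);
    return 2 * dim + MSD_NUM_WEIGHTS;
}

/* Number of values in one complete pdf (all windows). */
size_t msd_pdf_size(size_t dim, size_t num_windows)
{
    return num_windows * msd_block_size(dim);
}

/* Offset of the mean of dimension d in window w. */
size_t msd_mean_offset(size_t dim, size_t w, size_t d)
{
    assert(d < dim);
    return w * msd_block_size(dim) + d;
}

/* Offset of the variance of dimension d in window w. */
size_t msd_var_offset(size_t dim, size_t w, size_t d)
{
    assert(d < dim);
    return w * msd_block_size(dim) + dim + d;
}

/* Offset of MSD weight i (0 or 1) in window w. */
size_t msd_weight_offset(size_t dim, size_t w, size_t i)
{
    assert(i < MSD_NUM_WEIGHTS);
    return w * msd_block_size(dim) + 2 * dim + i;
}
```

With dim = 17 and 3 windows, msd_pdf_size gives 3 * (17 + 17 + 2) = 108 values per pdf, and the delta means start at offset 36.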

With HTS Engine, I have noticed a few things:

1. Loading these models into the engine fails. The engine tries to read each pdf as 51 * (mean, var., 2 weights), as if the stream were made of 51 one-dimensional streams. I was able to read the models by patching the following functions in the engine:

    * In HTS_ModelSet_load_parameter, I swapped the order of the calls to HTS_Stream_load_pdf_and_tree and HTS_Stream_load_dynamic_window. The latter is now called first, so that the number of dynamic windows is known when the pdf is loaded.

    * I modified the signature of HTS_Model_load_pdf, adding the number of windows as a parameter. This also affects its callers, HTS_Stream_load_pdf_and_tree and HTS_Stream_load_pdf.

    * I modified the body of HTS_Model_load_pdf to read models according to the format above.
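
The patched reading logic can be sketched as follows. This is only an illustration under stated assumptions, not the actual HTS Engine code: msd_pdf_unpack is a hypothetical function, it works on values already read into memory (no file or byte-order handling), and it assumes each window block holds dim means, dim variances and 2 weights in that order. It also shows why the number of windows has to be known before the pdf is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the real HTS_Model_load_pdf.
 * Unpacks one pdf stored as num_windows blocks of
 * (dim means, dim variances, 2 MSD weights) from src into
 * mean[num_windows * dim], var[num_windows * dim] and
 * weight[num_windows * 2]. Returns the number of values consumed. */
size_t msd_pdf_unpack(const float *src, size_t dim, size_t num_windows,
                      float *mean, float *var, float *weight)
{
    size_t w, d, pos = 0;

    assert(src && mean && var && weight);
    assert(dim > 0 && num_windows > 0);

    for (w = 0; w < num_windows; w++) {
        for (d = 0; d < dim; d++)       /* means of window w */
            mean[w * dim + d] = src[pos++];
        for (d = 0; d < dim; d++)       /* variances of window w */
            var[w * dim + d] = src[pos++];
        for (d = 0; d < 2; d++)         /* MSD weights of window w */
            weight[w * 2 + d] = src[pos++];
    }
    return pos;
}
```

For dim = 17 and 3 windows, the function consumes 108 values, whereas reading 51 one-dimensional entries of (mean, var., 2 weights) would expect 204.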

I did check that this does not affect standard log-F0 generation.

2. There is a typo in HTS_PStreamSet_create: for MSD models on voiced/unvoiced boundaries, pst->sm.ivar is set to 0 at index k instead of index m. This goes unnoticed because k and m are equal when the dimensionality of the stream is 1.
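
A toy reproduction of the indexing issue, to show why it stays hidden for one-dimensional streams. The assumptions here are mine and are not taken from the engine source: k is the window index, l the dimension index, the flat index is m = dim * k + l, and only the dynamic windows (k >= 1) are zeroed at a boundary frame. The names flat_index and zero_dynamic_ivar are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flat index of dimension l in window k. */
size_t flat_index(size_t dim, size_t k, size_t l)
{
    assert(l < dim);
    return dim * k + l;
}

/* Zero the inverse variances of the dynamic windows for one frame.
 * use_flat_index != 0 writes at m (intended behaviour);
 * use_flat_index == 0 writes at k (the mistake described above). */
void zero_dynamic_ivar(float *ivar, size_t dim, size_t num_windows,
                       int use_flat_index)
{
    size_t k, l, m;

    for (k = 1; k < num_windows; k++) {
        for (l = 0; l < dim; l++) {
            m = flat_index(dim, k, l);
            ivar[use_flat_index ? m : k] = 0.0f;
        }
    }
}
```

With dim = 1 both variants zero the same entries. With dim = 2 and 3 windows, the intended variant zeroes entries 2 to 5, while the k-indexed variant zeroes only entries 1 and 2, wiping a static inverse variance and leaving most dynamic ones untouched.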

3. The last point is rather a feature request for a future API: HTS_GStreamSet_create outputs both the generated coefficients and the vocoded speech. If the MSD stream dimensionality is more than 1, the function returns an error, and the user cannot access the coefficients (at least not through the GStream structures). I believe it would be worth decoupling the two functionalities. For the time being, I added a boolean "vocode" argument to the function. If vocode is 0, the function returns just before the checks for vocoding.
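
The control flow of that workaround looks roughly like the sketch below. ToyGStream and toy_gstream_create are hypothetical stand-ins, not the real HTS_GStreamSet structures or signature, and the dimensionality check is reduced to the single condition described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated-stream structure. */
typedef struct {
    size_t msd_dim;      /* dimensionality of the MSD stream */
    int has_parameters;  /* 1 once generated coefficients are stored */
    int has_speech;      /* 1 once vocoded speech is stored */
} ToyGStream;

/* Returns 0 on success, -1 if vocoding is requested but unsupported. */
int toy_gstream_create(ToyGStream *gs, size_t msd_dim, int vocode)
{
    assert(gs != NULL && msd_dim > 0);

    /* Step 1: copy the generated coefficients (always done). */
    gs->msd_dim = msd_dim;
    gs->has_parameters = 1;
    gs->has_speech = 0;

    /* Early exit: the caller only wants the coefficients. */
    if (!vocode)
        return 0;

    /* Step 2: checks for vocoding, then synthesis. */
    if (msd_dim != 1)
        return -1;
    gs->has_speech = 1;
    return 0;
}
```

Calling toy_gstream_create(&gs, 17, 0) succeeds and leaves the coefficients accessible, whereas the same call with vocode set to 1 fails on the dimensionality check.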

I believe that the new API is a great step forward in supporting multi-dimensional MSD streams. You guys did a great job, these were just my 2 cents :-)

Any comments or suggestions are welcome.

Best regards,

Geoffrey