
[hts-users:04181] Re: Bug Report on ApplyWindow Function in HMGenS


Hi Yang,

The HTS working group will fix this bug in the next version so that the
dynamic features are calculated by the same strategy as window.pl.
Thank you for your report.

Best regards,
Kei


2014-10-22 10:52 GMT+09:00 Xingyu Na <asr.naxingyu@xxxxxxxxx>:
> Hi Matt,
>
> Yes, the generation of lf0 does suffer from the hard voicing decision in the
> earlier versions of HTS (I don't know about the current one). My point is
> that, for instance, we have an lf0 sequence
> -1e+10 -1e+10 4.5 4.4 -1e+10 3.9 -1e+10
> composed of 4 unvoiced and 3 voiced frames. The deltas will be calculated as
>     static     delta     delta-delta
>     -1e+10     0         0
>     -1e+10     0         0
>      4.5       0         0
>      4.4       0         0
>     -1e+10     0         0
>      3.9       0         0
>     -1e+10     0         0
> and used for training an MSD-HMM. During synthesis, W is a band-diagonal
> matrix, which uses the same w^t for all frames. The LU decomposition
> and the forward-backward substitutions do not treat 'space boundaries' or
> the 'ends' of the trajectory as special cases. The MLPG for MSD works as if
> the unvoiced frames (or gaps for other streams) do not exist. Instead, what I
> meant by a 'theoretically correct' treatment is to assign the deltas
> as
>     static     delta     delta-delta
>     -1e+10     0         0
>     -1e+10     0         0
>      4.5       2.2      -4.6
>      4.4      -0.3      -0.4
>     -1e+10     0         0
>      3.9      -2.2      -3.4
>     -1e+10     0         0
> Then the principles for training and generation of MSD are identical.
> I don't have the reference now. It was years ago when I was working on pitch
> refinement. But the key point is as above.
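
The figures in the second table above appear consistent with computing the
dynamics over the voiced frames only (4.5, 4.4, 3.9), as if the unvoiced
frames were removed, with zeros assumed beyond both ends. A minimal Python
sketch under that reading follows. The window coefficients (-0.5, 0, 0.5) and
(1, -2, 1), the helper names and the voicing threshold are assumptions for
illustration; none of them is taken from the HTS source.

```python
UNVOICED = -1e10  # marker used for unvoiced frames in the example above


def is_voiced(value):
    # Assumption: anything far above the marker counts as a voiced lf0 value.
    return value > -1e9


def msd_deltas(lf0):
    """Return (static, delta, delta-delta) rows for an lf0 sequence.

    Dynamics are computed over the voiced frames only, as if the unvoiced
    frames did not exist, with zeros assumed beyond both ends.
    Unvoiced frames keep delta = delta-delta = 0.
    """
    voiced_idx = [i for i, v in enumerate(lf0) if is_voiced(v)]
    padded = [0.0] + [lf0[i] for i in voiced_idx] + [0.0]
    rows = [(v, 0.0, 0.0) for v in lf0]
    for k, i in enumerate(voiced_idx):
        prev, cur, nxt = padded[k], padded[k + 1], padded[k + 2]
        delta = 0.5 * (nxt - prev)        # assumed window: -0.5 0 0.5
        delta2 = prev - 2.0 * cur + nxt   # assumed window:  1  -2  1
        rows[i] = (cur, delta, delta2)
    return rows


for row in msd_deltas([-1e10, -1e10, 4.5, 4.4, -1e10, 3.9, -1e10]):
    print("%10.4g %8.2f %8.2f" % row)
```

With these assumed windows the three voiced rows match the table up to
floating-point rounding.
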
>
> Regards,
> Xingyu
>
>
> On 10/21/2014 05:44 PM, Matt Shannon wrote:
>>
>> Hi Xingyu,
>>
>> This issue is certainly even more of a problem for lf0 trajectories than
>> for other speech parameter trajectories.  I didn't quite understand what you
>> said about the treatment being heuristic or theoretically correct. So far I
>> haven't come across any approach (apart from perhaps marginalization) that
>> I'd really call well justified theoretically, but I'm not sure that's what
>> you were claiming.
>>
>> The issue I was raising with Yang's proposed change was that, as far as I
>> could tell, it doesn't change any part of the Cholesky generation procedure,
>> including the forwards and backwards substitutions.
>>
>> Do you happen to have a reference for some of the work you mentioned by
>> Zen and others?
>>
>> Thanks,
>>
>> Matt
>>
>>
>> On 20/10/14 12:30, asr.naxingyu@xxxxxxxxx wrote:
>>>
>>> Hi,
>>>
>>> I remember there was work, initially introduced by Heiga Zen, that
>>> addressed not only the ends but all boundary effects when calculating
>>> dynamics for logarithmic pitch. As Matt pointed out, the key is to keep
>>> the strategy identical in training and synthesis.
>>>
>>> In the previous versions of HTS/hts_engine, the principle used for
>>> calculating dynamics and the one used for recovering the pitch trajectory
>>> from voiced segments differ, resulting in more pitch distortion at
>>> boundaries. From this point of view, I don't think the treatment should be
>>> heuristic. It should be theoretically correct. However, there might be
>>> better solutions for dealing with the end effects, such as some tricks when
>>> implementing the forward and backward substitution. What Yang proposed
>>> is among those.
>>>
>>> Xingyu Na
>>>
>>> ----- Reply message -----
>>> From: "Matt Shannon" <sms46@xxxxxxxxx>
>>> To: <hts-users@xxxxxxxxxxxxxxx>
>>> Subject: [hts-users:04146] Re: Bug Report on ApplyWindow Function in
>>> HMGenS
>>> Date: Sat, Oct 18, 2014 04:09
>>>
>>>
>>> Hi Yang,
>>>
>>> I'm not from the HTS working group but I am very interested in this
>>> (admittedly minor!) issue.
>>>
>>> It's a complicated and subtle issue exactly what should be done about
>>> "end effects" like you mention.  There are a number of different
>>> approaches that one could use.  For example you could: assume the
>>> unobserved frames in the original trajectory before windowing are zero
>>> ("input-zero windows"); assume the unobserved frames in the original
>>> trajectory are some fixed specified value ("input-specified windows");
>>> assume any frames in the windowed trajectory that can't be computed
>>> exactly are zero ("output-zero windows"); use different window
>>> coefficients for the first few and last few frames of the trajectory,
>>> for example using 0 -1 1 instead of -0.5 0 0.5 to compute the first
>>> frame of the delta trajectory; or you could even treat the unobserved
>>> frames as unobserved and marginalize over them probabilistically.  In
>>> case it's of interest, I discuss these possibilities briefly in my PhD
>>> thesis in the context of the trajectory HMM (Appendix B,
>>>
>>> http://mi.eng.cam.ac.uk/~sms46/papers/shannon2014probabilistic-thesis.pdf).
>>>
>>> As you mention in your very well-presented analysis, almost all of these
>>> approaches are somewhat "heuristic".  The approach you suggest and the
>>> one used by window.pl is effectively to use different window
>>> coefficients for the first few and last few frames, for example using 0
>>> -0.5 0.5 instead of -0.5 0 0.5 to compute the first frame of the delta
>>> trajectory.
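
To make three of the end treatments listed above concrete, here is a minimal
Python sketch of the delta computation with the window -0.5 0 0.5. The
function name and the mode strings are hypothetical. The last-frame
coefficients -0.5 0.5 0 are assumed by symmetry with the first-frame example
0 -0.5 0.5; the thread only states the first-frame case.

```python
def delta(x, mode):
    """Delta trajectory of x using the window (-0.5, 0, 0.5).

    mode selects how the first and last frames are handled:
      "input-zero"  : frames outside the trajectory are assumed to be zero
      "output-zero" : deltas that cannot be computed exactly are set to zero
      "edge-window" : 0 -0.5 0.5 at the first frame and (by assumed symmetry)
                      -0.5 0.5 0 at the last frame
    """
    if mode not in ("input-zero", "output-zero", "edge-window"):
        raise ValueError("unknown mode: %r" % (mode,))
    n = len(x)
    out = []
    for t in range(n):
        at_edge = t == 0 or t == n - 1
        if at_edge and mode == "output-zero":
            out.append(0.0)
            continue
        if mode == "edge-window":
            # Shifting the coefficient onto the edge frame itself is the same
            # as replicating the edge frame beyond the boundary.
            left = x[max(t - 1, 0)]
            right = x[min(t + 1, n - 1)]
        else:
            # Interior frames are identical in every mode; only the
            # out-of-range neighbours differ (zero here).
            left = x[t - 1] if t > 0 else 0.0
            right = x[t + 1] if t < n - 1 else 0.0
        out.append(0.5 * (right - left))
    return out


x = [1.0, 2.0, 4.0]
for mode in ("input-zero", "output-zero", "edge-window"):
    print(mode, delta(x, mode))
```

Only the first and last values differ between the modes, which is why the
choice matters mainly for consistency between training and synthesis.
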
>>>
>>> The principle I feel like you're arguing for is that, whichever of these
>>> heuristic procedures is used for computing the delta trajectory at
>>> training time, the same procedure should be used at synthesis time.
>>> This seems very reasonable to me.  For standard generation (without GV)
>>> this means that the trajectory generated at synthesis time should be
>>> the one which, when the corresponding delta and delta-delta trajectories
>>> are computed using the procedure used at training time, yields the
>>> highest probability under the trained model.
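
The consistency principle above can be written out as the usual
maximum-likelihood parameter generation problem. This is a sketch in assumed
notation, none of it from the thread: c is the static trajectory, W the
window matrix built with the same end treatment as at training time, and mu
and Sigma the stacked means and covariances along the state sequence.

```latex
\begin{align}
  o &= W c \\
  \log p(o) &= -\tfrac{1}{2}\,(W c - \mu)^{\top} \Sigma^{-1} (W c - \mu)
               + \text{const} \\
  \hat{c} &= \arg\max_{c} \log p(W c)
  \quad\Longrightarrow\quad
  W^{\top} \Sigma^{-1} W \,\hat{c} = W^{\top} \Sigma^{-1} \mu
\end{align}
```

For the linear treatments (input-zero, output-zero, edge-specific
coefficients), the choice of end treatment changes only the first and last
few rows of W. With input-specified windows the mapping becomes affine,
o = W c + b, and the right-hand side changes accordingly.
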
>>>
>>> However the patch you included does not implement this.  Indeed it does
>>> not affect the generated trajectory at all (at least for standard and GV
>>> generation; I think it might for EM-based generation), but rather only
>>> affects the log probability printed in the log file.  To properly
>>> implement the change you suggest I think would require also changing
>>> SetupPdfStreams and UpdatePdfStreams.  (Alternatively consistency could
>>> also be achieved by modifying window.pl to implement the current HGen
>>> behaviour).
>>>
>>> I hope my reasoning makes sense.  I do feel like finding a good solution
>>> for end effects is annoyingly difficult given what an apparently small
>>> issue it is in practice!
>>>
>>> Cheers,
>>>
>>> Matt
>>>
>>>
>>> On 17/10/14 09:34, Yang Wang wrote:
>>>  > HTS working group,
>>>  >
>>>  >     Thank you for your excellent work on HTS.
>>>  >
>>>  >     I found a moderate bug in ApplyWindow() in HMGenS V2.2 and
>>>  > V2.3-alpha. The bug concerns how the dynamics of a time sequence
>>>  > are calculated at the first and last frames. The current ApplyWindow()
>>>  > implementation produces incorrect observation vectors at both ends,
>>>  > which in turn results in nearly zero probability density values.
>>>  >
>>>  >     My analysis and bug fix are supplied in attachments for your
>>>  > reference.
>>>  >
>>>  >     Will you please let me know if I am not correct? Thank you!
>>>  >
>>>  > Yang Wang
>>>  > Email: yangwang@xxxxxxxxxxxxx
>>>  >
>>>
>>
>
>



-- 
-----------------------------------------
Nagoya Institute of Technology
Kei Hashimoto
bonanza@xxxxxxxxxxxxxxx
-----------------------------------------
