
[hts-users:04152] Re: Bug Report on ApplyWindow Function in HMGenS


Hi Xingyu,

This issue is certainly even more of a problem for lf0 trajectories than for other speech parameter trajectories. I didn't quite understand what you said about the treatment being heuristic or theoretically correct. So far I haven't come across any approach (apart from perhaps marginalization) that I'd really call well justified theoretically, but I'm not sure that's what you were claiming.

The issue I was raising with Yang's proposed change was that, as far as I could tell, it doesn't change any part of the Cholesky generation procedure, including the forward and backward substitutions.

Do you happen to have a reference for some of the work you mentioned by Zen and others?

Thanks,

Matt


On 20/10/14 12:30, asr.naxingyu@xxxxxxxxx wrote:
Hi,

I remember there was work, initially introduced by Heiga Zen, addressing
not only the ends, but all boundary effects when calculating dynamics for
logarithmic pitch. As Matt pointed out, the key is to keep the strategy
identical in training and synthesis.

In previous versions of HTS/hts_engine, the principle used for calculating
dynamics and the one used for recovering the pitch trajectory from voiced
segments do differ, resulting in more pitch distortion at boundaries.
From this point of view, I don't think the treatment should be
heuristic. It should be theoretically correct. However, there might be
better solutions for dealing with the end effects, such as some tricks when
implementing the forward and backward substitution. What Yang proposed
is among those.

Xingyu Na

----- Reply message -----
From: "Matt Shannon" <sms46@xxxxxxxxx>
To: <hts-users@xxxxxxxxxxxxxxx>
Subject: [hts-users:04146] Re: Bug Report on ApplyWindow Function in HMGenS
Date: Sat, Oct 18, 2014 04:09


Hi Yang,

I'm not from the HTS working group but I am very interested in this
(admittedly minor!) issue.

It's a complicated and subtle issue exactly what should be done about
"end effects" like you mention.  There are a number of different
approaches that one could use.  For example you could: assume the
unobserved frames in the original trajectory before windowing are zero
("input-zero windows"); assume the unobserved frames in the original
trajectory are some fixed specified value ("input-specified windows");
assume any frames in the windowed trajectory that can't be computed
exactly are zero ("output-zero windows"); use different window
coefficients for the first few and last few frames of the trajectory,
for example using 0 -1 1 instead of -0.5 0 0.5 to compute the first
frame of the delta trajectory; or you could even treat the unobserved
frames as unobserved and marginalize over them probabilistically.  In
case it's of interest, I discuss these possibilities briefly in my PhD
thesis in the context of the trajectory HMM (Appendix B,
http://mi.eng.cam.ac.uk/~sms46/papers/shannon2014probabilistic-thesis.pdf).
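
As a minimal sketch (not HTS code), the first four options above can be written in Python roughly as follows. The function name `delta`, the `edge` labels and the `pad` parameter are made up for illustration, the last-frame rule for the modified coefficients is assumed by symmetry with the 0 -1 1 first-frame example, and the trajectory values are arbitrary log-F0-like numbers.

```python
def delta(traj, edge="input", pad=0.0):
    """Apply the delta window (-0.5, 0, 0.5) with a chosen end-effect rule.

    edge="input":       unobserved input frames are assumed equal to `pad`
                        (pad=0.0 gives "input-zero", otherwise "input-specified")
    edge="output-zero": output frames that cannot be computed exactly are set to 0
    edge="one-sided":   0 -1 1 at the first frame; -1 1 0 at the last frame
                        (the last-frame rule is an assumption, by symmetry)
    """
    n = len(traj)
    if n < 2:
        raise ValueError("need at least two frames")
    out = []
    for t in range(n):
        interior = 0 < t < n - 1
        if interior or edge == "input":
            prev = traj[t - 1] if t > 0 else pad
            nxt = traj[t + 1] if t < n - 1 else pad
            out.append(0.5 * (nxt - prev))
        elif edge == "output-zero":
            out.append(0.0)
        elif edge == "one-sided":
            if t == 0:
                out.append(traj[1] - traj[0])
            else:
                out.append(traj[t] - traj[t - 1])
        else:
            raise ValueError("unknown edge rule: %r" % edge)
    return out


# Arbitrary log-F0-like values, chosen only for illustration.
lf0 = [5.0, 5.1, 5.3, 5.2]
for edge, pad in [("input", 0.0), ("input", 5.0),
                  ("output-zero", 0.0), ("one-sided", 0.0)]:
    print(edge, pad, delta(lf0, edge, pad))
```

All four rules agree on the interior frames and differ only at the first and last frame. With values far from zero, as log F0 typically is, the input-zero rule gives end deltas that are much larger in magnitude than the interior ones.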

As you mention in your very well-presented analysis, almost all of these
approaches are somewhat "heuristic".  The approach you suggest and the
one used by window.pl is effectively to use different window
coefficients for the first few and last few frames, for example using 0
-0.5 0.5 instead of -0.5 0 0.5 to compute the first frame of the delta
trajectory.

The principle I feel like you're arguing for is that, whichever of these
heuristic procedures is used for computing the delta trajectory at
training time, the same procedure should be used at synthesis time.
This seems very reasonable to me.  For standard generation (without GV)
this means that the trajectory generated at synthesis time should be
the one which, when the corresponding delta and delta-delta trajectories
are computed using the procedure used at training time, yields the
highest probability under the trained model.
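
To make the consistency principle concrete, here is a small Python sketch under simplifying assumptions: static and delta features only (no delta-delta), one diagonal Gaussian per frame, and a dense Gaussian-elimination solve in place of the banded Cholesky procedure that HTS uses. All names (`window_matrix`, `generate`, the `edge` labels) and all numbers are hypothetical. The "half-sided" rule is the 0 -0.5 0.5 first-frame window mentioned above, with the last frame assumed symmetric.

```python
def window_matrix(n, edge):
    """Stack an identity block (statics) on top of a delta block."""
    rows = [[1.0 if j == t else 0.0 for j in range(n)] for t in range(n)]
    for t in range(n):
        row = [0.0] * n
        if 0 < t < n - 1 or edge == "input-zero":
            if t > 0:
                row[t - 1] = -0.5
            if t < n - 1:
                row[t + 1] = 0.5
        elif edge == "half-sided":
            if t == 0:                      # 0 -0.5 0.5 at the first frame
                row[0], row[1] = -0.5, 0.5
            else:                           # -0.5 0.5 0 at the last (assumed)
                row[n - 2], row[n - 1] = -0.5, 0.5
        else:
            raise ValueError("unknown edge rule: %r" % edge)
        rows.append(row)
    return rows


def matvec(W, c):
    return [sum(w * x for w, x in zip(row, c)) for row in W]


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def generate(mu, prec, W):
    """Maximise the diagonal-Gaussian log density of W c over c,
    i.e. solve the normal equations (W' P W) c = W' P mu."""
    m, n = len(W), len(W[0])
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in range(m))
          for j in range(n)] for i in range(n)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in range(m)) for i in range(n)]
    return solve(A, b)


# Toy setup: pretend the model means equal the training features exactly.
true_traj = [5.0, 5.1, 5.3, 5.2]
n = len(true_traj)
mu = matvec(window_matrix(n, "half-sided"), true_traj)  # "training" rule
prec = [1.0] * n + [100.0] * n                          # hypothetical precisions

consistent = generate(mu, prec, window_matrix(n, "half-sided"))
mismatched = generate(mu, prec, window_matrix(n, "input-zero"))
print("consistent:", consistent)
print("mismatched:", mismatched)
```

In this toy setup, generating with the same end rule that produced the means recovers the original trajectory up to rounding, whereas generating with a different end rule moves the generated values away from it.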

However the patch you included does not implement this.  Indeed it does
not affect the generated trajectory at all (at least for standard and GV
generation; I think it might for EM-based generation), but rather only
affects the log probability printed in the log file.  To properly
implement the change you suggest I think would require also changing
SetupPdfStreams and UpdatePdfStreams.  (Alternatively consistency could
also be achieved by modifying window.pl to implement the current HGen
behaviour).

I hope my reasoning makes sense.  I do feel like finding a good solution
for end effects is annoyingly difficult given what an apparently small
issue it is in practice!

Cheers,

Matt


On 17/10/14 09:34, Yang Wang wrote:
 > HTS working group,
 >
 >     Thank you for your excellent work on HTS.
 >
 >     I found a moderate bug in ApplyWindow() in HMGenS V2.2 and
 > V2.3-alpha. The bug concerns how dynamics are calculated at the first
 > and last frames of a time sequence. The current ApplyWindow()
 > implementation produces incorrect observation vectors at the two ends,
 > which in turn results in nearly zero probability density values.
 >
 >     My analysis and bug fix are supplied in attachments for your
 > reference.
 >
 >     Will you please let me know if I am not correct? Thank you!
 >
 > Yang Wang
 > Email: yangwang@xxxxxxxxxxxxx
 >

