[hts-users:03260] Re: Regarding to the adaptation part of the demo
- From: li jay <lij.acd@xxxxxxxxx>
- Date: Fri, 20 Apr 2012 00:03:00 +0800
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
Hi,
Thank you for explaining this. I read Dr. Yamagishi's paper and realized that speaker-independent (SI) training is the conventional approach and speaker adaptive training (SAT) is the newer one. I think that is why my adapted voice was not very good.
But one thing still confuses me.
The process I used to train the average model (I call it the basic training step) was:
# preparing environments
# HCompV (computing variance floors)
# HInit & HRest (initialization & reestimation)
# HHEd (making a monophone mmf)
# HERest (embedded reestimation (monophone))
# HHEd (copying monophone mmf to fullcontext one)
# HERest (embedded reestimation (fullcontext))
# HHEd (tree-based context clustering)
# HERest (embedded reestimation (clustered))
# HHEd (untying the parameter sharing structure)
# HERest (embedded reestimation (untied))
# HHEd (tree-based context clustering)
# HERest (embedded reestimation (re-clustered))
# HFst (forced alignment using WFST for no-silent GV)
These steps are listed in the order the script runs them, and they all work. After running them, I performed 1~5 to generate the adapted voice from the SI model.
To generate the adapted voice from the SAT model, should I still run the basic training step above and then run the following parts of the script?
1 # HHEd (building regression-class trees for adaptation), followed by 6~9 and 10~13
I am confused because, according to Dr. Yamagishi's paper, the training of the average voice model seems to differ from the conventional one, so I do not know whether the basic training step is suitable.
Jay
On 19 April 2012 at 12:55 AM, <nxy-yzqs@xxxxxxx> wrote:
Hi,
1~5 are adaptation based on the SI model.
6~9 are speaker adaptive training for the average voice model.
10~13 are adaptation based on the average voice model.
For reference, please read Dr. Yamagishi's papers in the publication list on the HTS website.
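The grouping described in this reply can be written down as a small lookup, which may help when deciding which parts of the script to run. This is only an illustrative sketch: the step labels are copied from the comments quoted in this thread, while the group names (`si_adaptation`, `sat_training`, `sat_adaptation`) and the helper `steps_for` are made up for this example and are not part of Training.pl.

```python
# Step labels as quoted in this thread from the comments in Training.pl.
STEPS = {
    1: "HHEd (building regression-class trees for adaptation)",
    2: "HERest (speaker adaptation (speaker independent))",
    3: "HERest (speaker adaptation (SI+MLLR+MAP))",
    4: "HMGenS (generating speech parameter sequences (speaker adapted))",
    5: "SPTK (synthesizing waveforms (speaker adapted))",
    6: "HERest (Speaker adaptive training (SAT))",
    7: "HHEd (making unseen models (SAT))",
    8: "HMGenS (generating speech parameter sequences (SAT))",
    9: "SPTK (synthesizing waveforms (SAT))",
    10: "HERest (speaker adaptation (SAT))",
    11: "HERest (speaker adaptation (SAT+MLLR+MAP))",
    12: "HMGenS (generate speech parameter sequences (SAT+adaptation))",
    13: "SPTK (synthesizing waveforms (SAT+adaptation))",
}

# Grouping as described in the reply. The group names are hypothetical.
GROUPS = {
    "si_adaptation": range(1, 6),     # 1~5: adaptation based on the SI model
    "sat_training": range(6, 10),     # 6~9: SAT for the average voice model
    "sat_adaptation": range(10, 14),  # 10~13: adaptation based on the average voice model
}


def steps_for(group):
    """Return (number, label) pairs for one group, in script order."""
    return [(n, STEPS[n]) for n in GROUPS[group]]


if __name__ == "__main__":
    for name in GROUPS:
        print(name)
        for n, label in steps_for(name):
            print(f"  {n:2d} # {label}")
```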
On 2012-04-18 20:24:05, "li jay" <lij.acd@xxxxxxxxx> wrote:
Hi,
I have a question about the adaptation part of the Training.pl script in the HTS-demo_CMU-ARCTIC-ADAPT demo.
I used sentences from several speakers to train an average model, and then used the following parts (1~5) of the script to adapt it to a specific speaker and generate speech:
1 # HHEd (building regression-class trees for adaptation)
2 # HERest (speaker adaptation (speaker independent))
3 # HERest (speaker adaptation (SI+MLLR+MAP))
4 # HMGenS (generating speech parameter sequences (speaker adapted))
5 # SPTK (synthesizing waveforms (speaker adapted))
The adapted voice was acceptable, but not very good. What are the following parts (6~9 and 10~13) for?
6 # HERest (Speaker adaptive training (SAT))
7 # HHEd (making unseen models (SAT))
8 # HMGenS (generating speech parameter sequences (SAT))
9 # SPTK (synthesizing waveforms (SAT))
and
10 # HERest (speaker adaptation (SAT))
11 # HERest (speaker adaptation (SAT+MLLR+MAP))
12 # HMGenS (generate speech parameter sequences (SAT+adaptation))
13 # SPTK (synthesizing waveforms (SAT+adaptation))
They all look like adaptation and voice generation. What is the difference between them (1~5, 6~9, and 10~13)?
Jay
- Follow-Ups
- [hts-users:03261] Re: Regarding to the adaptation part of the demo, 那兴宇
- References
- [hts-users:03256] Regarding to the adaptation part of the demo, li jay
- [hts-users:03257] Re: Regarding to the adaptation part of the demo, nxy-yzqs