
[hts-users:03257] Re: Regarding to the adaptation part of the demo

Subject: [hts-users:03257] Re: Regarding to the adaptation part of the demo
From: nxy-yzqs@xxxxxxx
Date: Thu, 19 Apr 2012 00:55:35 +0800 (CST)

Hi,

Steps 1~5 perform adaptation based on the speaker-independent (SI) model.
Steps 6~9 perform speaker adaptive training (SAT) for the average voice model.
Steps 10~13 perform adaptation based on the average voice model.
For reference, please read Dr. Yamagishi's papers in the publication list on the HTS website.
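
As a rough illustration only, the grouping above can be written down as a small lookup table. This is a sketch, not part of Training.pl: the names GROUPS and group_of are hypothetical, and the step numbers are the ones used in the quoted question, not identifiers from the script itself.

```python
# Hypothetical helper: maps the step numbers used in this thread to the
# three groups described in the reply. Nothing here comes from Training.pl.
GROUPS = [
    (range(1, 6), "adaptation based on SI model"),
    (range(6, 10), "speaker adaptive training for average voice model"),
    (range(10, 14), "adaptation based on average voice model"),
]


def group_of(step):
    """Return the description of the group that a step number belongs to."""
    for steps, description in GROUPS:
        if step in steps:
            return description
    raise ValueError(f"unknown step: {step}")


if __name__ == "__main__":
    # Print one line per step so the three groups can be compared side by side.
    for number in range(1, 14):
        print(f"{number:2d}  {group_of(number)}")
```

Read this way, steps 1~5 and 10~13 run the same kind of adaptation and synthesis; what differs is the model they start from.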

At 2012-04-18 20:24:05, "li jay" <lij.acd@xxxxxxxxx> wrote:

Hi,

I have a question about the adaptation part of the Training.pl script in the HTS-demo_CMU-ARCTIC-ADAPT demo.
I used sentences from several speakers to train an average model, and then used the following parts (1~5) of the code to adapt it to a specific speaker and generate voices:

1 # HHEd (building regression-class trees for adaptation)
2 # HERest (speaker adaptation (speaker independent))
3 # HERest (speaker adaptation (SI+MLLR+MAP))
4 # HMGenS (generating speech parameter sequences (speaker adapted))
5 # SPTK (synthesizing waveforms (speaker adapted))

The adapted voice it generated was acceptable, but not very good. What are the following parts (6~9 and 10~13) for?

6 # HERest (Speaker adaptive training (SAT))
7 # HHEd (making unseen models (SAT))
8 # HMGenS (generating speech parameter sequences (SAT))
9 # SPTK (synthesizing waveforms (SAT))

and

10 # HERest (speaker adaptation (SAT))
11 # HERest (speaker adaptation (SAT+MLLR+MAP))
12 # HMGenS (generate speech parameter sequences (SAT+adaptation))
13 # SPTK (synthesizing waveforms (SAT+adaptation))

They all seem to perform adaptation and voice generation. What is the difference between them (1~5, 6~9, and 10~13)?

Jay

Follow-Ups
: [hts-users:03260] Re: Regarding to the adaptation part of the demo, li jay

References
: [hts-users:03256] Regarding to the adaptation part of the demo, li jay
