
[hts-users:04320] Re: Style Adaptation Issue with HTS-ARCTIC_ADAPT demo


I tried CMLLR first, before MLLR, but the same shriek and muffled sounds were present. Strangely, the synthesized speech was also less like the target (adapted) style/emotion than it was with MLLR. The effect was the same every time I ran adaptation, even with different speakers, and the original speech files do not have these artifacts.

I thought this was because I had not applied the configuration given in the paper. But even with that configuration, I cannot get the desired adaptation to a particular style or emotion; the output sounds more like neutral speech than the target style.

I have never tried STRAIGHT, but I will give it a shot. I read somewhere that HTS converts the high-quality parameters computed by STRAIGHT into its native, lower-quality ones, but it is still worth a try. Thanks for the help.

Regards,
Ammarah.

> On 15 October 2015, at 6:18 PM, Keiichiro Oura <uratec@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> Could you try to use STRAIGHT for analysis and CMLLR for adaptation ?
> 
> Regards,
> Keiichiro Oura
> 
> 
> 
> 2015-10-15 7:41 GMT+09:00 ammarah din <ammarah.din68@xxxxxxxxx>:
>> Hi All,
>>     I have used the speaker adaptation demo package to implement "Style
>> Adaptation" for the Anger emotion by following the paper by Tachibana,
>> Yamagishi, Masuko and Kobayashi: "A Style Adaptation Technique for Speech
>> Synthesis Using HSMM and Suprasegmental Features".  I have successfully
>> obtained the style-adapted voice by training; however, it does not sound as
>> emotional as it does when a simple HSMM-based TTS voice is built. Furthermore,
>> there is a sudden shriek sound or muffledness that causes some phones to be
>> swallowed, and it is really unpleasant to listen to. I think this may be due to
>> the FRAMESHIFT and FRAMELEN I am using, as they do not match the values
>> mentioned in the paper. Do these factors affect/disturb synthesis? Can anyone
>> guide me as to why muffled or shriek sounds are present in the synthesized
>> speech? How can I improve the system so as to increase the style's effect
>> (anger/joy) on natural speech? My configuration is as follows:
>> 
>> ./configure DATASET=uet_ur TRAINSPKR=mar_neu ADAPTSPKR=mar_ang ADAPTHEAD=b0
>> ALLSPKR='mar_neu mar_ang' F0_RANGES='mar_neu 70 625 mar_ang 77 635' GAMMA=0
>> FREQWARP=0.55 FRAMELEN=1440 FRAMESHIFT=288 SAMPFREQ=48000 TRANSKIND=mean
>> MGCOCCTHRESH=1000.0 LF0OCCTHRESH=150.0 ADDMAP=0 USESMAP=FALSE
>> --with-tcl-search-path=/usr/bin
>> --with-fest-search-path=/home/ammarah/Festival_SpeechTools/festival/examples
>> --with-sptk-search-path=/usr/local/SPTK/bin
>> --with-hts-search-path=/usr/local/HTS-2.2beta/bin
>> --with-hts-engine-search-path=/usr/local/bin
>> 
>> Regards,
>> Ammarah.
> 
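For reference, the FRAMELEN and FRAMESHIFT values in the quoted configure line are given in samples, so their duration depends on SAMPFREQ. The short Python sketch below converts them to milliseconds so they can be compared against whatever analysis settings a paper reports. The helper names `samples_to_ms` and `ms_to_samples` are made up for this illustration and are not part of HTS or SPTK; the 25 ms window and 5 ms shift are only an assumed comparison point, not values taken from the paper.

```python
# Hypothetical helpers (not part of HTS/SPTK): convert frame settings
# between samples and milliseconds for a given sampling rate.

def samples_to_ms(n_samples: int, samp_freq: int) -> float:
    """Duration in milliseconds of n_samples at samp_freq Hz."""
    return n_samples * 1000 / samp_freq


def ms_to_samples(duration_ms: float, samp_freq: int) -> int:
    """Number of samples (rounded) covering duration_ms at samp_freq Hz."""
    return round(duration_ms * samp_freq / 1000)


if __name__ == "__main__":
    SAMPFREQ = 48000    # from the quoted configure line
    FRAMELEN = 1440     # from the quoted configure line
    FRAMESHIFT = 288    # from the quoted configure line

    print("window:", samples_to_ms(FRAMELEN, SAMPFREQ), "ms")
    print("shift: ", samples_to_ms(FRAMESHIFT, SAMPFREQ), "ms")

    # Assumed comparison point only: a 25 ms window with a 5 ms shift.
    print("25 ms window:", ms_to_samples(25, SAMPFREQ), "samples")
    print("5 ms shift:  ", ms_to_samples(5, SAMPFREQ), "samples")
```

With the quoted settings this works out to a 30 ms window and a 6 ms shift at 48 kHz. If the paper specifies its analysis conditions in milliseconds, `ms_to_samples` gives the sample counts to pass to ./configure.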
