
[hts-users:03873] Re: HTS model and Speech Recognition


Dear all,
I want to convert aperiodicity coefficients (extracted with Straight) to sub-band aperiodicity. I found the old thread http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00596.html and wrote the following script files:

[extract.m]: extracts AP and makes ap file with Straight
addpath('path_to_Straight');
prm.defaultFrameLength = 20; 
prm.spectralUpdateInterval = 1; 
prm.F0defaultWindowLength = 20; %ms
prm.F0frameUpdateInterval = 1; %ms
fprintf(1,'name_of_audio.wav \n ');
[x,fs]=wavread('demen51-56_0001.wav');
[f0,ap]=exstraightsource(x,fs,prm);
ap=ap'; % transpose so that each row written to the ASCII file is one frame
save 'name_of_ap_file.ap' ap -ascii;

and the 2 shell scripts:
[script 1] converts ap file to bap file:
echo "Converting aperiodicity file demen.ap to band aperiodicity file demen2.bap"; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L  64 -s   0 -e  63 -S 0 | ./average -l  64 > bap1; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L  64 -s  64 -e 127 -S 0 | ./average -l  64 > bap2; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 128 -s 128 -e 255 -S 0 | ./average -l 128 > bap3; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 128 -s 256 -e 383 -S 0 | ./average -l 128 > bap4; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 129 -s 384 -e 512 -S 0 | ./average -l 129 > bap5; \
   ./merge -s 0 -l 1 -L 1 bap1 bap2 | \
   ./merge -s 2 -l 2 -L 1 bap3 | \
   ./merge -s 3 -l 3 -L 1 bap4 | \
   ./merge -s 4 -l 4 -L 1 bap5 > demen2.bap;
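
As a cross-check that does not depend on SPTK, the band averaging in script 1 can be sketched in plain Python. This is only a minimal sketch: the band edges are copied from the bcp calls above, while the function names and the assumption that the ASCII ap file holds one 513-value frame per line are illustrative and not taken from the HTS scripts.

```python
# Band edges (inclusive bin indices) copied from the bcp calls in script 1.
BANDS = [(0, 63), (64, 127), (128, 255), (256, 383), (384, 512)]


def band_average(frame):
    """Return the five band means of one 513-bin aperiodicity frame."""
    if len(frame) != 513:
        raise ValueError("expected 513 bins per frame")
    return [sum(frame[s:e + 1]) / (e - s + 1) for s, e in BANDS]


def read_ascii_frames(path):
    """Read an ASCII ap file, assuming one whitespace-separated frame per line."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]
```

For example, `[band_average(fr) for fr in read_ascii_frames("demen.ap")]` should give one 5-value row per frame, which can be compared with the contents of demen2.bap.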

[script 2] reconstructs ap file from bap file:
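# convert band-aperiodicity to aperiodicity
# (dfs -b 1 -1 | interpolate -p N | dfs -a 1 -1 differences the stream,
#  zero-stuffs it and accumulates it again, so each band value is repeated N times)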

   ./bcp +f -l 5 -L 1 -s 0 -e 0 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p  64 | ./dfs -a 1 -1 > ./ap/demen2.ap1; \
   ./bcp +f -l 5 -L 1 -s 1 -e 1 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p  64 | ./dfs -a 1 -1 > ./ap/demen2.ap2; \
   ./bcp +f -l 5 -L 1 -s 2 -e 2 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 128 | ./dfs -a 1 -1 > ./ap/demen2.ap3; \
   ./bcp +f -l 5 -L 1 -s 3 -e 3 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 128 | ./dfs -a 1 -1 > ./ap/demen2.ap4; \
   ./bcp +f -l 5 -L 1 -s 4 -e 4 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 129 | ./dfs -a 1 -1 > ./ap/demen2.ap5; \
   ./merge -s   0 -l  64 -L  64 ./ap/demen2.ap1 ./ap/demen2.ap2 | \
   ./merge -s 128 -l 128 -L 128 ./ap/demen2.ap3 | \
   ./merge -s 256 -l 256 -L 128 ./ap/demen2.ap4 | \
   ./merge -s 384 -l 384 -L 129 ./ap/demen2.ap5 > ./ap/demen2.ap; 
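
Note that band averaging is lossy: script 2 can only repeat each band value across its bins, so even a correct round trip gives a piecewise-constant frame that matches the original in its band means, not bin by bin. The following plain-Python sketch shows the expected shape of a reconstructed frame under that assumption. The function names are hypothetical, and the band edges are the ones used in script 1.

```python
# Band edges (inclusive bin indices) as used by the bcp calls in script 1.
BANDS = [(0, 63), (64, 127), (128, 255), (256, 383), (384, 512)]


def expand_bands(bap):
    """Repeat each of the 5 band values over its bins, giving a 513-bin frame."""
    if len(bap) != len(BANDS):
        raise ValueError("expected 5 band values per frame")
    frame = []
    for value, (s, e) in zip(bap, BANDS):
        frame.extend([value] * (e - s + 1))
    return frame


def band_means(frame):
    """Average a 513-bin frame back into the 5 bands."""
    return [sum(frame[s:e + 1]) / (e - s + 1) for s, e in BANDS]


def max_abs_diff(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))
```

A fairer comparison than a bin-by-bin diff would be `max_abs_diff(band_means(original_frame), band_means(reconstructed_frame))`, which should be close to zero if both scripts work as intended.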

The problem is that the original ap file and the reconstructed ap file are completely different, and I don't know how to solve it. I would very much appreciate your help.

This is demen.ap file: https://www.dropbox.com/s/3rywu09u29x4kl1/demen.ap
This is demen.wav file: https://www.dropbox.com/s/xjd8outcuyl5jil/demen.wav


2013/11/2 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
Hi,

In HTS demo scripts, MSD-HSMM (MSD for F0, HSMM for state duration) is trained.
But, HVite/HDecode support only HMM.
So, HMM training is required for recognition.

Regards,
Keiichiro Oura


2013/11/2 Ibrahim Sobh <im_sobh@xxxxxxxxxxx>:
> Hi,
>
> I have trained HMM models using HTS scripts based on lf0, mgc and I have
> added some other features in other streams. I have synthesized speech
> successfully.
>
> *** Can I use the same HMM models for speech recognition task ( using HVite
> and/or HDecode)? How in practical steps?
>
>  (For example, using HVite version of HTS is enough?!)
>
> Best Regards
> Sobh




--
Kind regards,
Thanh-Son PHAN
