[hts-users:03873] Re: HTS model and Speech Recognition

Dear all,

I want to convert aperiodicity coefficients (extracted with STRAIGHT) to sub-band aperiodicity. I found the old thread http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00596.html and wrote the following script files:

[extract.m]: extracts the aperiodicity (AP) with STRAIGHT and writes it to an ap file

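addpath('path_to_Straight');            % location of the STRAIGHT toolbox
prm.defaultFrameLength = 20;
prm.spectralUpdateInterval = 1;
prm.F0defaultWindowLength = 20; %ms
prm.F0frameUpdateInterval = 1; %ms
fprintf(1,'name_of_audio.wav\n');
[x,fs]=wavread('demen51-56_0001.wav');  % read the waveform and its sampling rate
[f0,ap]=exstraightsource(x,fs,prm);     % F0 and aperiodicity analysis
ap=ap';                                 % transpose so that each row holds one frame
save 'name_of_ap_file.ap' ap -ascii;    % write ap as ASCII text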

and the 2 shell scripts:

[script 1] converts ap file to bap file:
echo "Converting aperiodicity file demen.ap to band aperiodicity file demen2.bap"; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 64 -s   0 -e 63 -S 0 | ./average -l 64 > bap1; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 64 -s 64 -e 127 -S 0 | ./average -l 64 > bap2; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 128 -s 128 -e 255 -S 0 | ./average -l 128 > bap3; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 128 -s 256 -e 383 -S 0 | ./average -l 128 > bap4; \
   ./x2x +af demen.ap | ./bcp +f -n 512 -L 129 -s 384 -e 512 -S 0 | ./average -l 129 > bap5; \
   ./merge -s 0 -l 1 -L 1 bap1 bap2 | \
   ./merge -s 2 -l 2 -L 1 bap3 | \
   ./merge -s 3 -l 3 -L 1 bap4 | \
   ./merge -s 4 -l 4 -L 1 bap5 > demen2.bap;
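
For reference, what script 1 is meant to compute can be sketched in NumPy. This is only a sketch under stated assumptions: each frame holds 513 aperiodicity values (matching `-n 512`), and each band is reduced to its arithmetic mean, which is how I read `average`. The names `BAND_EDGES` and `ap_to_bap` are made up for illustration and are not part of SPTK or HTS.

```python
import numpy as np

# Band boundaries taken from the -s/-e options in script 1 (end index inclusive).
BAND_EDGES = [(0, 63), (64, 127), (128, 255), (256, 383), (384, 512)]


def ap_to_bap(ap):
    """Average each band of a (frames x 513) aperiodicity matrix.

    Returns a (frames x 5) matrix with one mean value per band and frame.
    """
    ap = np.asarray(ap, dtype=np.float64)
    if ap.ndim != 2 or ap.shape[1] != 513:
        raise ValueError("expected a 2-D array with 513 values per frame")
    # Slice each band (inclusive end index) and take the mean over frequency.
    return np.stack([ap[:, s:e + 1].mean(axis=1) for s, e in BAND_EDGES], axis=1)
```

Loading the ASCII file written by extract.m with `np.loadtxt` and passing it to `ap_to_bap` should give values comparable to demen2.bap, provided the file really has 513 columns per row.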

[script 2] reconstructs ap file from bap file:
# convert band-aperiodicity to aperiodicity

   ./bcp +f -l 5 -L 1 -s 0 -e 0 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 64 | ./dfs -a 1 -1 > ./ap/demen2.ap1; \
   ./bcp +f -l 5 -L 1 -s 1 -e 1 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 64 | ./dfs -a 1 -1 > ./ap/demen2.ap2; \
   ./bcp +f -l 5 -L 1 -s 2 -e 2 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 128 | ./dfs -a 1 -1 > ./ap/demen2.ap3; \
   ./bcp +f -l 5 -L 1 -s 3 -e 3 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 128 | ./dfs -a 1 -1 > ./ap/demen2.ap4; \
   ./bcp +f -l 5 -L 1 -s 4 -e 4 -S 0 demen2.bap | ./dfs -b 1 -1 | ./interpolate -p 129 | ./dfs -a 1 -1 > ./ap/demen2.ap5; \
   ./merge -s   0 -l 64 -L 64 ./ap/demen2.ap1 ./ap/demen2.ap2 | \
   ./merge -s 128 -l 128 -L 128 ./ap/demen2.ap3 | \
   ./merge -s 256 -l 256 -L 128 ./ap/demen2.ap4 | \
   ./merge -s 384 -l 384 -L 129 ./ap/demen2.ap5 > ./ap/demen2.ap;
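
As I read it, each `dfs -b 1 -1 | interpolate -p N | dfs -a 1 -1` chain in script 2 takes the difference of consecutive values, pads with zeros, and accumulates again, so the net effect is to repeat every band value N times. Under that assumption the reconstruction can be sketched in NumPy as below. `BAND_WIDTHS` and `bap_to_ap` are illustrative names, not SPTK or HTS functions.

```python
import numpy as np

# Band widths taken from the -p options in script 2 (64 + 64 + 128 + 128 + 129 = 513).
BAND_WIDTHS = [64, 64, 128, 128, 129]


def bap_to_ap(bap):
    """Expand a (frames x 5) band matrix to (frames x 513).

    Every band value is repeated across all frequency bins of its band.
    """
    bap = np.asarray(bap, dtype=np.float64)
    if bap.ndim != 2 or bap.shape[1] != len(BAND_WIDTHS):
        raise ValueError("expected a 2-D array with 5 values per frame")
    # np.repeat with a per-column repeat count gives the piecewise-constant expansion.
    return np.repeat(bap, BAND_WIDTHS, axis=1)
```

In this sketch every band comes back flat, so variation inside a band of the original ap cannot be recovered and an exact match is not expected even when the scripts are correct. Whether that explains the full difference reported below is a separate question.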

The problem is that the original ap file and the reconstructed ap file are completely different. I don't know how to solve this. I would very much appreciate your help.

This is the demen.ap file: https://www.dropbox.com/s/3rywu09u29x4kl1/demen.ap

This is the demen.wav file: https://www.dropbox.com/s/xjd8outcuyl5jil/demen.wav

2013/11/2 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>

Hi,

In HTS demo scripts, MSD-HSMM (MSD for F0, HSMM for state duration) is trained.
But, HVite/HDecode support only HMM.
So, HMM training is required for recognition.

Regards,
Keiichiro Oura

2013/11/2 Ibrahim Sobh <im_sobh@xxxxxxxxxxx>:

> Hi,
>
> I have trained HMM models using HTS scripts based on lf0, mgc and I have
> added some other features in other streams. I have synthesized speech
> successfully.
>
> *** Can I use the same HMM models for speech recognition task ( using HVite
> and/or HDecode)? How in practical steps?
>
> (For example, using HVite version of HTS is enough?!)
>
> Best Regards
> Sobh

Kind regards,

Thanh-Son PHAN