[hts-users:03740] Problems in the HTS 2.3alpha implementation of the LSP postfilter


Hi

I have found a few problems in the Perl implementation of postfiltering_lsp in the Training.pl script of the HTS 2.3alpha SLT demo:
http://hts.sp.nitech.ac.jp/archives/2.3alpha/HTS-demo_CMU-ARCTIC-SLT.tar.bz2

I noticed my machine spending an unusual amount of time on the SPTK-based synthesis in the $WGEN1 branch, especially in the mgc2mgc routine, and found two instances of the following error in postfiltering_lsp:

-   $line .= "$MGC2MGC -m " . ( $ordr{'mgc'} - 1 ) . " -a $fw -c $gm -n -u -M " . ( $fl - 1 ) . " -A 0.0 -G 1.0 | ";
+   $line .= "$MGC2MGC -m " . ( $ordr{'mgc'} - 1 ) . " -a $fw -c $gm -n -u -M " . ( $ordr{'mgc'} - 1 ) . " -A $fw -C $gm | ";
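
To make the difference between the two lines concrete, here is a minimal Python sketch (not the Perl from Training.pl) that builds both variants of the pipeline stage as strings. The function name, the plain "mgc2mgc" command name standing in for $MGC2MGC, and the value 2048 for $fl are hypothetical; the other values assume $ordr{'mgc'} is 25 for the MGCORDER=24 configuration described below. If $fl is an FFT length, the original line asks for an output order in the thousands, which may explain the slowdown, but that is an assumption and not something verified here.

```python
def mgc2mgc_stage(ordr_mgc, fw, gm, fixed=True, fl=None, mgc2mgc="mgc2mgc"):
    """Build the mgc2mgc pipeline stage as a string.

    ordr_mgc -- stands in for $ordr{'mgc'} (assumed to be the vector length)
    fw, gm   -- stand in for $fw and $gm
    fl       -- stands in for $fl; only used by the original (unfixed) line
    """
    # Input side is identical in both versions of the line.
    head = f"{mgc2mgc} -m {ordr_mgc - 1} -a {fw} -c {gm} -n -u"
    if fixed:
        # Patched line: output keeps the input order, warping and gamma.
        tail = f"-M {ordr_mgc - 1} -A {fw} -C {gm}"
    else:
        # Original line: output order taken from $fl, no warping, -G 1.0.
        tail = f"-M {fl - 1} -A 0.0 -G 1.0"
    return f"{head} {tail} | "


if __name__ == "__main__":
    print(mgc2mgc_stage(25, 0.55, 2, fixed=False, fl=2048))  # original line
    print(mgc2mgc_stage(25, 0.55, 2))                        # patched line
```

Printing both strings side by side makes it easy to check that only the output-side options (-M, -A and -G/-C) change.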

I have attached a patch to fix these two instances. However, the patch does not resolve all the issues. I now get the following errors:

Synthesizing a speech waveform from arctic_a0005.mgc and arctic_a0005.lf0...[No. 1] is unstable frame
[No. 2] is unstable frame
[No. 3] is unstable frame
[No. 4] is unstable frame
...
This repeats for all frames and is followed by a stream of:
x2x : warning: input data is over the range of type 'short'!
x2x : warning: input data is over the range of type 'short'!
x2x : warning: input data is over the range of type 'short'!
x2x : warning: input data is over the range of type 'short'!
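
The x2x warning indicates samples outside the 16-bit signed range (-32768 to 32767). As a rough way to quantify this, the following Python sketch counts such samples in a headerless file of 32-bit floats. The helper names are hypothetical, and the assumption that the synthesised signal is stored as native-endian float32 before conversion is mine and should be checked against the actual pipeline.

```python
import math
from array import array

SHORT_MIN, SHORT_MAX = -32768, 32767


def read_float32(path):
    """Read a headerless file of native-endian 32-bit floats (assumed format)."""
    data = array("f")
    with open(path, "rb") as fh:
        data.frombytes(fh.read())
    return data


def count_out_of_range(samples):
    """Count samples that cannot be stored as a 16-bit signed integer."""
    return sum(
        1
        for s in samples
        if math.isnan(s) or not (SHORT_MIN <= s <= SHORT_MAX)
    )
```

A count close to the total number of samples would be consistent with the output being pure noise rather than occasional clipping.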

The resulting synthesised waveforms are just noise, even though I have also applied this fix: http://hts.sp.nitech.ac.jp/hts-users/spool/2012/msg00407.html. If the postfiltering routine is bypassed, the synthesised waveforms sound fine. The hts_engine-based synthesis in the $ENGIN branch, on the other hand, works perfectly. I also tested the voice manually with hts_engine using various postfiltering coefficients, and everything sounds as expected, without any errors or warnings.

I replaced the packaged 48 kHz waveforms with the original 32 kHz versions from http://festvox.org/cmu_arctic/dbs_slt.html, with the appropriate settings and FRAMESHIFT=160. The vocoder configuration I am using is MGCORDER=24 GAMMA=2 FREQWARP=0.55 LNGAIN=1 PSTFILTER_LSP=0.7.
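
As a sanity check on these settings, a small Python sketch of the derived values follows. The function names are hypothetical. The gamma mapping assumes the convention that a non-zero GAMMA is interpreted as gamma = -1/GAMMA (as with the -c option of mgc2mgc), and the 48 kHz frame shift of 240 samples is an assumption used only for comparison.

```python
def frame_shift_ms(frameshift_samples, sampling_rate_hz):
    """Frame shift in milliseconds for a shift given in samples."""
    return 1000.0 * frameshift_samples / sampling_rate_hz


def gamma_value(gamma_setting):
    """Map the integer GAMMA setting to gamma, assuming gamma = -1/GAMMA."""
    if gamma_setting == 0:
        return 0.0
    return -1.0 / gamma_setting


if __name__ == "__main__":
    print(frame_shift_ms(160, 32000))  # configuration in this message
    print(frame_shift_ms(240, 48000))  # assumed 48 kHz setting
    print(gamma_value(2))              # GAMMA=2
```

Under these assumptions both sampling rates give the same frame shift in time, so the change of sampling rate alone should not alter the frame timing.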

I am using SPTK 3.6, hts_engine 1.07 and HTS 2.3alpha compiled to 64-bit binaries on OS X 10.8.3.

I plan on testing with GAMMA=1 FREQWARP=0.0 soon and will report back.

Regards
Pieter

Attachment: Training.pl.diff
Description: Binary data


Follow-Ups
[hts-users:03741] Re: Problems in the HTS 2.3alpha implementation of the LSP postfilter, Pieter Scholtz