
[hts-users:01206] Re: full context LAB files


Hi Simon,

I'm working with Spanish, and my database also has word-level labels.
I have some language-specific tools that I'll try to integrate to build the full-context labels.

Thanks for your suggestions on this.
Regards,
Paco


> Date: Mon, 3 Mar 2008 08:25:02 +0000
> From: Simon.King@xxxxxxxx
> Subject: [hts-users:01200] Re: full context LAB files
> To: hts-users@xxxxxxxxxxxxxxx
>
> Paco Pinto wrote:
> > Hello,
> >
> > I'm trying to train my own voice to use with HTS. I'm following HTS
> > demo procedure.
> > I have:
> > - RAW files with audio
> > - LAB files. Contain phonetic transcription in the format: init_time
> > end_time phoneme_name (similar to the files in data/labels/mono/ dir).
> >
> > In the demo, the full context LAB files are obtained from the
> > CMU_ARTIC database. The basic p1^p2-p3+p4=p5 format is easy to build
> > but the remaining parameters are harder to extract.
>
> What language are you using?
> >
> > Questions:
> > - Is it possible to proceed with the training using only the mono LAB
> > files ?
>
> There are two ways to do this:
>
> 1) just train a system with phonetic factors (i.e. quinphones, as you
> show above) but no prosodic factors; it will not sound very good,
> because it won't 'do' prosody at all - you *will* get some F0 variation,
> but only as predicted by phonetic factors. So, you might get some phrase
> boundary effects because of silence in the left/right context of the
> models, but you will not get stressed syllables etc.
>
> 2) adapt full-context models using your phonetic-labels-only data; I
> have tried this (I was looking at unsupervised adaptation) and it works
> surprisingly well - I haven't written this up yet, but hope to have a
> paper at Interspeech about this, with a tech report available before
> then, and possibly a video of a presentation about it coming soon.
>
> > - How is it possible to generate the full context LAB files from the
> > mono LAB files ?
>
> You need a front end in order to generate the prosodic factor variables,
> but a front end takes words as input, not a phone string. Are your data
> labelled with phones but not words? Why? Can you add word labels?
>
> In other words, use Festival (for English, at least) to predict the full
> context labels from words. If you don't have a front end for the
> language you are using, then how are you going to use the models for
> TTS, once they are trained?
>
> Simon
>
>
>
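
For the phonetic-factors-only route (option 1 above), the quinphone part of each label can be derived directly from the mono LAB files. The following is only a minimal sketch, assuming each mono line is "init_time end_time phoneme_name", that the quinphone layout is p1^p2-p3+p4=p5 with p3 as the current phone (as in the HTS demo labels), and that "x" is an acceptable filler for missing context at utterance edges. The function name mono_to_quinphone and the sample phones are hypothetical, not part of HTS.

```python
def mono_to_quinphone(lines, pad="x"):
    """Turn 'start end phone' lines into 'start end p1^p2-p3+p4=p5' lines.

    `pad` is an assumed filler for context that falls outside the utterance.
    """
    # Keep only non-empty lines and split them into fields.
    rows = [line.split() for line in lines if line.strip()]
    # Pad two symbols on each side so every phone has a full 5-phone window.
    phones = [pad, pad] + [row[2] for row in rows] + [pad, pad]
    labels = []
    for i, row in enumerate(rows):
        start, end = row[0], row[1]
        # The window starting at i is centred on the current phone (index i + 2).
        p1, p2, p3, p4, p5 = phones[i:i + 5]
        labels.append(f"{start} {end} {p1}^{p2}-{p3}+{p4}={p5}")
    return labels


if __name__ == "__main__":
    # Hypothetical mono LAB content; times are copied through unchanged.
    mono = [
        "0 2500000 sil",
        "2500000 3300000 o",
        "3300000 4100000 l",
        "4100000 5200000 a",
        "5200000 7000000 sil",
    ]
    for label in mono_to_quinphone(mono):
        print(label)
```

This covers only the phonetic context. The prosodic factors (syllable, word, and phrase features) still have to come from a front end, as Simon points out.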


