On Sun, 09/08/2009 at 12:35 +0300, Stas wrote:
> Javi,
>
> Thank you for your response.
>
> How much data (~1000 samples, ~10000 samples, more) may be
> sufficient in order to have robust models without HINIT, HEREST?
> Can you please provide me more information?

What is "robust"? An average label-boundary error of 10 ms, 11 ms, 12 ms? :)

Really, you either have hand labels, which is good for TTS, or you don't have them (or don't want to create them), which is often several percent worse in MOS terms. Hand-labelling the db takes a lot of time, so it's up to you to decide whether you need it.

Another solution would be to hand-segment a small part of the db (say, 30 utterances), bootstrap models from that part, and force-align the rest of the db to get better accuracy.
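One way to put a number on "robust" is to compare automatic boundaries against the hand-labelled subset and look at the average boundary error. The following is only a minimal sketch of that measurement: the function name and the input format (plain lists of boundary times in seconds, one per label, in matching order) are assumptions for illustration, not anything from HTK, and the sample values are made up.

```python
def mean_boundary_error_ms(reference, aligned):
    """Mean absolute difference between matching boundary times, in ms.

    Both arguments are lists of boundary times in seconds, one per label,
    in the same order (hypothetical format; adapt it to your label files).
    """
    if len(reference) != len(aligned):
        raise ValueError("label sequences differ in length")
    if not reference:
        raise ValueError("no boundaries to compare")
    total = sum(abs(r - a) for r, a in zip(reference, aligned))
    return 1000.0 * total / len(reference)


# Made-up boundaries for one utterance: hand labels vs. forced alignment.
hand = [0.120, 0.310, 0.455, 0.700]
auto = [0.130, 0.300, 0.470, 0.695]
print(round(mean_boundary_error_ms(hand, auto), 1))
```

Running this over the hand-segmented utterances gives a rough idea of whether the aligned labels are in the 10 ms range or much worse.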