[hts-users:02060] Generated speech quality
- From: Girish Malkarnenkar <girish1m@xxxxxxxxx>
- Date: Thu, 2 Jul 2009 16:34:40 +0200
Dear Sir/Madam,
I am trying to use HTS for German speech synthesis. After running the cmu_arctic speaker-dependent demo successfully, I replaced the raw files with my own. However, for creating the utterance files I had only .segment files, so I used dummy files for the remaining five: word, phrase, target, syllable and intevent. I was able to run training.pl after a few modifications, but the final speech is very poor in quality and is almost entirely unvoiced.
My question is whether this is caused by the inadequate labelling, which leaves insufficient information in the utt files. If so, can anyone tell me the format of the remaining five files (word, phrase, target, syllable and intevent), and which of them are important for the pronunciation? I have found information in the mailing list archives about the use of each of these files, but no format was detailed there. It would be extremely helpful if someone could send me the correct format of each of these files with a sample. My current utt file looks like this:
EST_File utterance
DataType ascii
version 2
EST_Header_End
Features max_id 66 ; type Text ; iform nil ; fileid "1" ;
Stream_Items
1 id _1 ; end 0.1 ; name _ ;
2 id _2 ; end 0.225699 ; name eh ;
3 id _3 ; end 0.26798 ; name d ;
4 id _4 ; end 0.336543 ; name @ ;
5 id _5 ; end 0.506808 ; name k ;
6 id _6 ; end 0.555945 ; name n ;
7 id _7 ; end 0.666789 ; name eh ;
8 id _8 ; end 0.734209 ; name f ;
9 id _9 ; end 0.8142 ; name d ;
10 id _10 ; end 0.878192 ; name eh ;
11 id _11 ; end 0.907903 ; name @R ;
12 id _12 ; end 1.00046 ; name b ;
13 id _13 ; end 1.16443 ; name eh ;
14 id _14 ; end 1.27642 ; name SS ;
15 id _15 ; end 1.35527 ; name @ ;
16 id _16 ; end 1.38612 ; name gl ;
17 id _17 ; end 1.43983 ; name U ;
18 id _18 ; end 1.50039 ; name n ;
19 id _19 ; end 1.5621 ; name t ;
20 id _20 ; end 1.59752 ; name gl ;
21 id _21 ; end 1.69694 ; name eh ;
22 id _22 ; end 1.7735 ; name m ;
23 id _23 ; end 1.82378 ; name ih ;
24 id _24 ; end 1.8811 ; name l ;
25 id _25 ; end 2.04451 ; name k ;
26 id _26 ; end 2.14507 ; name eh ;
27 id _27 ; end 2.21249 ; name l ;
28 id _28 ; end 2.26277 ; name d ;
29 id _29 ; end 2.33362 ; name eh ;
30 id _30 ; end 2.37933 ; name @R ;
31 id _31 ; end 2.43304 ; name gl ;
32 id _32 ; end 2.54627 ; name eh ;
33 id _33 ; end 2.61826 ; name v ;
34 id _34 ; end 2.67425 ; name I ;
35 id _35 ; end 2.73939 ; name C ;
36 id _36 ; end 2.85366 ; name f ;
37 id _37 ; end 2.91994 ; name E ;
38 id _38 ; end 3.02964 ; name SS ;
39 id _39 ; end 3.14059 ; name @ ;
40 id _40 ; end 3.19316 ; name gl ;
41 id _41 ; end 3.28 ; name E ;
42 id _42 ; end 3.38171 ; name f ;
43 id _43 ; end 3.46284 ; name n ;
44 id _44 ; end 3.51426 ; name gl ;
45 id _45 ; end 3.62078 ; name E ;
46 id _46 ; end 3.68021 ; name d ;
47 id _47 ; end 3.78534 ; name ih ;
48 id _48 ; end 3.83904 ; name gl ;
49 id _49 ; end 3.92132 ; name E ;
50 id _50 ; end 3.98646 ; name g ;
51 id _51 ; end 4.07931 ; name @R ;
52 id _52 ; end 4.13302 ; name s ;
53 id _53 ; end 4.22215 ; name b ;
54 id _54 ; end 4.35471 ; name aI ;
55 id _55 ; end 4.40318 ; name gl ;
56 id _56 ; end 4.50832 ; name E ;
57 id _57 ; end 4.58259 ; name b ;
58 id _58 ; end 4.67972 ; name @ ;
59 id _59 ; end 4.73522 ; name gl ;
60 id _60 ; end 4.82778 ; name E ;
61 id _61 ; end 4.94776 ; name C ;
62 id _62 ; end 5.04672 ; name t ;
63 id _63 ; end 5.10842 ; name n ;
64 id _64 ; end 5.28897 ; name ah ;
65 id _65 ; end 5.43639 ; name X ;
66 id _66 ; end 5.53639 ; name _ ;
End_of_Stream_Items
Relations
Relation Phrase ; filename festival/relations//Phrase/1.Phrase ;
End_of_Relation
Relation Word ; filename festival/relations//Word/1.Word ;
End_of_Relation
Relation Syllable ; filename festival/relations//Syllable/1.Syllable ;
End_of_Relation
Relation Segment ; filename festival/relations//Segment/1.Segment ;
66 66 0 0 0 65
65 65 0 0 66 64
64 64 0 0 65 63
63 63 0 0 64 62
62 62 0 0 63 61
61 61 0 0 62 60
60 60 0 0 61 59
59 59 0 0 60 58
58 58 0 0 59 57
57 57 0 0 58 56
56 56 0 0 57 55
55 55 0 0 56 54
54 54 0 0 55 53
53 53 0 0 54 52
52 52 0 0 53 51
51 51 0 0 52 50
50 50 0 0 51 49
49 49 0 0 50 48
48 48 0 0 49 47
47 47 0 0 48 46
46 46 0 0 47 45
45 45 0 0 46 44
44 44 0 0 45 43
43 43 0 0 44 42
42 42 0 0 43 41
41 41 0 0 42 40
40 40 0 0 41 39
39 39 0 0 40 38
38 38 0 0 39 37
37 37 0 0 38 36
36 36 0 0 37 35
35 35 0 0 36 34
34 34 0 0 35 33
33 33 0 0 34 32
32 32 0 0 33 31
31 31 0 0 32 30
30 30 0 0 31 29
29 29 0 0 30 28
28 28 0 0 29 27
27 27 0 0 28 26
26 26 0 0 27 25
25 25 0 0 26 24
24 24 0 0 25 23
23 23 0 0 24 22
22 22 0 0 23 21
21 21 0 0 22 20
20 20 0 0 21 19
19 19 0 0 20 18
18 18 0 0 19 17
17 17 0 0 18 16
16 16 0 0 17 15
15 15 0 0 16 14
14 14 0 0 15 13
13 13 0 0 14 12
12 12 0 0 13 11
11 11 0 0 12 10
10 10 0 0 11 9
9 9 0 0 10 8
8 8 0 0 9 7
7 7 0 0 8 6
6 6 0 0 7 5
5 5 0 0 6 4
4 4 0 0 5 3
3 3 0 0 4 2
2 2 0 0 3 1
1 1 0 0 2 0
End_of_Relation
Relation IntEvent ; filename festival/relations//IntEvent/1.IntEvent ;
End_of_Relation
Relation Target ; filename festival/relations//Target/1.Target ;
66 66 0 0 0 65
65 65 0 0 66 64
64 64 0 0 65 63
63 63 0 0 64 62
62 62 0 0 63 61
61 61 0 0 62 60
60 60 0 0 61 59
59 59 0 0 60 58
58 58 0 0 59 57
57 57 0 0 58 56
56 56 0 0 57 55
55 55 0 0 56 54
54 54 0 0 55 53
53 53 0 0 54 52
52 52 0 0 53 51
51 51 0 0 52 50
50 50 0 0 51 49
49 49 0 0 50 48
48 48 0 0 49 47
47 47 0 0 48 46
46 46 0 0 47 45
45 45 0 0 46 44
44 44 0 0 45 43
43 43 0 0 44 42
42 42 0 0 43 41
41 41 0 0 42 40
40 40 0 0 41 39
39 39 0 0 40 38
38 38 0 0 39 37
37 37 0 0 38 36
36 36 0 0 37 35
35 35 0 0 36 34
34 34 0 0 35 33
33 33 0 0 34 32
32 32 0 0 33 31
31 31 0 0 32 30
30 30 0 0 31 29
29 29 0 0 30 28
28 28 0 0 29 27
27 27 0 0 28 26
26 26 0 0 27 25
25 25 0 0 26 24
24 24 0 0 25 23
23 23 0 0 24 22
22 22 0 0 23 21
21 21 0 0 22 20
20 20 0 0 21 19
19 19 0 0 20 18
18 18 0 0 19 17
17 17 0 0 18 16
16 16 0 0 17 15
15 15 0 0 16 14
14 14 0 0 15 13
13 13 0 0 14 12
12 12 0 0 13 11
11 11 0 0 12 10
10 10 0 0 11 9
9 9 0 0 10 8
8 8 0 0 9 7
7 7 0 0 8 6
6 6 0 0 7 5
5 5 0 0 6 4
4 4 0 0 5 3
3 3 0 0 4 2
2 2 0 0 3 1
1 1 0 0 2 0
End_of_Relation
Relation SylStructure ; ()
End_of_Relation
Relation Intonation ; ()
End_of_Relation
End_of_Relations
End_of_Utterance
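A quick way to see which relations in an utterance file like the one above actually carry items is to count the lines between each "Relation ..." header and its "End_of_Relation". The following is only a minimal sketch in Python: the function names and the shortened SAMPLE text are illustrative assumptions and are not part of HTS, Festival or the Edinburgh Speech Tools. It assumes the ASCII layout shown above.

```python
# Sketch only: counts item lines per relation in an ASCII EST utterance file.
# count_relation_items, empty_relations and SAMPLE are hypothetical names
# introduced for illustration; they are not part of HTS or Festival.

def count_relation_items(utt_text):
    """Return {relation_name: number_of_item_lines} in file order."""
    counts = {}
    current = None  # name of the relation currently being read, if any
    for raw in utt_text.splitlines():
        line = raw.strip()
        if line.startswith("Relation "):
            # Header looks like: "Relation Word ; filename ... ;"
            current = line.split()[1]
            counts[current] = 0
        elif line == "End_of_Relation":
            current = None
        elif current is not None and line:
            # Any non-blank line inside a relation is one item.
            counts[current] += 1
    return counts


def empty_relations(utt_text):
    """Return the names of relations that contain no items."""
    return [name for name, n in count_relation_items(utt_text).items() if n == 0]


# Shortened stand-in for the utterance file above (three segments only).
SAMPLE = """\
EST_File utterance
DataType ascii
version 2
EST_Header_End
Features max_id 3 ; type Text ; iform nil ; fileid "1" ;
Stream_Items
1 id _1 ; end 0.1 ; name _ ;
2 id _2 ; end 0.225699 ; name eh ;
3 id _3 ; end 0.26798 ; name d ;
End_of_Stream_Items
Relations
Relation Word ; filename festival/relations//Word/1.Word ;
End_of_Relation
Relation Segment ; filename festival/relations//Segment/1.Segment ;
3 3 0 0 0 2
2 2 0 0 3 1
1 1 0 0 2 0
End_of_Relation
Relation SylStructure ; ()
End_of_Relation
End_of_Relations
End_of_Utterance
"""

if __name__ == "__main__":
    print(count_relation_items(SAMPLE))
    print(empty_relations(SAMPLE))
```

Run on the full file above, a check of this kind would list Phrase, Word, Syllable, IntEvent, SylStructure and Intonation as empty, with only Segment and Target holding items (66 each).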
The full label generated from this utterance file is:
Yours sincerely
Girish Malkarnenkar
- Follow-Ups
- [hts-users:02062] Re: Generated speech quality, Heiga Zen (Byung Ha CHUN)