
[hts-users:04125] Re: parallel makefile for data in slt demo (2.2)


Hi Alexis (and all),

That's very neat! I wonder if the same approach could be taken for the training part itself? I've often thought voice building should really operate using some sort of managed-dependency-graph (like make uses).

Cheers,

Matt


On 19/09/14 15:55, Alexis Moinet wrote:
Hello

I've rewritten data/Makefile.in from the slt-based hts-demo (version 2.2) so that it can use:

   1. the "-j" option of make to run several jobs in parallel
   2. the automatic templates with dependencies to avoid needlessly re-running computations
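
To illustrate the two points above, here is a minimal sketch, not taken from the actual Makefile.in. It assumes that "automatic templates" corresponds to GNU make pattern rules, that there is one raw/NAME.raw input per utterance, and that a placeholder script called extract_mgc.sh (hypothetical, standing in for the real analysis commands) writes one parameter file to standard output. Recipe lines must start with a tab.

```make
# Hypothetical layout: raw/NAME.raw in, mgc/NAME.mgc out.
RAWS := $(wildcard raw/*.raw)
MGCS := $(patsubst raw/%.raw,mgc/%.mgc,$(RAWS))

.PHONY: mgc
mgc: $(MGCS)

# Pattern rule: each output depends only on its own input, so make can
# build many outputs at once under -j and skip those already up to date.
# extract_mgc.sh is a placeholder, not a script shipped with the demo.
mgc/%.mgc: raw/%.raw
	mkdir -p $(@D)
	./extract_mgc.sh $< > $@.tmp
	mv $@.tmp $@
```

Writing to a temporary file and renaming it at the end is one way to keep a half-written output from looking complete after an interrupted run; whether the real Makefile.in does this is not stated here.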

It has been written as a drop-in replacement for data/Makefile.in: all the main targets are the same (all, labels, analysis, mgc, lf0, cmp, scp, list, mlf). Keeping them makes some parts inefficient, but not significantly.

I tested it on two different machines (a laptop and a cluster) running two different Linux distributions (Ubuntu 12.04 and CentOS 6.3) and didn't run into any errors (that doesn't mean it's bulletproof, though).

As a rule of thumb, it should be run as:

$ make -jnum_of_cores data

and, as a result, the computation time for the data target should be reduced by a factor close to num_of_cores, give or take. (I couldn't test "make -jnum_of_cores*2" on i7 CPUs, but it might just work ...)

As an example, on a 24-core cluster, "make data" (thus using 1 core) took a bit more than 1h30, whereas "make -j24 data" takes about 6 minutes, an overall improvement factor of about 15. The "cmp" target is accelerated by a factor of more than 20, but the gain for the "labels" target is much smaller, around 5; I guess it has to do with it being text- and script-based.

Besides, running the make command a second time should only compute the files that were either not completed in the first run (e.g. if the run was stopped, it will restart where it left off) or were modified in between. For instance, if one modifies utts/cmu_us_arctic_slt_a0001.utt, make will only recompute the label for that file; the same goes for the raw/mgc/lf0 files used to create the cmp. mlf, scp and list are still recreated each time, but that takes only seconds. Use "make clean" to force a full re-run of the computation.
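
The incremental behaviour described above comes from make comparing file timestamps along a dependency chain. A rough sketch of such a chain follows, under several assumptions: pattern rules producing mgc/%.mgc and lf0/%.lf0 already exist in the same Makefile, merge_streams.sh is a hypothetical placeholder for the real merging commands, and scp/train.scp is an illustrative file name. Unlike what is described above, this sketch also makes the scp file depend on the cmp files, which is only one possible variation.

```make
# Hypothetical names; one cmp file per utterance found in raw/.
NAMES := $(basename $(notdir $(wildcard raw/*.raw)))
CMPS  := $(addprefix cmp/,$(addsuffix .cmp,$(NAMES)))

# Keep mgc/lf0 files that GNU make would otherwise treat as
# intermediate files and delete once the cmp files are built.
.SECONDARY:

# A cmp file is rebuilt only if its own mgc or lf0 file is newer.
cmp/%.cmp: mgc/%.mgc lf0/%.lf0
	mkdir -p $(@D)
	./merge_streams.sh $^ > $@.tmp
	mv $@.tmp $@

# The list is rewritten only when at least one cmp file has changed.
scp/train.scp: $(CMPS)
	mkdir -p $(@D)
	printf '%s\n' $^ > $@
```

With rules like these, touching a single raw file and running make again would rebuild only that utterance's mgc, lf0 and cmp files, then the list.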

Main drawback: it doesn't seem to work with BSD make (tested on FreeBSD 10). Using GNU make (gmake) from ports works around the problem for now.

I'm aware that most of the time in the demo is spent in the training part (make voice), but I hope it will help some people nonetheless.

Alexis

ps: please feel free to share any fix and improvement, some parts are a bit hackish ;-)
pps: it shouldn't be too hard to port to 2.3

