[hts-users:04409] parallel makefile for data in slt demo (2.3)
- From: Alexis Moinet <alexis.moinet@xxxxxxxxxxx>
- Date: Tue, 21 Jun 2016 16:00:54 +0200
Hello,
Here is a version of data/Makefile.in for HTS demo 2.3 that supports the "-j" option of make. To use it, replace data/Makefile.in with this file and rerun ./configure (with your usual options), which regenerates data/Makefile.
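The replacement step can be scripted. The sketch below is only an illustration: the function name install_parallel_makefile and both path arguments are placeholders, not part of the HTS demo. It keeps a backup of the original template so the change can be undone.

```shell
#!/bin/sh
# Hypothetical helper: swap in the parallel data/Makefile.in, keeping a backup.
# $1 = root of the HTS demo tree (placeholder path)
# $2 = the Makefile.in attached to this message, saved locally
install_parallel_makefile() {
    demo_dir=$1
    new_file=$2
    target="$demo_dir/data/Makefile.in"

    [ -f "$new_file" ] || { echo "missing $new_file" >&2; return 1; }
    [ -f "$target" ] || { echo "missing $target" >&2; return 1; }

    # Create the backup only once, so repeated runs never overwrite the original.
    [ -f "$target.orig" ] || cp "$target" "$target.orig"
    cp "$new_file" "$target"
    echo "replaced $target (backup in $target.orig); now rerun ./configure"
}
```

For example:
> $ install_parallel_makefile /path/to/demo ./Makefile.in
followed by ./configure with your usual options from the demo root.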
As with the previous version (2.2 [1]), this file lets you use:
1. the "-j" parameter of make to run several jobs in parallel
2. pattern rules (templates) with dependencies, so that computations already done are not rerun needlessly
For instance, running:
> $ cd data
> $ make -j20 analysis
extracts the mgc, lf0 and cmp files in about 5-6 minutes, as opposed to 1h30 in sequential mode, on a 24-core Intel Xeon X7460 @ 2.66GHz.
Above 20 jobs, disk I/O latency prevents any further speedup on our server anyway.
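A small sketch of how one might choose the job count on another machine: take the number of CPUs and cap it. The helper name pick_jobs is made up for illustration, and the default cap of 20 is only the figure observed on the server described above; the right cap elsewhere has to be measured.

```shell
#!/bin/sh
# Hypothetical helper: pick a -j value from the CPU count, capped at a limit.
# $1 = number of CPUs, $2 = cap (defaults to 20, the limit seen on our server)
pick_jobs() {
    cpus=$1
    cap=${2:-20}
    if [ "$cpus" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$cpus"
    fi
}

# getconf _NPROCESSORS_ONLN reports the online CPUs on Linux and most BSDs;
# fall back to a single job when it is unavailable.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "suggested command: make -j$(pick_jobs "$cpus") analysis"
```

On the 24-core server mentioned above, pick_jobs 24 gives 20.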
Besides, running make a second time should only compute the files that were either not completed in the first run (e.g. if the script was stopped, it restarts where it left off) or modified in between (e.g. if you modify utts/cmu_us_arctic_slt_a0001.utt, make only recomputes the labels for that file; the same goes for the raw/mgc/lf0 files used to create the cmp). The mlf, scp and list files are still recreated each time, but that takes only seconds. Use "make clean" to force a full rerun of the computation.
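The check behind this behaviour is a timestamp comparison: a target is rebuilt when it is missing or when one of its prerequisites is newer. The sketch below mimics that test in plain shell for illustration only; needs_rebuild is a made-up name, and make itself does this internally.

```shell
#!/bin/sh
# Sketch of make's out-of-date test: rebuild when the target is missing
# or when any prerequisite is newer than the target.
# Usage: needs_rebuild TARGET PREREQ...
# Returns 0 (true) if a rebuild is needed, 1 otherwise.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        # -nt means "newer than"; it is supported by bash, dash and ksh
        [ "$prereq" -nt "$target" ] && return 0
    done
    return 1
}
```

For example, from the data directory:
> $ needs_rebuild cmp/cmu_us_arctic_slt_a0001.cmp mgc/cmu_us_arctic_slt_a0001.mgc lf0/cmu_us_arctic_slt_a0001.lf0 && echo "would rebuild"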
This version for 2.3 looks a bit less clean than the one for 2.2 because the new options USEUTT and USESTRAIGHT required the use of conditional statements around some of the templated targets. If someone knows of a cleaner way, I'd love to hear about it :-).
Please note that I couldn't test the part that uses STRAIGHT and matlab (though I'm not sure I'd recommend running 20 matlab scripts in parallel anyway).
Bug reports and improvements are, of course, welcome ;-)
HTH
Alexis
[1] see http://hts.sp.nitech.ac.jp/hts-users/spool/2014/msg00140.html
# ----------------------------------------------------------------- #
# The HMM-Based Speech Synthesis System (HTS) #
# developed by HTS Working Group #
# http://hts.sp.nitech.ac.jp/ #
# ----------------------------------------------------------------- #
# #
# Copyright (c) 2001-2015 Nagoya Institute of Technology #
# Department of Computer Science #
# #
# 2001-2008 Tokyo Institute of Technology #
# Interdisciplinary Graduate School of #
# Science and Engineering #
# #
# 2014-2016 Numediart Institute #
# Department of Signal Processing #
# #
# All rights reserved. #
# #
# Redistribution and use in source and binary forms, with or #
# without modification, are permitted provided that the following #
# conditions are met: #
# #
# - Redistributions of source code must retain the above copyright #
# notice, this list of conditions and the following disclaimer. #
# - Redistributions in binary form must reproduce the above #
# copyright notice, this list of conditions and the following #
# disclaimer in the documentation and/or other materials provided #
# with the distribution. #
# - Neither the name of the HTS working group nor the names of its #
# contributors may be used to endorse or promote products derived #
# from this software without specific prior written permission. #
# #
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND #
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, #
# INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF #
# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE #
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS #
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, #
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED #
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, #
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON #
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, #
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY #
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE #
# POSSIBILITY OF SUCH DAMAGE. #
# ----------------------------------------------------------------- #
# setting
SPEAKER = @SPEAKER@
DATASET = @DATASET@
# awk and perl
AWK = @AWK@
PERL = @PERL@
# SPTK commands
X2X = @X2X@
MGCEP = @MGCEP@
LPC2LSP = @LPC2LSP@
MERGE = @MERGE@
VSTAT = @VSTAT@
SOPR = @SOPR@
NAN = @NAN@
MINMAX = @MINMAX@
PITCH = @PITCH@
FRAME = @FRAME@
WINDOW = @WINDOW@
RAW2WAV = @RAW2WAV@
# MATLAB and STRAIGHT
USESTRAIGHT = @USESTRAIGHT@
MATLAB = @MATLAB@
STRAIGHT = @STRAIGHT@
# Festival commands
USEUTT = @USEUTT@
TEXT2UTT = @TEXT2UTT@
DUMPFEATS = @DUMPFEATS@
# speech analysis conditions
SAMPFREQ = @SAMPFREQ@ # Sampling frequency (48kHz)
FRAMELEN = @FRAMELEN@ # Frame length in point (1200 = 48000 * 0.025)
FRAMESHIFT = @FRAMESHIFT@ # Frame shift in point (240 = 48000 * 0.005)
WINDOWTYPE = @WINDOWTYPE@ # Window type -> 0: Blackman 1: Hamming 2: Hanning
NORMALIZE = @NORMALIZE@ # Normalization -> 0: none 1: by power 2: by magnitude
FFTLEN = @FFTLEN@ # FFT length in point
FREQWARP = @FREQWARP@ # frequency warping factor
GAMMA = @GAMMA@ # pole/zero weight for mel-generalized cepstral (MGC) analysis
MGCORDER = @MGCORDER@ # order of MGC analysis
BAPORDER = @BAPORDER@ # order of BAP analysis
LNGAIN = @LNGAIN@ # use logarithmic gain rather than linear gain
LOWERF0 = @LOWERF0@ # lower limit for f0 extraction (Hz)
UPPERF0 = @UPPERF0@ # upper limit for f0 extraction (Hz)
# windows for calculating delta features
MGCWIN = win/mgc.win
LF0WIN = win/lf0.win
BAPWIN = win/bap.win
NMGCWIN = @NMGCWIN@
NLF0WIN = @NLF0WIN@
NBAPWIN = @NBAPWIN@
# for a parallel Makefile, we need to know the list of files beforehand
RAW_FILES = $(wildcard raw/$(DATASET)_$(SPEAKER)_*.raw)
FEATURE_FILES = $(patsubst raw/%.raw, mgc/%.mgc, $(RAW_FILES))
FEATURE_FILES += $(patsubst raw/%.raw, lf0/%.lf0, $(RAW_FILES))
ifeq ($(USESTRAIGHT),1)
FEATURE_FILES += $(patsubst raw/%.raw, bap/%.bap, $(RAW_FILES))
endif
CMP_FILES = $(patsubst raw/%.raw, cmp/%.cmp, $(RAW_FILES))
ifeq ($(USEUTT),1)
UTT_FILES = $(wildcard utts/$(DATASET)_$(SPEAKER)_*.utt)
MONO_LAB_FILES = $(patsubst utts/%.utt, labels/mono/%.lab, $(UTT_FILES))
FULL_LAB_FILES = $(patsubst utts/%.utt, labels/full/%.lab, $(UTT_FILES))
else
TXT_FILES = $(wildcard txt/$(DATASET)_$(SPEAKER)_*.txt)
MONO_LAB_FILES = $(patsubst txt/%.txt, labels/mono/%.lab, $(TXT_FILES))
FULL_LAB_FILES = $(patsubst txt/%.txt, labels/full/%.lab, $(TXT_FILES))
endif
all: analysis labels
analysis: features cmp
labels: lab mlf list scp
features: $(FEATURE_FILES)
cmp: $(CMP_FILES)
feature-dirs: mgc-dir lf0-dir bap-dir
mgc-dir:
@if [ ! -d "mgc" ]; then echo "create directory mgc" && mkdir -p mgc; fi
lf0-dir:
@if [ ! -d "lf0" ]; then echo "create directory lf0" && mkdir -p lf0; fi
bap-dir:
@if [ ! -d "bap" ]; then echo "create directory bap" && mkdir -p bap; fi
cmp-dir:
@if [ ! -d "cmp" ]; then echo "create directory cmp" && mkdir -p cmp; fi
lab: mono full
mono: $(MONO_LAB_FILES)
full: $(FULL_LAB_FILES)
labels/mono:
@if [ ! -d "labels/mono" ]; then echo "create directory labels/mono" && mkdir -p labels/mono; fi
labels/full:
@if [ ! -d "labels/full" ]; then echo "create directory labels/full" && mkdir -p labels/full; fi
lists-dir:
@if [ ! -d "lists" ]; then echo "create directory lists" && mkdir -p lists; fi
scp-dir:
@if [ ! -d "scp" ]; then echo "create directory scp" && mkdir -p scp; fi
tmp:
@if [ ! -d "tmp" ]; then echo "create directory tmp" && mkdir -p tmp; fi
mgc/%.mgc lf0/%.lf0 bap/%.bap: raw/%.raw | feature-dirs
# Extracting features from raw audio
@SAMPKHZ=`echo $(SAMPFREQ) | $(X2X) +af | $(SOPR) -m 0.001 | $(X2X) +fa`; \
min=`$(X2X) +sf $< | $(MINMAX) | $(X2X) +fa | head -n 1`; \
max=`$(X2X) +sf $< | $(MINMAX) | $(X2X) +fa | tail -n 1`; \
if [ -s $< -a $${min} -gt -32768 -a $${max} -lt 32767 ]; then \
echo "Extracting features from $<"; \
if [ $(USESTRAIGHT) -eq 0 ]; then \
$(X2X) +sf $< | $(PITCH) -H $(UPPERF0) -L $(LOWERF0) -p $(FRAMESHIFT) -s $${SAMPKHZ} -o 2 > lf0/$*.lf0; \
if [ $(GAMMA) -eq 0 ]; then \
$(X2X) +sf $< | \
$(FRAME) -l $(FRAMELEN) -p $(FRAMESHIFT) | \
$(WINDOW) -l $(FRAMELEN) -L $(FFTLEN) -w $(WINDOWTYPE) -n $(NORMALIZE) | \
$(MGCEP) -a $(FREQWARP) -m $(MGCORDER) -l $(FFTLEN) -e 1.0E-08 > mgc/$*.mgc; \
else \
if [ $(LNGAIN) -eq 1 ]; then \
GAINOPT="-L"; \
fi; \
$(X2X) +sf $< | \
$(FRAME) -l $(FRAMELEN) -p $(FRAMESHIFT) | \
$(WINDOW) -l $(FRAMELEN) -L $(FFTLEN) -w $(WINDOWTYPE) -n $(NORMALIZE) | \
$(MGCEP) -a $(FREQWARP) -c $(GAMMA) -m $(MGCORDER) -l $(FFTLEN) -e 1.0E-08 -o 4 | \
$(LPC2LSP) -m $(MGCORDER) -s $${SAMPKHZ} $${GAINOPT} -n $(FFTLEN) -p 8 -d 1.0E-08 > mgc/$*.mgc; \
fi; \
if [ -n "`$(NAN) lf0/$*.lf0`" ]; then \
echo " Failed to extract features from $<"; \
rm -f lf0/$*.lf0; \
fi; \
if [ -n "`$(NAN) mgc/$*.mgc`" ]; then \
echo " Failed to extract features from $<"; \
rm -f mgc/$*.mgc; \
fi; \
else \
FRAMESHIFTMS=`echo $(FRAMESHIFT) | $(X2X) +af | $(SOPR) -m 1000 -d $(SAMPFREQ) | $(X2X) +fa`; \
$(RAW2WAV) -s $${SAMPKHZ} -d . $<; \
echo "path(path,'$(STRAIGHT)');" > $*.m; \
echo "prm.F0frameUpdateInterval=$${FRAMESHIFTMS};" >> $*.m; \
echo "prm.F0searchUpperBound=$(UPPERF0);" >> $*.m; \
echo "prm.F0searchLowerBound=$(LOWERF0);" >> $*.m; \
echo "prm.spectralUpdateInterval=$${FRAMESHIFTMS};" >> $*.m; \
echo "[x,fs]=wavread('$*.wav');" >> $*.m; \
echo "[f0,ap] = exstraightsource(x,fs,prm);" >> $*.m; \
echo "[sp] = exstraightspec(x,f0,fs,prm);" >> $*.m; \
echo "ap = ap';" >> $*.m; \
echo "sp = sp';" >> $*.m; \
echo "sp = sp*32768.0;" >> $*.m; \
echo "save '$*.f0' f0 -ascii;" >> $*.m; \
echo "save '$*.ap' ap -ascii;" >> $*.m; \
echo "save '$*.sp' sp -ascii;" >> $*.m; \
echo "quit;" >> $*.m; \
$(MATLAB) < $*.m; \
if [ -s $*.f0 ]; then \
$(X2X) +af $*.f0 | $(SOPR) -magic 0.0 -LN -MAGIC -1.0E+10 > lf0/$*.lf0; \
if [ -n "`$(NAN) lf0/$*.lf0`" ]; then \
echo " Failed to extract features from $<"; \
rm -f lf0/$*.lf0; \
fi; \
fi; \
if [ -s $*.sp ]; then \
if [ $(GAMMA) -eq 0 ]; then \
$(X2X) +af $*.sp | \
$(MGCEP) -a $(FREQWARP) -m $(MGCORDER) -l 2048 -e 1.0E-08 -j 0 -f 0.0 -q 3 > mgc/$*.mgc; \
else \
if [ $(LNGAIN) -eq 1 ]; then \
GAINOPT="-L"; \
fi; \
$(X2X) +af $*.sp | \
$(MGCEP) -a $(FREQWARP) -c $(GAMMA) -m $(MGCORDER) -l 2048 -e 1.0E-08 -j 0 -f 0.0 -q 3 -o 4 | \
$(LPC2LSP) -m $(MGCORDER) -s $${SAMPKHZ} $${GAINOPT} -n 2048 -p 8 -d 1.0E-08 > mgc/$*.mgc; \
fi; \
if [ -n "`$(NAN) mgc/$*.mgc`" ]; then \
echo " Failed to extract features from $<"; \
rm -f mgc/$*.mgc; \
fi; \
fi; \
if [ -s $*.ap ]; then \
$(X2X) +af $*.ap | \
$(MGCEP) -a $(FREQWARP) -m $(BAPORDER) -l 2048 -e 1.0E-08 -j 0 -f 0.0 -q 1 > bap/$*.bap; \
if [ -n "`$(NAN) bap/$*.bap`" ]; then \
echo " Failed to extract features from $<"; \
rm -f bap/$*.bap; \
fi; \
fi; \
rm -f $*.m $*.wav $*.f0 $*.ap $*.sp; \
fi; \
fi;
ifeq ($(USESTRAIGHT),0)
cmp/%.cmp: mgc/%.mgc lf0/%.lf0 | cmp-dir
else
cmp/%.cmp: mgc/%.mgc lf0/%.lf0 bap/%.bap | cmp-dir
endif
# Composing training data files from extracted features
@echo "Composing training data for $*"; \
if [ $(USESTRAIGHT) -eq 0 ]; then \
MGCDIM=`expr $(MGCORDER) + 1`; \
LF0DIM=1; \
MGCWINDIM=`expr $(NMGCWIN) \* $${MGCDIM}`; \
LF0WINDIM=`expr $(NLF0WIN) \* $${LF0DIM}`; \
BYTEPERFRAME=`expr 4 \* \( $${MGCWINDIM} + $${LF0WINDIM} \)`; \
if [ -s mgc/$*.mgc -a -s lf0/$*.lf0 ]; then \
MGCWINS=""; \
i=1; \
while [ $${i} -le $(NMGCWIN) ]; do \
eval MGCWINS=\"$${MGCWINS} $(MGCWIN)$${i}\"; \
i=`expr $${i} + 1`; \
done; \
$(PERL) scripts/window.pl $${MGCDIM} mgc/$*.mgc $${MGCWINS} > cmp/tmp_$*.mgc; \
LF0WINS=""; \
i=1; \
while [ $${i} -le $(NLF0WIN) ]; do \
eval LF0WINS=\"$${LF0WINS} $(LF0WIN)$${i}\"; \
i=`expr $${i} + 1`; \
done; \
$(PERL) scripts/window.pl $${LF0DIM} lf0/$*.lf0 $${LF0WINS} > cmp/tmp_$*.lf0; \
$(MERGE) +f -s 0 -l $${LF0WINDIM} -L $${MGCWINDIM} cmp/tmp_$*.mgc < cmp/tmp_$*.lf0 > cmp/tmp_$*.cmp; \
$(PERL) scripts/addhtkheader.pl $(SAMPFREQ) $(FRAMESHIFT) $${BYTEPERFRAME} 9 cmp/tmp_$*.cmp > cmp/$*.cmp; \
rm -f cmp/tmp_$*.*; \
fi; \
else \
MGCDIM=`expr $(MGCORDER) + 1`; \
LF0DIM=1; \
BAPDIM=`expr $(BAPORDER) + 1`; \
MGCWINDIM=`expr $(NMGCWIN) \* $${MGCDIM}`; \
LF0WINDIM=`expr $(NLF0WIN) \* $${LF0DIM}`; \
BAPWINDIM=`expr $(NBAPWIN) \* $${BAPDIM}`; \
MGCLF0WINDIM=`expr $${MGCWINDIM} + $${LF0WINDIM}`; \
BYTEPERFRAME=`expr 4 \* \( $${MGCWINDIM} + $${LF0WINDIM} + $${BAPWINDIM} \)`; \
if [ -s mgc/$*.mgc -a -s lf0/$*.lf0 -a -s bap/$*.bap ]; then \
MGCWINS=""; \
i=1; \
while [ $${i} -le $(NMGCWIN) ]; do \
eval MGCWINS=\"$${MGCWINS} $(MGCWIN)$${i}\"; \
i=`expr $${i} + 1`; \
done; \
$(PERL) scripts/window.pl $${MGCDIM} mgc/$*.mgc $${MGCWINS} > cmp/tmp_$*.mgc; \
LF0WINS=""; \
i=1; \
while [ $${i} -le $(NLF0WIN) ]; do \
eval LF0WINS=\"$${LF0WINS} $(LF0WIN)$${i}\"; \
i=`expr $${i} + 1`; \
done; \
$(PERL) scripts/window.pl $${LF0DIM} lf0/$*.lf0 $${LF0WINS} > cmp/tmp_$*.lf0; \
BAPWINS=""; \
i=1; \
while [ $${i} -le $(NBAPWIN) ]; do \
eval BAPWINS=\"$${BAPWINS} $(BAPWIN)$${i}\"; \
i=`expr $${i} + 1`; \
done; \
$(PERL) scripts/window.pl $${BAPDIM} bap/$*.bap $${BAPWINS} > cmp/tmp_$*.bap; \
$(MERGE) +f -s 0 -l $${LF0WINDIM} -L $${MGCWINDIM} cmp/tmp_$*.mgc < cmp/tmp_$*.lf0 > cmp/tmp_$*.mgc+lf0; \
$(MERGE) +f -s 0 -l $${BAPWINDIM} -L $${MGCLF0WINDIM} cmp/tmp_$*.mgc+lf0 < cmp/tmp_$*.bap > cmp/tmp_$*.cmp; \
$(PERL) scripts/addhtkheader.pl $(SAMPFREQ) $(FRAMESHIFT) $${BYTEPERFRAME} 9 cmp/tmp_$*.cmp > cmp/$*.cmp; \
rm -f cmp/tmp_$*.*; \
fi; \
fi;
ifeq ($(USEUTT),0)
tmp/tmp_%.utt: txt/%.txt | tmp
@echo "Extracting utterance from $<"; \
$(PERL) scripts/normtext.pl $< > tmp/tmp_$*.txt; \
$(TEXT2UTT) tmp/tmp_$*.txt > $@; \
rm -f tmp/tmp_$*.txt;
endif
ifeq ($(USEUTT),1)
tmp/tmp_%.feats: utts/%.utt | tmp
else
tmp/tmp_%.feats: tmp/tmp_%.utt | tmp
endif
@echo "Extracting feats from $<"; \
if [ -s $< ]; then \
$(DUMPFEATS) \
-eval scripts/extra_feats.scm \
-relation Segment \
-feats scripts/label.feats \
-output $@ \
$<; \
fi;
labels/mono/%.lab: tmp/tmp_%.feats | labels/mono
@echo creating $@
@$(AWK) -f scripts/label-mono.awk $< > $@;
labels/full/%.lab: tmp/tmp_%.feats | labels/full
@echo creating $@
@$(AWK) -f scripts/label-full.awk $< > $@;
mlf:
@echo "Generating monophone Master Label Files (MLF)"
@echo "#!MLF!#" > labels/mono.mlf
@echo "\"*/$(DATASET)_$(SPEAKER)_*.lab\" -> \"@PWD@/data/labels/mono\"" >> labels/mono.mlf
@echo "Generating fullcontext Master Label Files (MLF)"
@echo "#!MLF!#" > labels/full.mlf
@echo "\"*/$(DATASET)_$(SPEAKER)_*.lab\" -> \"@PWD@/data/labels/full\"" >> labels/full.mlf
list: cmp lab lists-dir
@echo "Generating a fullcontext model list file"; \
rm -f tmp_list; \
for lab in labels/full/$(DATASET)_$(SPEAKER)_*.lab; do \
if [ -s $${lab} -a -s labels/mono/`basename $${lab}` -a -s cmp/`basename $${lab} .lab`.cmp ]; then \
sed -e "s/.* //g" $${lab} >> tmp_list; \
fi \
done; \
sort -u tmp_list > lists/full.list; \
rm -f tmp_list
@echo "Generating a fullcontext model list file which includes unseen models"; \
rm -f tmp_list; \
cat lists/full.list > tmp_list; \
for lab in labels/gen/*.lab; do \
sed -e "s/.* //g" $${lab} >> tmp_list; \
done; \
sort -u tmp_list > lists/full_all.list; \
rm -f tmp_list
@echo "Generating a monophone model list file"; \
rm -f tmp_list; \
for lab in labels/mono/$(DATASET)_$(SPEAKER)_*.lab; do \
if [ -s $${lab} -a -s labels/full/`basename $${lab}` -a -s cmp/`basename $${lab} .lab`.cmp ]; then \
sed -e "s/.* //g" $${lab} >> tmp_list; \
fi \
done; \
sort -u tmp_list > lists/mono.list; \
rm -f tmp_list
scp: cmp lab scp-dir
@echo "Generating a training data script"; \
rm -f scp/train.scp; \
for cmp in @PWD@/data/cmp/$(DATASET)_$(SPEAKER)_*.cmp; do \
if [ -s $${cmp} -a -s labels/mono/`basename $${cmp} .cmp`.lab -a -s labels/full/`basename $${cmp} .cmp`.lab ]; then \
echo $${cmp} >> scp/train.scp; \
fi \
done;
@echo "Generating a generation label script"; \
rm -f scp/gen.scp; \
for lab in @PWD@/data/labels/gen/*.lab; do \
echo $${lab} >> scp/gen.scp; \
done
clean: clean-mgc clean-lf0 clean-bap clean-cmp clean-lab clean-mlf clean-list clean-scp clean-tmp
clean-mgc:
rm -rf mgc
clean-lf0:
rm -rf lf0
clean-bap:
rm -rf bap
clean-cmp:
rm -rf cmp
clean-lab:
rm -rf labels/mono
rm -rf labels/full
clean-mlf:
rm -f labels/*.mlf
clean-list:
rm -rf lists
clean-scp:
rm -rf scp
clean-tmp:
rm -rf tmp
distclean: clean
rm -f Makefile
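.PHONY: all analysis features cmp labels lab mono full mlf list scp clean distclean
.PHONY: feature-dirs mgc-dir lf0-dir bap-dir cmp-dir lists-dir scp-dir
.PHONY: clean-mgc clean-lf0 clean-bap clean-cmp clean-lab clean-mlf clean-list clean-scp clean-tmp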
.PRECIOUS: lf0/%.lf0 mgc/%.mgc bap/%.bap cmp/%.cmp labels/mono/%.lab labels/full/%.lab