python agSegmentSf.py examples/lachenmann.aiff
Next we run the agConcatenate.py script with an options file. The options file specifies the target soundfile's path along with a lot of other information which parameterized the concatenative process.
python agConcatenate.py examples/01-simplest.py
The agConcatenate.py script creates several output files with segment selection information. By default the selected sound segments are also rendered as a soundfile with csound6 and played back at the command line. Thus, if everything is working you should hear the concatenated soundfile played back after the process is complete.
python agSegmentSf.py examples/lachenmann.aiff
AudioGuide will think for a second, and then output the following data detailing the segmentation of this audiofile:
---------------------- AUDIOGUIDE SEGMENT SOUNDFILE ----------------------
Evaluating /Users/ben/Documents/AudioGuide1.32/examples/lachenmann.aiff from 0.00-64.65
AN ONSET HAPPENS when
The amplitude crosses the Relative Onset Trigger Threshold: -40.00 (-t option)
AN OFFSET HAPPENS when
1. Offset Rise Ratio: when next-frame’s-amplitude/this-frame’s-amplitude >= 1.30 (-r option)
...or...
2. Offset dB above minimum: when this frame’s absolute amplitude <= -80.00 (minimum found amplitude of -260.00 plus the offset dB boost of 12.00 (-d option))
Found 144 segments
Wrote file /Users/ben/Documents/AudioGuide1.32/examples/lachenmann.aiff.txt
As a result of running this python script, AudioGuide1.32 automatically writes a textfile with the exact same name and path as the soundfile, but adding the extension '.txt' -- in this case: examples/lachenmann.aiff.txt.
python agSegmentSf.py -t -30 -d 4 soundfilename.wav # sets the threshold to -30. it will produce less onsets then -40 (the default). also changes the drop dB value to 4 dB above the minimum amplitude found in the entire soundfile, likely leading to longer segments.
python agSegmentSf.py -r 4 soundfilename.wav # changes the rise ratio from the default (1.1) to 4. This means that a frame's amplitude must be four times louder than the previous one to start a new segment
Note that to segment a whole directory of soundfiles, you may use wildcard characters in the bash shell, as in:
python agSegmentSf.py mydir/*.aiff # create a segmentation file for each aiff file located in mydir/
The textfiles created by agSegmentSf.py use a segmentation labeling format identical to that of the soundfile editor Audacity. So, to examine your segments, open lachenmann.aiff in Audacity, then import labels and select ‘examples/lachenmann.aiff.txt’.
python agGranuateSf.py examples/lachenmann.aiff
There are similar flags to the agSegment script, as we only want to use grains where there is audio present. Thus the -t, -a, and -d remain. Grain length in second is specified with the -g flag and grain overlap is specified with -o.
csf('lachenmann.aiff', onsetLen='50%', offsetLen='50%')
See examples/06-granular.py for usage and details.
python agConcatenate.py examples/01-simplest.py
...which will use the options contained in ‘examples/01-simplest.py’ to parameterize the concatenative algorithm. In this options file you specify a target sound, the corpus sounds, and (if you like) lots of other options that parameterize the concatenative process. When run, the ‘agConcatenate.py’ script will perform the following operations:
CSOUND_RENDER_FILEPATH = '/path/to/the/file/i/want.aiff' # sets the path of the csound output aiff file
DESCRIPTOR_HOP_SIZE_SEC = 0.02049 # change the analysis hop size
However, there are five custom objects that are written into the options file as well -- tsf(), csf(), spass(), d() and si(). These objects take required parameters and also take keyword arguments. The following sections describe the object-style variables; Section Other Options details simple variables assigned with the ‘=’ symbol.
tsf('pathtosoundfile', start=0, end=file-length, thresh=-40, offsetRise=1.5, offsetThreshAdd=+12, offsetThreshAbs=-80, scaleDb=0, minSegLen=0.05, maxSegLen=1000, midiPitchMethod='composite', stretch=1, segmentationFilepath=None)
TARGET = tsf('cage.aiff') # uses the whole soundfile at its given amplitude
TARGET = tsf('cage.aiff', start=5, end=7, scaleDb=6) # only use seconds 5-7 of cage.aiff at double the amplitude.
TARGET = tsf('cage.aiff', start=2, end=3, stretch=2) # only uses seconds 2-3, but stretches the sound with supervp to twice its duration before concatenation
csf('pathToFileOrDirectoryOfFiles', start=None, end=None, includeTimes=[], excludeTimes=[], limit=[], wholeFile=False, recursive=True, includeStr=None, excludeStr=None, scaleDb=0.0, limitDur=None, onsetLen=0.01, offsetLen='30%', postSelectAmpBool=False, postSelectAmpMethod='power-mean-seg', postSelectAmpMin=-12, postSelectAmpMax=+12, midiPitchMethod='composite', transMethod=None, transQuantize=0, concatFileName=None, allowRepetition=True, restrictRepetition=0.5, restrictOverlaps=None, restrictInTime=0, maxPercentTargetSegments=None, scaleDistance=1, superimposeRule=None, segmentationFile=None, segmentationExtension='.txt')
The simplest way to include a soundfile in your corpus is to use its path as the first argument of the csf() object:
CORPUS = [csf("lachenmann.aiff")]# will search for a segmentation file called lachenmann.aiff.txt and add all of its segments to the corpus
The simplest way to include a directory of soundfiles in your corpus is to use its path as the first argument of the csf() object:
CORPUS = [csf("lachenmann.aiff"), csf('piano')]# will use segments from lachenmann.aiff as well as all sounds in the directory called piano
However, as you can see above, each csf() object has a lot of optional arguments to give you better control over what segments are used, how directories are read and how segments are treated during concatenation.
csf('lachenmann.aiff', start=20) # only use segments who start later than 20s.
csf('lachenmann.aiff', start=20, end=50) # only use segments who start between 20-50s.
csf('lachenmann.aiff', includeTimes=[(1, 4), (10, 12)]) # only use segments falling between 1-4 seconds and 10-12 seconds.
csf('lachenmann.aiff', excludeTimes=[(30, 55)]) # use all segments except those falling between 30-55s.
csf('lachenmann.aiff', limit=['centroid-seg >= 1000']) # segments whose centroid-seg is equal to or above 1000.
csf('lachenmann.aiff', limit=['centroid-seg < 50%']) # only use 50% of segments with the lowest centroid-seg.
csf('lachenmann.aiff', limit=['power-seg < 50%', 'power-seg > 10%']) # only use segments whose power-seg falls between 10%-50% of the total range of power-seg's in this file/directory.
csf('lachenmann.aiff', segmentationFile='marmotTent.txt') # will use a segmentation file called marmotTent.txt, not the default lachenmann.aiff.txt.
csf('lachenmann.aiff', segmentationExtension='-gran.txt') # will use a segmentation file called lachenmann.aiff-gran.txt, not the default lachenmann.aiff.txt.
csf('sliced/my-directory', wholeFile=True) # will not search for a segmentation txt file, but use whole soundfiles as single segments.
csf('/Users/ben/gravillons', wholeFile=True, recursive=False) # will only use soundfiles in the named folder, ignoring its subdirectories.
# includeStr/excludeStr have lots of uses. One to highlight here: working with sample databases which are normalized. Rather than having each corpus segment be at 0dbs, we apply a scaleDb value based on the presence of a ‘dynamic' written into the filename.
csf('Vienna-harpNotes/', includeStr=['_f_', '_ff_'], scaleDb=-6),
csf('Vienna-harpNotes/', includeStr='_mf_', scaleDb=-18),
csf('Vienna-harpNotes/', includeStr='_p_', scaleDb=-30),
# this will use all sounds from this folder which match one of the three dynamics.
csf('lachenmann.aiff', onsetLen=0.1, offsetLen='50%') # will apply a 100ms fade in time and a fade out time lasting 50% of each segments' duration.
csf('piano/', transMethod='f0') # transpose corpus segments to match the target's f0.
csf('piano/', transMethod='f0-chroma', transQuantize=0.5) # transpose corpus segments to match the target's f0 mod 12. Then quantize each resulting pitch to the newest quarter of tone.
csf('lachenmann-mono.aiff', concatFileName='lachenmann.stereo.aiff'), # runs the descriptor analysis on lachenmann-mono.aiff, but then uses the audio from lachenmann.stereo.aiff when rendering the csound output.
csf('piano/', allowRepetition=False) # each individual segment found in this directory of files may only be deleted one time during concatenation.
csf('piano/', restrictRepetition=2.5) # Each segment is invalid to be picked if it has already been selected in the last 2.5 seconds.
CORPUS = [csf('lachenmann.aiff', scaleDb=-6), csf('piano/', scaleDb=-6, wholeFile=True)]
# is equivalent to
CORPUS_GLOBAL_ATTRIBUTES = {'scaleDb': -6}
CORPUS = [csf('lachenmann.aiff'), csf('piano/', wholeFile=True)]
Target segments and corpus segments in AudioGuide are normalized with d() object keywords, documented in the section below.
Normalization is important when measuring the similarity between two different data sets (target sound segments and corpus sound segments) - it is essentially a way of mapping the two data sets onto each other. It is often the case that the target soundfile is less "expressive" than the corpus - i.e. that the variation in the target sound's descriptors are less robust that the variation of the descriptors found across the corpus.This is the case in examples/01-simplest.py: Cage's voice is fairly monolithic with regard to timbre, amplitude, pitch, etc., which the Lachenmann contains a great deal more nuance in timbre and dynamics. Under these circumstances, what do we want AudioGuide to do? Should Cage's relatively small variation in descriptor space be matched to a relatively small subset of the corpus? Or should Cage's variation in descriptor space be streched to fit the total variation of the corpus? My default answer to this question (aesthetically, compositionally, musically) is the latter. And it is for that reason that AudioGuide's default mode of operating is to normalize target and corpus descirptor spaces separately so that the variability of the target is stretched to fit the variabley of the corpus.
This is done with the norm=2 flag in the d() object (norm=2 is the default value for the d() object); norm=1 normalizes the target and corpus descirptor spaces together so that descriptor values are better preserved. The two charts below show target and corpus segments from examples/01-simplest.py normalized together and separately to show the difference. Normalized descriptor values affect how similarity is measured and therefore which corpus segments will be picked to "match" target segments.
In addition to specifying whether or not target and corpus values are normalized together or separately, you can change the algorithm for normalization. By default, normmethod='stddev', but graphs below show two other possibilities. However, there are certainly moments when you would want to opt for norm=1. Take the descriptor f0-seg for instance. Given d('f0-seg', norm=1), AudioGuide will attempt to match f0s between the target and corpus as closely as possible. Given d('f0-seg', norm=2), AudioGuide will stretch the target's f0s to match the variablelity of f0s in the corpus, effectively matching the target's contour of pitch but not its exact pitches.d('descriptor name', weight=1, norm=2, normmethod='stddev', distance='euclidean', energyWeight=False)
SEARCH= [spass('closest', d('centroid', weight=1), d('noisiness', weight=0.5))]# centroid is twice as important as noisiness.
SEARCH = [spass('closest', d('centroid'), d('effDur-seg', norm=1))]
SEARCH = [spass('closest', d('centroid', distance='pearson'))] # uses a pearson correlation formula for determining distance between target and corpus centroid arrays.
spass(search_type, descriptor1...descriptorN, percent=None, minratio=None, maxratio=None)
SEARCH = [spass('closest', d('centroid'))] # will search all corpus segments and select the one with the ‘closest' time varying centroid to the target segment.
Note that the first argument is the type of search performed -- in this case, selecting the closest sample. Following the arguments are a list of descriptor objects which specify which descriptors to use:
SEARCH = [spass('closest', d('centroid'), d('effDur-seg'))] # will search all corpus segments and select the one with the ‘closest' centroid and effective duration compared to the target segment.
Ok, great. As you can probably imagine, the first argument, ‘closest’, tells AudioGuide to pick the closest sound. But, there are also other possibilities:
SEARCH = [spass('farthest', d('centroid'))] # return the worst matching segment.
SEARCH = [spass('closest_percent', d('centroid'), percent=20)] # return the top 20 percent best matches.
SEARCH = [spass('farthest_percent', d('centroid'), percent=20)] # return the worst 20 percent of matches.
SEARCH = [spass('ratio_limit', d('centroid-seg'), minratio=0.5)] # return segments who's centroid-seg is at least half the value of the target's centroid-seg
If you use ‘closest_percent’ or ‘farthest_percent’ as the one and only spass object in the SEARCH variable, AudioGuide will select a corpus segment randomly among the final candidates. However, you can also chain spass objects together, essentially constructing a hierarchical search algorithm. So, for example, take the following SEARCH variable with two separate phases:
SEARCH = [
spass('closest_percent', d('effDur-seg'), percent=20), # take the best 20% of matches from the corpus
spass('closest', d('mfccs')), # now find the best matching segment from the 20 percent that remains.
]
I use the above example a lot when using AudioGuide. It first matches effDur-seg, the effective duration of the target measured agains’t the effective duration of each corpus segment. It retains the 20% closest matches, and throws away the worst 80%. Then, with the remaining 20%, the timbre of the sounds are matched according to mfccs.
SEARCH = [spass('ratio_limit', d('centroid-seg'), minratio=0.9, maxratio=1.1)] # reduce the number of samples in the corpus such
Remember, the order of the spass objects in the SEARCH variable is very important -- it is essentially the order of operations.
SUPERIMPOSE = si(minSegment=None, maxSegment=None, minOnset=None, maxOnset=8, minOverlap=None, maxOverlap=None, searchOrder='power', calcMethod='mixture')
DESCRIPTOR_FORCE_ANALYSIS (type=bool, default=False) if True, AudioGuide is forced to remake all analysis file, even if previously made.
DESCRIPTOR_WIN_SIZE_SEC (type=float, default=0.04096) the FFT window size of descriptor analysis in seconds. Note that this value gets automatically rounded to the nearest power-of-two size based on IRCAMDESCRIPTOR_RESAMPLE_RATE. 0.04096 seconds = 512 @ 12.5kHz (the default resample rate).
DESCRIPTOR_HOP_SIZE_SEC (type=float, default=0.01024) the FFT window overlaps of descriptor analysis in seconds. Note that this value gets automatically rounded to the nearest power-of-two size based on IRCAMDESCRIPTOR_RESAMPLE_RATE. Important, as it effectively sets of temporal resolution of AudioGuide. Performance is best when DESCRIPTOR_HOP_SIZE_SEC/DESCRIPTOR_WIN_SIZE_SEC is a whole number.
IRCAMDESCRIPTOR_RESAMPLE_RATE (type=int, default=12500) The internal resample rate of the IRCAM analysis binary. Important, as it sets the frequency resolution of spectral sound descriptors.
IRCAMDESCRIPTOR_WINDOW_TYPE (type=string, default=‘blackman’) see ircamdescriptor documentation for details.
IRCAMDESCRIPTOR_NUMB_MFCCS (type=int, default=13) sets the number of MFCCs to make. When you use the descriptor 'mfccs' if will automatcially be expanded into mfccs1-N where N is IRCAMDESCRIPTOR_NUMB_MFCCS.
IRCAMDESCRIPTOR_F0_MAX_ANALYSIS_FREQ (type=float, default=5000) see ircamdescriptor documentation for details.
IRCAMDESCRIPTOR_F0_MIN_FREQUENCY (type=float, default=20) minimum possible f0 frequency.
IRCAMDESCRIPTOR_F0_MAX_FREQUENCY (type=float, default=5000) maximum possible f0 frequency.
IRCAMDESCRIPTOR_F0_AMP_THRESHOLD (type=float, default=1) Thresholding of the spectrum in F0 detection.
SUPERVP_BIN (type=string/None, default=None) Optionally specify a path to the supervp analysis binary. Used for target pre-concatenation time stretching.
ROTATE_VOICES (type=bool, default=False]) if True, AudioGuide will rotate through the list of corpus entries during concatenation. This means that, when selecting corpus segment one, AudioGuide will only search sound segments from the first item of the CORPUS variable. Selection 2 will only search the second, and so on. Corpus rotation is modular around the length of the CORPUS variable. If the corpus only has one item, True will have no effect.
VOICE_PATTERN (type=list, default=[]) if an empty list, this does nothing. However, if the user gives a list of strings, AudioGuide will rotate through this list of each concatenative selection and only use corpus segments who’s filepath match this string. Matching can use parts of the filename, not necessarily the whole path and it is not case sensitive.
OUTPUT_GAIN_DB (type=int/None, default=None) adds a uniform gain in dB to all selected corpus units. Affects the subtractive envelope calculations and descriptor mixtures as well as csound rendering.
RANDOM_SEED (type=int/None, default=None) sets the pseudo-random seed for random unit selection. By default a value of None will use the system’s timestamp. Setting an integer will create repeatable random results.
These options change selected corpus events after concatenative selection, meaning that they do not affect similarity calculations, but are applied afterwards. At the moment, all of these parameters affect the temporality of the corpus sounds. These parameters affect all output files - csound, midi, json, etc.
OUTPUTEVENT_ALIGN_PEAKS (type=bool, default=False) If True aligns the peak times of corpus segments to match those of the target segments. Thus, every corpus segment selected to represent a target segment will be moved in time such that corpus segment's peak amplitude is aligned with the target segment's.
OUTPUTEVENT_QUANTIZE_TIME_METHOD (type=string/None, default=None) controls the quantisation of the start times of events selected during concatenation. Note that any quantisation takes place after the application of OUTPUTEVENT_TIME_STRETCH and OUTPUTEVENT_TIME_ADD, as detailed below. This variable has the following possible settings:
OUTPUTEVENT_QUANTIZE_TIME_INTERVAL (type=float, default=0.25) defines the temporal interval in seconds for quantisation. If OUTPUTEVENT_QUANTIZE_TIME_METHOD = None, this doesn’t do anything.
OUTPUTEVENT_TIME_STRETCH (type=float, default=1.) stretch the temporality of selected units. A value of 2 will stretch all events offsets by a factor of 2. Does not affect corpus segment durations.
OUTPUTEVENT_TIME_ADD (type=float, default=0.) offset the start time of selected events by a value in seconds.
OUTPUTEVENT_DURATION_SELECT (type=string, default=cps) if 'cps', AudioGuide will play each selected corpus sample according to its duration. if 'tgt', AudioGuide will use the duration of the target segment a corpus segment was matched to to determine its playback duration. 'tgt' may result in clipped corpus sounds.
OUTPUTEVENT_DURATION_MIN (type=float/None, default=None) Sets a minimum duration for all selected corpus segments. When this variable is a float, the length of each segment's playback will be at least OUTPUTEVENT_DURATION_MIN seconds (unless the soundfile is too short, in which case the duration will be the soundfile's duration). If None is given (the default), no minimum duration is enforced.
OUTPUTEVENT_DURATION_MAX (type=float/None, default=None) Sets a maximum duration for all selected corpus segments. When this variable is a float, the length of each segment's playback cannot be more than OUTPUTEVENT_DURATION_MAX seconds. If None is given (the default), no maximum duration is enforced.
For each of the following _FILEPATH variables, a value of None tells the agConcatenate.py NOT to create an output file. Otherwise a string tells agConcatenate.py to create this output file and also indicates the path of the file to create. Strings may be absolute paths. If a relative path is given, AudioGuide will create the file relative to the location o the agConcatenate.py script.
CSOUND_CSD_FILEPATH (type=string/None, default=‘output/output.csd’) creates an output csd file for rendering the resulting concatenation with csound.
CSOUND_RENDER_FILEPATH (type=string/None, default=‘output/output.aiff’) sets the sound output file in the CSOUND_CSD_FILEPATH file. This is the name of csound’s output soundfile and will be created at the end of concatenation.
MIDI_FILEPATH (type=string/None, default=None) a midi file of the concatenation with pitches chosen according to midiPitchMethod from each corpus entry.
OUTPUT_LABEL_FILEPATH (type=string/None, default=‘output/outputlabels.txt’) Audacity-style labels showing the selected corpus sounds and how they overlap.
LISP_OUTPUT_FILEPATH (type=string/None, default=‘output/output.lisp.txt’) a textfile containing selected corpus events as a lisp-style list.
DATA_FROM_SEGMENTATION_FILEPATH (type=string/None, default=None) This file lists all of the extra data of selected events during concatenation. This data is taken from corpus segmentation files, and includes everything after the startTime and endTime of each segment. This is useful if you want to tag each corpus segment with text based information for use later.
DICT_OUTPUT_FILEPATH (type=string/None, default=‘output/output.json’) a textfile containing selected corpus events in json format.
MAXMSP_OUTPUT_FILEPATH (type=string/None, default=‘output/output.maxmsp.json’) a textfile containing a list of selected corpus events. Data includes starttime in MS, duration in MS, filename, transposition, amplitude, etc.
For each of the following _FILEPATH variables, a value of None tells the agConcatenate.py NOT to create an output file. Otherwise a string tells agConcatenate.py to create this output file and also indicates the path of the file to create. Strings may be absolute paths. If a relative path is given, AudioGuide will create the file relative to the location o the agConcatenate.py script.
HTML_LOG_FILEPATH (type=string/None, default=‘output/log.html’) a log file with lots of information from the concatenation algorithm.
TARGET_SEGMENT_LABELS_FILEPATH (type=string/None, default=‘output/targetlabels.txt’) Audacity-style labels showing how the target sound was segmented.
TARGET_SEGMENTATION_GRAPH_FILEPATH (type=string/None, default=None) like TARGET_SEGMENT_LABELS_FILEPATH, this variable creates a file to show information about target segmentation. Here however, the output is a jpg graph of the onset and offset times and the target’s power. This output requires you to install python’s module matplotlib.
TARGET_DESCRIPTORS_FILEPATH (type=string/None, default=None) saves the loaded target descriptors to a json dictionary.
TARGET_PLOT_DESCRIPTORS_FILEPATH (type=string/None, default=None) creates a plot of each target descriptor used in concatenation. Doesn’t create plots for averaged descriptors (“-seg”), only time varying descriptors.
SEARCH_PATHS (type=list, default=[]) a list of strings, each of which is a path to a directory where soundfile are located. These paths extend the list of search paths that AudioGuide examines when searching for target and corpus soundfiles. The default is an empty list, which doesn’t do anything.
VERBOSITY (type=int, default=2) affects the amount of information AudioGuide prints to the terminal. A value of 0 yields nothing. A value of 1 prints a minimal amount of information. A value of 2 (the default) prints refreshing progress bars to indicate the progress of the algorithms.
PRINT_SELECTION_HISTO (type=bool, default=False) if True will print robust information about corpus selection after concatenation. If false (the default) will add this information to the log file, if used.
PRINT_SIM_SELECTION_HISTO (type=bool, default=False) if True will print robust information about corpus overlapping selection after concatenation. If false (the default) will add this information to the log file, if used.
CSOUND_SR (type=int, default=48000) The sample rate used for csound rendering. Csound will interpolate the sample rates of all corpus files to this rate. It will be the sr of csound’s output soundfile.
CSOUND_BITS (type=int, default=16) The bitrate of the csound output soundfile. Valid values are 16, 24, and 32.
CSOUND_KSMPS (type=int, default=128) The ksmps value used for csound rendering. See csound’s documentation for more information.
CSOUND_NORMALIZE (type=bool, default=False) If True normalize csound's output soundfile after rendering.
CSOUND_NORMALIZE_PEAK_DB (type=float, default=-3) If CSOUND_NORMALIZE is True, normalization of this peak dB will take place.
CSOUND_PLAY_RENDERED_FILE (type=bool, default=True) if True, AudioGuide will play the rendered csound file at the command line at the end of the concatenative algorithm.
CSOUND_STRETCH_CORPUS_TO_TARGET_DUR (type=string/None, default=None) Affects the durations of concatenated sound events rendered by csound. By default None doesn’t do anything -- csound plays back each corpus sound according to its duration. “pv” uses a phase vocoder to stretch corpus sounds to match the duration of the corresponding target segment. “transpose” does the same, but using the speed of playback to change duration rather than a phase vocoder. Note that, in this case, any other transposition information generated by the selection algorithm is overwritten.
CSOUND_CHANNEL_RENDER_METHOD (type=string, default=corpusmax) Tells AudioGuide how deal with corpus segments distribution in the output soundfile. By default‘’corpusmax” creates an output soundfile with the same number of channels as the maxmimum of all corpus files (corpus files nmay have different channel counts). The string “stereo” mixes all corpus sounds into a 2-channel soudnfile. The string “oneChannelPerVoice” tells AudioGuide to put selected sounds from each item of the CORPUS list into a separate channel. The number of output channels will therefore equal the length of the CORPUS list variable.
MIDIFILE_TEMPO (type=int/float, default=60) sets the tempo of the midi file output.
agGetSfDescriptors.py is a simple script that exports continuous descriptor values of a soundfile. The output format is a json dictionary. To use it simply call the script with the soundfile as the first argument and the output path as the second argument.
python agGetSfDescriptors.py examples/lachenmann.aiff examples/lachenmann.json
Descriptor Description | AudioGuide Averaged Descriptor | AudioGuide Time Varying Descriptor |
---|---|---|
duration | dur-seg | |
effective duration | effDur-seg1 | |
peak amplitude in frames | peakTime-seg | |
log attack time in seconds | logAttackTime-seg | |
time in file in percent | percentInFile-seg | |
midi pitch from filename | MIDIPitch-seg2 | |
amplitude | power-seg, power-mean-seg3 | power |
spectral centroid | centroid-seg | centroid |
combination of 4 spectral crest descriptors | crests-seg | crests |
spectral crest coefficient 1 ... 4 | crest0-seg ... crest3-seg | crest0 ... crest3 |
spectral decrease | decrease-seg | decrease |
combination of 4 spectral flatness descriptors | flatnesses-seg | flatnesses |
spectral flatness coefficient 1 ... 4 | flatness0-seg ... flatness3-seg | flatness0 ... flatness3 |
spectral kurtosis | kurtosis-seg | kurtosis |
spectral noisiness | noisiness-seg | noisiness |
spectral rolloff | rolloff-seg | rolloff |
spectral sharpness | sharpness-seg | sharpness |
spectral skewness | skewness-seg | skewness |
spectral slope | slope-seg | slope |
spectral spread | spread-seg | spread |
spectral variation | variation-seg | variation |
fundamental frequency | f0-seg4 | f0 |
zero crossings | zeroCross-seg | zeroCross |
combination of 12 mel frequency cepstral coefficients | mfccs-seg | mfccs |
mel frequency cepstral coefficient 1 ... 12 | mfcc1-seg ... mfcc12-seg | mfcc1 ... mfcc12 |
combination of 12 chroma descriptors | chromas-seg | chromas |
chroma 1 ... 12 | chroma0-seg ... chroma11-seg | chroma0 ... chroma11 |
harmonic spectral centroid | harmoniccentroid-seg | harmoniccentroid |
harmonic spectral decrease | harmonicdecrease-seg | harmonicdecrease |
harmonic spectral deviation | harmonicdeviation-seg | harmonicdeviation |
harmonic spectral kurtosis | harmonickurtosis-seg | harmonickurtosis |
harmonic spectral odd-even ratio | harmonicoddevenratio-seg | harmonicoddevenratio |
harmonic spectral rolloff | harmonicrolloff-seg | harmonicrolloff |
harmonic spectral skewness | harmonicskewness-seg | harmonicskewness |
harmonic spectral slope | harmonicslope-seg | harmonicslope |
harmonic spectral spread | harmonicspread-seg | harmonicspread |
harmonic spectral tristimulus coefficient 1 | harmonictristimulus0-seg | harmonictristimulus0 |
harmonic spectral tristimulus coefficient 2 | harmonictristimulus1-seg | harmonictristimulus1 |
harmonic spectral tristimulus coefficient 3 | harmonictristimulus2-seg | harmonictristimulus2 |
harmonic spectral variation | harmonicvariation-seg | harmonicvariation |
perceptual spectral centroid | perceptualcentroid-seg | perceptualcentroid |
perceptual spectral decrease | perceptualdecrease-seg | perceptualdecrease |
perceptual spectral deviation | perceptualdeviation-seg | perceptualdeviation |
perceptual spectral kurtosis | perceptualkurtosis-seg | perceptualkurtosis |
perceptual spectral odd-even ratio | perceptualoddtoevenratio-seg | perceptualoddtoevenratio |
perceptual spectral rolloff | perceptualrolloff-seg | perceptualrolloff |
perceptual spectral skewness | perceptualskewness-seg | perceptualskewness |
perceptual spectral slope | perceptualslope-seg | perceptualslope |
perceptual spectral spread | perceptualspread-seg | perceptualspread |
perceptual spectral tristimulus coefficient 1 | perceptualtristimulus0-seg | perceptualtristimulus0 |
perceptual spectral tristimulus coefficient 2 | perceptualtristimulus1-seg | perceptualtristimulus1 |
perceptual spectral tristimulus coefficient 3 | perceptualtristimulus2-seg | perceptualtristimulus2 |
perceptual spectral variation | perceptualvariation-seg | perceptualvariation |
Descriptor append string | Effect | Example |
---|---|---|
-delta | First order difference of the time varying series | centroid-delta = the first order difference of time varying centroids |
-delta-seg | Average of the first order difference of the time varying series | centroid-delta-seg = a power-weighted average of the first order difference of time varying centroids |
-deltadelta | Second order difference of the time varying series | power-deltadelta = the second order difference of time varying powers |
-deltadelta-seg | Average of the second order difference of the time varying series | zeroCross-deltadelta-seg = a power-weighted average of the second order difference of time varying zero crossings |
-slope-seg | Slope regression of the timevarying series | power-slope-seg = a linear regression descripbing the slope of the time varying series of powers |
SEARCH = [
spass('closest_percent', d('power-slope-seg'), percent=20), # take the best 20% of matches from the corpus
spass('closest', d('f0-delta-seg'), d('centroid-deltadelta')), # now find the best matching segment from the 20 percent that remains.
]