pyatsyn

ats_io

ATS File I/O

Functions for handling loading and saving of .ats files.

.ats files are written in binary format using double floats.

The expected structure of a .ats is:

ATS Header (all double floats)
`ATS_MAGIC_NUMBER`
sampling-rate (samples/sec)
frame-size (samples)
window-size (samples)
partials (number)
frames (number)
ampmax (max. amplitude)
frqmax (max. frequency)
dur (duration)
type (# specifying frame type, see below)

The frame data immediately follows the header, again as double floats, frame by frame, in the layout determined by the type (int) specified in the header.

ATS frames can be one of four different types:

TYPE 1: NO phase & NO noise	TYPE 2: WITH phase & NO noise	TYPE 3: NO phase & WITH noise	TYPE 4: WITH phase & WITH noise
time (frame starting time)	time (frame starting time)	time (frame starting time)	time (frame starting time)
amp (partial #0 amplitude)	amp (partial #0 amplitude)	amp (partial #0 amplitude)	amp (partial #0 amplitude)
frq (partial #0 frequency)	frq (partial #0 frequency)	frq (partial #0 frequency)	frq (partial #0 frequency)
…	pha (partial #0 phase)	…	pha (partial #0 phase)
amp (partial #n amplitude)	…	amp (partial #n amplitude)	…
frq (partial #n frequency)	amp (partial #n amplitude)	frq (partial #n frequency)	amp (partial #n amplitude)
	frq (partial #n frequency)	noise (band #0 energy)	frq (partial #n frequency)
	pha (partial #n phase)	…	pha (partial #n phase)
		noise (band #n energy)	noise (band #0 energy)
			…
			noise (band #n energy)
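The four layouts above imply a fixed number of doubles per frame. The following is a minimal sketch of that count, assuming one time value, two or three values per partial, and one value per noise band (25 by default). The helper name `doubles_per_frame` is hypothetical and is not part of pyatsyn.

```python
def doubles_per_frame(frame_type, partials, bands=25):
    """Count the doubles stored in one frame of the given ATS frame type."""
    if frame_type not in (1, 2, 3, 4):
        raise ValueError("frame type must be 1, 2, 3, or 4")
    has_phase = frame_type in (2, 4)
    has_noise = frame_type in (3, 4)
    per_partial = 3 if has_phase else 2   # amp, frq (and pha when present)
    count = 1 + per_partial * partials    # leading time value plus partial data
    if has_noise:
        count += bands                    # one energy value per noise band
    return count
```

Multiplying the result by 8 bytes and by the number of frames gives the expected size of the frame data under these assumptions.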

pyatsyn.ats_io.ATS_MAGIC_NUMBER

‘magic’ number used to validate and check endianness when using .ats files: 123.0

Type: float
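As an illustration of how the magic number can be used to check endianness, the sketch below decodes the ten header doubles with both byte orders and keeps whichever yields 123.0. It assumes the header field order listed above; `parse_ats_header` and `HEADER_FIELDS` are hypothetical names, and this is not the code ats_load uses.

```python
import struct

ATS_MAGIC_NUMBER = 123.0
HEADER_FIELDS = ("sampling_rate", "frame_size", "window_size", "partials",
                 "frames", "ampmax", "frqmax", "dur", "type")

def parse_ats_header(raw):
    """Decode the 10-double ATS header, trying both byte orders."""
    n_doubles = 1 + len(HEADER_FIELDS)
    for byte_order in ("<", ">"):
        values = struct.unpack(f"{byte_order}{n_doubles}d", raw[:8 * n_doubles])
        if values[0] == ATS_MAGIC_NUMBER:
            header = dict(zip(HEADER_FIELDS, values[1:]))
            header["byte_order"] = byte_order
            return header
    raise ValueError("ATS magic number was not decodable")

# Build a fake big-endian header in memory to exercise the parser.
example = struct.pack(">10d", 123.0, 44100.0, 512.0, 2048.0,
                      30.0, 100.0, 0.5, 8000.0, 1.2, 4.0)
header = parse_ats_header(example)
```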

pyatsyn.ats_io.ats_info(file, partials_info=False)[source]

Function to print information about a .ats file to stdout

Parameters

file (str) – an .ats file to print information about
partials_info (bool, optional) – whether to include frq and amp averages about each partial in the output (default: False)

pyatsyn.ats_io.ats_info_CLI()[source]

Command line wrapper for ats_info

Example

Display usage details with help flag

$ pyatsyn-info -h

Print the header information of a .ats file

$ pyatsyn-info example.ats

Print the header information and partials information of a .ats file

$ pyatsyn-info example.ats -p

pyatsyn.ats_io.ats_load(file, optimize=False, min_gap_size=None, min_segment_length=None, amp_threshold=None, highest_frequency=None, lowest_frequency=None)[source]

Function to load a .ats file into python

Loads a .ats file into python and provides routines to re-optimize the AtsSound data if required.

Parameters

file (str) – filepath to .ats file to load
optimize (bool, optional) – determines whether to call optimize upon load (default: False)
min_gap_size (int, optional) – when optimizing, tracked partial gaps smaller than or equal to this (in frames) will be interpolated and filled. If None, no gap filling will occur (default: None)
min_segment_length (int, optional) – when optimizing, minimal size (in frames) of a valid partial segment; shorter segments are pruned. If None, no segment pruning will occur (default: None)
amp_threshold (float, optional) – minimum amplitude threshold used during optimization to prune tracks. If None, no amplitude thresholding will occur (default: None)
highest_frequency (float, optional) – maximum frequency threshold used during optimization to prune tracks. If None, no maximum frequency thresholding will occur (default: None)
lowest_frequency (float, optional) – minimum frequency threshold used during optimization to prune tracks. If None, no minimum frequency thresholding will occur (default: None)

Returns

the loaded ats data

Return type

AtsSound

Raises

ValueError – if file is not a compatible ATS format (i.e., the ATS magic number was not decodable)

pyatsyn.ats_io.ats_save(sound, file, save_phase=True, save_noise=True)[source]

Function to save an AtsSound to a file

Parameters

sound (AtsSound) – ats sound object to save
file (str) – file path to save to
save_phase (bool, optional) – whether to include phase data in file output (default: True)
save_noise (bool, optional) – whether to include noise band energy data in file output (default: True)
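The two flags correspond to the four frame types described in the file structure. A minimal sketch of that correspondence follows; the function name `frame_type_for` is hypothetical, and how ats_save derives the type internally is not shown in this reference.

```python
def frame_type_for(save_phase=True, save_noise=True):
    """Map the phase/noise flags to the ATS frame type number (1 to 4)."""
    if save_phase and save_noise:
        return 4   # WITH phase & WITH noise
    if save_noise:
        return 3   # NO phase & WITH noise
    if save_phase:
        return 2   # WITH phase & NO noise
    return 1       # NO phase & NO noise
```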

ats_structure

Data Abstraction for ATS

class pyatsyn.ats_structure.AtsPeak(amp=0.0, frq=0.0, pha=0.0, smr=0.0, track=0, db_spl=0.0, barkfrq=0.0, slope_r=0.0, asleep_for=None, duration=1)[source]

Bases: object

Data abstraction for storing single peak, single timepoint data for peak tracking

Used primarily as a data-store during the peak tracking phase of analysis.

amp

the amplitude of the peak

Type: float

frq

the frequency (in Hz) of the peak

Type: float

pha

the phase (in radians) of the peak

Type: float

smr

the signal-to-mask ratio (in dB SPL) of the peak

Type: float

track

the corresponding tracked partial the peak is assigned to

Type: int

db_spl

peak amplitude in dB SPL (used during SMR evaluation)

Type: float

bark_frq

frequency in bark scale (used during SMR evaluation)

Type: float

slope_r

right slope of masking curve (used during SMR evaluation)

Type: float

asleep_for

sleep counter (in frames) (used during peak tracking)

Type: int

duration

active counter (in frames) (used during peak tracking)

Type: int

frq_max

maximum frequency (used in track data during optimization)

Type: float

amp_max

maximum amplitude (used in track data during optimization)

Type: float

frq_min

minimum frequency (used in track data during optimization)

Type: float

clone()[source]

Function to return a copy of an AtsPeak

Returns: a copy of the calling AtsPeak object
Return type: AtsPeak

class pyatsyn.ats_structure.AtsSound(sampling_rate, frame_size, window_size, partials, frames, dur, has_phase=True)[source]

Bases: object

Main data abstraction for ATS

Parameters

sampling_rate (int) – sampling rate (samples/sec)
frame_size (int) – interframe distance (in samples)
window_size (int) – size (in samples) of the FFT window used to analyze the sound
partials (int) – number of partials/tracks stored
frames (int) – number of frames of analysis
dur (float) – duration (in s) of the sound
has_phase (bool, optional) – whether to initialize the phase information data structure (default: True)

sampling_rate

sampling rate (samples/sec)

Type: int

frame_size

interframe distance (in samples)

Type: int

window_size

size (in samples) of the FFT window used to analyze the sound

Type: int

partials

number of partials/tracks stored

Type: int

frames

number of frames of analysis

Type: int

dur

duration (in s) of the sound

Type: float

optimized

whether the object has been through optimization yet

Type: bool

amp_max

maximum amplitude of the sound

Type: float

frq_max

maximum frequency (in Hz) of the sound

Type: float

frq_av

1D array of average frequency (in Hz) for each partial

Type: ndarray[float]

amp_av

1D array of average amplitude for each partial

Type: ndarray[float]

time

1D array of the time (in s) corresponding to each frame

Type: ndarray[float]

frq

2D array storing frequency (in Hz) for each partial at each frame

Type: ndarray[float]

amp

2D array storing amplitude for each partial at each frame

Type: ndarray[float]

pha

2D array storing phase (in radians) for each partial at each frame. None if no phase information is stored.

Type: ndarray[float]

energy

2D array storing the noise band energy assigned to each partial at each frame. NOTE: Currently only implemented for legacy purposes. Empty list if no noise information is stored.

Type: ndarray[float]

band_energy

2D array of noise band energies for each band at each frame. Empty list if no noise information is stored.

Type: ndarray[float]

bands

1D array of unique indices to label each noise band. Empty list if no noise information is stored.

Type: ndarray[int]

clone()[source]

Function to return a deep copy of an AtsSound

Returns: a deep copy of the calling AtsSound object
Return type: AtsSound

optimize(min_gap_size=None, min_segment_length=None, amp_threshold=None, highest_frequency=None, lowest_frequency=None)[source]

Function to run optimization routines on the frames of partial data stored in the object.

The optimizations performed are:

fill gaps of min_gap_size or shorter
trim short partials
calculate and store maximum and average frq and amp
prune partials below amplitude threshold
prune partials outside frequency constraints
re-order partials according to average frq

Parameters

min_gap_size (int, optional) – partial gaps longer than this (in frames) will not be interpolated and filled in. If None, this sub-optimization will be skipped. (default: None)
min_segment_length (int, optional) – minimal size (in frames) of a valid partial segment, otherwise it is pruned. If None, this sub-optimization will be skipped. (default: None)
amp_threshold (float, optional) – amplitude threshold used to prune partials. If None, this sub-optimization will be skipped. (default: None)
highest_frequency (float) – upper frequency threshold, tracks with maxima above this will be pruned. If None, this sub-optimization will be skipped. (default: None)
lowest_frequency (float) – lower frequency threshold, tracks with minima below this will be pruned. If None, this sub-optimization will be skipped. (default: None)
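To illustrate the gap-filling step, here is a minimal sketch for a single partial. It assumes a gap is a run of zero values bounded by non-zero values on both sides, and that gaps of min_gap_size frames or fewer are filled by linear interpolation. The function `fill_gaps` is hypothetical and operates on a plain list rather than on an AtsSound.

```python
def fill_gaps(values, min_gap_size):
    """Linearly interpolate runs of zeros no longer than min_gap_size."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] != 0.0:
            i += 1
            continue
        start = i                          # first zero of the run
        while i < n and filled[i] == 0.0:
            i += 1
        end = i                            # index just past the run
        gap = end - start
        # Only fill interior gaps that are short enough.
        if start > 0 and end < n and gap <= min_gap_size:
            left, right = filled[start - 1], filled[end]
            for k in range(gap):
                frac = (k + 1) / (gap + 1)
                filled[start + k] = left + frac * (right - left)
    return filled
```

Leading and trailing runs of zeros are left untouched in this sketch, since they have no value on one side to interpolate from.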

ats_synth

Synthesizer Methods for Rendering .ats Files to Audio

pyatsyn.ats_synth.synth(ats_snd, normalize=False, compute_phase=True, export_file=None, sine_pct=1.0, noise_pct=0.0, noise_bands=None, normalize_sine=False, normalize_noise=False)[source]

Function to synthesize audio from AtsSound

Sine generator bank and band-limited noise synthesizer for .ats files. When phase information is ignored, phase is linearly interpolated between consecutive frequencies from an initial phase of 0.0 at the first non-zero amplitude for that partial.

The method for cubic polynomial interpolation of phase used is credited to:

R. McAulay and T. Quatieri, “Speech analysis/Synthesis based on a sinusoidal representation,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 744-754, August 1986

doi: 10.1109/TASSP.1986.1164910.

Parameters

ats_snd (AtsSound) – the .ats file used to synthesize
normalize (bool, optional) – normalize sound to ±1 before output (default: False)
compute_phase (bool, optional) – use cubic polynomial interpolation of phase information during synthesis, if available (default: True)
export_file (str) – audio file path to write synthesis to, or None for no file output (default: None)
sine_pct (float) – percentage of sine components to mix into output (default: 1.0)
noise_pct (float) – percentage of noise components to mix into output (default: 0.0)
noise_bands (ndarray[float]) – 1D array of band edges to use for noise analysis. Currently using other than 25 bands (i.e. 26 edges) is not fully supported. If None, ATS_CRITICAL_BAND_EDGES will be used. (default: None)
normalize_sine (bool) – normalize sine components to ±1 before mixing (default: False)
normalize_noise (bool) – normalize noise components to ±1 before mixing (default: False)

Returns

A 1D array of amplitudes representing the synthesized sound

Return type

ndarray[float]

pyatsyn.ats_synth.synth_CLI()[source]

Command line wrapper for synth

Example

Display usage details with help flag

$ pyatsyn-synth -h

Generate a wav file from a sine generator bank from an ats file

$ pyatsyn-synth example.ats example.wav

Generate a wav file from a sine generator bank and band-limited noise from an ats file

$ pyatsyn-synth example.ats example.wav --noise 1.0

atsa module

critical_bands

Critical Bands and Signal-to-Mask Ratio Evaluation

This module is used to evaluate critical band masking for signal-to-mask ratio calculations

pyatsyn.atsa.critical_bands.ATS_CRITICAL_BAND_EDGES

1D array containing 26 frequencies that distinguish the default 25 critical bands

Type: ndarray[float]

pyatsyn.atsa.critical_bands.compute_slope_r(masker_amp_db, slope_l=-27.0)[source]

Function to compute right slope of triangular mask

Computes the right slope of mask, dependent on the level of the masker

Parameters

masker_amp_db (float) – Amplitude (in dB) of the masker peak
slope_l (float, optional) – slope (in dB / bark) of the lower frequency side of the masking triangle

pyatsyn.atsa.critical_bands.evaluate_smr(peaks, slope_l=-27.0, delta_db=-50)[source]

Function to evaluate signal-to-mask ratio for the given peaks

This function evaluates masking values (SMR) for each AtsPeak in the list peaks. Iteratively, the parameters are used to generate a triangular mask with a primary vertex at the frequency of, and delta_db below the amplitude of, the masker.

All other peaks are evaluated based on the triangular edges descending from the primary vertex according to slope_l for lower frequencies, and a calculated slope for higher frequencies. The proportion of the maskee amplitude above this edge is then assigned to the maskee peak’s smr property. By the end of the iteration, the largest smr seen as maskee is kept in the peak’s smr property.

Parameters

peaks (Iterable[AtsPeak]) – An iterable collection of AtsPeaks that will have their smr attributes updated
slope_l (float, optional) – A float (in dB/bark) to dictate the slope of the left side of the mask (default: -27.0)
delta_db (float, optional) – A float (in dB) that sets the amplitude threshold for the masking curves. Must be <= 0 dB (default: -50)

Raises

ValueError – If delta_db is not less than or equal to 0.

pyatsyn.atsa.critical_bands.find_band(freq)[source]

Function to retrieve lower band edge in ATS_CRITICAL_BAND_EDGES

Parameters: freq (float) – A frequency (in Hz) to find the related band in ATS_CRITICAL_BAND_EDGES for
Returns: index into ATS_CRITICAL_BAND_EDGES that marks the lower band edge for the given freq
Return type: int
Raises: LookupError – if the frequency given is outside the range of the lowest or highest edge in ATS_CRITICAL_BAND_EDGES
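A lookup of this kind can be sketched with a binary search over the sorted edges. The edge values below are an assumption (the conventional Zwicker critical band edges extended to 20 kHz) and may differ from the contents of ATS_CRITICAL_BAND_EDGES; `lower_band_edge_index` is a hypothetical name, and treating the highest edge as part of the last band is a choice made for this sketch.

```python
from bisect import bisect_right

# Assumed band edges in Hz: 26 edges delimiting 25 bands.
BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
              1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
              9500, 12000, 15500, 20000]

def lower_band_edge_index(freq, edges=BAND_EDGES):
    """Return the index of the band edge at or just below freq."""
    if freq < edges[0] or freq > edges[-1]:
        raise LookupError(f"frequency {freq} is outside the band edges")
    # bisect_right - 1 is the last edge <= freq; clamp so the top edge
    # maps to the last band rather than to a band that does not exist.
    return min(bisect_right(edges, freq) - 1, len(edges) - 2)
```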

pyatsyn.atsa.critical_bands.frq_to_bark(freq)[source]

Function to convert frequency from Hz to bark scale

This function will convert frequency from Hz to bark scale, a psychoacoustical scale used for subjective measurements of loudness.

Parameters: freq (float) – A frequency (in Hz) to convert to bark scale
Returns: the frequency in bark scale
Return type: float
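For orientation, one widely used closed-form Hz-to-bark conversion is the Zwicker and Terhardt approximation, sketched below. This is a substituted technique shown for illustration only; frq_to_bark may use a different mapping, and `hz_to_bark_zwicker` is a hypothetical name.

```python
import math

def hz_to_bark_zwicker(freq):
    """Approximate bark value for a frequency in Hz (Zwicker & Terhardt)."""
    return 13.0 * math.atan(0.00076 * freq) + 3.5 * math.atan((freq / 7500.0) ** 2)
```

With this approximation, 1000 Hz maps to roughly 8.5 bark.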

peak_detect

Single-Frame Peak Detection from FFT Data

Functions to process FFT data and extract peaks

pyatsyn.atsa.peak_detect.parabolic_interp(alpha, beta, gamma)[source]

Function to obtain a parabolically modeled maximum from 3 points

Given 3 evenly-spaced points, a parabolic interpolation scheme is used to calculate a coordinate frequency offset and maximum amplitude at the estimated parabolic apex.

Expected: alpha <= beta >= gamma (beta is the local maximum)

Parameters

alpha (float) – Amplitude at lower frequency
beta (float) – Amplitude at center frequency
gamma (float) – Amplitude at upper frequency

Returns

offset (float) – Frequency offset (in samples) relative to center frequency bin
height (float) – Amplitude of estimated parabolic apex
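The standard three-point parabolic fit can be sketched as follows. This version works directly on the values it is given; whether parabolic_interp first converts amplitudes to another scale (for example dB) is not stated here, so treat that as an open assumption. The name `parabolic_apex` is hypothetical.

```python
def parabolic_apex(alpha, beta, gamma):
    """Return (offset, height) of the parabola through three even samples.

    The samples are taken at positions -1, 0, and +1, with beta the largest.
    """
    denom = alpha - 2.0 * beta + gamma
    if denom == 0.0:
        return 0.0, beta                       # flat: no curvature to fit
    offset = 0.5 * (alpha - gamma) / denom     # apex position relative to center
    height = beta - 0.25 * (alpha - gamma) * offset
    return offset, height
```

For samples of y = 5 - (x - 0.3)^2 at x = -1, 0, 1 (3.31, 4.91, 4.51) the fit recovers an offset of 0.3 and a height of 5.0.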

pyatsyn.atsa.peak_detect.peak_detection(fftfreqs, fftmags, fftphases, lowest_bin=None, highest_bin=None, lowest_magnitude=None)[source]

Function to detect peaks from FFT data

This function scans for peaks in FFT frequency data, returning found peaks that pass constraint criteria. Because FFT data is restricted to discrete bins, interpolation is used to provide a more precise estimation of amplitude, phase, and frequency.

Parameters

fftfreqs (ndarray[float64]) – A 1D array of frequency labels (in Hz) corresponding to fftmags and fftphases
fftmags (ndarray[float64]) – A 1D array of FFT magnitudes for each frequency in fftfreqs; this is the data where we search for the peaks.
fftphases (ndarray[float64]) – A 1D array of FFT phases (in radians) for each index in fftfreqs and fftmags
lowest_bin (int, optional) – Lower limit bin index used to restrict what bins of fftfreqs are searched (default: None)
highest_bin (int, optional) – Upper limit bin index used to restrict what bins of fftfreqs are searched (default: None)
lowest_magnitude (float, optional) – Minimum amplitude threshold that must be exceeded for a peak to be validly detected (default: None)

Returns

A list of AtsPeak constructed from detected peaks

Return type

list[AtsPeak]

pyatsyn.atsa.peak_detect.phase_correct(left, right, offset)[source]

Function for angular interpolation of phase

Parameters

left (float) – Phase value (in radians) to interpolate between
right (float) – Other phase value (in radians) to interpolate between
offset (float) – Phase offset (in samples) between left and right at which to calculate

Returns

interpolated phase (in radians)

Return type

float

peak_tracking

Peak Tracking algorithms to assemble spectral trajectories

Peaks issued by the peak detection algorithm need to be connected and translated into spectral trajectories. This process involves the evaluation of the possible candidates to continue trajectories on a frame-by-frame basis.

This is done using tracks that keep information of recent average values for each of the trajectory parameters. The length of the tracks is adjustable and has to be tuned depending on the characteristics of the analyzed sound.

A Gale-Shapley stable matching algorithm is used to determine the best candidate pair using the cost criterion:

\(cost = \frac{|P_{freq} - T_{freq}| + \alpha * |P_{smr} - T_{smr}|}{1 + \alpha}\)

where \(P_{freq}\) is the candidate peak frequency, and \(P_{smr}\) its SMR, \(T_{freq}\) is the track frequency, and \(T_{smr}\) its SMR, both averaged over the track length (typically 3 frames). \(\alpha\) is a coefficient controlling how much the SMR deviation affects the cost.

The use of the SMR continuation as a parameter for the peak tracking process is based upon psychoacoustic temporal masking phenomena. Conceptually, we assume that masking profiles of stable sinusoidal trajectories can only evolve at a slow rate (no sudden changes). This is true for analysis performed with hop sizes between 10 and 50 milliseconds, which is comparable to the average duration of pre- and post-masking effects.

New tracks get created from orphan peaks (the ones that were not incorporated to any existing tracks), and tracks which couldn’t find continuing peaks are set to sleep.
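The matching step can be illustrated with a generic Gale-Shapley sketch over a precomputed cost matrix, where peaks propose to tracks in order of increasing cost and each track keeps its cheapest proposer. This is not the library's adaptation: it ignores the frequency deviation constraint and sleeping tracks, and `stable_match` is a hypothetical name.

```python
def stable_match(cost):
    """Match peaks (rows) to tracks (columns) of a cost matrix.

    Returns a dict {peak_index: track_index}; unmatched peaks are omitted.
    """
    n_peaks = len(cost)
    n_tracks = len(cost[0]) if cost else 0
    # Each peak ranks the tracks from cheapest to most expensive.
    prefs = [sorted(range(n_tracks), key=lambda t, p=p: cost[p][t])
             for p in range(n_peaks)]
    next_choice = [0] * n_peaks    # next track each peak will propose to
    holder = {}                    # track index -> peak currently holding it
    free = list(range(n_peaks))
    while free:
        p = free.pop()
        if next_choice[p] >= n_tracks:
            continue               # list exhausted: the peak stays an orphan
        t = prefs[p][next_choice[p]]
        next_choice[p] += 1
        rival = holder.get(t)
        if rival is None:
            holder[t] = p
        elif cost[p][t] < cost[rival][t]:
            holder[t] = p          # the track prefers the cheaper peak
            free.append(rival)
        else:
            free.append(p)         # rejected: try the next track later
    return {p: t for t, p in holder.items()}
```

Peaks missing from the returned mapping play the role of the orphan peaks described above, and tracks missing from it are the ones that would be put to sleep.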

class pyatsyn.atsa.peak_tracking.MatchCost(cost, index)[source]

Bases: object

Object to abstract cost for comparisons

cost

the calculated cost to index

Type: float

index

the index that indicates the track the cost was calculated against

Type: int

pyatsyn.atsa.peak_tracking.are_valid_candidates(candidate1, candidate2, deviation)[source]

Function to determine if the distance between two peaks is within the relative deviation constraint

Peaks are valid candidates for pairing if their absolute frequency distance is smaller than the frequency deviation multiplied by the lower of the candidates’ frequencies.

Parameters

candidate1 (AtsPeak) – a candidate peak
candidate2 (AtsPeak) – a candidate peak
deviation (float) – relative frequency deviation

Returns

True if the candidates are within constrained range, False otherwise.

Return type

bool
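The constraint described above reduces to a one-line comparison. The sketch below takes bare frequencies instead of AtsPeak objects to stay self-contained; `within_deviation` is a hypothetical name.

```python
def within_deviation(frq1, frq2, deviation):
    """True if |frq1 - frq2| is below deviation times the lower frequency."""
    return abs(frq1 - frq2) < deviation * min(frq1, frq2)
```

For example, with a deviation of 0.1, peaks at 440 Hz and 450 Hz qualify (10 < 44), while 440 Hz and 500 Hz do not (60 > 44).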

pyatsyn.atsa.peak_tracking.find_track_in_peaks(track, peaks)[source]

Function to search for the first peak tagged with a given track index

Parameters

track (int) – the track index to search for
peaks (Iterable[AtsPeak]) – a collection of AtsPeak to search in

Returns

the first AtsPeak found in peaks that has a .track attribute matching track. If no matches are found, None is returned.

Return type

AtsPeak

pyatsyn.atsa.peak_tracking.peak_dist(pk1, pk2, alpha)[source]

Function to calculate peak frequency distance

This function is used to calculate the cost for the peak matching algorithm and allows for psychoacoustic biasing of the calculation:

\(dist = \frac{|P1_{freq} - P2_{freq}| + \alpha * |P1_{smr} - P2_{smr}|}{1 + \alpha}\)

where \(P\#_{freq}\) is the peak’s frequency, and \(P\#_{smr}\) its SMR. \(\alpha\) is a coefficient controlling how much the SMR deviation affects the distance.

Parameters

pk1 (AtsPeak) – a candidate peak
pk2 (AtsPeak) – a candidate peak
alpha (float) – percent of SMR to use to bias the result

Returns

the frequency distance (in Hz) between the peaks

Return type

float
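The distance formula above translates directly into code. This sketch takes the frequency and SMR values as plain floats rather than AtsPeak objects; `peak_cost` is a hypothetical name.

```python
def peak_cost(frq1, smr1, frq2, smr2, alpha):
    """Frequency distance biased by SMR difference, weighted by alpha."""
    return (abs(frq1 - frq2) + alpha * abs(smr1 - smr2)) / (1.0 + alpha)
```

With alpha = 0.0 the result is the plain frequency difference; with alpha = 1.0 the frequency and SMR differences contribute equally.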

pyatsyn.atsa.peak_tracking.peak_tracking(tracks, peaks, frame_n, analysis_frames, sample_rate, hop_size, frequency_deviation=0.45, SMR_continuity=0.0, min_gap_length=1)[source]

Core function to coordinate peak tracking

This function coordinates the matching of new peaks with existing tracks using an adaptation of the Gale-Shapley algorithm for stable matching. The algorithm is gap-size aware and will monitor ‘slept’ tracks within the minimum gap distance as candidates. Linear interpolation is used to fill the gaps for frequency and amplitude, and a cubic polynomial interpolation for phase.

NOTE: Tracks, peaks, and analysis_frames are updated directly.

Parameters

tracks (Iterable[AtsPeak]) – collection of established tracks
peaks (Iterable[AtsPeak]) – collection of candidate peaks to match
frame_n (int) – the current frame
analysis_frames (Iterable[Iterable[AtsPeak]]) – a running collection storing the AtsPeak objects at each frame in time
sample_rate (int) – the sampling rate (in samples / s)
hop_size (int) – the inter-frame distance (in samples)
frequency_deviation (float, optional) – maximum relative frequency deviation used to constrain peak tracking matches (default: 0.45)
SMR_continuity (float, optional) – percentage of SMR to use in cost calculations during peak tracking (default: 0.0)
min_gap_length (int) – tracked partial gaps longer than this (in frames) will not be interpolated (default: 1)

pyatsyn.atsa.peak_tracking.phase_interp(freq_0, freq_t, pha_0, t)[source]

Function to compute linear phase interpolation

NOTE: currently not used in peak tracking, but supplied for legacy purposes

Assumes smooth linear interpolation, where the average frequency dictates phase rate estimate from the relative time 0 to time t.

Parameters

freq_0 (float) – initial frequency (in Hz)
freq_t (float) – frequency at time t (in Hz)
pha_0 (float) – initial phase (in radians)
t (float) – time (in s) from freq_0

Returns

the phase (in radians) at relative time t

Return type

float
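Under the stated assumption that the average frequency sets the phase rate, the computation can be sketched as below. Whether phase_interp wraps the result into a fixed range is not stated here, so this sketch leaves the phase unwrapped; `linear_phase` is a hypothetical name.

```python
import math

def linear_phase(freq_0, freq_t, pha_0, t):
    """Phase (radians) after t seconds, advancing at the average frequency."""
    avg_freq = 0.5 * (freq_0 + freq_t)
    return pha_0 + 2.0 * math.pi * avg_freq * t
```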

pyatsyn.atsa.peak_tracking.phase_interp_cubic(freq_0, freq_t, pha_0, pha_t, i_samps_from_0, samps_from_0_to_t, sample_rate)[source]

Function to interpolate phase using cubic polynomial interpolation

Uses cubic interpolation to determine an intermediate phase within the curve linking a particular frequency and phase at relative time 0, to a frequency and phase at time t.

The basis for this method is credited to:

R. McAulay and T. Quatieri, “Speech analysis/Synthesis based on a sinusoidal representation,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 744-754, August 1986

doi: 10.1109/TASSP.1986.1164910.

Parameters

freq_0 (float) – initial frequency (in Hz)
freq_t (float) – frequency at time t (in Hz)
pha_0 (float) – initial phase (in radians)
pha_t (float) – phase at time t (in radians)
i_samps_from_0 (int) – relative sample index i to interpolate at
samps_from_0_to_t (int) – distance (in samples) from 0 to t
sample_rate (int) – sampling rate (in samps/s)

Returns

the modeled phase (in radians) at sample i

Return type

float
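The cubic phase model from the cited McAulay and Quatieri paper can be sketched as follows, working in radians per sample. The parameter names mirror the signature above for readability, but `cubic_phase` is a hypothetical re-derivation and may differ in detail from phase_interp_cubic (for instance in phase wrapping).

```python
import math

def cubic_phase(freq_0, freq_t, pha_0, pha_t, i, n, sample_rate):
    """Phase (radians) at sample i of an n-sample segment, cubic model."""
    w0 = 2.0 * math.pi * freq_0 / sample_rate   # rad/sample at the start
    wt = 2.0 * math.pi * freq_t / sample_rate   # rad/sample at the end
    # Integer number of 2*pi turns that gives the smoothest phase track.
    m = round(((pha_0 + w0 * n - pha_t) + (wt - w0) * n / 2.0) / (2.0 * math.pi))
    d = pha_t - pha_0 - w0 * n + 2.0 * math.pi * m
    alpha = 3.0 * d / n**2 - (wt - w0) / n
    beta = -2.0 * d / n**3 + (wt - w0) / n**2
    return pha_0 + w0 * i + alpha * i**2 + beta * i**3
```

By construction the polynomial equals pha_0 at i = 0 and equals pha_t plus a whole number of turns at i = n, and its slope matches the start and end frequencies at those two points.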

pyatsyn.atsa.peak_tracking.update_track_averages(tracks, track_length, frame_n, analysis_frames, beta=0.0)[source]

Function to update running averages of recent peaks

Using the list of current tracks, we look back track_length frames and update the average amp, frq, and smr values for the tracks.

NOTE: Tracks are updated directly without return value.

Parameters

tracks (Iterable[AtsPeak]) – iterable of tracks to update
track_length (int) – how far back in time (in frames) to start average calculations
frame_n (int) – the current frame
analysis_frames (Iterable[Iterable[AtsPeak]]) – a running collection storing the AtsPeak objects at each frame in time
beta (float, optional) – additional bias for the immediately prior frame’s values when calculating smoothing trajectories (default: 0.0)

residual

Functions to Compute and Analyze Residual Signals

The residual signal is computed by taking the time-domain difference between the original sound and the sinusoidal synthesis of the spectral trajectories. NOTE: this section is under active research. Currently, this noise signal is analyzed using the STFT to obtain time-varying energy at 25 critical bands (see ATS_CRITICAL_BAND_EDGES).

pyatsyn.atsa.residual.band_to_energy(ats_snd, band_edges, use_smr=False)[source]

Function to transfer band energies into partials

NOTE: Currently not fully supported. Included for legacy purposes.

Parameters

ats_snd (AtsSound) – the ats object containing band energies
band_edges (ndarray[float]) – 1D array of band edge frequencies (in Hz)
use_smr (bool, optional) – whether to use smr instead of amplitude for scaling energy across partials (default: False)

pyatsyn.atsa.residual.compute_residual(ats_snd, in_sound, start_sample, end_sample, residual_file=None)[source]

Function to compute the time domain difference between the sinusoidal synthesis of spectral trajectories in an ats_snd, and the original sound data

Parameters

ats_snd (AtsSound) – the input ats object to compute the residual for
in_sound (ndarray[float]) – the original sound signal from which to extract the residual
start_sample (int) – sample in in_sound where the ats_snd begins
end_sample (int) – sample in in_sound where the ats_snd ends
residual_file (str, optional) – path to audio file to output residual signal to. None if no file output. (Default: None)

Returns

residual – a 1D array of floats containing the amplitudes of the computed residual in the time domain

Return type

ndarray[float]

pyatsyn.atsa.residual.remove_bands(ats_snd, threshold)[source]

Function to remove bands and band_energies below a threshold

NOTE: ats_snd is updated directly

Parameters

ats_snd (AtsSound) – the ats object storing band energies to threshold
threshold (float) – energy threshold used to prune band energies and bands

pyatsyn.atsa.residual.residual_N(M, min_fft_size, factor=2)[source]

Function to compute an FFT window size for residual analysis

Parameters

M (int) – target window size
min_fft_size (int) – restricts the minimum size of the FFT window
factor (int, optional) – multiplicative window padding relative to M for calculating FFT window size (default: 2)

Returns

power-of-2 window size

Return type

int
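One plausible way to arrive at such a size is to pad the target window by the factor, enforce the minimum, and round up to the next power of two. The sketch below is an assumption about the general approach, not a transcription of residual_N; `padded_fft_size` is a hypothetical name.

```python
def padded_fft_size(M, min_fft_size, factor=2):
    """Smallest power of two that is >= max(factor * M, min_fft_size)."""
    target = max(factor * M, min_fft_size)
    size = 1
    while size < target:
        size *= 2
    return size
```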

pyatsyn.atsa.residual.residual_analysis(residual, ats_snd, min_fft_size=4096, equalize=False, pad_factor=2, band_edges=None, par_energy=False, verbose=False)[source]

Function to compute noise energy in a residual signal across 25 critical bands

Noise energy in each critical band is evaluated in the following way:

\(E[i] = \frac{1}{K} \sum^{k_{i0} + K - 1}_{k= k_{i0}} |X(k)|^2\)

where \(i\) is the band number (0 to 24), \(K\) is the number of bins of the STFT where the band \(i\) has frequency information. \(k_{i0}\) is the lowest STFT bin where band \(i\) has information, and \(X\) is the amplitude data for a given bin \(k\).

The algorithm evaluates the noise energy at each step of the Bark scale.

Parameters

residual (ndarray[float]) – a 1D array of floats containing the amplitudes of the residual signal in the time domain
ats_snd (AtsSound) – the input ats object to store the residual analysis in
min_fft_size (int, optional) – restricts the minimum size of the FFT window (default: 4096)
equalize (bool, optional) – equalize noise energy in the frequency domain to the time domain energy using Parseval’s Theorem (default: False)
pad_factor (int, optional) – multiplicative window padding relative to ats_snd.window_size for calculating FFT window size (default: 2)
band_edges (ndarray[float]) – 1D array containing 26 frequencies that distinguish the default 25 critical bands. If None, will use ATS_CRITICAL_BAND_EDGES (default: None)
par_energy (bool) – whether to transfer noise energy to partials. NOTE: currently not fully supported; only for legacy support (default: False)
verbose (bool, optional) – increase verbosity (default: False)

pyatsyn.atsa.residual.residual_compute_band_energy(fft_mags, band_limits, band_energy, frame_n)[source]

Function to compute the band energy

Energy in each band is evaluated in the following way:

\(E[i] = \frac{1}{K} \sum^{k_{i0} + K - 1}_{k= k_{i0}} |X(k)|^2\)

where \(i\) is the band number (0 to 24), \(K\) is the number of bins of the STFT where the band \(i\) has frequency information. \(k_{i0}\) is the lowest STFT bin where band \(i\) has information, and \(X\) is the amplitude data for a given bin \(k\).

NOTE: band_energy is updated directly

Parameters

fft_mags (ndarray[float]) – 1D array of frequency domain amplitudes
band_limits (ndarray[int]) – 1D array of bin indices mapping band edge frequencies to bins in FFT frequency domain
band_energy (ndarray[float]) – 2D array to store band energies for each band at each frame
frame_n (int) – the current frame

pyatsyn.atsa.residual.residual_get_band_limits(fft_mag, band_edges)[source]

Function to convert band edges to FFT bin indices

Parameters

fft_mag (float) – FFT magic number: sampling rate / FFT window size
band_edges (ndarray[float]) – 1D array of band edge frequencies (in Hz)

Returns

band_limits – 1D array of bin indices mapping band edge frequencies to bins in FFT frequency domain

Return type

ndarray[int]
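Since fft_mag is the frequency spacing of the FFT bins in Hz, converting an edge frequency to a bin index is a division. The sketch below rounds down, which is an assumption; residual_get_band_limits may round differently. `band_edges_to_bins` is a hypothetical name.

```python
def band_edges_to_bins(fft_mag, band_edges):
    """Map band edge frequencies (Hz) to FFT bin indices, rounding down."""
    return [int(edge // fft_mag) for edge in band_edges]
```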

tracker

Main ATS Analysis Function

The analysis tracker is responsible for driving the analysis of an audio file into the .ats format. The system uses a Short Time Fourier Transform (STFT) as its core analysis tool. Sound is analyzed using overlapping time windows and by taking the STFT on each window.

After converting to polar coordinates, a peak detection algorithm (peak_detection) determines relevant spectral peaks in the data. At this point, psychoacoustics are considered in the form of masking curve evaluation and computation of the Signal-to-Mask ratio (SMR) for each candidate peak. SMR data is stored together with a corrected frequency, magnitude, and phase.

The next step involves frame-to-frame tracking of peaks (peak_tracking) to connect peaks that follow a similar spectral trajectory using both frequency and SMR data. The system uses a stable matching algorithm to pair candidate peaks, and is capable of interpolating gaps in the tracks.

Once valid tracks are assembled, the results can be modeled with sinusoids (atsa_synth) and subtracted from the original source sound to compute a residual. NOTE: This part of the ATS system is currently under active research. For now, the residual analysis (residual) is modeled using 25 time-varying critical noise band energies (consistent with the critical bands used during SMR evaluation). These noise bands can then be resynthesized using 25 corresponding banks of time-enveloped, band-limited noise.

Analysis is finally stored and abstracted as an AtsSound object.

pyatsyn.atsa.tracker.tracker(in_file, start=0.0, duration=None, lowest_frequency=20, highest_frequency=20000.0, frequency_deviation=0.1, window_cycles=4, window_type='blackman-harris-4-1', hop_size=0.25, fft_size=None, amp_threshold=0.001, track_length=3, min_gap_length=3, min_segment_length=3, last_peak_contribution=0.0, SMR_continuity=0.0, residual_file=None, optimize=True, optimize_amp_threshold=None, force_M=None, force_window=None, window_alpha=0.5, window_beta=1.0, verbose=False)[source]

Function to generate an Analysis-Transformation-Synthesis AtsSound from an audio file

Parameters

in_file (str) – path to the audio file to analyze (must be single channel/mono)
start (float) – timepoint (in s) in audiofile to begin analysis (default: 0.0)
duration (float) – maximum duration (in s) to analyze from start, or None to analyze to the end of the file (default: None)
lowest_frequency (float) – lowest frequency to analyze (must be > 0) (default: 20)
highest_frequency (float) – highest frequency to analyze (capped to nyquist frequency and must be greater than lowest_frequency) (default: 20000.0)
frequency_deviation (float) – maximum relative frequency deviation used to constrain peak tracking matches (default: 0.1)
window_cycles (int) – number of cycles of the lowest frequency to fit in the analysis window; used to determine window size (default: 4)
window_type (str) – type of window to use for FFT analysis (default: ‘blackman-harris-4-1’). See VALID_FFT_WINDOW_DEFINITIONS
hop_size (float) – fraction of window size to shift from frame-to-frame (default: 0.25)
fft_size (int) – None, or force an fft size (default: None)
amp_threshold (float) – lowest amplitude used for peak detection (default: 0.001)
track_length (int) – number of frames used to smooth frequency trajectories (default: 3)
min_gap_length (int) – tracked partial gaps longer than this (in frames) will not be interpolated (default: 3)
min_segment_length (int) – minimal size (in frames) of a track segment, otherwise it is pruned (default: 3)
last_peak_contribution (float) – additional bias for the immediately prior frame's values when calculating smoothing trajectories (default: 0.0)
SMR_continuity (float) – percentage of SMR to use in cost calculations during peak tracking (default: 0.0)
residual_file (str) – path to the audio file used to store residual analysis. NOTE: noise calculation will not be performed in .ats file without this (default: None)
optimize (bool) – whether to perform the post-peak tracking optimization on the AtsSound object (default: True)
optimize_amp_threshold (float) – additional amplitude threshold used during optimization to prune tracks (default: None)
force_M (int) – None, or a forced window length in samples (default: None)
force_window (ndarray[float]) – None, or a 1D array describing a windowing curve (default: None)
window_alpha (float) – parameter used for calculating tukey windows (default: 0.5)
window_beta (float) – parameter used for calculating certain window types (default: 1.0)
verbose (bool) – increase verbosity (default: False)

Returns

the ats object that represents the analysis of the input audio file

Return type

AtsSound

Raises

ValueError – if input file is not single channel/mono
ValueError – if lowest_frequency is < 0.0
ValueError – if highest_frequency is < lowest_frequency
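The interaction between lowest_frequency, window_cycles, hop_size and fft_size can be sketched as follows. This is one plausible reading of the parameter descriptions above, not the tracker's actual code: the helper name `analysis_sizes` and the rounding rules are assumptions, and pyatsyn may size the window differently.

```python
import math


def analysis_sizes(sampling_rate, lowest_frequency=20, window_cycles=4,
                   hop_size=0.25, fft_size=None):
    """Derive (window_size, hop, fft_size) in samples.

    Hypothetical helper; the rounding choices here are assumptions.
    """
    if lowest_frequency <= 0:
        raise ValueError("lowest_frequency must be > 0")
    # Fit `window_cycles` periods of the lowest frequency in the window.
    window_size = math.floor(window_cycles * sampling_rate / lowest_frequency)
    # The hop is a fraction of the window size.
    hop = math.floor(window_size * hop_size)
    if fft_size is None:
        # Smallest power of 2 that can hold the whole window.
        fft_size = 1 << (window_size - 1).bit_length()
    return window_size, hop, fft_size


print(analysis_sizes(44100))  # (8820, 2205, 16384)
```

Under these assumptions, raising lowest_frequency shortens the window (better time resolution), while raising window_cycles lengthens it (better frequency resolution).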

pyatsyn.atsa.tracker.tracker_CLI()[source]

Command line wrapper for tracker

Example

Display usage details with help flag

$ pyatsyn-atsa -h

Analyze a wav file

$ pyatsyn-atsa example.wav example.ats

Analyze a wav file and compute the residual and increase verbosity

$ pyatsyn-atsa example.wav example.ats -v -r example-residual.wav

utils

Utility Functions for ATS Analysis

pyatsyn.atsa.utils.MAX_DB_SPL

maximum DB_SPL level; used for converting amplitude units

Type: float

pyatsyn.atsa.utils.ATS_MIN_SEGMENT_LENGTH

default minimum segment length

Type: int

pyatsyn.atsa.utils.ATS_AMP_THRESHOLD

default amp threshold

Type: float

pyatsyn.atsa.utils.ATS_NOISE_THRESHOLD

default noise threshold

Type: float

pyatsyn.atsa.utils.amp_to_db(amp)[source]

Function to convert amplitude to decibels: \(20 * \log_{10}{amp}\)

Parameters: amp (float) – an amplitude value
Returns: the converted decibel value
Return type: float

pyatsyn.atsa.utils.amp_to_db_spl(amp)[source]

Function to convert amplitude to decibel sound pressure level (dB SPL)

Parameters: amp (float) – an amplitude value
Returns: the converted dB SPL value
Return type: float

pyatsyn.atsa.utils.compute_frames(total_samps, hop)[source]

Function to compute the number of frames to use in the specified analysis.

Calculates an extra frame to prevent attenuation during windowing at the tail and to allow for interpolation at the end of the soundfile.

Parameters

total_samps (int) – number of samples in analyzed sound duration
hop (int) – interframe distance in samples

Returns

number of frames to use for STFT analysis

Return type

int
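The exact formula is not given here, so the following is only a guess at the simplest rule that matches the description: enough hops to cover every sample, plus one extra frame for the tail. The name `compute_frames_sketch` is hypothetical and the real function may round differently.

```python
import math


def compute_frames_sketch(total_samps, hop):
    """Number of analysis frames under an assumed 'cover everything + 1' rule."""
    # Frames needed so that hop-spaced frame starts cover all samples...
    covering = math.ceil(total_samps / hop)
    # ...plus one extra frame for the tail of the sound file.
    return covering + 1


print(compute_frames_sketch(44100, 2205))  # 21
```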

pyatsyn.atsa.utils.db_to_amp(db)[source]

Function to convert decibels to amplitude: \(10^{dB / 20.0}\)

Parameters: db (float) – a decibel value
Returns: the converted amplitude value
Return type: float
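The two conversion formulas, \(20 * \log_{10}{amp}\) and \(10^{dB / 20.0}\), are inverses of each other. A minimal standalone sketch (re-implemented here from the formulas rather than imported, so the edge-case behavior of the library functions is not implied):

```python
import math


def amp_to_db(amp):
    """Linear amplitude -> decibels, 20 * log10(amp). Requires amp > 0."""
    return 20.0 * math.log10(amp)


def db_to_amp(db):
    """Decibels -> linear amplitude, 10 ** (db / 20)."""
    return 10.0 ** (db / 20.0)


print(amp_to_db(1.0))             # 0.0
print(db_to_amp(0.0))             # 1.0
print(db_to_amp(amp_to_db(0.5)))  # approximately 0.5
```

In this sketch an amplitude of 0 raises ValueError from math.log10; how pyatsyn treats that case is not stated here.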

pyatsyn.atsa.utils.next_power_of_2(num)[source]

Function to return the smallest power of 2 that is greater than or equal to the input

Parameters: num (int) – a positive integer
Returns: the smallest power of 2 that is greater than or equal to num
Return type: int
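For positive integers this can be computed without loops or floating point by using the bit length of num - 1. The sketch below is one way to do it and is not necessarily how the library implements it; the name `next_power_of_2_sketch` is hypothetical.

```python
def next_power_of_2_sketch(num):
    """Smallest power of 2 that is >= num, for a positive integer num."""
    if num < 1:
        raise ValueError("num must be a positive integer")
    # (num - 1).bit_length() is the exponent of the next power of 2.
    return 1 << (num - 1).bit_length()


print([next_power_of_2_sketch(n) for n in (1, 5, 8, 1000)])  # [1, 8, 8, 1024]
```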

pyatsyn.atsa.utils.optimize_tracks(tracks, analysis_frames, min_segment_length, amp_threshold, highest_frequency, lowest_frequency)[source]

Function to run optimization routines on the established tracks.

The optimizations performed are:

trim short partials
calculate and store maximum and average frq and amp
prune tracks below amplitude threshold
prune tracks outside frequency constraints
sort and renumber tracks and peaks in analysis_frames according to average frq

NOTE: directly updates analysis_frames, pruning peaks corresponding to pruned tracks.

Parameters

tracks (Iterable[AtsSound]) – collection of established tracks
analysis_frames (Iterable[Iterable[AtsPeak]]) – a collection storing the AtsPeak objects at each frame in time
min_segment_length (int) – minimal size (in frames) of a valid track segment, otherwise it is pruned
amp_threshold (float) – amplitude threshold used to prune tracks. If None, will default to ATS_AMP_THRESHOLD converted to amplitude.
highest_frequency (float) – upper frequency threshold, tracks with maxima above this will be pruned
lowest_frequency (float) – lower frequency threshold, tracks with minima below this will be pruned

Returns

tracks – the optimized subset of input tracks

Return type

Iterable[pyatsyn.ats_structure.AtsPeak]
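The optimization steps listed above can be illustrated on plain dictionaries. This is a simplified sketch, not the library routine: the track layout (dicts with 'frq' and 'amp' lists), the name `optimize_sketch`, and the choice to drop a short track entirely instead of trimming segments within it are all assumptions, and analysis_frames is not updated.

```python
def optimize_sketch(tracks, min_segment_length, amp_threshold,
                    highest_frequency, lowest_frequency):
    """Prune and sort toy tracks; each track is {'frq': [...], 'amp': [...]}."""
    kept = []
    for t in tracks:
        # 1. Drop tracks shorter than the minimum segment length.
        if len(t["frq"]) < min_segment_length:
            continue
        # 2. Compute and store summary statistics.
        stats = dict(
            t,
            frq_avg=sum(t["frq"]) / len(t["frq"]),
            frq_max=max(t["frq"]),
            frq_min=min(t["frq"]),
            amp_max=max(t["amp"]),
        )
        # 3. Prune tracks that never reach the amplitude threshold.
        if stats["amp_max"] < amp_threshold:
            continue
        # 4. Prune tracks that leave the allowed frequency range.
        if stats["frq_max"] > highest_frequency or stats["frq_min"] < lowest_frequency:
            continue
        kept.append(stats)
    # 5. Sort by average frequency and renumber.
    kept.sort(key=lambda t: t["frq_avg"])
    for number, t in enumerate(kept):
        t["track"] = number
    return kept


toy_tracks = [
    {"frq": [880, 882, 881], "amp": [0.2, 0.3, 0.25]},
    {"frq": [440, 441], "amp": [0.5, 0.5]},                   # too short
    {"frq": [220, 221, 219], "amp": [1e-4, 2e-4, 1e-4]},      # too quiet
    {"frq": [15, 16, 17], "amp": [0.4, 0.4, 0.4]},            # below range
    {"frq": [330, 331, 332], "amp": [0.1, 0.1, 0.1]},
]
result = optimize_sketch(toy_tracks, 3, 0.001, 20000.0, 20)
print([(t["track"], t["frq_avg"]) for t in result])  # [(0, 331.0), (1, 881.0)]
```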

windows

Functions to Generate FFT Windows

A collection of window utilities to generate several useful window types:

blackman-exact	parzen	exponential
blackman	welch	bartlett
blackman-harris-3-1	kaiser	riemann
blackman-harris-3-2	gaussian	tukey
blackman-harris-4-1	poisson	hamming
blackman-harris-4-2	cauchy	hann
rectangular	connes	hann-poisson

Most equations are adapted from the following two papers:

F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform,” in Proceedings of the IEEE, vol. 66, no. 1, pp. 51-83, Jan. 1978.

doi: 10.1109/PROC.1978.10837.

A. Nuttall, “Some windows with very good sidelobe behavior,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 1, pp. 84-91, February 1981

doi: 10.1109/TASSP.1981.1163506.

pyatsyn.atsa.windows.VALID_FFT_WINDOW_DEFINITIONS

a list of supported window types

Type: list[str]

pyatsyn.atsa.windows.ATS_BLACKMAN_WINDOW_COEFF_LABELS

A dictionary to match blackman window type strings to their coefficients

Type: dict[str, list[float]]

pyatsyn.atsa.windows.bes_i0(x)[source]

Modified Bessel Function of the First Kind from “Numerical Recipes in C”

Parameters: x (float) – Bessel function input
Returns: Bessel function output
Return type: float
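For reference, the zeroth-order modified Bessel function of the first kind has the power series \(I_0(x) = \sum_{k \ge 0} \left( \frac{(x/2)^k}{k!} \right)^2\). The sketch below evaluates that series directly. Note that this is a swapped-in technique for illustration: it is not the polynomial approximation from "Numerical Recipes in C" that bes_i0 is described as using, and the name `bes_i0_series` is hypothetical.

```python
def bes_i0_series(x, tol=1e-12):
    """Evaluate I0(x) by summing its power series until terms are negligible."""
    half_x_sq = (x / 2.0) ** 2
    term = 1.0   # k = 0 term
    total = 1.0
    k = 0
    while term > tol * total:
        k += 1
        # term_k = term_{k-1} * (x/2)^2 / k^2
        term *= half_x_sq / (k * k)
        total += term
    return total


print(bes_i0_series(0.0))  # 1.0
print(bes_i0_series(1.0))  # approximately 1.26607
```

The series converges for all x, but for large |x| it needs many terms, which is one reason piecewise polynomial approximations are popular in practice.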

pyatsyn.atsa.windows.make_blackman_window(window_type, size)[source]

Helper function to build Blackman windows

Parameters

window_type (str) – the type of blackman window (supported types are defined in ATS_BLACKMAN_WINDOW_COEFF_LABELS)
size (int) – the size of the window to generate

Returns

a 1D array of floats representing the window

Return type

ndarray[float]
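Blackman-family windows are sums of cosines with alternating signs, \(w[n] = \sum_k (-1)^k a_k \cos(2 \pi k n / (N - 1))\). The sketch below builds one from a coefficient tuple. The coefficients shown are the classic textbook Blackman values and are only an example: the entries actually stored in ATS_BLACKMAN_WINDOW_COEFF_LABELS are not listed here, and the helper name, the N - 1 denominator, and the plain-list return type (instead of an ndarray) are assumptions.

```python
import math


def blackman_sketch(size, coeffs=(0.42, 0.5, 0.08)):
    """Build a cosine-sum window of `size` samples (size >= 2) as a list."""
    window = []
    for n in range(size):
        phase = 2.0 * math.pi * n / (size - 1)
        # a0 - a1*cos(phase) + a2*cos(2*phase) - ...
        sample = sum((-1) ** k * a * math.cos(k * phase)
                     for k, a in enumerate(coeffs))
        window.append(sample)
    return window


w = blackman_sketch(9)
print(round(w[4], 6))  # 1.0 at the center of an odd-length window
```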

pyatsyn.atsa.windows.make_fft_window(window_type, size, beta=1.0, alpha=0.5)[source]

Function to build the specified window

Parameters

window_type (str) – the type of window (supported types are defined in VALID_FFT_WINDOW_DEFINITIONS)
size (int) – the size of the window to generate
beta (float, optional) – parameter used in certain window calculations (default: 1.0)
alpha (float, optional) – parameter used in tukey window calculation (default: 0.5)

Returns

a 1D array of floats representing the window

Return type

ndarray[float]

Raises

ValueError – if window_type is not one of the supported window types in VALID_FFT_WINDOW_DEFINITIONS
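The dispatch-and-validate behavior described above can be sketched with a lookup table of builder functions. Everything in this block is illustrative: `make_window_sketch`, the `WINDOW_BUILDERS` table, and the three windows it contains are hypothetical stand-ins, not the library's internals, and the windows are returned as plain lists rather than ndarrays.

```python
import math


def _rectangular(size):
    return [1.0] * size


def _hann(size):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1))
            for n in range(size)]


def _hamming(size):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1))
            for n in range(size)]


# Hypothetical table mapping window names to builder functions.
WINDOW_BUILDERS = {
    "rectangular": _rectangular,
    "hann": _hann,
    "hamming": _hamming,
}


def make_window_sketch(window_type, size):
    """Build the named window, or raise ValueError for unknown names."""
    try:
        builder = WINDOW_BUILDERS[window_type]
    except KeyError:
        raise ValueError(f"unsupported window type: {window_type!r}") from None
    return builder(size)


print(make_window_sketch("rectangular", 4))  # [1.0, 1.0, 1.0, 1.0]
```

Failing early with ValueError means a misspelled window name is caught before any analysis work starts.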

pyatsyn.atsa.windows.normalize_window(window)[source]

Function to normalize a window

Normalization here means that the window will integrate to 1.0 (i.e., total area of 1)

Parameters: window (ndarray[float]) – the window to normalize
Returns: a normalized version of the input window
Return type: ndarray[float]

pyatsyn.atsa.windows.window_norm(window)[source]

Function to compute the norm of a window

\(norm = \frac{1}{\sum | x |}\) where \(x\) are the window samples

Parameters: window (ndarray[float]) – the window from which to calculate a norm
Returns: the norm of the window
Return type: float
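Using the formula above, scaling every sample by the norm yields a window whose samples sum to 1.0. The sketch below shows that relationship on plain lists; whether normalize_window is implemented in terms of window_norm is an assumption, and the function names here carry a `_sketch` suffix to mark them as illustrative.

```python
def window_norm_sketch(window):
    """Norm of a window: 1 / sum(|x|) over the window samples."""
    return 1.0 / sum(abs(x) for x in window)


def normalize_window_sketch(window):
    """Scale a window so that its samples sum to 1.0."""
    norm = window_norm_sketch(window)
    return [x * norm for x in window]


triangle = [0.0, 0.5, 1.0, 0.5, 0.0]
print(window_norm_sketch(triangle))            # 0.5
print(sum(normalize_window_sketch(triangle)))  # 1.0
```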