Cover
Title Page
1. Copyright
About the Authors
1. Dr Pejman Mowlaee (main author) Graz University of Technology, Graz, Austria
2. Dipl. Ing. Josef Kulmer (co-author) Graz University of Technology, Graz, Austria
3. Dipl. Ing. Johannes Stahl (co-author) Graz University of Technology, Graz, Austria
4. Florian Mayer (co-author) Graz University of Technology, Graz, Austria
Preface
1. Purpose and scope
2. Book outline
3. Intended audience
4. Acknowledgments
List of Symbols
Part I: History, Theory and Concepts
1. Chapter 1: Introduction: Phase Processing, History
  1. 1.1 Chapter Organization
  2. 1.2 Conventional Speech Communication
  3. 1.3 Historical Overview of the Importance or Unimportance of Phase
  4. 1.4 Importance of Phase in Speech Processing
  5. 1.5 Structure of the Book
  6. 1.6 Experiments
  7. 1.7 Summary
  8. References
2. Chapter 2: Fundamentals of Phase-Based Signal Processing
  1. 2.1 Chapter Organization
  2. 2.2 STFT Phase: Background and Some Remarks
  3. 2.3 Phase Unwrapping
  4. 2.4 Useful Phase-Based Representations
  5. 2.5 Experiments
  6. 2.6 Summary
  7. References
3. Chapter 3: Phase Estimation Fundamentals
  1. 3.1 Chapter Organization
  2. 3.2 Phase Estimation Fundamentals
  3. 3.3 Existing Solutions
  4. 3.4 Experiments
  5. 3.5 Summary
  6. References
Part II: Applications
1. Chapter 4: Phase Processing for Single-Channel Speech Enhancement
  1. 4.1 Introduction and Chapter Organization
  2. 4.2 Speech Enhancement in the STFT Domain: General Concepts
  3. 4.3 Conventional Speech Enhancement
  4. 4.4 Phase-Sensitive Speech Enhancement
  5. 4.5 Experiments
  6. 4.6 Summary
  7. References
2. Chapter 5: Phase Processing for Single-Channel Source Separation
  1. 5.1 Chapter Organization
  2. 5.2 Why Single-Channel Source Separation?
  3. 5.3 Conventional Single-Channel Source Separation
  4. 5.4 Phase Processing for Single-Channel Source Separation
  5. 5.5 Experiments
  6. 5.6 Summary
  7. References
3. Chapter 6: Phase-Aware Speech Quality Estimation
  1. 6.1 Chapter Organization
  2. 6.2 Introduction: Speech Quality Estimation
  3. 6.3 Conventional Instrumental Metrics for Speech Quality Estimation
  4. 6.4 Why Phase-Aware Metrics?
  5. 6.5 New Phase-Aware Metrics
  6. 6.6 Subjective Tests
  7. 6.7 Experiments
  8. 6.8 Summary
  9. References
4. Chapter 7: Conclusion and Future Outlook
  1. 7.1 Chapter Organization
  2. 7.2 Renaissance of Phase-Aware Signal Processing: Decline and Rise
  3. 7.3 Directions for Future Research
  4. 7.4 Summary
  5. References
Appendix A: MATLAB Toolbox
1. A.1 Chapter Organization
2. A.2 PhaseLab Toolbox
3. References
Index
End User License Agreement

List of Illustrations

Chapter 1: Introduction: Phase Processing, History
1. Figure 1.1 Speech communication devices used in everyday life scenarios are expected to function robustly in adverse noisy conditions.
2. Figure 1.2 Block diagram for speech communication from transmitter (microphone) to receiver end (loudspeaker) composed of a chain of blocks: beamforming, echo cancellation, de-reverberation, noise reduction, speech coding, channel coding, speech decoding, artificial bandwidth extension, near-end listening enhancement.
3. Figure 1.3 Block diagram of the processing chain in speech communication applications: analysis–modification–synthesis.
4. Figure 1.4 The experimental setup in Vary (1985) comprised three stages: a spectral analyzer, either a polyphase network (PPN) or fast Fourier transform (FFT), followed by an adaptive processor (amplitude/phase modification) and a spectral synthesizer.
5. Figure 1.5 Block diagram for Vary's experiment to study the effects of phase modification (Vary 1985).
6. Figure 1.6 Block diagram for Wang and Lim's experiment (Wang and Lim 1982), where stimuli of phase-modified speech are constructed in the framework of analysis–modification–synthesis.
7. Figure 1.7 Vector diagram inspired by Vary (1985) showing the phase deviation $c01-math-0064$ resulting from the added noise to speech at frequency subband $c01-math-0065$ and frame $c01-math-0066$ .
8. Figure 1.8 Phase deviation upper bound versus the spectral SNR in Vary's experiment given in (1.5).
9. Figure 1.9 Block diagram to construct stimuli of phase-modified speech in analysis–modification–synthesis (Paliwal et al. 2011).
10. Figure 1.10 Vector diagrams showing phase spectrum compensation (PSC) inspired by Stark et al. (2008), where modification of the noisy STFT is shown for conjugate pair signal-to-noise ratio scenarios: (a) high ( $c01-math-0115$ ), (b) low ( $c01-math-0116$ ).
11. Figure 1.11 Speech enhancement using phase spectrum compensation (PSC; Stark et al. 2008). Spectrograms in dB are shown for (left) clean, (middle) noisy, (right) enhanced speech signals. PESQ and output SNR scores are shown at the top of each panel.
Chapter 2: Fundamentals of Phase-Based Signal Processing
1. Figure 2.1 Time domain (left), magnitude spectrogram in dB (middle), and phase spectrogram (right) of female speech. While the magnitude spectrum presents a detailed harmonic structure of speech in time and frequency, the instantaneous phase spectrum shows no apparent pattern or useful detail.
2. Figure 2.2 Example showing phase wrapping in the STFT phase spectrum for the vowel “e”: (a) $c02-math-0044$ , waveform; (b) $c02-math-0045$ , STFT magnitude $c02-math-0046$ ; (c) STFT phase using a causal window; (d) $c02-math-0047$ , STFT phase using an acausal symmetric window with zero phase; (e) waveform representation as a sum of harmonics; (f) $c02-math-0048$ , amplitude of the $c02-math-0049$ th harmonic; (g) $c02-math-0050$ , harmonic instantaneous phases; (h) $c02-math-0051$ , unwrapped phase.
3. Figure 2.3 An example of two zeros close to the unit circle $c02-math-0063$ located between $c02-math-0064$ and $c02-math-0065$ . Such zeros are the main source of difficulty for DFT-based phase unwrapping methods (Drugman and Stylianou 2015).
4. Figure 2.4 Different branches for the arctan function are used in McGowan and Kuc (1982) to determine $c02-math-0177$ in (2.13) for adding or subtracting the $c02-math-0178$ multiples required in the time series phase unwrapping method.
5. Figure 2.5 The baseband representation of band $c02-math-0257$ for a symbolic spectrum composed of one harmonic. The prototype window function spectrum $c02-math-0258$ suppresses the impact of the adjacent frequency bands, but not the one closest to the frequency bin of interest, $c02-math-0259$ (Krawczyk and Gerkmann 2012).
6. Figure 2.6 Spectrogram in dB (left) and baseband phase difference (BPD; right) calculated for a clean speech signal used in the short-time Fourier transform phase improvement (STFTPI) method (Krawczyk and Gerkmann 2012).
7. Figure 2.7 The von Mises distribution as a non-uniform model for the spectral phase, characterized by mean $c02-math-0311$ and concentration $c02-math-0312$ , ranging between the uniform distribution and a Dirac delta.
8. Figure 2.8 (a) Time domain signal for female speech, (b) spectrogram in dB, (c) RPS, (d) fundamental frequency.
9. Figure 2.9 Example showing how the phase distortion features mean and deviation are used to classify different voicing states: (left) onset, (middle) voiced, (right) offset. The results are shown as (top) time domain, (middle) phase distortion mean (PDM), and (bottom) phase distortion standard deviation (PDD).
10. Figure 2.10 Example inspired by Gdeisat and Lilley (2011) to show the process of phase unwrapping using the DD method applied to a cosine waveform; starting from the wrapped phase (b), via adding/subtracting $c02-math-0336$ jumps to remove the wraps, sequentially shown in (c)–(f), for the four wraps in the wrapped phase signal shown in (b).
11. Figure 2.11 Example inspired by Gdeisat and Lilley (2011) to show the process of phase unwrapping using the DD method applied to a cosine waveform corrupted with additive noise. One-dimensional phase unwrapping problem: top panel: (a) continuous phase, (b) wrapped phase, (c) mild noisy version, (d) wrapped phase, (e) unwrapped phase for mild noise, (f) intense noisy version, (g) wrapped phase, (h) unwrapped phase for intense noise.
12. Figure 2.12 Computation time (top) and error rate (bottom) of different phase unwrapping methods for speech signals as a function of $c02-math-0352$ .
13. Figure 2.13 Group delay representations listed in Table 2.3 shown for a voiced speech segment: (a) FFT log-magnitude, (b) modified group delay (MGD; Hegde et al. 2007), (c) LPC, (d) CGD (Bozkurt et al. 2007), and (e) LPGD (Rajan et al. 2013).
14. Figure 2.14 (a) Time domain representation for clean speech; (b) unwrapped phase shown for the first three harmonics, (c) mean, and (d) circular variance.
15. Figure 2.15 Phase variance representation for (a) clean speech and (b) speech deteriorated by additive white Gaussian noise. In highly voiced regions, e.g. at time $c02-math-0383$ , the harmonics are visible due to their low phase variance. Additive noise increases the phase variance.
16. Figure 2.16 Time–frequency information for clean (left) and noisy (right) signals: (a) amplitude spectrogram in dB, (b) instantaneous phase, (c) group delay, (d) instantaneous frequency (IF), (e) phasegram, (f) phase distortion deviation (PDD), and (g) relative phase shift (RPS).
Chapter 3: Phase Estimation Fundamentals
1. Figure 3.1 Visualization of the window impact on a sinusoid $c03-math-0020$ in time and frequency domains with $c03-math-0021$ , $c03-math-0022$ , and a rectangular window with length $c03-math-0023$ . The window DTFT $c03-math-0024$ is shifted dependent on $c03-math-0025$ and multiplied by $c03-math-0026$ and $c03-math-0027$ , respectively, as shown in the phase response of DTFT $c03-math-0028$ .
2. Figure 3.2 Relation between the sinusoidal period and the window length, and its impact on amplitude and phase. (a) A sinusoid multiplied by a boxcar window with a length of one period $c03-math-0078$ . The Dirichlet kernels do not interfere at $c03-math-0079$ and $c03-math-0080$ , which yields an unbiased phase estimate of $c03-math-0081$ . (b) The more general case of a window length that is not an integer multiple of the sinusoid's period $c03-math-0082$ . Neither the amplitude nor the phase approaches its true value, so the estimate is biased.
3. Figure 3.3 Impact of three different windows on the magnitude and phase response of one sinusoid. The improved sidelobe suppression comes at the cost of a wider mainlobe and hence a lower frequency resolution. For windows with higher sidelobe suppression, the phase response at frequency $c03-math-0101$ is increasingly dominated by the phase $c03-math-0102$ within the mainlobe width.
4. Figure 3.4 Impact of additive white Gaussian noise on the magnitude and phase response of three neighboring sinusoids windowed with a Hamming window: without noise [(a),(d)], SNR $c03-math-0129$ dB [(b),(e)], and SNR $c03-math-0130$ dB [(c),(f)]. The left column shows the noise-free case. The mainlobes of the Hamming windows are sufficiently separated. The phase response shows that the neighboring frequencies of $c03-math-0131$ , $c03-math-0132$ , and $c03-math-0133$ are dominated by the phase values of the sinusoids: $c03-math-0134$ , $c03-math-0135$ , and $c03-math-0136$ . The middle and right columns present the impact of ten realizations of additive noise for $c03-math-0137$ and $c03-math-0138$ , respectively. As the noise level increases, the phase values at the neighboring frequencies of $c03-math-0139$ , $c03-math-0140$ , and $c03-math-0141$ are increasingly affected by noise.
5. Figure 3.5 Spectral magnitude for a signal composed of three harmonics using (a) Hamming and (b) Blackman windows. The harmonics $c03-math-0162$ are located at $c03-math-0163$ , $c03-math-0164$ , and $c03-math-0165$ . The broader mainlobe of the Blackman window requires a longer window of $c03-math-0166$ in order to suppress the neighboring harmonics.
6. Figure 3.6 Iterative framework for signal reconstruction showing the GLA update procedure to reconstruct the signal at iteration index $c03-math-0279$ , following Griffin and Lim (1984).
7. Figure 3.7 Spectrogram consistency concept used in Griffin–Lim iterative signal reconstruction. A consistent spectrogram (belonging to the set $c03-math-0291$ ) satisfies $c03-math-0292$ , while for an inconsistent spectrogram $c03-math-0293$ leading to $c03-math-0294$ .
8. Figure 3.8 Speech enhancement by maintaining phase continuity and phase reconstruction across time as proposed in Mehmetcik and Ciloglu (2012) for voiced frames.
9. Figure 3.9 Block diagram for phase randomization proposed for zooming noise suppression (Sugiyama and Miyahara 2013).
10. Figure 3.10 Proof-of-concept experiment for phase randomization: (top) clean versus noisy speech, (bottom) phase randomization with blind and oracle SNRs.
11. Figure 3.11 Representation of $c03-math-0351$ as the sum of speech $c03-math-0352$ and noise $c03-math-0353$ in the complex plane. Due to the sign ambiguity of $c03-math-0354$ in (3.57), there is an ambiguity in the set of phase candidates.
12. Figure 3.12 Phase constraints across time (IFD), harmonics (RPS), and frequency (GDD). The arrows show the directions along which the proposed constraints are applied to the phase spectrum.
13. Figure 3.13 Comparison between phase estimation error criteria: squared error ( $c03-math-0390$ ) and cyclic error ( $c03-math-0391$ ) measures shown versus phase estimation error $c03-math-0392$ . For further details, see Nitzan et al. (2016).
14. Figure 3.14 Temporal smoothing of unwrapped phase to estimate the clean phase from a noisy speech input. The steps are: fundamental frequency estimation, phase decomposition, temporal smoothing, and signal reconstruction.
15. Figure 3.15 SNR-based smoothing (Mowlaee and Kulmer 2015b): Phase deviation cosine $c03-math-0481$ as a function of a priori and a posteriori local SNRs (left); the regions for hypotheses $c03-math-0482$ and $c03-math-0483$ depending on the values of $c03-math-0484$ and $c03-math-0485$ (right).
16. Figure 3.16 Performance evaluation of the maximum likelihood phase estimator (3.33) and maximum a posteriori estimator (3.41) with regard to fundamental frequency, signal-to-noise ratio, and a window length of $c03-math-0507$ [(a),(c)] and $c03-math-0508$ [(b),(d)]. The non-zero phase error for low noise scenarios is caused by the approximation in (3.31).
17. Figure 3.17 Impact of different window functions on phase estimation for an accurate fundamental frequency $c03-math-0543$ estimate (top row), and $c03-math-0544$ underestimated by $c03-math-0545$ (middle) and $c03-math-0546$ (bottom) for a window length of $c03-math-0547$ (left) and $c03-math-0548$ (right), revealing the importance of an accurate $c03-math-0549$ estimate. Given an accurate $c03-math-0550$ (top), the rectangular window performs best due to its high frequency resolution. For inaccurate $c03-math-0551$ estimates (bottom), window functions with wider mainlobes, e.g. the Blackman window, are preferable.
18. Figure 3.18 Griffin and Lim algorithm (GLA; Griffin and Lim 1984; dashed) and fast Griffin and Lim algorithm (solid; Perraudin et al. 2013) used for phase recovery. The results are shown in SSNR measured in dB for (left) female and (right) male speech versus the number of iterations.
19. Figure 3.19 Block diagram for the single-channel speech enhancement example used in Experiment 3.4 to demonstrate the effectiveness of phase estimators when used to replace the noisy spectral phase at signal reconstruction. Phase modification refers to any selected phase estimator listed in Table 3.1; “A” and “S” denote the analysis and synthesis steps.
20. Figure 3.20 Phase-only enhancement results for (left) white, (middle) babble, (right) factory noise, reported in PESQ (top row), STOI (middle), and unRMSE (bottom).
Chapter 4: Phase Processing for Single-Channel Speech Enhancement
1. Figure 4.1 The typical blocks needed for STFT speech enhancement: noise PSD estimation, a priori SNR estimation, speech spectral estimation.
2. Figure 4.2 Log-histograms (in dB) inspired by Breithaupt et al. (2007) for the residual noise DFT magnitude distributions (neglecting the DC and Nyquist bins) for different choices of $c04-math-0032$ in (4.9).
3. Figure 4.3 Conventional speech enhancement, illustrated by a block diagram. The STFT representation of the noisy signal is given by $c04-math-0040$ , and the estimate of the clean amplitude spectrum $c04-math-0041$ is denoted by $c04-math-0042$ . The $c04-math-0043$ is applied to $c04-math-0044$ .
4. Figure 4.4 Three ways to incorporate the spectral phase information into the overall spectral estimation procedure. (a) Use independently obtained estimates for $c04-math-0101$ and $c04-math-0102$ for reconstruction. Some of the phase estimators described in Chapter 3 need a spectral amplitude estimate, which in general is not derived from the same cost function and does not incorporate any phase information in this scenario. (b) Phase information is used in order to refine the amplitude estimate. Using either the phase estimate or the noisy phase for reconstruction is optional (indicated by the dashed line, see Section 4.4.2). (c) Amplitude and phase are obtained jointly; there are several ways to accomplish this, and both estimates are employed for synthesis.
5. Figure 4.5 Spectrograms of the clean speech (a), noise-corrupted speech (b), and enhanced speech signals (c)–(f): (c) $c04-math-0130$ , which reduces (4.33) to (4.23); (d) phase estimate obtained by Krawczyk and Gerkmann (2014); (e) phase estimate obtained by Kulmer and Mowlaee (2015); and (f) (4.33) with the clean phase given. Depending on the choice of $c04-math-0131$ , harmonics are restored and artifacts are introduced.
6. Figure 4.6 The iterative closed loop method (Mowlaee and Saeidi 2013). The estimate $c04-math-0149$ is given by (4.34) and the estimated phase is obtained by the geometry method presented in Mowlaee et al. (2012).
7. Figure 4.7 Spectrograms of the iterative method and relative inconsistency across iterations. (a) Clean speech, (b) noise corrupted speech, (c) iterative blind, (d) iterative method with noise magnitude assumed to be known for the initial geometry-based phase estimate, (e) the normalized change in inconsistency across iterations. The rectangle highlights the speech reconstruction obtained when a reliable phase estimate is available.
8. Figure 4.8 (a) Clean speech, (b) noise corrupted speech, (c) estimated phase, (d) oracle phase.
9. Figure 4.9 Spectral coefficients of a purely stochastic signal (data points centered around zero) and a deterministic signal with uncertainty (off-center data cluster). The phase of the deterministic signal is normalized (achieved by linear phase removal, as in Chapter 3). Figure inspired by Hendriks et al. (2007).
10. Figure 4.10 Graphical representation of (4.58), illustrating that, depending on $c04-math-0216$ , the phase and the amplitude of $c04-math-0217$ are altered. The resulting phasor is a weighted sum of the complex mean $c04-math-0218$ and the noisy observation $c04-math-0219$ . Therefore, the ML estimate of the spectral phase is no longer the noisy phase but a value between the noisy phase $c04-math-0220$ and the prior information $c04-math-0221$ , depending on the certainty of the deterministic model (McCallum and Guillemin 2013).
11. Figure 4.11 Spectrograms of (a) clean speech, (b) noise corrupted speech, (c) MMSE-STSA, (d) MMSE-STSA with given STFT phase, (e) CUP, and (f) the iterative approach.
12. Figure 4.12 Relative inconsistency for 20 randomly selected, gender balanced utterances from the TIMIT database (Garofolo et al. 1993) mixed at global SNRs of $c04-math-0241$ dB, 0 dB, and 5 dB. The noise types utilized are white, babble, and factory noise. The inconsistency is normalized to (a) the outcome of the estimator in (4.33) together with STFTPI. If the amplitude of (4.33) is used together with the noisy phase for reconstruction we obtain (b). (c) is the phase-unaware baseline estimator in (4.26), and (d) is the inconsistency of the CUP estimator in (4.46).
13. Figure 4.13 Sensitivity analysis of the phase-aware estimator in (4.34) (black curve) and its phase-unaware counterpart in (4.26) (gray curve). The left column refers to NMSE $c04-math-0249$ , while the right column presents NMSE $c04-math-0250$ . The corresponding a priori SNRs $c04-math-0251$ are $c04-math-0252$ dB in (a) and (b), 0 dB in (c) and (d), and 15 dB in (e) and (f).
Chapter 5: Phase Processing for Single-Channel Source Separation
1. Figure 5.1 A general scenario with mixed sources: A voice is masked by a guitar sound in the background, both recorded with one microphone. SCSS is capable of separating both underlying sources from their mixture.
2. Figure 5.2 Geometry of the SCSS problem.
3. Figure 5.3 Conventional SCSS principle using mixture phase at signal reconstruction stage.
4. Figure 5.4 Block diagram of a CASA system inspired by Wang (2005), comprised of segmentation and grouping stages.
5. Figure 5.5 Computation of the local SSR for the target source for (a) ideal ratio mask ( $c05-math-0037$ ) and (b) ideal binary mask, at frequency bin $c05-math-0038$ . Below, the time frequency representation of the IRM ( $c05-math-0039$ ) (c) and IBM (d) are shown, respectively.
6. Figure 5.6 Different approaches in deep learning inspired by Zöhrer et al. (2015) using (a) one model $c05-math-0043$ directly, (b) indirect learning of the ideal time frequency mask using two models $c05-math-0044$ and $c05-math-0045$ . The models learn the time frequency mask $c05-math-0046$ used to separate sources from mixture $c05-math-0047$ .
7. Figure 5.7 Decomposition of a speech signal, combining $c05-math-0074$ trained basis vectors and estimated activations.
8. Figure 5.8 Multiplicative update for the basis matrix $c05-math-0108$ and activation matrix $c05-math-0109$ in NMF to approximate the underlying source magnitude spectrum $c05-math-0110$ .
9. Figure 5.9 Schematic representation of the MISI algorithm (Gunawan and Sen 2010). The spectral magnitudes are combined with the estimated phase to produce time domain signal estimates $c05-math-0149$ . These source estimates are then subtracted from the observed mixture $c05-math-0150$ to produce the remixing error $c05-math-0151$ which is used to refine the phase estimates in iterations.
10. Figure 5.10 Ideal Wiener filter along with the estimated magnitude spectrum using the confidence domain in PPR (Sturmel and Daudet 2012) with a fixed threshold $c05-math-0210$ (left). Sinusoidal confidence domain and the estimated magnitude spectrum (middle). Speech presence probability of the sinusoidal confidence domain (right). Results are shown for (top) first speaker, (bottom) second speaker.
11. Figure 5.11 Proof-of-concept result for consistent Wiener filter (Le Roux and Vincent 2013) applied on a noisy speech utterance at $c05-math-0237$ 10 dB in street noise with a known noise power spectrum. Spectrograms shown in dB for (top) clean, (middle) noisy, and (bottom) CWF outcome.
12. Figure 5.12 Comparison of a ranged (neglecting outliers) and non-ranged (including outliers) frequency distribution. The bars illustrate the histogram of the pre-separated input.
13. Figure 5.13 Proof of concept showing the outcome of applying temporal smoothing phase estimation on the ideal ratio mask (d). The clean target reference (a), mixture (b), and the ideal ratio mask outcome (c) are shown for comparison.
14. Figure 5.14 Convergence analysis for GL-based methods and their performance reported in terms of $c05-math-0365$ SDR, $c05-math-0366$ SIR, and $c05-math-0367$ SAR compared to Wiener filtering using the mixture phase as baseline (Watanabe and Mowlaee 2013).
15. Figure 5.15 Quantized magnitude spectrum inspired by Mowlaee et al. (2012a) obtained by adding white Gaussian noise $c05-math-0374$ to each signal source $c05-math-0375$ .
16. Figure 5.16 BSS EVAL results reported for different GL-based phase reconstruction methods versus different quantization levels: (top) SDR, (middle) SIR, (bottom) SAR (all in dB).
17. Figure 5.17 $c05-math-0381$ SDR (left) and $c05-math-0382$ SIR (right) in dB for different masks applied to the mixture spectrum, assuming that the phase spectrum is known.
18. Figure 5.18 Proof-of-concept result obtained by a complex mask applied on noisy male utterance at $c05-math-0385$ 3 dB. Shown are clean speech, noisy, IRM, complex mask.
19. Figure 5.19 The mixture amplitude $c05-math-0392$ obtained from different signal interaction functions (left). Mean square error averaged over speech frames achieved by different signal interaction functions (right; Mowlaee and Martin 2012).
20. Figure 5.20 Proof-of-concept result for (a) clean male utterance corrupted with female utterance as masker, (b) mixed signal at 0 dB, (c) NMF outcome, (d) CMF, and (e) CMF-WISA.
Chapter 6: Phase-Aware Speech Quality Estimation
1. Figure 6.1 Block diagram for (top) conventional instrumental metric without using phase information, (middle) phase-only instrumental metrics, and (bottom) joint amplitude and phase metric.
2. Figure 6.2 Segmentation of a TIMIT sentence into low-, mid-, and high-level regions based on its RMS level, as used in the CSII method.
3. Figure 6.3 Geometric representation for the single-channel speech enhancement problem, showing noisy, clean, and noise spectra denoted by $c06-math-0051$ , $c06-math-0052$ , and $c06-math-0053$ , respectively. The phase deviation $c06-math-0054$ is defined as the phase difference between the clean phase $c06-math-0055$ and the noisy phase $c06-math-0056$ .
4. Figure 6.4 Mean opinion scores (MOS) of the MUSHRA test inspired by Gaich and Mowlaee (2015a). White noise (left) and babble noise (right) shown for 11 participants. The results are grouped into (top) low SNR = 0 dB, (middle) mid SNR = 5 dB, and (bottom) high SNR = 10 dB.
5. Figure 6.5 Correlation results for speech intelligibility measures with the subjective listening results.
6. Figure 6.6 Speech signal (top) and spectrogram (bottom) of the utterance “bin blue at l four soon.”
7. Figure 6.7 Circular variance and spectrogram for the clean phase signal, phase-modified signal ( $c06-math-0172$ ) and randomized phase ( $c06-math-0173$ ).
8. Figure 6.8 Mean objective scores for the best performing instrumental measures evaluated over 50 GRID utterances corrupted by phase distortions controlled by $c06-math-0174$ . The results are shown for (top) quality measures (PESQ, IFD, PD, UnMSE) and (bottom) intelligibility measures (STOI, CSII, CSIIm, UnRMSE).
9. Figure 6.9 Experiment 6.2: Noisy (left), STFTPI (Krawczyk and Gerkmann 2014; middle), and clean (right) speech signals. Results shown as spectrogram (top), group delay (middle), and phase variance (bottom). The predicted quality using PESQ and frequency-weighted SNR are shown for each outcome at the top of each panel.
10. Figure 6.10 Results shown for clean, noisy, estimated and clean phase: (top) spectrogram, (middle) group delay, (bottom) phase variance. Speech intelligibility outcome predicted by STOI and CSII, for a phase-enhanced signal using STFTPI.

List of Tables

Chapter 1: Introduction: Phase Processing, History
1. Table 1.1 Results for the Wang and Lim experiment in terms of SNR amplitude ( $c01-math-0039$ ) versus SNR phase ( $c01-math-0040$ ) showing the equivalent SNR (Wang and Lim 1982). The results are shown for a window length of 512 samples
2. Table 1.2 Subjective speech quality for different treatment types at SNRs of 0 and 10 dB (Paliwal et al. 2011)
Chapter 2: Fundamentals of Phase-Based Signal Processing
1. Table 2.1 List of phase unwrapping solutions
2. Table 2.2 List of useful phase representations explained in this chapter
3. Table 2.3 Group delay functions and variants
Chapter 3: Phase Estimation Fundamentals
1. Table 3.1 Categorization of phase estimation methods with citations
Chapter 4: Phase Processing for Single-Channel Speech Enhancement
1. Table 4.1 Spectral amplitude estimators that are special cases of the parametrized estimator in (4.26)
2. Table 4.2 Settings for the estimators used in the proof-of-concept experiments
Chapter 5: Phase Processing for Single-Channel Source Separation
1. Table 5.1 Phase estimation methods proposed for signal reconstruction in SCSS
2. Table 5.2 List of time–frequency masks used for SCSS, considering two sources
3. Table 5.3 List of signal interaction functions
Chapter 6: Phase-Aware Speech Quality Estimation
1. Table 6.1 List of instrumental metrics to predict perceived speech quality
2. Table 6.2 List of speech intelligibility metrics
3. Table 6.3 Subjective evaluation results for intelligibility test, reported in percentages comparing LSA and LSA + PE methods
4. Table 6.4 Statistical analysis of the top performing perceived quality metrics for different noise types and SNRs, averaged over both SNRs and noise types
5. Table 6.5 Statistical analysis of the top performing speech intelligibility metrics for different noise types and SNRs, averaged over both SNRs and noise types
6. Table 6.6 Results for different phase modification scenarios in terms of conventional and phase-aware measures. The phase modification methods are: (A) noisy (unprocessed), (B) STFTPI, (C) maximum a posteriori (MAP), (D) clean STFT phase (upper bound)
Appendix A: MATLAB Toolbox
1. Table A.1 Filename, description, and experiment number for each MATLAB® implementation used in the book and included in the PhaseLab Toolbox

Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice

Pejman Mowlaee

Josef Kulmer

Johannes Stahl

Florian Mayer

Graz University of Technology, Austria

This edition first published 2017

Registered office:

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

Library of Congress Cataloging-in-Publication Data

Names: Mowlaee, Pejman, 1983- author. | Kulmer, Josef, author. | Stahl, Johannes, 1989- author. | Mayer, Florian, 1986- author.

Title: Single channel phase-aware signal processing in speech communication : theory and practice / [compiled and written by] Pejman Mowlaee, Josef Kulmer, Johannes Stahl, Florian Mayer.

Description: Chichester, UK ; Hoboken, NJ : John Wiley & Sons, Inc., 2016. | Includes bibliographical references and index.

Identifiers: LCCN 2016024931 (print) | LCCN 2016033469 (ebook) | ISBN 9781119238812 (cloth) | ISBN 9781119238829 (pdf) | ISBN 9781119238836 (epub)

Subjects: LCSH: Speech processing systems. | Signal processing. | Oral communication. | Phase modulation.

Classification: LCC TK7882.S65 S575 2016 (print) | LCC TK7882.S65 (ebook) | DDC 006.4/54-dc23

LC record available at https://lccn.loc.gov/2016024931

ISBN: 9781119238812

A catalogue record for this book is available from the British Library.

Cover Image: Gettyimages/lestyan4

About the Authors

Dr Pejman Mowlaee (main author) Graz University of Technology, Graz, Austria

Pejman Mowlaee was born in Anzali, Iran. He received his BSc and MSc degrees in telecommunication engineering in Iran in 2005 and 2007. He received his PhD degree from Aalborg University, Denmark, in 2010. From January 2011 to September 2012 he was a Marie Curie postdoctoral fellow for digital signal processing in audiology at Ruhr University Bochum, Germany. He is currently an assistant professor at the Signal Processing and Speech Communication (SPSC) Laboratory, Graz University of Technology, Austria.

Dr. Mowlaee has received several awards, including the young researcher's award for his MSc studies in 2005 and 2006 and the best MSc thesis award. His PhD work was supported by the Marie Curie EST-SIGNAL Fellowship during 2009–2010. He is a senior member of the IEEE. He organized a special session at INTERSPEECH 2014 and a tutorial session at INTERSPEECH 2015. He was the editor of a special issue of the Elsevier journal Speech Communication, and is a project leader for the Austrian Science Fund.

Dipl. Ing. Josef Kulmer (co-author) Graz University of Technology, Graz, Austria

Josef Kulmer was born in Birkfeld, Austria, in 1985. He received the MSc degree from Graz University of Technology, Austria, in 2014. In 2014 he joined the Signal Processing and Speech Communication Laboratory at Graz University of Technology, where he is currently pursuing his PhD thesis in the field of signal processing.

Dipl. Ing. Johannes Stahl (co-author) Graz University of Technology, Graz, Austria

Johannes Stahl was born in Graz, Austria, in 1989. In 2009, he started studying electrical engineering and audio engineering at Graz University of Technology. In 2015, he received his Dipl.-Ing. (MSc) degree with distinction. In 2015 he joined the Signal Processing and Speech Communication Laboratory at Graz University of Technology, where he is currently pursuing his PhD thesis in the field of speech processing.

Florian Mayer (co-author) Graz University of Technology, Graz, Austria

Florian Mayer was born in Dobl, Austria, in 1986. In 2006, he started studying electrical engineering and audio engineering at Graz University of Technology, and received his Dipl.-Ing. (MSc) in 2015.

Preface

Purpose and scope

Speech communication technology has been intensively studied for more than a century, since the invention of the telephone in 1876. Today's main target applications are acoustic human–machine communication, digital telephony, and digital hearing aids. Specific speech communication applications include, to name a few, artificial bandwidth extension, speech enhancement, source separation, echo cancellation, speech synthesis, speaker recognition, automatic speech recognition, and speech coding. The signal processing methods used in these applications are mostly based on the short-time Fourier transform (STFT). While the Fourier spectrum comprises both amplitude and phase, the phase spectrum has often been neglected or regarded as unimportant. Since the spectral phase is typically wrapped due to its periodic nature, the main difficulty in phase processing lies in extracting a continuous phase representation. In addition, modeling the spectral phase across frames is considerably more difficult than modeling the spectral amplitude.

This book is, in part, an outgrowth of five years of research conducted by the first author, which began with the publication of the first paper on "Phase Estimation for Signal Reconstruction in Single-Channel Source Separation" in 2012. It is also a product of the research actively conducted in this area by all the authors in the PhaseLab research group. The lack of any textbook on phase-aware signal processing for speech communication made it essential to explain its fundamental principles. The need for such a book became even more pronounced following the success of a series of events that I organized or co-organized, among them a special session on "Phase Importance in Speech Processing Applications" at INTERSPEECH 2014, a tutorial session on "Phase Estimation from Theory to Practice" at INTERSPEECH 2015, and the editing of a special issue on "Phase-Aware Signal Processing in Speech Communication" in Speech Communication (Elsevier, 2016), all of which received considerable attention from researchers in diverse speech processing fields. The intention of this book is to unify the recent individual advances made by researchers toward incorporating phase-aware signal processing methods into speech communication applications.

This book develops the tools and methodologies necessary to deal with phase-based signal processing and its applications, in particular in single-channel speech processing. It is intended to provide its readers with solid fundamental tools and a detailed overview of the controversial views on the importance or unimportance of phase in speech communication. Phase wrapping, identified as the main difficulty in analyzing the spectral phase, is presented in detail, together with solutions. Several useful representations derived from the phase spectrum are presented. An in-depth analysis of estimating the phase of a signal observed in noise is given, together with an overview of existing methods. The positive impact of phase-aware processing is demonstrated for three selected applications: speech enhancement, source separation, and speech quality estimation. Through several proof-of-concept examples and computer simulations, we demonstrate the importance and potential of phase processing in each application. Our hope is to provide a sufficient basis for researchers who aim to start research projects in different speech communication applications with a special focus on phase processing.

Book outline

The book is divided into two parts and consists of seven chapters and an appendix. Part I (Chapters 1–3) gives an introduction to phase-based signal processing, providing the fundamentals and key concepts: an overview of the history of phase processing and the arguments for and against the importance of phase (Chapter 1); the definitions and tools required for phase-based signal processing, such as phase unwrapping and several representations that make the spectral phase more accessible (Chapter 2); and, finally, the fundamentals of phase estimation, its limits and potential, and its application to speech signals (Chapter 3).

Part II (Chapters 4–7) demonstrates the benefit of phase processing in three applications: single-channel speech enhancement (Chapter 4), single-channel source separation (Chapter 5), and speech quality estimation (Chapter 6). Chapter 7 concludes the book and outlines several directions for future research. The appendix describes the MATLAB® implementations, collected in the PhaseLab toolbox, that reproduce most of the experiments included in the book.

Intended audience

The book is mainly targeted at researchers and graduate students with some background in signal processing theory and applications, particularly speech signal processing. Although it is not primarily intended as a textbook, the chapters may be used as supplementary material for a special-topics course at second-year graduate level. As an academic instrument, the book can be used to strengthen the reader's understanding of the often mysterious field of phase-aware signal processing, and it presents several interesting applications in which phase knowledge is successfully incorporated. To get the maximum benefit from this book, the reader is expected to have a fundamental knowledge of digital signal processing, signals and systems, and statistical signal processing. For the sake of completeness, a summary of phase-based signal processing is provided in Chapter 2.

The book contains a detailed overview of phase processing and a collection of phase estimation methods. We hope that these provide a set of useful tools that will help new researchers entering the field of phase-aware signal processing and inspire them to solve problems related to phase processing. Because theory and practice are closely linked in speech communication applications, the book is supplemented by various examples and contains a number of MATLAB® experiments. The MATLAB® implementations of the simulations presented in the book, together with some audio samples, are available online at the following website: https://www.spsc.tugraz.at/PhaseLab

These implementations are provided in a toolbox called PhaseLab, which is explained in the appendix. The authors believe that each chapter of the book can serve on its own as a valuable resource and reference for researchers and students. The topics covered in the seven chapters are cross-linked and together contribute to the progress of the field of phase-aware signal processing for speech communication.

Acknowledgments

The intense collaboration during the year spent on this book project with the three contributors, Josef Kulmer, Johannes Stahl, and Florian Mayer, was a unique experience, and I would like to express my deepest gratitude for all their individual efforts. Apart from their very careful and insightful proofreading, their many helpful discussions in our regular meetings improved the contents of the chapters and led to a successful outcome that was only possible within such a great team. In particular, I would like to thank Johannes Stahl and Josef Kulmer for their full contribution to preparing Chapters 3 and 4. I would like to thank Florian Mayer for his valuable contribution to Chapter 5 and his tireless efforts in preparing all the figures in the book.

Last, but not least, a number of people contributed in various ways and I would like to thank them: Prof. Gernot Kubin, Prof. Rainer Martin, Prof. Peter Vary, Prof. Bastian Kleijn, Prof. Tim Fingscheidt, and Dr. Christiane Antweiler for their enlightening discussions, for providing several helpful hints, and for sharing their experience with the first author. I would like to thank Dr. Thomas Drugman, Dr. Gilles Degottex, and Dr. Rahim Saeidi for their support regarding the experiments in Chapter 2. Special thanks go to Andreas Gaich for his support in preparing the results in Chapter 6. I am also thankful to several of my former Master's students who graduated from PhaseLab at TU Graz, Carlos Chacón, Anna Maly, and Mario Watanabe, for their valuable insights and outstanding support. I am grateful to Nasrin Ordoubazari, Fereydoun, Kamran, Solmaz, Hana, and Fatemeh Mowlaee, and the Almirdamad family, who provided support and encouragement during this book project.

I would also like to thank the editorial team at John Wiley & Sons for their friendly assistance. Finally, I acknowledge the financial support from the Austrian Science Fund (FWF) project number P28070-N33.

P. Mowlaee

Graz, Austria

April 4, 2016

List of Symbols

Part I
History, Theory and Concepts

$symbols-math-0001$	absolute value
$symbols-math-0002$	angle
$symbols-math-0003$	clean speech phase spectrum
$symbols-math-0004$	tuning parameter for modified smoothed group delay
$symbols-math-0005$	mean value of the von Mises distribution
$symbols-math-0006$	perturbed clean speech phase
$symbols-math-0007$	clean speech amplitude spectrum
$symbols-math-0008$	amplitude of harmonic h
$symbols-math-0009$	scale factor in the z-transform X(z)
$symbols-math-0010$	clean speech amplitude spectrum estimate
$symbols-math-0011$	coefficients in the numerator polynomial of X(z)
$symbols-math-0012$	continuous phase function
$symbols-math-0013$	principal value of phase
$symbols-math-0014$	coefficients in the denominator polynomial of X(z)
$symbols-math-0015$	basis matrix for the qth source in NMF
$symbols-math-0016$	smoothing parameter for decision-directed a priori SNR estimation
$symbols-math-0017$	smoothing parameter for the uncertainty in unvoiced speech
$symbols-math-0018$	compression parameter of the parametric speech spectrum estimators
$symbols-math-0019$	coherent gain of a window function
$symbols-math-0020$	compression function
$symbols-math-0021$	baseband phase difference (BPD)
$symbols-math-0022$	−3 dB bandwidth of the window mainlobe
$symbols-math-0023$	distance metric used in geometry-based phase estimator
$symbols-math-0024$	GDD-based distance metric used in geometry-based phase estimator
$symbols-math-0025$	parabolic cylinder function
$symbols-math-0026$	additive noise signal in time domain
$symbols-math-0027$	additive noise along time with applied window function
$symbols-math-0028$	divergence measure
$symbols-math-0029$	DFT coefficient for noise
$symbols-math-0030$	DTFT of additive noise
$symbols-math-0031$	DTFT of windowed noise frame
$symbols-math-0032$	distance measure as squared error between two spectra
$symbols-math-0033$	mask approximation objective measure
$symbols-math-0034$	signal approximation objective measure
$symbols-math-0035$	change in inconsistency
$symbols-math-0036$	group delay deviation
$symbols-math-0037$	phase deviation between the observation and the noisy signal
$symbols-math-0038$	cyclic mean phase error
$symbols-math-0039$	remixing error in MISI for the ith iteration
$symbols-math-0040$	expected value operator
$symbols-math-0041$	conditional expected value operator
$symbols-math-0042$	relative change of inconsistency
$symbols-math-0043$	sampling frequency in Hz
$symbols-math-0044$	fundamental frequency in Hz
$symbols-math-0045$	fundamental frequency of qth source in mixture
$symbols-math-0046$	phase deviation
$symbols-math-0047$	instantaneous phase from STFT
$symbols-math-0048$	relative phase shift
$symbols-math-0049$	confluent hypergeometric function
$symbols-math-0050$	gain function of a speech spectrum estimation scheme
$symbols-math-0051$	STFT(iSTFT(·))
$symbols-math-0052$	tuning parameter for modified smoothed group delay
$symbols-math-0053$	key adjustment parameter in CWF
$symbols-math-0054$	magnitude-squared coherence (MSC)
$symbols-math-0055$	Gamma function
$symbols-math-0056$	phase-sensitive filter
$symbols-math-0057$	complex mask filter
$symbols-math-0058$	complex ratio mask filter
$symbols-math-0059$	harmonic index
$symbols-math-0060$	desired harmonic
$symbols-math-0061$	number of harmonics
$symbols-math-0062$	hypothesis of no harmonic structure in the phase
$symbols-math-0063$	hypothesis of harmonic structure in the phase
$symbols-math-0064$	iteration index
$symbols-math-0065$	maximum number of iterations
$symbols-math-0066$	modified Bessel function of the first kind and order ν
$symbols-math-0067$	inconsistency operator
$symbols-math-0068$	discretized IF
$symbols-math-0069$	confidence domain for the qth source in PPR approach
$symbols-math-0070$	ideal binary mask
$symbols-math-0071$	ideal ratio mask
$symbols-math-0072$	instantaneous frequency deviation
$symbols-math-0073$	imaginary unit
$symbols-math-0074$	frequency index
$symbols-math-0075$	von Mises distribution concentration parameter
$symbols-math-0076$	frame index
$symbols-math-0077$	integer-valued function used in time series phase unwrapping
$symbols-math-0078$	number of frames
$symbols-math-0079$	local criterion used in IBM
$symbols-math-0080$	phase spectrum compensation function
$symbols-math-0081$	number of periods per window length
$symbols-math-0082$	integer value as phase wrapping number
$symbols-math-0083$	number of atoms used in NMF
$symbols-math-0084$	number of zeros inside of the unit circle
$symbols-math-0085$	number of zeros outside of the unit circle
$symbols-math-0086$	shape parameter of the parametric speech amplitude distribution
$symbols-math-0087$	circular mean parameter for the hth harmonic
$symbols-math-0088$	circular mean parameter of the von Mises distribution
$symbols-math-0089$	mean of the Gaussian distribution fitted to the qth source fundamental frequency
$symbols-math-0090$	standard deviation of Gaussian distribution fitted to the qth source fundamental frequency
$symbols-math-0091$	sample index $symbols-math-0092$
$symbols-math-0093$	instantaneous attack time
$symbols-math-0094$	length of a window function
$symbols-math-0095$	length of a frame
$symbols-math-0096$	number of DFT points
$symbols-math-0097$	normalized mean square error
$symbols-math-0098$	normalized angular frequency
$symbols-math-0099$	fundamental radian frequency
$symbols-math-0100$	instantaneous frequency (IF)
$symbols-math-0101$	closest sinusoid to bin k in STFTPI
$symbols-math-0102$	tuning factor to scale mask in IRM
$symbols-math-0103$	phase change in Nashi's phase unwrapping method
$symbols-math-0104$	phase increment in Nashi's phase unwrapping method
$symbols-math-0105$	voicing probability
$symbols-math-0106$	linear phase along time
$symbols-math-0107$	frequency derivative of phase
$symbols-math-0108$	phase value of harmonic h
$symbols-math-0109$	estimated phase value of harmonic h
$symbols-math-0110$	phase distortion
$symbols-math-0111$	probability density function
$symbols-math-0112$	phase spectrum of the analysis window
$symbols-math-0113$	source index in a mixture
$symbols-math-0114$	number of audio sources in a mixture
$symbols-math-0115$	radial step size
$symbols-math-0116$	Pearson's correlation coefficient
$symbols-math-0117$	constant threshold used in ISSIR
$symbols-math-0118$	phase randomization index
$symbols-math-0119$	absolute value of noisy speech signal STFT
$symbols-math-0120$	relative phase shift
$symbols-math-0121$	set of frames for von Mises parameter estimation
$symbols-math-0122$	frame shift, hop size in samples
$symbols-math-0123$	speech variance
$symbols-math-0124$	speech intelligibility
$symbols-math-0125$	signal-to-signal ratio (SSR)
$symbols-math-0126$	SNR amplitude
$symbols-math-0127$	SNR phase
$symbols-math-0128$	local SNR
$symbols-math-0129$	normalized root-mean-square error
$symbols-math-0130$	circular variance
$symbols-math-0131$	noise variance
$symbols-math-0132$	instantaneous harmonic phase
$symbols-math-0133$	objective function used in CWF
$symbols-math-0134$	unwrapped harmonic phase
$symbols-math-0135$	time–frequency smoothed harmonic phase
$symbols-math-0136$	activity domain used in ISSIR approach
$symbols-math-0137$	time index
$symbols-math-0138$	sampling period
$symbols-math-0139$	Kendall's tau
$symbols-math-0140$	group delay
$symbols-math-0141$	modified smoothed group delay function
$symbols-math-0142$	fixed threshold used in PPR approach
$symbols-math-0143$	smoothed group delay function
$symbols-math-0144$	three-dimensional matrix for phase
$symbols-math-0145$	unwrapped root-mean-square error
$symbols-math-0146$	unwrapped harmonic phase SNR
$symbols-math-0147$	unvoiced speech signal components
$symbols-math-0148$	unvoiced speech signal spectrum
$symbols-math-0149$	anti-symmetry function used in phase spectrum compensation
$symbols-math-0150$	prediction error in adaptive numerical integration
$symbols-math-0151$	vocal tract spectrum
$symbols-math-0152$	activation matrix for the qth source in NMF
$symbols-math-0153$	phase spectrum of the vocal tract (minimum phase)
$symbols-math-0154$	von Mises distribution
$symbols-math-0155$	noisy speech phase spectrum
$symbols-math-0156$	window function along time
$symbols-math-0157$	frequency response for the window w(n)
$symbols-math-0158$	band importance function for the kth frequency band
$symbols-math-0159$	clean speech signal in time domain
$symbols-math-0160$	deterministic speech component in time domain
$symbols-math-0161$	windowed deterministic speech spectrum
$symbols-math-0162$	stochastic–deterministic (SD) speech signal in time domain
$symbols-math-0163$	zero-phase signal
$symbols-math-0164$	signal frame
$symbols-math-0165$	DFT of a clean speech signal
$symbols-math-0166$	qth source DFT spectrum in a mixture
$symbols-math-0167$	sequence along time with applied window function
$symbols-math-0168$	DTFT of a windowed speech frame
$symbols-math-0169$	baseband representation
$symbols-math-0170$	product spectrum
$symbols-math-0171$	real part of the clean speech spectrum
$symbols-math-0172$	imaginary part of the clean speech spectrum
$symbols-math-0173$	a priori SNR
$symbols-math-0174$	noisy speech in time domain
$symbols-math-0175$	noisy speech spectrum
$symbols-math-0176$	a posteriori mean for the stochastic–deterministic approach
$symbols-math-0177$	signal's modified STFT
$symbols-math-0178$	modified signal
$symbols-math-0179$	a posteriori SNR
$symbols-math-0180$	mth zero in the z-plane