Cover Page

Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice

Pejman Mowlaee

Josef Kulmer

Johannes Stahl

Florian Mayer

 

Graz University of Technology, Austria

 

 

 

 


About the Authors

Dr. Pejman Mowlaee (main author) Graz University of Technology, Graz, Austria

Pejman Mowlaee was born in Anzali, Iran. He received his BSc and MSc degrees in telecommunication engineering in Iran in 2005 and 2007. He received his PhD degree at Aalborg University, Denmark in 2010. From January 2011 to September 2012 he was a Marie Curie post-doctoral fellow for digital signal processing in audiology at Ruhr University Bochum, Germany. He is currently an assistant professor at the Speech Communication and Signal Processing (SPSC) Laboratory, Graz University of Technology, Austria.

Dr. Mowlaee has received several awards: the young researcher's award for his MSc studies in 2005 and 2006, and the best MSc thesis award. His PhD work was supported by the Marie Curie EST-SIGNAL Fellowship during 2009–2010. He is a senior member of the IEEE. He was an organizer of a special session and a tutorial session in 2014 and 2015. He was the editor for a special issue of the Elsevier journal Speech Communication, and is a project leader for the Austrian Science Fund.

Dipl. Ing. Josef Kulmer (co-author) Graz University of Technology, Graz, Austria

Josef Kulmer was born in Birkfeld, Austria, in 1985. He received the MSc degree from Graz University of Technology, Austria, in 2014. In 2014 he joined the Signal Processing and Speech Communication Laboratory at Graz University of Technology, where he is currently pursuing his PhD thesis in the field of signal processing.

Dipl. Ing. Johannes Stahl (co-author) Graz University of Technology, Graz, Austria

Johannes Stahl was born in Graz, Austria, in 1989. In 2009, he started studying electrical engineering and audio engineering at Graz University of Technology. In 2015, he received his Dipl.-Ing. (MSc) degree with distinction. In 2015 he joined the Signal Processing and Speech Communication Laboratory at Graz University of Technology, where he is currently pursuing his PhD thesis in the field of speech processing.

Florian Mayer (co-author) Graz University of Technology, Graz, Austria

Florian Mayer was born in Dobl, Austria, in 1986. In 2006, he started studying electrical engineering and audio engineering at Graz University of Technology, and received his Dipl.-Ing. (MSc) in 2015.

Preface

Purpose and scope

Speech communication technology has been intensively studied for more than a century since the invention of the telephone in 1876. Today's main target applications are acoustic human–machine communication, digital telephony, and digital hearing aids. Specific applications in speech communication include, to name a few, artificial bandwidth extension, speech enhancement, source separation, echo cancellation, speech synthesis, speaker recognition, automatic speech recognition, and speech coding. The signal processing methods used in these applications are mostly based on the short-time Fourier transform. While the Fourier transform spectrum contains both amplitude and phase parts, the phase spectrum has often been neglected or regarded as unimportant. Since the spectral phase is typically wrapped due to its periodic nature, the main difficulty in phase processing is associated with extracting a continuous phase representation. In addition, modeling the spectral phase across frames is a more difficult task than modeling the spectral amplitude.

This book is, in part, an outgrowth of five years of research conducted by the first author, which started with the publication of the first paper on “Phase Estimation for Signal Reconstruction in Single-Channel Source Separation” back in 2012. It is also a product of the research actively conducted in this area by all the authors at the PhaseLab research group. The fact that there is no textbook on phase-aware signal processing for speech communication made it paramount to explain its fundamental principles. The need for such a book was even more pronounced as a follow-up to the success of a series of events organized or co-organized by the first author, amongst them: a special session on “Phase Importance in Speech Processing Applications” at the International Conference on Spoken Language Processing (INTERSPEECH) 2014, a tutorial session on “Phase Estimation from Theory to Practice” at the International Conference on Spoken Language Processing (INTERSPEECH) 2015, and an editorial for a special issue on “phase-aware signal processing in speech communication” in Speech Communication (Elsevier, 2016), all receiving considerable attention from researchers from diverse speech processing fields. The intention of this book is to unify the recent individual advances made by researchers toward incorporating phase-aware signal processing methods into speech communication applications.

This book develops the tools and methodologies necessary to deal with phase-based signal processing and its application, in particular in single-channel speech processing. It is intended to provide its readers with solid fundamental tools and a detailed overview of the controversial insights regarding the importance and unimportance of phase in speech communication. Phase wrapping, exposed as the main difficulty in analyzing the spectral phase, will be presented in detail, with solutions provided. Several useful representations derived from the phase spectrum will be presented. An in-depth analysis of the estimation of a signal's phase observed in noise, together with an overview of existing methods, will be given. The positive impact of phase-aware processing is demonstrated for three selected applications: speech enhancement, source separation, and speech quality estimation. Through several proof-of-concept examples and computer simulations, we demonstrate the importance and potential of phase processing in each application. Our hope is to provide a sufficient basis for researchers aiming at starting their research projects in different applications in speech communication with a special focus on phase processing.

Book outline

The book is divided into two parts and consists of seven chapters and an appendix. Part I (Chapters 1–3) gives an introduction to phase-based signal processing, providing the fundamentals and key concepts. It covers an overview of the history of phase processing together with the arguments for the importance and unimportance of phase (Chapter 1); the required definitions and tools for phase-based signal processing, such as phase unwrapping and various representations that make the phase spectrum more accessible (Chapter 2); and finally the fundamentals, limits, and potential of phase estimation, together with its application to speech signals (Chapter 3).

Part II (Chapters 4–7) deals with three applications that demonstrate the benefit of phase processing: single-channel speech enhancement (Chapter 4), single-channel source separation (Chapter 5), and speech quality estimation (Chapter 6). Chapter 7 concludes the book and provides several future prospects to pursue. The appendix describes the MATLAB® implementations, collected as the PhaseLab toolbox, that reproduce most of the experiments included in the book.

Intended audience

The book is mainly targeted at researchers and graduate students with some background in signal processing theory and applications focused on speech signal processing. Although it is not primarily intended as a textbook, the chapters may be used as supplementary material for a special-topics course at second-year graduate level. As an academic instrument, the book could be used to strengthen the understanding of the often mystical field of phase-aware signal processing, and it provides several interesting applications where phase knowledge is successfully incorporated. To get the maximal benefit from this book, the reader is expected to have a fundamental knowledge of digital signal processing, signals and systems, and statistical signal processing. For the sake of completeness, a summary of phase-based signal processing is provided in Chapter 2.

The book contains a detailed overview of phase processing and a collection of phase estimation methods. We hope that these provide a set of useful tools that will help new researchers entering the field of phase-aware signal processing and inspire them to solve problems related to phase processing. As the theory and practice are linked in speech communication applications, the book is supplemented by various examples and contains a number of MATLAB® experiments. The reader will find the MATLAB® implementations for the simulations presented in the book with some audio samples online at the following website: https://www.spsc.tugraz.at/PhaseLab

These implementations are provided in a toolbox called PhaseLab which is explained in the appendix. The authors believe that each chapter of the book itself serves as a valuable resource and reference for researchers and students. The topics covered within the seven chapters cross-link with each other and contribute to the progress of the field of phase-aware signal processing for speech communication.

Acknowledgments

The intense collaboration in the year of working on this book project together with the three contributors, Josef Kulmer, Johannes Stahl, and Florian Mayer, was a unique experience and I would like to express my deepest gratitude for all their individual efforts. Apart from the very careful and insightful proofreads, their endless helpful discussions in improving the contents of the chapters and in our regular meetings led to a successful outcome that was only possible within such a great team. In particular, I would like to thank Johannes Stahl and Josef Kulmer for their full contribution in preparing Chapters 3 and 4. I would like to thank Florian Mayer for his valuable contribution in Chapter 5 and his endless efforts in preparing all the figures in the book.

Last, but not least, a number of people contributed in various ways and I would like to thank them: Prof. Gernot Kubin, Prof. Rainer Martin, Prof. Peter Vary, Prof. Bastian Kleijn, Prof. Tim Fingscheidt, and Dr. Christiane Antweiler for their enlightening discussions, for providing several helpful hints, and for sharing their experience with the first author. I would like to thank Dr. Thomas Drugman, Dr. Gilles Degottex, and Dr. Rahim Saeidi for their support regarding the experiments in Chapter 2. Special thanks go to Andreas Gaich for his support in preparing the results in Chapter 6. I am also thankful to several of my former Master's students who graduated at PhaseLab at TU Graz, Carlos Chacón, Anna Maly, and Mario Watanabe, for their valuable insights and outstanding support. I am grateful to Nasrin Ordoubazari, Fereydoun, Kamran, Solmaz, Hana, and Fatemeh Mowlaee, and the Almirdamad family who provided support and encouragement during this book project.

I would also like to thank the editorial team at John Wiley & Sons for their friendly assistance. Finally, I acknowledge the financial support from the Austrian Science Fund (FWF) project number P28070-N33.

P. Mowlaee

Graz, Austria

April 4, 2016

List of Symbols

absolute value
angle
clean speech phase spectrum
tuning parameter for modified smoothed group delay
mean value of the von Mises distribution
perturbed clean speech phase
clean speech amplitude spectrum
amplitude of harmonic h
scale factor in the z-transform X(z)
clean speech amplitude spectrum estimate
coefficients in the numerator polynomial of X(z)
continuous phase function
principal value of phase
coefficients in the denominator polynomial of X(z)
basis matrix for the qth source in NMF
smoothing parameter for decision-directed a priori SNR estimation
smoothing parameter for the uncertainty in unvoiced speech
compression parameter of the parametric speech spectrum estimators
coherent gain of a window function
compression function
baseband phase difference (BPD)
−3 dB bandwidth of the window mainlobe
distance metric used in geometry-based phase estimator
GDD-based distance metric used in geometry-based phase estimator
parabolic cylinder function
additive noise signal in time domain
additive noise along time with applied window function
divergence measure
DFT coefficient for noise
DTFT of additive noise
DTFT of windowed noise frame
distance measure as squared error between two spectra
mask approximation objective measure
signal approximation objective measure
change in inconsistency
group delay deviation
phase deviation between the observation and the noisy signal
cyclic mean phase error
remixing error in MISI for the ith iteration
expected value operator
conditional expected value operator
relative change of inconsistency
sampling frequency in Hz
fundamental frequency in Hz
fundamental frequency of qth source in mixture
phase deviation
instantaneous phase from STFT
relative phase shift
confluent hypergeometric function
gain function of a speech spectrum estimation scheme
STFT(iSTFT(·))
tuning parameter for modified smoothed group delay
key adjustment parameter in CWF
magnitude-squared coherence (MSC)
Gamma function
phase-sensitive filter
complex mask filter
complex ratio mask filter
harmonic index
desired harmonic
number of harmonics
hypothesis of no harmonic structure in the phase
hypothesis of harmonic structure in the phase
iteration index
maximum number of iterations
modified Bessel function of the first kind and order ν
inconsistency operator
discretized IF
confidence domain for the qth source in PPR approach
ideal binary mask
ideal ratio mask
instantaneous frequency deviation
imaginary unit
frequency index
von Mises distribution concentration parameter
frame index
integer-valued function used in time series phase unwrapping
number of frames
local criterion used in IBM
phase spectrum compensation function
number of periods per window length
integer value as phase wrapping number
number of atoms used in NMF
number of zeros inside of the unit circle
number of zeros outside of the unit circle
shape parameter of the parametric speech amplitude distribution
circular mean parameter for the hth harmonic
circular mean parameter of the von Mises distribution
mean of the Gaussian distribution fitted to the qth source fundamental frequency
standard deviation of Gaussian distribution fitted to the qth source fundamental frequency
sample index
instantaneous attack time
length of a window function
length of a frame
number of DFT points
normalized mean square error
normalized angular frequency
fundamental radian frequency
instantaneous frequency (IF)
closest sinusoid to bin k in STFTPI
tuning factor to scale mask in IRM
phase change in Nashi's phase unwrapping method
phase increment in Nashi's phase unwrapping method
voicing probability
linear phase along time
frequency derivative of phase
phase value of harmonic h
estimated phase value of harmonic h
phase distortion
probability density function
phase spectrum of the analysis window
source index in a mixture
number of audio sources in a mixture
radial step size
Pearson's correlation coefficient
constant threshold used in ISSIR
phase randomization index
absolute value of noisy speech signal STFT
relative phase shift
set of frames for von Mises parameter estimation
frame shift, hop size in samples
speech variance
speech intelligibility
signal-to-signal ratio (SSR)
SNR amplitude
SNR phase
local SNR
normalized root-mean-square error
circular variance
noise variance
instantaneous harmonic phase
objective function used in CWF
unwrapped harmonic phase
time–frequency smoothed harmonic phase
activity domain used in ISSIR approach
time index
sampling period
Kendall's tau
group delay
modified smoothed group delay function
fixed threshold used in PPR approach
smoothed group delay function
three-dimensional matrix for phase
unwrapped root mean square error
unwrapped harmonic phase SNR
unvoiced speech signal components
unvoiced speech signal spectrum
anti-symmetry function used in phase spectrum compensation
prediction error in adaptive numerical integration
vocal tract spectrum
activation matrix for the qth source in NMF
phase spectrum of the vocal tract (minimum phase)
von Mises distribution
noisy speech phase spectrum
window function along time
frequency response for the window w(n)
band importance function for the kth frequency band
clean speech signal in time domain
deterministic speech component in time domain
windowed deterministic speech spectrum
stochastic–deterministic (SD) speech signal in time domain
zero-phase signal
signal frame
DFT of a clean speech signal
qth source DFT spectrum in a mixture
sequence along time with applied window function
DTFT of a windowed speech frame
baseband representation
product spectrum
real part of the clean speech spectrum
imaginary part of the clean speech spectrum
a priori SNR
noisy speech in time domain
noisy speech spectrum
a posteriori mean for the stochastic–deterministic approach
signal's modified STFT
modified signal
a posteriori SNR
mth zero in the z-plane

Part I
History, Theory and Concepts