Papers by Deepak Gala

Moving Sound Source Localization and Tracking Using a Self Rotating Bi-Microphone Array
ASME 2019 Dynamic Systems and Control Conference, Nov 26, 2019
In this paper, we present three approaches to localizing and tracking a sound source moving in a three-dimensional (3D) space using a bi-microphone array rotating at a fixed angular velocity. The motion of the sound source, along with the rotation of the bi-microphone array, results in a sinusoidal inter-channel time difference (ICTD) signal with time-varying amplitude and phase. Two state-space models were employed to develop extended Kalman filters (EKFs) that identify the instantaneous amplitude and phase of the signal. Observability analysis of the two state-space models was conducted to reveal singularities. We also developed a method based on the Hilbert transform, which compares the analytic signal of the true ICTD signal with that of a virtual signal having zero elevation and azimuth angles. A moving-average filter is then applied to reduce the noise and the artifacts at the beginning and end of the estimates. The effectiveness of the proposed methods was tested, and comparison studies were conducted in simulation.
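The Hilbert-transform route described above can be sketched in a few lines: form the analytic signal of a synthetic ICTD trace, read off the instantaneous amplitude and phase, and smooth the amplitude with a moving-average filter. All parameters (rotation rate, sampling rate, amplitude drift) are assumed for illustration, not taken from the paper; `analytic_signal` reproduces what `scipy.signal.hilbert` computes, using NumPy only.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same result as scipy.signal.hilbert):
    zero out negative frequencies and double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def instantaneous_amp_phase(x, smooth_win=51):
    """Instantaneous amplitude and phase from the analytic signal; the
    amplitude is smoothed with a moving-average filter to suppress
    noise and edge artifacts."""
    a = analytic_signal(x)
    amp = np.abs(a)
    phase = np.unwrap(np.angle(a))
    amp = np.convolve(amp, np.ones(smooth_win) / smooth_win, mode="same")
    return amp, phase

# Synthetic ICTD: 2 Hz rotation, amplitude drifting as the source moves
fs = 200.0
t = np.arange(0.0, 10.0, 1.0 / fs)
true_amp = 1.0 + 0.3 * t / 10.0
ictd = true_amp * np.cos(2 * np.pi * 2.0 * t)
amp_est, phase_est = instantaneous_amp_phase(ictd)
```

Away from the edges, `amp_est` tracks the drifting envelope and `phase_est` increases at the rotation rate; the ends of the record show the artifacts the moving-average filter is meant to tame.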

Speech Signal Enhancement Techniques for Microphone Arrays
In all speech communication settings, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. The speech processing systems used to communicate or store speech are usually designed for a noise-free environment, but in a real-world environment the presence of background interference, in the form of additive background and channel noise, drastically degrades the performance of these systems, causing inaccurate information exchange and listener fatigue. The Spectral Subtraction Technique can be used to reduce stationary noise, but non-stationary noise still passes through it. Spectral Subtraction also introduces a musical noise that is very annoying to human ears. Beamforming is another possible method of speech enhancement, and it can reduce the musical noise left by Spectral Subtraction. Beamforming by itself, however, does not appear to provide enough improvement, and its performance becomes worse if the noise comes from many directions or the speech has strong reverberation. Therefore, a system has been designed that combines the Spectral Subtraction Technique followed by the Beamforming Technique, reducing stationary as well as residual, musical noise. Algorithms and associated software have been developed for 1) Spectral Subtraction, 2) the Beamforming Technique, and 3) Spectral Subtraction followed by Beamforming. The last technique yields noise-free speech, free of musical noise and reverberation, making the speech intelligible and of good quality.
Processing of the signal for Spectral Subtraction, Delay-and-Sum Beamforming, and the combined technique was carried out individually for three different experiments (with 3, 6, and 10 microphones) and for four cases: three different speech signals and a fourth signal with Gaussian white noise. The SNR was calculated in each case.
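As a rough sketch of the first stage, the following implements basic magnitude spectral subtraction: the noise magnitude spectrum is estimated from an assumed noise-only lead-in, subtracted frame by frame, floored to limit musical noise, and the result resynthesized by overlap-add. The frame size, hop, floor, and noise-segment length are illustrative choices, not the thesis's settings.

```python
import numpy as np

def spectral_subtraction(x, fs, noise_dur=0.5, frame=256, hop=128, beta=0.01):
    """Magnitude spectral subtraction: estimate the noise magnitude
    spectrum from an assumed noise-only lead-in, subtract it per
    windowed frame, floor the result, and resynthesize by overlap-add."""
    win = np.hanning(frame)
    n_noise = int(noise_dur * fs)
    # Noise magnitude estimate from the leading noise-only frames
    noise_frames = [np.abs(np.fft.rfft(x[s:s + frame] * win))
                    for s in range(0, n_noise - frame, hop)]
    noise_mag = np.mean(noise_frames, axis=0)

    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for s in range(0, len(x) - frame, hop):
        X = np.fft.rfft(x[s:s + frame] * win)
        mag = np.abs(X) - noise_mag                # subtract noise magnitude
        mag = np.maximum(mag, beta * np.abs(X))    # spectral floor vs musical noise
        y = np.fft.irfft(mag * np.exp(1j * np.angle(X)), frame)
        out[s:s + frame] += y * win
        norm[s:s + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Synthetic test: 0.5 s noise-only lead-in, then a tone buried in white noise
fs = 8000
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s test "speech"
clean = np.concatenate([np.zeros(fs // 2), tone])
x = clean + 0.3 * rng.standard_normal(clean.size)
y = spectral_subtraction(x, fs)
```

The residual error in the tone region drops noticeably after subtraction; as the abstract notes, what remains is the fluctuating "musical" residue that motivates the beamforming stage.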

Speech enhancement combining spectral subtraction and beamforming techniques for microphone array
Proceedings of the International Conference and Workshop on Emerging Trends in Technology, 2010
ABSTRACT: In all speech communication settings, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. The Spectral Subtraction Technique is one method of reducing stationary noise, but non-stationary noise still passes through it. Spectral Subtraction also introduces a musical noise that is very annoying to human ears. Beamforming is another possible method of speech enhancement. Beamforming by itself, however, does not appear to provide enough improvement, and its performance becomes worse if the noise comes from many directions or the speech has strong reverberation. A combined technique using the advantages of both is proposed, where the Spectral Subtraction Technique followed by the Beamforming Technique reduces stationary as well as residual, musical noise. It can be observed that Spectral Subtraction followed by Beamforming gives a better SNR value than either technique individually, thereby improving the quality of speech. Numerous simulation results are used to illustrate the reasoning.
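The Beamforming stage can be illustrated with a minimal delay-and-sum sketch, assuming the steering delays are already known as whole samples (a real system would estimate fractional delays from the array geometry): aligning the channels makes the look-direction signal add coherently while uncorrelated noise is averaged down.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Advance each channel by its known steering delay (in samples)
    and average: the look-direction signal adds coherently while
    uncorrelated noise is averaged down by the channel count."""
    n = min(len(m) - d for m, d in zip(mics, delays))
    return np.mean([m[d:d + n] for m, d in zip(mics, delays)], axis=0)

# Six microphones hear the same signal at different delays, plus noise
rng = np.random.default_rng(0)
n_samp = 4000
sig = np.sin(2 * np.pi * 0.03 * np.arange(n_samp + 10))   # desired signal
delays = [0, 2, 4, 5, 7, 9]                               # per-mic delays (samples)
mics = [sig[10 - d:10 - d + n_samp] + 0.5 * rng.standard_normal(n_samp)
        for d in delays]
out = delay_and_sum(mics, delays)
```

With six channels and unit-gain averaging, the noise power in `out` is roughly one sixth of a single channel's, which is the SNR gain the abstract attributes to beamforming.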

SNR improvement with speech enhancement techniques
Proceedings of the International Conference & Workshop on Emerging Trends in Technology - ICWET '11, 2011
ABSTRACT: Speech enhancement aims to improve speech quality using various techniques. The Spectral Subtraction Technique is one of the earliest and longest-standing popular approaches to noise compensation and speech enhancement. It reduces stationary noise, but non-stationary noise still passes through it. Further, it also introduces a musical noise that is very annoying to human ears. Beamforming is another possible method of speech enhancement. Beamforming by itself, however, does not appear to provide enough improvement, and its performance becomes worse if the noise comes from many directions or the speech has strong reverberation. A combined technique using the Spectral Subtraction Technique followed by the Beamforming Technique reduces stationary as well as residual, musical noise. It can be observed that Spectral Subtraction followed by Beamforming gives a better SNR value than either technique individually, thereby improving the quality of speech. Numerous simulation results are used to illustrate the reasoning.
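The SNR comparison underlying these claims can be reproduced with a simple global SNR measure; the definition below (clean-signal power over residual power, in dB) is a common convention, assumed here rather than taken from the paper.

```python
import numpy as np

def snr_db(clean, processed):
    """Global SNR in dB: power of the clean reference divided by the
    power of the residual (processed minus clean)."""
    residual = processed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

# Example: a constant offset of 0.1 on a unit signal gives 20 dB SNR
clean = np.ones(100)
val = snr_db(clean, clean + 0.1)
```

Comparing `snr_db(clean, noisy)` against `snr_db(clean, enhanced)` for each technique is the kind of before/after tabulation the abstract describes.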

Proceedings of the 5th International Conference of Control, Dynamic Systems, and Robotics (CDSR'18), Jun 1, 2018
This paper presents a novel three-dimensional (3D) sound source localization (SSL) technique based only on interaural time difference (ITD) signals, acquired by a self-rotational two-microphone array on an unmanned ground vehicle. Both the azimuth and elevation angles of a stationary sound source are identified using the phase angle and amplitude of the acquired ITD signal. An SSL algorithm based on an extended Kalman filter (EKF) is developed. The observability analysis reveals the singularity of the state when the sound source is placed above the microphone array. A means of detecting this singularity is then proposed and incorporated into the proposed SSL algorithm. The proposed technique is tested in both a simulated environment and on two hardware platforms, i.e., a KEMAR dummy binaural head and a robotic platform. All results show the fast and accurate convergence of estimates.
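The core idea — that the ITD trace of a rotating two-microphone array is a sinusoid whose phase encodes azimuth and whose amplitude encodes elevation — can be sketched with a far-field model and a linear least-squares fit in place of the paper's EKF. The spacing `d`, speed of sound `c`, and source angles are assumed values for illustration.

```python
import numpy as np

d, c = 0.30, 343.0                          # mic spacing (m), speed of sound (m/s)
az_true, el_true = np.deg2rad(60.0), np.deg2rad(35.0)   # assumed source angles

# Far-field model: ITD(theta) = (d/c) * cos(elevation) * cos(theta - azimuth)
theta = np.linspace(0.0, 4.0 * np.pi, 800)  # array heading over two rotations
itd = (d / c) * np.cos(el_true) * np.cos(theta - az_true)
itd += 1e-6 * np.random.default_rng(1).standard_normal(theta.size)

# Linear least squares: itd ~ a*cos(theta) + b*sin(theta),
# so a = A*cos(az), b = A*sin(az) with amplitude A = (d/c)*cos(el)
M = np.column_stack([np.cos(theta), np.sin(theta)])
a, b = np.linalg.lstsq(M, itd, rcond=None)[0]
azimuth = np.arctan2(b, a)                     # phase of the sinusoid -> azimuth
elevation = np.arccos(np.hypot(a, b) * c / d)  # amplitude -> elevation
```

The batch fit recovers both angles; the paper's EKF does the same estimation recursively and, unlike this sketch, handles the singularity when the source sits directly above the array (where the sinusoid's amplitude vanishes).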

Journal of Intelligent & Robotic Systems
While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capabilities have not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform three-dimensional (3D) multi-sound-source localization (MSSL) using only the inter-channel time difference (ICTD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on two machine learning techniques, viz., the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.
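The RANSAC flavor of MSSL can be sketched under a simplifying assumption (not necessarily the paper's exact formulation): each ICTD sample is dominated by one source, so the data form a mixture of cosine curves A·cos(θ − ψ), and sources are peeled off one consensus set at a time. Amplitudes, angles, and thresholds below are illustrative.

```python
import numpy as np

def fit_cos(theta, y):
    """Least-squares fit of y ~ a*cos(theta) + b*sin(theta),
    i.e. one model A*cos(theta - psi)."""
    M = np.column_stack([np.cos(theta), np.sin(theta)])
    return np.linalg.lstsq(M, y, rcond=None)[0]

def ransac_sources(theta, itd, tol=5e-5, min_inliers=60, trials=300, seed=0):
    """Peel off one cosine model at a time: hypothesize (a, b) from a
    random pair of samples, keep the hypothesis with the largest
    consensus set, refit on its inliers, remove them, and repeat."""
    rng = np.random.default_rng(seed)
    theta, itd = theta.copy(), itd.copy()
    sources = []
    while len(itd) >= min_inliers:
        best = None
        for _ in range(trials):
            i = rng.choice(len(itd), 2, replace=False)
            a, b = fit_cos(theta[i], itd[i])
            resid = np.abs(itd - (a * np.cos(theta) + b * np.sin(theta)))
            inliers = resid < tol
            if best is None or inliers.sum() > best.sum():
                best = inliers
        if best.sum() < min_inliers:
            break
        a, b = fit_cos(theta[best], itd[best])
        sources.append((np.hypot(a, b), np.arctan2(b, a)))  # (amplitude, azimuth)
        theta, itd = theta[~best], itd[~best]
    return sources

# Two sources: ICTD samples interleaved, each dominated by one source
rng = np.random.default_rng(3)
theta = rng.uniform(0, 4 * np.pi, 400)
which = rng.integers(0, 2, 400)
A = np.where(which == 0, 8e-4, 6e-4)
psi = np.where(which == 0, np.deg2rad(30.0), np.deg2rad(120.0))
itd = A * np.cos(theta - psi) + 5e-6 * rng.standard_normal(400)
sources = ransac_sources(theta, itd)
```

The loop terminates when too few samples remain to support another model, which is how the source count falls out of the procedure rather than being supplied in advance.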
arXiv, 2018
While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capability has not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform multi-sound-source localization (MSSL) using only the interaural time difference (ITD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on the DBSCAN and RANSAC algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their three-dimensional orientations in a reverberant environment.

Journal of Intelligent & Robotic Systems
This work presents a novel technique that performs both orientation and distance localization of a sound source in a three-dimensional (3D) space using only the interaural time difference (ITD) cue, generated by a newly developed self-rotational bi-microphone robotic platform. The system dynamics are established in the spherical coordinate frame using a state-space model. The observability analysis of the state-space model shows that the system is unobservable when the sound source is placed at elevation angles of 90 and 0 degrees. The proposed method uses the difference between the azimuth estimates resulting from the 3D and two-dimensional models, respectively, to check the zero-degree-elevation condition, and further estimates the elevation angle using a polynomial curve-fitting approach. The method is also capable of detecting a 90-degree elevation by extracting the zero-ITD signal 'buried' in noise. Additionally, distance localization is performed by first rotating the microphone array to face the sound source and then shifting the microphone perpendicular to the source-robot vector by a predefined distance in a fixed number of steps. The integrated rotational and translational motions of the microphone array provide complete orientation and distance localization using only the ITD cue. A novel robotic platform using a self-rotational bi-microphone array was also developed for unmanned ground robots performing sound source localization. The proposed technique was first tested in simulation and then verified on the newly developed robotic platform. Experimental data collected by the microphones installed on a KEMAR dummy head were also used to test the proposed technique. All results show the effectiveness of the proposed technique.
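The distance-localization step reduces to one triangulation: after the array has rotated to face the source (zero bearing), a sideways shift of s meters makes the source reappear at a small bearing δ, giving range r = s / tan δ. The numbers below are assumed for illustration, not taken from the paper's experiments.

```python
import math

def distance_from_shift(shift, bearing_change):
    """Range from triangulation: after facing the source (zero bearing),
    a perpendicular shift of `shift` meters makes the source reappear at
    bearing delta, so r = shift / tan(delta)."""
    return shift / math.tan(bearing_change)

# Assumed geometry: source 5 m ahead, robot steps 0.5 m sideways
r_true, shift = 5.0, 0.5
delta = math.atan2(shift, r_true)        # bearing the shifted array would measure
r_est = distance_from_shift(shift, delta)
```

In practice δ comes from a second orientation estimate after the shift, so the range accuracy is limited by the bearing accuracy; a larger shift makes δ larger and the estimate less sensitive to bearing noise.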


Sound Source Localization and Tracking Using a Self-Rotating Bi-Microphone Array
Dissertation, 2019
While vision-based localization techniques have been widely studied, sound-source localization capabilities have not been fully enabled. In this dissertation, I present novel three-dimensional (3D) sound source localization (SSL) techniques based only on inter-channel time difference (ICTD) signals, acquired by a self-rotating bi-microphone array on a ground robot.
The rest of the dissertation is organized as follows. Chapter 2 presents the preliminaries. In Chapter 3, I present the localization of a single stationary sound source in a 3D environment. Both the azimuth and elevation angles of a stationary sound source are identified using the phase angle and amplitude of the acquired ICTD signal. An SSL algorithm based on an extended Kalman filter (EKF) is developed. The observability analysis reveals the singularity of the state estimates when the sound source is placed above the microphone array. A means of detecting this singularity is then proposed and incorporated into the proposed SSL algorithm. The proposed technique is tested in both a simulated environment and on two hardware platforms, i.e., a KEMAR dummy binaural head and a robotic platform. All results show the fast and accurate convergence of estimates.
Chapter 4 presents a novel technique that performs both orientation and distance localization of a sound source in a 3D space using only the ICTD cue, generated by the self-rotating bi-microphone array mounted on the robotic platform. The system dynamics are established in the spherical coordinate frame using a state-space model. The observability analysis of the state-space model shows that the system is unobservable when the sound source is placed at elevation angles of 90 and 0 degrees. The proposed method uses the difference between the azimuth estimates resulting from the 3D and two-dimensional (2D) models, respectively, to check the zero-degree-elevation condition, and further estimates the elevation angle using a polynomial curve-fitting approach. The method is also capable of detecting a 90-degree elevation by extracting the zero-ICTD signal 'buried' in noise. Additionally, distance localization is performed by first rotating the microphone array to face the sound source and then shifting the microphone perpendicular to the source-robot vector by a predefined distance in a fixed number of steps. The integrated rotational and translational motions of the microphone array provide complete orientation and distance localization using only the ICTD cue. The proposed technique is first tested in simulation and then verified on the robotic platform. Experimental data collected by the microphones installed on a KEMAR dummy head are also used to test the proposed technique. All results show the effectiveness of the proposed technique.
In Chapter 5, I present two novel approaches to perform 3D multi-sound-source localization (MSSL) using only the ICTD signal generated by a self-rotating bi-microphone array. The two approaches are based on two machine learning techniques, viz., the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.
Chapter 6 presents three approaches to localizing and tracking a sound source moving in a 3D space using a bi-microphone array rotating at a fixed angular velocity. The motion of the sound source, along with the rotation of the bi-microphone array, results in a sinusoidal ICTD signal with time-varying amplitude and phase. Four state-space models are employed to develop EKFs that identify the instantaneous amplitude and phase of the signal. Observability analysis of the four state-space models is conducted to reveal singularities. A method based on the Hilbert transform is also developed, which compares the analytic signal of the true ICTD signal with that of a virtual signal having zero elevation and azimuth angles. A moving-average filter is then applied to reduce the noise and the artifacts at the beginning and end of the estimates. The effectiveness of the proposed methods is tested, and comparison studies are conducted in simulation.
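One possible EKF formulation for the Chapter 6 tracking problem — a sketch under assumed parameters, not the dissertation's actual state-space models — keeps a random-walk state x = [A, ψ] with measurement z = A·cos(ωt − ψ), linearizing the measurement at each step to track the instantaneous amplitude and phase.

```python
import numpy as np

def ekf_amp_phase(z, t, omega, Q=np.diag([1e-9, 1e-6]), R=4e-10,
                  x0=(5e-4, 0.0)):
    """EKF with random-walk state x = [A, psi] and measurement model
    z = A*cos(omega*t - psi): tracks the instantaneous amplitude and
    phase of a sinusoidal ICTD signal."""
    x = np.array(x0, dtype=float)
    P = np.diag([1e-6, 1.0])                    # initial uncertainty
    est = np.zeros((len(z), 2))
    for k, (zk, tk) in enumerate(zip(z, t)):
        P = P + Q                               # predict: identity dynamics
        A, psi = x
        h = A * np.cos(omega * tk - psi)        # predicted measurement
        H = np.array([np.cos(omega * tk - psi),      # dh/dA
                      A * np.sin(omega * tk - psi)]) # dh/dpsi
        S = H @ P @ H + R
        K = P @ H / S                           # Kalman gain
        x = x + K * (zk - h)                    # update
        P = P - np.outer(K, H @ P)
        est[k] = x
    return est

# Synthetic ICTD from a 2 Hz rotating array, fixed amplitude and phase
rng = np.random.default_rng(2)
omega = 2 * np.pi * 2.0
t = np.arange(0.0, 10.0, 1.0 / 200.0)
z = 8e-4 * np.cos(omega * t - 0.7) + 2e-5 * rng.standard_normal(t.size)
est = ekf_amp_phase(z, t, omega)
```

With a moving source, A and ψ drift over time and the same filter tracks them, which is where the random-walk process model earns its keep; the observability singularities the dissertation analyzes show up here as the gain collapsing when the sinusoid's amplitude vanishes.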
