Reports on Spatial Sound Research


List of Report Titles

 

20. Characterizing Elevation Effects of a Prolate Spheroidal HRTF Model

19. The Incorporation of Range in a DSP-Based HRTF Model

18. Investigation of the Value of Adaptation for Azimuth Estimation from Speech in a Natural Auditory Environment

17. Monaural Estimation of Sound Source's Elevation Angle Using Cepstrum Analysis

16. Synthesis of Binaural Sound Using Digital Filters and a Head-Tracking System

15. Sound Localization using the SAA7740 Digital Signal Processor

14. Estimating Azimuth From Speech in a Natural Auditory Environment

13. Modeling the Elevation Characteristics of the Head-Related Impulse Response

12. Binaural and Monaural Elevation Estimate for Sound Sources

11. A Real-Time Onset Detector Using an Automatic Gain Control

10. A Real-Time DSP System for Estimating the Azimuth to a Sound Source from the Interaural Intensity Difference

9. A Sound Localization System Using Correlograms and Crosscorrelograms from a Cochlear Model

8. An Adaptive Procedure for the Optimization of an Acoustic Onset Detector

7. Use of System Identification Methods to Estimate the Azimuth and Elevation of a Sound Source

6. Sound Localization: Estimation of Azimuth Angle by Using Binaural Cues

5. A Sound Localization System Using Lyon's Cochlear Model and Lindemann's Cross-Correlation Model

4. Estimating Azimuth and Elevation from the Interaural Intensity Difference

3. Sound Resynthesis from a Correlogram

2. Implementing Time-Variable DSP Filters to Synthesize Binaural Sounds

1. Synthesis of Localized Sound

0. A Correlation-Based Location Algorithm for Multiple Sound Sources


Full Citations and Abstracts


20. Richard W. Novy, "Characterizing Elevation Effects of a Prolate Spheroidal HRTF Model," Technical Report No. 20, NSF Grant No. IRI-9619339, Dept. of Elec. Engr., San Jose State Univ. (May, 1998).

Abstract: This thesis presents numerical solutions to the problem of the diffraction of an acoustic plane wave by a rigid, prolate spheroid. A sphere is often used as a model for the low-frequency behavior of human head-related transfer functions. Although a spherical model can explain azimuth effects, it does not exhibit any elevation dependence. By contrast, the HRTF for a prolate spheroid varies significantly with both azimuth and elevation. The theoretical solution for the spheroid is much more complicated than the solution for the sphere, which limited the number of cases that could be numerically studied. However, the results presented indicate that person-to-person differences in the shape of the head could introduce significant spectral changes that should be included in HRTF models.


19. Haihui Chen, "The Incorporation of Range in a DSP-Based HRTF Model," Technical Report No. 19, NSF Grant No. IRI-9619339, Dept. of Elec. Engr., San Jose State Univ. (June, 1998).

Abstract: There is an infinite series solution for the head-related transfer function of a rigid sphere. The results are a function of both azimuth and range. Simplified pole/zero-plus-delay models have been previously developed for the case where the source is infinitely distant. This report extends these results to include range dependence. It also describes an effective demonstration system that used the sphere model for close-range effects and a simplified room model for distant-range effects.


18. Mary Susan Higgins, "Investigation of the Value of Adaptation for Azimuth Estimation from Speech in a Natural Auditory Environment," Technical Report No. 18, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (August, 1997).

Abstract: This report describes the results of adapting the parameters in Henderson's localization system [14]. Henderson used fixed weightings for scoring such quantities as degree of coincidence over time and coincidence over frequency, so that the procedure was the same whether the acoustic environment was anechoic or highly reverberant. The original premise was that localization accuracy could be improved by adapting these weightings for different acoustic environments. Although the improvement obtained was not significant, the required modification of Henderson's procedure to allow adaptation provided a more uniform implementation.


17. Chien Sheng Yang, "Monaural Estimation of Sound Source's Elevation Angle Using Cepstrum Analysis," Technical Report No. 17, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (December, 1996).

Abstract: This report describes a signal-processing approach for monaural sound localization. The methodology used is cepstrum analysis. This project is similar to earlier work by William Chau [2], [3], except many system blocks were redesigned. Specifically, this project uses a Gaussian smoothing filter instead of gammatone filters and uses a second-difference filter matrix instead of a second-difference operator. Although the system displays large estimation error when the sound signal is narrow-band and at a high elevation angle (above 50 deg), overall the system can estimate low elevation angles with a low average error.


16. Daniel A. Leon and Robert A. Jacobson, "Synthesis of Binaural Sound Using Digital Filters and a Head-Tracking System," Technical Report No. 16, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (December, 1996).

Abstract: A binaural sound synthesizing system was designed, implemented, and evaluated in this senior design project. A head-tracking device was utilized with the system to measure head position in real time. The position information from the head tracker was used to alter digital filters in real time. These filters were based upon binaural models developed by R. O. Duda and C. P. Brown. A Tucker-Davis DSP prototyping system was utilized to implement simple DSP models in conjunction with the head tracker. More complex DSP routines were developed for a Turtle Beach "Tahiti" sound board which supported a Motorola DSP56001. This system produced externalized three-dimensional sounds that maintained a constant position in space, independent of head movement. The programming of the system was based on a modular approach, thereby facilitating future expansion and extension of algorithms.


15. David P. Bolander, "Sound Localization using the SAA7740 Digital Signal Processor," Technical Report No. 15, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (December, 1996).

Abstract: The purpose of this project is to design and test Digital Signal Processor (DSP) hardware that can be used to generate 3D sound effects through headphones. The DSP used for this design is the SAA7740. The SAA7740 is a dedicated audio processor that is new in the market and is designed to be used in stereo components for the generation of special sound effects including concert hall and surround sound. A model of the head-related impulse response (HRIR), previously developed by C. Phillip Brown for his Master's thesis in May 1996, is implemented in hardware for real-time experiments. The project consists of two hardware modules and a software module that can be operated using a simple PC with a parallel port. The parameters that can be entered include desired azimuth, elevation and volume.


14. Nathaniel Henderson, "Estimating Azimuth From Speech in a Natural Auditory Environment," Technical Report No. 14, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (August, 1996).

Abstract: This thesis investigates and develops a system that enables computers to estimate the azimuth of a sound source. The system was developed with the express intent of estimating the azimuth from speech in a reverberant environment. The system's design is based on a human auditory model. The system calculates the interaural level difference (ILD), interaural time difference (ITD), and interaural envelope delay (IED) from the output of a cochlear model. Fuzzy sets are defined to extract non-reverberant localization cues from any present sound sources. These fuzzy sets account for onsets, coincidence over time and frequency, and temporal integration. The fuzzy weighted ILD, ITD, and IED calculations are combined, and a maximum-likelihood criterion is used to produce a probability map of the perceived azimuth for the system. Experiments with 60 samples of speech and other sounds showed that the system performs with an error rate equivalent to that of humans.


13. C. Phillip Brown, "Modeling the Elevation Characteristics of the Head-Related Impulse Response," Technical Report No. 13, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (May, 1996). (A PDF version of this thesis is available.)

Abstract: This thesis presents the research performed to develop and validate a simple signal-processing-based model of the head-related impulse response (HRIR). The model captures elevation as well as azimuth cues. The simplicity of the model permits efficient implementation in signal processing hardware, allowing for real-time operation. The parameters in the model can be adjusted to fit a particular individual's HRIR. The evaluation is based on listening tests in which the output of the model is compared to that of experimentally measured HRIR's.


12. William Chau, "Binaural and Monaural Elevation Estimate for Sound Sources," Technical Report No. 12, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (May, 1995).

Abstract: This paper investigates and attempts to combine binaural and monaural estimates for sound localization. For the binaural method, we use the interaural level difference (ILD) to make elevation as well as azimuth estimates. We use a second-difference of the spectrum to get monaural elevation estimates for sound sources near the median plane. Experimental results show that these methods work well for impulsive inputs. However, monaural localization was unacceptable for speech input.


11. Vincent Arcelo, "A Real-Time Onset Detector Using an Automatic Gain Control," Technical Report No. 11, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (May, 1995).

Abstract: A real-time onset detector is described that can be used to handle the problem of multiple sound sources, including echoes. A simple, real-time onset detector is implemented on a TMS320C26 evaluation board by using an automatic-gain-control loop and by thresholding the difference of the short-term average and the long-term average of the high-frequency power from a sound source. Several different test signals were applied to the detector to verify its effectiveness.
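The thresholding step named in this abstract can be illustrated with a minimal Python sketch. It is not the report's TMS320C26 implementation: the automatic-gain-control loop and the high-pass stage are omitted, the two averages are assumed to be simple exponential moving averages, and the function name and the values of `fast`, `slow`, and `threshold` are hypothetical.

```python
def detect_onsets(power, fast=0.5, slow=0.02, threshold=0.25):
    """Return sample indices where an onset is declared.

    `power` is a sequence of instantaneous power values (assumed to be
    already high-pass filtered). An onset is declared when the short-term
    average first exceeds the long-term average by more than `threshold`.
    All parameter values here are illustrative assumptions.
    """
    short_avg = long_avg = power[0]
    onsets = []
    above = False  # True while the difference stays above the threshold
    for n, p in enumerate(power):
        # Exponential moving averages with fast and slow time constants.
        short_avg += fast * (p - short_avg)
        long_avg += slow * (p - long_avg)
        fired = (short_avg - long_avg) > threshold
        if fired and not above:
            onsets.append(n)  # report only the rising edge
        above = fired
    return onsets
```

For a step in power, for example 50 samples of silence followed by 50 samples at unit power, the sketch reports a single onset at the step and nothing for a constant input.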


10. Chetan Katira, "A Real-Time DSP System for Estimating the Azimuth to a Sound Source from the Interaural Intensity Difference," Technical Report No. 10, NSF Grant No. IRI-9402246, Dept. of Elec. Engr., San Jose State Univ. (May, 1995).

Abstract: The system built under this project is a sound source locator. It is designed to estimate the azimuth angles of a single sound source in real time. The hardware includes two inexpensive DSP boards and a PC. The system uses binaural audio signals and computes the interaural intensity difference to estimate the azimuth angle. The system performs best with wide-band signals, and will need an onset detector for reliable performance.


9. Chuck Lim, "A Sound Localization System Using Correlograms and Crosscorrelograms from a Cochlear Model," Technical Report No. 9, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (November, 1994).

Abstract: Interaural level differences (ILD's) and interaural time differences (ITD's) provide very important information for estimating the elevation as well as the azimuth of a sound source. It is believed that elevation is basically determined from the ILD and azimuth from the ITD. The ILD and ITD information can be extracted from a filter-bank model of a human cochlea using short-term autocorrelation and crosscorrelation operations. A simple neural network system utilizing a maximum-likelihood approach is presented, and performance is shown to be very comparable to human localization abilities for a single, wide-band sound source in an anechoic environment.


8. Tareq Shahwan, "An Adaptive Procedure for the Optimization of an Acoustic Onset Detector," Technical Report No. 8, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (May, 1994).

Abstract: A method for detecting the arrival of new acoustic events is developed and evaluated. It employs lateral inhibition from adjacent channels of a filter bank to suppress the false-positive responses of standard onset detectors to chirps or sirens. A special version of the Least Mean Square (LMS) algorithm is used to separate tonal onsets from chirps. MATLAB simulations are used to determine the sensitivity of the method to chirp rate, dynamic range, and signal-to-noise ratio.


7. Hung Nguyen, "Use of System Identification Methods to Estimate the Azimuth and Elevation of a Sound Source," Technical Report No. 7, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (December, 1993).

Abstract: This project addresses the problem of estimating the azimuth and elevation of a sound source from features of the interaural transfer function. One feature is the interaural time difference, which is found by crosscorrelation. Another set of features are the LPC coefficients, which are found using system identification techniques. A neural network is used to estimate the azimuth and elevation of a single, wide-band source from these features. Experimental results show good accuracy in estimating azimuth, but poor accuracy in estimating elevation.


6. Ramin Djamschidi, "Sound Localization: Estimation of Azimuth Angle by Using Binaural Cues," Technical Report No. 6, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (January, 1994).

Abstract: This project concerns the estimation of the azimuth to a sound source from the binaural outputs of a filter-bank model of the cochlea. Binaural input signals were obtained by convolving monaural test sounds with a set of head related transfer functions for the KEMAR manikin. A 12-channel Gammatone filter followed by half-wave rectification and low-pass filtering was used for the cochlear model. One azimuth estimate was obtained from the differences (in dB) of the average outputs of corresponding channels, which provided an interaural-intensity-difference spectrum that was matched against stored spectral shapes. A second azimuth estimate was obtained by crosscorrelating the outputs of corresponding channels. Good accuracy was obtained with recordings of wide-band test sounds.
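As a rough illustration of the first estimate described in this abstract, the following Python sketch forms a per-channel level difference in dB and picks the stored spectral shape that is closest in squared error. The function names, the template table, the squared-error matching rule, and the use of 20 log10 (which assumes amplitude-like channel averages) are all assumptions made for the sketch, not details taken from the report.

```python
import math


def iid_spectrum(left_avgs, right_avgs, floor=1e-12):
    """Per-channel level difference in dB between the two ears.

    Assumes the channel averages are amplitude-like, hence 20*log10.
    `floor` guards against taking the log of zero.
    """
    return [20.0 * math.log10(max(l, floor) / max(r, floor))
            for l, r in zip(left_avgs, right_avgs)]


def closest_azimuth(spectrum, templates):
    """Return the azimuth whose stored template best matches `spectrum`.

    `templates` maps an azimuth in degrees to a stored difference
    spectrum of the same length (a hypothetical lookup table).
    """
    def squared_error(azimuth):
        return sum((a - b) ** 2
                   for a, b in zip(spectrum, templates[azimuth]))
    return min(templates, key=squared_error)
```

With a three-channel table such as `{-30: [-3, -6, -9], 0: [0, 0, 0], 30: [3, 6, 9]}`, a measured spectrum near `[3, 6, 9]` dB is assigned to the 30-degree entry.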


5. Binh Tran and Tuan Tran, "A Sound Localization System Using Lyon's Cochlear Model and Lindemann's Cross-Correlation Model," Technical Report No. 5, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (December, 1993).

Abstract: The purpose of this project was to design, implement and test a simulated system to locate sound sources. This system combined Lyon's cochlear model to divide the signals into perceptually important spectral bands, and Lindemann's contralateral inhibition model to suppress echoes and room reverberation and to determine the direction of arrival of sound sources in the horizontal plane. This biologically motivated system also included a head model to simulate head diffraction and an automatic gain control system to compensate for signal level differences. Experimental results showed that the system could localize sound sources, but echo suppression was ineffective due to the high sensitivity of the system to signal levels.


4. Richard O. Duda, "Estimating Azimuth and Elevation from the Interaural Intensity Difference," Technical Report No. 4, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (September, 1993).

Abstract: This report summarizes the results of experiments in estimating the azimuth and elevation of a sound source from the interaural level difference (ILD). ILD data were obtained from two sources -- KEMAR measurements made by the author in 1991 and SLV data from the University of Wisconsin. Results obtained in both cases are very similar. For azimuths out to about 70 degrees, the scale of the ILD surface is directly related to azimuth, and the shape of the surface is directly related to elevation. Maximum-likelihood estimates of azimuth and elevation using ILD alone have an accuracy close to that of humans. However, when azimuth and elevation are estimated using a model based on SLV and data from KEMAR, performance is greatly degraded. This result shows the importance of customizing head related transfer functions to obtain veridical localization perception.


3. Daniel Naar, "Sound Resynthesis from a Correlogram," Technical Report No. 3, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (May, 1993).

Abstract: A unique method of analysis creates a representation of sound, called the correlogram, that may allow sounds produced simultaneously to be separated. If a separation technique is developed, it will prove useful to be able to resynthesize the original sound. The resynthesis poses a difficult problem, because the analysis procedure includes nonlinearities and removes phase information from the signal. In this thesis, we integrate and modify a set of known techniques, thereby developing an algorithm to resynthesize speech from its correlogram. The algorithm is implemented and shown to successfully resynthesize speech.


2. Tom Cassaro and Mark Van Belleghem, "Implementing Time-Variable DSP Filters to Synthesize Binaural Sounds," Technical Report No. 2, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (May, 1993).

Abstract: This project concerns the development of a software application written in C and Motorola 56001 assembly language that converts a monaural digitized sound file into binaural audio sound. The task was accomplished by implementing a pair of head related transfer functions (HRTF's) on Digidesign's Sound Accelerator board for a Macintosh II computer. While listening to the sound through a pair of headphones, the user can experiment with the HRTF's and get real-time feedback on the effects of changing filter parameter values.


1. Chien C. Dinh, "Synthesis of Localized Sound," Technical Report No. 1, NSF Grant No. IRI-9214233, Dept. of Elec. Engr., San Jose State Univ. (May, 1993).

Abstract: This project presents an analysis of the azimuth dependence of the ear's acoustic impulse response. Using actual sampled acoustic data, we sought analytical solutions which approximate the ear's natural response. One of the solutions, presented in this report, is a four-pole low-pass model. This solution was implemented as a DSP model, capable of processing live sound samples. The final result allows real-time evaluation of the analytical solution through live hearing tests.


0. Charles Wayman, "A Correlation-Based Location Algorithm for Multiple Sound Sources," M.S. Project Report, Dept. of Comp. Engr., San Jose State Univ. (May, 1992).

Abstract: The purpose of this research was to produce a system to localize the direction of sound sources in the horizontal half plane. The algorithm presented herein successfully detected the direction of several simultaneously active sound sources to an accuracy of ±5 degrees. It works by continuously detecting acoustic onsets which are characterized by sudden increases in acoustic energy. When an onset is detected, cross correlation is used to determine the interaural time delay. This time delay is then used to calculate the direction to the sound source. By performing the calculation over several frequency bands, in simulations we were able to locate multiple sound sources in a reverberant environment.
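The step from cross correlation to direction can be sketched in Python as follows. This is a minimal single-band illustration, not the report's algorithm: it assumes a far-field source and two sensors in free space separated by `mic_spacing`, so that sin(azimuth) = c * delay / spacing, and the function names, the 343 m/s speed of sound, and the sign convention are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def best_lag(left, right, max_lag):
    """Return the lag in samples that maximizes the cross correlation.

    A positive lag means the right channel is delayed relative to the left.
    """
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def azimuth_from_lag(lag, sample_rate, mic_spacing):
    """Convert a lag to an azimuth in degrees under the far-field model."""
    delay = lag / sample_rate
    s = SPEED_OF_SOUND * delay / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against rounding and noisy lags
    return math.degrees(math.asin(s))
```

For instance, a 3-sample lag at 44100 Hz with a 0.18 m spacing corresponds to an azimuth of roughly 7.4 degrees toward the leading sensor under these assumptions.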


Senior Project: Kenneth Deng, Tony Chang and Daisy Dung Zhao, "Variable Notch Filter," EE199 Project Report, Dept. of Elec. Engr., San Jose State Univ. (April 1992).

Abstract: The human ears can distinguish sounds that come from directly in front, above, below or behind them. The objective of this project is to design and construct a cascade variable analog filter for real-time simulation of the effects of sounds reflecting from the outer ears and shoulders. The design incorporates the use of two separate circuits for attenuation and delay. One of the circuits is based on an analog delay chip that can vary the delay from 150 microseconds to 1.5 milliseconds. The other circuit is based on an op-amp implementation of a notch filter with varying frequency from 2.5 kHz to 20 kHz.


Senior Project: Tony Urayama, Mark Y. Fu, John Q. Lee and David B. Tong, "A Real-Time Monaural-to-Binaural Head-Shadow Filter," EE199 Project Report, Dept. of Elec. Engr., San Jose State Univ. (December 1990).

Abstract: The frequency response and time delay to the two ears are two factors that help one determine the location of a sound source. The objective of this project was to reproduce the time delay and frequency response shifts that occur when sound waves diffract around the head to the two ears. The design incorporates the use of a pair of coordinated linear filters (which vary the frequency response) in conjunction with a pair of analog delay lines (which control delay times) to achieve the desired location dependence effects in real time.

 


Last updated: 8/1/98

Home page for R. O. Duda