PUBLICATIONS

Members of CASPR have been involved in the research documented in the following scientific publications:

Journal Papers

Effects of Reduced Information in the Performance of Low-Frequency Sound Zones. J. Cadavid, M. Møller, T. van Waterschoot, S. Bech, J. Østergaard. Journal of the Audio Engineering Society, 2025.
Single-Microphone Deep Envelope Separation Based Auditory Attention Decoding for Competing Speech and Music. A. Tanveer, J. Jensen, Z.-H. Tan, J. Østergaard. Journal of Neural Engineering, vol. 22, issue 3, 2025.
Low-latency Deep Analog Speech Transmission using Joint Source Channel Coding. Mohammad Bokaei, Jesper Jensen, Simon Doclo, Jan Østergaard, Journal of Selected Topics in Signal Processing, Accepted 2025.
Noise-Robust Hearing Aid Voice Control, Ivan Lopez-Espejo, Eros Rosello, Amin Edraki, and Jesper Jensen, IEEE Signal Processing Letters, Accepted 2025.
Hearing Loss Compensation Using Deep Neural Networks: A Framework and Results from a Listening Test. Peter Asbjørn Leer Bysted, Jesper Jensen, Laurel Carney, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Accepted 2025.
Investigating the design space of diffusion models for speech enhancement, Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 32, pp. 4486-4500, October 2024.
Identifying principal attributes for evaluating audio quality of reproduction systems with spatially dynamic program material. P.N.P. Moreta, S. Bech, J. Francombe, J. Østergaard, S. van de Par. Journal of the Audio Engineering Society. Vol. 72(9), 2024.
Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. A. Simon, S. Bech, G. Loquet, J. Østergaard. European Journal of Neuroscience, Vol. 59(8), 2024.
Effects of Background Noise and Linguistic Violations on Frontal Theta Oscillations During Effortful Listening. Y. Mohammadi, C. Graversen, J.B. Manresa, J. Østergaard, O.K. Andersen. Ear and Hearing Vol. 45(3), 2024.
Joint Far- and Near-end Speech and Listening Enhancement with Minimum Processing. A. J. Fuglsig, Z.-H. Tan, L. S. Bertelsen, J. Jensen, J. C. Lindof, and J. Østergaard, IEEE Access, 2024.
Performance of low-frequency sound zones with very fast room impulse response measurements. J. Cadavid, M. Møller, C.S. Pedersen, S. Bech, T. van Waterschoot, J. Østergaard. The Journal of the Acoustical Society of America, Vol. 155(1), 2024.
The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems. P. Gonzalez, Z.-H. Tan, J. Østergaard, J. Jensen, T. S. Alstrøm, and T. May, IEEE Signal Processing Letters, 2024.
Generating Accurate and Diverse Audio Captions through Variational Autoencoder Framework. Y. Zhang, R. Du, Z.-H. Tan, W. Wang, and Z. Ma, IEEE Signal Processing Letters, 2024.
How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses. P. A. L. Bysted, J. Jensen, Z.-H. Tan, J. Østergaard, and L. Bramsløw, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, pp. 2006-2020, 2024.
Reduced complexity for sound zones with subband block adaptive filters and a loudspeaker line array. M.B. Møller, J. Martinez, and J. Østergaard. The Journal of the Acoustical Society of America 155 (4), 2314-2326, April 2024.
Data-Driven Non-Intrusive Speech Intelligibility Prediction using Speech Presence Probability. M. B. Pedersen, Z.-H. Tan, S. H. Jensen, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., Vol. 32, pp. 55-67, Oct. 2023.
Masked spectrogram prediction for unsupervised domain adaptation in speech enhancement. K. Žmolíková, M. S. Pedersen, J. Jensen. IEEE Open Journal of Signal Processing, 2023. Accepted.
Validity and reliability of self-reported and neural measures of listening effort. Y. Mohammadi, J. Østergaard, C. Graversen, O.K. Andersen & J. Biurrun Manresa. European Journal of Neuroscience. 58(11), pp. 4357-4370. Dec. 2023.
Phase-locking of neural activity to the envelope of speech in the delta frequency band reflects differences between word lists and sentences. Y. Mohammadi, C. Graversen, J. Østergaard, O. K. Andersen and T. Reichenbach. Journal of Cognitive Neuroscience, vol. 35(8):1301-1311, August 2023.
ACTUAL: Audio Captioning with Caption Feature Space Regularization. Y. Zhang, H. Yu, R. Du, Z.-H. Tan, W. Wang, Z. Ma, and Y. Dong. IEEE/ACM Transactions on Audio, Speech and Language Processing, accepted. 2023.
On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification. A. K. Sarkar and Z.-H. Tan. Acoustics, accepted. 2023.
Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise. C. Heider and Z.-H. Tan. IEEE Open Journal of Signal Processing, vol. 4, pp. 179-187, 2023.
Cortical Auditory Attention Decoding During Music And Speech Listening. Simon, A., Loquet, G., Østergaard, J. & Bech, S. Accepted for publication in IEEE Transactions on Neural Systems and Rehabilitation Engineering, June 2023.
The Internet of Sounds: Convergent Trends, Insights and Future Directions. Turchet, L., Lagrange, M., Rottondi, C., Fazekas, G., Peters, N., Østergaard, J., Font, F., Bäckström, T. & Fischione, C. IEEE Internet of Things Journal, Vol. 10, No. 13, pp. 11264-11292, 2023, article 10061604.
On the Deficiency of Intelligibility Metrics as Proxies for Subjective Intelligibility. I. López-Espejo, A. Edraki, W.-Y. Chan, Z.-H. Tan, and J. Jensen. Accepted by Elsevier Speech Communication, 2023.
Utilization of acoustic signals with generative Gaussian and autoencoder modeling for condition-based maintenance of injection moulds. G. Ø. Rønsch, I. López-Espejo, D. Michelsanti, Y. Xie, P. Popovski, and Z.-H. Tan. Accepted by International Journal of Computer Integrated Manufacturing, 2022.
Performance of Low Complexity Fully Connected Neural Networks for Monoaural Speech Enhancement. H. Reddy, A. Kar, J. Østergaard. Applied Acoustics, 2022.
A Family of Split Kernel Adaptive Filtering Algorithms for Nonlinear Stereophonic Acoustic Echo Cancellation. S. Burra, S. Sankar, A. Kar, J. Østergaard. Journal of Ambient Intelligence and Humanized Computing, 2022. 41, 2, pp. 1019–1037.
Incremental Refinements and Multiple Descriptions with Feedback. Østergaard, J., Erez, U. & Zamir, R., 2022. Accepted for publication in IEEE Transactions on Information Theory.
Speech to noise ratio improvement induces nonlinear parietal phase synchrony in hearing aid users. P.S. Baboukani, C. Graversen, E. Alickovic, J. Østergaard. Frontiers in Neuroscience, August 2022.
iVAE-GAN: Identifiable VAE-GAN Models for Latent Representation Learning. IEEE Access, vol. 10, pp. 48405-48418, 2022.
Training Data-Driven Speech Intelligibility Predictors on Heterogeneous Listening Test Data. M. B. Pedersen, A. H. Andersen, S. H. Jensen, Z.-H. Tan and J. Jensen. IEEE Access, vol. 10, pp. 66175-66189, 2022.
The Minimum Overlap-Gap Algorithm for Speech Enhancement. P. Hoang, Z.-H. Tan, J.-M. de Haan, J. Jensen, IEEE Access, February 2022.
Shouted and Whispered Speech Compensation for Speaker Verification Systems. S. Prieto, A. Ortega, I. López-Espejo, and E. Lleida. Accepted by Elsevier Digital Signal Processing, 2022.
Deep Spoken Keyword Spotting: An Overview. I. López-Espejo, Z.-H. Tan, J. Hansen, and J. Jensen. Accepted by IEEE Access, 2021.
Minimum Processing Beamforming. A. Zahedi, Michael S. Pedersen, J. Østergaard, T. Christiansen, L. Bramsløw and J. Jensen. IEEE Trans. Audio, Speech, Language Process. Vol. 29, pp. 2710-2724, 2021.
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting. I. López-Espejo, Z.-H. Tan, and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization. J. Xie, Z. Ma, G. Zhang, J.-H. Xue, Z.-H. Tan and J. Guo. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence.
Self-Segmentation of Pass-Phrase Utterances for Deep Feature Learning in Text-Dependent Speaker Verification. A. K. Sarkar and Z.-H. Tan. Accepted by Computer Speech & Language.
Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding. A. K. Sarkar, Z.-H. Tan. Accepted by IEEE Signal Processing Letters.
A Family of Adaptive Volterra Filters Based on Maximum Correntropy Criterion for Improved Active Control of Impulse Noise. Guttikonda, S. Burra, A. Kar, J. Østergaard, P. Sooraksa, V. Mladenovics, D. Haddad. Accepted for publication in Elsevier Circuits, Systems, and Signal Processing 2021.
Multiple Sub Filter Based Proportionate Filtering for Nonlinear Acoustic Echo Cancellation. V. Burra, A. Kar and J. Østergaard. Accepted for publication in Journal of Applied Acoustics, 2021.
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation. J. M. Martín-Doñas, J. Jensen, Z.-H. Tan, A. M. Gomez, and A. M. Peinado. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing.
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation, D. Michelsanti, Z.-H. Tan, S.-X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., Vol. 29, pp. 1368-1396, 2021.
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis, A. Edraki, W.-Y. Chan, J. Jensen, and D. Fogerty. IEEE Trans. Audio, Speech, Language Process. Vol. 29, pp. 210-225, 2021.
Directed Data-Processing Inequalities for Systems with Feedback. M. Derpich and J. Østergaard. Entropy, Vol. 23, April 2021.
Deep InterBoost Networks for Small-sample Image Classification. X. Li, D. Chang, Z. Ma, Z.-H. Tan, J.-H. Xue, J. Cao and J. Guo. Accepted by Neurocomputing, 2020.
On the Comparisons of Decorrelation Approaches for non-Gaussian Neutral Vector Variables. Z. Ma, X. Lu, J. Xie, Z. Yang, J.-H. Xue, Z.-H. Tan, B. Xiao, J. Guo. Accepted by IEEE Transactions on Neural Networks and Learning Systems, 2020.
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. I. López-Espejo, Z.-H. Tan and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020.
OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer. X. Li, D. Chang, Z. Ma, Z.-H. Tan, J.-H. Xue, J. Cao, J. Yu, J. Guo. Accepted by IEEE Transactions on Image Processing, 2020.
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. M. Kolbæk, Z.-H. Tan, S. H. Jensen and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing.
The Importance of Context When Recommending TV Content: Dataset and Algorithms. M. S. Kristoffersen, S. E. Shepstone, and Z.-H. Tan. Accepted by IEEE Transactions on Multimedia.
SketchSegNet+: An End-to-end Learning of RNN for Multi-Class Sketch Semantic Segmentation. Y. Qi and Z.-H. Tan. Accepted by IEEE Access.
rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method. Z.-H. Tan, A. Sarkar, and N. Dehak, accepted by Computer Speech and Language, vol. 59, pp. 1-21, January 2020. Source code: http://kom.aau.dk/~zt/online/rVAD/.
A Moving Horizon Framework for Sound Zones. M. Møller and J. Østergaard. IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, pp. 256-265, 2020.
Estimating Conditional Transfer Entropy in Time Series Using Mutual Information and Nonlinear Prediction. P. Baboukani, C. Graversen, E. Alickovic, J. Østergaard. Entropy Vol. 22, October 2020.
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, J. Jensen, IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, No. 1, pp. 1-12, Jan. 2020.
Zero-delay multiple descriptions of stationary scalar Gauss-Markov sources. A. Fuglsig, J. Østergaard. Entropy, MDPI, 21(12), 1185, December 2019.
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect. D. Michelsanti, Z.-H. Tan, J. Jensen, Speech Communication, Vol. 115, pp. 38-50, Dec. 2019.
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification. A. Sarkar, Z.-H. Tan, H. Tang, S. Shon, and J. Glass, IEEE Transactions on Audio, Speech and Language Processing, vol. 27, no. 8, pp. 1267-1279, August 2019.
Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation. J. M. Martín-Doñas, A. Peinado, I. López-Espejo, and A. Gomez, MDPI Applied Sciences, vol. 9, June 2019.
On the Relationship between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement. M. Kolbæk, Z.-H. Tan and J. Jensen. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 27, no. 2, pp. 283-295, February 2019.
Sound Quality Improvement for Hearing Aids in Presence of Multiple Inputs. A. Kar, A. Anand, J. Østergaard, S.H. Jensen, and M.N.S. Swamy. Circuits, Systems, and Signal Processing, Springer, 38(8), pp. 3591-3615, April 2019.
Mean Square Performance Evaluation in Frequency Domain for an Improved Adaptive Feedback Cancellation in Hearing Aids. A. Kar, A. Anand, J. Østergaard, S.H. Jensen, and M.N.S. Swamy. Signal Processing, Elsevier, Vol. 157, pp. 45-61, 2019.
Information Loss in the Human Auditory System. M. Z. Jahromi, A. Zahedi, J. Jensen, and J. Østergaard, IEEE Trans. Audio, Speech, Language Process., Vol. 27, No. 3, pp. 472-481, March 2019.
Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources. P. A. Stavrou, J. Østergaard, and C. Charalambous. IEEE Journal of Selected Topics in Signal Processing, Vol. 12, No. 5, pp. 841-856, October 2018.
Asymmetric Coding for Rate-Constrained Noise Reduction in Binaural Hearing Aids. J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., 2018. Accepted.
Refinement and Validation of the Binaural Short Time Objective Intelligibility Measure for Spatially Diverse Conditions. A.H. Andersen, J.M. de Haan, Z.-H. Tan and J. Jensen. Elsevier Speech Communication, Vol. 102, pp. 1-13, Sept. 2018.
Non-Intrusive Speech Intelligibility Prediction using Convolutional Neural Networks. A.H. Andersen, J.M. de Haan, Z.-H. Tan and J. Jensen. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 26, No. 10, pp. 1925-1939, Oct. 2018.
A Spatial Self-Similarity Based Feature Learning Method for Face Recognition under Varying Poses. X. Duan and Z.-H. Tan, accepted by Pattern Recognition Letters, 2018.
Bias-compensated Informed Sound Source Localization Using Relative Transfer Functions. M. Farmani, M. S. Pedersen, Z.-H. Tan, and J. Jensen, accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018.
Using Closed-set Speaker Identification Score Confidence to Enhance Audio-based Collaborative Filtering for Multiple Users. S.E. Shepstone, Z.-H. Tan and M.S. Kristoffersen, accepted by IEEE Transactions on Consumer Electronics, 2018.
Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators. S. Braun, A. Kuklasinski, O. Schwartz, O. Thiergart, E.A.P. Habets, S. Gannot, S. Doclo, and J. Jensen. Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018.
A Perceptually Motivated LP Residual Estimator in Noisy and Reverberant Environments. R. Peng, Z.-H. Tan, X. Li, and C. Zheng, accepted by Speech Communication, 2017.
Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features. H. Yu, Z.-H. Tan, Z. Ma, R. Martin, and J. Guo, accepted by IEEE Transactions on Neural Networks and Learning Systems, 2017.
Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones. M. Sahidullah, D.A.L. Thomsen, R.G. Hautamaki, T. Kinnunen, Z.-H. Tan, R. Parts, M. Pitkanen, accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017.
iSocioBot – A Multimodal Interactive Social Robot. Z.-H. Tan, N.B. Thomsen, X. Duan, E. Vlachos, S.E. Shepstone, M.H. Rasmussen and J.L. Højvang, accepted by International Journal of Social Robotics, 2017.
Incorporating Pass-Phrase Dependent Background Models for Text-Dependent Speaker Verification. A. Sarkar and Z.-H. Tan, accepted by Computer Speech & Language, 2017.
Latent Dirichlet Mixture Model. J.-T. Chien, C.-H. Lee and Z.-H. Tan, accepted by Neurocomputing, 2017.
Visual Detection of Events of Interest from Urban Activity. S. Astaras, A. Pnevmatikakis and Z.-H. Tan, accepted by Wireless Personal Communications, 2017.
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks. M. Kolbæk, D. Yu, Z.-H. Tan and J. Jensen, IEEE Transactions on Audio, Speech and Language Processing, Vol. 25, No. 10, pp. 1901-1913, 2017.
DNN Filter Bank Cepstral Coefficients for Spoofing Detection. H. Yu, Z.-H. Tan, Y. Zhang, Z. Ma, and J. Guo, IEEE Access, to appear, 2017.
Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications. M. Farmani, M. S. Pedersen, Z.-H. Tan and J. Jensen, IEEE Trans. Audio, Speech, Language Process., Vol. 25, No. 3, pp. 611-623, 2017.
Decorrelation of Neutral Vector Variables: Theory and Applications. Z. Ma, J.-H. Xue, A. Leijon, Z.-H. Tan, Z. Yang, and J. Guo, IEEE Transactions on Neural Networks and Learning Systems. To appear.
Audio-based Granularity-adapted Emotion Classification. S.E. Shepstone, Z.-H. Tan, and S.H. Jensen, IEEE Transactions on Affective Computing. To appear.
Text-Independent Speaker Identification Using the Histogram Transform Model. Z. Ma, H. Yu, Z.-H. Tan, and J. Guo, IEEE Access. To appear.
Multi-channel Wiener filters in binaural and bilateral hearing aids – speech intelligibility improvement and robustness to DoA errors. A. Kuklasiński and J. Jensen, Journal of the Audio Engineering Society, Vol. 65, No. 1/2, pp. 8 – 16, 2017.
Relaxed Binaural LCMV Beamforming. A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens and J. Jensen, IEEE Trans. Audio, Speech, Language Process., Vol. 25, No. 1, pp. 133 – 148, 2017.
Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems. M. Kolbæk, Z.-H. Tan and J. Jensen, IEEE Trans. Audio, Speech, Language Process., Vol. 25, No. 1, pp. 149 – 163, 2017.
Source Coding in Networks with Covariance Distortion Constraints. A. Zahedi, J. Østergaard, S.H. Jensen, P. Naylor, and S. Bech, IEEE Transactions on Signal Processing, Vol. 64, Issue 22, pp. 5943 – 5958, November 2016.
An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers. J. Jensen and C. H. Taal, IEEE Trans. Audio, Speech, Language Process., Vol. 24, No. 11, pp. 2009 – 2022, 2016. Matlab code.
Predicting the Intelligibility of Noisy and Nonlinearly Processed Binaural Speech. A. H. Andersen, Z.-H. Tan, J. M. de Haan, and J. Jensen, IEEE Trans. Audio, Speech, Language Process., Vol. 24, No. 11, pp. 1908 – 1920, 2016.

Conference Papers

Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition via Diffusion Models. N. L. Kühne, A. H. F. Kitchen, M. S. Jensen, M. S. L. Brøndt, M. Gonzalez, C. Biscio, Z.-H. Tan. Accepted for IEEE ICASSP, April 2025.
Rate-Distortion under Neural Tracking of Speech: A Directed Redundancy Approach. J. Østergaard, S. G. Jayaprakash, R. Ordonez. Accepted for the IEEE Data Compression Conference, March 2025.
Near-End Listening Enhancement Using a Noise-Robust Linear Time-Invariant Filter. F. Villani, W.-Y. Chan, Z.-H. Tan, J. Østergaard, J. Jensen. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), September 9 – 12, 2024.
Bayesian Sound Field Estimation Using Uncertain Data. J. Brunnstrom, M. B. Møller, J. Østergaard, and M. Moonen. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), September 9 – 12, 2024.
No-Reference Speech Intelligibility Prediction Leveraging a Noisy-Speech ASR Pre-Trained Model. H. Wang, J. Jensen, I. López-Espejo, W.-Y. Chan, Proc. Interspeech, September 1-5, 2024.
Channel-Configurable Deep Wireless Speech Transmission. M. Bokaei, J. Jensen, S. Doclo, J. Østergaard. IEEE Wireless Communications and Networking Conference, 2024.
Deep Low-Latency Joint Speech Transmission and Enhancement over A Gaussian Channel. M. Bokaei, J. Jensen, S. Doclo, J. Østergaard. IEEE Conference on Acoustics, Speech, and Signal Processing Workshop, 2024.
Directed Redundancy in Time Series. J. Østergaard. IEEE International Symposium on Information Theory, 2024.
Spatial Sampling versus Acquisition Time of Room Impulse Responses for Low-Frequency Sound Zones. J. Cadavid, M. Møller, T. van Waterschoot, S. Bech, J. Østergaard. Proceedings of the Audio Engineering Society International Conference, 156th Convention, 2024.
Synergy and Redundancy Dominated Effects in Time Series via Transfer Entropy Decompositions. J. Østergaard, P. Baboukani. IEEE International Symposium on Information Theory Workshop – NeurIT, 2024.
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations. S. Yadav and Z.-H. Tan, Interspeech 2024, Kos Island, Greece, September 1-5, 2024.
Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder. Y. Xie, M. Kuhlmann, F. Rautenberg, Z.-H. Tan, and R. Haeb-Umbach, The 32nd European Signal Processing Conference (EUSIPCO 2024), Lyon, France, August 26–30, 2024.
Envelope Based Deep Source Separation and EEG Auditory Attention Decoding for Speech and Music. A. Tanveer, J. Jensen, Z.-H. Tan, and J. Østergaard, The 32nd European Signal Processing Conference (EUSIPCO 2024), Lyon, France, August 26–30, 2024.
PAC-Bayesian Error Bound, via Rényi Divergence, for a Class of Linear Time-Invariant State-Space Models. D. Eringis, J. Leth, Z.-H. Tan, R. Wisniewski, and M. Petreczky, The 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, July 21-27, 2024.
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners. S. Yadav, S. Theodoridis, L. K. Hansen, and Z.-H. Tan, The Twelfth International Conference on Learning Representations (ICLR 2024), Vienna, Austria, May 7-11, 2024.
Complex Recurrent Variational Autoencoder for Speech Resynthesis and Enhancement. Y. Xie, T. Arildsen, and Z.-H. Tan, IEEE World Congress on Computational Intelligence (IEEE WCCI 2024), Yokohama, Japan, June 30-July 5, 2024.
Self-Supervised Pre-Training for Robust Personalized Voice Activity Detection in Adverse Conditions. H. S. Bovbjerg, J. Jensen, J. Østergaard, Z.-H. Tan. Proc. ICASSP 2024. Accepted.
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler. P. Gonzalez, Z.-H. Tan, J. Østergaard, J. Jensen, T. S. Alstrøm, T. May. Proc. ICASSP 2024. Accepted.
Speaker Adaptation for Enhancement of Bone-Conducted Speech. A. Edraki, W.-Y. Chan, J. Jensen, D. Fogerty. Proc. ICASSP 2024. Accepted.
Speech enhancement in hearing aids using target speech presence estimation based on a delayed remote microphone signal. V. Sathyapriyan, M. S. Pedersen, M. Brookes, J. Østergaard, P. A. Naylor, J. Jensen, Proc. ICASSP 2024. Accepted.
Binaural Speech Enhancement using Deep Complex Convolutional Transformer Networks. V. Tokala, E. Grinstein, M. Brookes, S. Doclo, J. Jensen, P. A. Naylor. Proc. ICASSP 2024. Accepted.
Binaural Speech Enhancement using Complex Convolutional Recurrent Networks. V. Tokala, E. Grinstein, M. Brookes, S. Doclo, J. Jensen, P. A. Naylor. Proc. 57th Asilomar Conference on Signals, Systems, and Computers, 2023.
Deep Joint Source-Channel Analog Coding for Low-Latency Speech Transmission over Gaussian Channels. M. Bokaei, J. Jensen, S. Doclo, J. Østergaard. Proc. European Signal Processing Conference (EUSIPCO), 2023.
Speech Inpainting: Context-Based Speech Synthesis Guided by Video. J.F. Montesinos, D. Michelsanti, G. Haro, Z.-H. Tan, and J. Jensen. Proc. Interspeech, 2023.
Robust Sound Zone Filters for Synchronization Errors. M. Zhou, M. B. Møller, C. S. Pedersen, N. E. M. de Koeijer and J. Østergaard. 10th Convention of the European Acoustic Association, Forum Acusticum, Turin, Italy, 2023.
Sound quality evaluation of packet loss concealment for wireless low-frequency sound zones. C. S. Pedersen, M. Zhou, M. B. Møller, N. E. M. de Koeijer and J. Østergaard. 10th Convention of the European Acoustic Association, Forum Acusticum, Turin, Italy, 2023.
Head Orientation Estimation with Distributed Microphones using Speech Radiation Patterns. K. Müller, B. Çakmak, P. Didier, S. Doclo, J. Østergaard, T. Wolff. Asilomar Conference, 2023.
Adaptive Coding in Wireless Acoustic Sensor Networks for Distributed Blind System Identification. M. Blochberger, J. Østergaard, R. Ali, M. Moonen, F. Elvander, J. Jensen, T. van Waterschoot. Asilomar Conference, 2023.
PAC-Bayes Generalisation Bounds for Dynamical Systems Including Stable RNNs. D. Eringis, J. Leth, Z.-H. Tan, R. Wisniewski, and M. Petreczky. The 38th Annual AAAI Conference on Artificial Intelligence, 2024.
Correlation based glimpse proportion index. A. Alghamdi, L. Moen, W.-Y. Chan, D. Fogerty, and J. Jensen. Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023.
Speech enhancement using binary estimator selection applied to hearing aids with a remote microphone. V. Sathyapriyan, M. S. Pedersen, J. Østergaard, M. Brookes, P. A. Naylor, and J. Jensen. Proc. 2023 IEEE Int. Conf. Frontiers of Signal Processing (ICFSP).
Fearless Steps APOLLO: Identifying Conversational Mission-Critical Topics in NASA Apollo Missions Audio Based on Keyword Spotting. A. Joglekar, I. López-Espejo, and J. H. L. Hansen. NASA Human Research Program IWS, 2024.
Improved Disentangled Speech Representations Using Contrastive Learning in Factorized Hierarchical Variational Autoencoder. Y. Xie, T. Arildsen, and Z.-H. Tan. Proc. European Signal Processing Conference (EUSIPCO), 2023.
A Vision-Assisted Hearing Aid System Based on Deep Learning. D. Michelsanti, Z.-H. Tan, S.R. Griful and J. Jensen. IEEE ICASSP 2023 Satellite Workshop: AMHAT.
Improving Label-deficient Keyword Spotting Through Self-supervised Pretraining. H.S. Bovbjerg and Z.-H. Tan. ICASSP 2023 Satellite Workshop: SASB 2023.
Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification. López-Espejo, I., Prieto, S., Ortega, A. & Lleida, E., Sep. 2023, IEEE International Workshop on Machine Learning for Signal Processing (MLSP). Rome, Italy.
AR model for low latency packet loss concealment for wireless sound zones at low frequencies. Pedersen, C. S., Zhou, M., Møller, M. B., de Koeijer, N. & Østergaard, J., May 2023, Audio Engineering Society 154th Convention. Helsinki, Finland, pp. 1-10, no. 10651.
Distributed Adaptive Norm Estimation for Blind System Identification in Wireless Sensor Networks. Blochberger, M., Elvander, F., Ali, R., Østergaard, J., Jensen, J., Moonen, M. & van Waterschoot, T., Jun. 2023, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Interpretable Nonnegative Incoherent Deep Dictionary Learning for FMRI Data Analysis. Morante, M., Østergaard, J. & Theodoridis, S., Jun. 2023, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting. I. López-Espejo, R. C. M. C. Shekar, Z.-H. Tan, J. Jensen and J. H. L. Hansen. Accepted for ICASSP 2023.
Robust FIR Filters for Wireless Low-Frequency Sound Zones. Zhou, M., Møller, M., Pedersen, C. S. & Østergaard, J., Jun. 2023, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Multiple Description Audio Coding for Wireless Low-Frequency Sound Zones. Østergaard, J., Pedersen, C. S., Zhou, M., de Koeijer, N. & Møller, M., 21 Mar. 2023, IEEE Data Compression Conference. IEEE Signal Processing Society, 10 pp.
Fearless Steps APOLLO: Challenges in keyword spotting and topic detection for naturalistic audio streams. A. Joglekar, I. López-Espejo, and J. H. L. Hansen. Accepted for the 184th Meeting of the Acoustical Society of America (2023).
An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting. I. López-Espejo, Z.-H. Tan and J. Jensen. Accepted for IberSPEECH 2022.
Fusion of Classical Digital Signal Processing and Deep Learning Methods (FTCAPPS). A. Gomez, V. E. Sánchez, A. Peinado, J. M. Martín-Doñas, A. Gómez-Alanís, A. Villegas-Morcillo, E. Rosello, M. Chica, C. Garcia and I. López-Espejo. Accepted for IberSPEECH 2022.
Model-Based Estimation of In-Car-Communication Feedback Applied to Speech Zone Detection. K. Müller, S. Doclo, J. Østergaard and T. Wolff. Accepted for IWAENC 2022.
Distributed Cross-Relation-Based Frequency-Domain Blind System Identification using Online-ADMM. M. Blochberger, F. Elvander, R. Ali, M. Moonen, J. Østergaard, J. Jensen and T. van Waterschoot. Accepted for IWAENC 2022.
A linear MMSE filter using delayed remote microphone signals for speech enhancement in hearing aid applications. V. Sathyapriyan, M.S. Pedersen, J. Østergaard, M. Brookes, P.A. Naylor and J. Jensen. Accepted for IWAENC 2022.
Performance of Low Frequency Sound Zones Based on Truncated Room Impulse Responses. J. Cadavid, M.B. Møller, S. Bech, T. van Waterschoot and J. Østergaard. Accepted for Audio Mostly 2022.
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay. C. M. Larsen, P. Koch and Z.-H. Tan. Interspeech 2022, September 18-22, Incheon, Korea.
A neural network framework for modelling parameterized auditory models, P. A. L. Bysted, J. Jensen, Z.-H. Tan, J. Østergaard, L. Bramsløw, Proc. Baltic Nordic Acoustic Meeting (BNAM) 2022 – Joint Acoustics Conference, May 2022.
Effect of Wireless Transmission Errors on Sound Zone Performance at Low Frequencies. Pedersen, C. S., Møller, M. B. & Østergaard, J. Presented at EUROREGIO BNAM2022 Joint Acoustic Conference.
Electrodes selection for cortical auditory attention decoding during speech and music listening. Simon, A. M. D., Bech, S., Loquet, G. S. J. M. & Østergaard, J., 2022, Proceedings Fusion Conference. (International Conference on Information Fusion Proceedings).
Optimal time lags for linear cortical auditory attention detection: differences between speech and music listening. Simon, A. M. D., Østergaard, J., Bech, S. & Loquet, G. S. J. M., 2022, Proceedings International Symposium on Hearing.
The Effect of Fixed-point Arithmetic on Low Frequency Sound Zone Control. Koch, P. & Østergaard, J., May 2022, EUROREGIO BNAM2022 Joint Acoustics Conference. pp. 125-134.
A Stimuli-Relevant Directed Dependency Index for Time Series. P.S. Baboukani, S. Theodoridis, J. Østergaard. IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2022.
Joint Far- and Near-End Speech Intelligibility Enhancement based on the Approximated Speech Intelligibility Index. A.J. Fuglsig, J. Østergaard, J. Jensen, L.S. Bertelsen, P. Mariager, and Z.-H. Tan. ICASSP 2022.
Multichannel Speech Enhancement with Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices. P. Hoang, J. M. de Haan, Z.-H. Tan, and J. Jensen. IEEE Trans. Audio, Speech, Language Processing, January 2022.
EEG Phase Synchrony Reflects SNR Levels During Continuous Speech-In-Noise Tasks. Baboukani, P. S., Graversen, C., Alickovic, E. & Østergaard, J., Oct. 2021. Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). Accepted.
Compression of DNNs Using Magnitude Pruning and Nonlinear Information Bottleneck Training. M. Ø. Nielsen, J. Østergaard, J. Jensen, Z.-H. Tan, Proc. of IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP2021). Accepted.
Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective. Y. Xie, T. Arildsen, Z.-H. Tan, Proc. of IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP2021). Accepted.
A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction. A. Edraki, W.-Y. Chan, J. Jensen, and D. Fogerty, Proc. Interspeech, 2021, Accepted.
Audio-Visual Speech Inpainting with Deep Learning. G. Morrone, D. Michelsanti, Z.-H. Tan and J. Jensen. Proc. Int. Conf. Acoust., Speech, Signal Processing, June 2021.
Joint Maximum Likelihood Estimation Of Power Spectral Densities And Relative Acoustic Transfer Functions For Acoustic Beamforming. P. Hoang, Z.-H. Tan, J. M. de Haan, J. Jensen. Proc. Int. Conf. Acoust., Speech, Signal Processing, 2021.
Auditory Attention Decoding During Naturalistic Music Listening: A Pilot Study. A. Simon, S. Bech, G. Loquet, J. Østergaard. Accepted for presentation at 16th International Conference on Music Perception and Cognition, 2021.
Low Delay Robust Audio Coding by Noise Shaping, Fractional Sampling, and Source Prediction. J. Østergaard. IEEE Data Compression Conference, March 2021.
An Orthogonality Principle for Select-Maximum Estimation of Exponential Variables. U. Erez, J. Østergaard, R. Zamir. Accepted for presentation at IEEE International Symposium on Information Theory (ISIT), 2021.
Dual-channel eKF-RTF Framework for Speech Enhancement with DNN-based Speech Presence Estimation. J. M. Martín-Doñas, A. M. Peinado, I. López-Espejo, A. M. Gomez, Proc. of IberSPEECH 2020.
UIAI System for Short-Duration Speaker Verification Challenge 2020. M. Sahidullah, A.K. Sarkar, V. Vestman, X. Liu, R. Serizel, T. Kinnunen, Z.-H. Tan, E. Vincent, Proc. of the 8th IEEE Spoken Language Technology Workshop (SLT 2021).
CC-loss: Channel Correlation Loss for Image Classification. Z. Song, D. Chang, Z. Ma, X. Li and Z.-H. Tan, Proc. of the 25th International Conference on Pattern Recognition (ICPR 2020).
M. B. Pedersen, M. Kolbæk, A. H. Andersen, S. H. Jensen, J. Jensen. “End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks,” Proc. of Interspeech 2020. Accepted.
S. Prieto, A. Ortega, I. López-Espejo, E. Lleida, “Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions,” Proc. of Interspeech 2020. Accepted.
D. Michelsanti, O. Slizovskaia, G. Haro, E. Gómez, Z.-H. Tan, J. Jensen, “Vocoder-Based Speech Synthesis from Silent Videos,” Proc. of Interspeech 2020. Accepted.
I. López-Espejo, Z.-H. Tan, J. Jensen, “Exploring Filterbank Learning for Keyword Spotting,” Proc. of European Signal Processing Conference (EUSIPCO) 2020. Accepted.
Estimation of Directed Dependencies in Time Series using Conditional Mutual Information and Non-linear Prediction. P. Baboukani, C. Graversen, and J. Østergaard. Proc. of European Signal Processing Conference (EUSIPCO) 2020.
S. Samizade, Z.-H. Tan, C. Shen, X. Guan, “Adversarial Example Detection by Classification for Deep Speech Recognition,” Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
A Neural Network for Monaural Intrusive Speech Intelligibility Prediction. M. B. Pedersen, A. H. Andersen, S. H. Jensen, J. Jensen. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
Maximum Likelihood Estimation of the Interference-plus-noise Cross Power Spectral Density Matrix for Own Voice Retrieval. P. Hoang, Z.-H. Tan, T. Lunner, J. M. de Haan, J. Jensen. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
A Constrained Maximum Likelihood Estimator of Speech and Noise Spectra with Application to Multi-Microphone Noise Reduction. A. Zahedi, M. S. Pedersen, J. Østergaard, L. Bramsløw, T. U. Christiansen, J. Jensen. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
Robust Joint Estimation of Multimicrophone Signal Model Parameters. A. Koutrouvelis, R. Hendriks, R. Heusdens, J. Jensen. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, J. Jensen. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accepted.
The Exponential Distribution in Rate Distortion Theory: The Case of Compression with Independent Encodings. U. Erez, J. Østergaard, and R. Zamir. Proc. IEEE Data Compression Conference, 2020. Accepted.
Keyword Spotting for Hearing Assistive Devices Robust to Interfering Speakers. I. Lopez-Espejo, Z.-H. Tan and J. Jensen, Proc. Interspeech, 2019.
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation. A. Edraki, W.-Y. Chan, J. Jensen and D. Fogerty, Proc. Interspeech, 2019.
Estimation of Sensor Array Signal Model Parameters Using Factor Analysis. A. Koutrouvelis, R. Hendriks, R. Heusdens, J. Jensen, Proc. EUSIPCO, 2019.
Robust Bayesian and Maximum a Posteriori Beamforming for Hearing Assistive Devices. P. Hoang, Z.-H. Tan, J. M. de Haan, T. Lunner and J. Jensen. The 7th IEEE Global Conference on Signal and Information Processing (GlobalSIP 2019), Nov. 11-14, 2019, Shaw Centre, Ottawa, Canada.
Soft Dropout and Its Variational Bayes Approximation. J. Xie, Z. Ma, G. Zhang, J.-H. Xue, Z.-H. Tan and J. Guo. 2019 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2019), Oct. 13–16, 2019, Pittsburgh, PA, USA.
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. I. López-Espejo, Z.-H. Tan, and J. Jensen. Interspeech 2019, September 15-19, 2019, Graz, Austria.
Deep Joint Embeddings of Context and Content for Recommendation. M. S. Kristoffersen, J. L. Wieland, S. E. Shepstone, Z.-H. Tan and V. Vinayagamoorthy. CARS 2.0 – Workshop on Context-Aware Recommender Systems, in conjunction with RecSys’ 2019, 20 September 2019, Copenhagen, Denmark.
Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems. D. Michelsanti, Z.-H. Tan, S. Sigurdsson and J. Jensen. 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), Brighton, UK, May 12-17, 2019.
On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement. D. Michelsanti, Z.-H. Tan, S. Sigurdsson and J. Jensen. 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), Brighton, UK, May 12-17, 2019.
Subjective Annotations for Vision-Based Attention Level Estimation. A. Coifman, P. Rohoska, M.S. Kristoffersen, S.E. Shepstone, and Z.-H. Tan, The 14th International Conference on Computer Vision Theory and Applications (VISAPP 2019), Prague, Czech Republic, 25-27 February 2019.
Public Perception of Android Robots: Indications from an Analysis of YouTube Comments. E. Vlachos and Z.-H. Tan, the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018), Madrid, Spain, 1-5 October 2018.
Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification. H. Yu, T. Hu, Z. Ma, Z.-H. Tan and J. Guo, IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC 2018), Guiyang, China, August 22 – 24, 2018.
The Sound or Silence: investigating the influence of robot noise on proxemics. G. Trovato, R. Paredes, J. Balvin, F. Cuellar, N.B. Thomsen, S. Bech, and Z.-H. Tan, the 27th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2018, Nanjing and Tai’an, China, 27-31 August 2018.
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification. P.S. Frederiksen, J. Villalba, S. Watanabe, Z.-H. Tan and N. Dehak, accepted by Interspeech 2018, Hyderabad, India, September 2-6, 2018.
M. Farmani, M. S. Pedersen, and J. Jensen, Sound Source Localization for Hearing Aid Applications using Wireless Microphones, Accepted for IEEE Sensor Array and Multichannel Signal Processing Workshop, 2018.
J. Amini, R. C. Hendriks, R. Heusdens, M. Guo and J. Jensen, Operational Rate-Constrained Noise Reduction for Generalized Binaural Hearing Aid Setups, 2018 Symposium on Information Theory and Signal Processing in the Benelux.
A. Koutrouvelis, R.C. Hendriks, R. Heusdens, S. van de Par, J. Jensen, and M. Guo, Evaluation of Binaural Noise Reduction Methods in Terms of Intelligibility and Perceived Localization, Accepted for European Signal Processing Conference, 2018.
J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, and J. Jensen, Operational Rate-Constrained Beamforming in Binaural Hearing Aids, Accepted for European Signal Processing Conference, 2018.
On Zero-Delay Source Coding of LTI Gauss-Markov Systems with Covariance Matrix Distortion Constraints. P. Stavrou, J. Østergaard, M. Skoglund, The European Control Conference (ECC), June 2018.
Monaural Speech Enhancement Using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure. M. Kolbæk, Z.-H. Tan and J. Jensen, The 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), 15-20 April 2018, Calgary, Alberta, Canada.
Fixed-Rate Zero-Delay Source Coding for Stationary Vector-Valued Gauss-Markov Sources. P. A. Stavrou and J. Østergaard, IEEE Data Compression Conference (DCC), March 2018.
Time-Contrastive Learning Based DNN Bottleneck Features for Text-Dependent Speaker Verification. A.K. Sarkar and Z.-H. Tan, NIPS 2017 Time Series Workshop, Long Beach, CA, USA, Dec. 8, 2017.
Weighted Score Based Fast Converging CO-training with Application to Audio-Visual Person Identification. X. Duan, N.B. Thomsen, Z.-H. Tan, B. Lindberg and S.H. Jensen, The 29th IEEE International Conference on Tools with Artificial Intelligence (ICTAI2017), Boston, Massachusetts, USA, Nov. 6-8, 2017.
An Upper Bound to Zero-Delay Rate Distortion via Kalman Filtering for Vector Gaussian Sources. P. A. Stavrou, J. Østergaard, C. Charalambous, and M. Derpich. Proceedings of the IEEE Information Theory Workshop, Kaohsiung, Taiwan, 2017.
Joint Separation and Denoising of Noisy Multi-Talker Speech Using Recurrent Neural Networks and Permutation Invariant Training, M. Kolbæk, D. Yu, Z.-H. Tan and J. Jensen, accepted by the IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, 25-28 September 2017. Best student paper award.
Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise. Z. Jahromi, J. Østergaard, and J. Jensen, Proc. Interspeech 2017, Stockholm, Sweden, 2017, to appear.
On the use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure. A.H. Andersen, J.M. de Haan, Z.-H. Tan and J. Jensen, Proc. Interspeech 2017, Stockholm, Sweden, 2017, to appear.
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification. H. Yu, Z.-H. Tan, Z. Ma and J. Guo, Proc. Interspeech 2017, Stockholm, Sweden, 2017, to appear.
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification. D. Michelsanti and Z.-H. Tan, Proc. Interspeech 2017, Stockholm, Sweden, 2017, to appear.
Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data. A. Sarkar, Md Sahidullah, Z.-H. Tan and T. Kinnunen, Proc. Interspeech 2017, Stockholm, Sweden, 2017, to appear.
A Lower Bound on the Causal and Zero Delay Rate Distortion Function for Scalar Gaussian Autoregressive Sources, P. A. Stavrou and J. Østergaard. Proceedings of the Symposium on Information Theory and Signal Processing in the Benelux, pp. 207 – 214, Vol. 2017, Delft, The Netherlands, May 2017.
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-Talker Speech Separation. D. Yu, M. Kolbæk, Z.-H. Tan, J. Jensen, Proc. International Conf. Acoustics, Speech, Signal Proc. (ICASSP), 2017.
A Non-Intrusive Short-Time Objective Intelligibility Measure. A. H. Andersen, J. M. de Haan, Z.-H. Tan, and J. Jensen, Proc. International Conf. Acoustics, Speech, Signal Proc. (ICASSP), 2017.
RedDots Replayed: A New Replay Spoofing Attack Corpus for Text-dependent Speaker Verification Research. T. Kinnunen, M. Sahidullah, M. Falcone, L. Costantini, R. Hautamaki, D. Thomsen, A. Sarkar, Z.-H. Tan, H. Delgado, M. Todisco, N. Evans, V. Hautamaki, and K.A. Lee, Proc. International Conf. Acoustics, Speech, Signal Proc. (ICASSP), 2017.
An Asymmetric Difference Multiple Description Gaussian Noise Channel. J. Østergaard, Y. Kochman, and R. Zamir, IEEE Data Compression Conference, April 2017.
TDOA-based Self-Calibration of Dual-Microphone Arrays. M. Farmani, R. Heusdens, M. S. Pedersen, Z.-H. Tan and J. Jensen, Proc. 19th International Conference on Information Fusion (FUSION), pp. 1931 – 1936, 2016.
Speech Enhancement Using Long Short-Term Memory Based Recurrent Neural Networks for Noise Robust Speaker Verification. M. Kolbæk, Z.-H. Tan, and J. Jensen, Proc. IEEE Spoken Language Technology Workshop, 2016.
Further Optimisations of Constant Q Cepstral Processing for Integrated Utterance and Text-dependent Speaker Verification. H. Delgado, M. Todisco, M. Sahidullah, A. Sarkar, N. Evans, T. Kinnunen, and Z.-H. Tan, Proc. IEEE Spoken Language Technology Workshop, 2016.
Two Asymmetric Descriptions from Many Symmetric Descriptions. A. Mashiach, Y. Kochman, J. Østergaard, and R. Zamir, International Conference on the Science of Electrical Engineering (ICSEE), 2016.
Detection of Spoken Words in Noise: Comparison of Human Performance to Maximum Likelihood Detection. M. Z. Jahromi, J. Østergaard, and J. Jensen, IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016.

Journal Papers

Joint Far- and Near-end Speech and Listening Enhancement with Minimum Processing. A. J. Fuglsig, Z.-H. Tan, L. S. Bertelsen, J. Jensen, J. C. Lindof, and J. Østergaard, IEEE Access, 2024.

Performance of low-frequency sound zones with very fast room impulse response measurements. J. Cadavid, M. Møller, C.S. Pedersen, S. Bech, T. van Waterschoot, J. Østergaard. The Journal of the Acoustical Society of America, Vol. 155(1), 2024.

The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems. P. Gonzalez, Z.-H. Tan, J. Østergaard, J. Jensen, T. S. Alstrøm, and T. May, IEEE Signal Processing Letters, 2024.

Generating Accurate and Diverse Audio Captions through Variational Autoencoder Framework. Y. Zhang, R. Du, Z.-H. Tan, W. Wang, and Z. Ma, IEEE Signal Processing Letters, 2024.

How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses. P. A. L. Bysted, J. Jensen, Z.-H. Tan, J. Østergaard, and L. Bramsløw, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, pp. 2006-2020, 2024.

Data-Driven Non-Intrusive Speech Intelligibility Prediction using Speech Presence Probability. M. B. Pedersen, Z.-H. Tan, S. H. Jensen, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., Vol. 32, pp. 55-67, Oct. 2023.

Masked spectrogram prediction for unsupervised domain adaptation in speech enhancement. K. Žmolíková, M. S. Pedersen, J. Jensen. IEEE Open Journal of Signal Processing, 2023. Accepted.

Validity and reliability of self-reported and neural measures of listening effort. Y. Mohammadi, J. Østergaard, C. Graversen, O.K. Andersen & J. Biurrun Manresa. European Journal of Neuroscience. 58(11), pp. 4357-4370. Dec. 2023.

Phase-locking of neural activity to the envelope of speech in the delta frequency band reflects differences between word lists and sentences. Y. Mohammadi, C. Graversen, J. Østergaard, O. K. Andersen and T. Reichenbach. Journal of Cognitive Neuroscience, vol. 35(8):1301-1311, August 2023.

ACTUAL: Audio Captioning with Caption Feature Space Regularization. Y. Zhang, H. Yu, R. Du, Z.-H. Tan, W. Wang, Z. Ma, and Y. Dong. IEEE/ACM Transactions on Audio, Speech and Language Processing, accepted. 2023.

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification. A. K. Sarkar and Z.-H. Tan. Acoustics, accepted. 2023.

Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise. C. Heider and Z.-H. Tan. IEEE Open Journal of Signal Processing, vol. 4, pp. 179-187, 2023.

Cortical Auditory Attention Decoding During Music And Speech Listening. Simon, A., Loquet, G., Østergaard, J. & Bech, S. Accepted for publication in IEEE Transactions on Neural Systems and Rehabilitation Engineering, June 2023.

The Internet of Sounds: Convergent Trends, Insights and Future Directions. Turchet, L., Lagrange, M., Rottondi, C., Fazekas, G., Peters, N., Østergaard, J., Font, F., Bäckström, T. & Fischione, C. IEEE Internet of Things Journal, vol. 10, no. 13, pp. 11264-11292, 2023.

On the Deficiency of Intelligibility Metrics as Proxies for Subjective Intelligibility. I. López-Espejo, A. Edraki, W.-Y. Chan, Z.-H. Tan, and J. Jensen. Accepted by Elsevier Speech Communication, 2023.

Utilization of acoustic signals with generative Gaussian and autoencoder modeling for condition-based maintenance of injection moulds. G. Ø. Rønsch, I. López-Espejo, D. Michelsanti, Y. Xie, P. Popovski, and Z.-H. Tan. Accepted by International Journal of Computer Integrated Manufacturing, 2022.

Performance of Low Complexity Fully Connected Neural Networks for Monoaural Speech Enhancement. H. Reddy, A. Kar, J. Østergaard. Applied Acoustics, 2022.

A Family of Split Kernel Adaptive Filtering Algorithms for Nonlinear Stereophonic Acoustic Echo Cancellation. S. Burra, S. Sankar, A. Kar, J. Østergaard. Journal of Ambient Intelligence and Humanized Computing, 2022. 41, 2, pp. 1019–1037.

Incremental Refinements and Multiple Descriptions with Feedback. Østergaard, J., Erez, U. & Zamir, R., 2022. Accepted for publication in IEEE Transactions on Information Theory.

Speech to noise ratio improvement induces nonlinear parietal phase synchrony in hearing aid users. P.S. Baboukani, C. Graversen, E. Alickovic, J. Østergaard. Frontiers in Neuroscience, August 2022.

iVAE-GAN: Identifiable VAE-GAN Models for Latent Representation Learning. IEEE Access, vol. 10, pp. 48405-48418, 2022.

Training Data-Driven Speech Intelligibility Predictors on Heterogeneous Listening Test Data. M. B. Pedersen, A. H. Andersen, S. H. Jensen, Z.-H. Tan and J. Jensen. IEEE Access, vol. 10, pp. 66175-66189, 2022.

The Minimum Overlap-Gap Algorithm for Speech Enhancement. P. Hoang, Z.-H. Tan, J.-M. de Haan, J. Jensen, IEEE Access, February 2022.

Shouted and Whispered Speech Compensation for Speaker Verification Systems. S. Prieto, A. Ortega, I. López-Espejo, and E. Lleida. Accepted by Elsevier Digital Signal Processing, 2022.

Deep Spoken Keyword Spotting: An Overview. I. López-Espejo, Z.-H. Tan, J. Hansen, and J. Jensen. Accepted by IEEE Access, 2021.

Minimum Processing Beamforming. A. Zahedi, Michael S. Pedersen, J. Østergaard, T. Christiansen, L. Bramsløw and J. Jensen. IEEE Trans. Audio, Speech, Language Process. Vol. 29, pp. 2710-2724, 2021.

A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting. I. López-Espejo, Z.-H. Tan, and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization. J. Xie, Z. Ma, G. Zhang, J.-H. Xue, Z.-H. Tan and J. Guo. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence.

Self-Segmentation of Pass-Phrase Utterances for Deep Feature Learning in Text-Dependent Speaker Verification. A. K. Sarkar and Z.-H. Tan. Accepted by Computer Speech & Language.

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding. A. K. Sarkar, Z.-H. Tan. Accepted by IEEE Signal Processing Letters.

A Family of Adaptive Volterra Filters Based on Maximum Correntropy Criterion for Improved Active Control of Impulse Noise. Guttikonda, S. Burra, A. Kar, J. Østergaard, P. Sooraksa, V. Mladenovics, D. Haddad. Accepted for publication in Elsevier Circuits, Systems, and Signal Processing 2021.

Multiple Sub Filter Based Proportionate Filtering for Nonlinear Acoustic Echo Cancellation. V. Burra, A. Kar and J. Østergaard. Accepted for publication in Journal of Applied Acoustics, 2021.

Online Multichannel Speech Enhancement Based on Recursive EM and DNN-based Speech Presence Estimation. J. M. Martín-Doñas, J. Jensen, Z.-H. Tan, A. M. Gomez, and A. M. Peinado. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing.

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation, D. Michelsanti, Z.-H. Tan, S.-X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., Vol. 29, pp. 1368-1396, 2021.

Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis, A. Edraki, W.-Y. Chan, J. Jensen, and D. Fogerty. IEEE Trans. Audio, Speech, Language Process. Vol. 29, pp. 210-225, 2021.

Directed Data-Processing Inequalities for Systems with Feedback. M. Derpich and J. Østergaard. Entropy, Vol. 23, April 2021.

Deep InterBoost Networks for Small-sample Image Classification. X. Li, D. Chang, Z. Ma, Z.-H. Tan, J.-H. Xue, J. Cao and J. Guo. Accepted by Neurocomputing, 2020.

On the Comparisons of Decorrelation Approaches for non-Gaussian Neutral Vector Variables. Z. Ma, X. Lu, J. Xie, Z. Yang, J.-H. Xue, Z.-H. Tan, B. Xiao, J. Guo. Accepted by IEEE Transactions on Neural Networks and Learning Systems, 2020.

Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. I. López-Espejo, Z.-H. Tan and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020.

OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer. X. Li, D. Chang, Z. Ma, Z.-H. Tan, J.-H. Xue, J. Cao, J. Yu, J. Guo. Accepted by IEEE Transactions on Image Processing, 2020.

On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. M. Kolbæk, Z.-H. Tan, S. H. Jensen and J. Jensen. Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing.

The Importance of Context When Recommending TV Content: Dataset and Algorithms. M. S. Kristoffersen, S. E. Shepstone, and Z.-H. Tan. Accepted by IEEE Transactions on Multimedia.

SketchSegNet+: An End-to-end Learning of RNN for Multi-Class Sketch Semantic Segmentation. Y. Qi and Z.-H. Tan. Accepted by IEEE Access.

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method. Z.-H. Tan, A. Sarkar, and N. Dehak, accepted by Computer Speech and Language, vol. 59, pp. 1-21, January 2020. Source code: http://kom.aau.dk/~zt/online/rVAD/.

A Moving Horizon Framework for Sound Zones. M. Møller and J. Østergaard. IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, pp. 256-265, 2020.

Estimating Conditional Transfer Entropy in Time Series Using Mutual Information and Nonlinear Prediction. P. Baboukani, C. Graversen, E. Alickovic, J. Østergaard. Entropy Vol. 22, October 2020.

Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, J. Jensen. IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, No. 1, pp. 1-12, Jan. 2020.

Zero-delay multiple descriptions of stationary scalar Gauss-Markov sources. A. Fuglsig, J. Østergaard. Entropy, MDPI, 21(12), 1185, December 2019.

Deep-learning-based audio-visual speech enhancement in presence of Lombard effect. D. Michelsanti, Z.-H. Tan, J. Jensen, Speech Communication, Vol. 115, pp. 38-50, Dec. 2019.

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification. A. Sarkar, Z.-H. Tan, H. Tang, S. Shon, and J. Glass, IEEE Transactions on Audio, Speech and Language Processing, vol. 27, no. 8, pp.1267-1279, August 2019

Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation. J. M. Martín-Doñas, A. Peinado, I. López-Espejo, and A. Gomez, MDPI Applied Sciences, vol. 9, June 2019

On the Relationship between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement. M. Kolbæk, Z.-H. Tan and J. Jensen. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 27, no. 2, pp. 283-295, February 2019

Sound Quality Improvement for Hearing Aids in Presence of Multiple Inputs. A. Kar, A. Anand, J. Østergaard, S.H. Jensen, and M.N.S. Swamy. In Circuits, Systems, and Signal Processing, Springer, 38(8), 3591-3615, April, 2019.

Mean Square Performance Evaluation in Frequency Domain for an Improved Adaptive Feedback Cancellation in Hearing Aids. A. Kar, A. Anand, J. Østergaard, S.H. Jensen, and M.N.S. Swamy. In Signal Processing, Elsevier Journal, 157, pp. 45-61, 2019.

M. Z. Jahromi, A. Zahedi, J. Jensen, and J. Østergaard, Information Loss in the Human Auditory System, IEEE Trans. Audio, Speech, Language Process., Vol.27, No.3, pp.472-481, March 2019.

Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources. P. A. Stavrou, J. Østergaard, and C. Charalambous. IEEE Journal of Selected Topics in Signal Processing, 12, 5, pp.841-856, October 2018.

Asymmetric Coding for Rate-Constrained Noise Reduction in Binaural Hearing Aids. J. Amini, R. C. Hendriks, R. Heusdens, M. Guo, and J. Jensen. IEEE Trans. Audio, Speech, Language Process., 2018. Accepted.

Refinement and Validation of the Binaural Short Time Objective Intelligibility Measure for Spatially Diverse Conditions. A.H. Andersen, J.M. de Haan, Z.-H. Tan and J. Jensen. Elsevier Speech Communication, Vol. 102, pp. 1-13, Sept. 2018.

Non-Intrusive Speech Intelligibility Prediction using Convolutional Neural Networks. A.H. Andersen, J.M. de Haan, Z.-H. Tan and J. Jensen. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 26, No. 10, pp. 1925-1939, Oct. 2018.

A Spatial Self-Similarity Based Feature Learning Method for Face Recognition under Varying Poses. X. Duan and Z.-H. Tan, accepted by Pattern Recognition Letters, 2018.

Bias-compensated Informed Sound Source Localization Using Relative Transfer Functions. M. Farmani, M. S. Pedersen, Z.-H. Tan, and J. Jensen, accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018.

Using Closed-set Speaker Identification Score Confidence to Enhance Audio-based Collaborative Filtering for Multiple Users. S.E. Shepstone, Z.-H. Tan and M.S. Kristoffersen, accepted by IEEE Transactions on Consumer Electronics, 2018.

Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators. S. Braun, A. Kuklasinski, O. Schwartz, O. Thiergart, E.A.P. Habets, S. Gannot, S. Doclo, and J. Jensen. Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018.

A Perceptually Motivated LP Residual Estimator in Noisy and Reverberant Environments. R. Peng, Z.-H. Tan, X. Li, and C. Zheng, accepted by Speech Communication, 2017.

Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features. H. Yu, Z.-H. Tan, Z. Ma, R. Martin, and J. Guo, accepted by IEEE Transactions on Neural Networks and Learning Systems, 2017.

Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones. M. Sahidullah, D.A.L. Thomsen, R.G. Hautamaki, T. Kinnunen, Z.-H. Tan, R. Parts, M. Pitkanen, accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017.

iSocioBot – A Multimodal Interactive Social Robot. Z.-H. Tan, N.B. Thomsen, X. Duan, E. Vlachos, S.E. Shepstone, M.H. Rasmussen and J.L. Højvang, accepted by International Journal of Social Robotics, 2017.

Incorporating Pass-Phrase Dependent Background Models for Text-Dependent Speaker Verification. A. Sarkar and Z.-H. Tan, accepted by Computer Speech & Language, 2017.

Latent Dirichlet Mixture Model. J.-T. Chien, C.-H. Lee and Z.-H. Tan, accepted by Neurocomputing, 2017.
