PhD student: Daniel Michelsanti
Supervisors: Zheng-Hua Tan, Sigurdur Sigurdsson and Jesper Jensen
In many situations, human-to-human communication is hampered by background noise. For hearing-impaired people in particular, background noise can be very disturbing: it limits communication quality and may even affect their social life. The task of reducing this noise, thereby improving the quality and the intelligibility of the target speech, is known as speech enhancement. Most speech enhancement systems estimate the clean speech by processing the audio signal alone. However, speech is not a unimodal process: its production relies on movements of the articulatory organs that are visible to the listener, as well as on the general facial expressions of the target speaker. Hence, better speech enhancement systems may be devised by integrating visual cues, e.g. the facial expressions of the target speaker, into the enhancement process. Today, this integration is possible thanks to technological advances that have considerably reduced the size of cameras and increased the computational power that can be embedded in wearable devices.
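To make the audio-only baseline concrete, the following is a minimal sketch of one classical approach, magnitude spectral subtraction: an estimate of the noise magnitude spectrum is subtracted from each short-time frame of the noisy signal, and the signal is resynthesized with the noisy phase. This is only an illustration, not the method of the project; the function name and parameters are hypothetical, and a noise estimate is assumed to be available (e.g. from a speech-free segment).

```python
import numpy as np

def enhance(noisy, noise_mag, frame_len=256, hop=128):
    """Illustrative magnitude spectral subtraction with overlap-add.

    noisy     : 1-D array, noisy speech signal
    noise_mag : estimate of the noise magnitude spectrum (scalar or
                array matching the one-sided FFT length)
    """
    win = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        s = i * hop
        frame = noisy[s:s + frame_len] * win
        spec = np.fft.rfft(frame)
        # Subtract the noise magnitude estimate, flooring at zero,
        # and keep the noisy phase (phase is hard to estimate).
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        rec = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame_len)
        out[s:s + frame_len] += rec * win
        norm[s:s + frame_len] += win ** 2
    # Compensate for the analysis/synthesis windowing.
    return out / np.maximum(norm, 1e-8)
```

A typical usage would estimate `noise_mag` by averaging `np.abs(np.fft.rfft(...))` over frames known to contain only noise, then call `enhance(noisy, noise_mag)`. Audio-visual systems go beyond this by letting visual cues inform the estimate of the target speech.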
With this project we aim to study audio-visual speech enhancement for hearing assistive devices. We will focus our investigation on audio-visual systems that can work in real-world conditions, where no prior knowledge about the talkers or the environment is available. This will allow us to develop better algorithms for hearing aid systems and to improve the quality of life of many people with hearing loss.