數位健康暨醫用機器人實驗室 | Digital Health and Biorobotics Lab

研究主題 Research Topics

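2023 使用智慧影像分析與新型立體波束成型技術建立沉浸式輔聽系統 | An Immersive Assistive Listening System Built with Intelligent Image Analysis and a Novel Stereo Beamforming Technique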

全球輔聽裝置的需求逐年上升，輔聽裝置除了要讓使用者能夠聽得清楚外，許多研究還表明要能補強人類原本的聲音來源感知能力，讓使用者可以準確判斷聲源位置，才能降低配戴時所導致的生活風險。而目前市面上的輔聽裝置大多只能接收正前方的聲音，讓使用者忽略來自左右以及後方的聲音，例如在車水馬龍的街道上行走時，使用者因為沒辦法感知到來自後方的警示聲，使其身陷危險。有鑑於此，團隊過去提出一款語音追蹤系統，透過到達時間差進行 360 度的掃瞄定位，搭配波束成型技術強化目標聲源並降低環境噪音。然而，該系統因為無法判定收音目標的高低位置，導致輸出音量忽大忽小，讓使用者感到不適。因此，本研究提出整合電腦視覺技術、雙層差分麥克風陣列演算法、聲音定位以及 Mixing Algorithm，建立一套可用於日常生活的沉浸式目標語音鎖定和增強裝置。該裝置主要針對輕、中度聽損患者所設計，並建構了初步的可穿戴式原型。該裝置能強化目標語音並減少噪音干擾，並透過 Mixing Algorithm 調整雙耳聲道的音量輸出，實現具 360 度聲源感知的沉浸式輔聽裝置，藉此降低使用者因無法掌握環境聲源位置所導致的生活風險。此外，本研究基於 d-DMA 架構與演算法，改善以往團隊所提出的聽力輔助裝置因無法判定收音目標的高低位置導致音量忽大忽小的問題，使其更能滿足聽損患者在日常生活中的需求。根據實驗結果顯示，本研究所提出的裝置在正常對話距離下（<160 cm）的影像準確率超過 93%；收音準確率達到 88%；音量輸出的穩定性對於穩態聲源提升 60%，非穩態聲源則是 30%。根據臨床結果顯示，該裝置能有效提升聽損患者的語音辨識閾值，在安靜環境中提升約 5.5 dB，噪聲環境中則是 5.8 dB，並根據李克特量表，受試者對於該裝置在兩種環境下的滿意度也超過 3 分，展示了該裝置在未來產品化的潛力。

The demand for assistive listening devices (ALDs) is rising worldwide. Beyond making speech clearer, many studies indicate that these devices should also reinforce the user's natural ability to localize sound sources, so that users can judge where sounds come from and avoid the everyday risks that wearing a device can otherwise introduce. However, most ALDs currently on the market capture sound only from directly in front of the user, so sounds from the sides and rear go unnoticed. This can be hazardous: a user walking along a busy street, for example, may fail to notice a warning signal from behind. To address this, the research team previously proposed a speech tracking system that used time difference of arrival (TDOA) for 360-degree scanning and localization, combined with beamforming to strengthen the target source and suppress environmental noise. Because that system could not determine the vertical position of the sound target, however, its output volume fluctuated and caused discomfort to users. This study therefore integrates computer vision (CV), a dual-layer differential microphone array (d-DMA) algorithm, sound localization, and a Mixing Algorithm to build an immersive target speech localization and enhancement device suited to daily life. The device is designed primarily for individuals with mild to moderate hearing loss, and a preliminary wearable prototype has been constructed. It enhances the target speech, reduces noise interference, and uses the Mixing Algorithm to adjust the output volume of the left and right ear channels, providing immersive listening with 360-degree sound source awareness. This lowers the risks that arise when users cannot tell where sounds in their environment are coming from. In addition, by building on the d-DMA architecture and algorithm, the device resolves the fluctuating output volume of the team's earlier hearing assistance device, which stemmed from its inability to determine the vertical position of the sound target, and so better meets the daily needs of hearing-impaired individuals. Experimental results show that, within normal conversational distance (<160 cm), the proposed device achieves an image localization accuracy above 93% and a sound localization accuracy of 88%. Output volume stability improves by 60% for steady-state sound sources and by 30% for non-steady-state sound sources. Clinical results show that the device improves the speech recognition threshold of individuals with hearing loss by approximately 5.5 dB in quiet environments and 5.8 dB in noisy environments. Participants' satisfaction ratings on the Likert scale also exceeded 3 in both environments, underscoring the device's potential for future commercialization.
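
As a rough illustration of two ideas mentioned in the abstract, the sketch below converts a TDOA measurement into an arrival angle and derives left and right ear gains from a source azimuth. It is not the lab's implementation. The abstract does not describe the internals of the Mixing Algorithm, so constant-power panning is used here as a stand-in. The function names, the two-microphone far-field geometry, the speed of sound, and the azimuth convention (0 = front, 90 = right, 180 = behind, 270 = left) are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)


def tdoa_to_azimuth(delay_s, mic_spacing_m):
    """Estimate the arrival angle in degrees from a two-microphone TDOA.

    Far-field assumption: delay = spacing * sin(angle) / speed_of_sound.
    A single microphone pair cannot separate front from back, so the
    result lies in [-90, 90] degrees relative to broadside.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp to absorb measurement noise
    return math.degrees(math.asin(ratio))


def binaural_gains(azimuth_deg):
    """Return (left, right) gains for a source at the given azimuth.

    Constant-power panning (a stand-in, not the source's Mixing Algorithm):
    the lateral position sin(azimuth) is mapped to a pan angle in
    [0, pi/2], so left**2 + right**2 stays equal to 1 at every azimuth.
    """
    lateral = math.sin(math.radians(azimuth_deg))  # -1 = left, +1 = right
    pan = (lateral + 1.0) * math.pi / 4.0
    return math.cos(pan), math.sin(pan)


if __name__ == "__main__":
    # Hypothetical measurement: 0.1 m microphone spacing, source 30 degrees off broadside.
    delay = 0.1 * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
    angle = tdoa_to_azimuth(delay, 0.1)
    left, right = binaural_gains(angle)
    print(f"angle={angle:.1f} deg, left={left:.3f}, right={right:.3f}")
```

Because the gains depend only on the lateral component of the azimuth, a source directly in front and one directly behind receive the same balance. A full 360-degree system would need additional cues, such as the camera or the second microphone layer described above, to tell them apart.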

游向前