Research Topics

2025 An Innovative Oral-Phase Swallowing Assessment System Integrating Intraoral Endoscopy and AI-Based Spatial Localization



Dysphagia is a prevalent condition among elderly individuals, patients with neurodegenerative diseases, and those recovering from head and neck cancer surgery. Without early detection and timely intervention, it can lead to serious complications, including malnutrition, dehydration, and aspiration pneumonia, placing significant burdens on both individuals and healthcare systems. Tongue motion is crucial in the oral phase of swallowing and in speech articulation, yet current clinical assessment methods, such as tongue-pressure measurement devices or subjective visual inspection by speech-language pathologists, cannot provide real-time, objective, and comprehensive analysis of tongue motion. As a result, early signs of dysfunction are often missed, hampering effective rehabilitation and disease management. In this study, we propose a novel analytical framework that integrates intraoral imaging, automated segmentation, and monocular depth estimation to enable real-time, quantitative assessment of tongue motion. Using a miniature intraoral camera together with deep learning models, the system automatically detects the tongue region and estimates depth information from the captured images. Three articulation points are extracted while the subject pronounces the vowels /a/, /i/, and /u/, and the area of the triangle formed by these points is computed as an objective index of tongue mobility. To ensure robust depth estimation, a digital-twin dataset simulating various tongue shapes and camera perspectives is generated for model training. Validation experiments with tongue phantoms and clinical images confirm high accuracy in non-rigid scenarios (error below 6 mm), and the automatically computed results agree closely with expert annotations. Further analysis demonstrates that the system effectively distinguishes healthy individuals from those with dysphagia across different articulation tasks (p < 0.05) and exhibits greater stability than traditional acoustic indices such as Vowel Space Area (VSA). Overall, this technology offers a noninvasive, real-time, and objective approach for assessing tongue motion, with potential applications in clinical dysphagia diagnosis, speech rehabilitation, long-term home-based health monitoring, personalized medicine, and remote care.
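The triangle-area index described above can be sketched as a simple cross-product computation over the three estimated tongue landmarks. This is only an illustrative sketch: the function name `triangle_area` and the landmark coordinates below are hypothetical placeholders, not the system's actual implementation or measured data.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by three 3D points, via the cross product.

    Each point is an (x, y, z) coordinate, e.g. in millimetres from the
    intraoral camera's depth estimate.
    """
    a = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    b = np.asarray(p3, dtype=float) - np.asarray(p1, dtype=float)
    # |a x b| is the area of the parallelogram; halve it for the triangle.
    return 0.5 * np.linalg.norm(np.cross(a, b))

# Hypothetical landmark positions (mm) captured during /a/, /i/, /u/.
landmarks = {
    "a": (10.0, 2.0, 35.0),
    "i": (4.0, 14.0, 22.0),
    "u": (6.0, 8.0, 40.0),
}
area_mm2 = triangle_area(landmarks["a"], landmarks["i"], landmarks["u"])
```

The same formula, applied to (F1, F2) formant coordinates of the corner vowels instead of 3D tongue landmarks, yields the acoustic Vowel Space Area (VSA) that the study uses as a baseline comparison.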

黃怡靜