Study on Tibetan Speech Recognition Based on Ultrasonic Tongue Imaging, Lip Video and Audio | |
Zhang, JinXi; Li, ChunLing | |
2022 | |
Conference name | 5th International Conference on Computer Information Science and Application Technology, CISAT 2022 |
Proceedings title | Proceedings of SPIE - The International Society for Optical Engineering |
Volume | 12451 |
Conference date | July 29, 2022 - July 31, 2022 |
Conference location | Chongqing, China |
Proceedings editor / conference organizer | Guangzhou Computer Society |
Publisher | SPIE |
Abstract | A speech recognition system is in essence a pattern recognition system with three basic units: feature extraction, pattern matching, and a reference pattern library. This paper implements a speech recognition system based on the DNN-HMM structure using the Kaldi toolkit. For multiple speakers, it synchronously acquires several physiological signals of the Tibetan Lhasa dialect, including tongue body shape, lip movement, and speech audio, and processes the data for feature extraction, model training, and speech recognition. The results show that recognition performance varies with the speaker model and the articulation part, and that, across articulation parts, the speaker-dependent recognition error rate is low while the speaker-independent recognition error rate is high. They also show that ultrasonic tongue images and lip video image sequences carry the speaker's acoustic feature information, which is of great significance for the study of speech recognition, particularly speech synthesis and silent speech interfaces. © 2022 SPIE. |
Keywords | Feature extraction; Medical imaging; Pattern matching; Physiological models; Speech recognition; Speech synthesis; Error rate; Extraction patterns; Features extraction; Lip video; Recognition error; Speech recognition systems; Tibetan corpus; Tibetans; Ultrasonic tongue imaging; Voice and audio |
DOI | 10.1117/12.2656631 |
Indexed by | EI |
Language | English |
EI accession number | 20224713138860 |
EI main heading | Extraction |
EI classification codes | 461.1 Biomedical Engineering ; 746 Imaging Techniques ; 751.5 Speech ; 802.3 Chemical Operations |
Original document type | Conference article (CA) |
Document type | Conference paper |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/33108 |
Collection | School of Business and Media; School of Statistics and Data Science |
Author affiliation | Lanzhou University of Finance and Economics, Gansu, Lanzhou; 730020, China |
Recommended citation (GB/T 7714) | Zhang, JinXi,Li, ChunLing. Study on Tibetan Speech Recognition Based on Ultrasonic Tongue Imaging, Lip Video and Audio[C]//Guangzhou Computer Society:SPIE,2022. |
Files in this item | No files are associated with this item. |