Study on Tibetan Speech Recognition Based on Ultrasonic Tongue Imaging, Lip Video and Audio
Zhang, JinXi; Li, ChunLing
2022
Conference name: 5th International Conference on Computer Information Science and Application Technology, CISAT 2022
Proceedings title: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12451
Conference dates: July 29, 2022 - July 31, 2022
Conference location: Chongqing, China
Proceedings editor/conference organizer: Guangzhou Computer Society
Publisher: SPIE
Abstract: A speech recognition system is essentially a pattern recognition system consisting of three basic units: feature extraction, pattern matching, and a reference pattern library. This paper implements a speech recognition system based on a DNN-HMM architecture using the Kaldi toolkit. For multiple speakers, the system synchronously acquires several physiological signals of the Tibetan Lhasa dialect, including tongue body shape, lip movement, and speech audio, and processes the data for feature extraction, model training, and speech recognition. The results show that recognition performance varies with the speaker model and with the articulator modality used, and that for each modality the speaker-dependent recognition error rate is low while the speaker-independent recognition error rate is high. The results also show that ultrasonic tongue imaging and lip video image sequences contain acoustic feature information about the speaker, which is significant for speech recognition research, particularly for speech synthesis and silent speech interfaces. © 2022 SPIE.
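The abstract compares speaker-dependent and speaker-independent recognition error rates but does not define the metric. As a hedged illustration only, the sketch below assumes a standard Levenshtein-based token error rate (substitutions + deletions + insertions, divided by the number of reference tokens), the usual basis for word or character error rates. The function name `error_rate` and the sample token strings are hypothetical and are not taken from the paper.

```python
def error_rate(reference: str, hypothesis: str) -> float:
    """Token error rate, assuming (S + D + I) / N over whitespace-separated tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # d[i][j] holds the edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one substitution (b -> x) and one deletion (d) over 4 tokens.
print(error_rate("a b c d", "a x c"))  # 0.5
```

Under this assumption, a speaker-dependent figure would score test utterances from the same speaker the model was trained on, and a speaker-independent figure would score utterances from a held-out speaker; the paper's actual protocol is not described in this record.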
Keywords: Feature extraction; Medical imaging; Pattern matching; Physiological models; Speech recognition; Speech synthesis; Error rate; Extraction patterns; Features extraction; Lip video; Recognition error; Speech recognition systems; Tibetan corpus; Tibetans; Ultrasonic tongue imaging; Voice and audio
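DOI: 10.1117/12.2656631
Indexed in: EI
Language: English
EI accession number: 20224713138860
EI main heading: Extraction
EI classification codes: 461.1 Biomedical Engineering; 746 Imaging Techniques; 751.5 Speech; 802.3 Chemical Operations
Original document type: Conference article (CA)
Document type: Conference paper
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/33108
Collections: School of Business and Media; School of Statistics and Data Science
Author affiliation: Lanzhou University of Finance and Economics, Lanzhou, Gansu 730020, China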
Recommended citation (GB/T 7714):
Zhang, JinXi,Li, ChunLing. Study on Tibetan Speech Recognition Based on Ultrasonic Tongue Imaging, Lip Video and Audio[C]//Guangzhou Computer Society:SPIE,2022.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.