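Author: 李伦珑
Name (Hanyu Pinyin): Li Lunlong
Student ID: 2018000003121
Training institution: Lanzhou University of Finance and Economics
Phone: 18895678006
Email: cria95@foxmail.com
Year of enrollment: 2018-9
Degree category: Professional master's
Training level: Master's student
First-level discipline: Applied Statistics
Discipline code: 0252
Degree conferred: Master of Applied Statistics (professional degree)
First supervisor: 王永瑜
First supervisor (Hanyu Pinyin): Wang Yongyu
First supervisor's institution: Lanzhou University of Finance and Economics
First supervisor's title: Professor
Title: 基于LDA模型的中国古典诗词在不同历史时期的主题发现
English title: Theme discovery of Chinese classical poetry in different historical periods based on LDA model
Keywords: LDA模型 古典诗词 不同历史时期 主题发现
Keywords (English): LDA model; Classical poetry; Different historical periods; Theme discovery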
Abstract

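Classical poetry is one of the important cultural forms in the history of Chinese literature. It matters both as a record of what the literati of each era thought and as a reflection of the political, economic, cultural and social background of the time in which it was written. However, countless classical poems survive from China's successive dynasties. To analyze them from a macro perspective, a researcher must read through and organize a large number of books, so the collection and collation stage alone demands considerable time and effort, and in the later statistical and analysis stages the researcher's subjective preferences can make the conclusions less objective. Poetry is a sediment of history, and through its far-reaching influence it also shaped the history of its time. Classical poetry is closely tied to the historical background in which it was written. This rich body of poetry, together with China's long history, is a source from which the Chinese nation has long drawn cultural nourishment.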

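A theme is the central idea that a literary work or social activity sets out to express. Analyzing the themes of poems is an important way to study the background of their creation, and it helps to sketch a picture of society. Taking a time-series perspective, this thesis studies the themes of classical poetry in different historical periods, compares how they differ across periods, and presents how theme content evolves. Whether written in an age of prosperity or an age of turmoil, the emotions and content reflected in classical poems can help us better understand what a healthy and progressive society looks like, and can ultimately guide present-day society and push it forward.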

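LDA (Latent Dirichlet Allocation) is a classic probabilistic generative topic model. It is often called a three-level Bayesian model because it has a three-level structure of words, topics and documents. As an unsupervised machine learning method, LDA is good at identifying latent topics in large document collections. The classical poetry data in this thesis is divided by historical period and totals 738,881 records; text mining with the LDA model yields 46 valid topics distributed across the periods. Data preprocessing and the building and implementation of the LDA model were done mainly in Python, and topics that share the same object were grouped into topic classes. Taking the "national consciousness" topic class as an example, the thesis visualizes the evolution of topic content. The shift in "national consciousness" passes through three stages: a stage of sage-emperor worship, represented by the Wei-Jin, Northern and Southern Dynasties and Sui periods; a stage in which the state is seen as the main subject, represented by the Tang and Song; and a stage in which the people are seen as the main subject, represented by the Ming, Qing and modern periods. Over time the concept of "national consciousness" gradually moves downward in society and also becomes broader. The thesis also computes the topic-document probability matrix, derives the topic intensity of each topic from it, ranks the topics by intensity, and reads the ranking alongside the poem texts to analyze the historical background of the poetry of each period. The analysis shows that the change of dynasties, social turmoil, population migration, the flourishing of the imperial examinations, cultural prosperity, economic recovery, military strength and revolutionary optimism are all reflected in the mental outlook of the literati of each era.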
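The topic-intensity step described above can be sketched in a few lines of Python. The record does not give the thesis's exact formula, so this sketch assumes one common definition: the intensity of a topic in a period is the mean of that topic's probability over all poems in the period. The function names and the toy matrix are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def topic_intensity_by_period(doc_topic, periods):
    """Average each topic's probability over the poems of each period.

    doc_topic: list of rows, one per poem; each row is a probability
               distribution over topics (as produced by a fitted LDA model).
    periods:   list of period labels, one per poem.
    Assumption: intensity = mean document-topic probability within a period.
    """
    sums = {}
    counts = defaultdict(int)
    for row, period in zip(doc_topic, periods):
        if period not in sums:
            sums[period] = [0.0] * len(row)
        for k, p in enumerate(row):
            sums[period][k] += p
        counts[period] += 1
    return {
        period: [s / counts[period] for s in totals]
        for period, totals in sums.items()
    }


def rank_topics(intensities):
    """Return topic indices sorted from strongest to weakest."""
    return sorted(range(len(intensities)), key=lambda k: intensities[k], reverse=True)


# Toy, made-up document-topic matrix (4 poems, 3 topics); not data from the thesis.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]
periods = ["Tang", "Tang", "Song", "Song"]

intensity = topic_intensity_by_period(doc_topic, periods)
for period, values in intensity.items():
    print(period, [round(v, 2) for v in values], rank_topics(values))
```

Because each poem's row sums to 1, the intensities within a period also sum to 1, which makes them comparable across periods with different numbers of poems.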

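This thesis finds that the themes of classical poetry rise and fall to some extent across historical periods, whether they concern national consciousness, life's journeys, the sorrow of parting, or love, friendship and homesickness. Because the social background differs from one historical stage to the next, the themes differ and vary in intensity, and the trend of social evolution is marked. By analyzing the text of classical poems, the thesis enriches the computational study of Chinese classical poetry from a macro perspective. It also draws on the past to reflect on the present: starting from the important roles that politics, economy, culture, education and the military play in society, it offers later researchers reference and inspiration for strengthening the "Four Confidences", advancing the "Five-sphere Integrated Plan", and building a better society.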

English abstract

Classical poetry is one of the important cultural forms in the history of Chinese literature. It is of great significance both for recording what the literati of the time thought and for reflecting the political, economic, cultural and social background of the era. However, a vast number of classical poems survive from Chinese history. To analyze these poems from a macro perspective, researchers have to read and sort through a large number of books, so they must invest considerable time and energy in the collection and sorting stage alone. Moreover, in the later statistics and analysis stage, the subjective preferences of researchers can make the conclusions less objective. Poetry is the precipitate of history; at the same time, it has far-reaching influence and helped shape the history of its time. Classical poetry is closely related to the background of the historical period in which it was written. Rich classical poetry and a long history are sources from which the Chinese nation has long drawn cultural nourishment.

A theme is the central idea that a literary work or social activity expresses. Analyzing the themes of poetry is an important channel for studying the background of its creation, and it helps to outline the shape of society. From a time-series perspective, this paper studies the themes of classical poetry in different historical periods, compares the differences between them, and presents the evolution of theme content. Whether from a prosperous age or a turbulent one, the emotions and content reflected in classical poetry can help us better understand a healthy and progressive form of society, and ultimately guide present-day society and promote social development.

The LDA (Latent Dirichlet Allocation) model is a classic probabilistic generative topic model. It is also known as a three-level Bayesian model, with a three-level structure of words, topics and documents. As an unsupervised machine learning method, LDA is good at identifying latent topics in large-scale document datasets. In this paper, the classical poetry data is divided by historical period and totals 738,881 records; text mining with the LDA model yields 46 valid topics, distributed across the historical periods. The data preprocessing and the modeling and implementation of LDA are done mainly in Python, and topics that share the same object are grouped into topic classes. Taking the "national consciousness" topic class as an example, this paper makes a visual analysis of the evolution of topic content. It can be seen that the transformation of "national consciousness" has gone through three stages: a stage of sage-emperor worship, represented by the Wei-Jin, Northern and Southern Dynasties and Sui periods; a stage in which the state is the main subject, represented by the Tang and Song dynasties; and a stage in which the people are the main subject, represented by the Ming, Qing and modern periods. The concept of "national consciousness" gradually moved downward in society and at the same time became broader. By computing the topic-document probability matrix, the topic intensity of each topic is calculated, and the ranking of topics by intensity is combined with the poem texts to analyze the historical background of classical poetry in each period. We can see that the change of dynasties, social unrest, population migration, the flourishing of the imperial examinations, cultural prosperity, economic recovery, military strength and revolutionary optimism are all reflected in the mental outlook of the literati of each era.
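For readers unfamiliar with the three-level structure mentioned in this abstract, the standard LDA generative process for one document can be written as follows. This is the textbook formulation, not notation taken from the thesis; the symbols $\alpha$ (Dirichlet prior), $\beta$ (per-topic word distributions), $\theta_d$ (topic mixture of document $d$), $z_{d,n}$ (topic of the $n$-th word) and $w_{d,n}$ (the $n$-th word) are assumptions introduced here for illustration.

```latex
\begin{align}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}}), \\
p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta)
  &= p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta).
\end{align}
```

Under this reading, the rows of a topic-document probability matrix are the inferred $\theta_d$ vectors, one per poem.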

This paper argues that the themes of classical poetry fluctuate to some extent across historical periods, whether they concern national consciousness, life's journeys, the sorrow of parting, or love, friendship and homesickness. However, because social backgrounds differ between historical stages, the themes differ and vary in intensity, and the trend of social evolution is marked. Through text analysis of classical poetry, the paper enriches the computational study of Chinese classical poetry from a macro perspective. At the same time, by reflecting on the present through the past, and starting from the important roles that politics, economy, culture, education and the military play in society, it offers later researchers reference and inspiration for strengthening the "Four Confidences", advancing the "Five-sphere Integrated Plan", and building a better society.

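Degree type: Master's
Defense date: 2021-05
Place of degree conferral: Lanzhou, Gansu Province
Research direction: Big data analysis
Language: Chinese
Total pages: 71
Total references: 70
Collection number: 0003672
Confidentiality level: Public
Chinese Library Classification number: C8/277
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/29478
Collection: School of Statistics and Data Science
Recommended citation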
GB/T 7714
李伦珑. 基于LDA模型的中国古典诗词在不同历史时期的主题发现[D]. 甘肃省兰州市. 兰州财经大学,2021.
Files in this item
File name/size: 基于LDA模型的中国古典诗词在不同历史时 (3543KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA