Institutional Repository of School of Information Engineering and Artificial Intelligence
作者 | 张晓蝶 |
姓名汉语拼音 | zhangxiaodie |
学号 | 2021000010015 |
培养单位 | 兰州财经大学 |
电话 | 18332203360 |
电子邮件 | 1480198338@qq.com |
入学年份 | 2021-9 |
学位类别 | 学术硕士 |
培养级别 | 硕士研究生 |
学科门类 | 管理学 |
一级学科名称 | 管理科学与工程 |
学科方向 | 无 |
学科代码 | 1201 |
第一导师姓名 | 杨海军 |
第一导师姓名汉语拼音 | yanghaijun |
第一导师单位 | 兰州财经大学 |
第一导师职称 | 教授 |
题名 | 高效用项集挖掘算法改进研究 |
英文题名 | Research on improving high utility itemsets mining algorithms |
关键词 | 高效用项集 投影数据库 事务合并 并行计算 Thread Pool |
外文关键词 | High utility itemset; ; Projected batabase ; Transaction merging ; Parallel computing ; Thread Pool |
摘要 | 高效用项集挖掘考虑项的重要性、利润、用户偏好等因素,给项赋予不同的权重以计算项集的效用,更加满足实际应用的需求。高效用项集挖掘领域是当前数据挖掘的研究热点之一。目前,数据的爆炸式增长给高效用项集挖掘算法的研究提出了新的挑战。 通过文献分析发现,采用utility-list数据结构的一类高效用项集挖掘算法,可以进一步减少在挖掘过程中的连接操作。本文对使用utility-list的经典算法HUI-Miner和ULB-Miner在连接操作等方面进行改进,分别提出了改进算法HUIwTMS-Miner和ULBwTMS-Miner,并对其进行实验验证。由于单机多核处理器是实现复杂并行计算系统的基础之一,在单机多核处理器环境下的研究可以从微观的角度揭示并行计算的原理。本文在单机多核处理器环境下,利用Thread Pool框架分别设计、实现了ULB-Miner算法和ULBwTMS-Miner算法的并行算法PULB-Miner和PULBwTMS-Miner。主要研究工作如下: (1)结合投影数据库技术和事务合并策略,分别对经典算法HUI-Miner、ULB-Miner进行改进。根据≻T顺顺序对事务进行排序,使用剩余效用剪枝对1-项集进行二次剪枝,减少投影数据库的生成数量,同时对相同事务进行合并。本研究在SPMF公开资源库中的Retail、Pumsb、Connect、Mushroom、eCommerce数据集上分别设定了5个不同最小效用阈值,进行了算法性能对比实验。实验结果显示,HUIwTMS-Miner算法除在极稀疏数据集Retail外的稀疏数据集eCommerce和稠密数据集Pumsb、Connect、Mushroom上的运行时间明显减少,而内存消耗有所增加;ULBwTMS-Miner算法在稠密数据集Pumsb、Connect、Mushroom上的运行时间明显减少,在除Retail外的其他数据集上的内存消耗有所增加。 (2)设计了ULB-Miner算法的并行算法PULB-Miner。在单机多核处理器环境下,利用Thread Pool框架实现PULB-Miner算法。分别在Retail、Pumsb、Connect、Mushroom、eCommerce数据集的不同最小效用阈值、不同线程进行实验,对比分析ULB-Miner算法、PULB-Miner算法的性能。实验结果显示,PULB-Miner算法在五个数据集上均表现良好,表明该算法具有可行性和有效性。 (3)设计了ULBwTMS-Miner算法的并行算法PULBwTMS-Miner。在单机多核处理器环境下,利用Thread Pool框架实现PULBwTMS-Miner算法。分别在Retail、Pumsb、Connect、Mushroom、eCommerce数据集的不同最小效用阈值、不同线程进行实验,对比分析了ULBwTMS-Miner算法和PULBwTMS-Miner算法的性能,同时还对比分析了PULB-Miner算法和PULBwTMS-Miner算法的性能。实验结果显示,PULBwTMS-Miner算法在运行时间上明显低于ULBwTMS-Miner算法,内存消耗略有增加,在稠密数据集Pumsb、Connect、Mushroom上的运行时间明显低于PULB-Miner算法,内存消耗也减少,表明该算法具有可行性和有效性。 |
英文摘要 | High utility itemsets mining considers factors such as the importance, profit, and user preferences of items, assigns different weights to items to calculate the utility of itemsets, and better meets the needs of practical applications. The field of high utility itemsets mining is one of the current research hotspots in data mining. At present, the explosive growth of data poses new challenges to the research of high utility itemsets mining algorithms. Through literature analysis, it was found that a class of high utility itemsets mining algorithms using utility-list data structures can further reduce the number of connection operations during the mining process. This article improves the classic algorithms HUI-Miner and ULB-Miner using utility-list in connection operations, and proposes improved algorithms HUIwTMS-Miner and ULBwTMS-Miner, respectively, and conducts experimental verification on them. Due to the fact that single machine multi-core processors are one of the foundations for implementing complex parallel computing systems, research in single machine multi-core processor environments can reveal the principles of parallel computing from a microscopic perspective. This article designs and implements parallel algorithms PULB-Miner and PULBwTMS-Miner for ULB-Miner and ULBwTMS-Miner algorithms using the Thread Pool framework in a single machine multi-core processor environment. The main research work is as follows: (1)By combining projection database technology and transaction merging strategy, improvements are made to the classic algorithms HUI-Miner and ULB-Miner, respectively. Sort transactions in ≻T order, use residual utility pruning to perform secondary pruning on the 1-itemset, reduce the number of generated projection databases, and merge the same transactions. This study set 5 different minimum utility thresholds on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets in the SPMF public resource library and conducted algorithm performance comparison experiments. The experimental results show that the HUIwTMS-Miner algorithm significantly reduces running time on sparse datasets eCommerce and dense datasets Pumsb, Connect, and Mushroom, except for the extremely sparse dataset Retail, while increasing memory consumption; The ULBwTMS-Miner algorithm significantly reduces runtime on dense datasets such as Pumsb, Connect, and Mushroom, while increasing memory consumption on datasets other than Retail. (2)Designed a parallel algorithm called PULB-Miner for the ULB-Miner algorithm. Implement the PULB-Miner algorithm using the Thread Pool framework in a single machine multi-core processor environment. Conduct experiments on different minimum utility thresholds and threads on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets to compare and analyze the performance of ULB-Miner algorithm and PULB-Miner algorithm. The experimental results show that the PULB-Miner algorithm performs well on all five datasets, indicating its feasibility and effectiveness. (3)Designed a parallel algorithm called PULBwTMS-Miner for the ULBwTMS-Miner algorithm. Implement the PULBwTMS-Miner algorithm using the Thread Pool framework in a single machine multi-core processor environment. We conducted experiments on different minimum utility thresholds and threads on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets to compare and analyze the performance of ULBwTMS-Miner algorithm and PULBwTMS-Miner algorithm. We also compared and analyzed the performance of PULB-Miner algorithm and PULBwTMS-Miner algorithm. The experimental results show that the PULBwTMS-Miner algorithm has significantly lower runtime than the ULBwTMS-Miner algorithm, with a slight increase in memory consumption. The runtime on dense datasets such as Pumsb, Connect, and Mushroom is significantly lower than the PULB-Miner algorithm, and memory consumption is also reduced, indicating the feasibility and effectiveness of this algorithm. |
学位类型 | 硕士 |
答辩日期 | 2024-05-18 |
学位授予地点 | 甘肃省兰州市 |
语种 | 中文 |
论文总页数 | 92 |
参考文献总数 | 58 |
馆藏号 | 0006295 |
保密级别 | 公开 |
中图分类号 | C93/99 |
文献类型 | 学位论文 |
条目标识符 | http://ir.lzufe.edu.cn/handle/39EH0E1M/36869 |
专题 | 信息工程与人工智能学院 |
推荐引用方式 GB/T 7714 | 张晓蝶. 高效用项集挖掘算法改进研究[D]. 甘肃省兰州市. 兰州财经大学,2024. |
条目包含的文件 | 下载所有文件 | |||||
文件名称/大小 | 文献类型 | 版本类型 | 开放类型 | 使用许可 | ||
2021000010015.pdf(3072KB) | 学位论文 | 开放获取 | CC BY-NC-SA | 浏览 下载 |
个性服务 |
查看访问统计 |
谷歌学术 |
谷歌学术中相似的文章 |
[张晓蝶]的文章 |
百度学术 |
百度学术中相似的文章 |
[张晓蝶]的文章 |
必应学术 |
必应学术中相似的文章 |
[张晓蝶]的文章 |
相关权益政策 |
暂无数据 |
收藏/分享 |
除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。
修改评论