作者张晓蝶
姓名汉语拼音zhangxiaodie
学号2021000010015
培养单位兰州财经大学
电话18332203360
电子邮件1480198338@qq.com
入学年份2021-9
学位类别学术硕士
培养级别硕士研究生
学科门类管理学
一级学科名称管理科学与工程
学科方向
学科代码1201
第一导师姓名杨海军
第一导师姓名汉语拼音yanghaijun
第一导师单位兰州财经大学
第一导师职称教授
题名高效用项集挖掘算法改进研究
英文题名Research on improving high utility itemsets mining algorithms
关键词高效用项集 投影数据库 事务合并 并行计算 Thread Pool
外文关键词High utility itemset; ; Projected batabase ; Transaction merging ; Parallel computing ; Thread Pool
摘要

  高效用项集挖掘考虑项的重要性、利润、用户偏好等因素,给项赋予不同的权重以计算项集的效用,更加满足实际应用的需求。高效用项集挖掘领域是当前数据挖掘的研究热点之一。目前,数据的爆炸式增长高效用项集挖掘算法的研究提出了新的挑战。

  通过文献分析发现,采用utility-list数据结构的一类高效用项集挖掘算法,可以进一步减少在挖掘过程中的连接操作。本文对使用utility-list的经典算法HUI-MinerULB-Miner在连接操作等方面进行改进,分别提出了改进算法HUIwTMS-MinerULBwTMS-Miner对其进行实验验证。由于单机多核处理器是实现复杂并行计算系统的基础之一,在单机多核处理器环境下的研究可以从微观的角度揭示并行计算的原理。本文在单机多核处理器环境下,利用Thread Pool框架分别设计、实现了ULB-Miner算法和ULBwTMS-Miner算法的并行算法PULB-MinerPULBwTMS-Miner主要研究工作如下:

(1)结合投影数据库技术和事务合并策略,分别对经典算法HUI-MinerULB-Miner进行改进。根据T顺序对事务进行排序,使用剩余效用剪枝对1-项集进行二次剪枝,减少投影数据库的生成数量,同时对相同事务进行合并。本研究SPMF公开资源库中的RetailPumsbConnectMushroomeCommerce数据集上分别设定了5不同最小效用阈值进行了算法性能对比实验。实验结果显示HUIwTMS-Miner算法除在极稀疏数据集Retail外的稀疏数据集eCommerce和稠密数据集PumsbConnectMushroom上的运行时间明显减少,而内存消耗有所增加;ULBwTMS-Miner算法在稠密数据集PumsbConnectMushroom的运行时间明显减少,在除Retail外的其他数据集上的内存消耗有所增加

2)设计ULB-Miner算法的并行算法PULB-Miner在单机多核处理器环境下,利用Thread Pool框架实现PULB-Miner算法。分别RetailPumsbConnectMushroomeCommerce数据集的不同最小效用阈值、不同线程进行实验,对比分析ULB-Miner算法PULB-Miner算法的性能。实验结果显示PULB-Miner算法在五个数据集上均表现良好表明该算法具有可行性和有效性。

3)设计ULBwTMS-Miner算法的并行算法PULBwTMS-Miner在单机多核处理器环境下,利用Thread Pool框架实现PULBwTMS-Miner算法。分别RetailPumsbConnectMushroomeCommerce数据集的不同最小效用阈值、不同线程进行实验,对比分析ULBwTMS-Miner算法PULBwTMS-Miner算法的性能,同时还对比分析PULB-Miner算法PULBwTMS-Miner算法的性能。实验结果显示PULBwTMS-Miner算法在运行时间上明显低于ULBwTMS-Miner算法内存消耗略有增加,在稠密数据集PumsbConnectMushroom上的运行时间明显低于PULB-Miner算法内存消耗也减少,表明该算法具有可行性和有效性。

英文摘要

  High utility itemsets mining considers factors such as the importance, profit, and user preferences of items, assigns different weights to items to calculate the utility of itemsets, and better meets the needs of practical applications. The field of high utility itemsets mining is one of the current research hotspots in data mining. At present, the explosive growth of data poses new challenges to the research of high utility itemsets mining algorithms.

  Through literature analysis, it was found that a class of high utility itemsets mining algorithms using utility-list data structures can further reduce the number of connection operations during the mining process. This article improves the classic algorithms HUI-Miner and ULB-Miner using utility-list in connection operations, and proposes improved algorithms HUIwTMS-Miner and ULBwTMS-Miner, respectively, and conducts experimental verification on them. Due to the fact that single machine multi-core processors are one of the foundations for implementing complex parallel computing systems, research in single machine multi-core processor environments can reveal the principles of parallel computing from a microscopic perspective. This article designs and implements parallel algorithms PULB-Miner and PULBwTMS-Miner for ULB-Miner and ULBwTMS-Miner algorithms using the Thread Pool framework in a single machine multi-core processor environment. The main research work is as follows:

(1)By combining projection database technology and transaction merging strategy, improvements are made to the classic algorithms HUI-Miner and ULB-Miner, respectively. Sort transactions in  T  order, use residual utility pruning to perform secondary pruning on the 1-itemset, reduce the number of generated projection databases, and merge the same transactions. This study set 5 different minimum utility thresholds on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets in the SPMF public resource library and conducted algorithm performance comparison experiments. The experimental results show that the HUIwTMS-Miner algorithm significantly reduces running time on sparse datasets eCommerce and dense datasets Pumsb, Connect, and Mushroom, except for the extremely sparse dataset Retail, while increasing memory consumption; The ULBwTMS-Miner algorithm significantly reduces runtime on dense datasets such as Pumsb, Connect, and Mushroom, while increasing memory consumption on datasets other than Retail.

(2)Designed a parallel algorithm called PULB-Miner for the ULB-Miner algorithm. Implement the PULB-Miner algorithm using the Thread Pool framework in a single machine multi-core processor environment. Conduct experiments on different minimum utility thresholds and threads on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets to compare and analyze the performance of ULB-Miner algorithm and PULB-Miner algorithm. The experimental results show that the PULB-Miner algorithm performs well on all five datasets, indicating its feasibility and effectiveness.

(3)Designed a parallel algorithm called PULBwTMS-Miner for the ULBwTMS-Miner algorithm. Implement the PULBwTMS-Miner algorithm using the Thread Pool framework in a single machine multi-core processor environment. We conducted experiments on different minimum utility thresholds and threads on the Retail, Pumsb, Connect, Mushroom, and eCommerce datasets to compare and analyze the performance of ULBwTMS-Miner algorithm and PULBwTMS-Miner algorithm. We also compared and analyzed the performance of PULB-Miner algorithm and PULBwTMS-Miner algorithm. The experimental results show that the PULBwTMS-Miner algorithm has significantly lower runtime than the ULBwTMS-Miner algorithm, with a slight increase in memory consumption. The runtime on dense datasets such as Pumsb, Connect, and Mushroom is significantly lower than the PULB-Miner algorithm, and memory consumption is also reduced, indicating the feasibility and effectiveness of this algorithm.

学位类型硕士
答辩日期2024-05-18
学位授予地点甘肃省兰州市
语种中文
论文总页数92
参考文献总数58
馆藏号0006295
保密级别公开
中图分类号C93/99
文献类型学位论文
条目标识符http://ir.lzufe.edu.cn/handle/39EH0E1M/36869
专题信息工程与人工智能学院
推荐引用方式
GB/T 7714
张晓蝶. 高效用项集挖掘算法改进研究[D]. 甘肃省兰州市. 兰州财经大学,2024.
条目包含的文件 下载所有文件
文件名称/大小 文献类型 版本类型 开放类型 使用许可
2021000010015.pdf(3072KB)学位论文 开放获取CC BY-NC-SA浏览 下载
个性服务
查看访问统计
谷歌学术
谷歌学术中相似的文章
[张晓蝶]的文章
百度学术
百度学术中相似的文章
[张晓蝶]的文章
必应学术
必应学术中相似的文章
[张晓蝶]的文章
相关权益政策
暂无数据
收藏/分享
文件名: 2021000010015.pdf
格式: Adobe PDF
所有评论 (0)
暂无评论
 

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。