Journal of Air Force Engineering University (official website)

An Improved KNN Classification Algorithm under the Parallel MapReduce Model
CLC number: TN391

Funding: Key Project of the Natural Science Foundation, Shaanxi Province Science and Technology Plan (2012JZ8005)


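    Abstract (translated from the Chinese):

    The big data era has changed how data is processed, and handling big data problems with the Hadoop distributed programming framework is one of the current research hotspots in the field. To address classification in massive data mining, a dual-metric, center-indexed KNN classification algorithm is proposed. It targets large data sets in which class regions cross or overlap heavily. The algorithm first determines the center points of the training set, then computes the Euclidean distance between the samples to be classified and the training-set center points to determine the three most similar classes. Next, using cosine distance as the metric, it finds the K nearest neighbors through index selection. The KNN computation is parallelized with the MapReduce programming framework. Finally, the algorithm is compared and verified on data from the UCI database. The results show that the proposed parallel improved algorithm raises accuracy only slightly but improves computational efficiency greatly.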

    Abstract (English version):

    The big data era has transformed data processing, and handling big data with the Hadoop distributed framework has become one of the most active research topics. Cluster-based cloud computing compensates for the heavy computation and long running time of traditional non-distributed algorithms, while the huge volume of unstructured data makes the data harder to use. To address large-scale classification in data mining, this paper proposes a Bi-Measurement Central Index KNN classification algorithm, aimed mainly at data whose classes cross or overlap. The algorithm first finds the center points of the training data, then computes the Euclidean distance between each sample to be classified and the training centers to determine the three most similar categories. It then selects the k nearest neighbor points using the cosine distance metric, and the computation is carried out with MapReduce. Finally, the algorithm is compared and verified on the UCI database. The results show that, although the improvement in accuracy is not large, the efficiency of the algorithm is greatly improved.
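    The two-stage procedure described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation. It assumes that a class "center" is the per-class mean vector, that the final label is chosen by majority vote, and that all function names (`class_centers`, `classify`, and so on) are hypothetical. The index-selection step and the MapReduce parallelization are not modeled here.

```python
import math
from collections import Counter, defaultdict


def class_centers(train):
    """Mean vector of each class (assumed definition of a class center).

    `train` is a list of (vector, label) pairs.
    """
    groups = defaultdict(list)
    for vec, label in train:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def classify(query, train, centers, k=3, n_classes=3):
    """Two-stage KNN: filter classes by center distance, then vote."""
    # Stage 1: keep the n_classes classes whose centers are closest
    # to the query under Euclidean distance.
    nearest = sorted(centers, key=lambda c: euclidean(query, centers[c]))
    nearest = set(nearest[:n_classes])
    candidates = [(vec, label) for vec, label in train if label in nearest]
    # Stage 2: among the remaining samples, take the k nearest under
    # cosine distance and return the majority label (assumed vote rule).
    neighbours = sorted(candidates, key=lambda p: cosine_distance(query, p[0]))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

    Because stage 1 discards every class except the three closest, stage 2 computes cosine distances against only part of the training set, which is where a speed-up of the kind the abstract reports would plausibly come from. Each query is also classified independently of the others, so the work divides naturally across mappers.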

Cite this article

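Wei Zekun, Xia Jingbo, Fu Kai, Shen Jian, Chen Zhen. An Improved KNN Classification Algorithm under the Parallel MapReduce Model[J]. Journal of Air Force Engineering University, 2017, 18(1): 92-98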

History
  • Published online: 2017-03-14