A Service Text Classification Method Based on Domain BERT Model

CLC number: TP391

    Abstract:

    The BERT model adapts poorly to specific domains and cannot handle imbalance in the number of training samples per category or imbalance in classification difficulty. To address these problems, a service text classification method based on the WBBI model is proposed. First, words extracted from the domain corpus with the TF-IDF algorithm are used to extend the BERT vocabulary, which improves the domain adaptability of the BERT model. Second, service text classification is performed by a BERT-BiLSTM model. Finally, to address the imbalance in category sizes and in classification difficulty in the dataset, a zoom loss function is proposed on the basis of the traditional focal loss function; it adjusts dynamically according to the imbalance characteristics of the samples. To verify the performance of the WBBI model, extensive comparative experiments are conducted on a real dataset obtained from the Internet. The results show that, compared with the general text classification models TextCNN, BiLSTM-attention, RCNN, and Transformer, the WBBI model improves the Macro-F1 value by 4.29%, 6.59%, 5.3%, and 43%, respectively. Compared with the BERT-based text classification models BERT-CNN and BERT-DPCNN, the WBBI model converges faster and achieves better classification results.
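    The abstract does not state the formula of the zoom loss, so it is not reproduced here. As background only, the standard focal loss that it builds on is commonly written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The following is a minimal sketch of that baseline in plain Python, not the paper's method; the function name `focal_loss`, the probability values, and the default `gamma` are illustrative assumptions.

```python
import math


def focal_loss(probs, target, alpha=None, gamma=2.0):
    """Standard focal loss for a single sample (NOT the paper's zoom loss).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    alpha  : optional per-class weights for class-count imbalance
             (hypothetical values, not taken from the paper)
    gamma  : focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    # Clamp to avoid log(0) when the model assigns zero probability.
    p_t = max(probs[target], 1e-12)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Illustrative probabilities for an easy and a hard sample of class 0.
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.2, 0.5, 0.3], target=0)
# With gamma = 0 and no alpha the loss is plain cross-entropy.
ce_easy = focal_loss([0.9, 0.05, 0.05], target=0, gamma=0.0)

print(f"easy={easy:.6f} hard={hard:.6f} ce_easy={ce_easy:.6f}")
```

    In this sketch the easy sample contributes far less loss than the hard one, which is how focal loss handles difficulty imbalance, while `alpha` handles category-size imbalance. According to the abstract, the zoom loss adjusts dynamically to the imbalance characteristics of the samples, which this fixed-parameter baseline does not do.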

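Cite this article

YAN Yunfei, SUN Peng, ZHANG Jieyong, MA Yutang, ZHAO Liang. A Service Text Classification Method Based on Domain BERT Model[J]. Journal of Air Force Engineering University, 2023, 24(1): 103-111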

History
  • Online publication date: 2023-03-01
  • Publication date: 2023-02-25