Electronic Science and Technology ›› 2023, Vol. 36 ›› Issue (11): 35-40. doi: 10.16180/j.cnki.issn1007-7820.2023.11.006


Named Entity Recognition of Automobile Production Equipment Fault Domain Based on BERT

NI Ji (倪骥), WANG Yujia (王宇嘉), ZHAO Bo (赵博)

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2022-04-06 Online: 2023-11-15 Published: 2023-11-20
  • About the authors: NI Ji (1998-), male, master's student. Research interest: natural language processing. | WANG Yujia (1979-), female, Ph.D., associate professor. Research interest: evolutionary computation. | ZHAO Bo (1998-), male, master's student. Research interest: knowledge representation.
  • Supported by:
    Scientific and Technological Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109300)

Abstract:

In the automobile production equipment fault domain, Chinese named entity recognition must handle complex entity categories, and traditional word vectors cannot resolve polysemy. To address these problems, this study proposes a named entity recognition model for the automobile production equipment fault domain based on BERT (Bidirectional Encoder Representations from Transformers). First, a pre-trained BERT model extracts semantic information and syntactic features to generate dynamic word vectors. Then, the word vectors are fed into a bidirectional long short-term memory (BiLSTM) network for bidirectional encoding, which captures semantic features over long sequences. Finally, a conditional random field (CRF) decodes the sequence, learning the dependencies between labels to obtain the optimal label sequence. In experiments on a self-built dataset of real automobile production equipment faults, the proposed method achieves a precision, recall, and F1 score of 87.9%, 89.6%, and 88.7%, respectively (the first figure is printed as "accuracy" in the original, but the reported F1 is the harmonic mean of the first two values, so it is read here as precision).
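
The final decoding step described in the abstract can be illustrated with a minimal sketch of Viterbi decoding, the usual way to extract the highest-scoring label sequence from a linear-chain CRF. Everything specific here is an assumption made for illustration and is not taken from the paper: the tag set (`O`, `B-EQU`, `I-EQU`), the hand-written scores, and the function name `viterbi_decode`. In the model the abstract describes, the per-token scores would come from the BiLSTM output and the transition scores would be learned during training.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain score.

    emissions[t][y]   : score of label y at position t (hypothetical values here)
    transitions[a][b] : score of moving from label a to label b
    """
    if not emissions:
        return []

    # best[y] is the best score of any path that ends in label y so far.
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []

    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the previous label that maximises the path score into y.
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label to recover the path.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative (assumed) tag set: O = outside, B-/I-EQU = equipment entity.
labels = ["O", "B-EQU", "I-EQU"]

# All transitions score 0 except O -> I-EQU, which is heavily penalised
# because an entity cannot continue without having begun.
transitions = {a: {b: 0.0 for b in labels} for a in labels}
transitions["O"]["I-EQU"] = -10.0

# Hand-written per-token scores for a three-token sentence.
emissions = [
    {"O": 2.0, "B-EQU": 1.5, "I-EQU": 0.0},
    {"O": 0.0, "B-EQU": 0.0, "I-EQU": 2.0},
    {"O": 2.0, "B-EQU": 0.0, "I-EQU": 0.0},
]

print(viterbi_decode(emissions, transitions, labels))
```

With these toy scores, choosing the best label independently at each token would give `O, I-EQU, O`, which contains the penalised `O -> I-EQU` transition. Decoding over the whole sequence instead selects `B-EQU, I-EQU, O`, which is the kind of label dependency the CRF layer is meant to capture.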

Key words: equipment fault, natural language processing, sequence labeling, named entity recognition, pre-trained model, LSTM, conditional random field, deep learning

CLC Number:

  • TP391