Author
Keywords
Abstract

Thangka is a precious form of intangible cultural heritage closely tied to Tibetan Buddhism. However, Tibetan Buddhism has a complex system, and the names of its deities follow no fixed pattern, which makes them difficult to identify in Chinese texts. In this paper, we propose BERT-BiLSTM-CRF-a, a named entity recognition model that fuses multiple neural networks: the BERT pre-trained language model, a Bidirectional Long Short-Term Memory network (BiLSTM), and a Conditional Random Field (CRF). Specifically, the model uses BERT to enhance its dynamic representation ability. Then a weighting method drawn from the attention mechanism weights the forward and backward BiLSTM hidden-layer vectors before they are concatenated, further improving the use of context features. Finally, the CRF layer outputs the globally optimal annotation sequence. Experimental results on the test sets show that the recall of BERT-BiLSTM-CRF-a is 87.4%, which is 8.2% higher than that of the traditional named entity recognition model BiLSTM-CRF, and its F1 value is 4.8% higher. The proposed model can therefore be used effectively for named entity recognition in the thangka domain.
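
The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything here is an assumption made for illustration: the function name `weighted_concat`, the parameter vectors `w_fwd` and `w_bwd`, and the choice of a dot-product score followed by a softmax over the two directions. It shows only the general idea of scaling the forward and backward hidden vectors of one token before concatenating them, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_concat(h_fwd, h_bwd, w_fwd, w_bwd):
    """Weight one token's forward and backward hidden vectors, then concatenate.

    w_fwd and w_bwd are hypothetical learned scoring vectors (an assumption;
    the abstract does not specify how the weights are computed).
    """
    # One scalar score per direction, normalized so the two weights sum to 1.
    alpha_f, alpha_b = softmax([dot(w_fwd, h_fwd), dot(w_bwd, h_bwd)])
    # Scale each direction by its weight, then concatenate the two halves.
    return [alpha_f * x for x in h_fwd] + [alpha_b * x for x in h_bwd]
```

With zero scoring vectors both directions receive equal weight, so the result is simply the plain concatenation scaled by one half; non-zero scoring vectors shift weight toward whichever direction scores higher.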

Volume
83
Issue
1
Pages
161-174
Publisher: Politehnica University of Bucharest
ISSN
22863540 (ISSN)
URL
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85102971591&partnerID=40&md5=b35d8dbab6b317c09c00b21942ea8c6b