Title: A Natural Language Processing Model for Automated Organization and Analysis of Intangible Cultural Heritage
Authors: Yan Zheng; Fuqing Li; Cui Li; Zheyuan Zhang; Rui Cao; Noman Sohail
Year: 2024
Volume: 36
ISSN: 1546-2234
Abstract: This paper investigates text similarity methods in natural language processing (NLP), improves upon the Word Mover's Distance (WMD), and develops the SWC-WMD distance, which forms the basis for a clustering method for long intangible cultural heritage (ICH) texts. Clustering experiments were conducted on a constructed ICH long-text dataset using the WMD, SWC-WMD, and TFIDF-WMD distances. The effects of the number of feature words and of the choice of distance on the clustering results were assessed using accuracy and F1 values. The results show that the SWC-WMD distance improves the accuracy and F1 values of ICH long-text clustering compared with the other two distances, demonstrating the effectiveness of the methods proposed in this paper.