TY - JOUR
KW - Fusing model
KW - Image of the intangible cultural heritage in the Mekong Delta
KW - Images classification
KW - Intangible cultural heritages
KW - Logistic regression
KW - Logistics regressions
KW - Network models
KW - Transfer learning
KW - Viet Nam
KW - Visual feature
AU - Minh-Tan Tran
AU - The-Phi Pham
AU - Nguyen Thai-Nghe
AU - Thanh-Nghi Do
AB - Our study aims to classify images of Intangible Cultural Heritage (ICH) in the Mekong Delta, Vietnam. To this end, we built a dataset of images from 17 different ICH categories and manually annotated them. We first fine-tuned recent pre-trained network models, including VGG16, DenseNet, and Vision Transformer (ViT), to classify our dataset. We then trained Logistic Regression (LR) models, called fusing models, which fuse both the visual features extracted from the deep networks and the outputs of the deep networks to improve classification accuracy. Our comparative study of classification performance on the 17-category ICH image dataset shows that the fusing models improve classification accuracy over any single fine-tuned model. The first fusing model (LR on visual features extracted from VGG16, DenseNet, and ViT) achieves an accuracy of 66.76%. The second fusing model (LR on top of the outputs of VGG16, DenseNet, and ViT) gives an accuracy of 66.49%.
DO - 10.1007/978-981-97-9616-8_16
N1 - Type: Conference paper
N2 - Our study aims to classify images of Intangible Cultural Heritage (ICH) in the Mekong Delta, Vietnam. To this end, we built a dataset of images from 17 different ICH categories and manually annotated them. We first fine-tuned recent pre-trained network models, including VGG16, DenseNet, and Vision Transformer (ViT), to classify our dataset. We then trained Logistic Regression (LR) models, called fusing models, which fuse both the visual features extracted from the deep networks and the outputs of the deep networks to improve classification accuracy. Our comparative study of classification performance on the 17-category ICH image dataset shows that the fusing models improve classification accuracy over any single fine-tuned model. The first fusing model (LR on visual features extracted from VGG16, DenseNet, and ViT) achieves an accuracy of 66.76%. The second fusing model (LR on top of the outputs of VGG16, DenseNet, and ViT) gives an accuracy of 66.49%.
SP - 202
EP - 212
TI - Fusing Models for Classifying Intangible Cultural Heritage Images in the Mekong Delta
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209953679&doi=10.1007%2f978-981-97-9616-8_16&partnerID=40&md5=a19ebef6a298f352608ac0b6303ceacd
VL - 2191 CCIS
ER -