TY - JOUR
KW - Fusing model
KW - Image of the intangible cultural heritage in the Mekong Delta
KW - Images classification
KW - Intangible cultural heritages
KW - Logistic regression
KW - Logistics regressions
KW - Network models
KW - Transfer learning
KW - Viet Nam
KW - Visual feature
AU - Minh-Tan Tran
AU - The-Phi Pham
AU - Nguyen Thai-Nghe
AU - Thanh-Nghi Do
AB - Our study aims to classify images of Intangible Cultural Heritage (ICH) in the Mekong Delta, Vietnam. To this end, we built a dataset of images from 17 different ICH categories and manually annotated them. We first fine-tuned recent pre-trained network models, including VGG16, DenseNet, and Vision Transformer (ViT), to classify our dataset. We then trained Logistic Regression (LR) models, called fusing models, which fuse both the visual features extracted from the deep networks and the outputs of the deep networks to improve classification accuracy. Our comparative study of classification performance on the 17-category ICH image dataset shows that the fusing models improve classification accuracy over any single fine-tuned model. The first fusing model (LR on visual features extracted from VGG16, DenseNet, and ViT) achieves an accuracy of 66.76%. The second fusing model (LR on top of the outputs of VGG16, DenseNet, and ViT) gives an accuracy of 66.49%.
DO - 10.1007/978-981-97-9616-8_16
N1 - Type: Conference paper
N2 - Our study aims to classify images of Intangible Cultural Heritage (ICH) in the Mekong Delta, Vietnam. To this end, we built a dataset of images from 17 different ICH categories and manually annotated them. We first fine-tuned recent pre-trained network models, including VGG16, DenseNet, and Vision Transformer (ViT), to classify our dataset. We then trained Logistic Regression (LR) models, called fusing models, which fuse both the visual features extracted from the deep networks and the outputs of the deep networks to improve classification accuracy. Our comparative study of classification performance on the 17-category ICH image dataset shows that the fusing models improve classification accuracy over any single fine-tuned model. The first fusing model (LR on visual features extracted from VGG16, DenseNet, and ViT) achieves an accuracy of 66.76%. The second fusing model (LR on top of the outputs of VGG16, DenseNet, and ViT) gives an accuracy of 66.49%.
SP - 202
EP - 212
TI - Fusing Models for Classifying Intangible Cultural Heritage Images in the Mekong Delta
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209953679&doi=10.1007%2f978-981-97-9616-8_16&partnerID=40&md5=a19ebef6a298f352608ac0b6303ceacd
VL - 2191 CCIS
ER -