100 1_ $a T Fan
700 1_ $a H Wang
700 1_ $a SH Deng
245 00 $a Intangible cultural heritage image classification with multimodal attention and hierarchical fusion
490 0_ $v 231
520 3_ $a Designing an efficient Intangible Cultural Heritage (ICH) image classification model helps the public recognize ICH and fosters its preservation and spread. Current research on ICH image classification mainly focuses on the visual features of ICH images, ignoring the attached textual descriptions. However, these textual descriptions can provide crucial clues for ICH image classification. Therefore, in this study, we propose to combine the attached textual descriptions with the images and perform ICH image classification in a multimodal way. Additionally, to capture intra- and inter-modal interactions between ICH images and their attached textual descriptions, we propose a novel model named MICMLF, consisting mainly of multimodal attention and hierarchical fusion. Multimodal attention makes the model focus on "important regions" in the ICH images and "important words" in the attached textual descriptions, respectively. Hierarchical fusion captures dynamic inter-modal interactions. Extensive experiments are conducted on datasets of two Chinese national-level ICH projects, New Year Print (年画) and Clay Figurine (泥人). Experimental results demonstrate the superiority of MICMLF compared with several state-of-the-art methods. The proposed model can also handle situations where the ICH images or textual descriptions are incomplete.
022 __ $a 0957-4174
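
For readers who want a concrete picture of the two components named in the abstract, the sketch below is a minimal, non-authoritative PyTorch illustration of text-guided attention over image regions, image-guided attention over words, and a simple two-stage ("hierarchical") fusion. It is not the authors' MICMLF implementation; every name in it (MultimodalAttentionSketch, HierarchicalFusionSketch, region_feats, word_feats, the feature dimension and class count) is an illustrative assumption.

# Minimal sketch (not the authors' code): attention over image regions and words,
# followed by a two-stage ("hierarchical") fusion of the two modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalAttentionSketch(nn.Module):
    """Attend to 'important regions' guided by text and 'important words' guided by the image."""

    def __init__(self, dim: int):
        super().__init__()
        self.img_score = nn.Linear(2 * dim, 1)  # scores each image region against a text summary
        self.txt_score = nn.Linear(2 * dim, 1)  # scores each word against an image summary

    def forward(self, region_feats: torch.Tensor, word_feats: torch.Tensor):
        # region_feats: (B, R, D) image-region features; word_feats: (B, W, D) word features
        txt_ctx = word_feats.mean(dim=1, keepdim=True)    # (B, 1, D) coarse text summary
        img_ctx = region_feats.mean(dim=1, keepdim=True)  # (B, 1, D) coarse image summary

        # Attention over regions, conditioned on the text summary
        a_img = F.softmax(self.img_score(
            torch.cat([region_feats, txt_ctx.expand_as(region_feats)], dim=-1)), dim=1)  # (B, R, 1)
        img_vec = (a_img * region_feats).sum(dim=1)       # (B, D) attended image vector

        # Attention over words, conditioned on the image summary
        a_txt = F.softmax(self.txt_score(
            torch.cat([word_feats, img_ctx.expand_as(word_feats)], dim=-1)), dim=1)      # (B, W, 1)
        txt_vec = (a_txt * word_feats).sum(dim=1)         # (B, D) attended text vector
        return img_vec, txt_vec


class HierarchicalFusionSketch(nn.Module):
    """Fuse modalities in two stages: element-wise interaction first, then a joint projection."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.inter = nn.Linear(dim, dim)                  # low-level cross-modal interaction term
        self.classifier = nn.Linear(3 * dim, num_classes) # high-level joint representation

    def forward(self, img_vec: torch.Tensor, txt_vec: torch.Tensor):
        low = torch.tanh(self.inter(img_vec * txt_vec))   # element-wise cross-modal dynamics
        joint = torch.cat([img_vec, txt_vec, low], dim=-1)  # unimodal vectors + interaction term
        return self.classifier(joint)


if __name__ == "__main__":
    B, R, W, D, C = 2, 36, 20, 256, 10                    # batch, regions, words, dim, classes (assumed)
    attn = MultimodalAttentionSketch(D)
    fusion = HierarchicalFusionSketch(D, C)
    img_vec, txt_vec = attn(torch.randn(B, R, D), torch.randn(B, W, D))
    logits = fusion(img_vec, txt_vec)
    print(logits.shape)                                   # torch.Size([2, 10])

The element-wise product in the fusion stage merely stands in for a low-level cross-modal interaction; the paper may use a different attention formulation and fusion operator.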