TY  - JOUR
KW  - Intangible cultural heritage
KW  - Natural language processing
KW  - Content-based search
KW  - Information extraction
AU  - Ivana Tanasijevic
AU  - Gordana Pavlovic-Lazetic
AB  - Purpose: The purpose of this paper is to provide a methodology for automatic annotation of a multimedia collection of intangible cultural heritage mostly in the form of interviews. Assigned annotations provide a way to search the collection. Design/methodology/approach: Annotation is based on automatic extraction of metadata and is conducted by named entity and topic extraction from textual descriptions with a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis. Findings: The proposed methodology for automatic annotation of a collection of intangible cultural heritage, applied on the cultural heritage of the Balkans, has very good results according to F measure, which is 0.87 for the named entity and 0.90 for topic annotation. The overall methodology enables encapsulating domain-specific and language-specific knowledge into collections of finite state transducers and allows further improvements. Originality/value: Although cultural heritage has a significant role in the development of identity of a group or an individual, it is one of those specific domains that have not yet been fully explored in case of many languages. A methodology is proposed that can be used for incorporating natural language processing techniques into digital libraries of cultural heritage.
BT  - Electronic Library
DA  - dec
DO  - 10.1108/EL-03-2020-0052
LA  - English
M1  - 5-6
N1  - Publisher: Emerald Group Holdings Ltd.
PY  - 2020
SP  - 905
EP  - 918
T2  - Electronic Library
TI  - HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans
UR  - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85094122200&doi=10.1108%2fEL-03-2020-0052&partnerID=40&md5=b0788ab06279bc541a8e8ddc77d86488
VL  - 38
SN  - 02640473 (ISSN)
ER  -