TY  - CPAPER
KW  - Archiving datum
KW  - cultural heritage
KW  - Cultural heritage preservation
KW  - Cultural heritages
KW  - Daily lives
KW  - Data cleaning
KW  - Data collecting
KW  - Digital storage
KW  - Heritage preservation
KW  - Historic preservation
KW  - Query processing
KW  - Social media
KW  - Social media networks
KW  - Social networking (online)
KW  - Web Scraping
KW  - Web scrapings
AU  - Shaimaa Rashid
AU  - Rawaa Qasha
AB  - During the last decades, various aspects of Nineveh's cultural heritage have been destroyed by wars or natural causes. Therefore, the need to preserve this valuable heritage has become crucial. With the increased use of the Internet, social media networks have become part of people's daily lives for publicly sharing information, including their feelings, opinions, and knowledge, as well as images, videos, audio, and even their locations. This paper aims to gather Nineveh's cultural heritage data from different social media sites and to prepare it to support the cultural heritage preservation process, covering both tangible and intangible heritage. Using social media data, the Python programming language, and web scraping, various data types can be fetched from heterogeneous sources such as Twitter and YouTube, based on several keywords and hashtags. Once the data is collected, several pre-processing operations are applied to clean, organize, and archive the resulting data in a NoSQL database and in Amazon Simple Storage Service (Amazon S3). The archived, cleaned information can later be used to query, browse, analyze, and visualize the target information.
C2  - Proc. Int. Conf. Comput. Sci. Softw. Eng., CSASE
DO  - 10.1109/CSASE51777.2022.9759782
N1  - Journal Abbreviation: Proc. Int. Conf. Comput. Sci. Softw. Eng., CSASE
PB  - Institute of Electrical and Electronics Engineers Inc.
SN  - 9781665426329 (ISBN)
SP  - 295
EP  - 300
TI  - Extracting and Archiving Data from Social Media to Support Cultural Heritage Preservation in Nineveh
UR  - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85129952423&doi=10.1109%2fCSASE51777.2022.9759782&partnerID=40&md5=878387f6d3a7d39f809819c9997e14ea
ER  - 