In many countries of South and Southeast Asia, palm leaves have been used as writing materials since the 5th century BCE, and possibly much earlier. Knowledge was initially passed down orally, but after Indian (Pallava) scripts spread throughout South Asia, people eventually began to write it down on dried, smoke-treated leaves of the Palmyra palm (Lontar). These palm leaves were used to record factual and mythical narratives, as well as knowledge of medicine, history, science, literature, and other fields. Some also contain drawings, in black and white or even in colour. With the spread of Indian culture to Southeast Asian countries such as Indonesia, Cambodia, Thailand, and the Philippines, these nations became home to collections of palm leaf documents.
Today, several programs to collect, preserve and digitize these documents are under way in Southeast Asia, but few have been completed, especially in Indonesia and Cambodia. Their main objective is to preserve cultural heritage, because the manuscripts are written on fragile materials: lontar (palm leaves) or paper bound into accordion-like books. Different stakeholders in diverse public and private institutions scan the manuscripts in an ad hoc manner, sometimes at low resolution or under inadequate lighting. Moreover, the files are not always stored safely and often lack backup or mirror copies. Besides preserving the data from the ravages of time (dirt, moisture, insects) and from accidents (fire, floods), these programs aim to make the data easier for researchers to access. In reality, however, most of the scanned images are not accessible to the public, and sometimes not even to researchers from other institutions in the same country.
However, remarkable efforts to provide open access to the data, together with precise bibliometric indexing, have been made in Cambodia (in collaboration with EFEO, Ecole Française d’Extrême-Orient, see http://www.khmermanuscripts.org/). This excellent example of open data and scientific accuracy is, unfortunately, still unique in Southeast Asia for scripts of Indian origin. Even there, only bibliometric indexing (manuscript origin, title, date, topic, etc.) is offered alongside the image files; the content itself is not indexed. As a result, full-text keyword search is still impossible. As far as we know, some work has been done to enhance the quality of palm leaf manuscript images, but no research has yet addressed the automatic analysis and indexing of the content of these ancient documents. Historians, philologists and others who wish to study these documents must read them one by one to find the information they need and, most importantly, must usually travel to the place where the documents are kept.
The main objective of our project is therefore to bring added value to digitized palm leaf manuscripts by developing tools to analyze and index the content of these ancient documents and to access it quickly and efficiently. Our research team, led by the University of La Rochelle in cooperation with its partner universities in Bandung (Indonesia, Java), Singaraja (Indonesia, Bali) and Phnom Penh (Cambodia), brings together researchers in ICT and in philology (the study of ancient literary texts). The team is thus not only international but also cross-disciplinary, joining the exact sciences and the human sciences.