I. Introduction to Southeast Asian Palm Leaf Manuscripts
The choice of natural materials used as a medium for writing documents is strongly influenced by the geographical conditions and location of a nation. For example, because bamboo and palm trees are easily found in Asia, both materials were the first choice as writing materials on the Asian continent. In Southeast Asia, most ancient manuscripts are written on palm leaves. In Cambodia, for example, palm leaves have been used as a writing material since the first appearance of Buddhism in the country. In Thailand, dried palm leaves have also been one of the most popular writing materials for over five hundred years [1]. Palm leaves were also historically used as writing supports for manuscripts from the Indonesian archipelago. The ancient palm leaf manuscripts of Southeast Asia are very important in terms of both quantity and variety of historical content.
The technical challenges that palm leaf manuscripts pose for a Document Image Analysis (DIA) system can be seen from two points of view. The first challenge is the physical condition of the palm leaf manuscript, which strongly influences the quality of the captured document images. Due to the specific characteristics of the physical support of the manuscripts, the development of DIA methods that extract relevant information from palm leaf manuscripts is considered a new research problem in handwritten document analysis. Ancient palm leaf manuscripts contain artefacts due to aging, foxing, yellowing, marks of strain, local shading effects, low intensity variations or poor contrast, random noise, discoloured parts, fading, and other types of degradation. The second challenge is the complexity of the script. Southeast Asian manuscripts, with their different scripts and languages, pose real challenges for document analysis methods, not only because the character forms differ from script to script, but also because each script has its own writing style for joining or separating characters in a text line. Existing DIA research on palm leaf manuscripts ranges widely, from binarization [2–4] and text line segmentation [5,6] to character and text recognition [4,7,8] and word spotting [9].
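To make the first challenge concrete, the following minimal sketch applies global Otsu thresholding, a widely used binarization baseline, to a grayscale image. It is offered only as an illustration and is not the method of this competition or of the cited works [2–4]. The function names `otsu_threshold` and `binarize`, the list-of-rows image representation, and the assumption that ink is darker than the leaf surface are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level t in 0..255 that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the dark class
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale image (rows of 0..255 values) to 1 = ink, 0 = background.

    Assumption for this sketch: ink is darker than the palm leaf surface.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]
```

Because a single global threshold assumes roughly uniform illumination and contrast, degradations such as local shading and discoloured parts would be expected to break this baseline, which is one way to see why binarization of palm leaf manuscripts is treated as an open problem.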
II. Description and Goals of the Competition
This competition is part of an effort to explore DIA research on palm leaf manuscript collections as heritage documents from Southeast Asia. These collections offer a new challenge for DIA researchers because they use palm leaf as the writing medium and are written in languages and scripts that have never been analyzed before. In this competition, the principal tasks of a DIA system are proposed for this new and challenging document collection of palm leaf manuscripts from Southeast Asia: binarization, text line segmentation, isolated character/glyph recognition, and word recognition and transliteration. The results of this competition will be very useful for benchmarking on palm leaf manuscript collections, and for accelerating, evaluating and improving the performance of existing DIA systems on a new type of document collection.
III. Source of the Collections of Southeast Asian Palm Leaf Manuscripts
Three corpora of palm leaf manuscripts, written in three different scripts and languages, were collected from various locations in Indonesia (Balinese and Sundanese) and Cambodia (Khmer).
A. Balinese Palm Leaf Manuscripts
In order to obtain a large variety of manuscript images, the sample images of Balinese palm leaf manuscripts (Fig. 1) have been collected from 23 different collections (contents), which come from 5 different locations (regions): 2 museums and 3 private families. They consist of 10 randomly selected collections from Museum Gedong Kertya, City of Singaraja, Regency of Buleleng, North Bali, Indonesia; 4 collections from the manuscript collections of Museum Bali, City of Denpasar, South Bali; 7 collections from a private family collection in the Village of Jagaraga, Regency of Buleleng; and 2 other collections from private family collections in the Village of Susut, Regency of Bangli, and the Village of Rendang, Regency of Karangasem [10].
Figure 1. Balinese palm leaf manuscripts
B. Khmer Palm Leaf Manuscripts
In Cambodia, Khmer palm leaf manuscripts (Fig. 2) are still seen in Buddhist establishments and are traditionally and habitually used by monks as reading scriptures. Various libraries and institutions have been collecting and digitizing these manuscripts and have even shared the digital images with the public. For instance, the École Française d’Extrême-Orient (EFEO) has launched an online database of microfilm images of hundreds of Khmer palm leaf manuscript collections. Further digitized collections were obtained from the Buddhist Institute, which is one of the biggest institutes in Cambodia responsible for research on Cambodian literature and language related to Buddhism, and from the National Library (situated in the capital city, Phnom Penh), which is home to a large collection of palm leaf manuscripts. Moreover, a standard digitization campaign was conducted in order to capture and collect palm leaf manuscript images found in Buddhist temples in different locations throughout Cambodia: Phnom Penh, Kandal, and Siem Reap [11].
Figure 2. Khmer palm leaf manuscript
C. Sundanese Palm Leaf Manuscripts
The collection of Sundanese palm leaf manuscripts (Fig. 3) comes from Situs Kabuyutan Ciburuy, Garut, West Java, Indonesia. The Kabuyutan Ciburuy is a cultural heritage complex from Prabu Siliwangi and Prabu Kian Santang, respectively the king of the Padjadjaran kingdom and his son. The cultural complex consists of six buildings. One of them is Bale Padaleuman, which is used to store the Sundanese palm leaf manuscripts. The oldest Sundanese palm leaf manuscript in Situs Kabuyutan Ciburuy dates from the 15th century. In Bale Padaleuman, there are 27 collections of Sundanese manuscripts. Each collection contains 15 to 30 pages, each measuring 25-45 cm in length x 10-15 cm in width [12].
Figure 3. Sundanese palm leaf manuscript
In this competition, four DIA tasks are proposed as challenges:
- Challenge A: Binarization for Southeast Asian Palm Leaf Manuscripts
- Challenge B: Text Line Segmentation for Southeast Asian Palm Leaf Manuscripts
- Challenge C: Isolated Character/Glyph Recognition for Southeast Asian Palm Leaf Manuscripts
- Challenge D: Word Transliteration for Southeast Asian Palm Leaf Manuscripts
All researchers are invited to participate in one or more of the challenges, each of which is described in its corresponding section.