Challenge 3

Challenge 3. Isolated Character Recognition of Balinese Script in Palm Leaf Manuscript Images

3.1 Description and goals

Isolated handwritten character recognition has been the subject of intensive research during the last decades. Balinese script is considered to be one of the complex scripts from Southeast Asia. The alphabet and numeral of Balinese script is composed of ±100 character classes including consonants, vowels, diacritics, and some other special compound characters. In the scope of this competition, we will focus on scripts written on palm leaf whose characters are even more elaborate in ancient written form. The main objective of this challenge is to be able to efficiently and accurately recognize the isolated characters of Balinese scripts. A dataset will be provided for feature extraction and data training. A set of image patches containing individual Balinese character or compound character from the original manuscript will be used as input, and a correct class of each character should be identified as a result.

3.2 Construction of Ground Truth Character-level annotated patch image

By using our collection of word-level annotated patch images, we applied binarization process to extract automatically all connected component found on the word patch images. The balinese philologists annotated manually the segment of connected component that represent a correct character in Balinese script. All the data that has been segmented and annotated at character level will serve as our isolated character dataset.

3.3 Datasets

For this challenge, the dataset is partitioned into training and test subsets.
For the training subset, we provide :

1. ± 10,000 character-level annotated patch images (from ±100 character classes)

For the testing subset, we provide :

1. ± 7,000 character-level annotated patch images as query test (from ±100 character classes)

Note: Number of samples for each classes is different. Some classes are oftenly used in our collection of palm leaf manuscripts, but some others are rarely used.

full size image : Challenge3.png

3.4 Protocols

Participants submit the results of isolated character recognition for all images in testing subset in one file text following this format:

filename image test;character class recognized

Example:

test1.jpg;A;
test2.jpg;Na;
test3.jpg;Ca;
......

3.5 Evaluation

The results are ranked according to the recognition rate, i.e., the percentage of correctly classified samples over the test samples : C/N, where C is the number of correctly recognized samples, and N is the total number of test samples.