Challenge 1. Binarization of Palm Leaf Manuscript Images

1.1 Description and goals

Binarization remains a challenging and open problem, especially for palm leaf manuscript images. On visual inspection, several well-known binarization algorithms fail to produce a good binarized image for these manuscripts: the characters they extract are unrecognizable and noisy. A specific, adapted binarization technique is therefore required to separate the text (and graphics, if any) from the background of palm leaf manuscript images.

1.2 Construction of Ground Truth Binarized Image

To create the ground truth binarized images of the palm leaf manuscripts, we adopted a semi-automatic framework for the construction of ground truth binarized images [1]. For our manuscript collection, the skeletonization process is performed entirely by humans. The skeletons of the Balinese characters (and graphics, if any) in each palm leaf manuscript image were traced and drawn manually with the PixLabeler tool [2]. The skeletons are then processed automatically to create the estimated ground truth binarized image of the manuscript. Human subjectivity has a strong effect and produces variability in the ground truth binarized images. This phenomenon becomes much more visible when working with ancient manuscripts, which are hard to ground truth even for humans. Therefore, in our dataset, each manuscript was ground truthed by two different ground truthers, and for the binarization challenge in this competition, we will evaluate each binarized image against the two different ground truth images.

1.3 Datasets

For this challenge, the dataset is partitioned into training and test subsets.
For the training subset, we provide:

1.    50 original images
2.    50 ground truth binarized images Version 1 (from the 1st ground truther)
3.    50 ground truth binarized images Version 2 (from the 2nd ground truther)

For the testing subset, we provide:

1.    50 original images (different from the training subset)


1.4 Protocols

Participants submit the binarization results for all images in the testing subset. For example, if the file name of the original image is ABCD01.jpg, the binarized image should be named ABCD01_binarized.png (any other lossless image format may be used instead). Participants also submit a small, simple, and complete executable package of their method implementation (including any libraries it uses), with a clear user manual explaining how to run the binarization process on a given example manuscript image.
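The naming rule can be illustrated with a minimal Python sketch. The function name `binarized_name` and the rejection of JPEG extensions are assumptions made for illustration; the protocol itself only requires the `_binarized` suffix and a lossless format.

```python
from pathlib import Path

# Extensions of a lossy format, which the protocol rules out.
# (Hypothetical check: the rules only say "lossless image format".)
LOSSY_EXTENSIONS = {".jpg", ".jpeg"}


def binarized_name(original, ext=".png"):
    """Return the submission file name for one original image.

    Example from the protocol: ABCD01.jpg -> ABCD01_binarized.png
    """
    if ext.lower() in LOSSY_EXTENSIONS:
        raise ValueError(f"{ext} is lossy; use a lossless format such as .png")
    # Path.stem drops the directory and the final extension of the input name.
    return f"{Path(original).stem}_binarized{ext}"
```

For instance, `binarized_name("ABCD01.jpg")` yields `ABCD01_binarized.png`, matching the example above.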

1.5 Evaluation

Following our work in [1], the evaluation for this challenge uses three binarization metrics from the DIBCO 2009 contest [3]: F-Measure (FM), Peak SNR (PSNR), and Negative Rate Metric (NRM). For each metric, evaluated against each ground truth version, we rank the participants' scores and assign points from 1 (best result) to N (worst result), where N is the number of participants. The points each participant receives across the 3 metrics and the 2 ground truth versions are accumulated to decide the winner.
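As an illustration only, the sketch below computes the three metrics with their usual DIBCO 2009 definitions and accumulates ranking points. It is not the organizers' evaluation code. Several points are assumptions: images are boolean NumPy arrays with True marking text pixels, each participant's score per metric and ground truth version has already been aggregated over the test images, ties are broken arbitrarily by sort order, and the lowest accumulated total wins. All function and variable names are hypothetical.

```python
import math

import numpy as np

# FM and PSNR improve as they grow; NRM improves as it shrinks.
HIGHER_IS_BETTER = {"FM": True, "PSNR": True, "NRM": False}


def confusion_counts(pred, gt):
    """Count TP, FP, FN, TN, treating True as a text (foreground) pixel."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("images must have the same shape")
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    tn = int(np.sum(~pred & ~gt))
    return tp, fp, fn, tn


def f_measure(pred, gt):
    """FM = 2 * Recall * Precision / (Recall + Precision)."""
    tp, fp, fn, _ = confusion_counts(pred, gt)
    if tp == 0:
        return 0.0
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 * recall * precision / (recall + precision)


def psnr(pred, gt):
    """PSNR = 10 * log10(C^2 / MSE), with C = 1 for 0/1 images."""
    tp, fp, fn, tn = confusion_counts(pred, gt)
    mse = (fp + fn) / (tp + fp + fn + tn)
    return float("inf") if mse == 0 else 10 * math.log10(1.0 / mse)


def nrm(pred, gt):
    """NRM = (FN / (FN + TP) + FP / (FP + TN)) / 2."""
    tp, fp, fn, tn = confusion_counts(pred, gt)
    nr_fn = fn / (fn + tp) if (fn + tp) else 0.0
    nr_fp = fp / (fp + tn) if (fp + tn) else 0.0
    return (nr_fn + nr_fp) / 2


def accumulate_points(scores):
    """scores[participant][(metric, gt_version)] -> aggregated score.

    Returns the total points per participant (1 = best rank per key).
    """
    totals = {name: 0 for name in scores}
    keys = list(next(iter(scores.values())).keys())
    for key in keys:
        metric = key[0]
        ordered = sorted(
            scores,
            key=lambda name: scores[name][key],
            reverse=HIGHER_IS_BETTER[metric],
        )
        for point, name in enumerate(ordered, start=1):
            totals[name] += point
    return totals
```

With 3 metrics and 2 ground truth versions, each participant is ranked 6 times, so the accumulated total lies between 6 and 6N.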

References

[1]    M.W.A. Kesiman, S. Prum, I.M.G. Sunarya, J.-C. Burie, J.-M. Ogier, An Analysis of Ground Truth Binarized Image Variability of Palm Leaf Manuscripts, in: 5th Int. Conf. Image Process. Theory Tools Appl. IPTA 2015, Orleans, France, 2015: pp. 229–233.
[2]    E. Saund, J. Lin, P. Sarkar, PixLabeler: User Interface for Pixel-Level Labeling of Elements in Document Images, in: 10th Int. Conf. Doc. Anal. Recognit. ICDAR 2009, IEEE, 2009: pp. 646–650. doi:10.1109/ICDAR.2009.250.
[3]    B. Gatos, K. Ntirogiannis, I. Pratikakis, DIBCO 2009: document image binarization contest, Int. J. Doc. Anal. Recognit. IJDAR. 14 (2011) 35–44. doi:10.1007/s10032-010-0115-7.