A. Description
This competition proposes a glyph spotting challenge on palm leaf manuscript images.
In typography, a glyph is an elemental symbol within an agreed set of symbols, intended to represent a readable character for the purposes of writing. As such, glyphs are considered to be unique marks that collectively add up to the spelling of a word, or otherwise contribute to a specific meaning of what is written, with that meaning dependent on cultural and social usage (1).
A glyph (pronounced GLIHF; from a Greek word meaning carving) is a graphic symbol that provides the appearance or form for a character. A glyph can be an alphabetic or numeric font or some other symbol that pictures an encoded character (2).
A glyph spotting system is one of the most sought-after systems yet to be developed for collections of palm leaf manuscript images. Given a single glyph patch image as the query, such a system lets a user find every occurrence of that glyph across all pages of a palm leaf manuscript collection (Fig. 5).
Figure 5. Glyph spotting scheme
The characteristics of palm leaf manuscripts make them a suitable challenge for testing and evaluating the robustness of image features and descriptors that have already been proposed for the glyph spotting task.
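The query-by-example idea described above can be illustrated with a minimal sketch. It is not a method proposed by the organizers: it slides the query patch over each page and keeps the windows whose mean absolute pixel difference stays under a threshold, which is far less robust than the image features and descriptors the challenge is meant to evaluate. The names `spot_glyph`, `spot_in_collection`, and `max_mean_diff`, and the nested-list image format, are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # rows of pixel values (hypothetical in-memory format)
Box = Tuple[int, int, int, int]  # (x, y, width, height) of a spotted area


def spot_glyph(page: Image, query: Image, max_mean_diff: float = 0.0) -> List[Box]:
    """Return the areas of `page` that match the `query` patch.

    A window matches when its mean absolute pixel difference to the query
    is at most `max_mean_diff` (0.0 means an exact pixel match).
    """
    q_h, q_w = len(query), len(query[0])
    p_h, p_w = len(page), len(page[0])
    boxes: List[Box] = []
    for y in range(p_h - q_h + 1):
        for x in range(p_w - q_w + 1):
            diff = sum(
                abs(page[y + dy][x + dx] - query[dy][dx])
                for dy in range(q_h)
                for dx in range(q_w)
            )
            if diff / (q_h * q_w) <= max_mean_diff:
                boxes.append((x, y, q_w, q_h))
    return boxes


def spot_in_collection(
    pages: Dict[str, Image], query: Image, max_mean_diff: float = 0.0
) -> Dict[str, List[Box]]:
    """Run the same query on every page, keyed by a hypothetical page id."""
    return {
        page_id: spot_glyph(page, query, max_mean_diff)
        for page_id, page in pages.items()
    }


if __name__ == "__main__":
    # Toy binary "page" containing the same 2x2 glyph twice.
    toy_page = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
    ]
    toy_query = [[1, 1], [1, 0]]
    print(spot_in_collection({"page-1": toy_page}, toy_query))
```

Raw pixel comparison breaks down quickly on real palm leaf images, where the same glyph varies between occurrences, so the comparison step would in practice be replaced by a more robust feature or descriptor.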
B. Goal / Objective
The main objective of this challenge is to spot isolated glyph characters of Balinese and Khmer scripts efficiently and accurately in a collection of palm leaf manuscripts. A dataset will be provided for feature extraction and training. For this challenge, a set of glyph image patches, each containing an individual Balinese glyph character from the original manuscript, will be used as input, and the list of spotting areas of each glyph character on the manuscript pages should be returned as the result.
C. Tracks
Each participant's method will be evaluated on three tracks, which differ in the dataset used for the training subset and the test subset.
- Track Bali: the training and test subsets use only the Balinese manuscript dataset.
- Track Khmer: the training and test subsets use only the Khmer manuscript dataset.
- Track Mixed: the training and test subsets use both datasets, that is, a mix of the Balinese and Khmer manuscript datasets.
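As a rough sketch of how the three tracks partition the data, the mapping below selects the samples that belong to a given track. The track keys, the `script` field, and the sample records are hypothetical and do not reflect the actual layout of the competition datasets.

```python
from typing import Dict, List, Tuple

# Scripts covered by each track, following the list above.
# The key and script names are illustrative, not official identifiers.
TRACK_SCRIPTS: Dict[str, Tuple[str, ...]] = {
    "bali": ("balinese",),
    "khmer": ("khmer",),
    "mixed": ("balinese", "khmer"),
}


def select_subset(samples: List[Dict[str, str]], track: str) -> List[Dict[str, str]]:
    """Keep only the samples whose script belongs to the given track."""
    scripts = TRACK_SCRIPTS[track]
    return [sample for sample in samples if sample["script"] in scripts]


if __name__ == "__main__":
    # Hypothetical sample records; real file names and fields will differ.
    samples = [
        {"image": "bali_0001.png", "script": "balinese"},
        {"image": "khmer_0001.png", "script": "khmer"},
        {"image": "bali_0002.png", "script": "balinese"},
    ]
    for track in TRACK_SCRIPTS:
        print(track, len(select_subset(samples, track)))
```

The same selection would be applied to both the training subset and the test subset of a track, since each track uses the same scripts for both.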
• In Balinese script, there is no space between words in a text line. Some characters are written on the upper baseline or under the baseline of the text line. Documents in different scripts and languages pose real challenges for glyph spotting methods, not only because the forms of the characters differ between scripts, but also because each script has its own writing style for joining or separating characters in a text line. Balinese script is considered to be one of the complex scripts of Southeast Asia. The Balinese alphabet is composed of approximately 133 glyph character classes, including consonants, vowels, diacritics, and some other special glyphs.
• In Khmer script, the alphabet also consists of numerous types of characters, some of which can be combined to form new shapes. This results in more than 100 different glyphs. Even though the Khmer writing style is not cursive, parts of some characters are very elongated and sometimes touch adjacent characters on the same or neighbouring lines. This makes it difficult to identify individual characters or glyphs.
Figure 6. Examples of character glyphs from Balinese script in palm leaf manuscript images
Figure 7. Examples of character glyphs from Khmer script in palm leaf manuscript images
(1) https://en.wikipedia.org/wiki/Glyph
(2) http://whatis.techtarget.com/definition/glyph