Relative N-Gram Signatures (2012)

Magdalena Jankowska
Dr. Evangelos Milios
Dr. Vlado Kešelj

Relative N-Gram Signatures is a visual text analytics application that visualizes similarities and differences between text documents at the level of character strings. It is based on the Common N-Gram (CNG) classifier, a text classification algorithm proposed by Keselj et al. that relies on the frequencies of the most common character n-grams (strings of characters of a given length) in the documents under consideration.
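To make the idea of a character n-gram profile concrete, here is a minimal Python sketch of a CNG-style classifier. It is an illustration under stated assumptions, not the code behind this application: the function names, the default n-gram length and profile size, and the toy texts are all hypothetical, and the dissimilarity formula (a sum of squared relative frequency differences over the union of two profiles) is our reading of the measure described by Keselj et al. and should be checked against the paper.

```python
from collections import Counter


def ngram_profile(text, n=3, size=5):
    """Return the `size` most common character n-grams of `text`,
    mapped to their normalized frequencies (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    # most_common() returns an empty list for very short texts,
    # so no division happens when total is 0.
    return {g: c / total for g, c in counts.most_common(size)}


def dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union
    of both profiles (assumed form of the CNG measure)."""
    score = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # The denominator is never zero: g occurs in at least one profile.
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def classify(document, class_texts, n=3, size=5):
    """Return the label whose profile is least dissimilar to the document's."""
    doc_profile = ngram_profile(document, n, size)
    return min(
        class_texts,
        key=lambda label: dissimilarity(
            doc_profile, ngram_profile(class_texts[label], n, size)
        ),
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    classes = {
        "en": "the dog and the cat sat on the log",
        "xx": "zzzz qqqq zzzz qqqq",
    }
    print(classify("the cat sat on the mat", classes))
```

In this sketch an n-gram that appears in only one of the two profiles contributes the maximum value of 4 to the dissimilarity, so documents whose most common n-grams do not overlap end up far apart. Realistic profiles would use far more than five n-grams.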

The system enables users to gain insight into the characteristics of a given document (in relation to a set of documents) as well as into the inner workings of the classifier. It also allows users to influence the classification based on the knowledge gained through visual inspection of the algorithm.

publications

Magdalena Jankowska, Vlado Keselj, Evangelos Milios, Relative N-Gram Signatures: Document Visualization at the Level of Character N-Grams. Proceedings of the IEEE Conference on Visual Analytics Science and Technology, VAST’12, October 2012. [Paper]

demo

A movie prepared for the IEEE Conference on Visual Analytics Science and Technology, VAST’12, October 2012.

Relative N-gram Signatures online, with pre-loaded data.

presentation

Overview presentation (June 2013)