Authorship Verification (2013-2014)

Magdalena Jankowska
Dr. Evangelos Milios
Dr. Vlado Kešelj

Authorship verification is the task of deciding whether a given document was written by a certain person, given one or more documents known to be written by that person. It has applications in forensics, security, and literary research. We treat it as a one-class classification problem and use a proximity-based method similar in spirit to the k-center method for one-class classification. Documents are compared using the Common N-Gram (CNG) classifier dissimilarity (Kešelj, 2003). We also investigate ensembles of such classifiers based on character n-grams and word n-grams.
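As a rough illustration of the idea, the sketch below computes a CNG-style dissimilarity between character n-gram profiles and turns it into a proximity score for an unknown document. It is a minimal sketch under stated assumptions, not the implementation used in the project: the function names, the defaults `n=3` and `size=500`, and the particular proximity score (dissimilarity to the unknown document divided by the largest dissimilarity to another known document, averaged over the known documents) are all chosen here for illustration, and the sketch assumes at least two known documents.

```python
from collections import Counter


def profile(text, n=3, size=500):
    """Return the `size` most frequent character n-grams of `text`,
    mapped to their normalized frequencies (illustrative defaults)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        raise ValueError("text is shorter than the n-gram length")
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """CNG-style dissimilarity: sum over the union of both profiles of the
    squared frequency difference, normalized by the mean frequency."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


def proximity_score(known_texts, unknown_text, n=3, size=500):
    """Hypothetical proximity score: for each known document, divide its
    dissimilarity to the unknown document by its largest dissimilarity to
    any other known document, then average. Lower values suggest the
    unknown document is closer to the author's known writing."""
    if len(known_texts) < 2:
        raise ValueError("this sketch assumes at least two known documents")
    known = [profile(t, n, size) for t in known_texts]
    unknown = profile(unknown_text, n, size)
    ratios = []
    for i, p in enumerate(known):
        d_max = max(cng_dissimilarity(p, q)
                    for j, q in enumerate(known) if j != i)
        if d_max == 0:
            raise ValueError("known documents have identical profiles")
        ratios.append(cng_dissimilarity(p, unknown) / d_max)
    return sum(ratios) / len(ratios)
```

A verifier built on this sketch would accept the unknown document as written by the author when the score falls below a threshold tuned on training data. An ensemble could combine the decisions of several such classifiers that differ in n-gram type, `n`, and `size`.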

We also worked on profiling authors of online conversations by detecting their gender and age.

publications

Magdalena Jankowska, Evangelos Milios, Vlado Kešelj, “Author Verification Using Common N-Gram Profiles of Text Documents”. In Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014: 387–397, August 2014. [Paper]

Magdalena Jankowska, Vlado Kešelj, Evangelos Milios, “Ensembles of Proximity-Based One-Class Classifiers for Author Verification – Notebook for PAN at CLEF 2014”. In CLEF 2014 Evaluation Labs and Workshop – Working Notes Papers, September 2014. [Paper]

Magdalena Jankowska, Vlado Kešelj, Evangelos Milios, “Proximity Based One-class Classification with Common N-Gram Dissimilarity for Authorship Verification Task – Notebook for PAN at CLEF 2013”. In Pamela Forner, Roberto Navigli, and Dan Tufis, editors. “CLEF 2013 Evaluation Labs and Workshop – Working Notes Papers”, September 2013. [Paper]

Magdalena Jankowska, Vlado Kešelj, Evangelos Milios, “CNG Text Classification for Authorship Profiling Task – Notebook for PAN at CLEF 2013”. In Pamela Forner, Roberto Navigli, and Dan Tufis, editors. “CLEF 2013 Evaluation Labs and Workshop – Working Notes Papers”, September 2013. [Paper]

presentations

Overview presentation of the project, September 2014: pdf format, pptx format

Poster presentation at 25th International Conference on Computational Linguistics, COLING 2014, Dublin, Ireland, August 2014

Presentation: participation in PAN 2013 Author Identification competition

 

On statistical significance of classification results evaluated on small test datasets