Dr. Evangelos Milios
Dr. Vlado Kešelj
The authorship attribution problem is the problem of identifying, among a closed set of candidates, the author of a questioned text document, given samples of writing by the candidates. Traditionally, the problem has usually been studied in the setting where all documents (the questioned document and the samples of writing) are on the same topic. Recently, the research community has shown interest in so-called cross-topic authorship attribution, that is, the case where the topic of the samples of writing differs from the topic of the questioned document. We study a case that we call “authorship attribution with topically unbalanced train data”, in which some candidates have writing samples that are more topically similar to the questioned document than the writing samples of other candidates.