Building a Multilingual Model for Relationships between Concepts using Wikipedia

Arash Koushkestani

Overview

In this work, we tested the hypothesis that multilingual classifiers can increase overall classification accuracy and per-class precision. The task was to classify text snippets into 8 classes according to the kind of relationship between concepts that each snippet describes. Relationships between concepts in Wikipedia were extracted from Wikidata, and the equivalents of each concept in other languages were then identified using the multilingual DBpedia dataset. Using the lists of anchor texts for all concepts in the different languages, we extracted snippets that mention pairs of concepts. As a result, a single relationship between two concepts can be expressed by many text snippets in more than one language. We trained a separate classifier for each language and then applied majority voting to aggregate their results. The results show improvements for some classes, while the results for a few classes remain unsatisfactory.
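The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The source does not say whether each snippet or each language casts one vote, so the hypothetical `majority_vote` function supports both. The class names `part_of` and `located_in` are hypothetical stand-ins for the eight relationship classes, and breaking ties by choosing the alphabetically first label is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional


def plurality(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None if there are no labels.

    Ties are broken by picking the alphabetically smallest label
    (an assumption made here for determinism).
    """
    counts = Counter(labels)
    if not counts:
        return None
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def majority_vote(
    predictions: Dict[str, List[str]], one_vote_per_language: bool = False
) -> Optional[str]:
    """Aggregate per-language predictions for a single concept pair.

    `predictions` maps a language code to the labels that language's
    classifier assigned to its snippets for this pair (hypothetical format).
    """
    if one_vote_per_language:
        # Each language first reduces its snippets to one label, then languages vote.
        ballots = [plurality(labels) for labels in predictions.values() if labels]
    else:
        # Every snippet in every language casts one vote.
        ballots = [label for labels in predictions.values() for label in labels]
    return plurality(ballots)


if __name__ == "__main__":
    # Hypothetical predictions for one concept pair.
    preds = {
        "en": ["part_of", "part_of", "part_of"],
        "de": ["located_in"],
        "fa": ["located_in"],
    }
    print(majority_vote(preds))                              # part_of
    print(majority_vote(preds, one_vote_per_language=True))  # located_in
```

With one vote per snippet, English dominates because it contributes three snippets; with one vote per language, the two languages that agree outvote English. Which behaviour is preferable likely depends on how unevenly snippets are distributed across languages.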
