Given a book, we address the problem of predicting the time frame (more specifically, the year) in which it was written or published, using the Google Books Ngram corpus. Such a prediction could be useful for authorship and plagiarism detection, identification of literary movements, and forensic document examination. We propose an unsupervised approach and compare it with four baseline measures on a dataset of 36 books written between 1551 and 1969. The proposed approach could be applied to other languages, provided that corpora similar to the Google Books Ngram corpus are available for those languages.
A. Islam, J. Mei, E. Milios, V. Kešelj, “When was Macbeth Written? Mapping Book to Time”, in Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), LNCS 9041, Springer, pp. 73-84, Cairo, Egypt, April, 2015. [available at Springer Link]