From text to categories and beyond: You don't need "deep learning" to do it
Speaker: Daniel Hromada, UdK Berlin
In this lecture, we present a group of natural language processing (NLP) methods for multi-class text classification that belong to the "Random Projection" family of algorithms. More concretely, we show how word2vec-like vector spaces with very useful properties can easily be constructed from the initial dataset by following the so-called "Reflective Random Indexing" recipe. Subsequently, it will be demonstrated how the search for category prototypes partitioning such vector spaces can be optimized by means of genetic algorithms, yielding classifiers whose performance is comparable to that of more renowned methods. Finally, we shall show how such a hybrid approach, combining stochastically constructed vector spaces with genetic algorithms, could, when embellished with a formalization of a class as an N-dimensional sphere endowed with a centroid and a radius, provide a cognitively plausible way of dealing with the classical NLP problem known as "grammar induction".
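To give a flavour of the first ingredient, here is a minimal Python sketch of one common form of the Reflective Random Indexing recipe: documents receive sparse ternary random index vectors, term vectors are built by summing the vectors of the documents they occur in, and the projection is then "reflected" back onto documents. The function names, the parameter choices (dimensionality, number of non-zeros, iteration count), and the use of a plain document-term count matrix are illustrative assumptions, not the speaker's actual implementation.

```python
import numpy as np

def index_vectors(n_items, dim=1000, nnz=10, seed=0):
    """Sparse ternary random index vectors: mostly zeros, a few +1/-1 entries."""
    rng = np.random.default_rng(seed)
    V = np.zeros((n_items, dim))
    for i in range(n_items):
        pos = rng.choice(dim, size=nnz, replace=False)  # positions of non-zero entries
        V[i, pos] = rng.choice([-1.0, 1.0], size=nnz)   # random signs
    return V

def reflective_random_indexing(doc_term, dim=1000, iterations=2, seed=0):
    """One possible RRI loop (iterations >= 1): start from random document
    index vectors, then alternately project terms from documents and
    documents from terms, normalizing rows at each step."""
    n_docs, n_terms = doc_term.shape
    doc_vecs = index_vectors(n_docs, dim, seed=seed)
    for _ in range(iterations):
        term_vecs = doc_term.T @ doc_vecs   # a term is the sum of its documents
        term_vecs /= np.linalg.norm(term_vecs, axis=1, keepdims=True) + 1e-12
        doc_vecs = doc_term @ term_vecs     # a document is the sum of its terms
        doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-12
    return term_vecs, doc_vecs

# Usage on a toy 3-document, 4-term count matrix:
counts = np.array([[2, 1, 0, 0],
                   [0, 1, 3, 0],
                   [0, 0, 1, 2]], dtype=float)
terms, docs = reflective_random_indexing(counts, dim=100, iterations=2)
```

Note that, unlike word2vec, no gradient-based training is involved: the "semantic" space emerges purely from random projection and iterated summation, which is what makes the method so cheap.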
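The second ingredient, the sphere formalization of a class, can likewise be sketched in a few lines. The following fragment shows how a flat genome of centroid coordinates and radii could be decoded and scored, so that a standard genetic algorithm can evolve it; the genome encoding, the nearest-containing-sphere decision rule, and accuracy as the fitness measure are all assumptions made for illustration.

```python
import numpy as np

def sphere_predict(X, centroids, radii, default=-1):
    """Assign each row of X to the class of the nearest centroid whose sphere
    (centroid plus radius) actually contains it; otherwise return `default`."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    inside = dists <= radii[None, :]                 # which spheres contain each point
    pred = np.argmin(np.where(inside, dists, np.inf), axis=1)
    pred[~inside.any(axis=1)] = default              # outside every sphere: unclassified
    return pred

def fitness(genome, X, y, n_classes, dim):
    """Decode a flat genome into (centroids, radii) and score classification
    accuracy; usable as the objective function of a genetic algorithm."""
    centroids = genome[: n_classes * dim].reshape(n_classes, dim)
    radii = np.abs(genome[n_classes * dim :])        # radii kept non-negative
    return np.mean(sphere_predict(X, centroids, radii) == y)
```

A point worth noticing in this formalization: a vector can fall outside every sphere and remain unclassified, so membership is a matter of containment rather than of a forced nearest-neighbour choice, which is what makes the approach attractive for grammar induction.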