Text Processing
In these notebooks, we explore traditional techniques for processing and analyzing text: using Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction, building a search engine, and topic modeling. As with the introductory notebooks, these skills will be pivotal in later lessons and notebooks.
Count Vectorization and TF-IDF
Explore feature extraction with two popular algorithms: count vectorization and TF-IDF.
Creating a Search Engine for your own data using Whoosh
Use the Whoosh package to create a custom search engine for your own data.
Traditional Topic Modeling in SKLearn
Apply the same TF-IDF principles to topic modeling: reorganizing a large corpus into the subtopics it contains.