A simple guide for cleaning text
text preprocessing
tokenization
lemmatization
regex
nltk
Published

June 15, 2021

Cleaning text for NLP tasks

Some background

I started working in the field of Natural Language Processing in August 2020. I am no expert, but over the past few months of cleaning textual data from different sources I did manage to learn a few things, and I am here to share them. These tips and suggestions come from someone who had no prior experience in NLP at all. I hope whoever is reading this learns something from it. With that said, let’s get started!

Reading txt files

There are a few simple parameters that people often overlook when reading txt files.
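As a rough sketch of what this can look like, assuming the files are read with Python's built-in `open`: the `encoding` and `errors` parameters are two that are easy to leave at their defaults. The helper name `read_text` and the file name `sample.txt` are made up for illustration, and the sample content is invented just to show the behaviour.

```python
import os
import tempfile


def read_text(path, encoding="utf-8", errors="strict"):
    """Read a whole text file, decoding it with an explicit encoding.

    errors="strict" raises UnicodeDecodeError on undecodable bytes,
    errors="ignore" drops them, and errors="replace" substitutes U+FFFD.
    """
    with open(path, mode="r", encoding=encoding, errors=errors) as f:
        return f.read()


# Demo: write a small file containing one byte that is invalid in UTF-8,
# then read it back with two different error-handling settings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write("caf\u00e9 ".encode("utf-8") + b"\xff" + b" text\n")

    cleaned = read_text(path, errors="ignore")   # bad byte is dropped
    marked = read_text(path, errors="replace")   # bad byte becomes U+FFFD
```

Passing `encoding` explicitly avoids depending on the platform default, which can differ between machines. Whether `ignore` or `replace` is the better choice depends on whether you want to see where the bad bytes were.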