Introduction to Natural Language Processing & Text Mining

Gain valuable knowledge and hands-on experience using the key concepts, tools and methodologies for natural language processing and text mining with Python.

About This Online Course

Grow your understanding of artificial intelligence, machine learning and data science fundamentals by taking this online training course from statistics.com.

You will be introduced, using Python, to the essential techniques of natural language processing (NLP), a component of AI, and text mining, an AI technology that uses NLP. You will learn how to apply unsupervised and supervised modeling techniques to text. Considerable attention will be devoted to the data preparation and handling methods required to transform unstructured text into a form in which it can be mined.

This course emphasizes hands-on learning through guided tutorials and real-world examples.

This online training course uses Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data, 1st edition (Apress, 2019), by Sarkar, D. (available on Amazon). Learners must purchase the book before starting the course.

The book was chosen for its wealth of hands-on Python illustrations and code (the code of the illustrations is organized here).

For a well-written guide to foundational concepts and context, you may wish to consider Fundamentals of Predictive Text Mining, 2010 edition (Springer, 2015), by Weiss, A.M., Indurkhya, N. and Zhang, T. (available on Amazon).

What You Will Learn

  • Process text data and strings, and perform pattern matching with regular expressions in Python
  • Preprocess and wrangle noisy text data via stemming, lemmatization, tokenization, removal of stop words and more
  • Represent text data in structured and easy-to-consume formats for machine learning and text mining
  • Represent text documents using features related to text word frequency, parts of speech and sentiment
  • Represent text documents using vectorized features like bag-of-words, TF-IDF and document similarity
  • Use the concepts of information retrieval and document similarity (e.g., in applications like recommender systems)
  • Perform unsupervised NLP using techniques like key phrase extraction, topic modeling and text summarization
  • Leverage pre-trained models for part-of-speech tagging and named entity recognition
  • Develop supervised models to classify documents
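
As a rough illustration of several of the skills listed above (regular-expression tokenization, stop-word removal, bag-of-words counts, TF-IDF weighting and document similarity), here is a minimal sketch that uses only the Python standard library. It is not taken from the course or the textbook. The stop-word list, the sample sentences and the function names (tokenize, tfidf_vectors, cosine_similarity) are assumptions made for this example, and the weighting is the plain tf * log(N / df) form, which differs from the smoothed variants that some libraries use by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, extract alphabetic tokens with a regex, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted as tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words term counts for this document
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, chosen only to show the effect.
docs = [
    "The cat sat on the mat.",
    "A cat and a dog sat in the garden.",
    "Stock prices fell sharply on Monday.",
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # shared terms: cat, sat
print(cosine_similarity(vectors[0], vectors[2]))  # no shared terms after filtering
```

The first pair of documents shares the terms "cat" and "sat", so its similarity is positive, while the first and third share no terms once stop words are removed, so their similarity is zero. In practice, a library such as NLTK, spaCy or scikit-learn would typically replace these hand-written helpers.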

Your Instructors

Dipanjan (DJ) Sarkar has led advanced analytics initiatives working with several Fortune 500 companies, like Applied Materials and Intel, and open-source organizations, like Red Hat (now IBM). He primarily works on leveraging data science, machine learning and deep learning to build large-scale intelligent systems.

In 2020, Mr. Sarkar was recognized as one of the “Top Ten Data Scientists in India” by leading technology magazines and publishing houses and, in 2019, as a Google Developer Expert in machine learning. He holds a Master of Technology degree from IIIT Bangalore, India, with specializations in Data Science and Software Engineering, and a postgraduate diploma in Machine Learning and Artificial Intelligence from Columbia University, New York.

Mr. Sarkar is a published author and has written books on R, Python, machine learning, NLP and deep learning.

Dr. Anurag Bhardwaj is senior manager, data scientist at Apple. He has been a statistics.com instructor since 2014.

Dr. Bhardwaj has published in numerous prestigious industry journals and at premier industry conferences and workshops, and has won numerous professional awards. He received both his Doctorate and Master of Science in Computer Science and Engineering from the State University of New York at Buffalo and his Bachelor of Technology in Computer Engineering from the National Institute of Technology, India.

Who Should Take This Course

This course is designed for data scientists and aspiring data scientists who want to analyze text data and build AI and machine learning models that use text data.

Prerequisites

None.

However, it is assumed that you are sufficiently familiar with Python and have an understanding of AI and machine learning equivalent to the topics covered in the statistics.com course Predictive Analytics 1 - Machine Learning Tools.

Course Certificate

A record of completion will be issued, along with professional development credits in the form of continuing education units upon 50-percent completion.

In addition, a Credly badge to add to your LinkedIn profile will be issued upon 80-percent completion of this online training course.

Course Format

This self-paced, online training course takes place at The Institute for Statistics Education at statistics.com for four weeks. During each session week, you can participate at times of your own choosing—there are no set times for the lessons. Participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Course Pricing

$599 (per person)

Register through FedLearn using the special promo code FedLearn22 and receive a five-percent discount on the original online course price.

Continuing Education Unit Credits

This online course provides 5.0 CEUs upon 50-percent completion.

This course is also recommended for 3.0 upper division college credits by the American Council on Education upon 80-percent completion.