Data Science Texts

Discover what you don't know, and attack your weaknesses!

Basic Natural Language Processing

Natural language processing (NLP) is a broad field that includes everything from linguistics to machine translation. The term is also used in data science to hype a product's capabilities; often, "natural language processing" just means that regular expressions are used. You don't have to understand a lot of NLP theory to start using NLP techniques.
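To make the point about regular expressions concrete, here is a minimal sketch of regex-only "NLP": splitting text into word tokens and counting them with Python's standard library. The sample text, the pattern, and the function names (tokenize, word_counts) are illustrative assumptions, not taken from either book below.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
TEXT = "NLP is broad. NLP isn't magic: often it's just regular expressions!"

# One or more ASCII letters, optionally followed by an apostrophe and more
# letters, so contractions such as "isn't" stay together as a single token.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Return the lowercase word tokens found in text."""
    return [word.lower() for word in WORD_PATTERN.findall(text)]


def word_counts(text):
    """Count how often each token occurs in text."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    print(tokenize(TEXT))
    print(word_counts(TEXT).most_common(3))
```

This pattern only handles ASCII letters and simple contractions; real text (numbers, hyphenated words, accented characters) quickly needs a more careful tokenizer, which is where a library such as NLTK comes in.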

Recommended Books

  1. Natural Language Processing with Python

    Steven Bird, Ewan Klein, and Edward Loper

    Key Topics

    • Accessing Text Corpora
    • Analyzing Sentence Structure
    • Analyzing the Meaning of Sentences
    • Categorizing and Tagging Words
    • Chunking
    • Classifying Text
    • Extracting Information from Text
    • Feature-Based Grammars
    • First-Order Logic
    • Lexical Resources
    • Managing Linguistic Data
    • Natural Language Toolkit (NLTK)
    • Normalizing Text
    • Processing Raw Text
    • Propositional Logic
    • Regular Expressions
    • Segmentation
    • The Semantics of English Sentences
    • Tokenizing Text

    Description

    This is an effective introduction to analyzing text data, built around the Natural Language Toolkit (NLTK) Python library. You don't need to be a Python programmer to use it, because introductions to basic Python concepts are spread throughout the book. By the end, you'll be familiar with the basic natural language processing techniques for classifying text and extracting the meaning of sentences, and if you do the exercises you'll also be proficient with NLTK. You can impress your friends by downloading your social media conversations and analyzing them!

  2. Speech and Language Processing

    Daniel Jurafsky and James H. Martin

    Key Topics

    • Acoustic Phonetics
    • Articulatory Phonetics
    • Automata
    • Computational Discourse
    • Computational Phonology
    • Computational Semantics
    • Conversational Agents
    • English Morphology
    • First-Order Logic
    • Formal Grammars of English
    • Grammars
    • Hidden Markov Models
    • Information Extraction
    • Language and Complexity
    • Lexical Semantics
    • Machine Translation
    • Maximum Entropy Models
    • Morphological Parsing
    • N-Grams
    • Part-of-Speech Tagging
    • Phonetics
    • Pronunciation Variation
    • Prosodic Analysis
    • Question Answering
    • Regular Expressions
    • Representation of Meaning
    • Speech Recognition
    • Speech Synthesis
    • Statistical Parsing
    • Syntactic Parsing
    • Syntax
    • Text Normalization
    • Word and Sentence Tokenization
    • Words and Transducers

    Description

    This is a good introduction to natural language processing in general. It covers a wide range of NLP topics, but it has some deficiencies: it's a bit theory-heavy, and the algorithms are given in pseudocode. The field has advanced a lot since the second edition came out, so parts of the book are well behind the state of the art. Fortunately, a new edition is under development, and a draft of the new material is available online.