Completed
Sanitized

TF-IDF + logistic regression

Yelp Review Rating / Sentiment Modeling

Built a text-mining workflow to classify Yelp review sentiment and surface operational signals from customer language.

Python • TF-IDF • Logistic Regression • VADER • FLAIR

NLP Extractor Pipeline

1. Raw Corpus: Yelp JSON
2. TF-IDF Array: Vectorization
3. VADER / FLAIR: Polarity Signals
4. Logit Model: Binary Classifier
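
As a rough illustration of the first and last stages, the sketch below turns raw review records into binary labels. It assumes the corpus follows the layout of the public Yelp Open Dataset review file (one JSON object per line, with `stars` and `text` fields). The helper name `load_labeled_reviews` and the labeling rule (4 to 5 stars positive, 1 to 2 stars negative, 3 stars dropped) are assumptions made for this example, not details taken from the project.

```python
import json


def load_labeled_reviews(lines):
    """Parse JSON-lines review records into (text, label) pairs.

    Assumed labeling rule (hypothetical, not from the project):
    4-5 stars -> 1 (positive), 1-2 stars -> 0 (negative),
    3 stars dropped as ambiguous.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        stars = record["stars"]
        if stars >= 4:
            pairs.append((record["text"], 1))
        elif stars <= 2:
            pairs.append((record["text"], 0))
    return pairs


# Tiny in-memory stand-in for a Yelp review file (made-up records).
sample_lines = [
    '{"stars": 5.0, "text": "Friendly staff and great food."}',
    '{"stars": 1.0, "text": "Waited an hour and the order was wrong."}',
    '{"stars": 3.0, "text": "It was fine."}',
]
labeled = load_labeled_reviews(sample_lines)
print(labeled)
```

In practice the same function could be fed an open file handle instead of a list, since it only iterates over lines.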

Review Rating Classification

The Problem

Large volumes of free-text customer reviews never reached standard KPI reporting, so operations teams could not see recurring service bottlenecks.

The Methodology

Built an NLP pipeline in Python that combines text preprocessing, TF-IDF vectorization, pre-trained VADER and FLAIR sentiment analyzers, and a tuned logistic regression classifier.
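
The core of this methodology, TF-IDF features feeding a logistic regression classifier, can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions: the six training reviews are invented for illustration, the hyperparameters (`C=1.0`, default unigram TF-IDF) are placeholders rather than the tuned values from the project, and the VADER/FLAIR polarity features are left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reviews; 1 = positive, 0 = negative.
train_texts = [
    "great food and friendly staff",
    "amazing service, loved the atmosphere",
    "delicious meal and friendly service",
    "terrible wait and rude staff",
    "cold food, slow service, awful experience",
    "rude server and terrible food",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier chained so raw text goes in, labels come out.
# C is a placeholder; a real run would tune it, e.g. by cross-validation.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(C=1.0, max_iter=1000),
)
model.fit(train_texts, train_labels)

new_reviews = ["friendly, delicious, amazing", "rude, slow, awful"]
predictions = model.predict(new_reviews).tolist()
print(predictions)
```

Wrapping both steps in one pipeline keeps the TF-IDF vocabulary fitted only on training text, which avoids leaking test vocabulary into the features.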

The Impact & Outcome

A sentiment engine that separates positive hospitality traits from clusters of recurring negative feedback.

Key Metric: High-accuracy text classification that turns negative and positive review paragraphs into isolated, actionable operational feedback signals.

Classification Pipeline Walkthrough

Below is the recorded presentation covering the classification model's accuracy, the tokenization workflow, and how the text-mining pipeline is applied in practice.

Download Deck (.pptx) | Download Research Report (.docx)