Tony Wright

Predicting a song’s genre from its lyrics — a Georgetown Data Science capstone, hand-built before LLMs.

Classifying Musical Genre from Lyrics — Georgetown Data Science Capstone

June 2021 | Author / data scientist, built the full pipeline by hand in Python and SQL

A Georgetown Data Science capstone that used machine learning on song lyrics to predict a song's genre, the foundational attribute for matching a song to an audience. Lyrics break standard NLP assumptions: repeated choruses skew n-gram counts, and words are often chosen for sound over meaning. The project built domain-specific stop-word lists and sentiment variants tuned for lyrics, then iterated through data munging, feature engineering, and feature selection. Hand-coded in 2021, before LLMs made this kind of work far easier.
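To make the chorus and stop-word problems concrete, here is a minimal sketch, not the capstone's actual code. It assumes one simple approach: collapse repeated lines so a chorus counts once, drop lyric filler words, then count bigrams. The stop-word list, the function names, and the sample lyrics are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical lyric-specific stop words (illustrative only; not the project's list).
LYRIC_STOP_WORDS = {"oh", "yeah", "la", "na", "ooh", "the", "a", "i", "you", "and"}


def dedupe_lines(lyrics: str) -> list[str]:
    """Keep the first occurrence of each line so a repeated chorus counts once."""
    seen = set()
    unique = []
    for line in lyrics.splitlines():
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def tokenize(line: str) -> list[str]:
    """Lowercase, split into word tokens, and drop lyric stop words."""
    words = re.findall(r"[a-z']+", line.lower())
    return [w for w in words if w not in LYRIC_STOP_WORDS]


def bigram_counts(lyrics: str) -> Counter:
    """Count bigrams per unique line; bigrams are formed after stop-word removal."""
    counts = Counter()
    for line in dedupe_lines(lyrics):
        tokens = tokenize(line)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up sample: the chorus line appears three times.
sample = """Dancing in the rain
Dancing in the rain
Heartbreak on the highway
Dancing in the rain"""

counts = bigram_counts(sample)
```

Without the deduplication step, the chorus bigrams here would be counted three times and would dominate the feature vector. Dropping exact repeats is only one possible choice; down-weighting repeated lines instead of removing them would preserve some signal about how repetitive a song is.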

