Classifying Musical Genre from Lyrics — Georgetown Data Science Capstone
A Georgetown Data Science capstone that applied machine learning to song lyrics to predict a song's genre, the foundational attribute for matching a song to an audience. Lyrics break standard NLP assumptions: repeated choruses skew n-gram counts, and words are often chosen for sound over meaning. The project built domain-specific stop-word lists and sentiment variants tuned for lyrics, then iterated through data munging, feature engineering, and feature selection. Hand-coded in 2021, before LLMs made this kind of work a one-liner.
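
To make the chorus and stop-word problems concrete, here is a minimal Python sketch, not the capstone's actual code. It assumes one simple approach: keep only the first occurrence of each lyric line so a repeated chorus counts once, drop a small lyric-specific stop-word list, then count bigrams. The names `LYRIC_STOP_WORDS`, `dedupe_lines`, `tokenize`, and `bigram_counts`, along with the stop words themselves, are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical lyric-specific stop words (illustrative only, not the project's list).
LYRIC_STOP_WORDS = {"oh", "yeah", "la", "na", "ooh", "baby"}


def dedupe_lines(lyrics: str) -> list[str]:
    """Keep the first occurrence of each non-empty line, lowercased,
    so a chorus repeated verbatim is only counted once."""
    seen = set()
    unique = []
    for raw in lyrics.lower().splitlines():
        line = raw.strip()
        if line and line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def tokenize(line: str) -> list[str]:
    """Split on whitespace, strip surrounding punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()") for w in line.split())
    return [w for w in words if w and w not in LYRIC_STOP_WORDS]


def bigram_counts(lyrics: str) -> Counter:
    """Count adjacent word pairs within each unique line."""
    counts = Counter()
    for line in dedupe_lines(lyrics):
        tokens = tokenize(line)
        counts.update(zip(tokens, tokens[1:]))
    return counts


sample = (
    "Oh baby, hold me tight\n"
    "We run all night\n"
    "Oh baby, hold me tight\n"
    "Oh baby, hold me tight\n"
)
counts = bigram_counts(sample)
```

Without the de-duplication step, the bigram ("hold", "me") in `sample` would be counted three times instead of once, which is the kind of distortion a repeated chorus introduces. Exact-match de-duplication is a deliberately crude choice; choruses that vary slightly between repeats would need fuzzier matching.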