In this group project, which I led, we tried to predict the popularity of K-Pop songs using machine learning and natural language processing.
We were first inspired by the BTS 147 Songs Audio Features (Spotify) dataset from Kaggle, but our analysis relied on the following two datasets: the “Kpop Artists and Full Spotify Discography” dataset and the “BTS Lyrics” dataset (more details in later sections).
Before proceeding, we used the Spotify API to extract the “Popularity” score Spotify generates for each song we observed and appended these scores to the datasets mentioned above. This gave us a fixed metric against which we could measure the accuracy of our models’ results. In addition, we extracted the corresponding audio features and popularity scores based on the song IDs listed in the “BTS Lyrics” dataset.
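
To make the enrichment step concrete, here is a minimal sketch of how popularity scores could be looked up by song ID and appended to each row. It is not our exact code: the column names `track_id` and `popularity`, the helper names, and the sample values are all hypothetical, and the API call is passed in as a function so that the example runs offline. The batch size of 50 reflects the limit of Spotify's "Get Several Tracks" endpoint.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical column names; the real datasets may label these differently.
ID_COLUMN = "track_id"
POPULARITY_COLUMN = "popularity"


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def attach_popularity(
    rows: List[Dict[str, object]],
    fetch_tracks: Callable[[List[str]], List[Optional[dict]]],
    batch_size: int = 50,
) -> List[Dict[str, object]]:
    """Return copies of `rows` with a popularity value added to each one.

    `fetch_tracks` takes a list of track IDs and is assumed to return one
    track object (or None for an unknown ID) per ID, in the same order.
    """
    # De-duplicate IDs while keeping their first-seen order.
    unique_ids = list(dict.fromkeys(str(row[ID_COLUMN]) for row in rows))

    popularity_by_id: Dict[str, Optional[int]] = {}
    for batch in chunked(unique_ids, batch_size):
        for track_id, track in zip(batch, fetch_tracks(batch)):
            popularity_by_id[track_id] = track.get("popularity") if track else None

    # Build new dicts so the input rows are left unmodified.
    return [
        {**row, POPULARITY_COLUMN: popularity_by_id.get(str(row[ID_COLUMN]))}
        for row in rows
    ]


def fake_fetch(ids: List[str]) -> List[Optional[dict]]:
    """Offline stand-in for the real API call, with made-up scores."""
    canned = {"a1": 87, "b2": 64}
    return [{"id": i, "popularity": canned[i]} if i in canned else None for i in ids]


songs = [
    {"track_id": "a1", "title": "Song A"},
    {"track_id": "b2", "title": "Song B"},
    {"track_id": "zz", "title": "Unknown song"},
]
enriched = attach_popularity(songs, fake_fetch)
```

With a client library such as spotipy, `fetch_tracks` could plausibly be something like `lambda ids: sp.tracks(ids)["tracks"]`, where `sp` is an authenticated client. Songs whose IDs cannot be resolved keep a popularity of `None`, so they can be filtered out before modeling.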
To read more, please visit our Medium article and check out our GitHub repository.
