The Science Behind ScoreAHit

At ScoreAHit, we apply the above quote to the UK popular music charts. How and why do we do this? Below is a more technical description of how we use mathematics and Artificial Intelligence techniques to infer the hit potential equation.

Task Description
We are interested in the Music Information Retrieval task that aims at predicting whether a given song will be a commercial success prior to its distribution, based on its audio. The underlying assumption is that popular songs are similar with respect to a set of features that make them appealing to a majority of people. These features could then be exploited by Machine Learning algorithms in order to predict whether a song will rise to a high peak position in the chart. Machine Learning is a branch of Artificial Intelligence concerned with learning to perform a task based on examples -- in this case learning to predict hit potential based on past hits and non-hits.

An Investigation of UK Top 40 Singles Chart
The dataset we investigated is the UK top 40 singles chart from the past 50 years. The peak UK chart positions (popularity) of the songs were collected from the Official Charts Company, while the musical features were mainly extracted from the EchoNest API.

Mathematical Definition
To quantify the hit potential of a song, we use regression. Mathematically, the hit potential (peak UK chart position) of a song is denoted by a variable y, and the song's audio features are collected in a vector x. A pre-trained linear model f(x)=w'x, where w is a vector of learned weights, is then used to estimate the hit potential.
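As a minimal sketch of what evaluating f(x)=w'x involves, the snippet below computes the dot product of a weight vector and a feature vector. The function name `hit_potential` and all numeric values are hypothetical and chosen only for illustration; they are not ScoreAHit's trained weights or real audio features.

```python
def hit_potential(weights, features):
    """Evaluate the linear model f(x) = w'x as a dot product."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights and audio features (illustrative values only).
weights = [0.5, -2.0, 1.0]
features = [2.0, 1.0, 3.0]
score = hit_potential(weights, features)
```

Each weight says how strongly, and in which direction, the corresponding feature moves the estimate.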

Learning Machine
So far, so easy. Now comes the critical point: which learning machine is most appropriate for hit potential estimation? Since the hit potential likely depends on the era, it makes sense to express the learning machine as a function of time. We thus organized the dataset chronologically and employed Time-Shifting Ridge Regression (TSRR) as the learning agent. The TSRR algorithm is outlined in the following table.

Intuitively, the taste of music listeners evolves over time, so songs from a given era should be more helpful for capturing the trends of that era than older music. This is modelled by the memory parameter in TSRR, which allows the learning machine to "forget" past trends and adapt to new ones.
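To illustrate the idea of forgetting, here is a rough sketch of a ridge regression in which older songs are down-weighted. It is not the TSRR algorithm itself. It rests on our own assumption that the memory parameter behaves like an exponential forgetting factor `gamma`, so that a song `age` time steps old gets weight `gamma ** age`. The names `fit_discounted_ridge`, `gamma` and `lam` (the ridge penalty) and the toy data are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_discounted_ridge(xs, ys, ages, lam=1.0, gamma=0.9):
    """Minimise sum_i gamma**age_i * (y_i - w'x_i)**2 + lam * ||w||**2.

    Assumption: gamma in (0, 1] stands in for the memory parameter;
    gamma = 1 means nothing is forgotten.
    """
    d = len(xs[0])
    # Normal equations: (lam * I + sum_i c_i x_i x_i') w = sum_i c_i y_i x_i
    a = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, y, age in zip(xs, ys, ages):
        weight = gamma ** age
        for i in range(d):
            b[i] += weight * x[i] * y
            for j in range(d):
                a[i][j] += weight * x[i] * x[j]
    return solve_linear_system(a, b)


# Toy data with one feature: the trend flips between old and recent songs.
xs = [[1.0], [2.0], [1.0], [2.0]]
ys = [1.0, 2.0, -1.0, -2.0]
ages = [10, 10, 0, 0]  # time steps since each song charted

w_no_forgetting = fit_discounted_ridge(xs, ys, ages, lam=0.1, gamma=1.0)
w_forgetting = fit_discounted_ridge(xs, ys, ages, lam=0.1, gamma=0.5)
```

In this toy example the old and recent trends cancel out when nothing is forgotten, so the learned weight is zero. With `gamma=0.5` the old songs barely count and the weight follows the recent, negative trend.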

Finished!
You now have a full understanding of how we estimate the hit potential of a song prior to its release. We hope you enjoy browsing the rest of ScoreAHit!