windowsolz.blogg.se

Ass like misses incredible lyric juvenile

The general music corpus was formed using data from LyricFind. We filtered hip-hop artists by cross-referencing their primary genre on MusixMatch.

For consistency, the hip hop data was cleaned using the same script as the LyricFind corpus. This included efforts to standardize spelling, remove capitalization, and apply light lemmatization.

Most Hip Hop: To find the words most “characteristic” of hip-hop, we computed the odds that a word appeared in the hip hop corpus vs. the general corpus. For the hip hop side, this is the number of appearances in the hip hop corpus divided by the total words in the hip hop corpus. We then compare that to the same math for the general corpus. Some words were filtered from this list that, while indexing high in hip hop vs. the general corpus, were still rather rare words. These all had fewer than 1,000 occurrences in the hip hop corpus. For example, “lowrider” had a 255:1 ratio in hip hop vs. other genres, but was only used 116 times in 26 million words.

TF-IDF: To determine the words that characterize each hip-hop artist, we used a technique called term frequency-inverse document frequency (tf-idf). Each rapper gets assigned a tf-idf score for every word in the hip-hop corpus. For a given word, we count the number of times it occurs in one rapper’s catalogue (its term frequency) and divide by the number of artists that use it across the hip-hop corpus (its document frequency). The words with the ten highest tf-idf scores for each artist were deemed the words “most unique” to him or her. We made two slight modifications to the traditional formula. 1) We used sublinear scaling on the term frequencies, giving us a little more variation across our lists. You can read more about why you might want to do sublinear scaling here. 2) We also set a “cut-off” for document frequency of 0.1. That means, to be considered in our tf-idf computation, a term had to be used at least once by at least 10% of the artists in our dataset. This rules out words that are repeated over and over by one or a few artists (think “controlla” for Drake).

Cosine Similarity: Cosine similarity is a common way of calculating the similarity between two vectors by taking the cosine of the angle between them. In our case, that means taking the tf-idf vector for an artist and comparing it to that of another. Higher cosine values imply more similarity, with an upper bound of 1 when the two vectors point in exactly the same direction.

T-SNE: To create our map of rappers, we used a dimensionality reduction technique called t-SNE. We took the tf-idf matrix and first reduced it to 50 dimensions using truncated singular value decomposition (SVD). We then took the resulting matrix and fed it into t-SNE with a perplexity parameter set to 40.
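
To make the tf-idf and cosine similarity steps concrete, here is a minimal sketch in plain Python. It is not the code used for this project. The function names, the toy artists, and their tokens are hypothetical, and the inverse document frequency is assumed to take the common form ln(N / df), since the exact formula is not spelled out above. Sublinear scaling is taken to mean replacing a raw count tf with 1 + ln(tf), and the toy example raises the document-frequency cut-off to 0.5 only because it has just three artists.

```python
import math
from collections import Counter


def tfidf_vectors(docs, min_df=0.1):
    """Map each artist to a sparse {term: weight} dict.

    docs: {artist: list of tokens}. A term is kept only if at least
    min_df (a fraction) of the artists use it at least once.
    """
    n_docs = len(docs)
    counts = {artist: Counter(tokens) for artist, tokens in docs.items()}

    # Document frequency: how many artists use each term at least once.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    vocab = {term for term, d in df.items() if d / n_docs >= min_df}

    vectors = {}
    for artist, term_counts in counts.items():
        vectors[artist] = {
            # Sublinear term frequency (1 + ln tf) times the assumed idf.
            term: (1.0 + math.log(tf)) * math.log(n_docs / df[term])
            for term, tf in term_counts.items()
            if term in vocab
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_terms(vector, k=10):
    """Return the k terms with the highest tf-idf weight."""
    return sorted(vector, key=vector.get, reverse=True)[:k]


# Hypothetical toy catalogues, one token list per artist.
docs = {
    "artist_a": "money money money city block".split(),
    "artist_b": "money city love love".split(),
    "artist_c": "love heart city".split(),
}
vectors = tfidf_vectors(docs, min_df=0.5)
sim_ab = cosine_similarity(vectors["artist_a"], vectors["artist_b"])
sim_ac = cosine_similarity(vectors["artist_a"], vectors["artist_c"])
```

In this toy data, “block” and “heart” are each used by only one of the three artists, so the cut-off drops them. “city” is used by everyone, so its weight under the assumed idf is zero. On a real corpus, the resulting vectors could then be stacked into a matrix and passed on to the SVD and t-SNE steps.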





