Chapter 13 Sentiment Analysis
13.1 Positive Authors and Not So Positive Authors
We investigate how often positive and negative words occur in the text written by each author. Which author is the most positive or negative overall?
We will use the AFINN sentiment lexicon, which assigns each word a numeric score from -5 (most negative) to +5 (most positive), and visualize the average score per author with a bar plot.
The bar plot shows that Edgar Allan Poe and Mary Wollstonecraft Shelley are positive authors, while HP Lovecraft is a negative author. We need to look in more detail at why HP Lovecraft is a negative author.
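# Packages used by the code in this chapter.
library(dplyr)
library(tidytext)
library(ggplot2)

# Plot the count-weighted average AFINN score per author.
# SCWords needs the columns author, word and n (the word count).
visualize_sentiments <- function(SCWords) {
  SCWords_sentiments <- SCWords %>%
    # Recent tidytext releases name the AFINN score column `value`;
    # older releases called it `score`.
    inner_join(get_sentiments("afinn"), by = "word") %>%
    group_by(author) %>%
    summarize(score = sum(value * n) / sum(n)) %>%
    arrange(desc(score))

  SCWords_sentiments %>%
    mutate(author = reorder(author, score)) %>%
    ggplot(aes(author, score, fill = score > 0)) +
    geom_col(show.legend = TRUE) +
    coord_flip() +
    ylab("Average sentiment score") +
    theme_bw()
}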
# Count how often each author uses each word.
trainWords <- train %>%
  unnest_tokens(word, text) %>%
  count(author, word, sort = TRUE) %>%
  ungroup()

visualize_sentiments(trainWords)
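To make the averaging step concrete, here is a minimal sketch on made-up data. The authors `A` and `B`, the word counts, and the two-word lexicon `toy_lexicon` (with its `value` column) are all hypothetical stand-ins, not the real AFINN lexicon or the real training data. The sketch only illustrates how a count-weighted average score could be computed per author.

```r
library(dplyr)

# Hypothetical word counts per author (not taken from the real data).
toy_counts <- data.frame(
  author = c("A", "A", "B", "B"),
  word   = c("happy", "terrible", "happy", "terrible"),
  n      = c(3, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the AFINN lexicon.
toy_lexicon <- data.frame(
  word  = c("happy", "terrible"),
  value = c(3, -3),
  stringsAsFactors = FALSE
)

# Weight each word's value by its count, then divide by the
# number of matched words to get the average score per author.
toy_scores <- toy_counts %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(author) %>%
  summarize(score = sum(value * n) / sum(n)) %>%
  arrange(desc(score))

print(toy_scores)
```

Author `A` uses the positive word more often and ends up with a score above zero, while author `B` ends up below zero. Words that are missing from the lexicon are dropped by the inner join and so do not affect the average.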
13.2 Positive and Not So Positive Words of Authors
The following graph shows the twenty words that contribute most, positively or negatively, to the overall sentiment.
# Plot the twenty words with the largest absolute sentiment contribution.
# SC needs the columns author and text.
positiveWordsBarGraph <- function(SC) {
  contributions <- SC %>%
    unnest_tokens(word, text) %>%
    count(author, word, sort = TRUE) %>%
    ungroup() %>%
    inner_join(get_sentiments("afinn"), by = "word") %>%
    group_by(word) %>%
    # Weight each word's AFINN value by how often it occurs, because
    # count() has already collapsed repeated words into n.
    summarize(occurrences = sum(n),
              contribution = sum(value * n))

  contributions %>%
    top_n(20, abs(contribution)) %>%
    mutate(word = reorder(word, contribution)) %>%
    head(20) %>%
    ggplot(aes(word, contribution, fill = contribution > 0)) +
    geom_col(show.legend = FALSE) +
    coord_flip() +
    theme_bw()
}
positiveWordsBarGraph(train)
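One way to read a word's "contribution" is as its lexicon value multiplied by how often it occurs across all authors. The sketch below works this out on made-up data. The counts and the three-word lexicon `toy_lexicon` are hypothetical and only serve to show the arithmetic behind ranking words by absolute contribution.

```r
library(dplyr)

# Hypothetical word counts per author (not taken from the real data).
toy_counts <- data.frame(
  author = c("A", "B", "A", "B", "A"),
  word   = c("love", "love", "fear", "fear", "dead"),
  n      = c(4, 2, 1, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the AFINN lexicon.
toy_lexicon <- data.frame(
  word  = c("love", "fear", "dead"),
  value = c(3, -2, -3),
  stringsAsFactors = FALSE
)

# Sum the counts over authors and weight each word's value by them,
# then rank words by the size of their contribution.
toy_contributions <- toy_counts %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(word) %>%
  summarize(occurrences = sum(n),
            contribution = sum(n * value)) %>%
  arrange(desc(abs(contribution)))

print(toy_contributions)
```

In this toy data a mildly negative word that occurs often ("fear") outranks a strongly negative word that occurs rarely ("dead"), which is why frequency matters as much as the lexicon value when ranking words.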
13.3 Positive and Not So Positive Words of Author HPL
# Keep only the text written by HP Lovecraft.
trainHPL <- train %>% filter(author == "HPL")
positiveWordsBarGraph(trainHPL)
13.4 Positive and Not So Positive Words of Author EAP
# Keep only the text written by Edgar Allan Poe.
trainEAP <- train %>% filter(author == "EAP")
positiveWordsBarGraph(trainEAP)
13.5 Positive and Not So Positive Words of Author MWS
# Keep only the text written by Mary Wollstonecraft Shelley.
trainMWS <- train %>% filter(author == "MWS")
positiveWordsBarGraph(trainMWS)