Chapter 13 Sentiment Analysis

13.1 Positive Authors and Not So Positive Authors

We investigate how often positive and negative words occur in each author's text. Which author is the most positive overall, and which the most negative?

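We use the AFINN sentiment lexicon, which assigns each word an integer score from -5 (most negative) to +5 (most positive). For each author we average the scores of the words that appear in the lexicon, weighting each word by how often it occurs, and visualize the averages with a bar plot.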
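The averaging step in `visualize_sentiments()` is a count-weighted mean of the lexicon scores. Here is a minimal base-R sketch of that computation on made-up data. The data frames `word_counts` and `lexicon` and their values are hypothetical stand-ins for `trainWords` and the AFINN lexicon. `merge()` plays the role of `inner_join()`, and `split()` with `sapply()` plays the role of `group_by()` with `summarize()`.

```r
# Hypothetical toy data: word counts per author (stand-in for trainWords)
word_counts <- data.frame(
  author = c("A", "A", "A", "B", "B"),
  word   = c("love", "dark", "house", "dark", "death"),
  n      = c(3, 1, 5, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical toy lexicon in the AFINN shape: one integer score per word
lexicon <- data.frame(
  word  = c("love", "dark", "death"),
  value = c(3, -1, -2),
  stringsAsFactors = FALSE
)

# Inner join on word: words missing from the lexicon (here "house") are dropped
scored <- merge(word_counts, lexicon, by = "word")

# Count-weighted mean per author: sum(value * n) / sum(n)
avg_score <- sapply(split(scored, scored$author),
                    function(d) sum(d$value * d$n) / sum(d$n))
print(avg_score)
```

Author A averages (3 × 3 - 1 × 1) / 4 = 2 and author B averages (-1 × 2 - 2 × 2) / 4 = -1.5. Because of the inner join, a word with no lexicon entry (here `house`) counts toward neither the numerator nor the denominator. The score is therefore an average over sentiment-bearing words only, not over every word an author wrote.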

Edgar Allan Poe (EAP) and Mary Wollstonecraft Shelley (MWS) come out as positive authors, with average scores above zero.

H. P. Lovecraft (HPL), by contrast, has a negative average score in the bar plot. To understand why, the following sections look at which individual words contribute most to the sentiment of all authors together and of each author separately.

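library(dplyr)
library(ggplot2)
library(tidytext)

# Average AFINN score per author, weighted by how often each word occurs.
# Recent tidytext versions name the AFINN score column `value` (older
# versions used `score`), and get_sentiments("afinn") relies on the
# textdata package to download the lexicon.
visualize_sentiments <- function(SCWords) {
  SCWords_sentiments <- SCWords %>%
    inner_join(get_sentiments("afinn"), by = "word") %>%
    group_by(author) %>%
    summarize(score = sum(value * n) / sum(n)) %>%
    arrange(desc(score))

  # One bar per author, ordered by score and filled by sign
  SCWords_sentiments %>%
    mutate(author = reorder(author, score)) %>%
    ggplot(aes(author, score, fill = score > 0)) +
    geom_col(show.legend = TRUE) +
    coord_flip() +
    ylab("Average sentiment score") + theme_bw()
}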



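# One row per (author, word) pair with its count n; `train` holds the
# raw sentences in `text` and the author code in `author`
trainWords <- train %>%
  unnest_tokens(word, text) %>%
  count(author, word, sort = TRUE) %>%
  ungroup()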

visualize_sentiments(trainWords)

13.2 Positive and Not So Positive Words of Authors

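The following graph shows the twenty words with the largest absolute contribution to sentiment across all authors. A word's contribution is its AFINN score multiplied by the number of times it occurs. Bars are filled by sign, so positive and negative words are easy to tell apart.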
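Here is a small base-R sketch of how the contribution ranking works, on hypothetical toy data. The `scored` data frame and its values are made up. `tapply()` stands in for `group_by()` plus `summarize()`, and ordering by `-abs()` stands in for picking the top words by absolute contribution.

```r
# Hypothetical toy per-author word counts, already joined to a toy lexicon
scored <- data.frame(
  author = c("A", "B", "A", "B", "A"),
  word   = c("dark", "dark", "love", "death", "fear"),
  n      = c(4, 6, 2, 4, 1),
  value  = c(-1, -1, 3, -2, -2),
  stringsAsFactors = FALSE
)

# Collapse across authors: total occurrences and count-weighted contribution
occurrences  <- tapply(scored$n, scored$word, sum)
contribution <- tapply(scored$value * scored$n, scored$word, sum)

contributions <- data.frame(
  word         = names(contribution),
  occurrences  = as.vector(occurrences),
  contribution = as.vector(contribution),
  stringsAsFactors = FALSE
)

# Keep the k words with the largest absolute contribution (k = 20 in the chapter)
k <- 3
top <- head(contributions[order(-abs(contributions$contribution)), ], k)
print(top)
```

Here `dark` is used by both authors, so its contribution is -1 × (4 + 6) = -10. Summing `value` without weighting by `n` would give only -2. That is why the count has to enter the contribution once the tokens have been collapsed with `count()`.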

# Total AFINN contribution of each word in the supplied rows, plotted for
# the 20 words with the largest absolute contribution
positiveWordsBarGraph <- function(SC) {
  contributions <- SC %>%
    unnest_tokens(word, text) %>%
    count(author, word, sort = TRUE) %>%
    ungroup() %>%
    inner_join(get_sentiments("afinn"), by = "word") %>%
    group_by(word) %>%
    # After count(), each row stands for n occurrences, so weight by n
    summarize(occurrences = sum(n),
              contribution = sum(value * n))

  contributions %>%
    slice_max(abs(contribution), n = 20, with_ties = FALSE) %>%
    mutate(word = reorder(word, contribution)) %>%
    ggplot(aes(word, contribution, fill = contribution > 0)) +
    geom_col(show.legend = FALSE) +
    coord_flip() + theme_bw()
}

positiveWordsBarGraph(train)

13.3 Positive and Not So Positive Words of Author HPL

trainHPL <- train %>% filter(author == 'HPL')

positiveWordsBarGraph(trainHPL)

13.4 Positive and Not So Positive Words of Author EAP

trainEAP <- train %>% filter(author == 'EAP')

positiveWordsBarGraph(trainEAP)

13.5 Positive and Not So Positive Words of Author MWS

trainMWS <- train %>% filter(author == 'MWS')

positiveWordsBarGraph(trainMWS)
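Sections 13.3 to 13.5 repeat the same filter-and-plot step once per author. The base-R code below is a hedged sketch of how that repetition could be collapsed. It splits a hypothetical toy `scored` data frame by author and ranks each author's words by absolute contribution in a single `lapply()` call. `top_contributors()` is a made-up helper, not part of the original analysis. With the chapter's own objects, the same idea could be written as `lapply(split(train, train$author), positiveWordsBarGraph)`.

```r
# Hypothetical toy scored counts: author, word, count n, lexicon value
scored <- data.frame(
  author = c("EAP", "EAP", "HPL", "HPL", "MWS", "MWS"),
  word   = c("love", "death", "fear", "dark", "love", "despair"),
  n      = c(2, 1, 4, 3, 5, 2),
  value  = c(3, -2, -2, -1, 3, -3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank one author's words by |value * n| summed per word
top_contributors <- function(d, k) {
  contrib <- tapply(d$value * d$n, d$word, sum)
  # Turn the 1-d array from tapply() into a plain named numeric vector
  contrib <- setNames(as.vector(contrib), names(contrib))
  head(contrib[order(-abs(contrib))], k)
}

# One call per author instead of one filter() block per author
per_author <- lapply(split(scored, scored$author), top_contributors, k = 2)
print(per_author)
```

In this toy data HPL's strongest word is `fear` with a contribution of -8, while MWS is led by `love` at +15. This mirrors how the per-author plots separate a negative author from positive ones.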