Chapter 6 Words Length Distribution

We use a histogram to examine the number of words each author writes in a single sentence. Unfortunately, the plot over the full range does not reveal much, so we then restrict the x-axis to get a clearer view.

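# Histogram of `len` (words per sentence), one panel per author.
# bins = 30 is the ggplot2 default, stated explicitly to silence the binwidth message.
train %>%
      ggplot(aes(x = len, fill = author)) +
      geom_histogram(bins = 30) +
      scale_fill_manual(values = c("red", "blue", "orange")) +
      facet_wrap(~author) +
      labs(x = 'Word Length', y = 'Count', title = 'Distribution of Word Length') +
      theme_bw()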
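The plot above relies on a `len` column holding the number of words in each sentence. How that column was built is not shown in this chapter, so the following is only a minimal sketch under stated assumptions: the data has a `text` column, and a word is any run of non-whitespace characters. The `toy` data frame and its author labels are hypothetical stand-ins for `train`.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for `train`; the real data is assumed
# to have `text` and `author` columns.
toy <- tibble(
  author = c("EAP", "HPL", "MWS"),
  text   = c("It was a dark night.",
             "The thing on the doorstep.",
             "I am alone.")
)

# Count whitespace-separated tokens as words. This is an assumption about
# how `len` could be defined, not necessarily the original definition.
toy <- toy %>%
  mutate(len = str_count(text, "\\S+"))

toy %>% select(author, len)
```

Counting runs of non-whitespace keeps punctuation attached to the neighbouring word, which is usually good enough for a length distribution; a proper tokenizer would be the stricter alternative.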

6.1 Words Length Distribution Plot 2

We limit the word length to the range 15 to 100 and look at the distribution again. We notice that HP Lovecraft and Mary Wollstonecraft Shelley both have many sentences with a word length between 75 and 100.

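# Same histogram, restricted to sentences with 15 to 100 words.
# Note: scale limits drop values outside the range before binning (with a warning).
train %>%
      ggplot(aes(x = len, fill = author)) +
      geom_histogram(bins = 30) +
      scale_x_continuous(limits = c(15, 100)) +
      scale_fill_manual(values = c("red", "blue", "orange")) +
      facet_wrap(~author) +
      labs(x = 'Word Length', y = 'Count', title = 'Distribution of Word Length') +
      theme_bw()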
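The observation about sentences in the 75 to 100 range is read off the histogram by eye, so it can help to back it with a per-author count. The sketch below shows one way to do that, assuming `len` is the per-sentence word count. The `toy` data frame and the author labels "A" and "B" are hypothetical; with the real data, `toy` would be replaced by `train`.

```r
library(dplyr)

# Hypothetical toy data; `len` is assumed to be words per sentence.
toy <- tibble(
  author = c("A", "A", "A", "A", "B", "B", "B", "B"),
  len    = c(10, 20, 80, 95, 12, 30, 40, 76)
)

# For each author: total sentences, sentences with 75 to 100 words,
# and the share of such sentences.
long_share <- toy %>%
  group_by(author) %>%
  summarise(
    n_sentences = n(),
    n_long      = sum(len >= 75 & len <= 100),
    share_long  = mean(len >= 75 & len <= 100)
  )

long_share
```

Comparing `share_long` rather than `n_long` matters when the authors contribute different numbers of sentences, since raw bar heights in the faceted histogram are not normalised.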