Chapter 6 Words Length Distribution
We use a histogram to examine the number of words each author writes in a single sentence. Unfortunately, the plot does not reveal much, so we will adjust the x-axis range to get a more informative plot.
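library(dplyr)    # provides %>%
library(ggplot2)

# Histogram of the number of words per sentence (`len`), one panel per author.
train %>%
  ggplot(aes(x = len, fill = author)) +
  geom_histogram(bins = 30) +  # 30 bins is the ggplot2 default, made explicit
  scale_fill_manual(values = c("red", "blue", "orange")) +
  facet_wrap(~author) +
  labs(x = "Word Length", y = "Count", title = "Distribution of Word Length") +
  theme_bw()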
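The `len` column used in the plot is not defined in this section. As a minimal sketch of how such a column might be derived, assuming the sentences live in a column named `text` (an assumption, as are the toy data frame `toy_train` and its author labels), one could count whitespace-separated tokens per sentence:

```r
library(dplyr)
library(stringr)

# Toy stand-in for `train`; the `text` column name and the
# author labels "A", "B", "C" are hypothetical.
toy_train <- data.frame(
  author = c("A", "B", "C"),
  text = c(
    "It was a dark night.",
    "The thing cannot be described.",
    "I wept."
  ),
  stringsAsFactors = FALSE
)

# Count runs of non-whitespace characters, i.e. words per sentence.
toy_train <- toy_train %>%
  mutate(len = str_count(text, "\\S+"))

toy_train$len
```

Splitting on whitespace treats punctuation attached to a word as part of that word, which is usually good enough for a length histogram. A proper tokenizer may give slightly different counts.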
6.1 Words Length Distribution Plot 2
We restrict the x-axis to sentences of 15 to 100 words and look at the distribution again. H. P. Lovecraft and Mary Wollstonecraft Shelley both have many sentences in the 75 to 100 word range.
# Same histogram, restricted to sentences of 15 to 100 words.
# Note: `limits` drops sentences outside this range before binning.
train %>%
  ggplot(aes(x = len, fill = author)) +
  geom_histogram(bins = 30) +
  scale_x_continuous(limits = c(15, 100)) +
  scale_fill_manual(values = c("red", "blue", "orange")) +
  facet_wrap(~author) +
  labs(x = "Word Length", y = "Count", title = "Distribution of Word Length") +
  theme_bw()
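The visual impression that some authors have many sentences of 75 to 100 words can be cross-checked numerically. The following is a sketch only: `toy_train` and its values are hypothetical stand-ins for `train`, and the summary column names are chosen for illustration.

```r
library(dplyr)

# Toy stand-in for `train` with a precomputed `len` column (made-up values).
toy_train <- data.frame(
  author = c("A", "A", "B", "B", "B", "C"),
  len    = c(20, 80, 76, 99, 30, 150)
)

# Per author: total sentences, sentences with 75 to 100 words, and their share.
long_share <- toy_train %>%
  group_by(author) %>%
  summarise(
    n_sentences = n(),
    n_long      = sum(len >= 75 & len <= 100),
    share_long  = n_long / n_sentences
  )

long_share
```

Comparing `share_long` rather than `n_long` avoids favouring an author who simply has more sentences in the data.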