Hidden Gems Book

Chapter 13 Recommended Notebooks for June to December 2021

We recommend the following notebooks, made public between June and December 2021. (This date range was chosen only to keep the dataset small enough for analysis.)
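The date window is inclusive at both ends: dplyr's `between(x, left, right)` is a shortcut for `x >= left & x <= right`. A minimal base-R sketch of the same check, using hypothetical date strings in the month/day/year format that the chapter's code parses (the values are invented, not taken from Meta Kaggle):

```r
# Hypothetical publication dates, stored as month/day/year strings
made_public <- c("05/31/2021", "06/01/2021", "09/15/2021", "12/31/2021", "01/01/2022")

# Parse with the same format string the chapter uses
dates <- as.Date(made_public, format = "%m/%d/%Y")

# Inclusive window, equivalent to dplyr::between(dates, start, end)
in_window <- dates >= as.Date("2021-06-01") & dates <= as.Date("2021-12-31")

made_public[in_window]
```

Both boundary dates (June 1 and December 31) are kept, while May 31 and January 1, 2022 fall outside the window.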

We applied the following criteria:

  • Medal: silver or bronze (the code keeps Medal >= 2; in Meta Kaggle, 1 = gold, 2 = silver, 3 = bronze)

  • Not a competition notebook (the kernel's current version has no source competition)

  • The author's performance tier is Expert or Master

  • More than 40 total votes, more than 10 total comments, and more than 3,100 total views

  • No notebooks on common teaching datasets: we removed kernels whose URL slug contains "titanic", "breast", "heart", "diabetes", or "house"
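The criteria above combine into a single logical mask. The sketch below is a base-R illustration on an invented toy data frame, not the chapter's dplyr pipeline; in particular, the `IsCompetition` column is a hypothetical stand-in for the competition join, and `select_recommended()` is a hypothetical helper name. The medal and tier codes follow the Meta Kaggle conventions used in the chapter's code.

```r
# Hypothetical toy rows; column names mirror the Meta Kaggle fields used in
# this chapter, except IsCompetition, which stands in for the competition join
kernels_toy <- data.frame(
  CurrentUrlSlug  = c("store-sales-eda", "titanic-starter", "crypto-forecast",
                      "house-prices-eda", "market-eda", "clustering-demo", "nlp-intro"),
  Medal           = c(2L, 2L, 1L, 3L, 2L, 3L, NA),
  PerformanceTier = c(2L, 3L, 3L, 2L, 2L, 3L, 2L),
  IsCompetition   = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  TotalVotes      = c(65, 90, 120, 50, 56, 47, 70),
  TotalComments   = c(33, 40, 25, 15, 18, 33, 30),
  TotalViews      = c(4671, 9000, 8000, 5000, 3979, 3895, 4000),
  stringsAsFactors = FALSE
)

# Medal codes: 1 = gold, 2 = silver, 3 = bronze, so Medal >= 2 keeps silver and bronze.
# Performance tiers: 2 = Expert, 3 = Master.
select_recommended <- function(df) {
  keep <- !is.na(df$Medal) & df$Medal >= 2 &          # unmedalled (NA) rows are dropped
    !df$IsCompetition &                                # not a competition notebook
    df$PerformanceTier %in% c(2, 3) &                  # Expert or Master author
    df$TotalVotes > 40 & df$TotalComments > 10 & df$TotalViews > 3100 &
    !grepl("titanic|diabetes|house|heart|breast", df$CurrentUrlSlug)
  df[keep, , drop = FALSE]
}

select_recommended(kernels_toy)$CurrentUrlSlug
```

Only "store-sales-eda" and "clustering-demo" pass. Note that the slug test is a substring match, so it would also drop unrelated slugs such as "warehouse-..." or "hearthstone-...".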

library(dplyr)
library(stringr)
library(gt)

# Parse the publication date (stored as month/day/year strings)
kernels$MadePublicDate = as.Date(kernels$MadePublicDate, format = "%m/%d/%Y")

# Keep kernels made public between 2021-06-01 and 2021-12-31 (inclusive)
# that clear all three engagement thresholds
kernels_subset = kernels %>%
  filter(between(MadePublicDate, as.Date("2021-06-01"), as.Date("2021-12-31")),
         TotalVotes > 40,
         TotalComments > 10,
         TotalViews > 3100)

# Medal codes: 1 = gold, 2 = silver, 3 = bronze; Medal >= 2 keeps silver and bronze
kernels_subset$Medal = as.integer(kernels_subset$Medal)

kernels_subset_silver = kernels_subset %>%
  filter(Medal >= 2)

# A kernel is a competition notebook if its current version has a source
# competition; keep only versions with no match in the link table
kvcs_silver = kernels_subset_silver %>%
  left_join(kernel_version_competition,
            by = c("CurrentKernelVersionId" = "KernelVersionId")) %>%
  filter(is.na(SourceCompetitionId))

# Attach author details from the users table
kvcs_silver_users = kvcs_silver %>%
  left_join(users %>% select(AuthorUserId = Id,
                             author_kaggle = UserName,
                             DisplayName,
                             RegisterDate,
                             PerformanceTier), by = "AuthorUserId")

# Performance tiers: 2 = Expert, 3 = Master
kvcs_silver_users_experts = kvcs_silver_users %>%
  filter(PerformanceTier %in% c(2, 3))

# Drop notebooks on common teaching datasets, matched on the URL slug
kvcs_silver_users_experts = kvcs_silver_users_experts %>%
  filter(!str_detect(CurrentUrlSlug, "titanic|diabetes|house|heart|breast"))

# Build the public notebook URL from the author's username and the slug
kvcs_silver_users_experts = kvcs_silver_users_experts %>%
  mutate(URL = paste0("https://www.kaggle.com/code/", author_kaggle, "/", CurrentUrlSlug))

kvcs_versions_info_reduced = kvcs_silver_users_experts %>%
  select(URL, Medal, TotalViews, TotalComments, TotalVotes) %>%
  arrange(desc(TotalVotes))

kvcs_versions_info_reduced %>%
  gt() %>%
  tab_header(
    title = "Recommended Notebooks for 2021 June to December")
Recommended Notebooks for 2021 June to December
URL Medal TotalViews TotalComments TotalVotes
https://www.kaggle.com/code/ankitkalauni/tokyo-olympic-2021-starter-clean-eda 2 4747 48 96
https://www.kaggle.com/code/sonalisingh1411/eda-on-train-test-dataset-price-prediction 2 4471 38 86
https://www.kaggle.com/code/imakash3011/customer-analysis-eda-report-clustering 2 6530 46 77
https://www.kaggle.com/code/mysarahmadbhat/types-of-transformations-for-better-distribution 2 3551 64 73
https://www.kaggle.com/code/miguelfzzz/store-customers-clustering-analysis 2 4338 22 72
https://www.kaggle.com/code/imakash3011/covid-19-india-eda-visualization-report 2 3414 60 71
https://www.kaggle.com/code/mysarahmadbhat/python-from-zero-to 2 5180 36 71
https://www.kaggle.com/code/kaanboke/beginner-friendly-end-to-end-ml-project-enjoy 2 3211 24 69
https://www.kaggle.com/code/jonaspalucibarbosa/chest-x-ray-pneumonia-cnn-transfer-learning 2 5180 27 66
https://www.kaggle.com/code/kartik2khandelwal/bitcoin-crash-prediction 2 3565 49 66
https://www.kaggle.com/code/miguelfzzz/olympics-tokyo-2020-cool-eda 2 4085 39 65
https://www.kaggle.com/code/maricinnamon/store-sales-time-series-forecast-visualization 2 4671 33 65
https://www.kaggle.com/code/kslarwtf/eda-clustering-updated 2 4404 45 64
https://www.kaggle.com/code/miguelfzzz/bellabeat-data-analysis-discovering-trends 2 3110 12 63
https://www.kaggle.com/code/ankitkalauni/covid-19-india-statewise-clean-eda-deaths-pred 2 3308 49 62
https://www.kaggle.com/code/gaganmaahi224/eda-detailed-explanation-of-knn-algorithm 2 3351 39 62
https://www.kaggle.com/code/yuyougnchan/look-at-this-note-numeric-variable-is-easy 2 3662 42 60
https://www.kaggle.com/code/zwartfreak/easiest-price-prediction-full-explanation 2 4773 36 59
https://www.kaggle.com/code/thomaskonstantin/exploring-and-predicting-drinking-water-potability 2 4285 39 58
https://www.kaggle.com/code/victoriamiller19/hypothesis-testing-explanation 2 3107 27 58
https://www.kaggle.com/code/vardhansiramdasu/summer-olympics-eda 2 3364 41 58
https://www.kaggle.com/code/mostafaalaa123/customer-personality 2 4615 22 57
https://www.kaggle.com/code/ludovicocuoghi/twitter-sentiment-analysis-with-bert-roberta 2 3346 39 57
https://www.kaggle.com/code/aryantiwari123/hotel-booking-eda-models 2 4616 56 56
https://www.kaggle.com/code/tensorchoko/g-research-crypto-forecasting-eda 2 3979 18 56
https://www.kaggle.com/code/frankmollard/a-story-about-unsupervised-learning 2 5311 14 53
https://www.kaggle.com/code/aditimulye/adult-income-dataset-from-scratch 2 3215 25 52
https://www.kaggle.com/code/mostafaalaa123/finished-quick-analysis-of-each-q 2 5383 38 51
https://www.kaggle.com/code/anandhuh/image-classification-using-cnn-for-beginners 2 5234 24 50
https://www.kaggle.com/code/frankmollard/nlp-a-gentle-introduction-lstm-word2vec-bert 2 3633 30 50
https://www.kaggle.com/code/ankitkalauni/customer-personality-clean-eda-k-means 2 3348 25 50
https://www.kaggle.com/code/prena0808/tokyo-olympics-data-analysis 3 3331 20 50
https://www.kaggle.com/code/paulrohan2020/ml-algorithms-from-scratch-with-pure-python 2 3717 37 48
https://www.kaggle.com/code/rankirsh/predicting-attrition-from-a-to-z 2 3895 33 47
https://www.kaggle.com/code/atasaygin/hotel-booking-demand-eda-and-of-guest-prediction 2 3213 20 46
https://www.kaggle.com/code/hijest/text-generation-for-beginners-thorough-tutorial 2 6054 11 45
https://www.kaggle.com/code/aryantiwari123/handwriting-recognition-deep-learning-tensorflow 2 3349 32 44
https://www.kaggle.com/code/imakash3011/water-quality-prediction-7-model 2 4013 45 43
https://www.kaggle.com/code/yogidsba/personal-loan-logistic-regression-decision-tree 2 7634 24 42
https://www.kaggle.com/code/jonaspalucibarbosa/default-of-credit-card-eda-catboost-w-ft-eng 2 3164 26 42
https://www.kaggle.com/code/anoopashware/food-demand-forecasting-predict-orders 2 3642 14 41
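The competition filter in the chapter's code is a left join followed by an `is.na(SourceCompetitionId)` filter, which amounts to an anti-join (dplyr's `anti_join()` expresses it directly). The base-R sketch below checks that equivalence on invented toy tables; the ids and slugs are hypothetical and `merge(all.x = TRUE)` is used as a base-R stand-in for `left_join()`.

```r
# Hypothetical toy tables; ids and slugs are invented for illustration
kernels_ids_toy <- data.frame(
  CurrentKernelVersionId = c(101, 102, 103, 104),
  CurrentUrlSlug         = c("eda-a", "comp-b", "eda-c", "comp-d"),
  stringsAsFactors = FALSE
)
kernel_version_competition_toy <- data.frame(
  KernelVersionId     = c(102, 104, 104),   # version 104 links to two competitions
  SourceCompetitionId = c(9001, 9002, 9003)
)

# Left join: unmatched versions get NA for SourceCompetitionId
joined <- merge(kernels_ids_toy, kernel_version_competition_toy,
                by.x = "CurrentKernelVersionId", by.y = "KernelVersionId",
                all.x = TRUE)
via_join <- joined$CurrentUrlSlug[is.na(joined$SourceCompetitionId)]

# Anti-join: keep versions that never appear in the link table
via_anti <- kernels_ids_toy$CurrentUrlSlug[
  !kernels_ids_toy$CurrentKernelVersionId %in% kernel_version_competition_toy$KernelVersionId
]

via_anti
```

Both approaches keep "eda-a" and "eda-c". The join temporarily duplicates version 104 (one row per linked competition) before the filter removes it, whereas the anti-join never duplicates rows.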