- Take lots of (possibly) weak predictors
- Weight them and add them up
- Get a stronger predictor
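The three steps above can be sketched directly in base R. This is a minimal AdaBoost-style illustration on toy data (decision stumps as the weak predictors), written for this slide as an assumption of what "weight them and add them up" looks like; it is not the gbm implementation used later.

```r
# Minimal AdaBoost-style sketch (illustration only): weak learners are
# decision stumps, each stump is fit to reweighted data, and the final
# prediction is the sign of the weighted sum of stump votes.
set.seed(1)
n <- 200
x <- matrix(runif(n * 2), ncol = 2)
y <- ifelse(x[, 1] + x[, 2] > 1, 1, -1)          # toy labels in {-1, +1}

w <- rep(1 / n, n)                               # initial example weights
alphas <- numeric(0); stumps <- list()

for (m in 1:10) {
  best <- list(err = Inf)
  for (j in 1:2) for (s in seq(0.1, 0.9, by = 0.1)) {
    pred <- ifelse(x[, j] > s, 1, -1)            # candidate weak predictor
    err  <- sum(w * (pred != y))                 # weighted error
    if (err < best$err) best <- list(err = err, j = j, s = s, pred = pred)
  }
  alpha <- 0.5 * log((1 - best$err) / best$err)  # weight of this weak learner
  w <- w * exp(-alpha * y * best$pred)           # upweight misclassified points
  w <- w / sum(w)
  alphas <- c(alphas, alpha)
  stumps[[m]] <- best[c("j", "s")]
}

# Combined (stronger) predictor: weighted sum of the weak predictors
score <- rowSums(sapply(seq_along(stumps), function(m) {
  alphas[m] * ifelse(x[, stumps[[m]]$j] > stumps[[m]]$s, 1, -1)
}))
acc <- mean(sign(score) == y)                    # training accuracy of the ensemble
acc
```

Any single stump here is a weak predictor; the weighted vote of ten stumps recovers the diagonal boundary much more accurately.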
Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health
http://webee.technion.ac.il/people/rmeir/BoostingTutorial.pdf
library(ISLR); data(Wage); library(ggplot2); library(caret)

# Drop logwage -- it is just a transform of the outcome we want to predict
Wage <- subset(Wage, select = -c(logwage))

# 70/30 train/test split (set a seed so the split is reproducible)
set.seed(32343)
inTrain <- createDataPartition(y = Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain, ]; testing <- Wage[-inTrain, ]
# Fit boosted regression trees (gbm = gradient boosting machine)
modFit <- train(wage ~ ., method = "gbm", data = training, verbose = FALSE)
print(modFit)
2102 samples
10 predictors
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 2102, 2102, 2102, 2102, 2102, 2102, ...
Resampling results across tuning parameters:
interaction.depth  n.trees  RMSE  Rsquared  RMSE SD  Rsquared SD
1                  50       30    0.3       1        0.02
1                  100      30    0.3       1        0.02
1                  200      30    0.3       1        0.02
2                  50       30    0.3       1        0.02
2                  100      30    0.3       1        0.02
2                  200      30    0.3       1        0.02
3                  50       30    0.3       1        0.02
3                  100      30    0.3       1        0.02
3                  200      30    0.3       1        0.02
Tuning parameter 'shrinkage' was held constant at a value of 0.1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were interaction.depth = 2, n.trees = 100 and shrinkage = 0.1.
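caret built the 3 x 3 grid above on its own; you can instead pass your own grid via `tuneGrid`. The values below mirror the printed table and are an assumption about what you might explore, not the course's required settings (note: recent caret/gbm versions also require `n.minobsinnode` in the grid, which the output above predates).

```r
# Hypothetical tuning grid matching the 9 combinations printed above
gbmGrid <- expand.grid(interaction.depth = c(1, 2, 3),
                       n.trees           = c(50, 100, 200),
                       shrinkage         = 0.1,
                       n.minobsinnode    = 10)
nrow(gbmGrid)   # 9 parameter combinations

# modFit2 <- train(wage ~ ., method = "gbm", data = training,
#                  tuneGrid = gbmGrid, verbose = FALSE)
```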
# Plot predicted vs. observed wage on the held-out test set
qplot(predict(modFit, testing), wage, data = testing)
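Alongside the plot, you can put a number on test-set performance. A small hand-rolled helper (the `rmse()` function below is ours, not a caret export) makes the same RMSE metric caret optimized available on the test set:

```r
# Root mean squared error, computed by hand
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# On the test set (commented out; requires modFit and testing from above):
# rmse(predict(modFit, testing), testing$wage)

rmse(c(1, 2, 3), c(1, 2, 4))   # tiny worked example
```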