The goal of LogisticEnsembles is to perform a complete analysis of logistic data. The package automatically returns 36 models (23 individual models and 13 ensembles of models).
Installation
You can install the development version of LogisticEnsembles like so:
devtools::install_github("InfiniteCuriosity/LogisticEnsembles")
Example
This is a basic example which shows you how to solve a common problem:
library(LogisticEnsembles)
Logistic(data = SAHeart,
colnum = 10,
numresamples = 2,
how_to_handle_strongs = 1,
do_you_have_new_data = "N",
save_all_trained_models = "N",
remove_ensemble_correlations_greater_than = 1.00,
use_parallel = "N",
train_amount = 0.60,
test_amount = 0.20,
validation_amount = 0.20)
Each of the 36 models returns a probability between 0 and 1. Each model is fit on the training set, then makes predictions and has its accuracy measured on the test and validation sets.
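The train/test/validation flow described above can be sketched in base R. This is a minimal illustration only, not the package's internal code: the synthetic `toy` data frame, the single `glm()` model, the helper `accuracy_at()`, and the 0.5 classification threshold are all assumptions made for the example.

```r
# Sketch only: synthetic data and a single logistic model stand in for
# the package's 36 models. All names here are hypothetical.
set.seed(42)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x1 - 1.0 * toy$x2))

# Shuffle the row indices, then cut them 60% / 20% / 20%
shuffled <- sample(n)
n_train <- round(0.60 * n)
n_test <- round(0.20 * n)
train <- toy[shuffled[1:n_train], ]
test <- toy[shuffled[(n_train + 1):(n_train + n_test)], ]
validation <- toy[shuffled[(n_train + n_test + 1):n], ]

# Fit on the training set only
fit <- glm(y ~ x1 + x2, data = train, family = binomial)

# Predicted probabilities lie between 0 and 1; the 0.5 cutoff is an
# assumption used here to turn them into class predictions
accuracy_at <- function(model, newdata, threshold = 0.5) {
  prob <- predict(model, newdata = newdata, type = "response")
  mean((prob > threshold) == (newdata$y == 1))
}

test_accuracy <- accuracy_at(fit, test)
validation_accuracy <- accuracy_at(fit, validation)
```

Comparing `test_accuracy` and `validation_accuracy` against the accuracy on `train` is one simple way to spot over- or underfitting in a sketch like this.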
The list of 36 logistic models:
- ADA Boost
- Bagged Random Forest
- Bagging
- BayesGLM
- BayesRNN
- C50
- Cubist
- Ensemble ADA Boost
- Ensemble Bagging
- Ensemble C50
- Ensemble Gradient Boosted
- Ensemble Partial Least Squares
- Ensemble Penalized Discriminant Analysis
- Ensemble Random Forest
- Ensemble Ranger
- Ensemble Regularized Discriminant Analysis
- Ensemble RPart
- Ensemble Support Vector Machines
- Ensemble Trees
- Ensemble XGBoost
- Flexible Discriminant Analysis
- Generalized Additive Models
- Generalized Linear Models
- Gradient Boosted
- Linear Discriminant Analysis
- Linear Model
- Mixed Discriminant Analysis
- Naive Bayes
- Penalized Discriminant Analysis
- Quadratic Discriminant Analysis
- Random Forest
- Ranger
- RPart
- Support Vector Machines
- Trees
- XGBoost
The 13 plots automatically created by the package are:
- Target vs each feature (multiple barcharts)
- Boxplots of the numeric data
- Over or underfitting barchart
- Duration barchart
- Overfitting by model and resample
- Model accuracy barchart
- Accuracy by model and resample
- Accuracy by model
- ROC curves
- Pairwise scatterplots
- Correlation of the data as circles by color and size
- Correlation of the data by color and number
The tables and reports automatically created:
1. Summary report. This includes the Model, Accuracy, True Positive, True Negative, False Positive, False Negative, Positive Predictive Value, Negative Predictive Value, F1 score, Area under the curve, overfitting min, overfitting mean, overfitting max, and duration.
2. Data summary
3. Head of the ensemble
4. Correlation of the ensemble
5. Correlation of the data
6. Head of the data frame
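For readers unfamiliar with the confusion-matrix columns in the summary report, the sketch below shows how those quantities are conventionally computed in base R. The `actual` and `predicted` vectors are made-up illustration data, and the code uses the standard textbook definitions; it is not taken from the package's source.

```r
# Hypothetical labels: 1 = positive class, 0 = negative class
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

# Confusion-matrix counts
true_positive  <- sum(predicted == 1 & actual == 1)
true_negative  <- sum(predicted == 0 & actual == 0)
false_positive <- sum(predicted == 1 & actual == 0)
false_negative <- sum(predicted == 0 & actual == 1)

# Standard definitions of the derived metrics
accuracy <- (true_positive + true_negative) / length(actual)
positive_predictive_value <- true_positive / (true_positive + false_positive)
negative_predictive_value <- true_negative / (true_negative + false_negative)
sensitivity <- true_positive / (true_positive + false_negative)
f1_score <- 2 * positive_predictive_value * sensitivity /
  (positive_predictive_value + sensitivity)
```

With these ten observations, the counts are 3 true positives, 4 true negatives, 2 false positives, and 1 false negative, giving an accuracy of 0.7.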
The package also returns all 36 summary reports, alphabetical by model.