Automatically Returns 36 Logistic Models (23 Individual Models and 13 Ensembles of Models) • LogisticEnsembles

The goal of LogisticEnsembles is to perform a complete analysis of logistic data. The package automatically returns 36 models (23 individual models and 13 ensembles of models).

Installation

You can install the development version of LogisticEnsembles like so:

devtools::install_github("InfiniteCuriosity/LogisticEnsembles")

Example

This basic example runs the full analysis on the SAHeart data set:

library(LogisticEnsembles)
Logistic(data = SAHeart,
    colnum = 10,
    numresamples = 2,
    how_to_handle_strongs = 1,
    do_you_have_new_data = "N",
    save_all_trained_models = "N",
    remove_ensemble_correlations_greater_than = 1.00,
    use_parallel = "N",
    train_amount = 0.60,
    test_amount = 0.20,
    validation_amount = 0.20)

Each of the 36 models returns a probability between 0 and 1. Each model is fit on the training set, then makes predictions and measures accuracy on the test and validation sets.
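To make the train/test/validation idea concrete, here is a minimal base R sketch for a single logistic model. It is not the package's internal code: the simulated data, the 0.5 classification threshold, and the helper names `predict_prob` and `accuracy` are all assumptions made for illustration. Only the 0.60/0.20/0.20 proportions are taken from the example call.

```r
# Sketch only: one logistic model on simulated data, split 60/20/20.
set.seed(42)
n <- 200
x <- rnorm(n)
df <- data.frame(x = x, y = rbinom(n, 1, plogis(0.5 + 1.2 * x)))

# Shuffle the row indices, then cut them into three disjoint sets.
idx <- sample(n)
n_train <- round(0.60 * n)
n_test <- round(0.20 * n)
train <- df[idx[1:n_train], ]
test <- df[idx[(n_train + 1):(n_train + n_test)], ]
validation <- df[idx[(n_train + n_test + 1):n], ]

# Fit on the training set only.
fit <- glm(y ~ x, data = train, family = binomial)

# type = "response" returns probabilities between 0 and 1.
predict_prob <- function(newdata) {
  predict(fit, newdata = newdata, type = "response")
}

# Assumed rule: classify as 1 when the predicted probability exceeds 0.5.
accuracy <- function(newdata) {
  mean((predict_prob(newdata) > 0.5) == (newdata$y == 1))
}

test_accuracy <- accuracy(test)
validation_accuracy <- accuracy(validation)
```

Comparing `test_accuracy` and `validation_accuracy` with the accuracy on `train` is one common way to spot over- or underfitting, though the package's own overfitting measure may be defined differently.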

The list of 36 logistic models:

ADA Boost
Bagged Random Forest
Bagging
BayesGLM
BayesRNN
C50
Cubist
Ensemble ADA Boost
Ensemble Bagging
Ensemble C50
Ensemble Gradient Boosted
Ensemble Partial Least Squares
Ensemble Penalized Discriminant Analysis
Ensemble Random Forest
Ensemble Ranger
Ensemble Regularized Discriminant Analysis
Ensemble RPart
Ensemble Support Vector Machines
Ensemble Trees
Ensemble XGBoost
Flexible Discriminant Analysis
Generalized Additive Models
Generalized Linear Models
Gradient Boosted
Linear Discriminant Analysis
Linear Model
Mixed Discriminant Analysis
Naive Bayes
Penalized Discriminant Analysis
Quadratic Discriminant Analysis
Random Forest
Ranger
RPart
Support Vector Machines
Trees
XGBoost

The plots automatically created by the package are:

Target vs each feature (multiple barcharts)
Boxplots of the numeric data
Over or underfitting barchart
Duration barchart
Overfitting by model and resample
Model accuracy barchart
Accuracy by model and resample
Accuracy by model
ROC curves
Pairwise scatterplots
Correlation of the data as circles by color and size
Correlation of the data by color and number

The tables and reports automatically created:

1. Summary report. This includes the Model, Accuracy, True Positive, True Negative, False Positive, False Negative, Positive Predictive Value, Negative Predictive Value, F1 score, Area under the curve, overfitting min, overfitting mean, overfitting max, and duration.
2. Data summary
3. Head of the ensemble
4. Correlation of the ensemble
5. Correlation of the data
6. Head of the data frame
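As a reference for reading the summary report, the sketch below shows the standard textbook definitions of several of its metrics, computed from the four confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical. Whether the package reports counts or rates, and the exact formulas it uses, are not stated here, so treat this as an illustration rather than the package's implementation.

```r
# Sketch only: textbook metrics from confusion-matrix counts.
# tp, tn, fp, fn are counts of true/false positives and negatives.
classification_metrics <- function(tp, tn, fp, fn) {
  accuracy <- (tp + tn) / (tp + tn + fp + fn)
  ppv <- tp / (tp + fp)      # positive predictive value (precision)
  npv <- tn / (tn + fn)      # negative predictive value
  recall <- tp / (tp + fn)   # true positive rate, needed for F1
  f1 <- 2 * ppv * recall / (ppv + recall)
  c(accuracy = accuracy, ppv = ppv, npv = npv, f1 = f1)
}

# Made-up counts for illustration, not package output.
m <- classification_metrics(tp = 40, tn = 45, fp = 5, fn = 10)
```

With these counts the accuracy is 85 / 100 = 0.85 and the F1 score is 80 / 95, since F1 simplifies to 2·TP / (2·TP + FP + FN).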

The package also returns all 36 summary reports, alphabetical by model.