Overview

autoCaret is an R package for helping business analysts and other enthusiasts understand how to begin building predictive models in R via automation. View our intro presentation!

It leverages and wraps features and functionality provided by both the caret (short for Classification And REgression Training) and caretEnsemble R packages, in an effort to provide a simple programmatic interface for analysts who would like to begin working on binary classification problems.

The package also includes an intuitive graphical interface, in the form of an RStudio Add-In, that offers an easy introduction to the package’s main functionality (producing an ensemble model via autoCaret::autoModel()) in order to help speed the learning and development process.


Why autoCaret?

1. It’s easy to believe that machine learning is hard

We fundamentally believe that the best ideas and concepts in machine learning are simple, but that the current literature and the difficulty of “getting started” with machine learning sometimes put up walls around these ideas.

We think:

  • People should be empowered to use machine learning via simple tools
  • These tools should allow for a linear path enabling their users to graduate into using the underlying language(s) used to make them
  • The focus should be on the analyst’s understanding of what is happening. Said differently: robust explanation and summarization of the modeling process is best


2. Making machine learning easier is important

Machine learning is a field that will only continue to pervade modern life. We think that additional tools need to be built to get analysts, who might not have much experience using R or other programming languages, engaged and excited about using machine learning in their day-to-day work.

Why is this so important?

  • Machine learning can allow for automated decision making - freeing up human time to work on tasks requiring creativity
  • Machine learning methods add to current heuristic or rule-based approaches; most fundamentally, they make it possible to “learn” without being explicitly programmed
  • Machine learning is reshaping many industries. Analysts and others have a real need to understand what is going on “behind the curtain”, since most people use it daily without even noticing


3. It’s feasible to make machine learning even easier!

While R tools like caret have brought us a long way in the effort to standardize many of the commonly repeated parts of the process required for building predictive models, there is no reason we can’t further streamline this process.

Python tools like TPOT have attempted to do this using genetic programming. Additionally, there are a number of proprietary tools built by companies like BigML and DataRobot that also seek to automate machine learning tasks.

The autoCaret package intends to take an analogous but simpler approach:

  • We have an initial focus on binary classification
  • We expose a minimal set of functions to the user and allow the returned objects to be used with built-in generics like summary() and predict()
  • We make available a dead simple GUI for those who’d prefer to use one over the command line (at first, at least, we hope)
  • We’re open source and on GitHub. Send a pull request!
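
To show what working with generics like summary() and predict() looks like in practice, here is a minimal sketch in base R. It does not use autoCaret: a plain logistic regression fitted with glm() on the built-in mtcars data stands in for a fitted model object, and the 0.5 cutoff is an arbitrary choice made for illustration.

```r
# Illustration only: a base R logistic regression stands in for a fitted
# model object. This is not autoCaret's own return type.
data(mtcars)

# Binary outcome: am (0 = automatic, 1 = manual), with two predictors.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# summary() dispatches to summary.glm() because of the object's class.
fit_summary <- summary(fit)
print(class(fit_summary))

# predict() dispatches to predict.glm(); type = "response" returns probabilities.
probs <- predict(fit, newdata = mtcars, type = "response")

# Turn probabilities into class labels with a 0.5 cutoff (an arbitrary choice).
predicted <- ifelse(probs > 0.5, 1, 0)
accuracy <- mean(predicted == mtcars$am)
print(accuracy)
```

The same two calls work on any object whose class provides summary and predict methods, which is why exposing them keeps the interface small.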


Where does autoCaret fit?

Both sourcing raw data and cleaning it are the responsibility of the end user. Otherwise, many of the most tedious parts of the predictive model process are covered by autoCaret! Be sure to see the getting started guide included with the package or explore other examples shown on this page.

There are too many possibilities for an R package like autoCaret to automate the data cleaning process with acceptable performance, or even to begin to. There is, however, a great wealth of tools that do help and should be explored; the GitHub page RStartHere is a great place to begin getting acquainted!
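
As a rough idea of the kind of cleaning that remains the user’s job, the sketch below tidies a small made-up data frame in base R. The column names and values are invented for illustration, and the steps shown (parsing numbers, dropping incomplete rows, encoding the outcome as a two-level factor) are only one common convention, not a requirement stated by the package.

```r
# Hypothetical raw data: column names and values are invented for illustration.
raw <- data.frame(
  age     = c(34, NA, 51, 29, 42, 38),
  income  = c("52000", "61000", "n/a", "48000", "75000", "58000"),
  churned = c("yes", "no", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Convert text to numbers; unparseable entries such as "n/a" become NA.
raw$income <- suppressWarnings(as.numeric(raw$income))

# Drop rows with any missing value.
clean <- raw[complete.cases(raw), ]

# Encode the outcome as a two-level factor, a common convention for
# classification problems in R.
clean$churned <- factor(clean$churned, levels = c("no", "yes"))

str(clean)
```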