Better Machine Learning, End-to-End

Prepare Your Data For Modeling

Munge Tool Description
Data Profiling Quickly summarize the shape of your dataset to spot bias or missing information before you start building your model. Missing data, zero values, text, and the distribution of the data are visualized automatically upon data ingestion.
Summary Statistics View summary statistics for your data: mean, standard deviation, min, max, cardinality, and quantiles, along with a preview of the data set.
Aggregate, Filter, Bin, and Derive Columns Build unique views with Group functions, Filtering, Binning, and Derived Columns.
Slice, Log Transform, and Anonymize Normalize, anonymize, and partition to get your data into the right shape for modeling.
Variable Creation Create highly customizable variable values to home in on the key data characteristics to model.
PCA Principal Component Analysis makes dimensionality reduction easy with a simple-to-use interface and standard input values.
Training and Validation Sampling Plan Design a random or stratified sampling plan to generate data sets for model training and scoring.
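
As an illustration of the stratified sampling plan described above, here is a minimal sketch in plain Python. It is not the product's implementation: the function name `stratified_split`, its parameters, and the toy data are all assumptions made for this example. The idea is to shuffle each class separately and cut each one at the same fraction, so the training and validation sets keep the class proportions of the full data set.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.75, seed=42):
    """Split rows into (train, validation), preserving class proportions.

    Hypothetical helper for illustration; `label_of` maps a row to its class.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, validation = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # random order within each class
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


# Toy data (made up): 80 negatives and 20 positives.
rows = [{"id": i, "label": int(i >= 80)} for i in range(100)]
train, validation = stratified_split(rows, lambda r: r["label"])

print(len(train), len(validation))
print(sum(r["label"] for r in train), sum(r["label"] for r in validation))
```

A purely random plan would shuffle all rows together and cut once; with a rare class, that can leave the validation set with very few positive cases, which is the situation stratification is meant to avoid.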


Model with State-of-the-Art Machine Learning Algorithms

Model Description
Generalized Linear Models (GLM) A flexible generalization of ordinary linear regression for response variables that have error distribution models other than a normal distribution. GLM unifies various other statistical models, including linear, logistic, Poisson, and more.
Random Forest A robust ensemble algorithm that employs the power of statistical sampling and averaging and uses decision trees to build supervised models.
Gradient Boosting (GBM) A method to produce a prediction model in the form of an ensemble of weak prediction models. It builds the model in a stage-wise fashion and is generalized by allowing an arbitrary differentiable loss function. It is one of the most powerful methods available today.
K-Means A method to uncover groups or clusters of data points, often used for segmentation. It partitions observations into k clusters, assigning each observation to the cluster with the nearest mean.
Anomaly Detection Identify the outliers in your data by invoking a powerful pattern recognition model.
Deep Learning Model high-level abstractions in data by using non-linear transformations in a layer-by-layer method. Deep learning can be used for supervised or unsupervised learning; in its unsupervised form, it can make use of unlabeled data that other algorithms cannot.
Naïve Bayes A probabilistic classifier that assumes the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. It is often used in text categorization.
Hyper-Parameter Search Automatically find the best parameters for your models with Cartesian or Random hyper-parameter searches, fully featured with automatic convergence-based stopping and support for validation and cross-validation.
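
The difference between Cartesian and random hyper-parameter search can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the helper names, the parameter names `max_depth` and `learn_rate`, the toy `score` function (which stands in for a real validation metric), and the simple patience-based stopping rule, which is only one possible reading of "convergence-based stopping".

```python
import itertools
import random


def cartesian_grid(space):
    """Yield every combination of the listed hyper-parameter values."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def random_grid(space, max_models, seed=0):
    """Return up to max_models distinct combinations chosen at random."""
    combos = list(cartesian_grid(space))
    rng = random.Random(seed)
    return rng.sample(combos, min(max_models, len(combos)))


def search(candidates, score, patience=3, tolerance=1e-3):
    """Evaluate candidates in order and keep the best one.

    Stops early once `patience` consecutive candidates fail to beat the
    best score so far by more than `tolerance`.
    """
    best_params, best_score, stale, evaluated = None, float("-inf"), 0, 0
    for params in candidates:
        current = score(params)
        evaluated += 1
        if current > best_score + tolerance:
            best_params, best_score, stale = params, current, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_params, best_score, evaluated


def score(params):
    """Toy objective (made up); higher is better, peak at depth 5, rate 0.1."""
    return -(params["max_depth"] - 5) ** 2 - 100 * (params["learn_rate"] - 0.1) ** 2


space = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1, 0.3]}

# Cartesian search: all 9 combinations, with patience high enough to visit each.
full_best, full_score, full_count = search(cartesian_grid(space), score, patience=9)

# Random search: a budget of 4 models drawn from the same space.
sampled = random_grid(space, max_models=4)
rand_best, rand_score, rand_count = search(sampled, score, patience=4)

print(full_best, full_count)
print(rand_best, rand_count)
```

Cartesian search is exhaustive, so its cost grows with the product of the list lengths; random search trades the guarantee of finding the best grid point for a fixed model budget.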


Score Models with Confidence

Model Metrics and Scoring Tools Description
Predict Generate outcomes of a data set with any model. Predict with GLM, GBM, Decision Trees or Deep Learning models.
Confusion Matrix Visualize the performance of an algorithm in a table to understand how a model performs.
AUC The area under the ROC curve, a plot of a model's sensitivity (true positive rate) against its false positive rate, used to compare models and select the best one.
HitRatio A classification matrix to visualize the ratio of correctly classified to incorrectly classified cases.
Gains/Lift Table Gains and Lift charts visualize the relative improvement of the top-ranked predictions over the baseline, and are useful for ranking applications such as marketing campaigns.
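
To make the metrics above concrete, the following sketch computes a confusion matrix, AUC, and a gains/lift table for a binary classifier in plain Python. The function names, the 0.5 threshold, the five-group split, and the ten toy observations are assumptions chosen for this example, not the product's defaults. AUC is computed here as the share of (positive, negative) pairs in which the positive case receives the higher score, which equals the area under the ROC curve.

```python
def confusion_matrix(actual, scores, threshold=0.5):
    """Count true/false positives and negatives at a score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def auc(actual, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_table(actual, scores, groups=5):
    """Rank by score, split into equal groups, compare each to the baseline."""
    ranked = [y for _, y in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]
    total_pos = sum(ranked)
    baseline = total_pos / len(ranked)  # overall response rate
    table, captured = [], 0
    for g in range(groups):
        chunk = ranked[g * len(ranked) // groups:(g + 1) * len(ranked) // groups]
        positives = sum(chunk)
        captured += positives
        rate = positives / len(chunk)
        table.append({
            "group": g + 1,
            "response_rate": rate,
            "lift": rate / baseline,
            "cumulative_gain": captured / total_pos,
        })
    return table


# Toy data (made up): true labels and model scores for ten cases.
actual = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

cm = confusion_matrix(actual, scores)
area = auc(actual, scores)
table = lift_table(actual, scores)

print(cm)
print(area)
for row in table:
    print(row)
```

In this toy data, the top-scored fifth of the cases responds at twice the overall rate (a lift of 2.0) and captures 40% of all positives, which is the kind of reading a marketing campaign would take from the first row of a gains/lift table.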