This blog post was authored by Soniya Shah. Vertica 9.0.1 introduces new functionality that continues to match our goals for fast-paced development and enhancement of machine learning in Vertica. In this release, we introduce support for random forest for regression, a new statistical summary function, increased support for cross validation, and enhancements for data evaluation. […]

This blog post was authored by Vincent Xu. In this blog post, I’ll take you through the exercise I did to estimate the price of a diamond based on its characteristics, using the linear regression algorithm in Vertica. Besides Vertica 9.0, I used Tableau for charting and DbVisualizer as the SQL editor. From this exercise, […]

This blog post was authored by Vincent Xu. Vertica 9.0 is out and here is the updated Vertica machine learning cheat sheet. Vertica 9.0 introduces a slew of new machine learning features including one-hot encoding, Lasso regression, cross validation, model import/export, and many more. See the cheat sheet for examples of how to use the […]

This blog post was authored by Soniya Shah. Vertica 9.0 introduces new functionality that continues to match our goals for fast-paced development of the existing machine learning functions. In this release, we introduce two new summary functions, support for cross validation, support for one hot encoding, and the ability to import and export your models […]

This blog post was authored by Soniya Shah. Vertica 8.1.1 continues with the fast-paced development for machine learning. In this release, we introduce the highly-requested random forest algorithm. We added support for SVM to include SVM for regression, in addition to the existing SVM for classification algorithm. L2 regularization was added to both the linear […]

This blog post was authored by Vincent Xu. As of Vertica 8.1, Vertica has introduced a set of popular machine learning algorithms, including Linear Regression, Logistic Regression, Kmeans, Naïve Bayes, and SVM. Based on our recent benchmarks, they run faster than MLlib on Apache Spark. The following chart shows the performance difference between Vertica 8.1.0 […]

This blog post was authored by Vincent Xu. This post is part of our Machine Learning Mondays series. Stay tuned for more! Introduction Machine learning (ML) is an iterative process. From understanding data, preparing data, building models, testing models to deploying models, every step of the way requires careful examination and manipulation of the data. […]

This blog post was authored by Soniya Shah. Overall, you will notice that Machine Learning for Predictive Analytics, introduced in Vertica 7.2.2, is more accessible to use in Vertica 8.1, with the addition of several important functions. There are improvements to model management with access control ability to save and re-apply normalization parameters, missing value […]

This blog post is based on a white paper authored by Maurizio Felici. What is Logistic Regression? Logistic regression is a popular machine learning algorithm used for binary classification. Logistic regression labels a sample with one of two possible classes, given a set of predictors in the sample. Optionally, the output can be the probability […]

The content of this blog is based on a white paper that was authored by Maurizio Felici. What is k-means Clustering? K-means clustering is an unsupervised learning algorithm that clusters data into groups based on their similarity. Using k-means, you can find k clusters of data, represented by centroids. As the user, you select the […]