Non-Linear Support Vector Machines: The Kernel Trick
Support Vector Machines (SVMs) are powerful algorithms for classification and regression tasks. While linear SVMs work well when data is linearly separable, many real-world datasets are not. Non-linear SVMs, leveraging the 'kernel trick', extend the capabilities of SVMs to handle these complex, non-linear relationships. This tutorial explores the theory behind non-linear SVMs and demonstrates their practical application using Python.
Understanding Non-Linear Separability
Linear SVMs find a hyperplane that best separates data points into different classes. However, when data points are intertwined and cannot be separated by a straight line (in 2D) or a hyperplane (in higher dimensions), a linear SVM fails. Non-linear SVMs address this by mapping the data into a higher-dimensional space where a linear separation is possible. Imagine trying to separate two intertwined circles of data points on a 2D plane. A straight line cannot do the job. But, if you could somehow transform this data into 3D space such that one circle sits 'above' the other, then a simple plane could separate them. This is the essence of non-linear SVMs.
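To make this concrete, here is a small sketch (not part of the main snippet below; the lifted feature z = x1² + x2² is chosen purely for illustration) that appends the squared distance from the origin as a third feature. After this lift, the inner circle sits low and the outer circle sits high, so a plain linear SVM separates them:
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Two concentric circles: not separable by any straight line in 2D
X, y = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=42)
# Hand-crafted lift to 3D: z = x1^2 + x2^2 (squared distance from origin).
# The inner circle gets small z values, the outer circle large ones.
z = (X ** 2).sum(axis=1)
X_3d = np.column_stack([X, z])
# A plain linear SVM now separates the lifted data almost perfectly
linear_clf = SVC(kernel='linear').fit(X_3d, y)
print('Training accuracy after the lift:', linear_clf.score(X_3d, y))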
The Kernel Trick: Implicit Feature Mapping
Instead of explicitly transforming the data to a higher-dimensional space (which can be computationally expensive), non-linear SVMs use a clever mathematical shortcut called the 'kernel trick'. A kernel function computes the dot product of data points in the higher-dimensional space without ever performing the transformation, which is what keeps non-linear SVMs efficient. Common kernel functions include the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. The RBF kernel is a popular choice due to its flexibility and ability to handle complex decision boundaries.
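As a quick numerical check of this idea (a sketch added here, not from the snippet below), the degree-2 polynomial kernel K(x, y) = (x · y)² on 2D inputs equals the ordinary dot product of the explicit feature map φ(x) = (x1², √2·x1·x2, x2²), so the kernel returns the higher-dimensional dot product without ever building φ:
import numpy as np
def phi(v):
    # Explicit degree-2 feature map for a 2D vector
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])
x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
kernel_value = np.dot(x, y) ** 2          # kernel trick: stay in 2D
explicit_value = np.dot(phi(x), phi(y))   # explicit mapping to 3D
print(kernel_value, explicit_value)       # both print 16.0
The RBF kernel plays the same role, except that its implicit feature space is infinite-dimensional, which is why it is never computed explicitly.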
Code Example: Non-Linear SVM with RBF Kernel in Scikit-learn
This code snippet demonstrates how to use a non-linear SVM with the RBF kernel in Scikit-learn. It generates a sample dataset of two intertwined circles, splits it into training and testing sets, trains an SVM classifier with the RBF kernel, and then evaluates the model's accuracy. Two parameters are important here: gamma controls the 'width' of the RBF kernel and therefore the shape of the decision boundary, while C is the regularization parameter; the larger C is, the harder the classifier tries to classify every training example correctly.
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np
# Generate non-linear data (circles)
X, y = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=42)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create an SVM classifier with RBF kernel
clf = svm.SVC(kernel='rbf', gamma=0.7, C=1.0)
# Train the classifier
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Evaluate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Plot the decision boundary (optional)
def plot_decision_boundary(X, y, model):
    h = 0.02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the class for every point on the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM with RBF Kernel Decision Boundary')
    plt.show()

plot_decision_boundary(X, y, clf)
Concepts Behind the Snippet
The core concept is using the RBF kernel as a similarity function: for each pair of data points, the RBF kernel measures their similarity based on their distance. The gamma parameter controls how far the influence of a single training example reaches; a smaller gamma means a larger radius of influence, leading to smoother decision boundaries. The make_circles function creates a synthetic dataset specifically designed to be non-linearly separable.
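One illustrative way to see gamma's effect (a sketch reusing the imports and the X_train/X_test split from the snippet above; the gamma values are arbitrary) is to sweep it and compare train versus test accuracy:
for gamma in [0.01, 0.7, 10, 1000]:
    model = svm.SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X_train, y_train)
    print(f'gamma={gamma}: train={model.score(X_train, y_train):.2f}, '
          f'test={model.score(X_test, y_test):.2f}')
Very large gamma values typically push training accuracy toward 1.0 while test accuracy stalls or drops, the classic overfitting signature; very small values underfit in both.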
Real-Life Use Case Section
Non-linear SVMs are used in many applications, including image classification, handwriting recognition, text categorization, and bioinformatics tasks such as protein classification. In medical imaging specifically, non-linear SVMs can be used to detect tumors or other anomalies in MRI or CT scans, where the boundaries are often irregular and non-linear.
Best Practices
Here are some best practices for using non-linear SVMs:
- Scale or standardize your features first; the RBF kernel is distance-based, so features on different scales distort it.
- Tune gamma and C together with cross-validation rather than setting them by hand.
- Start with the RBF kernel unless domain knowledge suggests another kernel.
- Watch for overfitting: very large gamma or C values can produce decision boundaries that memorize the training data.
A scaling-plus-cross-validation sketch follows this list.
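For instance, the scaling and cross-validation advice can be combined with a Pipeline so the scaler is fit only on the training folds (a minimal sketch using standard scikit-learn utilities):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=42)
# Scaling inside the pipeline prevents test-fold leakage during CV
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma=0.7, C=1.0))
scores = cross_val_score(pipe, X, y, cv=5)
print('Mean CV accuracy:', scores.mean())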
Interview Tip
When discussing non-linear SVMs in an interview, be prepared to explain what the kernel trick is, why it avoids an explicit transformation to a higher-dimensional space, what the gamma and C parameters control, and how common kernels such as RBF and polynomial differ. Be able to articulate the pros and cons of using non-linear SVMs compared to other classification algorithms.
When to Use Them
Use non-linear SVMs when the classes are not linearly separable, when the dataset is small to medium sized (kernel methods scale poorly with many samples), and when predictive accuracy matters more than interpretability. Consider starting with a linear SVM and moving to a non-linear SVM only if the linear model performs poorly; inspecting the data visually can also give you clues about linearity, as the comparison sketch below illustrates.
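A quick way to apply the 'start linear' advice is to fit both kernels on the same split and compare test accuracy (a sketch reusing the circles dataset from this tutorial):
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
for kernel in ['linear', 'rbf']:
    score = SVC(kernel=kernel).fit(X_train, y_train).score(X_test, y_test)
    print(f'{kernel} kernel test accuracy: {score:.2f}')
# A large gap in favor of 'rbf' is a strong hint the data is not linearly separable.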
Memory Footprint
The memory footprint of an SVM can be significant, especially with the RBF kernel. The algorithm typically stores support vectors, which are a subset of training data points. In the worst-case scenario, with a very complex dataset, nearly all training examples become support vectors. The more support vectors, the higher the memory consumption and the slower the prediction time. Strategies to reduce the number of support vectors include feature selection or dimensionality reduction techniques. Using linear kernels is also much more memory efficient.
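Scikit-learn exposes the stored support vectors directly, so the memory cost can be inspected on a trained model such as clf from the snippet above:
# n_support_ gives the number of support vectors per class;
# support_vectors_ is the stored array whose size drives memory use.
print('Support vectors per class:', clf.n_support_)
print('Stored array shape:', clf.support_vectors_.shape)
print('Approx. bytes held:', clf.support_vectors_.nbytes)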
Alternatives
Alternatives to non-linear SVMs include decision trees and random forests, gradient boosting, k-nearest neighbors, neural networks, and logistic regression with engineered features. The best choice depends on the specific dataset and the desired trade-off between accuracy, interpretability, and computational cost.
Pros
Pros of non-linear SVMs:
- They model complex, non-linear decision boundaries through the choice of kernel.
- They are effective in high-dimensional feature spaces.
- The kernel trick avoids the cost of explicit feature transformation.
- The decision function depends only on the support vectors, a subset of the training data.
Cons
Cons of non-linear SVMs:
- Training scales poorly with the number of samples, making large datasets expensive.
- Performance is sensitive to the choice of kernel and to the gamma and C hyperparameters.
- The resulting models are hard to interpret compared to linear models or decision trees.
- Probability estimates are not produced directly and require extra calibration.
FAQ
- What is the difference between a linear SVM and a non-linear SVM?
A linear SVM separates data with a hyperplane in the original feature space, while a non-linear SVM uses a kernel function to implicitly map the data into a higher-dimensional space where a linear separation becomes possible.
- When should I use the RBF kernel?
The RBF kernel is a good general-purpose kernel that works well for a wide range of problems. It's often a good starting point when you don't have specific prior knowledge about the data.
- How do I choose the best values for gamma and C?
Use cross-validation techniques like grid search or randomized search to find the optimal values for gamma and C. These techniques systematically evaluate different combinations of parameters and select the combination that performs best on a validation set.
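A minimal grid-search sketch for the classifier in this tutorial might look like the following (the grid values are illustrative, not prescriptive):
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=42)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print('Best parameters:', search.best_params_)
print('Best CV accuracy:', round(search.best_score_, 3))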