Feature Selection vs Feature Extraction
Introduction to Dimensionality Reduction
Dimensionality reduction reduces the number of input features while preserving as much useful information as possible. Two primary approaches are feature selection and feature extraction; this tutorial details the differences between them.
Feature Selection: Choosing the Best Features
Feature selection keeps a subset of the original features and discards the rest, so the features you work with remain the ones you started with.
Feature Extraction: Creating New Features
Feature extraction transforms the original features into a new, smaller set of derived features (for example, principal components), trading some interpretability for compactness.
Key Differences Summarized
| Aspect | Feature Selection | Feature Extraction |
|---|---|---|
| Feature type | Original features | New features (transformations) |
| Data loss | Potentially minimal; only discards features | Potential loss of information during transformation |
| Interpretability | High; uses the original features | Can be lower; new features can be difficult to interpret |
| Computational cost | Varies; can be low for filter methods | Can be computationally intensive depending on the method |
| Examples | SelectKBest, Recursive Feature Elimination (RFE) | Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) |
Code Example: Feature Selection with SelectKBest
This example demonstrates feature selection with the SelectKBest method from scikit-learn. SelectKBest selects the top `k` features based on a scoring function (in this case, f_classif for classification). We load the Iris dataset, split it into training and testing sets, and then apply SelectKBest to select the top 2 features. The fit_transform method is called on the training data to score and select the features, and transform is then applied to the test data so that the same features are kept. Finally, the selected feature indices are printed to show which features were chosen, and the feature scores show how each feature was rated by the score_func.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Select the top 2 features using SelectKBest and f_classif
selector = SelectKBest(score_func=f_classif, k=2)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)
# Print the shapes of the original and selected feature sets
print("Original feature shape:", X_train.shape)
print("Selected feature shape:", X_train_selected.shape)
# Get the indices of the selected features
selected_feature_indices = selector.get_support(indices=True)
print("Selected feature indices:", selected_feature_indices)
# Get the feature scores (available after fitting)
print("Feature scores:", selector.scores_)
Code Example: Feature Extraction with PCA
This example demonstrates feature extraction with Principal Component Analysis (PCA) from scikit-learn, which projects the original features onto a smaller set of uncorrelated components that capture as much variance as possible. As before, the fit_transform method is called on the training data to compute the principal components, and transform is then applied to the test data so that the same transformation is used. The shapes of the original and PCA-transformed feature sets are printed, along with the explained variance ratio, which indicates how much variance each principal component explains.
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Apply PCA to reduce the dimensionality to 2 components
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# Print the shapes of the original and PCA-transformed feature sets
print("Original feature shape:", X_train.shape)
print("PCA feature shape:", X_train_pca.shape)
# Explained variance ratio
print("Explained variance ratio:", pca.explained_variance_ratio_)
Concepts Behind the Snippets
Understanding the underlying principles helps you choose the appropriate technique for a specific problem.
Real-Life Use Cases
Best Practices
Interview Tip
When to Use Them
Memory Footprint
A smaller memory footprint is crucial when deploying models on resource-constrained devices or when handling large datasets. Both feature selection and feature extraction shrink the feature matrix, which directly reduces the memory needed to store and process the data.
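As a rough, illustrative check (assuming the NumPy arrays X_train and X_train_selected from the SelectKBest example above), the saving can be measured directly:
# Compare the raw memory used by the full and the reduced feature matrices
# (X_train and X_train_selected come from the SelectKBest example above)
print("Original feature matrix:", X_train.nbytes, "bytes")
print("Selected feature matrix:", X_train_selected.nbytes, "bytes")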
Alternatives
Pros and Cons of Feature Selection
Pros: keeps the original features, so interpretability stays high; data loss is potentially minimal because features are only discarded; filter methods can be computationally cheap.
Cons: the information in the discarded features is lost entirely, and simple filter methods score features individually, so interactions between features can be missed.
Pros and Cons of Feature Extraction
Pros: can combine information from many original, possibly correlated features into a small number of new features.
Cons: the new features can be difficult to interpret, some information is lost in the transformation, and methods such as PCA or autoencoders can be computationally intensive.
FAQ
When should I use feature selection over feature extraction?
Use feature selection when you want to maintain the interpretability of your features, or when you believe that a subset of the original features contains the most relevant information. Feature selection is also a good choice when computational resources are limited, as it can be less computationally expensive than feature extraction methods.
What are some common techniques for feature selection?
Common techniques for feature selection include:
- Filter methods: SelectKBest, SelectPercentile, VarianceThreshold
- Wrapper methods: Recursive Feature Elimination (RFE), Sequential Feature Selection (a short RFE sketch follows this list)
- Embedded methods: L1 regularization (Lasso), Tree-based feature importance
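As a minimal sketch of the wrapper style, here is RFE wrapped around a logistic regression; the estimator and n_features_to_select=2 are assumptions for illustration, and X_train and y_train come from the earlier snippets.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# RFE repeatedly fits the estimator and drops the weakest feature
# until only n_features_to_select remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
X_train_rfe = rfe.fit_transform(X_train, y_train)
print("Features kept by RFE:", rfe.get_support(indices=True))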
What are some common techniques for feature extraction?
Common techniques for feature extraction include (a short LDA sketch follows this list):
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Non-negative Matrix Factorization (NMF)
- Autoencoders
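For comparison with the PCA example above, here is a minimal LDA sketch, reusing X_train, y_train, and X_test from the earlier snippets. Unlike PCA, LDA is supervised, and its number of components is capped at the number of classes minus one.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# LDA uses the class labels to find directions that best separate the classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
print("LDA feature shape:", X_train_lda.shape)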
How do I evaluate the performance of dimensionality reduction techniques?
Evaluate the performance of dimensionality reduction techniques by comparing the performance of your model with and without dimensionality reduction. Use appropriate metrics such as accuracy, precision, recall, F1-score, or AUC, depending on the type of problem you are solving. Also, consider the computational cost and interpretability of the model.
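A minimal, self-contained sketch of such a comparison on the Iris data (the classifier choice and k=2 are assumptions for illustration):
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
# Compare cross-validated accuracy with and without feature selection
X, y = load_iris(return_X_y=True)
baseline = LogisticRegression(max_iter=1000)
reduced = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print("All features:     ", cross_val_score(baseline, X, y, cv=5).mean())
print("Selected features:", cross_val_score(reduced, X, y, cv=5).mean())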