---
name: ads2-clustering-feature-extraction
description: Use this skill for ADS/ADS2 k-means clustering, neuron-type classification from numeric features, comparing clusters to known labels, elbow plots, adjusted Rand index, simple feature extraction from image-like matrices, MNIST-style row/column summaries, or classification feature preparation. Trigger on kmeans, k-means, clustering, neuron type, hap1, hap2, feature extraction, MNIST, pixels, classification, or cluster validation.
---
# ADS2 Clustering And Feature Extraction
Use this skill for the ADS2 neuron-classification task and, at lower priority, for feature-extraction practice.
## K-Means Workflow
- Import and check data.
- Plot original labels if available.
- Select numeric features.
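- Scale features if their ranges differ; k-means uses Euclidean distance, so a feature with a larger range would otherwise dominate.
- Choose `k`; use the known class count if one is supplied.
- Run `kmeans` with `set.seed()` and `nstart`.
- Plot the clusters.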
- Compare to original labels with a contingency table.
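To see why the scaling step matters, the sketch below uses made-up data (the columns `small` and `large` are hypothetical) where one feature is roughly 100 times more spread out than the other. `scale()` centres each column to mean 0 and rescales it to standard deviation 1, so neither feature dominates the distances `kmeans` works with.

```r
# Hypothetical data: two features on very different scales.
set.seed(1)
x <- data.frame(
  small = rnorm(100, mean = 0, sd = 1),
  large = rnorm(100, mean = 500, sd = 100)
)

# Raw spread: "large" would dominate Euclidean distances.
apply(x, 2, sd)

# After scale(): every column has mean 0 and sd 1.
x_scaled <- scale(x)
round(colMeans(x_scaled), 10)
apply(x_scaled, 2, sd)
```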
## K-Means Template
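```r
# Import and check the data.
dat <- read.csv("vmndata.csv")
str(dat)
head(dat)
colSums(is.na(dat))   # missing values per column; kmeans() fails on NA
sum(duplicated(dat))  # number of fully duplicated rows

# Plot the original classification, one colour per known type.
plot(dat$hap1, dat$hap2, col = as.factor(dat$type), pch = 19,
     xlab = "hap1", ylab = "hap2", main = "Original classification")

# Select the numeric features and standardise them (mean 0, sd 1).
features <- dat[, c("hap1", "hap2")]
features_scaled <- scale(features)

# set.seed() makes the random starts reproducible; nstart = 25 keeps
# the best of 25 random initialisations.
set.seed(123)
k <- 5  # as in this template; use the known class count if supplied
km <- kmeans(features_scaled, centers = k, nstart = 25)
dat$cluster <- factor(km$cluster)

# Plot the clusters in the original (unscaled) feature space.
plot(dat$hap1, dat$hap2, col = dat$cluster, pch = 19,
     xlab = "hap1", ylab = "hap2", main = "K-means clusters")

# Contingency table: rows are known types, columns are clusters.
table(dat$type, dat$cluster)
```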
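Because `vmndata.csv` is only available in the task itself, here is a minimal runnable stand-in for the same workflow using R's built-in `iris` data. It assumes `Species` plays the role of `type` and the two petal measurements play the role of `hap1`/`hap2`, and it takes `k` from the known class count instead of hard-coding it. Plots are left out to keep it short.

```r
# Stand-in data: built-in iris, with Species as the known "type".
dat <- iris
dat$type <- dat$Species

# Select and standardise two numeric features.
features_scaled <- scale(dat[, c("Petal.Length", "Petal.Width")])

# Use the known class count for k (3 species here).
k <- length(unique(dat$type))

set.seed(123)
km <- kmeans(features_scaled, centers = k, nstart = 25)
dat$cluster <- factor(km$cluster)

# Rows: known species; columns: cluster numbers (arbitrary).
tab <- table(dat$type, dat$cluster)
tab
```

Setosa is well separated on petal size, so its 50 flowers should all land in one cluster; versicolor and virginica overlap and are typically split less cleanly.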
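## Elbow Plot

Plot the total within-cluster sum of squares against `k` and look for the "elbow": the value of `k` after which adding more clusters gives only small reductions.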
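```r
# Total within-cluster sum of squares for k = 1..10 (uses features_scaled from above).
set.seed(123)
wss <- sapply(1:10, function(k) {
  kmeans(features_scaled, centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b", xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```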
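The value on the elbow plot's y-axis, `tot.withinss`, is the sum of squared Euclidean distances from every point to the centre of its assigned cluster. This sketch recomputes it by hand on the built-in `iris` petal measurements (an assumed stand-in dataset) and checks it against what `kmeans` reports.

```r
# Scaled stand-in features and a 3-cluster fit.
set.seed(42)
m <- scale(iris[, c("Petal.Length", "Petal.Width")])
km <- kmeans(m, centers = 3, nstart = 25)

# km$centers[km$cluster, ] gives each point's own cluster centre;
# sum the squared differences over all points and features.
manual_wss <- sum((m - km$centers[km$cluster, ])^2)

all.equal(manual_wss, km$tot.withinss)
```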
## Adjusted Rand Index
Use this only if `mclust` is already installed; do not spend exam time installing packages.
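```r
# ARI: 1 = identical partitions, about 0 = chance-level agreement.
if (requireNamespace("mclust", quietly = TRUE)) {
  mclust::adjustedRandIndex(dat$type, dat$cluster)
}
```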
Fallback: compare `table(dat$type, dat$cluster)` with the two scatter plots by eye.
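If `mclust` is not installed, the adjusted Rand index can also be computed in base R from the same contingency table, using the standard Hubert-Arabie pair-counting formula. The function name `ari_base` is hypothetical; this is a sketch, not the `mclust` implementation, and it returns `NaN` when both partitions put every point in a single group.

```r
# Adjusted Rand index from two label vectors, base R only.
ari_base <- function(labels_a, labels_b) {
  tab <- table(labels_a, labels_b)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))           # pairs together in both partitions
  sum_rows  <- sum(choose(rowSums(tab), 2))  # pairs together in labels_a
  sum_cols  <- sum(choose(colSums(tab), 2))  # pairs together in labels_b
  expected  <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Same grouping with different label names: perfect agreement.
ari_base(c(1, 1, 2, 2), c("b", "b", "a", "a"))  # 1
# Groupings that cut across each other: below chance.
ari_base(c(1, 1, 2, 2), c(1, 2, 1, 2))          # -0.5
```

For the neuron data the call would be `ari_base(dat$type, dat$cluster)`. Renaming the clusters does not change the result, which is why ARI suits comparisons against arbitrary cluster numbers.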
## Feature Extraction From 28x28 Images
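```r
# Assumes the usual MNIST CSV layout: no header, label in column 1,
# then 784 pixel values of a 28x28 image stored row by row.
raw <- read.csv("mnist_train.csv", header = FALSE)
labels <- raw[, 1]
pixels <- raw[, -1]

# Rebuild one 28x28 image (byrow = TRUE matches the row-by-row order)
# and summarise it as 28 row means followed by 28 column means.
get_features <- function(row_pixels) {
  mat <- matrix(as.numeric(row_pixels), nrow = 28, byrow = TRUE)
  c(rowMeans(mat), colMeans(mat))
}

# Features for the first 1000 images (fewer if the file is smaller).
n <- min(1000, nrow(pixels))
features <- t(apply(pixels[1:n, ], 1, get_features))
colnames(features) <- c(paste0("row", 1:28), paste0("col", 1:28))
features <- data.frame(label = factor(labels[1:n]), features)
```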
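A quick way to confirm the reshape orientation is to feed `get_features` a synthetic image whose content is known. The sketch below repeats `get_features` so it runs on its own, builds a hypothetical 28x28 image with a vertical bar in column 10, flattens it row by row as an MNIST CSV row would be, and checks that the bar shows up in the column means.

```r
get_features <- function(row_pixels) {
  mat <- matrix(as.numeric(row_pixels), nrow = 28, byrow = TRUE)
  c(rowMeans(mat), colMeans(mat))
}

# Hypothetical test image: a bright vertical bar in column 10, rows 5-24.
img <- matrix(0, nrow = 28, ncol = 28)
img[5:24, 10] <- 255

# Flatten row by row (the MNIST CSV order).
flat <- as.vector(t(img))

f <- get_features(flat)
length(f)            # 56: 28 row means, then 28 column means
which.max(f[29:56])  # 10: the bar's column
```

With `byrow = FALSE` the image would be transposed, and the bar would appear in the row means instead.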
## Interpretation
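The k-means clustering produced [k] clusters. The contingency table of original labels against clusters shows [degree of agreement, e.g. which types fall mostly into one cluster and which are split or merged]. The clustering is [good/moderate/poor] because [specific pattern]. Cluster numbers are arbitrary, so agreement is judged by how consistently each original type maps to a single cluster, not by whether the numbers match the type labels.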
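To put a number on the degree of agreement, two simple summaries of the contingency table can help: cluster purity (the share of points belonging to the majority known type of their cluster) and, for each known type, the share that lands in that type's most common cluster. Both use `table()` alone; the function names are hypothetical. Purity tends to rise as `k` grows, so read it alongside the ARI rather than on its own.

```r
# Purity: fraction of points in the majority known type of their cluster.
cluster_purity <- function(known, cluster) {
  tab <- table(known, cluster)
  sum(apply(tab, 2, max)) / sum(tab)
}

# Per-type agreement: share of each known type in its most common cluster.
type_agreement <- function(known, cluster) {
  tab <- table(known, cluster)
  apply(tab, 1, max) / rowSums(tab)
}

cluster_purity(c("a", "a", "b", "b"), c(2, 2, 1, 1))  # 1
type_agreement(c("a", "a", "b", "b"), c(1, 2, 1, 1))  # a: 0.5, b: 1
```

For the neuron data: `cluster_purity(dat$type, dat$cluster)` and `type_agreement(dat$type, dat$cluster)`.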