# Adjusted Rand Index

> The Rand Index (RI) computes how similar two clusterings are by counting how many pairs of samples are assigned consistently in both clusterings (either in the same cluster or in different clusters).

The Adjusted Rand Index (ARI) corrects the RI for chance: it accounts for the fact that some agreement between two [clusterings](https://wiki.g15e.com/pages/Clustering%20(machine%20learning.txt)) can arise randomly. An ARI of 1 means the clusterings are identical up to a relabeling of the clusters, a value near 0 means the agreement is no better than chance, and negative values are possible when the agreement is worse than chance.

## Example

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

# 1. Sample data
texts = [
    "dog barks loudly",         # Label 0 - Animal
    "cat meows at night",       # Label 0 - Animal
    "puppy plays with ball",    # Label 0 - Animal
    "car drives fast",          # Label 1 - Vehicle
    "truck carries cargo",      # Label 1 - Vehicle
    "bus transports people",    # Label 1 - Vehicle
    "pizza has cheese",         # Label 2 - Food
    "burger with lettuce",      # Label 2 - Food
    "pasta with tomato sauce",  # Label 2 - Food
]
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# 2. Convert text to TF-IDF vectors (dense array, as required by linkage)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()

# 3. Perform hierarchical clustering (you can vary 'ward', 'average', etc.)
Z = linkage(X, method='ward')

# 4. Optional: visualize dendrogram (leaves are annotated with the true labels)
plt.figure(figsize=(8, 4))
dendrogram(Z, labels=true_labels)
plt.title("Dendrogram")
plt.xlabel("Sample (true label)")
plt.ylabel("Distance")
plt.show()

# 5. Cut the dendrogram to form 3 clusters
num_clusters = 3
predicted_labels = fcluster(Z, num_clusters, criterion='maxclust')

# 6. Evaluate clustering using Adjusted Rand Index
#    (ARI ignores the label values themselves, so 1..3 vs 0..2 does not matter)
ari = adjusted_rand_score(true_labels, predicted_labels)
print("Adjusted Rand Index (ARI):", ari)
```
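
To make the pair-counting definition at the top of this page concrete, here is a minimal from-scratch sketch using only the standard library. The function names `rand_index` and `adjusted_rand_index` and the sample label lists are illustrative assumptions, not part of scikit-learn. The adjusted version uses the usual contingency-table formulation (observed pair agreement minus its expected value, divided by the maximum minus the expected value) and assumes at least two samples.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which the two clusterings agree.

    A pair agrees if it is in the same cluster in both clusterings,
    or in different clusters in both. Assumes at least two samples.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand Index computed from the contingency table."""
    n = len(labels_a)
    # Pairs placed together in both clusterings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # upper bound on the agreement
    if max_index == expected:
        # Degenerate case (e.g. both clusterings put everything in one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: the second clustering misplaces one sample.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted_labels = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print("RI: ", rand_index(true_labels, predicted_labels))
print("ARI:", adjusted_rand_index(true_labels, predicted_labels))
```

For these labels the ARI comes out lower than the RI, because part of the raw pair agreement is attributed to chance. On the same inputs the result should match `sklearn.metrics.adjusted_rand_score`, which is the practical choice for real data.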