STA 141C Big-data and Statistical Computing

Discussion 4: Spectral Clustering

TA: Tesi Xiao

Graph

A graph is made up of vertices (nodes/points) which are connected edges (links/lines). Mathematically speaking, a graph is an ordered pair $G=(V, E)$

  • $G = (V, E)$
  • $V$: the set of vertices
  • $E\subseteq \{ (x,y)\vert x,y\in V \text{ and } x\neq y\}$ (for simplicity, we do not consider self-loops here)
    • Undirected graph: $(x, y)$ is an unordered pair;
    • Directed graph: $(x, y)$ is an ordered pair.
  • Weighted graph $G = (V, E, W)$: A weighted graph (or a network) is a graph with weighted edges. That is to say, a number (the weight) $w_{ij}\in W$ is assigned to the edge $(i,j)\in E$. $w_{ij}=0$ if $(i,j)\notin E$.
    • $\vert V \vert$: the cardinality of set $V$ / the number of vertices
    • $\vert E\vert$: the cardinality of set $E$ / the number of edges
    • Weight Matrix $W$: a $\vert V \vert\times \vert V \vert$ matrix where $W_{i,j} = w_{ij}$ if $(i,j) \in E$ else $W_{i,j} = 0$. For undirected graphs, $W$ is symmetric, i.e., $w_{ij} = w_{ji}$.

Note that for unweigted graph, one can assign 1 to all the edges, then $W$ would become the adjacency matrix $A$ where $A_{i,j} = 1$ if $(i,j)\in E$ else $A_{i,j}=0$

  • Degree: For undirected graphs, the degree of a vertix $v\in V$ denoted by $\text{deg}(v)$ is defined as $\text{deg}(v) = \sum_{j\in V} w_{vj}$. For directed graphs, we can similarly define the in-degree and the out-degree of a vertex. See Wikipage.

Commonly, we use $D$ to represent the degree matrix, which is a diagonal matrix including the degrees of vertices.

  • Laplacian Matrix for weighted graphs
    • Laplacian Matrix $L$: $L=D-W$
    • Symmetric Normalized Laplacian $\overline{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$

Netwokx

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Check out Tutorials

# a simple example of using networkx

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_nodes_from([1,2,3])
G.add_node(4)

G.add_edges_from([(1,2), (2,3), (1,3)])
G.add_edge(3,4)
nx.draw(G, with_labels=True)

Graph Cut

  • Partitioning into two clusters

    • Naive Balanced Cut: \(\min_{V_1, V_2} \text{ cut}(V_1, V_2) := \sum_{v_i \in V_1, v_j \in V_2} W_{ij} \text{ s.t. } \vert V_1\vert = \vert V_2\vert \text{ and } V_1 \cup V_2 = V, V_1 \cap V_2 = \emptyset\)
    • Ratio-Cut: \(\min_{V_1, V_2} \text{ RC}(V_1, V_2) := \bigg\{ \frac{\text{cut}(V_1, V_2)}{\vert V_1\vert } + \frac{\text{cut}(V_1, V_2)}{\vert V_2\vert} \bigg\}\)
    • Normalized-Cut: \(\min_{V_1, V_2} \text{ NC}(V_1, V_2) := \bigg\{ \frac{\text{cut}(V_1, V_2)}{\text{deg}(V_1)} + \frac{\text{cut}(V_1, V_2)}{\text{deg}(V_2)} \bigg\}\)

    In the orginal paper about Normalized-Cut, they call $\text{deg}(V_1)$ as $\text{assoc}(V_1, V)$, which is the total connections from nodes in $V_1$ to all nodes in the graph. See paper.

  • Generalization to $k$ clusters

    • Ratio-Cut: \(\min_{V_1,..., V_k} \sum_{c=1}^k\frac{\text{cut}(V_c, V-V_c)}{\vert V_c \vert }\)
    • Normalized-Cut: \(\min_{V_1,..., V_k} \sum_{c=1}^k\frac{\text{cut}(V_c, V-V_c)}{\text{deg}(V_c)}\)

Spectral Clustering

The above graph-cut problems are hard to solve. However, fortunately, one can derive relaxed versions of those problems.

  • Relaxed to the real-valued Rayleigh quotient minimization problem
    • Ratio-Cut: \(\min_{y_1,..., y_k} \sum_{c=1}^k\frac{y_c^\top L y_c}{y_c^\top y_c}\)
    • Normalized-Cut: \(\min_{y_1,..., y_k} \sum_{c=1}^k\frac{y_c^\top \overline{L} y_c}{y_c^\top y_c}\)
    • Remark: $y_c$ is a vector indicates which vertices belong to the $c$-th cluster, thus satifying certain constraints. Putting these $y_c$s together into a matrix, we will get a problem in the form of \(\min_{Y^\top Y = I} = \text{Trace}(Y^\top L Y) \text{(Ratio-Cut) or }\text{Trace}(Y^\top \overline{L} Y) \text{(Normalized-Cut)}\) Then we are able to solve it by finding eigenvectors corresponding the smallest $k$ eigenvalues of $L$. However, the obtained $Y$ may not satisfy the contraints that $Y$ should exactly indicate the partition. We run K-means algorithm on the rows of $Y$ to obtain the final results.
  • Summary: We convert the original problem into a eigenvalue problem of the graph Laplacian.

Spectral Clustering by scikit learn