## 1 Introduction

Statistical inference on graphs is a burgeoning field of research in machine learning and statistics, with numerous applications to social networks, neuroscience, and other areas. Many statistical inference procedures for graphs involve a preprocessing step of finding a representation of the vertices as points in some low-dimensional Euclidean space. This representation is usually given by the truncated eigendecomposition of the adjacency matrix or related matrices such as the combinatorial Laplacian or the normalized Laplacian. For example, given a point cloud lying in some purported low-dimensional manifold in a high-dimensional ambient space, many manifold learning or non-linear dimension reduction algorithms such as Laplacian eigenmaps [5] and diffusion maps [15] use the eigenvectors of the normalized Laplacian constructed from a neighborhood graph of the points as a low-dimensional Euclidean representation of the point cloud before performing inference such as clustering or classification. Spectral clustering algorithms such as the normalized cuts algorithm [35] proceed by embedding a graph into a low-dimensional Euclidean space followed by running $K$-means on the embedding to obtain a partitioning of the vertices. Some network comparison procedures embed the graphs and then compute a kernel-based distance measure between the resulting point clouds [41, 3].

The choice of the matrix used in the embedding step and its effect on subsequent inference is, however, rarely addressed in the literature. In a recent pioneering work, the authors of [6] addressed this issue by analyzing, in the context of stochastic blockmodel graphs where the subsequent inference task is the recovery of the block assignments, a metric given by the average distance between the vertices of a block and its cluster centroid for the spectral embedding of the adjacency matrix and the normalized Laplacian matrix. The metric is then used as a surrogate measure for the performance of the subsequent inference task, i.e., the metric is a surrogate measure for the error rate in recovering the vertex-to-block assignments. The stochastic blockmodel [20] is a popular generative model for random graphs with latent community structure and many results are known regarding consistent recovery of the block assignments; see for example [34, 39, 7, 27, 23, 30, 13, 36, 28] and the references therein.

It was shown in [6] that for two-block stochastic blockmodels, for a large regime of parameters the normalized Laplacian spectral embedding reduces the within-block variance (occasionally by a factor of four) while preserving the between-block variance, as compared to that of the adjacency spectral embedding. This suggests that for a large region of the parameter space for two-block stochastic blockmodels, the spectral embedding of the Laplacian is to be preferred over that of the adjacency matrix for subsequent inference. However, we observed that the metric in [6] is intrinsically tied to the use of $K$-means as the clustering procedure, i.e., a smaller value of the metric for the Laplacian spectral embedding as compared to that for the adjacency spectral embedding only implies that clustering the Laplacian spectral embedding using $K$-means is possibly better than clustering the adjacency spectral embedding using $K$-means.

Motivated by the above observation, one main goal of this paper is to propose a metric that is independent of any specific clustering procedure, i.e., a metric that characterizes the minimum error achievable by any clustering procedure that uses only the spectral embedding, for the recovery of block assignments in stochastic blockmodel graphs. We achieve this by establishing distributional limit results for the eigenvectors corresponding to the few largest eigenvalues of the adjacency or Laplacian matrix and then characterizing, through the notion of statistical information, the distributional differences between the blocks for either embedding method. Roughly speaking, smaller statistical information implies less information to discriminate between the blocks of the stochastic blockmodel.

More specifically, the limit result in [4] states that, for stochastic blockmodel graphs, conditional on the block assignments the scaled eigenvectors corresponding to the few largest eigenvalues of the adjacency matrix converge to a multivariate normal (see e.g., Theorem 2.2) as the number of vertices increases. Furthermore, the associated covariance matrix is not necessarily spherical and hence $K$-means clustering for the adjacency spectral embedding does not always yield minimum error for recovering the block assignments. Analogous limit results (see e.g., Theorem 3.2) for the eigenvectors of the normalized Laplacian matrix then facilitate comparison between the two embedding methods via the classical notion of Chernoff information [11]. The Chernoff information is a supremum of the Chernoff $\alpha$-divergences for $\alpha \in (0,1)$ and characterizes the error rate of the Bayes decision rule in hypothesis testing; the Chernoff $\alpha$-divergence is an example of an $f$-divergence [16, 1] and it satisfies the information processing lemma and is invariant with respect to invertible transformations [24].
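To make the role of Chernoff information concrete, the following is a minimal sketch of how it could be evaluated numerically for two multivariate normals. It relies on the standard closed form of the Chernoff divergence between Gaussians, which is not stated in this section, and it replaces the supremum over $(0,1)$ by a grid search; the function name `chernoff_gaussian` and the `grid` parameter are hypothetical choices made for illustration.

```python
import numpy as np

def chernoff_gaussian(mu1, S1, mu2, S2, grid=999):
    """Approximate the Chernoff information between N(mu1, S1) and N(mu2, S2).

    Uses the closed form for Gaussians,
        t(1-t)/2 * (mu1-mu2)^T S_t^{-1} (mu1-mu2)
        + 1/2 * log(|S_t| / (|S1|^t |S2|^(1-t))),   S_t = t*S1 + (1-t)*S2,
    and maximizes over an interior grid of t values (an approximation
    of the supremum over 0 < t < 1).
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
    diff = mu1 - mu2
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    best = -np.inf
    for t in np.linspace(0.0, 1.0, grid + 2)[1:-1]:  # interior points only
        St = t * S1 + (1.0 - t) * S2
        _, logdet_t = np.linalg.slogdet(St)
        quad = 0.5 * t * (1.0 - t) * diff @ np.linalg.solve(St, diff)
        logterm = 0.5 * (logdet_t - t * logdet1 - (1.0 - t) * logdet2)
        best = max(best, quad + logterm)
    return best

# Equal spherical covariances: the maximum is attained at t = 1/2.
c_sep = chernoff_gaussian([0.0, 0.0], np.eye(2), [2.0, 0.0], np.eye(2))
# Identical distributions carry no information for discrimination.
c_same = chernoff_gaussian([0.0, 0.0], np.eye(2), [0.0, 0.0], np.eye(2))
```

A larger value indicates that the two block-conditional distributions are easier to tell apart; a finer grid or a one-dimensional optimizer could replace the grid search if more precision is needed.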

Our paper is thus structured as follows. We recall in Section 2 the definition of random dot product graphs, stochastic blockmodel graphs, and spectral embedding of the adjacency and Laplacian matrices. We then state in Section 2.1 several limit results for the eigenvectors of the adjacency spectral embedding. These results are generalizations of results from [4, 40]. The main technical contribution of this paper, namely analogous limit results for the eigenvectors of the Laplacian spectral embedding, is then given in Section 3. We then discuss the implications of these limit results in Section 4; in particular Section 4.3 characterizes, via the notion of Chernoff statistical information, the large-sample optimal error rate of spectral clustering procedures. We demonstrate that neither embedding method dominates for the inference task of recovering block assignments in stochastic blockmodels. We conclude the paper with some brief remarks on potential extensions of the results presented herein. Proofs of stated results are given in the appendix.

## 2 Background and Setting

We first recall the notion of a random dot product graph [31].

###### Definition 1.

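Let $F$ be a distribution on a set $\mathcal{X} \subset \mathbb{R}^{d}$ satisfying $x^{\top} y \in [0,1]$ for all $x, y \in \mathcal{X}$. We say $(\mathbf{X}, \mathbf{A}) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n \leq 1$ if the following hold. Let $X_1, X_2, \dots, X_n \sim F$ be independent random variables and define

$$\mathbf{X} = [X_1 \mid \cdots \mid X_n]^{\top} \in \mathbb{R}^{n \times d} \quad \text{and} \quad \mathbf{P} = \rho_n \mathbf{X} \mathbf{X}^{\top} \in [0,1]^{n \times n}. \tag{2.1}$$

The $X_i$ are the latent positions for the random graph, i.e., we do not observe $\mathbf{X}$, rather we observe only the matrix $\mathbf{A}$. The matrix $\mathbf{A} \in \{0,1\}^{n \times n}$ is defined to be symmetric with all zeroes on the diagonal such that for all $i < j$, conditioned on $X_i, X_j$ the $\mathbf{A}_{ij}$ are independent and

$$\mathbf{A}_{ij} \sim \mathrm{Bernoulli}(\rho_n X_i^{\top} X_j), \tag{2.2}$$

namely,

$$\mathbb{P}[\mathbf{A} \mid \mathbf{X}] = \prod_{i < j} (\rho_n X_i^{\top} X_j)^{\mathbf{A}_{ij}} (1 - \rho_n X_i^{\top} X_j)^{1 - \mathbf{A}_{ij}}. \tag{2.3}$$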
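The sampling scheme in Definition 1 can be sketched as follows, assuming edges are independent Bernoulli draws with success probability equal to the sparsity factor times the inner product of the two latent positions. The function name `sample_rdpg`, the two latent positions in `points`, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np

def sample_rdpg(X, rho=1.0, rng=None):
    """Draw a symmetric, hollow adjacency matrix from latent positions X.

    X is an (n, d) array of latent positions; rho is the sparsity factor.
    Each edge {i, j}, i < j, is present with probability rho * <X_i, X_j>.
    """
    rng = np.random.default_rng(rng)
    P = rho * (X @ X.T)                      # edge-probability matrix
    if P.min() < 0 or P.max() > 1:
        raise ValueError("rho * <X_i, X_j> must lie in [0, 1]")
    n = X.shape[0]
    # Independent draws for the strict upper triangle only (i < j).
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)        # symmetrize; diagonal stays zero
    return A, P

# Two point masses as latent positions, i.e., a two-block stochastic blockmodel.
rng = np.random.default_rng(0)
points = np.array([[0.8, 0.2], [0.3, 0.6]])  # hypothetical latent positions
labels = rng.integers(0, 2, size=200)        # block assignments
X = points[labels]
A, P = sample_rdpg(X, rho=1.0, rng=rng)
```

Only the matrix `A` would be observed in practice; `X`, `labels`, and `P` are returned or kept here purely so that the sample can be inspected.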

###### Remark.

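We note that non-identifiability is an intrinsic property of random dot product graphs. More specifically, if $(\mathbf{X}, \mathbf{A}) \sim \mathrm{RDPG}(F)$ where $F$ is a distribution on $\mathbb{R}^{d}$, then for any orthogonal transformation $U$, the graph $\mathbf{B}$ with $(\mathbf{Y}, \mathbf{B}) \sim \mathrm{RDPG}(F \circ U)$ is identically distributed to $\mathbf{A}$; we write $F \circ U$ to denote the distribution of $Y = UX$ whenever $X \sim F$. Furthermore, there also exists a distribution $F'$ on $\mathbb{R}^{d'}$ with $d' > d$ such that the graph $\mathbf{B}$ with $(\mathbf{Y}, \mathbf{B}) \sim \mathrm{RDPG}(F')$ is identically distributed to $\mathbf{A}$. Non-identifiability due to orthogonal transformations cannot be avoided given the observed $\mathbf{A}$. We avoid the other source of non-identifiability by assuming throughout this paper that if $X \sim F$ then $\mathbb{E}[X X^{\top}]$ is non-degenerate, i.e., is of full rank.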

As an example of random dot product graphs, we could take $\mathcal{X}$ to be the unit simplex in $\mathbb{R}^{d}$ and let $F$ be a mixture of Dirichlet distributions or a logistic-normal distribution. Random dot product graphs are a specific example of latent position graphs or inhomogeneous random graphs [19, 8], in which each vertex $i$ is associated with a latent position $X_i$ and, conditioned on the latent positions, the presence or absence of the edges in the graph are independent Bernoulli random variables where the probability of an edge between any two vertices with latent positions $X_i$ and $X_j$ is given by $\kappa(X_i, X_j)$ for some symmetric function $\kappa$. A random dot product graph on $n$ vertices is also, when viewed as an induced subgraph of an infinite graph, an exchangeable random graph [17]. Random dot product graphs are related to stochastic blockmodel graphs [20] and degree-corrected stochastic blockmodel graphs [21]; for example, a stochastic blockmodel graph on $K$ blocks with a positive semidefinite block probability matrix $\mathbf{B}$ corresponds to a random dot product graph where $F$ is a mixture of $K$ point masses.

For a given matrix $\mathbf{M}$ with non-negative entries, denote by $\mathcal{L}(\mathbf{M})$ the normalized Laplacian of $\mathbf{M}$ defined as

$$\mathcal{L}(\mathbf{M}) = (\mathrm{diag}(\mathbf{M}\mathbf{1}))^{-1/2} \, \mathbf{M} \, (\mathrm{diag}(\mathbf{M}\mathbf{1}))^{-1/2} \tag{2.4}$$

where, given $z = (z_1, \dots, z_n) \in \mathbb{R}^{n}$, $\mathrm{diag}(z)$ is the $n \times n$ diagonal matrix whose diagonal entries are the $z_i$'s. Our definition of the normalized Laplacian is slightly different from that often found in the literature, e.g., in [14, 35] the normalized Laplacian is $\mathbf{I} - (\mathrm{diag}(\mathbf{M}\mathbf{1}))^{-1/2} \mathbf{M} (\mathrm{diag}(\mathbf{M}\mathbf{1}))^{-1/2}$. For the purpose of this paper, namely the notion of the Laplacian spectral embedding via the eigenvalues and eigenvectors of the normalized Laplacian, these two definitions of the normalized Laplacian are equivalent. We shall henceforth refer to $\mathcal{L}(\mathbf{M})$ as the Laplacian of $\mathbf{M}$, in contrast to the combinatorial Laplacian $\mathrm{diag}(\mathbf{M}\mathbf{1}) - \mathbf{M}$ of $\mathbf{M}$. See [29] for a survey of the combinatorial Laplacian and its connection to graph theory.
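As a small illustration of the normalized Laplacian used here, the sketch below assumes the form $D^{-1/2} M D^{-1/2}$ with $D$ the diagonal matrix of row sums of $M$, and checks numerically that rescaling $M$ by a positive constant leaves the result unchanged. The function name `normalized_laplacian` and the triangle-graph example are hypothetical.

```python
import numpy as np

def normalized_laplacian(M):
    """Return D^{-1/2} M D^{-1/2}, where D = diag(row sums of M)."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every row sum must be strictly positive")
    s = 1.0 / np.sqrt(deg)
    # Scaling rows and columns by s equals multiplying by D^{-1/2} on both sides.
    return M * s[:, None] * s[None, :]

# Adjacency matrix of a triangle: every vertex has degree 2.
M = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
L = normalized_laplacian(M)
L_scaled = normalized_laplacian(3.0 * M)  # same matrix, rescaled by c = 3
```

The variant common in the literature, the identity minus this matrix, has the same eigenvectors with eigenvalues mapped through $\lambda \mapsto 1 - \lambda$, which is why the two conventions lead to the same embedding up to the choice of which eigenvalues to retain.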

###### Definition 2 (Adjacency and Laplacian spectral embedding).

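Let $\mathbf{A}$ be an $n \times n$ adjacency matrix. Suppose the eigendecomposition of $\mathbf{A}$ is given by $\mathbf{A} = \sum_{i=1}^{n} \lambda_i u_i u_i^{\top}$ where $|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|$ are the eigenvalues and $u_1, u_2, \dots, u_n$ are the corresponding orthonormal eigenvectors. Given a positive integer $d \leq n$, denote by $\mathbf{S}_{\mathbf{A}} = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ the diagonal matrix whose diagonal entries are the $\lambda_1, \dots, \lambda_d$, and denote by $\mathbf{U}_{\mathbf{A}}$ the $n \times d$ matrix whose columns are the corresponding eigenvectors $u_1, \dots, u_d$. The adjacency spectral embedding (ASE) of $\mathbf{A}$ into $\mathbb{R}^{d}$ is then the $n \times d$ matrix $\hat{\mathbf{X}} = \mathbf{U}_{\mathbf{A}} \mathbf{S}_{\mathbf{A}}^{1/2}$. Similarly, let $\mathcal{L}(\mathbf{A})$ denote the normalized Laplacian of $\mathbf{A}$ and suppose the eigendecomposition of $\mathcal{L}(\mathbf{A})$ is given by $\mathcal{L}(\mathbf{A}) = \sum_{i=1}^{n} \tilde{\lambda}_i \tilde{u}_i \tilde{u}_i^{\top}$ where $|\tilde{\lambda}_1| \geq |\tilde{\lambda}_2| \geq \dots \geq |\tilde{\lambda}_n|$ are the eigenvalues and $\tilde{u}_1, \tilde{u}_2, \dots, \tilde{u}_n$ are the corresponding orthonormal eigenvectors. Then given a positive integer $d \leq n$, denote by $\tilde{\mathbf{S}}_{\mathbf{A}} = \mathrm{diag}(\tilde{\lambda}_1, \dots, \tilde{\lambda}_d)$ the diagonal matrix whose diagonal entries are the $\tilde{\lambda}_1, \dots, \tilde{\lambda}_d$ and denote by $\tilde{\mathbf{U}}_{\mathbf{A}}$ the $n \times d$ matrix whose columns are the eigenvectors $\tilde{u}_1, \dots, \tilde{u}_d$. The Laplacian spectral embedding (LSE) of $\mathbf{A}$ into $\mathbb{R}^{d}$ is then the $n \times d$ matrix $\breve{\mathbf{X}} = \tilde{\mathbf{U}}_{\mathbf{A}} \tilde{\mathbf{S}}_{\mathbf{A}}^{1/2}$.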
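The two embeddings in Definition 2 can be sketched with a dense symmetric eigendecomposition. The sketch below makes two assumptions that go beyond the definition: eigenvalues are ranked by magnitude, and the square root is applied to their absolute values so that the embedding stays real if a retained eigenvalue happens to be negative. The function names `spectral_embedding`, `ase`, and `lse` are hypothetical.

```python
import numpy as np

def spectral_embedding(M, d):
    """Top-d eigenvectors of symmetric M, each scaled by |eigenvalue|^{1/2}."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(-np.abs(vals))[:d]      # indices of largest-magnitude eigenvalues
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def ase(A, d):
    """Adjacency spectral embedding of A into R^d."""
    return spectral_embedding(np.asarray(A, dtype=float), d)

def lse(A, d):
    """Laplacian spectral embedding of A into R^d (assumes no zero row sums)."""
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    return spectral_embedding(A * s[:, None] * s[None, :], d)

# Sanity check on a noiseless rank-2 matrix P = X X^T (hypothetical positions).
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 0.7, size=(50, 2))
P = X @ X.T
X_hat = ase(P, 2)      # recovers X up to an orthogonal transformation
X_breve = lse(P, 2)    # recovers the degree-normalized positions, up to rotation
```

On the noiseless matrix `P`, the product `X_hat @ X_hat.T` reproduces `P`, which illustrates the orthogonal non-identifiability: the embedding is determined only up to a rotation of the latent positions. For large sparse graphs a truncated solver would be preferable to the full decomposition used here.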

###### Remark.

Let $(\mathbf{A}, \mathbf{X}) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n$ and suppose that the matrix $\mathbb{E}[X X^{\top}]$ is of full rank where $X \sim F$. The matrix $\hat{\mathbf{X}}$, the adjacency spectral embedding of $\mathbf{A}$ into $\mathbb{R}^{d}$, can then be viewed as a consistent estimate of $\rho_n^{1/2} \mathbf{X}$. See [38] for a comprehensive overview of the consistency results and their implications for subsequent inference. On the other hand, as $\mathcal{L}(c\mathbf{M}) = \mathcal{L}(\mathbf{M})$ for any constant $c > 0$, the matrix $\breve{\mathbf{X}}$ (the normalized Laplacian embedding of $\mathbf{A}$ into $\mathbb{R}^{d}$) can be viewed as a consistent estimate of $\tilde{\mathbf{X}} = (\mathrm{diag}(\mathbf{X}\mathbf{X}^{\top}\mathbf{1}))^{-1/2} \mathbf{X}$, which does not depend on the sparsity factor $\rho_n$. This is in contrast to the adjacency spectral embedding. For previous consistency results of $\breve{\mathbf{X}}$ as an estimator for $\tilde{\mathbf{X}}$ in various random graph models, the reader is referred to [34, 33, 42] among others. However, to the best of our knowledge, Theorem 3.2 (namely the distributional convergence of $\breve{\mathbf{X}}$ to a mixture of multivariate normals in the context of random dot product graphs and stochastic blockmodel graphs) had not been established prior to this paper. Finally, we remark that $\hat{\mathbf{X}}$ and $\breve{\mathbf{X}}$ are estimating quantities that, while closely related ($\mathbf{X}$ and $\tilde{\mathbf{X}}$ are one-to-one transformations of each other), are in essence distinct “parametrizations” of random dot product graphs. It is therefore not entirely straightforward to facilitate a direct comparison of the “efficiency” of $\hat{\mathbf{X}}$ and $\breve{\mathbf{X}}$ as estimators. This thus motivates our consideration of the $f$-divergences between the multivariate normals since the family of $f$-divergences satisfies the information processing lemma and is invariant with respect to invertible transformations.

###### Remark.

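For simplicity we shall assume henceforth that either $\rho_n = 1$ for all $n$, or that $\rho_n \rightarrow 0$ with $n \rho_n = \omega(\log^{4} n)$. We note that for our purpose, namely the distributional limit results in Section 2.1 and Section 3, the assumption that $\rho_n = 1$ for all $n$ is equivalent to the assumption that there exists a constant $c > 0$ such that $\rho_n \rightarrow c$. The assumption that $n \rho_n = \omega(\log^{4} n)$ is so that we can apply the concentration inequalities from [25] to show concentration, in spectral norm, of $\mathbf{A}$ and $\mathcal{L}(\mathbf{A})$ around $\rho_n \mathbf{X}\mathbf{X}^{\top}$ and $\mathcal{L}(\mathbf{X}\mathbf{X}^{\top})$, respectively.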

### 2.1 Limit results for the adjacency spectral embedding

We now recall several limit results for $\hat{\mathbf{X}}$. These results are restatements of earlier results from [4] and [40]. Theorem 2.2 as stated below is a slight generalization of Theorem 1 in [4]; the result in [4] assumed a more restrictive distinct eigenvalues assumption for the matrix $\mathbb{E}[X X^{\top}]$ where $X \sim F$. We shall assume throughout this paper that $d$, the rank of $\mathbb{E}[X X^{\top}]$ where $X \sim F$, is fixed and known a priori.

###### Remark.

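For ease of exposition, many of the bounds in this paper are said to hold “with high probability”. We say that a random variable $\xi \in \mathbb{R}$ is $O_{\mathbb{P}}(f(n))$ if, for any positive constant $c > 0$ there exists an $n_0 \in \mathbb{N}$ and a constant $C > 0$ (both of which possibly depend on $c$) such that for all $n \geq n_0$, $|\xi| \leq C f(n)$ with probability at least $1 - n^{-c}$; in addition, we say that a random variable $\xi \in \mathbb{R}$ is $o_{\mathbb{P}}(f(n))$ if for any positive constant $c > 0$ and any $\epsilon > 0$ there exists an $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, $|\xi| \leq \epsilon f(n)$ with probability at least $1 - n^{-c}$. Similarly, when $\xi$ is a random vector in $\mathbb{R}^{d}$ or a random matrix in $\mathbb{R}^{d_1 \times d_2}$, $\xi = O_{\mathbb{P}}(f(n))$ or $\xi = o_{\mathbb{P}}(f(n))$ if $\|\xi\| = O_{\mathbb{P}}(f(n))$ or $\|\xi\| = o_{\mathbb{P}}(f(n))$, respectively. Here $\|x\|$ denotes the Euclidean norm of $x$ when $x$ is a vector and the spectral norm of $x$ when $x$ is a matrix. We write $\xi = \zeta + O_{\mathbb{P}}(f(n))$ or $\xi = \zeta + o_{\mathbb{P}}(f(n))$ if $\xi - \zeta = O_{\mathbb{P}}(f(n))$ or $\xi - \zeta = o_{\mathbb{P}}(f(n))$, respectively.

###### Theorem 2.1.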

Let with sparsity factor . Then there exists a orthogonal matrix and a matrix such that

(2.5) |

Furthermore, . Let and . If for all , then there exists a sequence of orthogonal matrices such that

(2.6) |

If, however, and , then

(2.7) |

###### Theorem 2.2.

Assume the setting and notations of Theorem 2.1. Denote by the -th row of . Let

denote the cumulative distribution function for the multivariate normal, with mean zero and covariance matrix

, evaluated at . Also denote by the matrix

If for all , then there exists a sequence of orthogonal matrices such that for each fixed index and any ,

(2.8) |

That is, the sequence converges in distribution to a mixture of multivariate normals. We denote this mixture by . If, however, and then there exists a sequence of orthogonal matrices such that

(2.9) |

where .

An important corollary of Theorem 2.2 is the following result for when $F$ is a mixture of $K$ point masses, i.e., the graph is a $K$-block stochastic blockmodel graph. Then for any fixed index $i$, the event that vertex $i$ is assigned to block $k$ has non-zero probability and hence one can condition on the block assignment of vertex $i$ to show that the conditional distribution of the centered and scaled $i$-th row of the embedding converges to a multivariate normal. This is in contrast to the unconditional distribution being a mixture of multivariate normals as in Eq. (2.8) and Eq. (2.9).

###### Corollary 2.3.

Assume the setting and notations of Theorem 2.1 and let

be a mixture of point masses in where is the Dirac delta measure at . Then if , there exists a sequence of orthogonal matrices such that for any fixed index ,

(2.10) |

where is as defined in Eq. (2.8). If and as , then the sequence of orthogonal matrices satisfies

(2.11) |

where is as defined in Eq. (2.9).

## 3 Limit results for Laplacian spectral embedding

We now present the main technical results of this paper, namely analogues of the limit results in Section 2.1 for the Laplacian spectral embedding.

###### Theorem 3.1.

Let for be a sequence of random dot product graphs with sparsity factors . Denote by and the diagonal matrices and , respectively, i.e., the diagonal entries of are the vertex degrees of and the diagonal entries of are the expected vertex degrees. Let . Then for any , there exists a orthogonal matrix and a matrix such that satisfies

(3.1) |

Furthermore, , i.e., as . Define the following quantities

(3.2) | |||

(3.3) |

If then the sequence of orthogonal matrices satisfies

(3.4) |

where the expectation in Eq. (3.4) is taken with respect to and being i.i.d drawn according to . Equivalently,

If and then the sequence satisfies

(3.5) |

As a companion of Theorem 3.1, we have the following result on the asymptotic normality of the rows of .

###### Theorem 3.2.

Assume the setting and notations of Theorem 3.1. Denote by and the -th row of and , respectively. Also denote by the matrix

(3.6) |

If then there exists a sequence of orthogonal matrices such that for each fixed index and any ,

(3.7) |

That is, the sequence converges in distribution to a mixture of multivariate normals. We denote this mixture by . If and then there exists a sequence of orthogonal matrices such that

(3.8) |

where is defined by

(3.9) |

The proofs of Theorem 3.1 and Theorem 3.2 are given in Section B. We end this section by stating the conditional distribution of when is a -block stochastic blockmodel graph.

###### Corollary 3.3.

Assume the setting and notations of Theorem 3.1 and let

be a mixture of point masses in . Then if , there exists a sequence of orthogonal matrices such that for any fixed index ,

(3.10) |

where is as defined in Eq. (3.6) and for denote the number of vertices in that are assigned to block . If instead and as then the sequence of orthogonal matrices satisfies

(3.11) |

where is as defined in Eq. (3.9).

###### Remark.

As a special case of Corollary 3.3, we have that if is an Erdős-Rényi graph on vertices with edge probability – which corresponds to a random dot product graph where the latent positions are identically – then for each fixed index , the normalized Laplacian embedding satisfies

while the adjacency spectral embedding satisfies

As another example, if is a stochastic blockmodel graph with block probabilities matrix and block assignment probabilities – which corresponds to a random dot product graph where the latent positions are either with probability or with probability – then for each fixed index , the normalized Laplacian embedding satisfies

(3.12) | |||

(3.13) |

where and are the number of vertices of with latent positions and . The adjacency spectral embedding meanwhile satisfies

(3.14) | |||

(3.15) |

###### Remark.

We note that the quantity appears in Eq. (3.7) and Eq. (3.8). Replacing by in Eq. (3.7) and Eq. (3.8) is, however, not straightforward. For example, for the two-block stochastic blockmodel considered in Eq. (3.12), letting we have

By the strong law of large numbers and Slutsky’s theorem, we have

We note that, as the are assumed to be random variables, i.e., we are not conditioning on the block sizes, by the central limit theorem we have

Therefore, by Slutsky’s theorem, we have

To replace by in Eq. (3.7) and Eq. (3.8), we thus need to include the random term . While we surmise that Eq. (3.7) and Eq. (3.8) can be adapted to account for this randomness in , we shall not do so in this paper.

### 3.1 Proof sketch for Theorem 3.1 and Theorem 3.2

We present in this subsection a sketch of the main ideas in the proofs of Theorem 3.1 and Theorem 3.2; the detailed proofs are given in Section B of the appendix. We start with the motivation behind Eq. (3.1). Given , the entries of the right hand side of Eq. (3.1), except for the term , can be expressed explicitly in terms of linear combinations of the entries of . This is in contrast with the left hand side of Eq. (3.1) which depends on the quantities and (recall Definition 2); since the quantities and cannot be expressed explicitly in terms of the entries of and , we conclude that the right hand side of Eq. (3.1) is simpler to analyze. From Eq. (3.1), the squared Frobenius norm is

Then conditional on , the above expression is, up to the term of order , a function of the independent random variables . We can then apply concentration inequalities such as those in [9] to show that the squared Frobenius norm is, conditional on , concentrated around its expectation. Here the expectation is taken with respect to the random entries of . Eq. (3.4) and Eq. (3.5) then follow by direct evaluation of this expectation, for the case when and for when , respectively.

Once Eq. (3.1) is established, we can derive Theorem 3.2 as follows. Let denote the -th row of and let denote the -th row of . Eq. (3.1) then implies

We then show that . Indeed, there are rows in and ; hence, on average, for each index , . Furthermore, as . Finally, which, as we show in Section B, converges to as . We therefore have, after additional manipulations, that
