TY - GEN
T1 - A Scalable Method for Exact Sampling from Kronecker Family Models
AU - Moreno, Sebastian
AU - Pfeiffer, Joseph J.
AU - Neville, Jennifer
AU - Kirshner, Sergey
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/1/1
Y1 - 2014/1/1
N2 - The recent interest in modeling complex networks has fueled the development of generative graph models, such as Kronecker Product Graph Model (KPGM) and mixed KPGM (mKPGM). The Kronecker family of models are appealing because of their elegant fractal structure, as well as their ability to capture important network characteristics such as degree, diameter, and (in the case of mKPGM) clustering and population variance. In addition, scalable sampling algorithms for KPGMs made the analysis of large-scale, sparse networks feasible for the first time. In this work, we show that the scalable sampling methods, in contrast to prior belief, do not in fact sample from the underlying KPGM distribution and often result in sampling graphs that are very unlikely. To address this issue, we develop a new representation that exploits the structure of Kronecker models and facilitates the development of novel grouped sampling methods that are provably correct. In this paper, we outline efficient algorithms to sample from mKPGMs and KPGMs based on these ideas. Notably, our mKPGM algorithm is the first available scalable sampling method for this model and our KPGM algorithm is both faster and more accurate than previous scalable methods. We conduct both theoretical analysis and empirical evaluation to demonstrate the strengths of our algorithms and show that we can sample a network with 75 million edges in 87 seconds on a single processor.
AB - The recent interest in modeling complex networks has fueled the development of generative graph models, such as Kronecker Product Graph Model (KPGM) and mixed KPGM (mKPGM). The Kronecker family of models are appealing because of their elegant fractal structure, as well as their ability to capture important network characteristics such as degree, diameter, and (in the case of mKPGM) clustering and population variance. In addition, scalable sampling algorithms for KPGMs made the analysis of large-scale, sparse networks feasible for the first time. In this work, we show that the scalable sampling methods, in contrast to prior belief, do not in fact sample from the underlying KPGM distribution and often result in sampling graphs that are very unlikely. To address this issue, we develop a new representation that exploits the structure of Kronecker models and facilitates the development of novel grouped sampling methods that are provably correct. In this paper, we outline efficient algorithms to sample from mKPGMs and KPGMs based on these ideas. Notably, our mKPGM algorithm is the first available scalable sampling method for this model and our KPGM algorithm is both faster and more accurate than previous scalable methods. We conduct both theoretical analysis and empirical evaluation to demonstrate the strengths of our algorithms and show that we can sample a network with 75 million edges in 87 seconds on a single processor.
UR - http://www.scopus.com/inward/record.url?scp=84936945189&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2014.148
DO - 10.1109/ICDM.2014.148
M3 - Conference contribution
AN - SCOPUS:84936945189
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 440
EP - 449
BT - Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Zhexue Huang, Joshua
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Data Mining, ICDM 2014
Y2 - 14 December 2014 through 17 December 2014
ER -