#### Eighth International Workshop on

## High Performance Big Graph Data

## Management, Analysis, and Mining

**December 16, 2021**

To be held in conjunction with the [2021 IEEE International Conference on Big Data **(IEEE BigData 2021)**](http://bigdataieee.org/BigData2021/)

### Workshop Description

Modern Big Data increasingly appears in the form of complex graphs and networks. Examples include the physical Internet, the world wide web, online social networks, phone networks, and biological networks. In addition to their massive sizes, these graphs are dynamic, noisy, and sometimes transient. They also conform to all five Vs (Volume, Velocity, Variety, Value, and Veracity) that define Big Data. However, many graph-related problems are computationally difficult, and thus big graph data brings unique challenges, as well as numerous opportunities for researchers, to solve various problems that are significant to our communities.

Big graph problems are currently solved using several complementary paradigms. Perhaps the most popular approach is to exploit parallelism through specialized algorithms for supercomputers, shared-memory multicore and manycore systems, and heterogeneous CPU-GPU systems. However, since real-world graphs are sparse and highly irregular, there are very few parallel implementations that can actually deliver high performance. The major challenges to scaling and efficiency include irregular data dependencies, poor locality, and high synchronization costs of current approaches. In addition to parallelism, researchers are developing approximation algorithms that use sampling for compressing and summarizing graph data. Streaming algorithms are also being considered for scenarios where the rate of updates is too fast to process the entire graph in a single pass. Further, out-of-core algorithms are necessary for massive graphs that do not fit in the main memory of a typical system.
Researchers can use graph-based solutions for solving problems from many diverse disciplines, including routing and transportation, social networks, bioinformatics, computational science, health care, security and intelligence analysis. This workshop aims to bring together researchers from different paradigms solving big graph problems under a unified platform for sharing their work and exchanging ideas. We are soliciting novel and original research contributions related to big graph data management, analysis, and mining (algorithms, software systems, applications, best practices, performance). Significant work-in-progress papers are also encouraged. Papers can be from any of the following areas, including but not limited to:

* Graph embeddings and representation learning for graph data
* Graph neural networks
* Deep Learning-based models for learning on graph data
* Extreme-scale computing for large tensor, network, and graph problems
* Parallel algorithms for big graph analysis on HPC systems
* Heterogeneous CPU-GPU solutions to solve big graph problems
* Sampling and summarization of large graphs
* Graph algorithms for large-scale scientific computing problems
* Graph clustering, partitioning, and classification methods
* Scalable graph topology measurement: diameter approximation, eigenvalues, triangle and graphlet counting
* Parallel algorithms for computing graph kernels
* Inference on large graph data
* Graph evolution and dynamic graph models
* Graph streams
* Computational methods for visualization of large-scale graphs
* Graph databases, novel querying and indexing strategies for RDF data
* Novel applications of big graph problems in bioinformatics, health care, security, and social networks
* New software systems and runtime systems for big graph data mining

**Regular paper** submissions must be at most 10 pages long, including all figures, tables, and references.
They must be formatted according to the paper submission formatting guidelines provided in the [IEEE BigData 2021 Call for Papers](http://bigdataieee.org/BigData2021/CallPapers.html). Additionally, we encourage **short paper** submissions (at most 6 pages) describing new work in progress.

### Past Workshops

[BigGraphs 2014](workshop2014.html) [BigGraphs 2015](workshop2015.html) [BigGraphs 2016](workshop2016.html) [BigGraphs 2017](workshop2017.html) [BigGraphs 2018](workshop2018.html) [BigGraphs 2019](workshop2019.html) [BigGraphs 2020](workshop2020.html)

### Important Dates

* Oct 1, 2021 (11:59 pm Anywhere on Earth time): [Submission](https://wi-lab.com/cyberchair/2021/bigdata21/index.php) deadline
* Nov 3, 2021: Notification of paper acceptance to authors
* Nov 15, 2021: Camera-ready submissions due
* Dec 16, 2021: Workshop to be held virtually

### Keynote

**[David A. Bader](https://davidbader.net)**
![David Bader](bader.jpg "David Bader")
Distinguished Professor and Director of the Institute for Data Science
New Jersey Institute of Technology

**Solving Global Grand Challenges with High Performance Data Analytics**

*Abstract*: Data science aims to solve grand global challenges such as detecting and preventing disease in human populations; revealing community structure in large social networks; protecting our elections from cyber-threats; and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for research on scalable algorithms and architectures, and development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams.
In this talk, Bader will discuss the opportunities and challenges in massive data science for applications in social sciences, physical sciences, and engineering.

*Bio*: David A. Bader is a Distinguished Professor and founder of the Department of Data Science and inaugural Director of the Institute for Data Science at New Jersey Institute of Technology. Prior to this, he served as founding Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. Dr. Bader is a Fellow of the IEEE, AAAS, and SIAM, a recipient of the IEEE Sidney Fernbach Award, and advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and Future Advanced Computing Ecosystem (FACE). Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics, and he has co-authored over 300 scholarly papers and has best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH). Recently, Bader received an NVIDIA AI Lab (NVAIL) award, and a Facebook Research AI Hardware/Software Co-Design award. Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing, and General Co-Chair of IPDPS 2021, and previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems.
He serves on the leadership team of Northeast Big Data Innovation Hub as the inaugural chair of the Seed Fund Steering Committee. In 2021, ROI-NJ recognized Bader on its inaugural list of technology influencers, and in 2012, Bader was the inaugural recipient of University of Maryland’s Electrical and Computer Engineering Distinguished Alumni Award. In 2014, Bader received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence. In 1998, Bader built the first Linux supercomputer that led to a high-performance computing (HPC) revolution. He is a cofounder of the Graph500 List for benchmarking “Big Data” computing platforms. He is recognized as a “RockStar” of High Performance Computing by InsideHPC and as HPCwire’s People to Watch in 2012 and 2014.

### Workshop Program

December 16, 2021
9:15 am -- 12:25 pm U.S. Eastern Standard Time
Location: IEEE BigData 2021 virtual platform.

20-minute presentations (15-minute talk video will be broadcast and 5 minutes for live Q&A)

* 9:15 am: Opening remarks
  Workshop organizers
* 9:20 am: Building Graphs at a Large Scale: Union Find Shuffle
  Sai Gopal Thota, Mridul Jain, Sai Kiran Reddy Malikireddy, Pruthvi Raj Eranti, Albin Kuruvilla, and Nishad Kamat
* 9:45 am: Event-based Product Carousel Recommendation with Query-Click Graph
  Luyi Ma, Nimesh Sinha, Parth Vajge, Jason H.D. Cho, Sushant Kumar, and Kannan Achan
* 10:00 am: Coffee break
* 10:15 am: Solving Global Grand Challenges with High Performance Data Analytics (keynote)
  David A. Bader
* 11:00 am: A Pre-training Oracle for Predicting Distances in Social Networks
  Gunjan Mahindre, Rasika Karkare, Randy Paffenroth, and Anura Jayasumana
* 11:20 am: HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks
  Avishek Bose, Huichen Yang, William Hsu, and Daniel Andresen
* 11:40 am: Correlation and pattern detection in event networks
  Valerio Bellandi, Paolo Ceravolo, Samira Maghool, Margherita Pindaro, and Stefano Siccardi
* 12:00 pm: Drug Abuse Detection in Twitter-sphere: Graph-Based Approach
  Khaled Mohammed Saifuddin, Muhammad Ifte khairul Islam, and Esra Akbas
* 12:25 pm: Closing remarks
  Workshop organizers

### Workshop Organizers

[Nesreen Ahmed](http://nesreenahmed.com/)
![Nesreen Ahmed](ahmed.jpg "Nesreen Ahmed")
Intel Labs
Santa Clara, CA 95054

[Mohammad Al Hasan](http://cs.iupui.edu/~alhasan/)
![Mohammad Al Hasan](hasan.jpg "Mohammad Al Hasan")
Department of Computer and Information Science
Indiana University - Purdue University
Indianapolis, IN 46202

[Shaikh Arifuzzaman](https://www.cs.uno.edu/~arif/)
![Shaikh Arifuzzaman](arif.jpg "Shaikh Arifuzzaman")
Department of Computer Science
The University of New Orleans
New Orleans, LA 70148

[Kamesh Madduri](https://madduri.org/)
![Kamesh Madduri](madduri.jpg "Kamesh Madduri")
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802

### Contact

Please send email to one of the workshop organizers.