Scalable Comparative Visualization of Ensembles of Call Graphs
2020
Suraj P. Kesavan, Harsh Bhatia,
Abhinav Bhatele, Todd Gamblin, Peer-Timo Bremer, and Kwan-Liu Ma.
Optimizing the performance of large-scale parallel codes is critical for
efficient utilization of computing resources. Code developers often explore
various execution parameters, such as hardware configurations, system software
choices, and application parameters, and are interested in detecting and
understanding bottlenecks in different executions. They often collect
hierarchical performance profiles represented as call graphs, which combine
performance metrics with their execution contexts. The crucial task of exploring
multiple call graphs together is tedious and challenging because of the many
structural differences in the execution contexts and significant variability
in the collected performance metrics (e.g., execution runtime). In this paper,
we present an enhanced version of CallFlow to support the exploration of
ensembles of call graphs using new types of visualizations, analysis, graph
operations, and features. We introduce ensemble-Sankey, a new visual
design that combines the strengths of resource-flow (Sankey) and box-plot
visualization techniques. Whereas the resource-flow visualization can easily
and intuitively describe the graphical nature of the call graph, the box
plots overlaid on the nodes of Sankey convey the performance variability
within the ensemble. Our interactive visual interface provides linked views
to help explore ensembles of call graphs, e.g., by facilitating the analysis
of structural differences, and identifying similar or distinct call graphs.
We demonstrate the effectiveness and usefulness of our design through case
studies on large-scale parallel codes.
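As a rough illustration of the box-plot half of the ensemble-Sankey idea, the sketch below computes a five-number summary of one call-graph node's runtime across an ensemble of runs. The data layout (`ensemble`, a mapping from run name to per-node runtimes) and the function name `node_summary` are assumptions made for this example; they are not CallFlow's API.

```python
from statistics import quantiles

# Hypothetical ensemble: run name -> {call-graph node -> runtime in seconds}.
ensemble = {
    "run_a": {"main": 10.0, "solve": 6.0},
    "run_b": {"main": 12.0, "solve": 7.0},
    "run_c": {"main": 11.0, "solve": 9.0},
    "run_d": {"main": 15.0},  # "solve" is absent here: a structural difference
}

def node_summary(ensemble, node):
    """Five-number summary of `node` over the runs that actually contain it."""
    values = sorted(run[node] for run in ensemble.values() if node in run)
    if len(values) < 2:
        raise ValueError("need at least two runs containing the node")
    # Inclusive quartiles treat the data as the whole population of runs.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
        "runs": len(values),  # how many runs contributed to this node
    }

if __name__ == "__main__":
    for node in ("main", "solve"):
        print(node, node_summary(ensemble, node))
```

Reporting the number of contributing runs alongside the summary keeps the structural differences between call graphs visible, since a node may be missing from some executions.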
A Visual Analytics Framework for Reviewing Streaming Performance Data.
2020
Suraj P. Kesavan, Takanori Fujiwara,
Jianping Kelvin Li, Caitlin Ross, Misbah Mubarak, Christopher D. Carothers,
Robert B. Ross, and Kwan-Liu Ma.
In Proceedings of IEEE Pacific Visualization Symposium
(PacificVis), forthcoming.
Understanding and tuning the performance of
extreme-scale parallel computing systems
demands a streaming approach due to the
computational cost of applying offline
algorithms to vast amounts of performance
log data. Analyzing large streaming data is
challenging because the rate of receiving
data and limited time to comprehend data
make it difficult for the analysts to
sufficiently examine the data without
missing important changes or patterns. To
support streaming data analysis, we
introduce a visual analytic framework
comprising three modules: data
management, analysis, and interactive
visualization. The data management module
collects various computing and communication
performance metrics from the monitored
system using streaming data processing
techniques and feeds the data to the other
two modules. The analysis module
automatically identifies important changes
and patterns at the required latency. In
particular, we introduce a set of online and
progressive analysis methods for not only
controlling the computational costs but also
helping analysts better follow the critical
aspects of the analysis results. Finally,
the interactive visualization module
provides the analysts with a coherent view
of the changes and patterns in the
continuously captured performance data.
Through a multi-faceted case study on
performance analysis of parallel
discrete-event simulation, we demonstrate
the effectiveness of our framework for
identifying bottlenecks and locating
outliers.
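The abstract does not say which online methods the analysis module uses. As one generic example of an online computation with constant memory per metric, the sketch below uses Welford's algorithm to maintain a running mean and standard deviation and flags values far from the running mean. The class name `RunningStats`, the function `flag_outliers`, and the threshold `k` are assumptions made for this example, not part of the framework.

```python
import math

class RunningStats:
    """Welford's online algorithm: running mean and variance in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; defined as 0.0 until two values are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, k=3.0):
        """True if x lies more than k standard deviations from the running mean."""
        return self.n > 1 and self.std > 0 and abs(x - self.mean) > k * self.std

def flag_outliers(stream, k=3.0):
    """Indices of values that look anomalous relative to the data seen so far."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(stream):
        if stats.is_outlier(x, k):  # test against history before absorbing x
            flagged.append(i)
        stats.update(x)
    return flagged

if __name__ == "__main__":
    print(flag_outliers([10, 11, 9, 10, 11, 50, 10]))
```

Each value is tested against the statistics of earlier values only, so a single spike does not mask itself by inflating the variance before it is checked.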
Visualizing Hierarchical Performance Profiles of Parallel Codes using CallFlow.
2019
Huu Tan Pham Nguyen, Abhinav Bhatele, Nikhil Jain,
Suraj P. Kesavan, Harsh Bhatia,
Todd Gamblin, Kwan-Liu Ma, and Peer-Timo Bremer.
IEEE Transactions on Visualization and Computer Graphics,
2019.
Calling context trees (CCTs) couple
performance metrics with call paths, helping
understand the execution and performance of
parallel programs. To identify performance
bottlenecks, programmers and performance
analysts visually explore CCTs to form and
validate hypotheses regarding degraded
performance. However, due to the complexity
of parallel programs, existing visual
representations do not scale to applications
running on a large number of processors. We
present CALLFLOW, an interactive visual
analysis tool that provides a high-level
overview of CCTs together with semantic
refinement operations to progressively
explore the CCTs. Using a flow-based
metaphor, we visualize a CCT by treating
execution time as a resource spent during a
call chain, and demonstrate the
effectiveness of our design with case
studies on large-scale, production
simulation codes.
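To make the flow metaphor concrete, the sketch below builds inclusive times for a small CCT from (call path, exclusive time) samples and derives caller-to-callee links weighted by the callee's inclusive time, which is the kind of quantity a Sankey diagram would draw. The sample data and the function names `inclusive_times` and `sankey_links` are assumptions made for this example; CALLFLOW's actual data model may differ.

```python
from collections import defaultdict

# Hypothetical profile samples: (call path, exclusive time in seconds).
samples = [
    (("main",), 1.0),
    (("main", "solve"), 4.0),
    (("main", "solve", "mpi_wait"), 2.0),
    (("main", "io"), 3.0),
]

def inclusive_times(samples):
    """Inclusive time of a call path = its exclusive time plus all descendants'."""
    inclusive = defaultdict(float)
    for path, exclusive in samples:
        # Credit the sample to the path itself and to every ancestor prefix.
        for depth in range(1, len(path) + 1):
            inclusive[path[:depth]] += exclusive
    return dict(inclusive)

def sankey_links(samples):
    """(caller path, callee path, flow) triples; flow is the callee's inclusive time."""
    inclusive = inclusive_times(samples)
    return sorted(
        (path[:-1], path, time)
        for path, time in inclusive.items()
        if len(path) > 1  # the root has no incoming link
    )

if __name__ == "__main__":
    for caller, callee, flow in sankey_links(samples):
        print(" > ".join(caller), "->", callee[-1], flow)
```

Keying nodes by full call path rather than by function name preserves calling context: the same function reached through two different callers stays two separate nodes.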
A Visual Analytics Framework for Analyzing Parallel and Distributed Computing
Applications.
2019
Jianping Kelvin Li, Takanori Fujiwara,
Suraj P. Kesavan,
Caitlin Ross, Misbah Mubarak, Christopher D. Carothers, Robert B. Ross, and Kwan-Liu
Ma.
In Proceedings of Symposium on Visualization in Data Science (VDS), 2019.
To optimize the performance and efficiency of HPC applications, programmers and
analysts often need to collect various performance metrics for each computer at
different time points as well as the communication data between the computers.
This results in a complex dataset that consists of multivariate time-series and
communication network data, which makes debugging and performance tuning of HPC
applications challenging. Automated analytical methods based on statistical
analysis and unsupervised learning are often insufficient to support such tasks
without the background knowledge from the application programmers. To better
explore and analyze a wide spectrum of HPC datasets, effective visual data
analytics techniques are needed. In this paper, we present a visual analytics
framework for analyzing HPC datasets produced by parallel discrete-event
simulations (PDES). Our framework leverages automated time-series analysis methods
and effective visualizations to analyze both multivariate time-series and
communication network data. Through several case studies for analyzing the
performance of PDES, we show that our visual analytics techniques and system can
be effective in reasoning about multiple performance metrics, temporal behaviors
of the simulation, and the communication patterns.