Reformeded related work.

Julian M. Kunkel 2020-10-23 18:14:52 +01:00
parent a0a2ebabf7
commit 6d508d53fc
1 changed files with 12 additions and 13 deletions

@@ -148,8 +148,9 @@ Finally, we conclude our paper in \Cref{sec:summary}.
\section{Related Work}
\label{sec:relwork}
Related work can be classified into: distance measures, time series analysis of HPC applications, and IO monitoring tools.
Related work can be classified into: distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.
%% DISTANCE MEASURES
The ranking of similar jobs performed in this article is related to clustering strategies.
The comparison of the time series using various metrics has been extensively investigated.
In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for clustering of multivariate time series is performed.
@@ -160,32 +161,30 @@ In this model, gaps imply a cost.
Levenshtein distance is often referred to as Edit Distance (ED) \cite{navarro2001guided}.
% Lock-Step Measures and Elastic Measures
Monitoring systems that record statistics about hardware usage are widely used in HPC.
In \cite{halawa2020unsupervised}, 11 performance metrics, including CPU and network, are utilized for agglomerative clustering, showing the general effectiveness of the approach.
% Analysis of HPC application performance
The performance of applications can be analyzed using one of many tracing tools such as Vampir \cite{weber2017visual} that record the behavior of an application explicitly or implicitly by collecting information about the resource usage with a monitoring system.
Monitoring systems that record statistics about hardware usage are widely deployed in data centers to track system utilization by applications.
There are various tools for analyzing the IO behavior of an application \cite{TFAPIKBBCF19}.
Comparison of applications by extracting the IO patterns from application traces.
With PAS2P \cite{mendez2012new}...
% time series analysis for inter-comparison of processes or jobs in HPC
For Vampir, a popular tool for trace file analysis, \cite{weber2017visual} introduces the Comparison View, which allows users to manually compare traces of application runs, e.g., to compare optimized with original code.
Vampir generally supports the clustering of process timelines of a single job, allowing users to focus on relevant code sections and processes when investigating a large number of processes.
Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster.
For the clustering, a signature is created for each process that includes the call-graph.
Characterization of jobs
In \cite{halawa2020unsupervised}, 11 performance metrics, including CPU and network, are utilized for agglomerative clustering of jobs, showing the general effectiveness of the approach.
In \cite{rodrigo2018towards}, a characterization of the NERSC workload is performed based on job scheduler information (profiles).
Profiles that include the MPI activities have been shown to be effective in identifying the code that is executed \cite{demasi2013identifying}.
Approaches for clustering HPC applications typically operate on profiles for compute, network, and IO \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
Many approaches for clustering applications operate on profiles for compute, network, and IO \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
For example, Evalix \cite{emeras2015evalix} monitors system statistics (from proc) in 1-minute intervals, but for the analysis these are converted to a profile that removes the time dimension, i.e., the average CPU, memory, and IO usage over the job runtime is computed.
% IO-specific tools
PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- similar to Fourier analysis.
The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a ``risk'' factor to identify IO patterns which stress the file system.
In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- there is a considerable similarity to Fourier analysis.
In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded by a data-center-wide monitoring system.
\section{Methodology}
\label{sec:methodology}