% Commit cceff91731 (parent 6d508d53fc): Typos fixed
\section{Related Work}
\label{sec:relwork}

Related work can be classified into distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.

%% DISTANCE MEASURES
The ranking of similar jobs performed in this article is related to clustering strategies.
The comparison of time series using various metrics has been investigated extensively.
In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for the clustering of multivariate time series is performed.
The authors apply 14 similarity measures to 23 data sets.
The study shows that no similarity measure produces statistically significantly better results than the others.
However, the Swale scoring model \cite{morse2007efficient} produced the most disjoint clusters.
In this model, gaps in the alignment incur a cost.
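To make the role of gaps concrete, the following is a minimal Python sketch of a Swale-style score for two numeric sequences. It assumes a recursive formulation with a matching threshold \texttt{eps}, a constant reward per matched element, and a constant negative gap penalty. The function name and default parameter values are illustrative assumptions, not the settings used in \cite{morse2007efficient} or \cite{khotanlou2018empirical}.

```python
from functools import lru_cache


def swale(r, s, eps=0.5, reward=1.0, gap_penalty=-1.0):
    """Swale-style similarity of two numeric sequences (illustrative sketch).

    Elements match if they differ by at most eps. A match adds `reward`;
    skipping an element of either sequence (a gap) adds `gap_penalty` (< 0).
    Parameter defaults are assumptions, not values from the literature.
    Higher scores mean more similar sequences.
    """
    r, s = tuple(r), tuple(s)

    @lru_cache(maxsize=None)
    def score(i, j):
        # One sequence is exhausted: each remaining element of the other is a gap.
        if i == len(r):
            return (len(s) - j) * gap_penalty
        if j == len(s):
            return (len(r) - i) * gap_penalty
        # Heads match within eps: take the reward and advance both sequences.
        if abs(r[i] - s[j]) <= eps:
            return reward + score(i + 1, j + 1)
        # No match: open a gap in either sequence and keep the better option.
        return max(gap_penalty + score(i + 1, j), gap_penalty + score(i, j + 1))

    return score(0, 0)
```

For example, \texttt{swale([1, 2, 3], [1, 3])} yields 1.0. This comes from two matched elements ($+2$) and one gap ($-1$). Unlike a distance, a higher score means more similar sequences.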
Levenshtein distance is often referred to as Edit Distance (ED) \cite{navarro2001guided}.
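For reference, the edit distance counts the minimum number of single-element insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is the textbook dynamic-programming formulation in Python and keeps only one previous row of the table. It accepts any sequences, e.g., strings or lists of symbols such as hypothetical per-interval IO categories.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions, and
    substitutions that turn sequence a into sequence b."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence needs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]
```

For example, the distance between the strings kitten and sitting is 3: two substitutions and one insertion.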
% [...]

There are various tools for analyzing the IO behavior of an application \cite{TFAPIKBBCF19}.

% time series analysis for inter-comparison of processes or jobs in HPC
For Vampir, a popular tool for trace file analysis, the Comparison View introduced in \cite{weber2017visual} allows users to manually compare traces of application runs, e.g., to compare optimized with original code.
Vampir generally supports the clustering of process timelines of a single job, allowing users to focus on relevant code sections and processes when investigating a large number of processes.

Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster.
For the clustering, a signature that includes the call graph is created for each process.
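The idea of tracing only one process per cluster can be illustrated with its simplest variant. Processes whose call-graph signatures are identical form a cluster, and the lowest rank serves as its representative. The Python sketch below assumes that a signature is the set of caller/callee edges. This hypothetical representation and the exact-match grouping are for illustration only; they do not reproduce the signature or clustering algorithm of Chameleon \cite{bahmani2018chameleon}.

```python
from collections import defaultdict


def representatives(call_graphs):
    """Group processes with identical call-graph signatures.

    call_graphs maps a process rank to an iterable of (caller, callee) edges.
    Returns {representative_rank: [all ranks in its cluster]}, where the
    representative is the lowest rank of the cluster.
    """
    clusters = defaultdict(list)
    for rank in sorted(call_graphs):
        # Hypothetical signature: the unordered set of call-graph edges,
        # so processes with equal call graphs land in the same cluster.
        signature = frozenset(call_graphs[rank])
        clusters[signature].append(rank)
    # Ranks were visited in ascending order, so ranks[0] is the lowest rank.
    return {ranks[0]: ranks for ranks in clusters.values()}
```

Consider four ranks where ranks 0 and 3 only call \texttt{MPI\_Send` and ranks 1 and 2 only call \texttt{MPI\_Recv}. In that case, only ranks 0 and 1 would need to be traced.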
In \cite{halawa2020unsupervised}, 11 performance metrics, including CPU and network metrics, are used for agglomerative clustering of jobs, demonstrating the general effectiveness of the approach.
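In agglomerative clustering, every job starts as its own cluster, and the two closest clusters are merged repeatedly until a stopping criterion is met. The Python sketch below assumes single linkage, Euclidean distance on raw (unnormalized) per-job metric vectors, and a fixed distance threshold. These are illustrative choices, and \cite{halawa2020unsupervised} may use a different linkage, normalization, or stopping rule.

```python
import math


def agglomerative(points, threshold):
    """Naive single-linkage agglomerative clustering (illustrative sketch).

    points: equal-length feature vectors, e.g., hypothetical per-job metrics.
    The closest pair of clusters is merged until the smallest inter-cluster
    distance exceeds `threshold`. Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance of the closest pair across the two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break  # the closest clusters are too far apart: stop merging
        merged = clusters.pop(y)  # y > x, so index x is still valid
        clusters[x].extend(merged)
    return clusters
```

Take four hypothetical two-metric jobs $(0,0)$, $(0,1)$, $(10,10)$, and $(10,11)$ with a threshold of 2. The jobs form the clusters $\{0,1\}$ and $\{2,3\}$.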

In \cite{rodrigo2018towards}, the NERSC workload is characterized based on job scheduler information (profiles).
Profiles that include the MPI activities have been shown to be effective in identifying the executed code \cite{demasi2013identifying}.
Many approaches for clustering applications operate on profiles for compute, network, and IO \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
For example, Evalix \cite{emeras2015evalix} monitors system statistics (from \texttt{/proc}) at 1-minute intervals. For the analysis, however, these statistics are converted into a profile that removes the time dimension, i.e., the average CPU, memory, and IO usage over the job runtime.
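The loss of temporal information in such profiles can be shown with a short Python sketch. Averaging each metric over all samples maps two jobs with the same samples in a different order to the same profile. The metric names and the dictionary-per-sample layout are hypothetical. The sketch only mimics the averaging step described for Evalix \cite{emeras2015evalix}, not its implementation.

```python
from statistics import fmean


def profile(samples):
    """Average every metric over all samples of a job, dropping the time axis.

    samples: one dict per sampling interval (e.g., per minute) that maps a
    hypothetical metric name to its value; all dicts share the same keys.
    """
    return {m: fmean(s[m] for s in samples) for m in samples[0]}


# Hypothetical jobs with the same samples in opposite order.
job_a = [{"cpu": 90, "io": 0}, {"cpu": 10, "io": 100}]  # compute, then IO
job_b = list(reversed(job_a))                          # IO, then compute
```

Both jobs yield the profile \texttt{cpu} $= 50$, \texttt{io} $= 50$, although their temporal behavior differs.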

% IO-specific tools
PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs, similar to Fourier analysis.
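The classifier in \cite{white2018automatic} is compared to Fourier analysis, so the following Python sketch shows that Fourier-based alternative rather than the authors' heuristic. It computes a naive discrete Fourier transform of a mean-centered throughput series and reports the period of the strongest frequency. Pure Python keeps the example self-contained, but the $O(n^2)$ transform is only suitable for short series. The function name is hypothetical.

```python
import cmath


def dominant_period(series):
    """Estimate the dominant period (in samples) of a throughput time series
    via a naive DFT; returns None if the series has no variation."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (DC) component
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):  # frequencies up to the Nyquist limit
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Frequency k completes k cycles in n samples, i.e., period n / k.
    return None if best_k is None else n / best_k
```

For a series that repeats every four samples, e.g., \texttt{[0, 50, 100, 50]} repeated six times, the sketch returns a period of 4.0 samples.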
The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a ``risk'' factor to identify IO patterns that stress the file system.

In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded by a data-center-wide monitoring system.

\section{Methodology}
\label{sec:methodology}