diff --git a/paper/main.tex b/paper/main.tex
index e8663cb..0942b4b 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -148,14 +148,14 @@ Finally, we conclude our paper in \Cref{sec:summary}.
 
 \section{Related Work}
 \label{sec:relwork}
-Related work can be classified into: distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.
+Related work can be classified into distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.
 
 %% DISTANCE MEASURES
 The ranking of similar jobs performed in this article is related to clustering strategies.
 The comparison of the time series using various metrics has been extensively investigated.
-In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for clustering of multivariate time series is performed.
+In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for the clustering of multivariate time series is performed.
 14 similarity measures are applied to 23 data sets.
-It shows that no similarity measure produces statistical significant better results than another.
+It shows that no similarity measure produces statistically significantly better results than another.
 However, the Swale scoring model \cite{morse2007efficient} produced the most disjoint clusters.
 In this model, gaps imply a cost.
 Levenshtein distance is often referred to as Edit Distance (ED) \cite{navarro2001guided}.
@@ -167,24 +167,25 @@ Monitoring systems that record statistics about hardware usage are widely deploy
 There are various tools for analyzing the IO behavior of an application \cite{TFAPIKBBCF19}.
 
 % time series analysis for inter-comparison of processes or jobs in HPC
-For Vampir, a popular tool for trace file analysis, in \cite{weber2017visual} the Comparison View is introduced that allows to manually compare traces of application runs, e.g., to compare optimized with original code.
-Vampir generally supports the clustering of process timelines of a single job allowing to focus on relevant code sections and processes when investigating large number of processes.
+For Vampir, a popular tool for trace file analysis, the Comparison View introduced in \cite{weber2017visual} allows users to manually compare traces of application runs, e.g., to compare optimized with original code.
+Vampir generally supports the clustering of process timelines of a single job, allowing users to focus on relevant code sections and processes when investigating a large number of processes.
 Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster.
 For the clustering, a signature is created for each process that includes the call-graph.
 
-In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs showing the general effectivity of the approach.
+In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs, showing the general effectiveness of the approach.
 In \cite{rodrigo2018towards}, a characterization of the NERSC workload is performed based on job scheduler information (profiles).
 Profiles that include the MPI activities have shown effective to identify the code that is executed \cite{demasi2013identifying}.
 
 Many approaches for clustering applications operate on profiles for compute, network, and IO \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
-For example, Evalix \cite{emeras2015evalix} monitors system statistics (from proc) in 1 minute intervals but for the analysis they are converted to a profile removing the time dimension, i.e., compute the average CPU, memory, and IO over the job runtime.
+For example, Evalix \cite{emeras2015evalix} monitors system statistics (from proc) in 1-minute intervals, but for the analysis these are converted to a profile that removes the time dimension, i.e., the average CPU, memory, and IO usage over the job runtime is computed.
 
 % IO-specific tools
 PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
 In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- similar to Fourier analysis.
 
-The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a "risk" factor to identify IO patterns which stress the file system.
+The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a ``risk'' factor to identify IO patterns that stress the file system.
+
+In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded with a monitoring system deployed data-center-wide.
 
-In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded with a data center wide deployed monitoring system.
 \section{Methodology}
 \label{sec:methodology}
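The section above repeatedly refers to the Levenshtein (edit) distance as a similarity measure for sequences. As a reviewer's aide, a minimal sketch of the standard dynamic-programming formulation is given below; it is purely illustrative and is not taken from the paper or any of the cited tools:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    # prev[j] holds the edit distance between the current prefix of a
    # and the first j elements of b; start from the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # delete ca
                cur[j - 1] + 1,               # insert cb
                prev[j - 1] + (ca != cb),     # substitute (free on match)
            ))
        prev = cur
    return prev[-1]
```

The same recurrence applies to any sequence type, so it carries over directly to quantized time series, where each element is a discretized metric value rather than a character.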