Typos fixed

2020-10-23 18:18:37 +01:00 · 2020-10-23 18:18:37 +01:00 · cceff91731
commit cceff91731
parent 6d508d53fc
1 changed file with 10 additions and 9 deletions
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -148,14 +148,14 @@ Finally, we conclude our paper in \Cref{sec:summary}.
 \section{Related Work}
 \label{sec:relwork}

-Related work can be classified into: distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.
+Related work can be classified into distance measures, analysis of HPC application performance, inter-comparison of jobs in HPC, and IO-specific tools.

 %% DISTANCE MEASURES
 The ranking of similar jobs performed in this article is related to clustering strategies.
 The comparison of the time series using various metrics has been extensively investigated.
-In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for clustering of multivariate time series is performed.
+In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for the clustering of multivariate time series is performed.
 14 similarity measures are applied to 23 data sets.
-It shows that no similarity measure produces statistical significant better results than another.
+It shows that no similarity measure produces statistically significantly better results than any other.
 However, the Swale scoring model \cite{morse2007efficient} produced the most disjoint clusters.
 In this model, gaps imply a cost.
 Levenshtein distance is often referred to as Edit Distance (ED) \cite{navarro2001guided}.
@@ -167,24 +167,25 @@ Monitoring systems that record statistics about hardware usage are widely deploy
 There are various tools for analyzing the IO behavior of an application \cite{TFAPIKBBCF19}.

 % time series analysis for inter-comparison of processes or jobs in HPC
-For Vampir, a popular tool for trace file analysis, in \cite{weber2017visual} the Comparison View is introduced that allows to manually compare traces of application runs, e.g., to compare optimized  with original code.
-Vampir generally supports the clustering of process timelines of a single job allowing to focus on relevant code sections and processes when investigating large number of processes.
+For Vampir, a popular tool for trace file analysis, the Comparison View introduced in \cite{weber2017visual} allows users to manually compare traces of application runs, e.g., to compare optimized with original code.
+Vampir generally supports the clustering of process timelines of a single job, allowing users to focus on relevant code sections and processes when investigating a large number of processes.

 Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster.
 For the clustering, a signature is created for each process that includes the call-graph.
-In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs showing the general  effectivity of the approach.
+In \cite{halawa2020unsupervised}, 11 performance metrics, including CPU and network, are utilized for agglomerative clustering of jobs, demonstrating the general effectiveness of the approach.

 In \cite{rodrigo2018towards}, a characterization of the NERSC workload is performed based on job scheduler information (profiles).
 Profiles that include the MPI activities have shown effective to identify the code that is executed \cite{demasi2013identifying}.
 Many approaches for clustering applications operate on profiles for compute, network, and IO \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
-For example, Evalix \cite{emeras2015evalix} monitors system statistics (from proc) in 1 minute intervals but for the analysis they are converted to a profile removing the time dimension, i.e., compute the average CPU, memory, and IO over the job runtime.
+For example, Evalix \cite{emeras2015evalix} monitors system statistics (from \texttt{/proc}) at 1-minute intervals, but for the analysis, they are converted into a profile that removes the time dimension, i.e., the average CPU, memory, and IO usage over the job runtime is computed.

 % IO-specific tools
 PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
 In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- similar to Fourier analysis.
-The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a "risk" factor to identify IO patterns which stress the file system.
+The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a ``risk'' factor to identify IO patterns that stress the file system.
+
+In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded by a monitoring system deployed across the data center.

-In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded with a data center wide deployed monitoring system.

 \section{Methodology}
 \label{sec:methodology}