This commit is contained in:
parent ced1384734
commit 78224a4b35
@@ -12,6 +12,12 @@
\usepackage{subcaption}
\usepackage[backend=bibtex, style=numeric]{biblatex}
\addbibresource{bibliography.bib}
\usepackage{placeins}

\usepackage{todonotes}
%\usepackage[disable]{todonotes}
\newcommand{\eb}[1]{\todo[inline, color=green]{EB: #1}}
\newcommand{\jk}[1]{\todo[inline]{JK: #1}}

\usepackage{textcomp}
@@ -112,6 +118,7 @@ Related work can be classified into distance measures, analysis of HPC applicati
%% DISTANCE MEASURES
The ranking of similar jobs performed in this article is related to clustering strategies.
The Levenshtein (edit) distance is a widely used distance metric indicating the number of edits needed to convert one string into another \cite{navarro2001guided}.
\eb{What does ``Edit'' mean?}
The comparison of time series using various metrics has been investigated extensively.
In \cite{khotanlou2018empirical}, an empirical comparison of distance measures for the clustering of multivariate time series is performed; 14 similarity measures are applied to 23 data sets.
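For illustration, a minimal sketch of the Levenshtein distance on coding strings; this is the textbook dynamic-programming formulation, not the implementation evaluated in the paper.

```python
# Minimal sketch: Levenshtein (edit) distance via dynamic programming,
# counting the insertions, deletions, and substitutions needed to turn
# one coding string into another.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))           # distances for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]                            # i deletions turn a[:i] into ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[len(b)]

print(levenshtein("0141", "0412"))  # 2 edits
```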
@@ -156,12 +163,14 @@ On the Mistral supercomputer at DKRZ, the monitoring system \cite{betke20} gathe
The results are 4D data (time, nodes, metrics, file system) per job.
The distance measures should handle jobs of different lengths and node counts.
In the open-access article \cite{Eugen20HPS}\footnote{\scriptsize \url{https://zenodo.org/record/4478960/files/jhps-incubator-06-temporal-29-jan.pdf}}, we discussed in detail a variety of options for comparing time series data, from 1D job profiles to data reductions, as well as the general workflow and pre-processing.
\eb{The duplicate reference (in the footnote and in the bibliography) looks like a mathematical equation.}
We will be using this representation.
In a nutshell, each job executed on Mistral is partitioned into 10-minute segments\footnote{We found in preliminary experiments that 10 minutes reduces noise, i.e., the variation of the statistics when re-running the same job.}, the arithmetic mean of each metric is computed per segment, and the value is categorized into NonIO (0), HighIO (1), and CriticalIO (4) for values below the 99th percentile, up to the 99.9th percentile, and above it, respectively.
\eb{``Noise'' is not quite correct. The problem is rather the amount of data, because it is not easy to process.}
The values are chosen to be 0, 1, and 4 because we derive metrics arithmetically: naturally, a value of 0 indicates that no I/O issue appears, and we weight critical I/O as 4x as important as high I/O.
This strategy ensures that the same approach can be applied to other HPC systems regardless of the actual distribution of these statistics at that data center.
After the mean value across nodes is computed for a segment, the resulting numeric value is encoded using either a binary representation (I/O activity in the segment: yes/no) or a hexadecimal one (quantizing the numerical performance value into 0-15), which is then ready for similarity analysis.
By pre-filtering jobs with no I/O activity -- their sum across all dimensions and time series is equal to zero -- the dataset is reduced from 1 million jobs to about 580k jobs.
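As a loose illustration of this pre-processing, a hypothetical sketch; the threshold parameters `p99` and `p999` and all helper names are assumptions here, not the production monitoring code.

```python
# Hypothetical sketch of the categorize-then-encode step described above.
def categorize(mean_value: float, p99: float, p999: float) -> int:
    """Map a segment's mean metric value to NonIO (0), HighIO (1), or CriticalIO (4)."""
    if mean_value <= p99:
        return 0
    if mean_value <= p999:
        return 1
    return 4

def encode(categories: list[int], binary: bool) -> str:
    """Encode one metric's per-segment categories as a binary or hexadecimal string."""
    if binary:
        return "".join("1" if c > 0 else "0" for c in categories)
    return "".join(format(min(c, 15), "x") for c in categories)  # quantize into 0-15

cats = [categorize(v, p99=10.0, p999=50.0) for v in (0.0, 12.5, 80.0, 3.0)]
print(cats, encode(cats, binary=True), encode(cats, binary=False))
# [0, 1, 4, 0] 0110 0140
```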

\subsection{Algorithms for Computing Similarity}
@@ -170,8 +179,12 @@ They differ in the way data similarity is defined; either the time series is enc
B-all determines the similarity between binary codings by means of the Levenshtein distance.
B-aggz is similar to B-all but computes the similarity on binary codings in which runs of consecutive zero-activity segments are collapsed into a single zero.
Q-lev determines the similarity between quantized codings by using the Levenshtein distance.
Q-native uses a performance-aware similarity function, i.e., the distance between two jobs for a metric is $\frac{|m_{\text{job1}} - m_{\text{job2}}|}{16}$.
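A minimal sketch of this per-metric distance, assuming hex-coded segments as described above; the function name and the averaging across the coding are illustrative assumptions, not the paper's exact code.

```python
# Sketch of the Q-native per-metric distance: for two hex-coded segments,
# the distance per metric is |m_job1 - m_job2| / 16, averaged here over
# all positions of the coding.
def qnative_distance(seg1: str, seg2: str) -> float:
    assert len(seg1) == len(seg2), "Q-native compares equally long codings"
    return sum(abs(int(a, 16) - int(b, 16)) / 16
               for a, b in zip(seg1, seg2)) / len(seg1)

print(qnative_distance("04f0", "01f4"))  # (0 + 3/16 + 0 + 4/16) / 4
```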
%There are various options for how a longer job is embedded in a shorter job, for example, a larger input file may stretch the length of the I/O and compute phases; another option can be that more (model) time is simulated.
One of our basic considerations is that a short job may run longer, e.g., when restarted with a larger input file (which can stretch the length of the I/O and compute phases) or when run with more simulation steps.
\eb{The sentence above was rewritten. Check whether it still fits.}
In this article, we consider these different behavioral patterns and attempt to identify situations where the I/O pattern of a long job is contained in a shorter job.
Therefore, for jobs of different lengths, a sliding-window approach is applied that finds the location within the longer job at which the shorter job matches with the highest similarity.
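The sliding-window search itself might look like the following sketch; `similarity` is a placeholder for any of the measures above, and the exhaustive offset scan is our reading of the approach, not the paper's exact implementation.

```python
# Sketch of the sliding-window idea: slide the shorter job's coding over
# the longer one and keep the offset with the highest similarity.
def hex_similarity(a: str, b: str) -> float:
    """1 minus the mean per-position distance of two equal-length hex codings."""
    return 1 - sum(abs(int(x, 16) - int(y, 16)) / 16 for x, y in zip(a, b)) / len(a)

def best_window(short: str, long: str, similarity) -> tuple[int, float]:
    best_off, best_sim = 0, float("-inf")
    for off in range(len(long) - len(short) + 1):
        sim = similarity(short, long[off:off + len(short)])
        if sim > best_sim:
            best_off, best_sim = off, sim
    return best_off, best_sim

print(best_window("14", "001400", hex_similarity))  # (2, 1.0): exact match at offset 2
```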
Q-phases extracts phase information and performs a phase-aware and performance-aware similarity computation: the algorithm extracts I/O phases from our 10-minute segments and computes the similarity between the most similar I/O phases of both jobs.
@@ -298,7 +311,7 @@ The job runtime of the Top\,100 jobs is shown using boxplots in \Cref{fig:runtim
While all algorithms can compute the similarity between jobs of different lengths, the B algorithms and Q-native penalize jobs of different lengths, preferring jobs of very similar length.
Q-phases is able to identify much shorter or longer jobs.

\begin{figure}[bt]
\centering
\begin{subfigure}{0.47\textwidth}
@@ -421,6 +434,10 @@ Related jobs stem from the same user/group and may have a related job name, but
This was the first exploration of this methodology.
In the future, we will expand the study by comparing more jobs in order to assess the suitability of the methodology.

\eb{Is one actually allowed to place a figure in the middle of the bibliography? If not, one could set a boundary with a FloatBarrier (see code). However, it would then be 13 pages.}

%\FloatBarrier
\printbibliography%

\end{document}