Revision of the description
This commit is contained in:
parent
562a8cf7db
commit
fc99affb60
@@ -195,23 +195,22 @@ The results are 4D data (time, nodes, metrics, file system) per job.
The distance measures should handle jobs of different lengths and node counts.
In \cite{Eugen20HPS}, we discussed a variety of options from 1D job-profiles to data reductions to compare time series data and the general workflow and pre-processing in detail.
In a nutshell, each job executed on Mistral is partitioned into 10-minute segments; for each segment, we compute the arithmetic mean of each metric and categorize the value as non-IO (0), HighIO (1), or CriticalIO (4) for values below the 99th percentile, up to the 99.9th percentile, and above it, respectively.
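This categorization step can be sketched as follows; the function name, array layout, and the precomputed per-metric percentile thresholds are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def categorize(segment_means, p99, p999):
    """Map 10-minute segment means onto the three activity levels.

    p99/p999 are the 99th and 99.9th percentiles of the metric,
    assumed precomputed over the whole dataset.
    Below p99 -> 0 (non-IO), up to p999 -> 1 (HighIO), above -> 4 (CriticalIO).
    """
    values = np.asarray(segment_means, dtype=float)
    codes = np.zeros(values.shape, dtype=int)
    codes[values >= p99] = 1    # HighIO
    codes[values > p999] = 4    # CriticalIO
    return codes
```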
After the mean value across nodes is computed for a segment, the resulting numeric value is encoded using either a binary representation (I/O activity in the segment: yes/no) or a hexadecimal representation (quantizing the numerical performance value into 0--15), which is then ready for similarity analysis.
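A minimal sketch of the two encodings, assuming the segment means lie in the 0--4 range produced by the categorization above (function names and the `vmax` parameter are illustrative):

```python
import numpy as np

def encode_binary(segment_means):
    # 1 if there was any I/O activity in the segment, else 0
    return (np.asarray(segment_means) > 0).astype(int)

def encode_hex(segment_means, vmax=4.0):
    # quantize each value into 16 levels (0-15); vmax is the assumed
    # maximum of the metric's value range after categorization
    v = np.clip(np.asarray(segment_means, dtype=float) / vmax, 0.0, 1.0)
    return np.minimum((v * 16).astype(int), 15)
```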
By pre-filtering jobs with no I/O activity -- their sum across all dimensions and time series is equal to zero -- we reduce the dataset from about 1 million jobs to about 580k jobs.
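The pre-filter itself reduces to a zero-sum check per job; a sketch, with an assumed dict-of-arrays layout:

```python
import numpy as np

def has_io(job_data):
    # a job is kept iff the sum over all dimensions and time steps is non-zero
    return np.asarray(job_data).sum() != 0

def prefilter(jobs):
    # jobs: mapping of job id -> 4D array (time, nodes, metrics, file systems)
    return {jid: data for jid, data in jobs.items() if has_io(data)}
```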
\subsection{Algorithms for Computing Similarity}
We reuse the algorithms developed in \cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
They differ in the way data similarity is defined: the time series is encoded using either binary or hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance.
B-all determines similarity between binary codings by means of Levenshtein distance.
B-aggz is similar to B-all, but computes similarity on binary codings in which consecutive segments of zero activity are replaced by a single zero.
Q-lev determines similarity between the hexadecimal codings by using the Levenshtein distance.
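The Levenshtein distance used by B-all, B-aggz, and Q-lev is the standard dynamic-programming edit distance over the coded sequences; the normalization to a similarity in [0, 1] is an assumption for illustration:

```python
def levenshtein(a, b):
    """Edit distance between two coded sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a, b):
    # normalize the edit distance to a similarity in [0, 1] (assumed scheme)
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)
```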
Q-native uses a performance-aware similarity function, i.e., the distance for a metric is $\frac{|m_{job1} - m_{job2}|}{16}$.
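The per-metric distance translates directly into code; aggregating the per-metric distances into a segment similarity by averaging is an assumption here:

```python
def q_native_distance(m_job1, m_job2):
    # per-metric distance on the 16-level hexadecimal quantization (0-15)
    return abs(m_job1 - m_job2) / 16

def segment_similarity(seg1, seg2):
    # assumed aggregation: mean similarity over the metrics of one segment
    dists = [q_native_distance(a, b) for a, b in zip(seg1, seg2)]
    return 1 - sum(dists) / len(dists)
```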
For jobs of different lengths, we apply a sliding-window approach that finds the location of the shorter job within the longer job that yields the highest similarity.
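The sliding-window matching can be sketched generically, parameterized by any segment-wise similarity function (the function shape is an assumption):

```python
def sliding_window_similarity(short, long, sim):
    """Slide the shorter coding over the longer one and keep the
    best-matching offset; `sim` is any window-wise similarity function."""
    n = len(short)
    best = 0.0
    for off in range(len(long) - n + 1):
        best = max(best, sim(short, long[off:off + n]))
    return best
```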
The Q-phases algorithm extracts I/O phases and computes the similarity between the most similar I/O phases of both jobs.
In this paper, we add a new similarity definition based on the Kolmogorov-Smirnov test, which compares the probability distributions of the observed values; we describe it in the following.
In brief, KS concatenates individual node data (instead of averaging it) and computes similarity by means of the Kolmogorov-Smirnov test.
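A self-contained sketch of this idea, using a hand-rolled two-sample KS statistic (equivalent in spirit to `scipy.stats.ks_2samp`); the data layout and the conversion of the D statistic into a similarity are assumptions:

```python
import numpy as np

def ks_statistic(s1, s2):
    # empirical two-sample Kolmogorov-Smirnov D statistic:
    # maximum absolute difference between the two empirical CDFs
    data = np.concatenate([s1, s2])
    cdf1 = np.searchsorted(np.sort(s1), data, side='right') / len(s1)
    cdf2 = np.searchsorted(np.sort(s2), data, side='right') / len(s2)
    return np.max(np.abs(cdf1 - cdf2))

def ks_similarity(job1_nodes, job2_nodes):
    # concatenate the per-node time series instead of averaging them
    s1 = np.concatenate([np.asarray(t, dtype=float) for t in job1_nodes])
    s2 = np.concatenate([np.asarray(t, dtype=float) for t in job2_nodes])
    return 1 - ks_statistic(s1, s2)   # 1 means identical distributions
```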
\paragraph{Kolmogorov-Smirnov (KS) algorithm}
% Summary
@@ -221,7 +220,6 @@ This reduces the four-dimensional dataset to two dimensions (time, metrics).
% Aggregation
The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.
The fixed interval of 10 minutes also ensures the portability of the approach to other HPC systems.
Unlike the previous similarity definitions, the concatenation of time series along the node dimension preserves the individual I/O information of all nodes while still allowing the comparison of jobs with different numbers of nodes.
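The two reductions can be demonstrated on a toy array; the 4D layout `(time, nodes, metrics, file systems)` follows the source, while the concrete sizes are arbitrary:

```python
import numpy as np

# toy job with the assumed layout: (time, nodes, metrics, file_systems)
job = np.arange(6 * 4 * 2 * 3, dtype=float).reshape(6, 4, 2, 3)

# 1) reduce the file system dimension with the mean -> (time, nodes, metrics)
fs_mean = job.mean(axis=3)

# 2) concatenate the per-node time series along the time axis instead of
#    averaging across nodes -> (time * nodes, metrics)
concat = fs_mean.transpose(1, 0, 2).reshape(-1, fs_mean.shape[2])
```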
We apply no aggregation function to the metric dimension.