Minor improvement

Julian M. Kunkel 2020-12-04 16:04:35 +00:00
parent fc99affb60
commit 7be00c5a3b
1 changed file with 4 additions and 4 deletions


@@ -205,17 +205,17 @@ They differ in the way data similarity is defined; either the time series is enc
B-all determines similarity between binary codings by means of Levenshtein distance.
B-aggz is similar to B-all, but computes similarity on binary codings in which consecutive segments of zero activity are replaced by a single zero.
Q-lev determines similarity between quantized codings by using Levenshtein distance.
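The string-based measures (B-all, B-aggz, Q-lev) can be illustrated with a minimal Python sketch. It assumes that a job's coding is a sequence of integer symbols: 0/1 for binary codings, quantization levels for Q-lev. The function names are hypothetical. Turning the edit distance into a similarity by normalizing with the length of the longer coding is also an assumption of this sketch and not necessarily the paper's definition.

```python
from itertools import groupby
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def collapse_zero_runs(coding: Sequence[int]) -> list[int]:
    """B-aggz preprocessing: replace each run of consecutive zeros by one zero."""
    out: list[int] = []
    for symbol, run in groupby(coding):
        if symbol == 0:
            out.append(0)      # a whole run of zeros becomes a single zero
        else:
            out.extend(run)    # non-zero symbols are kept unchanged
    return out


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer coding."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, the binary codings \texttt{[1,0,0,0,1,1]} and \texttt{[1,0,1,1]} have edit distance 2, giving a B-all-style similarity of $1 - 2/6$. After \texttt{collapse\_zero\_runs}, the first coding equals the second, so the B-aggz-style similarity is 1.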
Q-native uses a performance-aware similarity function, i.e., the distance between two jobs for a metric is $\frac{|m_{job1} - m_{job2}|}{16}$.
For jobs of different lengths, we apply a sliding-window approach that finds the position of the shorter job within the longer job that yields the highest similarity.
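The per-metric distance $|m_{job1} - m_{job2}|/16$ and the sliding-window search can be sketched in Python. The sketch assumes that each job is a list of time segments and that each segment holds one quantized value per metric. Combining the distances into one score ($1$ minus the mean distance) is an assumption of this sketch, as are the hypothetical function names.

```python
from typing import Sequence

Segment = Sequence[int]  # assumed: one quantized value per metric for one time segment


def qnative_similarity(a: Sequence[Segment], b: Sequence[Segment]) -> float:
    """1 - mean over segments and metrics of |m_job1 - m_job2| / 16.

    The per-metric distance follows the text; averaging it into a single
    score is an assumption. a and b must have the same number of segments.
    """
    dists = [abs(x - y) / 16
             for seg_a, seg_b in zip(a, b)
             for x, y in zip(seg_a, seg_b)]
    if not dists:
        raise ValueError("jobs must contain at least one segment")
    return 1.0 - sum(dists) / len(dists)


def sliding_window_similarity(job1: Sequence[Segment], job2: Sequence[Segment]) -> float:
    """Try every offset of the shorter job inside the longer one; keep the best score."""
    short, long_ = (job1, job2) if len(job1) <= len(job2) else (job2, job1)
    n = len(short)
    return max(qnative_similarity(short, long_[k:k + n])
               for k in range(len(long_) - n + 1))
```

As an example, a one-segment job \texttt{[[16,4]]} compared with \texttt{[[0,0],[16,4],[8,8]]} scores 0.375, 1.0, and 0.625 at the three offsets, so the sliding-window similarity is 1.0.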
Q-phases extracts I/O phases and performs a phase-aware and performance-aware similarity computation: it computes the similarity between the most similar I/O phases of both jobs.
In this paper, we add a similarity definition based on the Kolmogorov-Smirnov test, which compares the probability distributions of the observed values; we describe it in the following.
%In brief, KS concatenates individual node data and computes similarity by means of the Kolmogorov-Smirnov test.
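The core of the two-sample Kolmogorov-Smirnov test is the statistic $D = \max_v |F_x(v) - F_y(v)|$, the largest gap between the empirical cumulative distribution functions of two samples. The Python sketch below computes $D$ for the observed values of one metric in two jobs. Mapping it to a similarity as $1 - D$ and the function names are assumptions of this sketch. In practice, \texttt{scipy.stats.ks\_2samp} computes the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_v |F_x(v) - F_y(v)|.

    F_x and F_y are the empirical CDFs. Both are step functions that only
    change at observed values, so evaluating them there finds the maximum.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / n  # fraction of x values <= v
        fy = bisect_right(ys, v) / m  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d


def ks_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical similarity in [0, 1]: identical distributions give 1."""
    return 1.0 - ks_statistic(x, y)
```

Because the test compares distributions rather than aligned time series, two jobs with the same mix of observed values are similar even if the values occur in a different order.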
\paragraph{Kolmogorov-Smirnov (KS) algorithm}
% Summary
For the analysis, we perform two preparation steps.
The first step is dimension reduction: we compute the mean across the two file systems and concatenate the time series data of the individual nodes (instead of averaging them).
This reduces the four-dimensional dataset to two dimensions (time, metrics).
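The dimension reduction can be sketched in Python. The sketch assumes the raw data is indexed as \texttt{data[node][fs][t][metric]}; this axis order and the function name are assumptions, not the paper's data layout. Values are averaged over the file-system axis, and the per-node time series are then appended one after another along the time axis.

```python
from statistics import fmean


def reduce_dimensions(data):
    """Reduce data[node][fs][t][metric] to rows[t'][metric].

    1. Average each metric over the file-system axis.
    2. Keep nodes separate and concatenate their time series along the
       time axis, so t' runs over node 0's steps, then node 1's, and so on.
    """
    rows = []
    for node in data:                 # nodes are concatenated, not averaged
        for per_fs in zip(*node):     # one time step: one metric list per file system
            # zip(*per_fs) groups the same metric across file systems
            rows.append([fmean(values) for values in zip(*per_fs)])
    return rows
```

For two nodes, two file systems, two time steps, and two metrics, the result has four rows (two per node) with two metric columns each.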
% Aggregation