KS dimension reduction and filtering

2020-09-03 18:32:22 +02:00 · e4dd65c064
commit e4dd65c064
parent ea893d76f0
1 changed files with 26 additions and 0 deletions
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -112,6 +112,32 @@ The contribution of this paper...
 \section{Methodology}
 \label{sec:methodology}
 \ebadd{
 % Summary
 For the analysis of the Kolmogorov-Smirnov-based similarity, we perform two preparation steps.
 Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
 Pre-filtering omits jobs that are irrelevant in terms of performance and thereby reduces the dataset further.
 % Aggregation
 The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independent of how many file systems are present on an HPC system.
 A fixed interval also ensures the portability of the approach to other HPC systems.
 The concatenation of time series on the node dimension preserves I/O information of all nodes.
 We apply no aggregation function to the metric dimension.
 % Filtering
 Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis.
 Their sum across all dimensions and time series is equal to zero.
 Furthermore, we filter out those jobs whose time series have fewer than 8 values.
 % Similarity
 For the analysis, we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
 The similarity function (\Cref{eq:ks_similarity}) calculates the complement of the reject probability $p_{\text{reject}}$.
 }
 \begin{equation}\label{eq:ks_similarity}
 	\text{similarity} = 1 - p_{\text{reject}}
 \end{equation}
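 As a reminder of what the test measures, the two-sample Kolmogorov-Smirnov statistic is commonly defined as the largest distance between two empirical distribution functions.
 The symbols $D$, $F_a$, and $F_b$ are notation assumed here for illustration only; how the library derives $p_{\text{reject}}$ from this statistic is not restated here.
 \[
 	D = \sup_x \left| F_a(x) - F_b(x) \right|
 \]
 In this reading, $F_a$ and $F_b$ would be the empirical distribution functions of the values of the two compared time series, so that similar value distributions yield a small $D$.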
 Given: the ID of the reference job.
 From the 4D time series data (nodes, file systems, 9 metrics, time), create a feature set.
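 One possible way to write down the preparation steps is sketched below.
 All symbols are assumptions introduced for illustration: $x_{n,f,m,t}$ denotes the value for node $n$, file system $f$, metric $m$, and time step $t$, and $N$, $F$, and $T$ denote the number of nodes, file systems, and time steps of a job.
 The mean over the file system dimension could then read
 \[
 	\bar{x}_{n,m,t} = \frac{1}{F} \sum_{f=1}^{F} x_{n,f,m,t} ,
 \]
 and the concatenation over the node dimension, assuming each node contributes $T$ values, yields one series per metric:
 \[
 	s_m = \left( \bar{x}_{1,m,1}, \dots, \bar{x}_{1,m,T}, \; \dots, \; \bar{x}_{N,m,1}, \dots, \bar{x}_{N,m,T} \right) .
 \]
 Under this notation, a zero-job satisfies $\sum_{n,f,m,t} x_{n,f,m,t} = 0$, and the remaining two dimensions are the metric and the concatenated time axis.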