KS dimension reduction and filtering
This commit is contained in: parent ea893d76f0, commit e4dd65c064

@@ -112,6 +112,32 @@ The contribution of this paper...
\section{Methodology}
\label{sec:methodology}
\ebadd{
% Summary
For the analysis of the Kolmogorov-Smirnov-based similarity we perform two preparation steps.
Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.

% Aggregation
The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.
A fixed interval also ensures the portability of the approach to other HPC systems.
The concatenation of time series on the node dimension preserves the I/O information of all nodes.
We apply no aggregation function to the metric dimension.
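The two reduction steps above can be sketched as follows. This is a minimal illustration with NumPy, assuming the job data arrives as a (nodes, file systems, metrics, time) array whose values are already mapped to the fixed interval [0, 4]; the array shapes and variable names are ours, not the paper's code.

```python
import numpy as np

# Illustrative 4D job: 2 nodes, 3 file systems, 9 metrics, 10 time steps,
# with values already quantized to the fixed interval [0, 4].
rng = np.random.default_rng(0)
job = rng.integers(0, 5, size=(2, 3, 9, 10)).astype(float)

# Mean over the file-system axis: values remain in [0, 4] regardless of
# how many file systems the HPC system mounts.
per_node = job.mean(axis=1)                         # -> (nodes, metrics, time)

# Concatenate the node axis along time, preserving each node's I/O data.
reduced = np.concatenate(list(per_node), axis=-1)   # -> (metrics, nodes*time)

print(reduced.shape)   # (9, 20)
```

The metric dimension is deliberately left untouched, matching the text above.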

% Filtering
Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis.
Their sum across all dimensions and time series is equal to zero.
Furthermore, we filter out jobs whose time series have fewer than 8 values.
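Both pre-filter criteria can be expressed as one predicate on the reduced data; a hedged sketch, assuming the (metrics, time) layout produced by the reduction step and a hypothetical helper name of our choosing:

```python
import numpy as np

def keep_job(series: np.ndarray, min_length: int = 8) -> bool:
    """Pre-filter on reduced (metrics, time) data: drop zero-jobs and
    jobs whose time series have fewer than min_length values."""
    if series.sum() == 0:              # zero-job: no sign of I/O load
        return False
    if series.shape[-1] < min_length:  # too few values for a meaningful test
        return False
    return True

print(keep_job(np.zeros((9, 20))))   # False: zero-job
print(keep_job(np.ones((9, 5))))     # False: fewer than 8 values
print(keep_job(np.ones((9, 20))))    # True
```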

% Similarity
For the analysis we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
The similarity function in \Cref{eq:ks_similarity} calculates the complement of the reject probability $p_{\text{reject}}$.
}
\begin{equation}\label{eq:ks_similarity}
\text{similarity} = 1 - p_{\text{reject}}
\end{equation}

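The paper computes this with the Rust crate named above; as an illustrative equivalent only, here is a sketch using SciPy's two-sample KS test, under our assumption that the reject probability is the complement of the test's p-value (so the similarity reduces to the p-value itself):

```python
from scipy.stats import ks_2samp

def ks_similarity(xs, ys):
    # Two-sample Kolmogorov-Smirnov test; ks_2samp returns the p-value of
    # the null hypothesis that both samples share one distribution.
    # Assuming p_reject = 1 - p_value, similarity = 1 - p_reject = p_value.
    result = ks_2samp(xs, ys)
    return result.pvalue

same = ks_similarity(list(range(100)), list(range(100)))         # high: identical samples
diff = ks_similarity(list(range(100)), list(range(1000, 1100)))  # low: disjoint samples
```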
Given: the reference job ID.
From the 4D time series data (number of nodes, file systems, 9 metrics, time), create a feature set.
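These notes can be tied together in an end-to-end sketch; the function names, the metric-by-metric comparison, and the mean aggregation across metrics are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_set(job4d: np.ndarray) -> np.ndarray:
    """4D (nodes, file_systems, metrics, time) -> 2D (metrics, nodes*time)."""
    per_node = job4d.mean(axis=1)                    # mean over file systems
    return np.concatenate(list(per_node), axis=-1)   # concat nodes along time

def similarity_to_reference(ref4d: np.ndarray, job4d: np.ndarray) -> float:
    """Compare a job to the reference job metric by metric and average the
    per-metric KS-based similarities (the averaging is our assumption)."""
    ref, job = feature_set(ref4d), feature_set(job4d)
    sims = [ks_2samp(r, j).pvalue for r, j in zip(ref, job)]
    return float(np.mean(sims))

# A job compared with itself is maximally similar.
ref = np.random.default_rng(1).random((2, 3, 9, 16))
print(similarity_to_reference(ref, ref))   # 1.0
```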