diff --git a/paper/main.tex b/paper/main.tex
index dbceaa1..3a07285 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -112,6 +112,32 @@ The contribution of this paper...
 \section{Methodology}
 \label{sec:methodology}
 
+\ebadd{
+% Summary
+For the analysis of the Kolmogorov-Smirnov-based similarity we perform two preparation steps.
+Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
+Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.
+
+% Aggregation
+The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.
+A fixed interval also ensures the portability of the approach to other HPC systems.
+The concatenation of time series along the node dimension preserves the I/O information of all nodes.
+We apply no aggregation function to the metric dimension.
+
+% Filtering
+Zero-jobs, i.e., jobs showing no sign of significant I/O load, are of little interest in the analysis.
+Their sum across all dimensions and time series is equal to zero.
+Furthermore, we filter out jobs whose time series have fewer than 8 values.
+
+% Similarity
+For the analysis we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
+The similarity function in \Cref{eq:ks_similarity} calculates the complement of the rejection probability $p_{\text{reject}}$.
+}
+
+\begin{equation}\label{eq:ks_similarity}
+  \text{similarity} = 1 - p_{\text{reject}}
+\end{equation}
+
 
 Given: the reference job ID.
 Create from 4D time series data (number of nodes, per file systems, 9 metrics, time) a feature set.
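
To make the aggregation step concrete, here is a minimal Rust sketch of the two reduction functions. It is not taken from the paper's code; the `Job4D` layout, the `reduce_dimensions` name, and the `n_metrics` parameter are assumptions for illustration. It averages the file system dimension pointwise and concatenates the per-node series along the time axis, leaving one series per metric.

```rust
// Assumed layout (not from the paper): job[node][file_system][metric][t]
// holds one monitoring value in the fixed interval [0, 4].
type Job4D = Vec<Vec<Vec<Vec<f64>>>>;

/// Reduce 4D job data to 2D: mean over the file system dimension,
/// then concatenation of the per-node series along the time axis.
fn reduce_dimensions(job: &Job4D, n_metrics: usize) -> Vec<Vec<f64>> {
    let mut reduced: Vec<Vec<f64>> = vec![Vec::new(); n_metrics];
    for node in job {
        let n_fs = node.len() as f64;
        for m in 0..n_metrics {
            let n_time = node[0][m].len();
            for t in 0..n_time {
                // Mean over file systems keeps each value in [0, 4].
                let mean = node.iter().map(|fs| fs[m][t]).sum::<f64>() / n_fs;
                // Appending node after node realizes the concatenation.
                reduced[m].push(mean);
            }
        }
    }
    reduced
}
```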
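The pre-filter can then be stated directly on this reduced representation. Again a sketch with assumed names (`keep_job`, `MIN_LEN`): a zero-job sums to zero over all metrics and time steps, and series shorter than 8 values are dropped as the section describes.

```rust
/// Minimum series length accepted by the filter; the section names 8.
const MIN_LEN: usize = 8;

/// Pre-filter sketch: discard zero-jobs (total I/O load of exactly zero
/// across all metrics and time steps) and jobs whose series are too short.
fn keep_job(reduced: &[Vec<f64>]) -> bool {
    let total: f64 = reduced.iter().flat_map(|m| m.iter()).sum();
    total != 0.0 && reduced.iter().all(|m| m.len() >= MIN_LEN)
}
```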
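Finally, the similarity of Eq. (eq:ks_similarity) maps almost one-to-one onto the named library, assuming it refers to the `kolmogorov_smirnov` crate (v1.1.0 on crates.io): the sketch below uses that crate's documented `test_f64` entry point and `reject_probability` result field; the 0.95 confidence level is our assumption, not a value given in the section.

```rust
extern crate kolmogorov_smirnov as ks;

/// Similarity per Eq. (eq:ks_similarity): the complement of the
/// KS test's rejection probability for two reduced time series.
fn similarity(xs: &[f64], ys: &[f64]) -> f64 {
    // 0.95 is an assumed confidence level, not specified in the section.
    let result = ks::test_f64(xs, ys, 0.95);
    1.0 - result.reject_probability
}
```

How the per-metric similarities are combined into a single job-level score is not specified in this section, so the sketch stops at one pair of series.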