
\section{Methodology}

\label{sec:methodology}

\ebadd{

% Summary

For the analysis of the Kolmogorov-Smirnov-based similarity we perform two preparation steps.

Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.

Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.

% Aggregation

Reducing the file system dimension with the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.

A fixed value range also ensures the portability of the approach to other HPC systems.

Concatenating the time series along the node dimension preserves the I/O information of all nodes.

We apply no aggregation function to the metric dimension.
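The two reduction steps can be sketched as follows; the nested-vector layout (node $\times$ file system $\times$ metric $\times$ time) and the function name are illustrative assumptions, not the paper's implementation:

```rust
// Sketch of the dimension reduction, assuming a job is stored as
// data[node][file_system][metric] -> time series of f64 values in [0, 4].
// Layout and names are illustrative, not the paper's actual code.

fn reduce_dimensions(data: &[Vec<Vec<Vec<f64>>>]) -> Vec<Vec<f64>> {
    let n_metrics = data[0][0].len();
    let mut result = vec![Vec::new(); n_metrics];
    for node in data {
        for (m, series) in result.iter_mut().enumerate() {
            let n_fs = node.len() as f64;
            let len = node[0][m].len();
            // Mean over the file-system dimension keeps values in [0, 4].
            for t in 0..len {
                let mean: f64 = node.iter().map(|fs| fs[m][t]).sum::<f64>() / n_fs;
                series.push(mean);
            }
            // Concatenation over the node dimension happens implicitly:
            // each node's averaged series is appended to the metric's series.
        }
    }
    result
}

fn main() {
    // Two nodes, two file systems, one metric, three time steps.
    let job = vec![
        vec![vec![vec![0.0, 2.0, 4.0]], vec![vec![2.0, 2.0, 0.0]]],
        vec![vec![vec![1.0, 1.0, 1.0]], vec![vec![3.0, 1.0, 1.0]]],
    ];
    let reduced = reduce_dimensions(&job);
    // One metric left; the two nodes' series are concatenated: 2 x 3 = 6 values.
    println!("{:?}", reduced[0]); // [1.0, 2.0, 2.0, 2.0, 1.0, 1.0]
}
```

The metric dimension is left untouched, so the result is one concatenated, file-system-averaged series per metric.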

% Filtering

Zero-jobs, i.e., jobs showing no sign of significant I/O load, are of little interest in the analysis.

Their sum across all dimensions and time series is equal to zero.

Furthermore, we filter out jobs whose time series have fewer than 8 values.
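Both filter rules can be sketched as a single predicate on the reduced data; the per-job representation (one concatenated series per metric) and the function name are assumptions for illustration:

```rust
// Sketch of the pre-filtering step. The per-job representation
// (one concatenated time series per metric) is an assumption.

fn keep_job(metrics: &[Vec<f64>]) -> bool {
    // Zero-jobs: the sum across all dimensions and time series is zero.
    let total: f64 = metrics.iter().flat_map(|s| s.iter()).sum();
    if total == 0.0 {
        return false;
    }
    // Also drop jobs whose time series have fewer than 8 values.
    metrics.iter().all(|s| s.len() >= 8)
}

fn main() {
    let active = vec![vec![1.5; 10]];
    let zero_job = vec![vec![0.0; 10]];
    let too_short = vec![vec![1.5; 4]];
    println!("{} {} {}",
        keep_job(&active), keep_job(&zero_job), keep_job(&too_short));
    // true false false
}
```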

% Similarity

For the analysis we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.

The similarity function in \Cref{eq:ks_similarity} computes the complement of the rejection probability $p_{\text{reject}}$.

}

\begin{equation}\label{eq:ks_similarity}

\text{similarity} = 1 - p_{\text{reject}}

\end{equation}
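The computation behind \Cref{eq:ks_similarity} can be sketched self-contained as below. This is a stand-in for the KS-test crate, not its API: it computes the two-sample KS statistic directly and approximates $p_{\text{reject}}$ as one minus the asymptotic two-sample p-value, an assumption about how the library defines the rejection probability.

```rust
// Self-contained sketch of similarity = 1 - p_reject. Here p_reject is
// approximated as 1 minus the asymptotic two-sample KS p-value; this
// replaces, and is not identical to, the crate used in the paper.

fn ks_statistic(a: &[f64], b: &[f64]) -> f64 {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort_by(|x, y| x.partial_cmp(y).unwrap());
    b.sort_by(|x, y| x.partial_cmp(y).unwrap());
    let (n, m) = (a.len(), b.len());
    let (mut i, mut j, mut d) = (0usize, 0usize, 0.0f64);
    while i < n && j < m {
        // Advance past ties on both sides before comparing the empirical CDFs.
        let x = a[i].min(b[j]);
        while i < n && a[i] <= x { i += 1; }
        while j < m && b[j] <= x { j += 1; }
        let diff = (i as f64 / n as f64 - j as f64 / m as f64).abs();
        if diff > d { d = diff; }
    }
    d
}

fn similarity(a: &[f64], b: &[f64]) -> f64 {
    let d = ks_statistic(a, b);
    let n_e = (a.len() * b.len()) as f64 / (a.len() + b.len()) as f64;
    let lambda = (n_e.sqrt() + 0.12 + 0.11 / n_e.sqrt()) * d;
    // For tiny lambda the alternating series degenerates; the samples are
    // statistically indistinguishable, so p_reject = 0 and similarity = 1.
    if lambda < 0.1 {
        return 1.0;
    }
    // Asymptotic p-value via the standard series approximation.
    let p_value: f64 = 2.0 * (1..=100)
        .map(|k| {
            (-1.0f64).powi(k - 1) * (-2.0 * (k as f64).powi(2) * lambda * lambda).exp()
        })
        .sum::<f64>();
    let p_reject = (1.0 - p_value).clamp(0.0, 1.0);
    1.0 - p_reject
}

fn main() {
    let a: Vec<f64> = (0..20).map(f64::from).collect();
    let shifted: Vec<f64> = (100..120).map(f64::from).collect();
    println!("identical: {:.3}", similarity(&a, &a));       // 1.000
    println!("disjoint:  {:.3}", similarity(&a, &shifted)); // 0.000
}
```

Identical time series yield a similarity near 1, while samples from clearly different distributions yield a similarity near 0, matching the intended reading of \Cref{eq:ks_similarity}.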

Given a reference job ID, we create a feature set from the 4D time series data (number of nodes, file systems, 9 metrics, time).