KS dimension reduction and filtering
This commit is contained in:
parent
ea893d76f0
commit
e4dd65c064
|
@ -112,6 +112,32 @@ The contribution of this paper...
\section{Methodology}
\label{sec:methodology}
\ebadd{
% Summary
For the analysis of the Kolmogorov-Smirnov-based similarity, we perform two preparation steps.
Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.

% Aggregation
Reducing the file system dimension with the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.
A fixed interval also ensures the portability of the approach to other HPC systems.
Concatenating the time series along the node dimension preserves the I/O information of all nodes.
We apply no aggregation function to the metric dimension.
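As a sketch in notation introduced here for illustration (it is an assumption, not taken from the implementation), let $x_{n,f,m}(t) \in [0, 4]$ denote the value of metric $m$ on node $n$ and file system $f$ at time step $t$, with $N$ nodes, $F$ file systems, and $T$ time steps.
The two reductions can then be written as
\begin{equation}
\bar{x}_{n,m}(t) = \frac{1}{F} \sum_{f=1}^{F} x_{n,f,m}(t),
\qquad
s_m = \bigl( \bar{x}_{1,m}(1), \dots, \bar{x}_{1,m}(T), \; \dots, \; \bar{x}_{N,m}(1), \dots, \bar{x}_{N,m}(T) \bigr).
\end{equation}
Because a mean never leaves the range of its inputs, $\bar{x}_{n,m}(t)$ remains in $[0, 4]$ for any $F$, and each concatenated series $s_m$ has length $N \cdot T$.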

% Filtering
Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis.
Their sum across all dimensions and time series is equal to zero.
Furthermore, we filter out those jobs whose time series have fewer than 8 values.

% Similarity
For the analysis, we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
The similarity function in \Cref{eq:ks_similarity} is the complement of the reject probability $p_{\text{reject}}$.
}
\begin{equation}\label{eq:ks_similarity}
\text{similarity} = 1 - p_{\text{reject}}
\end{equation}
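For orientation, and as a sketch rather than a description of the library's internals, the two-sample Kolmogorov-Smirnov statistic on which the reject probability is based is the largest distance between the empirical distribution functions of the two compared series.
The symbols $D$, $F_a$, and $F_b$ are notation introduced here for illustration:
\begin{equation}
D = \sup_{x} \left| F_a(x) - F_b(x) \right|
\end{equation}
For fixed series lengths, a smaller $D$ corresponds to a lower $p_{\text{reject}}$ and therefore to a similarity closer to 1.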

Given: the reference job ID.
Create a feature set from the 4D time series data (number of nodes, file systems, 9 metrics, time).
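
The dimension reduction and pre-filtering described in this section could be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analysis: the nested-list layout `job[node][file_system][metric][time]`, the function names `reduce_job` and `keep_job`, and the constant `MIN_LENGTH` are all hypothetical.

```python
from statistics import fmean

# Threshold from the text: jobs with fewer than 8 values are filtered out.
MIN_LENGTH = 8


def reduce_job(job):
    """Reduce a 4D job to one concatenated series per metric.

    Assumed (hypothetical) layout: job[node][file_system][metric][time].
    The file system dimension is reduced by the mean, the node dimension
    by concatenation, and the metric dimension is left untouched.
    """
    n_metrics = len(job[0][0])
    reduced = [[] for _ in range(n_metrics)]
    for node in job:
        for m in range(n_metrics):
            # One time series of metric m per file system of this node.
            per_fs = [fs[m] for fs in node]
            # Mean across file systems at each time step.
            mean_series = [fmean(values) for values in zip(*per_fs)]
            # Append this node's series to the concatenated series.
            reduced[m].extend(mean_series)
    return reduced


def keep_job(reduced):
    """Return False for zero-jobs and for jobs with too few values."""
    total = sum(sum(series) for series in reduced)
    if total == 0:
        return False  # zero-job: no I/O activity in any series
    return all(len(series) >= MIN_LENGTH for series in reduced)
```

A job would be reduced first and then tested, for example `keep_job(reduce_job(job))`. Averaging over file systems keeps every value inside the range of the inputs, while concatenation multiplies the series length by the number of nodes.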