KS dimension reduction and filtering
This commit is contained in: parent ea893d76f0, commit e4dd65c064

@@ -112,6 +112,32 @@ The contribution of this paper...
\section{Methodology}
\label{sec:methodology}
\ebadd{
% Summary
For the analysis of the Kolmogorov-Smirnov-based similarity we perform two preparation steps.
Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.

% Aggregation
The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system.
A fixed interval also ensures the portability of the approach to other HPC systems.
The concatenation of time series on the node dimension preserves the I/O information of all nodes.
We apply no aggregation function to the metric dimension.
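The two reduction steps above can be sketched as follows. This is a minimal illustration with NumPy, assuming the job data arrives as a (nodes, file systems, metrics, time) array whose values are already mapped to the fixed interval [0, 4]; the array shapes and variable names are ours, not the paper's code.

```python
import numpy as np

# Illustrative 4D job: 2 nodes, 3 file systems, 9 metrics, 10 time steps,
# with values already quantized to the fixed interval [0, 4].
rng = np.random.default_rng(0)
job = rng.integers(0, 5, size=(2, 3, 9, 10)).astype(float)

# Mean over the file-system axis: values remain in [0, 4] regardless of
# how many file systems the HPC system mounts.
per_node = job.mean(axis=1)                         # -> (nodes, metrics, time)

# Concatenate the node axis along time, preserving each node's I/O data.
reduced = np.concatenate(list(per_node), axis=-1)   # -> (metrics, nodes*time)

print(reduced.shape)   # (9, 20)
```

The metric dimension is deliberately left untouched, matching the text above.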

% Filtering
Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis.
Their sum across all dimensions and time series is equal to zero.
Furthermore, we filter out jobs whose time series have fewer than 8 values.
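Both pre-filter criteria can be expressed as one predicate on the reduced data; a hedged sketch, assuming the (metrics, time) layout produced by the reduction step and a hypothetical helper name of our choosing:

```python
import numpy as np

def keep_job(series: np.ndarray, min_length: int = 8) -> bool:
    """Pre-filter on reduced (metrics, time) data: drop zero-jobs and
    jobs whose time series have fewer than min_length values."""
    if series.sum() == 0:              # zero-job: no sign of I/O load
        return False
    if series.shape[-1] < min_length:  # too few values for a meaningful test
        return False
    return True

print(keep_job(np.zeros((9, 20))))   # False: zero-job
print(keep_job(np.ones((9, 5))))     # False: fewer than 8 values
print(keep_job(np.ones((9, 20))))    # True
```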

% Similarity
For the analysis we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
The similarity function in \Cref{eq:ks_similarity} calculates the complement of the reject probability $p_{\text{reject}}$.
}
\begin{equation}\label{eq:ks_similarity}
\text{similarity} = 1 - p_{\text{reject}}
\end{equation}

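The paper computes this with the Rust crate named above; as an illustrative equivalent only, here is a sketch using SciPy's two-sample KS test, under our assumption that the reject probability is the complement of the test's p-value (so the similarity reduces to the p-value itself):

```python
from scipy.stats import ks_2samp

def ks_similarity(xs, ys):
    # Two-sample Kolmogorov-Smirnov test; ks_2samp returns the p-value of
    # the null hypothesis that both samples share one distribution.
    # Assuming p_reject = 1 - p_value, similarity = 1 - p_reject = p_value.
    result = ks_2samp(xs, ys)
    return result.pvalue

same = ks_similarity(list(range(100)), list(range(100)))         # high: identical samples
diff = ks_similarity(list(range(100)), list(range(1000, 1100)))  # low: disjoint samples
```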
Given: the reference job ID.
From the 4D time series data (number of nodes, file systems, 9 metrics, time), create a feature set.
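These notes can be tied together in an end-to-end sketch; the function names, the metric-by-metric comparison, and the mean aggregation across metrics are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_set(job4d: np.ndarray) -> np.ndarray:
    """4D (nodes, file_systems, metrics, time) -> 2D (metrics, nodes*time)."""
    per_node = job4d.mean(axis=1)                    # mean over file systems
    return np.concatenate(list(per_node), axis=-1)   # concat nodes along time

def similarity_to_reference(ref4d: np.ndarray, job4d: np.ndarray) -> float:
    """Compare a job to the reference job metric by metric and average the
    per-metric KS-based similarities (the averaging is our assumption)."""
    ref, job = feature_set(ref4d), feature_set(job4d)
    sims = [ks_2samp(r, j).pvalue for r, j in zip(ref, job)]
    return float(np.mean(sims))

# A job compared with itself is maximally similar.
ref = np.random.default_rng(1).random((2, 3, 9, 16))
print(similarity_to_reference(ref, ref))   # 1.0
```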