KS dimension reduction and filtering
parent ea893d76f0
commit e4dd65c064
				| @ -112,6 +112,32 @@ The contribution of this paper... | ||||
| 
 | ||||
| \section{Methodology} | ||||
| \label{sec:methodology} | ||||
| \ebadd{ | ||||
| % Summary | ||||
| For the analysis of the Kolmogorov-Smirnov-based similarity, we perform two preparation steps. | ||||
| Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions. | ||||
| Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset even further. | ||||
| 
 | ||||
| % Aggregation | ||||
| The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independently of how many file systems are present on an HPC system. | ||||
| A fixed interval also ensures the portability of the approach to other HPC systems. | ||||
| The concatenation of time series on the node dimension preserves I/O information of all nodes. | ||||
| We apply no aggregation function to the metric dimension. | ||||
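| As a sketch of the two reduction steps, let $x_{n,f,m,t}$ denote the value of metric $m$ on node $n$ and file system $f$ at time step $t$, with $N$ nodes, $F$ file systems, and $T$ time steps (this notation is introduced here only for illustration). | ||||
| The mean over the file system dimension yields | ||||
| \[ | ||||
| 	\bar{x}_{n,m,t} = \frac{1}{F} \sum_{f=1}^{F} x_{n,f,m,t}, | ||||
| \] | ||||
| and the concatenation over the node dimension yields, for each metric $m$, one series of length $N \cdot T$: | ||||
| \[ | ||||
| 	s_m = \left( \bar{x}_{1,m,1}, \dots, \bar{x}_{1,m,T}, \; \dots, \; \bar{x}_{N,m,1}, \dots, \bar{x}_{N,m,T} \right). | ||||
| \] | ||||
| Assuming each $x_{n,f,m,t}$ lies in $[0, 4]$, the mean $\bar{x}_{n,m,t}$ lies in $[0, 4]$ as well, regardless of $F$. | ||||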
| 
 | ||||
| % Filtering | ||||
| Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis. | ||||
| Their sum across all dimensions and time series is equal to zero. | ||||
| Furthermore, we filter out those jobs whose time series have fewer than 8 values. | ||||
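| As a sketch, writing $x_{n,f,m,t}$ for the value of metric $m$ on node $n$ and file system $f$ at time step $t$, and $T$ for the number of values per time series (notation introduced here only for illustration), a job is kept if | ||||
| \[ | ||||
| 	\sum_{n,f,m,t} x_{n,f,m,t} \neq 0 \quad \text{and} \quad T \geq 8. | ||||
| \] | ||||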
| 
 | ||||
| % Similarity | ||||
| For the analysis, we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''. | ||||
| The similarity function in \Cref{eq:ks_similarity} is the complement of the reject probability $p_{\text{reject}}$. | ||||
| } | ||||
| \begin{equation}\label{eq:ks_similarity} | ||||
| 	\text{similarity} = 1 - p_{\text{reject}} | ||||
| \end{equation} | ||||
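| As a sketch of the underlying test, for two samples $a$ and $b$ with empirical distribution functions $F_a$ and $F_b$ (symbols introduced here only for illustration), the two-sample Kolmogorov-Smirnov statistic is | ||||
| \[ | ||||
| 	D = \sup_x \left| F_a(x) - F_b(x) \right|, \qquad F_a(x) = \frac{1}{|a|} \sum_{i=1}^{|a|} \mathbf{1}[a_i \leq x]. | ||||
| \] | ||||
| Assuming $p_{\text{reject}}$ is the probability with which the test rejects the hypothesis that both samples stem from the same distribution, a larger $D$ gives a larger $p_{\text{reject}}$ and hence a lower similarity. | ||||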
| 
 | ||||
| 
 | ||||
| 
 | ||||
| Given: the reference job ID. | ||||
| Create a feature set from the 4D time series data (nodes, file systems, 9 metrics, time). | ||||
|  | ||||