Another iteration
This commit is contained in:
parent 78224a4b35
commit 29d2b2501a

@@ -175,7 +175,7 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an

\subsection{Algorithms for Computing Similarity}
We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
-They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein-distance.
+They differ in the way data similarity is defined: the time series is encoded using either binary or hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance.
B-all determines the similarity between binary codings by means of the Levenshtein distance.
B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activity are replaced by just one zero.
Q-lev determines the similarity between quantized codings by using the Levenshtein distance.
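As a rough illustration of the Levenshtein-based variants, the following sketch computes the edit distance between two quantized segment codings and maps it to a similarity in $[0,1]$. The function names, the normalization by the longer coding, and the example strings are assumptions for illustration only, not the implementation from~\cite{Eugen20HPS}.

\begin{lstlisting}[language=Python]
# Sketch: Levenshtein-based similarity between two quantized codings,
# e.g. hexadecimal per-segment strings (encoding details assumed).
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def q_lev_similarity(coding_a: str, coding_b: str) -> float:
    """Map the edit distance to a similarity in [0, 1] (normalization assumed)."""
    longest = max(len(coding_a), len(coding_b)) or 1
    return 1.0 - levenshtein(coding_a, coding_b) / longest

print(q_lev_similarity("00a3f0", "00b3f1"))  # 0.666... for two substitutions
\end{lstlisting}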
@@ -211,7 +211,7 @@ The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remem
This coding is also used for the Q algorithms; thus, this representation is what the algorithms will analyze. B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
The figures show the values of active metrics ($\neq 0$); if few are active, they are shown in one timeline; otherwise, they are rendered individually to provide a better overview.
For example, we can see that several metrics increase in Segment\,12.
-We can also see an interesting result of our categorized coding, the write bytes are bigger than 0 while write calls are 0\footnote{The reason is that a few write calls transfer many bytes; less than our 90\%-quantile, therefore, write calls will be set to 0.}.
+We can also see an interesting result of our categorized coding: the \lstinline|write_bytes| are greater than 0 while \lstinline|write_calls| are 0\footnote{The reason is that a few write calls transfer many bytes; the number of calls stays below our 90\%-quantile and, therefore, write calls are set to 0.}.
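The footnote hinges on the quantile-based coding; a minimal sketch of the assumed behaviour follows, where a segment value below a per-metric 90\%-quantile threshold is coded as 0 and anything at or above it counts as active. The threshold values, field names, and the restriction to two categories are assumptions; the actual coding in~\cite{Eugen20HPS} may use more levels.

\begin{lstlisting}[language=Python]
import numpy as np

# Minimal sketch of the assumed quantile-based coding: per metric, a
# 90%-quantile threshold decides whether a segment counts as active (1)
# or is set to 0.  The thresholds below are made up for illustration.
def categorize(series: np.ndarray, threshold: float) -> np.ndarray:
    return (series >= threshold).astype(int)

calls_threshold = 50.0   # assumed 90%-quantile for write calls
bytes_threshold = 1e8    # assumed 90%-quantile for written bytes

# A segment can move many bytes with only a few calls, so write_calls
# falls below its threshold (coded 0) while write_bytes exceeds its own.
write_calls = np.array([0, 3, 0, 2])
write_bytes = np.array([0, 9e8, 0, 6e8])
print(categorize(write_calls, calls_threshold))  # [0 0 0 0]
print(categorize(write_bytes, bytes_threshold))  # [0 1 0 1]
\end{lstlisting}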

\begin{figure}
\includegraphics[width=\textwidth]{job-timeseries5024292}
@@ -223,7 +223,7 @@ We can also see an interesting result of our categorized coding, the write bytes
\section{Evaluation}%
\label{sec:evaluation}

-In the following, we assume the reference job (Job-M) is given and we aim to identify similar jobs.
+In the following, we assume the reference job (Job-M) is given, and we aim to identify similar jobs.
For the reference job and each algorithm, we created CSV files with the computed similarity to all other jobs from our job pool (worth 203 days of production on Mistral).
During this process, the runtime of the algorithm is recorded.
Then we inspect the correlation between the similarity and the number of found jobs.
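The evaluation step can be pictured as a simple loop; the sketch below is a hypothetical illustration, where the \lstinline|algorithm.similarity| interface, the job attributes, and the CSV layout are assumptions rather than the actual tooling used to produce the files.

\begin{lstlisting}[language=Python]
import csv
import time

# Hypothetical sketch of the evaluation loop: for one algorithm, compute
# the similarity of the reference job (Job-M) to every job in the pool,
# record the runtime, and write the result to a CSV file.
def evaluate(algorithm, reference_job, job_pool, out_path):
    start = time.time()
    rows = [(job.id, algorithm.similarity(reference_job, job))
            for job in job_pool]
    runtime = time.time() - start

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["jobid", "similarity"])
        writer.writerows(rows)
    return runtime
\end{lstlisting}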
@@ -295,10 +295,10 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job


\paragraph{User distribution.}
-To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted.
+To understand how the Top\,100 are distributed across users, the data is grouped by user ID and counted.
\Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
-We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups.
+We didn't include the group analysis in the figure as user count and group ID are proportional; at most, the number of users is 2x the number of groups.
Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.
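The grouping described here boils down to counting jobs per user; a small sketch (with assumed field names) of how the counts behind the stacked plot could be derived:

\begin{lstlisting}[language=Python]
from collections import Counter

# Sketch: group the Top-100 jobs of one algorithm by user ID and count
# jobs per user; sorting descending puts the user with the most jobs
# first, matching the lowest stack in the figure.  Field names are assumed.
def user_distribution(top_jobs):
    counts = Counter(job["user_id"] for job in top_jobs)
    return counts.most_common()

top100 = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}]
print(user_distribution(top100))  # [('u1', 2), ('u2', 1)]
\end{lstlisting}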

\paragraph{Node distribution.}