Another iteration
parent 78224a4b35
commit 29d2b2501a
@@ -175,7 +175,7 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an
\subsection{Algorithms for Computing Similarity}
We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
They differ in how data similarity is defined: the time series are encoded using either a binary or a hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance.
B-all determines the similarity between binary codings by means of the Levenshtein distance.
B-aggz is similar to B-all, but computes the similarity on binary codings where consecutive segments of zero activity are replaced by a single zero.
Q-lev determines the similarity between quantized codings by using the Levenshtein distance.
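To make the distinction concrete, the following Python sketch shows one way such a Levenshtein-based similarity could be computed on binary codings; the normalization to a $[0,1]$ score and the function names are our own illustrative assumptions, not the implementation of~\cite{Eugen20HPS}.
\begin{lstlisting}[language=Python]
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two codings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def collapse_zeros(coding: str) -> str:
    """B-aggz preprocessing: consecutive zero segments become one zero."""
    out = []
    for c in coding:
        if c == '0' and out and out[-1] == '0':
            continue
        out.append(c)
    return ''.join(out)

def similarity(a: str, b: str) -> float:
    """Map the edit distance to a similarity score in [0, 1] (assumed)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(similarity("0001100", "0001000"))       # B-all style
print(similarity(collapse_zeros("0001100"),
                 collapse_zeros("0001000")))  # B-aggz style
\end{lstlisting}
Q-lev works analogously, only on the quantized (hexadecimal) codings instead of the binary ones.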
@@ -211,7 +211,7 @@ The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remem
This coding is also used for the Q algorithms, so this representation is what those algorithms analyze; the B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
The figures show the values of active metrics ($\neq 0$); if only a few metrics are active, they are shown in one timeline; otherwise, they are rendered individually to provide a better overview.
For example, we can see that several metrics increase in Segment\,12.
We can also see an interesting result of our categorized coding: the \lstinline|write_bytes| are greater than 0 while the \lstinline|write_calls| are 0\footnote{The reason is that a few write calls transfer many bytes; as the number of calls stays below our 90\%-quantile, the write calls are coded as 0.}.
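The effect described in the footnote follows directly from a quantile-based categorization; below is a minimal sketch, assuming hypothetical metric histories and a 90\%/99\%-quantile split.
\begin{lstlisting}[language=Python]
import numpy as np

rng = np.random.default_rng(0)
call_history = rng.integers(1, 1_000, 10_000)   # hypothetical write-call counts
byte_history = rng.integers(1, 10**9, 10_000)   # hypothetical write-byte volumes

def categorize(value, history, quantiles=(0.90, 0.99)):
    """Code a raw value as 0, 1, or 2: below the 90%-quantile -> 0."""
    return int(np.digitize(value, np.quantile(history, quantiles)))

# A few calls can transfer many bytes: the byte volume exceeds its
# 90%-quantile (coded > 0) while the call count stays below its own
# 90%-quantile and is therefore coded as 0.
print(categorize(3, call_history), categorize(9.5e8, byte_history))
\end{lstlisting}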
\begin{figure}
\includegraphics[width=\textwidth]{job-timeseries5024292}
@@ -223,7 +223,7 @@ We can also see an interesting result of our categorized coding, the write bytes
\section{Evaluation}%
\label{sec:evaluation}
In the following, we assume the reference job (Job-M) is given, and we aim to identify similar jobs.
For the reference job and each algorithm, we created CSV files with the computed similarity to all other jobs from our job pool (covering 203 days of production on Mistral).
During this process, the runtime of the algorithm is recorded.
Then, we inspect the correlation between the similarity and the number of found jobs.
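Schematically, this step could look as follows; the CSV layout and the generic \lstinline|similarity| callback are assumptions, since only the ranked scores and the recorded runtime matter here.
\begin{lstlisting}[language=Python]
import csv, time

def rank_jobs(reference, pool, similarity, out_path):
    """Score every job in the pool against the reference job,
    record the wall-clock runtime, and write the result to CSV."""
    start = time.perf_counter()
    scores = [(job_id, similarity(reference, coding))
              for job_id, coding in pool.items()]
    runtime = time.perf_counter() - start
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["job_id", "similarity"])
        writer.writerows(sorted(scores, key=lambda x: -x[1]))
    return runtime
\end{lstlisting}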
@@ -295,10 +295,10 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job
\paragraph{User distribution.}
To understand how the Top\,100 are distributed across users, the data is grouped by user ID and counted.
\Cref{fig:userids} shows the stacked user information, where the lowest stack belongs to the user with the most jobs and the topmost user in the stack has the fewest.
Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
We did not include the group analysis in the figure, as user count and group ID are proportional; at most, the number of users is twice the number of groups.
Thus, the jobs of a user likely stem from the same group, and the number of groups is similar to the number of unique users.
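The grouping itself is straightforward; as an illustration, here is a pandas sketch with an assumed metadata layout (the column names \lstinline|algorithm| and \lstinline|user_id| are hypothetical).
\begin{lstlisting}[language=Python]
import pandas as pd

# Hypothetical metadata of the Top-100 jobs per algorithm.
top100 = pd.DataFrame({
    "algorithm": ["B-all"] * 3 + ["Q-lev"] * 3,
    "user_id":   ["u1", "u1", "u2", "u1", "u3", "u4"],
})

# Jobs per user and algorithm -- the input of the stacked plot.
counts = (top100.groupby(["algorithm", "user_id"])
                .size()
                .sort_values(ascending=False))
print(counts)
\end{lstlisting}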
\paragraph{Node distribution.}