Another iteration

Eugen Betke 2021-06-03 22:37:03 +02:00
parent 78224a4b35
commit 29d2b2501a
1 changed file with 5 additions and 5 deletions

@@ -175,7 +175,7 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an
 \subsection{Algorithms for Computing Similarity}
 We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
-They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein-distance.
+They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein distance.
 B-all determines the similarity between binary codings by means of Levenshtein distance.
 B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activities are replaced by just one zero.
 Q-lev determines the similarity between quantized codings by using Levenshtein distance.
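
To make the Q-lev scoring above concrete, here is a minimal Python sketch: each segment of a quantized coding is one symbol, and the Levenshtein distance counts the insertions, deletions, and substitutions needed to align two codings. The normalization to a [0, 1] similarity is an assumption for illustration; the paper's implementation may normalize differently.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(coding_a: str, coding_b: str) -> float:
    """Assumed normalization: 1 minus distance over the longer coding."""
    longest = max(len(coding_a), len(coding_b)) or 1
    return 1.0 - levenshtein(coding_a, coding_b) / longest

# Two hexadecimal codings differing in one segment out of five:
print(similarity("07fa0", "07f90"))  # -> 0.8

B-all works the same way on the binary codings, and B-aggz on codings whose runs of zero segments are first collapsed to a single zero.
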
@@ -211,7 +211,7 @@ The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remem
 This coding is also used for the Q algorithms, thus this representation is what the algorithms will analyze; B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
 The figures show the values of active metrics ($\neq 0$); if few are active, then they are shown in one timeline, otherwise, they are rendered individually to provide a better overview.
 For example, we can see that several metrics increase in Segment\,12.
-We can also see an interesting result of our categorized coding, the write bytes are bigger than 0 while write calls are 0\footnote{The reason is that a few write calls transfer many bytes; less than our 90\%-quantile, therefore, write calls will be set to 0.}.
+We can also see an interesting result of our categorized coding, the \lstinline|write_bytes| are bigger than 0 while \lstinline|write_calls| are 0\footnote{The reason is that a few write calls transfer many bytes; less than our 90\%-quantile, therefore, write calls will be set to 0.}.
 \begin{figure}
 \includegraphics[width=\textwidth]{job-timeseries5024292}
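
The footnote's effect is easy to reproduce. A minimal sketch, assuming each metric is compared against a precomputed 90%-quantile threshold (here passed in explicitly) and coded 0 when it does not exceed it; names and numbers are illustrative, not the paper's actual values:

def categorize(value: float, threshold: float) -> int:
    """Code a metric as 0 unless it exceeds its 90%-quantile threshold."""
    return 0 if value <= threshold else 1

# A segment with a few very large writes: the byte volume clears its
# quantile while the call count does not, so write_bytes > 0 although
# write_calls == 0 in the resulting coding (illustrative numbers).
print(categorize(value=3.0, threshold=40.0))  # write_calls -> 0
print(categorize(value=9e8, threshold=1e8))   # write_bytes -> 1
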
@@ -223,7 +223,7 @@ We can also see an interesting result of our categorized coding, the write bytes
 \section{Evaluation}%
 \label{sec:evaluation}
-In the following, we assume the reference job (Job-M) is given and we aim to identify similar jobs.
+In the following, we assume the reference job (Job-M) is given, and we aim to identify similar jobs.
 For the reference job and each algorithm, we created CSV files with the computed similarity to all other jobs from our job pool (worth 203 days of production of Mistral).
 During this process, the runtime of the algorithm is recorded.
 Then we inspect the correlation between the similarity and number of found jobs.
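
A minimal sketch of such an evaluation driver, with hypothetical stand-ins for the five algorithms and the job pool (the real pool covers 203 days of Mistral production): score the reference job against every job, time the run, and write (job ID, similarity) pairs to a CSV for later ranking.

import csv
import time

def evaluate(reference, job_pool, algorithms):
    """algorithms: mapping of name -> score(reference, job) in [0, 1]."""
    for name, score in algorithms.items():
        start = time.perf_counter()
        rows = [(job.id, score(reference, job)) for job in job_pool]
        runtime = time.perf_counter() - start
        with open(f"similarity-{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["jobid", "similarity"])
            writer.writerows(rows)
        print(f"{name}: {runtime:.1f}s for {len(rows)} jobs")
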
@@ -295,10 +295,10 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job
 \paragraph{User distribution.}
-To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted.
+To understand how the Top\,100 are distributed across users, the data is grouped by user ID and counted.
 \Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
 Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
-We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups.
+We didn't include the group analysis in the figure as user count and group ID are proportional, at most the number of users is 2x the number of groups.
 Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.
 \paragraph{Node distribution.}
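
The grouping step behind the user-distribution figure can be sketched in a few lines of pandas; the column names are assumptions about the metadata layout, not the paper's actual schema. Sorting the per-user counts in descending order gives the stacking order used in the figure (largest user at the bottom).

import pandas as pd

# Hypothetical Top 100 rows: one record per (algorithm, job) pair.
top100 = pd.DataFrame({
    "algorithm": ["q_lev"] * 3 + ["b_all"] * 2,
    "userid":    ["u1", "u1", "u2", "u1", "u3"],
})

per_user = (top100.groupby(["algorithm", "userid"])
                  .size()
                  .sort_values(ascending=False))
print(per_user)  # lowest stack = user with the most jobs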