Another iteration
This commit is contained in:
parent 78224a4b35
commit 29d2b2501a
@ -175,7 +175,7 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an
\subsection{Algorithms for Computing Similarity}
We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
-They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein-distance.
+They differ in the way data similarity is defined: the time series is encoded either in binary or in hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance.
B-all determines the similarity between binary codings by means of the Levenshtein distance.
B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activity are replaced by just one zero.
Q-lev determines the similarity between quantized codings by using the Levenshtein distance.
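
All of these algorithms reduce a job to one coding string per (merged) timeline and compare codings with a distance measure. The following is a minimal sketch of that idea, assuming the Levenshtein distance on coding strings, a B-aggz-style collapsing of zero runs, and a normalization by the longer coding length; the function names and the normalization are illustrative assumptions, not the authors' implementation.

\begin{lstlisting}[language=Python]
# Sketch only: compare two segment codings via the Levenshtein distance.
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two coding strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def aggregate_zeros(coding: str) -> str:
    """B-aggz-style preprocessing: collapse runs of '0' segments into one '0'."""
    out = []
    for ch in coding:
        if ch == "0" and out and out[-1] == "0":
            continue
        out.append(ch)
    return "".join(out)

def similarity(a: str, b: str) -> float:
    """One possible normalization: 1 - distance / length of the longer coding."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Example with two binary codings of merged job timelines.
print(similarity(aggregate_zeros("1100011"), aggregate_zeros("1000001")))
\end{lstlisting}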
|
@ -211,7 +211,7 @@ The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remem
This coding is also used for the Q algorithms; thus, this representation is what the algorithms will analyze. B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
The figures show the values of active metrics ($\neq 0$); if few are active, they are shown in one timeline; otherwise, they are rendered individually to provide a better overview.
For example, we can see that several metrics increase in Segment\,12.
-We can also see an interesting result of our categorized coding, the write bytes are bigger than 0 while write calls are 0\footnote{The reason is that a few write calls transfer many bytes; less than our 90\%-quantile, therefore, write calls will be set to 0.}.
+We can also see an interesting result of our categorized coding: the \lstinline|write_bytes| are greater than 0 while \lstinline|write_calls| are 0\footnote{The reason is that a few write calls transfer many bytes; as the number of calls stays below our 90\%-quantile, the write calls are set to 0.}.
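
The footnote hinges on the quantile-based categorization of each metric. As a rough illustration, here is a minimal sketch under the assumption that values at or below the 90\%-quantile of a metric are coded as 0 (inactive) and larger values are spread over a few non-zero categories; the threshold, the number of categories, and all names are assumptions for illustration, not the coding actually used in the paper.

\begin{lstlisting}[language=Python]
import numpy as np

def categorize_metric(values: np.ndarray, q: float = 0.90, categories: int = 4) -> np.ndarray:
    """Map the raw per-segment values of one metric to small integer codes."""
    threshold = np.quantile(values, q)
    codes = np.zeros(len(values), dtype=int)
    active = values > threshold
    if active.any():
        # Spread the active values over `categories` bins (codes 1..categories).
        bins = np.quantile(values[active], np.linspace(0, 1, categories + 1)[1:-1])
        codes[active] = 1 + np.digitize(values[active], bins)
    return codes

# Example: a metric with a few large outliers is coded mostly as 0.
print(categorize_metric(np.array([3, 0, 2, 1, 0, 50, 2, 1, 0, 4], dtype=float)))
\end{lstlisting}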
|
|
\begin{figure}
\includegraphics[width=\textwidth]{job-timeseries5024292}
|
@ -223,7 +223,7 @@ We can also see an interesting result of our categorized coding, the write bytes
\section{Evaluation}%
\label{sec:evaluation}

-In the following, we assume the reference job (Job-M) is given and we aim to identify similar jobs.
+In the following, we assume the reference job (Job-M) is given, and we aim to identify similar jobs.
For the reference job and each algorithm, we created CSV files with the computed similarity to all other jobs from our job pool (worth 203 days of production on Mistral).
During this process, the runtime of the algorithm is recorded.
Then we inspect the correlation between the similarity and the number of jobs found.
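
The per-algorithm evaluation is a simple batch job: compute the similarity of the reference job to every job in the pool, time the computation, and dump one CSV per algorithm. A minimal sketch of such a driver follows; the function and file names are hypothetical, and the \lstinline|similarity| callable stands for any of the five algorithms above.

\begin{lstlisting}[language=Python]
import csv
import time

def evaluate(reference, job_pool, algorithm, out_path):
    """Compute similarity of `reference` to each pooled job and write a CSV."""
    start = time.perf_counter()
    rows = [(job_id, algorithm(reference, coding)) for job_id, coding in job_pool.items()]
    runtime = time.perf_counter() - start

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["jobid", "similarity"])
        writer.writerows(rows)
    return runtime

# e.g. runtime = evaluate(ref_coding, pool, similarity, "q_lev-job_m.csv")
\end{lstlisting}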
|
@ -295,10 +295,10 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job

\paragraph{User distribution.}
-To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted.
+To understand how the Top\,100 are distributed across users, the data is grouped by user ID and counted.
\Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
-We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups.
+We did not include the group analysis in the figure, as user count and group ID are proportional; at most, the number of users is twice the number of groups.
Thus, a user is likely from the same group, and the number of groups is similar to the number of unique users.
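
A sketch of this grouping step, assuming the Top\,100 lists are available as CSV files with hypothetical \lstinline|jobid| and \lstinline|userid| columns:

\begin{lstlisting}[language=Python]
import pandas as pd

# Count how many of the Top 100 similar jobs belong to each user.
top100 = pd.read_csv("q_lev-top100.csv")      # hypothetical file name
per_user = (top100.groupby("userid")["jobid"]
                  .count()
                  .sort_values(ascending=False))
print(per_user.head())                        # users with the most jobs first
print(per_user.size, "distinct users")
\end{lstlisting}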
|
|
\paragraph{Node distribution.}