From 29d2b2501a6ee13c3f0603be7081362ab9054147 Mon Sep 17 00:00:00 2001
From: Eugen Betke
Date: Thu, 3 Jun 2021 22:37:03 +0200
Subject: [PATCH] Another iteration

---
 lncs-from-jhps/main.tex | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lncs-from-jhps/main.tex b/lncs-from-jhps/main.tex
index 8578ad5..2d61b5c 100644
--- a/lncs-from-jhps/main.tex
+++ b/lncs-from-jhps/main.tex
@@ -175,7 +175,7 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an
 
 \subsection{Algorithms for Computing Similarity}
 We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
-They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein-distance.
+They differ in the way data similarity is defined: the time series is encoded using either binary or hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance.
 B-all determines the similarity between binary codings by means of Levenshtein distance.
 B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activities are replaced by just one zero.
 Q-lev determines the similarity between quantized codings by using Levenshtein distance.
@@ -211,7 +211,7 @@ The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remem
 This coding is also used for the Q algorithms, thus this representation is what the algorithms will analyze; B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
 The figures show the values of active metrics ($\neq 0$); if few are active, then they are shown in one timeline, otherwise, they are rendered individually to provide a better overview.
 For example, we can see that several metrics increase in Segment\,12.
-We can also see an interesting result of our categorized coding, the write bytes are bigger than 0 while write calls are 0\footnote{The reason is that a few write calls transfer many bytes; less than our 90\%-quantile, therefore, write calls will be set to 0.}.
+We can also see an interesting result of our categorized coding: the \lstinline|write_bytes| are greater than 0 while \lstinline|write_calls| are 0\footnote{The reason is that a few write calls transfer many bytes; since the number of calls stays below our 90\%-quantile, write calls are set to 0.}.
 
 \begin{figure}
 \includegraphics[width=\textwidth]{job-timeseries5024292}
@@ -223,7 +223,7 @@ We can also see an interesting result of our categorized coding, the write bytes
 
 \section{Evaluation}%
 \label{sec:evaluation}
-In the following, we assume the reference job (Job-M) is given and we aim to identify similar jobs.
+In the following, we assume the reference job (Job-M) is given, and we aim to identify similar jobs.
 For the reference job and each algorithm, we created CSV files with the computed similarity to all other jobs from our job pool (worth 203 days of production of Mistral).
 During this process, the runtime of the algorithm is recorded.
 Then we inspect the correlation between the similarity and number of found jobs.
@@ -295,10 +295,10 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job
 
 \paragraph{User distribution.}
-To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted. 
+To understand how the Top\,100 are distributed across users, the data is grouped by user ID and counted.
 \Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
 Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
-We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups.
+We did not include the group analysis in the figure as user count and group ID are proportional; at most, the number of users is twice the number of groups.
 Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.
 
 \paragraph{Node distribution.}
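
The B and Q algorithms touched by the first hunk are described only in prose. The following is a minimal Python sketch, not the implementation from the cited work (Eugen20HPS): the quantile threshold, the binary alphabet, and all function names are assumptions made for illustration. It shows the building blocks the paragraph names: quantizing a metric's per-segment activity against its 90%-quantile (which is also why write_calls can be coded 0 while write_bytes is not), collapsing runs of zero segments as B-aggz does, and scoring two codings with the Levenshtein distance.

    # Illustrative sketch only; thresholds and names are assumptions,
    # not the cited B/Q implementation.

    def quantize(values, q90):
        """Binary coding of one metric: 0 if a segment's activity lies
        below the 90%-quantile, 1 otherwise."""
        return [0 if v < q90 else 1 for v in values]

    def aggregate_zeros(coding):
        """B-aggz-style preprocessing: replace each run of zero
        segments by a single zero."""
        out = []
        for v in coding:
            if v == 0 and out and out[-1] == 0:
                continue  # still inside a zero run
            out.append(v)
        return out

    def levenshtein(a, b):
        """Textbook dynamic-programming Levenshtein distance between
        two codings."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,           # deletion
                              d[i][j - 1] + 1,           # insertion
                              d[i - 1][j - 1] + cost)    # substitution
        return d[m][n]

    def similarity(a, b):
        """Turn the distance into a similarity score in [0, 1]."""
        return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

    # Example: two short, hypothetical per-segment activity timelines.
    job_a = aggregate_zeros(quantize([0.0, 0.1, 5.0, 4.8, 0.0, 0.0], q90=1.0))
    job_b = aggregate_zeros(quantize([0.0, 0.0, 5.2, 0.0, 0.0, 0.0], q90=1.0))
    print(similarity(job_a, job_b))  # 0.75

Normalizing by the longer coding keeps the score comparable across jobs of different lengths; the zero-run aggregation makes the B-aggz variant insensitive to how long a job idles between I/O phases.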
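The user-distribution analysis in the last hunk can be sketched the same way. Below is a hypothetical version of the grouping step; the file name and the user_id/similarity column names are assumptions, since the patch does not show the layout of the per-algorithm CSV files.

    import pandas as pd

    # Hypothetical file and column names; the CSV layout produced per
    # algorithm is not shown in the patch.
    df = pd.read_csv("similarity-q-lev.csv")
    top100 = df.nlargest(100, "similarity")        # Top 100 most similar jobs
    per_user = (top100.groupby("user_id")
                      .size()
                      .sort_values(ascending=False))  # most-active user first
    print(per_user)  # per-user job counts, the input of the stacked plot

Repeating the same groupby on a group-ID column would confirm the stated observation that the number of groups tracks the number of unique users.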