bin algorithms -> B algorithms

Eugen Betke 2020-12-01 12:48:48 +01:00
parent d20d36922e
commit 0b8858fcfc
1 changed file with 5 additions and 5 deletions

@@ -356,7 +356,7 @@ Finally, the quantitative behavior of the 100 most similar jobs is investigated.
To measure the performance of computing the similarity to the reference jobs, the algorithms are executed 10 times on a compute node at DKRZ, which is equipped with two Intel Xeon E5-2680v3 @ 2.50GHz and 64 GB DDR4 RAM.
A boxplot for the runtimes is shown in \Cref{fig:performance}.
The runtime is normalized to 100k jobs, i.e., for B-all it takes about 41\,s to process 100k jobs out of the 500k total jobs that this algorithm processes.
-Generally, the bin algorithms are fastest, while the Q algorithms take often 4-5x as long.
+Generally, the B algorithms are fastest, while the Q algorithms often take 4--5x as long.
Q\_phases is slow for Job-S and Job-M, while it is fast for Job-L; the reason is that just one phase is extracted for Job-L.
The Levenshtein-based algorithms take longer for longer jobs -- proportional to the job length, as they apply a sliding window.
The KS algorithm is 10x faster than the others, but it operates only on the statistics of the time series.
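To illustrate why a sliding-window Levenshtein comparison scales with job length, here is a minimal Python sketch. The actual Q-lev/B-lev implementations are not part of this diff, so the segment coding and all names are illustrative assumptions.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two coded sequences.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def sliding_similarity(short, long):
    # Slide the shorter job coding over the longer one and keep the best match.
    w = len(short)
    best = 0.0
    for off in range(len(long) - w + 1):
        d = levenshtein(short, long[off:off + w])
        best = max(best, 1.0 - d / w)
    return best

The outer loop runs once per window offset, so the cost grows with the length of the longer job, which matches the runtime behavior described above.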
@@ -480,7 +480,7 @@ To understand how the Top\,100 are distributed across users, the data is grouped
\Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost stack the user with the fewest jobs.
For Job-S, we can see that about 70-80\% of jobs stem from one user; for the Q-lev and Q-native algorithms, the other jobs stem from a second user, while bin includes jobs from additional users (5 in total).
For Job-M, jobs from more users are included (13); about 25\% of jobs stem from the same user; here, Q-lev, Q-native, and KS include more users (29, 33, and 37, respectively) than the other three algorithms.
-For Job-L, the two Q algorithms include with (12 and 13) a bit more diverse user community than the bin algorithms (9) but Q-phases cover 35 users.
+For Job-L, the two Q algorithms include a slightly more diverse user community (12 and 13 users) than the B algorithms (9 users), but Q-phases covers 35 users.
We didn't include the group analysis in the figure as the user count and the group count are proportional; at most, the number of users is 2x the number of groups.
Thus, users are likely from the same group, and the number of groups is similar to the number of unique users.
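The per-user grouping described above can be reproduced with a short pandas sketch; the DataFrame layout (columns job_id and user_id) is an assumed, illustrative format, not the paper's actual data.

import pandas as pd

# Top 100 jobs of one algorithm; values are made up for illustration.
top100 = pd.DataFrame({
    "job_id":  [1, 2, 3, 4, 5],
    "user_id": ["u1", "u1", "u1", "u2", "u3"],
})

# Jobs per user, largest contributor first (the bottom of the stacked bar).
counts = top100.groupby("user_id").size().sort_values(ascending=False)
print(counts)        # u1: 3, u2: 1, u3: 1
print(len(counts))   # number of unique users in the Top 100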
@@ -494,7 +494,7 @@ The boxplots have different shapes which is an indication, that the different al
\paragraph{Runtime distribution.}
The job runtime of the Top\,100 jobs is shown using boxplots in \Cref{fig:runtime-job}.
-While all algorithms can compute the similarity between jobs of different length, the bin algorithms and Q-native penalize jobs of different length preferring jobs of very similar length.
+While all algorithms can compute the similarity between jobs of different length, the B algorithms and Q-native penalize length differences, preferring jobs of very similar length.
For Job-M and Job-L, Q-phases and KS are able to identify much shorter or longer jobs.
For Job-L, the job itself isn't included in the chosen Top\,100 (see \Cref{fig:hist-job-L}; 393 jobs have a similarity of 100\%), which is why the job runtime isn't shown in the figure itself.
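A runtime boxplot of this kind can be drawn with matplotlib; the values below are invented purely to illustrate the shape of such a figure.

import matplotlib.pyplot as plt

# Runtimes (in seconds) of the Top 100 jobs per algorithm; made-up data.
runtimes = {
    "B-all":    [3600, 4100, 3900, 4000],
    "Q-native": [3500, 3700, 4200, 3900],
    "KS":       [1200, 9800, 4000, 2500],  # KS also admits much shorter/longer jobs
}

fig, ax = plt.subplots()
ax.boxplot(list(runtimes.values()), labels=list(runtimes.keys()))
ax.set_ylabel("job runtime (s)")
plt.show()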
@@ -733,7 +733,7 @@ So this job type isn't necessarily executed frequently and, therefore, our Top\,
Some applications are more prominent in these sets, e.g., for B-aggzero, 32~jobs contain WRF (a model) in the name.
The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native and Q-lev, respectively.
-The jobs that are similar according to the bin algorithms (see \Cref{fig:job-M-bin-aggzero}) differ from our expectations.
+The jobs that are similar according to the B algorithms (see \Cref{fig:job-M-bin-aggzero}) differ from our expectations.
The other algorithms like Q-lev (\Cref{fig:job-M-hex-lev}) and Q-native (\Cref{fig:job-M-hex-native}) seem to work as intended:
While jobs exhibit short bursts of other active metrics, we can eyeball a relevant similarity even for low similarity scores.
The KS algorithm, which operates on histograms, ranks the jobs correctly according to histogram similarity.
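A minimal sketch of a KS-based similarity using scipy, under the assumption that each job's metric data has already been concatenated into a single sample per job (the next hunk notes that the metrics of all nodes are concatenated); the function name and data layout are illustrative.

import numpy as np
from scipy.stats import ks_2samp

def ks_similarity(ref_sample, job_sample):
    # Two-sample Kolmogorov-Smirnov statistic D in [0, 1]: the maximum
    # distance between the two empirical distribution functions.
    d, _pvalue = ks_2samp(ref_sample, job_sample)
    return 1.0 - d   # identical distributions yield similarity 1.0

rng = np.random.default_rng(0)
ref = rng.normal(size=1000)
job = rng.normal(loc=0.2, size=1000)
print(ks_similarity(ref, job))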
@@ -871,7 +871,7 @@ Remember, for the KS algorithm, we concatenate the metrics of all nodes together
\subsection{Job-L}
-The bin algorithms find a low similarity (best 2nd ranked job is 17\% similar), the inspection of job names (14 unique names) leads to two prominent applications: bash and xmessy with 45 and 48 instances, respectively.
+The B algorithms find a low similarity (the best-matching 2nd-ranked job is 17\% similar); the inspection of job names (14 unique names) reveals two prominent applications: bash and xmessy, with 45 and 48 instances, respectively.
In \Cref{fig:job-L-bin-aggzero}, it can be seen that the found jobs have little in common with the reference job.
The Q-lev and Q-native algorithms identify a more diverse set of applications (18 unique names and no xmessy job).