bin algorithms -> B algorithms
This commit is contained in:
parent d20d36922e
commit 0b8858fcfc
@@ -356,7 +356,7 @@ Finally, the quantitative behavior of the 100 most similar jobs is investigated.
 To measure the performance for computing the similarity to the reference jobs, the algorithms are executed 10 times on a compute node at DKRZ which is equipped with two Intel Xeon E5-2680v3 @2.50GHz and 64GB DDR4 RAM.
 A boxplot for the runtimes is shown in \Cref{fig:performance}.
 The runtime is normalized for 100k jobs, i.e., for B-all it takes about 41\,s to process 100k jobs out of the 500k total jobs that this algorithm will process.
-Generally, the bin algorithms are fastest, while the Q algorithms often take 4-5x as long.
+Generally, the B algorithms are fastest, while the Q algorithms often take 4-5x as long.
 Q\_phases is slow for Job-S and Job-M while it is fast for Job-L; the reason is that just one phase is extracted for Job-L.
 The Levenshtein-based algorithms take longer for longer jobs -- proportional to the job length, as they apply a sliding window.
 The KS algorithm is 10x faster than the others, but it operates on the statistics of the time series.

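The KS comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `metric -> samples` dictionary layout, and averaging `1 - D` over the metrics both jobs share, are assumptions made here for the example.

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:
        ca = bisect.bisect_right(a, v) / len(a)
        cb = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(ca - cb))
    return gap

def ks_similarity(job_a, job_b):
    """Hypothetical similarity score: each job is a dict mapping a
    metric name to the concatenated samples of that metric; the score
    averages 1 - D over the shared metrics (an assumption)."""
    shared = [m for m in job_a if m in job_b]
    gaps = [ks_statistic(job_a[m], job_b[m]) for m in shared]
    return 1.0 - sum(gaps) / len(gaps)
```

Because only sorted samples are compared, each pair of jobs costs roughly the sorting time and is independent of aligning the two time series step by step, which is consistent with the reported speed advantage of KS over the segment-matching algorithms.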
@@ -480,7 +480,7 @@ To understand how the Top\,100 are distributed across users, the data is grouped
 \Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
 For Job-S, we can see that about 70-80\% of jobs stem from one user; for the Q-lev and Q-native algorithms, the other jobs stem from a second user, while bin includes jobs from additional users (5 in total).
 For Job-M, jobs from more users are included (13); about 25\% of jobs stem from the same user; here, Q-lev, Q-native, and KS include more users (29, 33, and 37, respectively) than the other three algorithms.
-For Job-L, the two Q algorithms include, with 12 and 13 users, a slightly more diverse user community than the bin algorithms (9), but Q-phases covers 35 users.
+For Job-L, the two Q algorithms include, with 12 and 13 users, a slightly more diverse user community than the B algorithms (9), but Q-phases covers 35 users.
 We didn't include the group analysis in the figure as user count and group count are proportional; at most, the number of users is 2x the number of groups.
 Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.


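The stacked per-user counts behind \Cref{fig:userids} can be reproduced with a short grouping step. A minimal sketch, assuming each Top 100 entry carries a `user_id` field (the field name is hypothetical, not taken from the paper's data model):

```python
from collections import Counter

def user_distribution(top_jobs):
    """Group the Top 100 jobs by user and order users by job count,
    so the first entry is the bottom (largest) stack in the plot."""
    counts = Counter(job["user_id"] for job in top_jobs)
    return counts.most_common()
```

For example, `user_distribution([{"user_id": "u1"}] * 70 + [{"user_id": "u2"}] * 30)` yields `[("u1", 70), ("u2", 30)]`, i.e., the 70\%-from-one-user pattern observed for Job-S.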
@@ -494,7 +494,7 @@ The boxplots have different shapes, which is an indication that the different algorithms identify a different set

 \paragraph{Runtime distribution.}
 The job runtime of the Top\,100 jobs is shown using boxplots in \Cref{fig:runtime-job}.
-While all algorithms can compute the similarity between jobs of different length, the bin algorithms and Q-native penalize jobs of different length, preferring jobs of very similar length.
+While all algorithms can compute the similarity between jobs of different length, the B algorithms and Q-native penalize jobs of different length, preferring jobs of very similar length.
 For Job-M and Job-L, Q-phases and KS are able to identify much shorter or longer jobs.
 For Job-L, the job itself isn't included in the chosen Top\,100 (see \Cref{fig:hist-job-L}; 393 jobs have a similarity of 100\%), which is the reason why the job runtime isn't shown in the figure itself.


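The boxplot shapes discussed above are determined by the quartiles of each algorithm's Top 100 runtimes. A small, self-contained way to compute them, using linear interpolation between the closest ranks (one of several common quartile conventions, chosen here for the sketch):

```python
def quartiles(runtimes):
    """Return (Q1, median, Q3) of a list of job runtimes using
    linear interpolation between the closest ranks."""
    xs = sorted(runtimes)

    def percentile(p):
        k = (len(xs) - 1) * p          # fractional rank of percentile p
        lo = int(k)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

    return percentile(0.25), percentile(0.5), percentile(0.75)
```

Two algorithms whose Top 100 sets differ will generally produce different (Q1, median, Q3) triples, which is exactly the visual difference between the boxplot shapes.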
@@ -733,7 +733,7 @@ So this job type isn't necessarily executed frequently and, therefore, our Top\,
 Some applications are more prominent in these sets, e.g., for B-aggzero, 32~jobs contain WRF (a model) in the name.
 The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native, and Q-lev, respectively.

-The jobs that are similar according to the bin algorithms (see \Cref{fig:job-M-bin-aggzero}) differ from our expectations.
+The jobs that are similar according to the B algorithms (see \Cref{fig:job-M-bin-aggzero}) differ from our expectations.
 The other algorithms like Q-lev (\Cref{fig:job-M-hex-lev}) and Q-native (\Cref{fig:job-M-hex-native}) seem to work as intended:
 while jobs exhibit short bursts of other active metrics even for low similarity, we can eyeball a relevant similarity.
 The KS algorithm, working on the histograms, ranks the jobs correctly by the similarity of their histograms.

@@ -871,7 +871,7 @@ Remember, for the KS algorithm, we concatenate the metrics of all nodes together

 \subsection{Job-L}

-The bin algorithms find a low similarity (the best 2nd-ranked job is 17\% similar); the inspection of job names (14 unique names) leads to two prominent applications: bash and xmessy, with 45 and 48 instances, respectively.
+The B algorithms find a low similarity (the best 2nd-ranked job is 17\% similar); the inspection of job names (14 unique names) leads to two prominent applications: bash and xmessy, with 45 and 48 instances, respectively.
 In \Cref{fig:job-L-bin-aggzero}, it can be seen that the found jobs have little in common with the reference job.

 The Q-lev and Q-native algorithms identify a more diverse set of applications (18 unique names and no xmessy job).