\let \accentvec \vec
\documentclass [] { llncs}
\usepackage { todonotes}
\newcommand { \eb } [1]{ \todo [inline, color=green] { EB: #1} }
\newcommand { \jk } [1]{ \todo [inline] { JK: #1} }
\usepackage { silence}
\WarningFilter { biblatex} { Using}
\WarningFilter { latex} { Float too large}
\WarningFilter { caption} { Unsupported}
\WarningFilter { caption} { Unknown document}
\usepackage { changes}
\definechangesauthor [name=Betke, color=blue] { eb}
\newcommand { \ebrep } [2]{ \replaced [id=eb] { #1} { #2} }
\newcommand { \ebadd } [1]{ \added [id=eb] { #1} }
\newcommand { \ebdel } [1]{ \deleted [id=eb] { #1} }
\newcommand { \ebcom } [1]{ \comment [id=eb] { #1} }
\let \spvec \vec
\let \vec \accentvec
\usepackage { amsmath}
\let \vec \spvec
\usepackage { array}
\usepackage { xcolor}
\usepackage { color}
\usepackage { colortbl}
\usepackage { subcaption}
\usepackage { hyperref}
\usepackage { listings}
\usepackage { lstautogobble}
\usepackage [listings,skins,breakable,raster,most] { tcolorbox}
\usepackage { caption}
\lstset {
numberbychapter=false,
belowskip=-10pt,
aboveskip=-10pt,
}
\lstdefinestyle { lstcodebox} {
basicstyle=\scriptsize \ttfamily ,
autogobble=true,
tabsize=2,
captionpos=b,
float,
}
\usepackage { graphicx}
\graphicspath {
{ ./pictures/} ,
{ ../fig/} ,
{ ../}
}
\usepackage [backend=bibtex, style=numeric] { biblatex}
\addbibresource { bibliography.bib}
\usepackage { enumitem}
\setitemize { noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt}
\definecolor { darkgreen} { rgb} { 0,0.5,0}
\definecolor { darkyellow} { rgb} { 0.7,0.7,0}
\usepackage { cleveref}
\crefname { codecount} { Code} { Codes}
\title { A Workflow for Identifying Jobs with Similar I/O Behavior by Analyzing the Timeseries}
\author { Julian Kunkel\inst { 2} \and Eugen Betke\inst { 1} }
\institute {
University of Reading--%
\email { j.m.kunkel@reading.ac.uk} %
\and
DKRZ --
\email { betke@dkrz.de} %
}
\begin { document}
\maketitle
\begin { abstract}
Supercomputers execute thousands of jobs every day.
Support staff at a data center have two goals.
First, they provide a service that enables users to execute their applications.
Second, they aim to improve the efficiency of workflows so that the data center can serve more workloads.

In order to optimize an application, its behavior and resource utilization must be monitored and then assessed.
Users rarely liaise with staff to request a performance analysis and optimization explicitly.
Therefore, the data center must deploy monitoring systems, and staff must proactively identify candidates for optimization.

While it is easy to use statistics to rank applications based on their utilization of compute, storage, and network, it is difficult to find patterns in hundreds of thousands of jobs, e.g., to determine whether there is a class of jobs that is not performing well.
When support staff investigate a single job, the question may be whether there are other jobs like it.

This paper describes a methodology and algorithms to identify similar jobs based on their temporal IO behavior.
We conduct a study that starts from three reference jobs and assesses whether the approach is effective in finding similar jobs.
The data stem from DKRZ's supercomputer Mistral and include more than 500,000 jobs that were executed over several months.

%Problem with definition of similarity.
Our analysis shows that this strategy is effective in identifying similar jobs and reveals some interesting patterns in the data.
\end { abstract}
\section { Introduction}
%This paper is structured as follows.
%We start with the related work in \Cref{sec:relwork}.
%Then, in TODO we introduce the DKRZ monitoring systems and explain how I/O metrics are captured by the collectors.
%In \Cref{sec:methodology} we describe the data reduction and the machine learning approaches and do an experiment in \Cref{sec:data,sec:evaluation}.
%Finally, we finalize our paper with a summary in \Cref{sec:summary}.
\section { Related Work}
\label { sec:relwork}
\section { Methodology}
\label { sec:methodology}
\ebadd {
% Summary
For the analysis of the Kolmogorov-Smirnov-based similarity, we perform two preparation steps.
Dimension reduction by mean and concatenation functions allows us to reduce the four-dimensional dataset to two dimensions.
Pre-filtering omits jobs that are irrelevant in terms of performance and reduces the dataset further.
% Aggregation
The reduction of the file system dimension by the mean function ensures that the time series values stay in the range between 0 and 4, independent of how many file systems are present on an HPC system.
A fixed interval also ensures the portability of the approach to other HPC systems.
The concatenation of time series on the node dimension preserves the I/O information of all nodes.
We apply no aggregation function to the metric dimension.
% Filtering
Zero-jobs, i.e., jobs with no sign of significant I/O load, are of little interest in the analysis.
Their sum across all dimensions and time series is equal to zero.
Furthermore, we filter out jobs whose time series have fewer than 8 values.
% Similarity
For the analysis, we use the kolmogorov-smirnov-test 1.1.0 Rust library from the official Rust package registry ``crates.io''.
The similarity function in \Cref { eq:ks_ similarity} calculates the complement of the reject probability $p_{\text{reject}}$.
}
\begin { equation} \label { eq:ks_ similarity}
\text{similarity} = 1 - p_{\text{reject}}
\end { equation}
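The similarity defined above can be sketched in a few lines of Python. This is a minimal illustration and not the Rust library used for the analysis: the function names are hypothetical, and the reject probability is assumed to be the complement of the asymptotic two-sample Kolmogorov-Smirnov p-value (with the usual small-sample correction), which may differ from what the library computes.

```python
import math
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n1, n2 = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:
        f1 = bisect_right(xs, v) / n1  # share of xs that is <= v
        f2 = bisect_right(ys, v) / n2  # share of ys that is <= v
        d = max(d, abs(f1 - f2))
    return d


def ks_survival(lam):
    """Asymptotic Kolmogorov survival function Q(lam) (assumed approximation)."""
    if lam < 0.2:
        # Q is 1 within double precision here; the series converges poorly.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def ks_similarity(xs, ys):
    """similarity = 1 - p_reject, with p_reject = 1 - Q(lam) (assumption)."""
    n1, n2 = len(xs), len(ys)
    en = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * ks_statistic(xs, ys)
    p_reject = 1.0 - ks_survival(lam)
    return 1.0 - p_reject
```

Two identical samples yield a similarity of 1, while two samples with disjoint value ranges and enough values yield a similarity close to 0.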
Given the ID of a reference job, we create a feature set from the 4D time series data (nodes, file systems, 9 metrics, time).
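The data reduction and pre-filtering described in this section can be sketched as follows. The nested-list layout \texttt{data[node][fs][metric][t]} and the function names are assumptions made for this illustration, and the minimum-length filter is applied to the concatenated series, which is only one possible reading of the rule.

```python
from statistics import fmean


def reduce_job(data):
    """Reduce data[node][fs][metric][t] to reduced[metric] (a flat series).

    The file system dimension is averaged; the node dimension is
    concatenated; the metric dimension is left untouched.
    """
    n_metrics = len(data[0][0])
    reduced = [[] for _ in range(n_metrics)]
    for node in data:
        n_steps = len(node[0][0])
        for m in range(n_metrics):
            for t in range(n_steps):
                reduced[m].append(fmean(fs[m][t] for fs in node))
    return reduced


def is_relevant(reduced, min_values=8):
    """Drop zero-jobs and jobs with too few values (assumed filter rule)."""
    too_short = any(len(series) < min_values for series in reduced)
    all_zero = sum(sum(series) for series in reduced) == 0
    return not (too_short or all_zero)
```

For a job with two nodes, two file systems, one metric, and four timesteps, the result is a single series of eight values, which just passes the length filter.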
Adapt the algorithms:
\begin { itemize}
\item iterate for all jobs
\begin { itemize}
\item compute distance to reference job
\end { itemize}
\item sort the jobs based on their distance to the reference job
\item create cumulative job distribution based on distance for visualization, allow users to output jobs with a given distance
\end { itemize}
A user might be interested in exploring, say, the closest 10 or 50 jobs.
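The steps listed above can be sketched as follows, here expressed with a similarity score (higher is closer) instead of a distance. The dictionary layout and the function names are hypothetical and only serve the illustration.

```python
def rank_jobs(similarities, top_n=100):
    """Return the top_n (job_id, similarity) pairs, most similar first."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


def ecdf(values):
    """Empirical cumulative distribution as (value, share of jobs <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]
```

Calling \texttt{rank\_jobs(similarities, top\_n=10)} returns the ten closest jobs, and \texttt{ecdf(similarities.values())} provides the points of the cumulative distribution for visualization.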
Algorithms:
Profile algorithm: job profiles (job duration, job metrics, or a combination of both)
$\rightarrow$ compute the geometric-mean distance between profiles.
Time series algorithms to check:
\begin { itemize}
\item bin
\item hex\_ native
\item hex\_ lev
\item hex\_ quant
\end { itemize}
\section { Evaluation}
\label { sec:evaluation}
For each reference job and algorithm, we created a CSV file with the computed similarity to all other jobs.
Next, we analyzed the performance of the algorithms.
Then, we analyzed the quantitative behavior and the correlation between the chosen similarity and the number of found jobs and, finally, the quality of the 100 most similar jobs.
\subsection { Reference Jobs}
In the following, we assume a job is given and we aim to identify similar jobs.
We chose several reference jobs with different compute and IO characteristics:
\begin { itemize}
\item Job-S: performs post-processing on a single node. This is a typical process in climate science where data products are reformatted and annotated with metadata to a standard representation (so-called CMORization). The post-processing is IO intensive.
\item Job-M: a typical MPI parallel 8-hour compute job on 128 nodes which writes time series data after some spin-up. %CHE.ws12
\item Job-L: a 66-hour 20-node job.
The initialization data is read at the beginning.
Then only a single master node constantly writes a small volume of data; in fact, the generated data is too small to be categorized as IO relevant.
\end { itemize}
The segmented timelines of the jobs are visualized in \Cref { fig:refJobs} .
This coding is also used for the HEX class of algorithms (BIN algorithms merge all timelines together as described in \jk { TODO} ).
The figures show the values of active metrics ($\neq 0$) only; if few are active, then they are shown in one timeline, otherwise they are rendered individually to provide a better overview.
For example, we can see in \Cref { fig:job-S} that several metrics increase in Segment\,6.

In \Cref { fig:refJobsHist} , the histograms of all job metrics are shown.
A histogram contains the activities of each node and timestep without being averaged across the nodes.
This data is used to compare jobs using Kolmogorov-Smirnov.
The metrics of Job-L are not shown as they have only a handful of instances where the value is not 0, except for write\_bytes: the first process is writing out at a low rate.
Interestingly, the aggregated pattern of Job-L in \Cref { fig:job-L} sums up to some activity in the first segment for three other metrics.
\begin { figure}
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job-ks-0timeseries4296426}
\caption { Job-S (runtime=15,551\, s, segments=25)} \label { fig:job-S}
\end { subfigure}
\centering
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job-timeseries5024292}
\caption { Job-M (runtime=28,828\, s, segments=48)} \label { fig:job-M}
\end { subfigure}
\centering
\caption { Reference jobs: segmented timelines of mean IO activity}
\label { fig:refJobs}
\end { figure}
\begin { figure} \ContinuedFloat
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job-ks-2timeseries7488914-30}
\caption { Job-L (first 30 segments of 400; remaining segments are similar)}
\label { fig:job-L}
\end { subfigure}
\centering
\caption { Reference jobs: segmented timelines of mean IO activity}
\end { figure}
\begin { figure}
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job-ks-0hist4296426}
\caption { Job-S} \label { fig:job-S-hist}
\end { subfigure}
\centering
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job-ks-1hist5024292}
\caption { Job-M} \label { fig:job-M-hist}
\end { subfigure}
\centering
\caption { Reference jobs: histogram of IO activities}
\label { fig:refJobsHist}
\end { figure}
%\begin{figure}\ContinuedFloat
%\begin{subfigure}{0.8\textwidth}
%\centering
%\includegraphics[width=\textwidth]{job-ks-2hist7488914}
%\caption{Job-L}
%\label{fig:job-L}
%\end{subfigure}
%\centering
%\caption{Reference jobs: histogram of IO activities}
%\end{figure}
\subsection { Performance}
\jk { Describe System at DKRZ from old paper}
To measure the performance of computing the similarity to the reference jobs, the algorithms are executed 10 times on a compute node at DKRZ.
A boxplot of the runtimes is shown in \Cref { fig:performance} .
The runtime is normalized to 100k jobs, i.e., for bin\_all it takes about 41\,s to process 100k jobs out of the 500k total jobs that this algorithm will process.
Generally, the bin algorithms are fastest, while the hex algorithms often take 4--5x as long.
Hex\_phases is slow for Job-S and Job-M while it is fast for Job-L; the reason is that just one phase is extracted for Job-L.
The Levenshtein-based algorithms take longer for longer jobs -- proportional to the job length, as they apply a sliding window.
Note that the current algorithms are sequential and executed on just one core.
To compute the similarity to one reference job (or a small set of reference jobs), they could easily be parallelized.
We believe this will then allow a near-online analysis of a job.
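
As a worked illustration of the normalization to 100k jobs, and assuming that the runtime scales linearly with the number of processed jobs (the symbols $t_{\text{100k}}$, $t_{\text{total}}$, and $N$ are introduced here for illustration only), the figures for bin\_all relate as follows:
\begin{equation*}
t_{\text{100k}} = t_{\text{total}} \cdot \frac{100{,}000}{N},
\qquad
N = 500{,}000,\; t_{\text{100k}} \approx 41\,\text{s}
\;\Rightarrow\;
t_{\text{total}} \approx 5 \cdot 41\,\text{s} = 205\,\text{s}.
\end{equation*}
The total of about 205\,s follows from the two reported numbers under this assumption; it is not a separately measured value.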
\jk { To update the figure to use KS and (maybe to aggregate job profiles)? Problem old files are gone}
\begin { figure}
\centering
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { progress_ 4296426-out-boxplot}
\caption { Job-S (segments=25)} \label { fig:perf-job-S}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { progress_ 5024292-out-boxplot}
\caption { Job-M (segments=48)} \label { fig:perf-job-M}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { progress_ 7488914-out-boxplot}
\caption { Job-L (segments=400)} \label { fig:perf-job-L}
\end { subfigure}
\caption { Runtime of the algorithms to compute the similarity to reference jobs}
\label { fig:performance}
\end { figure}
\subsection { Quantitative Analysis}
In the quantitative analysis, we explore, for the different algorithms, how the similarity of our pool of jobs to the three reference jobs (Job-S, Job-M, and Job-L) behaves.
The cumulative distribution of similarity to the reference jobs is shown in \Cref { fig:ecdf} .
For example, in \Cref { fig:ecdf-job-S} , we see that about 70\% of the jobs have a similarity of less than 10\% to Job-S for HEX\_native.
BIN\_aggzeros shows some steep increases, e.g., more than 75\% of jobs have the same low similarity below 2\%.
The different algorithms lead to different curves for our reference jobs, e.g., for Job-S, HEX\_phases bundles more jobs with low similarity compared to the other algorithms; for Job-L, its curve increases the slowest.
% This indicates that the algorithms
The support team in a data center may have time to investigate the most similar jobs.
Time for the analysis is typically bounded; for instance, the team may analyze the 100 most similar jobs.
We refer to them as the Top\,100 jobs, and Rank\,i refers to the job that has the i-th highest similarity to the reference job.
Sometimes these values are rather close together, as we see in the following histogram.
In \Cref { fig:hist} , the histograms with the actual number of jobs for a given similarity are shown.
As we focus on a feasible number of jobs, the diagram should be read from right (100\% similarity) to left; for each bin, we show at most 100 jobs (the total number is still given).
It turns out that both BIN algorithms produce nearly identical histograms, and we omit one of them.
In the figures, we can again see a different behavior of the algorithms depending on the reference job.
Especially for Job-S, we can see clusters with jobs of higher similarity (e.g., for hex\_lev at SIM=75\%), while for Job-M, the growth in the relevant section is more steady.
For Job-L, we find hardly any similar jobs, except when using the HEX\_phases and ks algorithms.
HEX\_phases finds 393 jobs that have a similarity of 100\%, thus they are indistinguishable, while ks identifies 6880 jobs with a similarity of at least 97.5\%.

Practically, the support team would start with Rank\,1 (the most similar job, presumably the reference job itself) and walk down until the jobs look different, or until a cluster is analyzed.
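
The histogram and the right-to-left reading described above can be sketched as follows. The function names are hypothetical; the bin width of 2.5\% matches the histograms, and similarities are assumed to lie between 0 and 1.

```python
def similarity_histogram(similarities, bin_width=0.025):
    """Count jobs per similarity bin; a similarity of 1.0 goes to the last bin."""
    n_bins = round(1.0 / bin_width)
    counts = [0] * n_bins
    for s in similarities:
        idx = min(int(s / bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


def bins_to_cover(counts, wanted=100):
    """Number of bins, read from the right, needed to collect `wanted` jobs."""
    total = 0
    for used, count in enumerate(reversed(counts), start=1):
        total += count
        if total >= wanted:
            return used
    return len(counts)
```

A small return value of \texttt{bins\_to\_cover} means that the wanted number of jobs is already found among the most similar bins, i.e., there is a cluster of very similar jobs.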
\begin { figure}
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/ecdf}
\caption { Job-S} \label { fig:ecdf-job-S}
\end { subfigure}
\centering
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 5024292-out/ecdf}
\caption { Job-M} \label { fig:ecdf-job-M}
\end { subfigure}
\centering
\begin { subfigure} { 0.8\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 7488914-out/ecdf}
\caption { Job-L} \label { fig:ecdf-job-L}
\end { subfigure}
\centering
\caption { Quantitative job similarity -- empirical cumulative distribution function}
\label { fig:ecdf}
\end { figure}
\begin { figure}
\centering
\begin { subfigure} { 0.75\textwidth }
\centering
\includegraphics [width=\textwidth,trim={0 0 0 2.0cm},clip] { job_ similarities_ 4296426-out/hist-sim}
\caption { Job-S} \label { fig:hist-job-S}
\end { subfigure}
\begin { subfigure} { 0.75\textwidth }
\centering
\includegraphics [width=\textwidth,trim={0 0 0 2.0cm},clip] { job_ similarities_ 5024292-out/hist-sim}
\caption { Job-M} \label { fig:hist-job-M}
\end { subfigure}
\begin { subfigure} { 0.75\textwidth }
\centering
\includegraphics [width=\textwidth,trim={0 0 0 2.0cm},clip] { job_ similarities_ 7488914-out/hist-sim}
\caption { Job-L} \label { fig:hist-job-L}
\end { subfigure}
\centering
\caption { Histogram for the number of jobs (bin width: 2.5\% , numbers are the actual job counts). BIN\_ aggzeros is nearly identical to BIN\_ all.}
\label { fig:hist}
\end { figure}
\subsubsection { Inclusivity and Specificity}
When analyzing the overall population of jobs executed on a system, we expect that some workloads are executed several times (with different inputs but with the same configuration) or are executed with slightly different configurations (e.g., node counts, timesteps).
Thus, our similarity analysis of the job population may potentially just identify the re-execution of the same workload.
Typically, the support staff would identify the re-execution of jobs by inspecting job names, which are user-defined generic strings.\footnote { %
As they can contain confidential data, it is difficult to anonymize them without perturbing the meaning.
Therefore, they are not published in our data repository.
}

To understand if the analysis is inclusive and identifies different applications, we use two approaches with our Top\,100 jobs.
First, we explore the distribution of users (and groups), runtime, and node count across jobs.
The algorithms should include different users, node counts, and runtimes.
Second, to confirm the hypotheses presented, we analyzed the job metadata and compared job names, which validates the quantitative results discussed in the following.

\paragraph { User distribution.}
To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted.
\Cref { fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
For Job-S, we can see that about 70--80\% of jobs stem from one user; for the hex\_lev and hex\_native algorithms, the other jobs stem from a second user, while bin includes jobs from additional users (5 in total).
For Job-M, jobs from more users are included (13); about 25\% of jobs stem from the same user; here, hex\_lev, hex\_native, and ks include more users (29, 33, and 37, respectively) than the other three algorithms.
For Job-L, the two hex algorithms include a slightly more diverse user community (12 and 13 users) than the bin algorithms (9), while hex\_phases covers 35 users.
We did not include the group analysis in the figure because the number of users and the number of groups are roughly proportional; the number of users is at most 2x the number of groups.
Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.
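
The grouping by userid can be sketched as follows. The input layout, a list of (job id, user id) pairs for the Top\,100, and the function names are assumptions made for this illustration.

```python
from collections import Counter


def user_distribution(top_jobs):
    """Count the Top-N jobs per user, largest contributor first."""
    counts = Counter(user for _, user in top_jobs)
    return counts.most_common()


def largest_user_share(top_jobs):
    """Share of jobs that stem from the most frequent user."""
    distribution = user_distribution(top_jobs)
    return distribution[0][1] / len(top_jobs)
```

The length of the list returned by \texttt{user\_distribution} is the number of distinct users, and the order of the entries corresponds to the stacking order described above, starting with the lowest stack.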
\paragraph { Node distribution.}
All algorithms reduce over the node dimension; therefore, we naturally expect broad inclusion across the node range -- as long as the average I/O behavior of the jobs is similar.
\Cref { fig:nodes-job} shows a boxplot of the node counts in the Top\,100 -- the red line marks the reference job.
For Job-M and Job-L, we can observe that the node counts of the similar jobs indeed range between 1 and 128.
For Job-S, all 100 top-ranked jobs use one node.
As post-processing jobs typically use one node and make up a high proportion of all jobs, it appears natural that all Top\,100 jobs are from this class, which is confirmed by investigating the job metadata.
The boxplots have different shapes, which indicates that the different algorithms identify different sets of jobs -- we analyze this further below.
\paragraph { Runtime distribution.}
The job runtime of the Top\,100 jobs is shown using boxplots in \Cref { fig:runtime-job} .
While all algorithms can compute the similarity between jobs of different length, the bin algorithms and hex\_native penalize jobs of different length, preferring jobs of very similar length.
For Job-M and Job-L, hex\_phases and ks are able to identify much shorter or longer jobs.
For Job-L, the job itself is not included in the chosen Top\,100 (see \Cref { fig:hist-job-L} ; 393 jobs have a similarity of 100\%), which is the reason why the job runtime is not shown in the figure itself.
\begin { figure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/user-ids}
\caption { Job-S} \label { fig:users-job-S}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 5024292-out/user-ids}
\caption { Job-M} \label { fig:users-job-M}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 7488914-out/user-ids}
\caption { Job-L} \label { fig:users-job-L}
\end { subfigure}
\caption { User information for all 100 top-ranked jobs}
\label { fig:userids}
\end { figure}
\begin { figure}
%\begin{subfigure}{0.31\textwidth}
%\centering
%\includegraphics[width=\textwidth]{job_similarities_4296426-out/jobs-nodes}
%\caption{Job-S} \label{fig:nodes-job-S}
%\end{subfigure}
\begin { subfigure} { 0.48\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 5024292-out/jobs-nodes}
\caption { Job-M (reference job runs on 128 nodes)} \label { fig:nodes-job-M}
\end { subfigure}
\begin { subfigure} { 0.48\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 7488914-out/jobs-nodes}
\caption { Job-L (reference job runs on 20 nodes)} \label { fig:nodes-job-L}
\end { subfigure}
\centering
\caption { Distribution of node counts (for Job-S nodes=1 in all cases)}
\label { fig:nodes-job}
\end { figure}
\begin { figure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/jobs-elapsed}
\caption { Job-S (job=15,551\,s)} \label { fig:runtime-job-S}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 5024292-out/jobs-elapsed}
\caption { Job-M (job=28,828\,s)} \label { fig:runtime-job-M}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 7488914-out/jobs-elapsed}
\caption { Job-L (job=240\,ks)} \label { fig:runtime-job-L}
\end { subfigure}
\centering
\caption { Distribution of runtime for all 100 top-ranked jobs}
\label { fig:runtime-job}
\end { figure}
\subsubsection { Algorithmic differences}
To verify that the different algorithms behave differently, the intersection of the Top\,100 is computed for all combinations of algorithms and visualized in \Cref { fig:heatmap-job} .
Bin\_all and bin\_aggzeros overlap in at least 99 ranks for all three jobs.
While there is some reordering, both algorithms lead to a comparable set.
All algorithms have a significant overlap for Job-S.
For Job-M, however, they lead to a different ranking and Top\,100; in particular, ks determines a different set.
Generally, hex\_lev and hex\_native generate more similar results than the other algorithms.
From this analysis, we conclude that one representative of binary quantization is sufficient as both generate very similar results, while the other algorithms identify mostly disjoint behavioral aspects and, therefore, should be analyzed individually.
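
The pairwise intersection of the Top\,100 sets can be sketched as follows. The dictionary that maps an algorithm name to its set of top-ranked job IDs is an assumed layout for this illustration; the job IDs in the usage are made up.

```python
from itertools import combinations


def top_overlap(top_sets):
    """Size of the intersection of the top-ranked job IDs per algorithm pair."""
    names = sorted(top_sets)
    return {(a, b): len(top_sets[a] & top_sets[b])
            for a, b in combinations(names, 2)}
```

For example, \texttt{top\_overlap(\{"bin\_all": \{1, 2, 3\}, "ks": \{3, 4, 5\}\})} reports an overlap of one job for the pair; these counts are the kind of values shown in a heatmap cell.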
\begin { figure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/intersection-heatmap}
\caption { Job-S} \label { fig:heatmap-job-S}
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 5024292-out/intersection-heatmap}
\caption { Job-M} \label { fig:heatmap-job-M} %,trim={2.5cm 0 0 0},clip
\end { subfigure}
\begin { subfigure} { 0.31\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 7488914-out/intersection-heatmap}
\caption { Job-L} \label { fig:heatmap-job-L}
\end { subfigure}
\centering
\caption { Intersection of the 100 top-ranked jobs for different algorithms}
\label { fig:heatmap-job}
\end { figure}
%%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%% %%%%%%%%%%%
\section { Assessing Timelines for Similar Jobs}
To verify the suitability of the similarity metrics, we investigated the timelines of all Top\,100 jobs for each algorithm.
We subjectively found that the approach works very well and identifies suitable similar jobs.
To demonstrate this, we include a selection of job timelines -- typically Rank\,2, Rank\,15, and Rank\,100 -- as well as selected interesting job profiles.
These can be visually and subjectively compared to our reference jobs shown in \Cref { fig:refJobs} .

\subsection { Job-S}
This job represents post-processing (CMORization), which is a typical step.
It is executed for different simulations and variables across timesteps.
The job name of Job-S suggests that it is applied to the control variable.
In the metadata, we found 22,580 jobs with “cmor” in the name, of which 367 jobs mention “control”.
The bin and ks algorithms identify one job whose name does not include “cmor”.
All other algorithms identify only “cmor” jobs, and 26--38 of these jobs are applied to “control” (see \Cref { tbl:control-jobs} ) -- only the ks algorithm does not identify any job with “control”.
A selection of job timelines is given in \Cref { fig:job-S-hex-lev} ; all of these jobs are jobs on control variables.
The single non-cmor job and a high-ranked non-control cmor job are shown in \Cref { fig:job-S-bin-agg} .
While we cannot visually see many differences between these two jobs and the cmor jobs processing the control variables, the algorithms indicate that jobs processing the control variables must be more similar, as they appear much more frequently in the Top\,100 jobs than among all jobs labeled with “cmor”.
For Job-S, we found that all algorithms work well and, therefore, omit further timelines.
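
To make the frequency argument concrete, the share of “control” jobs can be compared using only the numbers reported in this section; the comparison as a ratio is added here for illustration and is not part of the original analysis. Among all jobs labeled with “cmor”, and in the Top\,100 of the algorithms other than ks (where nearly all jobs are cmor jobs), the shares are
\begin{equation*}
\frac{367}{22{,}580} \approx 1.6\,\%
\qquad \text{versus} \qquad
\frac{26}{100} = 26\,\% \;\text{ to }\; \frac{38}{100} = 38\,\% .
\end{equation*}
This corresponds to roughly 16 to 23 times the share among all “cmor” jobs.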
\begin { table}
\centering
\begin { tabular} { r|r|r|r|r|r}
bin\_ aggzeros & bin\_ all & hex\_ lev & hex\_ native & hex\_ phases & ks\\ \hline
38 & 38 & 33 & 26 & 33 & 0
\end { tabular}
%\begin{tabular}{r|r}
% Algorithm & Jobs \\ \hline
% bin\_aggzeros & 38 \\
% bin\_all & 38 \\
% hex\_lev & 33 \\
% hex\_native & 26 \\
% hex\_phases & 33 \\
% ks & 0
%\end{tabular}
\caption { Job-S: number of jobs with “control” in their name in the Top-100}
\label { tbl:control-jobs}
\end { table}
\begin { figure}
\centering
\begin { subfigure} { 0.3\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/bin_ aggzeros-0.6923--76timeseries4235560}
\caption { Non-cmor job: Rank\, 76, SIM=69\% }
\end { subfigure}
\begin { subfigure} { 0.3\textwidth }
\centering
\includegraphics [width=\textwidth] { job_ similarities_ 4296426-out/bin_ aggzeros-0.8077--4timeseries4483904}
\caption { Non-control job: Rank\, 4, SIM=81\% }
\end { subfigure}
\caption { Job-S: jobs with different job names when using bin\_ aggzeros}
\label { fig:job-S-bin-agg}
\end { figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.9615--1timeseries4296288}
\caption{Rank 2, SIM=96\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.9012--15timeseries4296277}
\caption{Rank 15, SIM=90\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.7901--99timeseries4297842}
\caption{Rank\,100, SIM=79\%}
\end{subfigure}
\caption{Job-S with hex\_lev, selection of similar jobs}
\label{fig:job-S-hex-lev}
\end{figure}
% \begin{figure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9808--1timeseries4296288}
% \caption{Rank 2, SIM=}
% \end{subfigure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9375--15timeseries4564296}
% \caption{Rank 15, SIM=}
% \end{subfigure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.8915--99timeseries4296785}
% \caption{Rank\,100, SIM=}
% \end{subfigure}
% \caption{Job-S with Hex-Native, selection of similar jobs}
% \label{fig:job-S-hex-native}
% \end{figure}
%
% \ContinuedFloat
%
% \begin{figure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/bin_aggzeros-0.8462--1timeseries4296280}
% \caption{Rank 2, SIM=}
% \end{subfigure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/bin_aggzeros-0.7778--14timeseries4555405}
% \caption{Rank 15, SIM=}
% \end{subfigure}
% \begin{subfigure}{0.3\textwidth}
% \centering
% \includegraphics[width=\textwidth]{job_similarities_4296426-out/bin_aggzeros-0.6923--99timeseries4687419}
% \caption{Rank\,100, SIM=}
% \end{subfigure}
% \caption{Job-S with bin\_aggzero, selection of similar jobs}
% \label{fig:job-S-bin-aggzeros}
% \end{figure}
\subsection{Job-M}
Inspecting the Top\,100 for this reference job highlights the differences between the algorithms.
All algorithms identify a diverse range of job names for this reference job in the Top\,100.
First, the name of the reference job appears 30 times in the whole dataset, so this job type is not necessarily executed frequently and, therefore, our Top\,100 is expected to contain other names.
Some applications are more prominent in these sets, e.g., for bin\_aggzeros, 32\,jobs contain WRF (a model) in the name.
The number of unique names is 19, 38, 49, and 51 for bin\_aggzeros, hex\_phases, hex\_native, and hex\_lev, respectively.
The jobs that are similar according to the bin algorithms differ from our expectation.
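The number of unique names and the most prominent applications can be derived directly from the list of job names in a Top\,100.
The sketch below shows one way to do this; \texttt{summarize\_names} and the job names are hypothetical placeholders, not names from our dataset.

```python
from collections import Counter


def summarize_names(job_names, top_n=2):
    """Return the number of unique names and the `top_n` most frequent ones."""
    counts = Counter(job_names)
    return len(counts), counts.most_common(top_n)


# Hypothetical Top 100 excerpt (illustrative names only).
top_jobs = ["wrf_run", "wrf_run", "wrf_run", "icon_exp", "post", "icon_exp"]

unique_names, most_frequent = summarize_names(top_jobs)
print(unique_names, most_frequent)
```

A low number of unique names means that an algorithm concentrates on few applications, whereas a high number indicates a more diverse Top\,100.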
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.7755--1timeseries8010306}
\caption{Rank\,2, SIM=78\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.7347--14timeseries4498983}
\caption{Rank\,15, SIM=73\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.5102--99timeseries5120077}
\caption{Rank\,100, SIM=51\%}
\end{subfigure}
\caption{Job-M with bin\_aggzeros, selection of similar jobs}
\label{fig:job-M-bin-aggzero}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_lev-0.9546--1timeseries7826634}
\caption{Rank\,2, SIM=95\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_lev-0.9365--2timeseries5240733}
\caption{Rank 3, SIM=94\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_lev-0.7392--15timeseries7651420}
\caption{Rank\,15, SIM=74\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_lev-0.7007--99timeseries8201967}
\caption{Rank\,100, SIM=70\%}
\end{subfigure}
\caption{Job-M with hex\_lev, selection of similar jobs}
\label{fig:job-M-hex-lev}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_native-0.9878--1timeseries5240733}
\caption{Rank 2, SIM=99\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_native-0.9651--2timeseries7826634}
\caption{Rank 3, SIM=97\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_native-0.9084--14timeseries8037817}
\caption{Rank 15, SIM=91\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_native-0.8838--99timeseries7571967}
\caption{Rank 100, SIM=88\%}
\end{subfigure}
\caption{Job-M with hex\_native, selection of similar jobs}
\label{fig:job-M-hex-native}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_phases-0.8831--1timeseries7826634}
\caption{Rank 2, SIM=88\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_phases-0.7963--2timeseries5240733}
\caption{Rank 3, SIM=80\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_phases-0.4583--14timeseries4244400}
\caption{Rank 15, SIM=46\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/hex_phases-0.2397--99timeseries7644009}
\caption{Rank 100, SIM=24\%}
\end{subfigure}
\caption{Job-M with hex\_phases, selection of similar jobs}
\label{fig:job-M-hex-phases}
\end{figure}
\subsection{Job-L}
For the bin algorithms, the inspection of job names (14 unique names) leads to two prominent applications: bash and xmessy with 45 and 48 instances, respectively.
The hex algorithms identify a more diverse set of applications (18 unique names and no xmessy job), and the hex\_phases algorithm has 85 unique names.
The ks algorithm finds 71 jobs whose names end with t127, which is a typical model configuration.
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/bin_aggzeros-0.1671--1timeseries7869050}
\caption{Rank 2, SIM=17\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/bin_aggzeros-0.1671--2timeseries7990497}
\caption{Rank 3, SIM=17\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_7488914-out/bin_aggzeros-0.1521--14timeseries8363584}
\caption{Rank 15, SIM=15\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/bin_aggzeros-0.1097--97timeseries4262983}
\caption{Rank 100, SIM=11\%}
\end{subfigure}
\caption{Job-L with bin\_aggzeros, selection of similar jobs}
\label{fig:job-L-bin-aggzero}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_lev-0.9386--1timeseries7266845}
\caption{Rank 2, SIM=94\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_lev-0.9375--2timeseries7214657}
\caption{Rank 3, SIM=94\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_lev-0.7251--14timeseries4341304}
\caption{Rank 15, SIM=73\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_lev-0.1657--99timeseries8036223}
\caption{Rank 100, SIM=17\%}
\end{subfigure}
\caption{Job-L with hex\_lev, selection of similar jobs}
\label{fig:job-L-hex-lev}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_native-0.9390--1timeseries7266845}
\caption{Rank 2, SIM=94\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_native-0.9333--2timeseries7214657}
\caption{Rank 3, SIM=93\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_native-0.8708--14timeseries4936553}
\caption{Rank 15, SIM=87\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_native-0.1695--99timeseries7942052}
\caption{Rank 100, SIM=17\%}
\end{subfigure}
\caption{Job-L with hex\_native, selection of similar jobs}
\label{fig:job-L-hex-native}
\end{figure}
\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_phases-1.0000--14timeseries4577917}
\caption{Rank 2, SIM=100\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_phases-1.0000--1timeseries4405671}
\caption{Rank 3, SIM=100\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_phases-1.0000--2timeseries4621422}
\caption{Rank 15, SIM=100\%}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/hex_phases-1.0000--99timeseries4232293}
\caption{Rank 100, SIM=100\%}
\end{subfigure}
\caption{Job-L with hex\_phases, selection of similar jobs}
\label{fig:job-L-hex-phases}
\end{figure}
\section{Conclusion}
\label{sec:summary}
One option is to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for the different algorithms.
This would increase the likelihood that these jobs are very similar and are what the user is looking for.
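Such a consensus can be sketched as the intersection of the top-ranked jobs of each algorithm.
The example below is a minimal illustration under the assumption that each algorithm provides a list of job IDs ordered by decreasing similarity; the function \texttt{consensus\_jobs}, the job IDs, and the tie-breaking by mean rank are hypothetical choices and not part of the presented workflow.

```python
def consensus_jobs(rankings, max_rank):
    """Return the job IDs that every algorithm ranks within `max_rank`.

    `rankings` maps an algorithm name to a list of job IDs ordered by
    decreasing similarity. The result is sorted by mean rank (best first).
    """
    if not rankings:
        return []
    top_sets = [set(ranked[:max_rank]) for ranked in rankings.values()]
    common = set.intersection(*top_sets)

    def mean_rank(job):
        return sum(r.index(job) for r in rankings.values()) / len(rankings)

    return sorted(common, key=lambda job: (mean_rank(job), job))


# Hypothetical rankings (job IDs are made up for illustration).
rankings = {
    "bin_aggzeros": [101, 102, 103, 104, 105],
    "hex_lev": [102, 101, 106, 103, 107],
    "hex_native": [102, 103, 101, 108, 104],
}

print(consensus_jobs(rankings, max_rank=3))
```

A stricter threshold shrinks the consensus set, so the threshold trades the number of returned jobs against the agreement between the algorithms.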
The ks algorithm finds jobs with similar histograms, which is not necessarily what we are looking for.
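To illustrate why similar histograms do not imply similar temporal behavior, the following sketch computes a two-sample Kolmogorov-Smirnov statistic on made-up timeseries.
It assumes that the ks algorithm compares the distributions of the observed values; the function \texttt{ks\_statistic} and the data are hypothetical and do not reproduce our implementation.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest absolute gap between the empirical CDFs of `a` and `b`;
    the order of the values within each sample does not matter.
    """
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in points
    )


# Hypothetical I/O intensity per time segment (values are made up).
early_burst = [9, 9, 9, 0, 0, 0, 0, 0]  # I/O at the start of the job
late_burst = [0, 0, 0, 0, 0, 9, 9, 9]   # same values, I/O at the end
steady = [3, 3, 4, 3, 4, 3, 3, 4]       # continuous moderate I/O

print(ks_statistic(early_burst, late_burst))  # identical value distributions
print(ks_statistic(early_burst, steady))      # different value distributions
```

The first two jobs have identical value distributions and thus a distance of zero, although their I/O occurs at opposite ends of the runtime.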
%\printbibliography
\end { document}