\let\accentvec\vec
\documentclass[]{llncs}

\usepackage{todonotes}
\newcommand{\eb}[1]{\todo[inline]{(EB): #1}}
\newcommand{\jk}[1]{\todo[inline]{JK: #1}}

\usepackage{silence}
\WarningFilter{biblatex}{Using}
\WarningFilter{latex}{Float too large}
\WarningFilter{caption}{Unsupported}
\WarningFilter{caption}{Unknown document}

\let\spvec\vec
\let\vec\accentvec
\usepackage{amsmath}
\let\vec\spvec

\usepackage{array}
\usepackage{xcolor}
\usepackage{color}
\usepackage{colortbl}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{lstautogobble}
\usepackage[listings,skins,breakable,raster,most]{tcolorbox}
\usepackage{caption}

\lstset{
  numberbychapter=false,
  belowskip=-10pt,
  aboveskip=-10pt,
}

\lstdefinestyle{lstcodebox} {
  basicstyle=\scriptsize\ttfamily,
  autogobble=true,
  tabsize=2,
  captionpos=b,
  float,
}

\usepackage{graphicx}
\graphicspath{ {./pictures/}, {../fig/}, {../} }

\usepackage[backend=bibtex, style=numeric]{biblatex}
\addbibresource{bibliography.bib}

\usepackage{enumitem}
\setitemize{noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt}

\definecolor{darkgreen}{rgb}{0,0.5,0}
\definecolor{darkyellow}{rgb}{0.7,0.7,0}

\usepackage{cleveref}
\crefname{codecount}{Code}{Codes}

\title{Using Machine Learning to Identify Similar Jobs Based on their IO Behavior}
\author{Julian Kunkel\inst{1} \and Eugen Betke\inst{2}}

\institute{
  University of Reading--%
  \email{j.m.kunkel@reading.ac.uk}%
  \and
  DKRZ --
  \email{betke@dkrz.de}%
}

\begin{document}
\maketitle

\begin{abstract}
Support staff may find that a particular job is not performing well.
The question is then how to find similar jobs, which raises the problem of defining similarity.
In this paper, we illustrate a methodology and algorithms to identify similar jobs based on profiles and time series.
Similar to a study.
Research question: is this approach effective in finding similar jobs?
The contribution of this paper...
\end{abstract}

\section{Introduction}

%This paper is structured as follows.
%We start with the related work in \Cref{sec:relwork}.
%Then, in TODO we introduce the DKRZ monitoring systems and explain how I/O metrics are captured by the collectors.
%In \Cref{sec:methodology} we describe the data reduction and the machine learning approaches and do an experiment in \Cref{sec:data,sec:evaluation}.
%Finally, we finalize our paper with a summary in \Cref{sec:summary}.

\section{Related Work}
\label{sec:relwork}

\section{Methodology}
\label{sec:methodology}

Given: the reference job ID.
Create a feature set from the 4D time series data (nodes, file systems, 9 metrics, time).

Adapt the algorithms:
\begin{itemize}
	\item iterate over all jobs
	\begin{itemize}
		\item compute the distance to the reference job
	\end{itemize}
	\item sort the jobs by their distance to the reference job
	\item create the cumulative job distribution over the distance for visualization, and allow users to output the jobs within a given distance
\end{itemize}

A user might be interested in exploring, say, the closest 10 or 50 jobs.

Algorithms:

Profile algorithm: job-profiles (job-duration, job-metrics, combine both)
$\rightarrow$ just compute the geom-mean distance between the profiles

Check time series algorithms:
\begin{itemize}
	\item bin
	\item hex\_native
	\item hex\_lev
	\item hex\_quant
\end{itemize}

\section{Evaluation}
\label{sec:evaluation}

In the following, we assume a job is given and we aim to identify similar jobs.
We chose several reference jobs with different compute and IO characteristics, visualized in \Cref{fig:refJobs}:
\begin{itemize}
	\item Job-S: performs post-processing on a single node. This is a typical process in climate science where data products are reformatted and annotated with metadata to a standard representation (so-called CMORization). The post-processing is IO intensive.
	\item Job-M: a typical MPI-parallel 8-hour compute job on 128 nodes that writes time series data after some spin-up. %CHE.ws12
	\item Job-L: a 66-hour 20-node job.
	The initialization data is read at the beginning.
	Then only a single master node constantly writes a small volume of data; in fact, the generated data is too small to be categorized as IO relevant.
\end{itemize}

For each reference job and algorithm, we created a CSV file with the computed similarity for all other jobs.

Should we say something about the runtime of the algorithms? I think having data on this would be useful.

Create histograms + cumulative job distribution for all algorithms.
Insert job profiles for closest 10 jobs.

Potentially, analyze what the rankings of the different similarities look like.

\begin{figure}
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job-timeseries4296426}
\caption{Job-S} \label{fig:job-S}
\end{subfigure}
\centering
\caption{Reference jobs: timeline of mean IO activity}
\label{fig:refJobs}
\end{figure}

\begin{figure}\ContinuedFloat
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job-timeseries5024292}
\caption{Job-M} \label{fig:job-M}
\end{subfigure}
\centering
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job-timeseries7488914-30.pdf}
\caption{Job-L (first 30 segments of 400; remaining segments are similar)}
\label{fig:job-L}
\end{subfigure}
\centering
\caption{Reference jobs: timeline of mean IO activity; timelines not shown are 0}
\end{figure}

\begin{figure}
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/ecdf.png}
\caption{Job-S} \label{fig:ecdf-job-S}
\end{subfigure}
\centering
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/ecdf.png}
\caption{Job-M} \label{fig:ecdf-job-M}
\end{subfigure}
\centering
\begin{subfigure}{0.8\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_7488914-out/ecdf.png}
\caption{Job-L} \label{fig:ecdf-job-L}
\end{subfigure}
\centering
\caption{Empirical cumulative distribution function}
\label{fig:ecdf}
\end{figure}

\begin{figure}
\begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hist-sim} \caption{Job-S} \label{fig:hist-job-S} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_5024292-out/hist-sim} \caption{Job-M} \label{fig:hist-job-M} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_7488914-out/hist-sim} \caption{Job-L} \label{fig:hist-job-L} \end{subfigure} \centering \caption{Histogram for the number of jobs (bin width: 2.5\%, numbers are the actual job counts)} \label{fig:hist} \end{figure} \subsection{Quantitative Analysis of Selected Jobs} \begin{table} \caption{User and Group Information} \end{table} \begin{figure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/jobs-nodes} \caption{Job-S} \label{fig:nodes-job-S} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_5024292-out/jobs-nodes} \caption{Job-M} \label{fig:nodes-job-M} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_7488914-out/jobs-nodes} \caption{Job-L} \label{fig:nodes-job-L} \end{subfigure} \centering \caption{Distribution of node counts} \label{fig:nodes-job} \end{figure} \begin{figure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/jobs-elapsed} \caption{Job-S} \label{fig:runtime-job-S} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_5024292-out/jobs-elapsed} \caption{Job-M} \label{fig:runtime-job-M} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_7488914-out/jobs-elapsed} \caption{Job-L} \label{fig:runtime-job-L} \end{subfigure} \centering \caption{Distribution of elapsed 
runtime} \label{fig:runtime-job} \end{figure} Different algorithms ... \begin{figure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/intersection-heatmap} \caption{Job-S} \label{fig:heatmap-job-S} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_5024292-out/intersection-heatmap} \caption{Job-M} \label{fig:heatmap-job-M} \end{subfigure} \begin{subfigure}{0.5\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_7488914-out/intersection-heatmap} \caption{Job-L} \label{fig:heatmap-job-L} \end{subfigure} \centering \caption{Intersection of the top 100 jobs for the different algorithms} \label{fig:heatmap-job} \end{figure} \section{Assessing Timelines for Similar Jobs} \subsection{Job-S} \begin{figure} \begin{subfigure}{0.3\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.9615-timeseries4297102} \caption{Rank 2, SIM=0.9615} \end{subfigure} \begin{subfigure}{0.3\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.9017-timeseries4570701} \caption{Rank 15, SIM=0.9017} \end{subfigure} \begin{subfigure}{0.3\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_lev-0.7901-timeseries4693267} \caption{Rank\,100, SIM=0.790} \end{subfigure} \caption{Job-S with Hex-Lev, selection of similar jobs} \label{fig:job-S-hex-lev} \end{figure} \begin{figure} \begin{subfigure}{0.3\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9808-timeseries4567314} \caption{Rank 2, SIM=} \end{subfigure} \begin{subfigure}{0.3\textwidth} \centering \includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9375-timeseries4709700} \caption{Rank 15, SIM=} \end{subfigure} \begin{subfigure}{0.3\textwidth} \centering 
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9001-timeseries4527630}
\caption{Rank\,100, SIM=}
\end{subfigure}
\caption{Job-S with Hex-Native, selection of similar jobs}
\label{fig:job-S-hex-native}
\end{figure}

\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_phases-0.9153-timeseries4567314}
\caption{Rank 2, $SIM=$ (same job as Hex-Native Top 1)}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_native-0.9268-timeseries4557849}
\caption{Rank 15, $SIM=$}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_4296426-out/hex_phases-0.7382-timeseries4693267}
\caption{Rank\,100, $SIM=$ }
\end{subfigure}
\caption{Job-S with Hex-Phases, selection of similar jobs}
\label{fig:job-S-hex-phases}
\end{figure}
% \ContinuedFloat

Bin aggzeros works quite well here too.
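
The rank-based selection of similar jobs used in the figures above, and the overlap between the rankings of two algorithms, can be illustrated with a few lines of Python.
This is a minimal sketch under stated assumptions and not the tooling used for this paper: the function names (\texttt{rank\_jobs}, \texttt{select\_ranks}, \texttt{top\_k\_overlap}), the dictionary layout, and the toy similarity values are hypothetical.
It sorts jobs by descending similarity, picks the entries at given ranks, and counts how many jobs two rankings share in their top $k$.

\begin{lstlisting}[language=Python, basicstyle=\scriptsize\ttfamily]
def rank_jobs(similarity):
    """Return (job_id, similarity) pairs, most similar first."""
    return sorted(similarity.items(), key=lambda kv: kv[1], reverse=True)


def select_ranks(ranked, ranks):
    """Pick the entries at the given 1-based ranks (out-of-range ranks are skipped)."""
    return {r: ranked[r - 1] for r in ranks if 1 <= r <= len(ranked)}


def top_k_overlap(ranked_a, ranked_b, k):
    """Count the jobs that appear in the top k of both rankings."""
    top_a = {job for job, _ in ranked_a[:k]}
    top_b = {job for job, _ in ranked_b[:k]}
    return len(top_a & top_b)


# Toy similarities for two hypothetical algorithms (not measured data).
sim_a = {"j1": 1.0, "j2": 0.96, "j3": 0.90, "j4": 0.79}
sim_b = {"j1": 1.0, "j2": 0.70, "j3": 0.98, "j4": 0.94}

ranked_a = rank_jobs(sim_a)
ranked_b = rank_jobs(sim_b)
picked = select_ranks(ranked_a, [1, 2, 4])
overlap = top_k_overlap(ranked_a, ranked_b, 3)
\end{lstlisting}

With these toy values, the two rankings share two of their top three jobs.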
\subsection{Job-M}

\begin{figure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.7755-timeseries7907734}
\caption{Rank 2, $SIM=$}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.7347-timeseries4244400}
\caption{$SIM=$}
\end{subfigure}
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{job_similarities_5024292-out/bin_aggzeros-0.5306-timeseries8038026}
\caption{$SIM=$ }
\end{subfigure}
\caption{Job-M with Bin-Aggzero, selection of similar jobs}
\label{fig:job-M-bin-aggzero}
\end{figure}

\subsection{Job-L}

\section{Conclusion}
\label{sec:summary}

%\printbibliography

\end{document}