\let\accentvec\vec
\documentclass[]{llncs}

\usepackage{todonotes}
\newcommand{\eb}[1]{\todo[inline]{(EB): #1}}
\newcommand{\jk}[1]{\todo[inline]{JK: #1}}

\usepackage{silence}
\WarningFilter{biblatex}{Using}
\WarningFilter{latex}{Float too large}
\WarningFilter{caption}{Unsupported}
\WarningFilter{caption}{Unknown document}

\let\spvec\vec
\let\vec\accentvec
\usepackage{amsmath}
\let\vec\spvec

\usepackage{array}
\usepackage{xcolor}
\usepackage{color}
\usepackage{colortbl}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{lstautogobble}
\usepackage[listings,skins,breakable,raster,most]{tcolorbox}
\usepackage{caption}

\lstset{
  numberbychapter=false,
  belowskip=-10pt,
  aboveskip=-10pt,
}

\lstdefinestyle{lstcodebox} {
  basicstyle=\scriptsize\ttfamily,
  autogobble=true,
  tabsize=2,
  captionpos=b,
  float,
}

\usepackage{graphicx}
\graphicspath{
  {./pictures/}
  {../fig/}
}

\usepackage[backend=bibtex, style=numeric]{biblatex}
\addbibresource{bibliography.bib}

\usepackage{enumitem}
\setitemize{noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt}

\definecolor{darkgreen}{rgb}{0,0.5,0}
\definecolor{darkyellow}{rgb}{0.7,0.7,0}

\usepackage{cleveref}
\crefname{codecount}{Code}{Codes}

\title{Using Machine Learning to Identify Similar Jobs Based on their IO Behavior}
\author{Julian Kunkel\inst{1} \and Eugen Betke\inst{2}}

\institute{
University of Reading --
\email{j.m.kunkel@reading.ac.uk}%
\and
DKRZ --
\email{betke@dkrz.de}%
}
\begin{document}
\maketitle

\begin{abstract}

Support staff.
Problem: a particular job is found that isn't performing well.
How can we then find similar jobs?

Problem with the definition of similarity.

In this paper, we illustrate a methodology and algorithms to identify similar jobs based on profiles and time series.
Similar to a study.

Research question: is this effective for finding similar jobs?

The contribution of this paper...
\end{abstract}

\section{Introduction}

%This paper is structured as follows.
%We start with the related work in \Cref{sec:relwork}.
%Then, in TODO we introduce the DKRZ monitoring systems and explain how I/O metrics are captured by the collectors.
%In \Cref{sec:methodology} we describe the data reduction and the machine learning approaches and conduct an experiment in \Cref{sec:data,sec:evaluation}.
%Finally, we conclude the paper with a summary in \Cref{sec:summary}.

\section{Related Work}
\label{sec:relwork}

\section{Methodology}
\label{sec:methodology}

Given: the reference job ID.
Create a feature set from the 4D time series data (number of nodes, per file system, 9 metrics, time).

Adapt the algorithms:
\begin{itemize}
  \item iterate over all jobs
  \begin{itemize}
    \item compute the distance to the reference job
  \end{itemize}
  \item sort the jobs by their distance to the reference job
  \item create a cumulative job distribution based on distance for visualization, and allow users to output jobs within a given distance
\end{itemize}
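
The steps above can be written compactly; the notation here is an assumption introduced for illustration and is not fixed by the outline.
Let $J$ be the set of all jobs, $r$ the reference job, and $d(r, j) \ge 0$ the distance computed by a given algorithm.
Sorting yields an ordering $j_1, j_2, \ldots, j_{|J|}$ with
\begin{equation*}
  d(r, j_1) \le d(r, j_2) \le \cdots \le d(r, j_{|J|}),
\end{equation*}
and the cumulative job distribution, normalized here to $[0, 1]$ (a plain count would serve equally well), could be defined as
\begin{equation*}
  F_r(x) = \frac{\left|\{\, j \in J : d(r, j) \le x \,\}\right|}{|J|}.
\end{equation*}
Under this reading, outputting all jobs up to a distance $x$ returns $F_r(x) \cdot |J|$ jobs, and the $k$ closest jobs are $j_1, \ldots, j_k$.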

A user might be interested in exploring, say, the closest 10 or 50 jobs.

Algorithms:
Profile algorithm: job-profiles (job-duration, job-metrics, or a combination of both)
$\rightarrow$ compute the geometric-mean distance between profiles
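
The outline does not define the geometric-mean distance.
One possible formalization, given purely as an assumption: let $r$ be the reference job, let the profile of a job $j$ be a vector $p_j = (p_{j,1}, \ldots, p_{j,K})$ of $K$ non-negative metric values, and define a per-metric similarity
\begin{equation*}
  s_k(r, j) =
  \begin{cases}
    1 & \text{if } p_{r,k} = p_{j,k} = 0, \\
    \dfrac{\min(p_{r,k}, p_{j,k})}{\max(p_{r,k}, p_{j,k})} & \text{otherwise.}
  \end{cases}
\end{equation*}
The distance could then be taken as one minus the geometric mean of these similarities,
\begin{equation*}
  d(r, j) = 1 - \Bigl( \prod_{k=1}^{K} s_k(r, j) \Bigr)^{1/K}.
\end{equation*}
With this choice, $d(r, j) \in [0, 1]$ and $d(r, r) = 0$, and a single metric that is zero in one job but non-zero in the other already gives $d(r, j) = 1$; whether that sensitivity is desirable would need to be checked against the data.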

Check time series algorithms:

\begin{itemize}
  \item bin
  \item hex\_native
  \item hex\_lev
  \item hex\_quant
\end{itemize}

\section{Evaluation}
\label{sec:evaluation}

Three study examples (three reference jobs):
\begin{itemize}
  \item job-short: a shorter job, e.g., of length 5--10, that has a little IO in at least two metadata metrics (more is better).
  \item job-mixed:
  \item job-long: a very IO-intensive longer job, e.g., of length $>$ 20, with read or write IO and possibly one other metric.
\end{itemize}

For each reference job: create a CSV file that contains all jobs with:
\begin{itemize}
  \item job ID and, for each algorithm, the coding and the computed ranking $\rightarrow$ thus one long row.
\end{itemize}
Alternatively, there could be one CSV file per algorithm that contains job ID, coding, and rank.

Create histograms and cumulative job distributions for all algorithms.
Insert job profiles for the closest 10 jobs.

Potentially, analyze how the rankings produced by the different similarity measures compare.

The reference jobs are shown in \Cref{fig:refJobs}.

\begin{figure}
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=\textwidth]{job-timeseries4296426}
\caption{Job-S} \label{fig:job-S}
\end{subfigure}

\caption{Reference jobs: timeline of mean IO activity}
\label{fig:refJobs}
\end{figure}

\begin{figure}\ContinuedFloat

\begin{subfigure}{0.8\textwidth}
\includegraphics[width=\textwidth]{job-timeseries5024292}
\caption{Job-M} \label{fig:job-M}
\end{subfigure}

\begin{subfigure}{0.8\textwidth}
\includegraphics[width=\textwidth]{job-timeseries7488914-30.pdf}
\caption{Job-L (first 30 segments of 400; remaining segments are similar)}
\label{fig:job-L}
\end{subfigure}
\caption{Reference jobs: timeline of mean IO activity; non-shown timelines are 0}
\end{figure}

\section{Summary and Conclusion}
\label{sec:summary}

%\printbibliography
\end{document}