@ -136,25 +136,26 @@ Check time series algorithms:
\section{Evaluation}
\label{sec:evaluation}
Two study examples (two reference jobs):
In the following, we assume a job is given and we aim to identify similar jobs.
We chose several reference jobs with different compute and IO characteristics visualized in \Cref{fig:refJobs}:
\begin{itemize}
	\item job-short: a short job, e.g., of length 5--10, that performs a small amount of IO in at least two metadata metrics (more is better).
\item job-mixed:
	\item job-long: a very IO-intensive longer job, e.g., length $>$ 20, with read or write IO and possibly one other metric.
	\item Job-S: performs post-processing on a single node. This is a typical process in climate science where data products are reformatted and annotated with metadata to a standard representation (so-called CMORization). The post-processing is IO-intensive.
	\item Job-M: a typical MPI-parallel 8-hour compute job on 128 nodes that writes time series data after some spin-up. %CHE.ws12
\item Job-L: a 66-hour 20-node job.
The initialization data is read at the beginning.
	Afterwards, only a single master node constantly writes a small volume of data; in fact, the generated data is too small to be categorized as IO-relevant.
\end{itemize}
For each reference job: create a CSV file that contains all jobs with:
\begin{itemize}
\item JOB ID, for each algorithm: the coding and the computed ranking $\rightarrow$ thus one long row.
\end{itemize}
Alternatively, there could be one CSV file per algorithm that contains the JOB ID, the coding, and the rank.
For each reference job and algorithm, we created a CSV file with the computed similarity for all other jobs.
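
As an illustration of the per-algorithm CSV layout, the following minimal Python sketch writes one row per job with its similarity to the reference job. The column names, the job identifiers, and the placeholder similarity function are assumptions made for this example only; they do not represent the algorithms evaluated in this paper.

```python
import csv
import io


def write_similarity_csv(out, reference_id, jobs, similarity):
    """Write one row per job: the job ID and its similarity to the reference job."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["jobid", "similarity"])  # hypothetical column names
    ref = jobs[reference_id]
    for jobid, coding in jobs.items():
        if jobid == reference_id:
            continue  # the reference job itself is not listed
        writer.writerow([jobid, f"{similarity(ref, coding):.4f}"])


def toy_similarity(a, b):
    # Placeholder only: fraction of matching positions relative to the longer coding.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# Hypothetical job codings keyed by job ID.
jobs = {"ref": "aabb", "j1": "aabb", "j2": "abab", "j3": "cc"}
buf = io.StringIO()
write_similarity_csv(buf, "ref", jobs, toy_similarity)
csv_text = buf.getvalue()
```

Writing to a file object instead of a fixed path keeps the sketch independent of any directory layout; in practice one file per reference job and algorithm would be opened and passed in.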
Should we say something about the runtime of the algorithms? Having data on this would probably be useful.
Create histograms + cumulative job distribution for all algorithms.
Insert job profiles for closest 10 jobs.
Potentially, analyze what the rankings of the different similarities look like.
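
Selecting the closest jobs for a reference job amounts to ranking all jobs by similarity and keeping the top entries. The sketch below assumes that a higher similarity value means a closer job and uses made-up job identifiers and values; the function name is hypothetical.

```python
import heapq


def closest_jobs(similarities, k=10):
    """Return the k (jobid, similarity) pairs with the highest similarity."""
    # Assumption: larger similarity values indicate more similar jobs.
    return heapq.nlargest(k, similarities.items(), key=lambda item: item[1])


# Made-up similarities of four jobs to one reference job.
sims = {"j1": 0.2, "j2": 0.95, "j3": 0.6, "j4": 0.8}
top2 = closest_jobs(sims, k=2)
```

With \texttt{k=10} the same call would yield the ten jobs whose profiles are to be inserted.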
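\begin{verbatim}
library(ggplot2)
# ECDF of the similarity, one curve per algorithm
ggplot(data, aes(similarity, color=alg_name, group=alg_name)) +
  stat_ecdf(geom="step") +
  xlab("SIM") + ylab("Fraction of jobs") +
  theme(legend.position="bottom") +
  scale_color_brewer(palette="Set2")
\end{verbatim}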
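
To make explicit what the cumulative job distribution represents, the following Python sketch computes an empirical cumulative distribution function by hand: for each distinct similarity value, the fraction of jobs with a similarity at or below it. The function name and the sample values are assumptions for illustration, not data from our evaluation.

```python
from bisect import bisect_right


def ecdf(values):
    """Return (x, fraction of values <= x) for each distinct x, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    # bisect_right counts how many sorted values are <= x.
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


# Made-up similarity values for four jobs.
points = ecdf([0.9, 0.1, 0.5, 0.5])
```

Each returned pair corresponds to one step of the curve: the x-coordinate is the similarity and the y-coordinate is the fraction of jobs.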