diff --git a/paper/main.tex b/paper/main.tex
index a32622a..9c833f2 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -987,10 +987,24 @@ As expected, the histograms mimics the profile of the reference job, and thus, t
 \section{Conclusion}
 \label{sec:summary}
 
-One consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
+In this article, we conducted a study to identify similar jobs based on the timelines of nine I/O statistics.
+To this end, we applied six different algorithmic strategies developed in previous work and, additionally, a distance metric based on the Kolmogorov--Smirnov test.
+The quantitative analysis shows that the algorithms yield a diverse set of results and that only a tiny subset of the 500k jobs is very similar to each of the three reference jobs.
+For the small post-processing job, which is executed many times, all algorithms produce suitable results.
+For Job-M, the algorithms exhibit different behavior.
+Job-L is tricky to analyze because it is compute-intensive, with only a single I/O phase at the beginning.
+Generally, the KS algorithm finds jobs with similar histograms, which are not necessarily what we are subjectively looking for.
+
+We found that computing the similarity of a reference job to all jobs and ranking them by their similarity successfully identified the related jobs we were interested in.
+According to our subjective qualitative analysis, the HEX\_lev and HEX\_native algorithms work best.
+Typically, a related job stems from the same user/group and may have a related job name, but the approach was inclusive and not restricted to such jobs.
+However, all algorithms perform their task as intended.
+The algorithms differ in their pre-processing and distance metrics, which leads to different definitions of similarity.
+The data center support staff or the user must therefore decide how similarity should be defined in order to select the most suitable algorithm.
+Another consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
 That would increase the likelihood that these jobs are very similar and what the user is looking for.
 
-The KS algorithm finds jobs with similar histograms which are not necessarily what we are looking for.
+Our next step is to foster a discussion in the community to identify and define suitable similarity metrics for the different analysis purposes.
 
 \printbibliography
 \end{document}
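The consensus idea in the conclusion, keeping only jobs that meet a rank threshold for every algorithm, could be prototyped as below. This is a minimal sketch, not the implementation used in the study. It assumes each algorithm's output is available as a list of job IDs ordered from most to least similar to the reference job. The function name `consensus_jobs`, the `rank_threshold` parameter, and all job IDs are hypothetical.

```python
from typing import Dict, List, Set


def consensus_jobs(rankings: Dict[str, List[str]], rank_threshold: int) -> Set[str]:
    """Return the job IDs ranked within the top `rank_threshold` by every algorithm.

    `rankings` maps an algorithm name to its list of job IDs, ordered from most
    to least similar to the reference job (assumed, hypothetical input format).
    """
    if rank_threshold < 1:
        raise ValueError("rank_threshold must be at least 1")
    if not rankings:
        return set()
    # Top-k set per algorithm: slicing keeps the first rank_threshold entries.
    top_sets = [set(ranked[:rank_threshold]) for ranked in rankings.values()]
    # A job qualifies only if it passes the threshold for all algorithms.
    return set.intersection(*top_sets)


if __name__ == "__main__":
    # Illustrative rankings only; these job IDs are made up.
    rankings = {
        "HEX_lev": ["j7", "j3", "j9", "j1"],
        "HEX_native": ["j3", "j7", "j2", "j9"],
        "KS": ["j2", "j3", "j7", "j5"],
    }
    print(sorted(consensus_jobs(rankings, rank_threshold=3)))  # prints ['j3', 'j7']
```

Intersecting the top-k sets is the strictest form of agreement; a looser variant could require a job to pass the threshold for only a majority of the algorithms.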