Conclusion

2020-11-20 09:58:22 +00:00 · 2020-11-20 09:58:22 +00:00 · fd867d55f0
commit fd867d55f0
parent 0a581098bb
1 changed files with 16 additions and 2 deletions
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -987,10 +987,24 @@ As expected, the histograms mimics the profile of the reference job, and thus, t
 \section{Conclusion}
 \label{sec:summary}

-One consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
+In this article, we conducted a study to identify similar jobs based on timelines of nine I/O statistics.
+To this end, we applied six different algorithmic strategies developed before and, this time, also included a distance metric based on the Kolmogorov-Smirnov test.
+The quantitative analysis shows that a diverse set of results can be found and that only a tiny subset of the 500k jobs is very similar to each of the three reference jobs.
+For the small post-processing job, which is executed many times, all algorithms produce suitable results.
+For Job-M, the algorithms exhibit different behavior.
+Job-L is tricky to analyze because it is compute-intensive with only a single I/O phase at the beginning.
+Generally, the KS algorithm finds jobs with similar histograms, which are not necessarily what we are subjectively looking for.
+
+We found that the approach of computing the similarity of a reference job to all jobs and ranking them by similarity was successful in finding the related jobs that we were interested in.
+The HEX\_lev and HEX\_native algorithms work best according to our subjective qualitative analysis.
+Typically, a related job stems from the same user/group and may have a related job name, but the approach was inclusive.
+However, all algorithms perform their task as intended.
+The pre-processing steps of the algorithms and their distance metrics differ, leading to different definitions of similarity.
+The data center support staff or the user must define similarity in order to select the algorithm that suits best.
+Another consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
 That would increase the likelihood that these jobs are very similar and what the user is looking for.

-The KS algorithm finds jobs with similar histograms which are not necessarily what we are looking for.
+Our next step is to foster a discussion in the community to identify and define suitable similarity metrics for the different analysis purposes.

 \printbibliography
 \end{document}
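
The consideration raised in the conclusion, keeping only jobs that meet a rank threshold for every algorithm, can be illustrated with a minimal Python sketch. This is not code from the paper: the function name `consensus_jobs`, the job IDs, and the example rankings are hypothetical, and it assumes each algorithm yields a list of job IDs ordered from most to least similar to the reference job.

```python
from typing import Dict, List, Set


def consensus_jobs(rankings: Dict[str, List[str]], rank_threshold: int) -> Set[str]:
    """Return the job IDs ranked within the top `rank_threshold` by every algorithm.

    `rankings` maps an algorithm name to its job IDs, ordered from most to
    least similar to the reference job (assumed input format).
    """
    if not rankings:
        return set()
    # Top-k candidate set per algorithm.
    top_sets = [set(ranked[:rank_threshold]) for ranked in rankings.values()]
    # A job is a consensus match only if it is in every top-k set.
    return set.intersection(*top_sets)


# Hypothetical rankings for one reference job (illustrative IDs only).
rankings = {
    "HEX_lev": ["j7", "j2", "j9", "j4"],
    "HEX_native": ["j2", "j7", "j4", "j1"],
    "KS": ["j4", "j7", "j3", "j2"],
}

consensus = consensus_jobs(rankings, rank_threshold=3)
print(sorted(consensus))
```

A stricter threshold shrinks the consensus set, while a looser one admits jobs that only some algorithms rank highly, so the threshold itself becomes part of the similarity definition that the support staff or user has to choose.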