Conclusion

2020-11-20 09:58:22 +00:00 · 2020-11-20 09:58:22 +00:00 · fd867d55f0
commit fd867d55f0
parent 0a581098bb
1 changed files with 16 additions and 2 deletions
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -987,10 +987,24 @@ As expected, the histograms mimics the profile of the reference job, and thus, t
 \section{Conclusion}
 \label{sec:summary}
-One consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
+In this article, we conducted a study to identify similar jobs based on timelines of nine I/O statistics.
 To this end, we applied six previously developed algorithmic strategies and additionally included a distance metric based on the Kolmogorov-Smirnov test.
 The quantitative analysis shows that a diverse set of results can be found and that only a tiny subset of the 500k jobs is very similar to each of the three reference jobs.
 For the small post-processing job, which is executed many times, all algorithms produce suitable results.
 For Job-M, the algorithms exhibit different behavior.
 Job-L is tricky to analyze because it is compute-intensive with only a single I/O phase at the beginning.
 Generally, the KS algorithm finds jobs with similar histograms, which are not necessarily what we are subjectively looking for.
 We found that computing the similarity of a reference job to all jobs and ranking them by similarity was successful in finding the related jobs that we were interested in.
 The HEX\_lev and HEX\_native algorithms work best according to our subjective qualitative analysis.
 Typically, a related job stems from the same user/group and may have a related job name, but the approach was inclusive.
 However, all algorithms perform their task as intended.
 The pre-processing steps and distance metrics of the algorithms differ, leading to different definitions of similarity.
 The data center support staff or the user must define what similarity means for their purpose in order to select the most suitable algorithm.
 Another consideration could be to identify jobs that are found by all algorithms, i.e., jobs that meet a certain (rank) threshold for different algorithms.
 That would increase the likelihood that these jobs are very similar and what the user is looking for.
-The KS algorithm finds jobs with similar histograms which are not necessarily what we are looking for.
+Our next step is to foster a discussion in the community to identify and define suitable similarity metrics for the different analysis purposes.
 \printbibliography
 \end{document}
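
The rank-threshold idea mentioned in the conclusion (keep only the jobs that every algorithm ranks near the top) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names `ks_statistic`, `pointwise_distance`, `rank_jobs` and `consensus`, the job ids, and the toy timelines are all hypothetical and chosen only for illustration. The two-sample Kolmogorov-Smirnov statistic is computed here as the largest gap between the two empirical CDFs, and a simple element-wise distance stands in for an order-aware algorithm.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def pointwise_distance(a, b):
    """Hypothetical order-aware distance: sum of element-wise absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_jobs(reference, jobs, distance):
    """Return job ids ordered from most to least similar to the reference."""
    return sorted(jobs, key=lambda job_id: distance(reference, jobs[job_id]))


def consensus(rankings, threshold):
    """Job ids that appear within the first `threshold` ranks of every ranking."""
    top_sets = [set(ranking[:threshold]) for ranking in rankings]
    return set.intersection(*top_sets)


# Toy timelines of a single I/O statistic (invented data, not from the study).
reference = [0, 0, 4, 8, 0, 0]
jobs = {
    "job-a": [0, 0, 4, 8, 0, 0],  # identical to the reference
    "job-b": [8, 4, 0, 0, 0, 0],  # same histogram, different temporal order
    "job-c": [1, 1, 1, 1, 1, 1],
    "job-d": [9, 9, 9, 9, 9, 9],
}

rankings = [
    rank_jobs(reference, jobs, ks_statistic),
    rank_jobs(reference, jobs, pointwise_distance),
]
similar_for_all = consensus(rankings, threshold=2)
print(similar_for_all)  # {'job-a'}
```

In this toy example the KS-based ranking places `job-b` next to the reference because both share the same histogram, while the order-aware distance does not; only `job-a` stays within the top two ranks of both rankings. The threshold of 2 is arbitrary and would have to be chosen per use case.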