diff --git a/fig/progress_4296426-out-boxplot.png b/fig/progress_4296426-out-boxplot.png
index 44bf303..8ad3765 100644
Binary files a/fig/progress_4296426-out-boxplot.png and b/fig/progress_4296426-out-boxplot.png differ
diff --git a/fig/progress_4296426-out-cummulative.png b/fig/progress_4296426-out-cummulative.png
index 661c18d..d4f4d42 100644
Binary files a/fig/progress_4296426-out-cummulative.png and b/fig/progress_4296426-out-cummulative.png differ
diff --git a/fig/progress_5024292-out-boxplot.png b/fig/progress_5024292-out-boxplot.png
index 051105e..3e3a9d3 100644
Binary files a/fig/progress_5024292-out-boxplot.png and b/fig/progress_5024292-out-boxplot.png differ
diff --git a/fig/progress_5024292-out-cummulative.png b/fig/progress_5024292-out-cummulative.png
index db82c14..52fe3d1 100644
Binary files a/fig/progress_5024292-out-cummulative.png and b/fig/progress_5024292-out-cummulative.png differ
diff --git a/fig/progress_7488914-out-boxplot.png b/fig/progress_7488914-out-boxplot.png
index 073af56..d565916 100644
Binary files a/fig/progress_7488914-out-boxplot.png and b/fig/progress_7488914-out-boxplot.png differ
diff --git a/fig/progress_7488914-out-cummulative.png b/fig/progress_7488914-out-cummulative.png
index 852b9fa..e259836 100644
Binary files a/fig/progress_7488914-out-cummulative.png and b/fig/progress_7488914-out-cummulative.png differ
diff --git a/paper/main.tex b/paper/main.tex
index ea83c0a..69b4818 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -150,6 +150,9 @@ Finally, we conclude our paper in \Cref{sec:summary}.
 Clustering of jobs based on their names
+Multivariate time series
+Levenshtein distance, also known as Edit Distance (ED).
+
 Vampir clustering of timelines of a single job.
 
 \section{Methodology}
@@ -328,10 +331,11 @@ The runtime is normalized for 100k jobs, i.e., for BIN\_all it takes about 41\,s
 Generally, the bin algorithms are fastest, while the hex algorithms often take 4-5x as long.
 Hex\_phases is slow for Job-S and Job-M while it is fast for Job-L; the reason is that just one phase is extracted for Job-L.
 The Levenshtein-based algorithms take longer for longer jobs -- proportional to the job length, as they apply a sliding window.
+The KS algorithm is roughly 10x faster than the others, but it operates on statistics of the time series rather than the full series.
+
 Note that the current algorithms are sequential and executed on just one core.
 For computing the similarity to one (or a small set of) reference jobs, they could easily be parallelized.
 We believe this would then allow a near-online analysis of a job.
-\jk{To analyze KS jobs}
 
 \begin{figure}
 \centering
diff --git a/scripts/plot-performance.R b/scripts/plot-performance.R
index 4433b4b..6186095 100755
--- a/scripts/plot-performance.R
+++ b/scripts/plot-performance.R
@@ -11,6 +11,9 @@ prefix = args[2]
 # Plot the performance numbers of the analysis
 data = read.csv(file)
+levels(data$alg_name)[levels(data$alg_name) == "bin_aggzeros"] = "bin_aggz"
+levels(data$alg_name)[levels(data$alg_name) == "hex_native"] = "hex_nat"
+levels(data$alg_name)[levels(data$alg_name) == "hex_phases"] = "hex_phas"
 e = data %>% filter(jobs_done >= (jobs_total - 9998))
 e$time_per_100k = e$elapsed / (e$jobs_done / 100000)
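The main.tex hunk above attributes the longer runtime of the Levenshtein-based algorithms to job length, since Edit Distance (ED) is computed over the coded time series. As a point of reference only (this is not the paper's implementation), a minimal Levenshtein/ED sketch in Python, illustrating why cost grows with sequence length (the dynamic program is O(len(a) * len(b))):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    # prev[j] holds the distance between the empty prefix of `a`
    # and the first j elements of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from the first i elements of `a` to empty `b`
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]
```

Applied over a sliding window, as the paper describes, this inner computation is repeated once per window position, which makes the total cost proportional to the job length.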
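The added sentence about the KS algorithm notes that it operates on statistics of the time series rather than the series itself, which is consistent with its speedup. Assuming "KS" refers to the two-sample Kolmogorov-Smirnov statistic (an assumption; this is a sketch, not the paper's code), it can be computed as the maximum distance between the two empirical CDFs:

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over observed values."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    # The empirical CDF at x is the fraction of samples <= x;
    # its maximum gap is attained at one of the observed values.
    return max(abs(bisect_right(a, x) / na - bisect_right(b, x) / nb)
               for x in set(a) | set(b))
```

Because only the sorted sample values enter the comparison, the cost depends on the number of samples, not on aligning two full time series position by position.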