Performance section
@@ -150,6 +150,9 @@ Finally, we conclude our paper in \Cref{sec:summary}.
Clustering of jobs based on their names
Multivariate time series
Levenshtein distance, also known as Edit Distance (ED).
Vampir clustering of timelines of a single job.
\section{Methodology}
@@ -328,10 +331,11 @@ The runtime is normalized for 100k jobs, i.e., for BIN\_all it takes about 41\,s
Generally, the bin algorithms are fastest, while the hex algorithms often take 4--5x as long.
Hex\_phases is slow for Job-S and Job-M but fast for Job-L; the reason is that only one phase is extracted for Job-L.
The Levenshtein-based algorithms take longer for longer jobs -- the runtime is proportional to the job length, as they apply a sliding window.
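To illustrate the cost per comparison: the Levenshtein distance is computed with the standard dynamic-programming recurrence, which is $O(n \cdot m)$ in the lengths of the two coded sequences; applying it at every window position multiplies this cost by the job length. The sketch below is the generic algorithm, not the paper's implementation:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

Each candidate window of a long job is compared against the reference sequence this way, which is why runtime grows with job length.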
The KS algorithm is about 10x faster than the others, but it operates only on the statistics of the time series.
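The speedup comes from comparing distributions rather than aligned time series: the two-sample Kolmogorov-Smirnov statistic is the maximum distance between the two empirical CDFs, computable in a single merge pass over the sorted samples. A minimal sketch (the paper's exact variant may differ):

```python
def ks_statistic(x, y):
    """Two-sample KS statistic: max vertical distance between the ECDFs."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        v = min(xs[i], ys[j])
        # advance past all ties of the current value in both samples
        while i < nx and xs[i] == v:
            i += 1
        while j < ny and ys[j] == v:
            j += 1
        d = max(d, abs(i / nx - j / ny))
    return d
```

Because only summary samples are compared, the cost is independent of the temporal structure of the job.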
Note that the current algorithms are sequential and executed on just one core.
For computing the similarity to one (or a small set of) reference jobs, they could easily be parallelized.
We believe this will then allow a near-online analysis of a job.
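The parallelization is straightforward because each candidate job's similarity to a fixed reference is independent of all other candidates. A hypothetical sketch (the `similarity` placeholder and function names are illustrative, not from the paper's code):

```python
from concurrent.futures import ThreadPoolExecutor

def similarity(ref, job):
    """Placeholder metric: fraction of matching positions (illustrative only)."""
    matches = sum(a == b for a, b in zip(ref, job))
    return matches / max(len(ref), len(job))

def rank_against_reference(ref, jobs, workers=4):
    # Each comparison is independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        scores = list(ex.map(lambda j: similarity(ref, j), jobs))
    return sorted(zip(scores, jobs), reverse=True)
```

With the comparisons distributed across cores, scoring a new job against a small reference set finishes fast enough for near-online use.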
\jk{To analyze KS jobs}
\begin{figure}
\centering
@@ -11,6 +11,9 @@ prefix = args[2]
# Plot the performance numbers of the analysis
data = read.csv(file)
# Shorten algorithm names for more compact plot labels
levels(data$alg_name)[levels(data$alg_name) == "bin_aggzeros"] = "bin_aggz"
levels(data$alg_name)[levels(data$alg_name) == "hex_native"] = "hex_nat"
levels(data$alg_name)[levels(data$alg_name) == "hex_phases"] = "hex_phas"
# Keep only measurements taken close to completion (within the last ~10k jobs)
e = data %>% filter(jobs_done >= (jobs_total - 9998))
# Normalize elapsed runtime to a nominal 100k jobs
e$time_per_100k = e$elapsed / (e$jobs_done / 100000)
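The last line above scales elapsed time to a common baseline of 100,000 processed jobs, which is what makes runs with different job counts comparable (e.g., the ~41 s figure quoted for BIN\_all). The same arithmetic, written out in Python:

```python
def time_per_100k(elapsed_s, jobs_done):
    """Normalize elapsed runtime to a nominal 100,000 jobs."""
    return elapsed_s / (jobs_done / 100_000)
```

For example, a run that processed 50,000 jobs in 20.5 s normalizes to the same 41 s per 100k jobs as one that processed 100,000 jobs in 41 s.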