performance section

This commit is contained in:
Julian M. Kunkel 2020-10-23 09:54:47 +01:00
parent f4c0f27aad
commit c66723cfb0
8 changed files with 8 additions and 1 deletion

[Six binary image files (regenerated figures) changed, not shown; sizes 69→65 KiB, 90→165 KiB, 75→71 KiB, 109→203 KiB, 73→71 KiB, 89→142 KiB]

[diff: LaTeX source of the paper; filename not shown]

@@ -150,6 +150,9 @@ Finally, we conclude our paper in \Cref{sec:summary}.
Clustering of jobs based on their names
Multivariate time series
Levenshtein distance, also known as Edit Distance (ED).
Vampir clustering of timelines of a single job.
\section{Methodology}
@@ -328,10 +331,11 @@ The runtime is normalized to 100k jobs, i.e., for BIN\_all it takes about 41\,s
Generally, the bin algorithms are fastest, while the hex algorithms often take 4-5x as long.
Hex\_phases is slow for Job-S and Job-M but fast for Job-L; the reason is that only one phase is extracted for Job-L.
The Levenshtein-based algorithms take longer for longer jobs, proportional to the job length, as they apply a sliding window.
The KS algorithm is 10x faster than the others, but it operates only on the statistics of the time series.
Note that the current algorithms are sequential and executed on just one core.
For computing the similarity to one (or a small set of) reference jobs, they could easily be parallelized.
We believe this will then allow near-online analysis of a job.
\jk{To analyze KS jobs}
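The parallelization claim can be illustrated with a minimal sketch (an assumption of how it could be done, not the paper's implementation). In R, assuming each job's metric series is available as a numeric vector, and with `jobs` and `ref_job` as hypothetical names, the per-job KS statistic against one reference job parallelizes directly:
library(parallel)  # base-R multicore backend
# Hypothetical inputs: `ref_job` is the metric series of the reference job,
# `jobs` is a list of metric series, one per candidate job.
ks_to_ref = function(job) {
  # two-sample KS statistic; a smaller D means more similar distributions
  suppressWarnings(ks.test(ref_job, job)$statistic)
}
similarities = mclapply(jobs, ks_to_ref, mc.cores = detectCores())
Since each job is scored independently, the work splits evenly across cores, which is what makes near-online analysis plausible.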
\begin{figure}
\centering

[diff: R plotting script; filename not shown]

@@ -11,6 +11,9 @@ prefix = args[2]
# Plot the performance numbers of the analysis
data = read.csv(file)
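# Shorten the algorithm names so they fit into the plot legend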
levels(data$alg_name)[levels(data$alg_name) == "bin_aggzeros"] = "bin_aggz"
levels(data$alg_name)[levels(data$alg_name) == "hex_native"] = "hex_nat"
levels(data$alg_name)[levels(data$alg_name) == "hex_phases"] = "hex_phas"
e = data %>% filter(jobs_done >= (jobs_total - 9998))
e$time_per_100k = e$elapsed / (e$jobs_done / 100000)
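The hunk ends here. A plausible continuation (a sketch under assumptions, not the repository's actual plotting code: ggplot2 would have to be loaded near the top of the script, and the output filename is made up) summarizes the normalized runtime per algorithm and renders it as a bar chart:
s = e %>% group_by(alg_name) %>%
  summarize(median_100k = median(time_per_100k))
p = ggplot(s, aes(x = alg_name, y = median_100k)) +
  geom_col() +
  labs(x = "algorithm", y = "runtime per 100k jobs (s)")
ggsave(paste0(prefix, "-runtime.png"), plot = p, width = 6, height = 4)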