Performance section
Six figure images updated (file sizes changed).
@ -150,6 +150,9 @@ Finally, we conclude our paper in \Cref{sec:summary}.
Clustering of jobs based on their names
Multivariate time series
Levenshtein distance, also known as Edit Distance (ED).
Vampir clustering of timelines of a single job.
\section{Methodology}
@ -328,10 +331,11 @@ The runtime is normalized for 100k jobs, i.e., for BIN\_all it takes about 41\,s
Generally, the bin algorithms are the fastest, while the hex algorithms often take 4-5x as long.
Hex\_phases is slow for Job-S and Job-M but fast for Job-L; the reason is that only one phase is extracted for Job-L.
The Levenshtein-based algorithms take longer for longer jobs, proportionally to the job length, as they apply a sliding window.
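To illustrate this, a minimal sketch of a sliding-window edit-distance comparison in R follows; the segment encoding, the window size, and the function name are placeholders for illustration, not the paper's actual implementation.

# Sketch only: one Levenshtein computation per window position, so the number
# of comparisons grows linearly with the length of the compared job.
# adist() is base R's generalized Levenshtein (edit) distance.
sliding_levenshtein = function(ref_codes, job_codes, w = length(ref_codes)) {
  ref_str = paste(ref_codes, collapse = "")
  n = max(1, length(job_codes) - w + 1)
  dists = sapply(seq_len(n), function(i) {
    win = paste(job_codes[i:min(i + w - 1, length(job_codes))], collapse = "")
    as.integer(adist(ref_str, win))
  })
  min(dists)  # best match over all window positions
}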
The KS algorithm is about 10x faster than the others, but it operates only on the statistics of the time series.
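Assuming KS refers to a Kolmogorov-Smirnov comparison of the value distributions, a minimal sketch of such a distribution-level comparison in R is shown below; mapping the KS statistic D to a similarity of 1 - D is an illustrative choice, not necessarily the definition used in the paper.

# Sketch only: compare the value distributions of one metric of two jobs.
# ks.test() looks at the empirical distributions, so no alignment of the
# full time series is required, which keeps this comparison cheap.
ks_similarity = function(metric_a, metric_b) {
  d = suppressWarnings(ks.test(metric_a, metric_b))$statistic  # KS statistic D in [0, 1]
  1 - unname(d)  # illustrative mapping: larger value = more similar
}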
Note that the current algorithms are sequential and executed on just one core.
For computing the similarity to one (or a small set of) reference jobs, they could easily be parallelized.
We believe this will then allow a near-online analysis of a job.
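As a rough illustration of how such a comparison against a single reference job could be spread over several cores with base R's parallel package, consider the following sketch; the similarity function and the job list are placeholders, not the actual algorithms.

library(parallel)

# Sketch only: each comparison against the reference job is independent,
# so the job list can be processed in parallel. dummy_similarity stands in
# for the real similarity algorithms.
dummy_similarity = function(ref, job) cor(ref, job[seq_along(ref)])

ref_job = sin(seq(0, 10, length.out = 500))            # placeholder reference job
jobs    = replicate(64, rnorm(500), simplify = FALSE)  # placeholder job set

# mclapply() forks worker processes (on Windows, parLapply() would be used instead).
similarities = mclapply(jobs, function(j) dummy_similarity(ref_job, j),
                        mc.cores = max(1, detectCores() - 1))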
\jk{To analyze KS jobs}
\begin{figure}
\centering
@ -11,6 +11,9 @@ prefix = args[2]
# Plot the performance numbers of the analysis
library(dplyr)  # provides %>% and filter() below; may already be loaded at the top of the script

data = read.csv(file)
# Since R 4.0 read.csv() no longer creates factors by default, so make sure
# alg_name is a factor before renaming its levels.
data$alg_name = as.factor(data$alg_name)

# Shorten the algorithm names so they fit on the plot axis.
levels(data$alg_name)[levels(data$alg_name) == "bin_aggzeros"] = "bin_aggz"
levels(data$alg_name)[levels(data$alg_name) == "hex_native"] = "hex_nat"
levels(data$alg_name)[levels(data$alg_name) == "hex_phases"] = "hex_phas"

# Keep only measurements taken close to completion (within ~10k jobs of the total)
# and normalize the elapsed runtime to 100k processed jobs.
e = data %>% filter(jobs_done >= (jobs_total - 9998))
e$time_per_100k = e$elapsed / (e$jobs_done / 100000)
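The hunk ends before the actual plotting call; a minimal sketch of how the prepared frame e could be visualized with ggplot2 follows. The geometry, the axis labels, and the output file name built from prefix are assumptions.

# Sketch only: visualize the normalized runtime per algorithm (assumed layout).
library(ggplot2)
p = ggplot(e, aes(x = alg_name, y = time_per_100k)) +
  geom_boxplot() +                       # boxplot in case of repeated measurements
  xlab("Algorithm") +
  ylab("Runtime per 100k jobs (s)")
ggsave(paste0(prefix, "-performance.png"), p, width = 6, height = 4)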
|
||||||
|
|