Checked with Grammarly
parent 4cb07e0f85
commit ced1384734
@@ -64,7 +64,7 @@ In particular, we sketch a methodology that utilizes temporal I/O similarity to
 Practically, we apply several previously developed time series algorithms.
 A study is conducted to explore the effectiveness of the approach by investigating related jobs for a reference job.
 The data stem from DKRZ's supercomputer Mistral and include more than 500,000 jobs that have been executed for more than 6 months of operation.
-Our analysis shows that the strategy and algorithms bear potential to identify similar jobs but more testing is necessary.
+Our analysis shows that the strategy and algorithms bear the potential to identify similar jobs, but more testing is necessary.
 \end{abstract}
 
 
@@ -84,7 +84,7 @@ The support staff should focus on workloads for which optimization is beneficial
 By ranking jobs based on their utilization, it is easy to find a job that exhibits extensive usage of computing, network, and I/O resources.
 However, would it be beneficial to investigate this workload in detail and potentially optimize it?
 For instance, a pattern that is observed in many jobs bears potential as the blueprint for optimizing one job may be applied to other jobs as well.
-This is particularly true when running one application with similar inputs but also different applications may lead to similar behavior.
+This is particularly true when running one application with similar inputs, but also different applications may lead to similar behavior.
 Knowing details about a problematic or interesting job may be transferred to similar jobs.
 Therefore, it is useful for support staff (or a user) that investigates a resource-hungry job to identify similar jobs that are executed on the supercomputer.
 
@@ -93,7 +93,7 @@ Re-executing the same job will lead to slightly different behavior, a program ma
 Job names are defined by users; while a similar name may hint to be a similar workload, finding other applications with the same I/O behavior would not be possible.
 
 In the paper \cite{Eugen20HPS}, we developed several distance measures and algorithms for the clustering of jobs based on the time series and their I/O behavior.
-These distance measures can be applied to jobs with different runtime and number of nodes utilized but differ in the way they define similarity.
+These distance measures can be applied to jobs with different runtimes and the number of nodes utilized, but differ in the way they define similarity.
 They showed that the metrics can be used to cluster jobs, however, it remained unclear if the method can be used by data center staff to explore similar jobs effectively.
 In this paper, we refine these algorithms slightly, include another algorithm, and apply them to rank jobs based on their temporal similarity to a reference job.
 
@@ -132,7 +132,7 @@ Vampir generally supports the clustering of process timelines of a single job, a
 
 %Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster.
 %For the clustering, a signature is created for each process that includes the call-graph.
-In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs showing the general effectiveness of the approach.
+In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs, showing the general effectiveness of the approach.
 In \cite{rodrigo2018towards}, a characterization of the NERSC workload is performed based on job scheduler information (profiles).
 Profiles that include the MPI activities have shown effective to identify the code that is executed \cite{demasi2013identifying}.
 Many approaches for clustering applications operate on profiles for compute, network, and I/O \cite{emeras2015evalix,liu2020characterization,bang2020hpc}.
@@ -155,10 +155,10 @@ Therefore, we first need to define how a job's data is represented, then describ
 On the Mistral supercomputer at DKRZ, the monitoring system \cite{betke20} gathers in ten seconds intervals on all nodes nine I/O metrics for the two Lustre file systems together with general job metadata from the SLURM workload manager.
 The results are 4D data (time, nodes, metrics, file system) per job.
 The distance measures should handle jobs of different lengths and node count.
-In the open access article \cite{Eugen20HPS}\footnote{\scriptsize \url{https://zenodo.org/record/4478960/files/jhps-incubator-06-temporal-29-jan.pdf}}, we discussed a variety of options from 1D job-profiles to data reductions to compare time series data and the general workflow and pre-processing in detail.
+In the open-access article \cite{Eugen20HPS}\footnote{\scriptsize \url{https://zenodo.org/record/4478960/files/jhps-incubator-06-temporal-29-jan.pdf}}, we discussed a variety of options from 1D job-profiles to data reductions to compare time series data and the general workflow and pre-processing in detail.
 We will be using this representation.
 In a nutshell, for each job executed on Mistral, they partitioned it into 10 minutes segments\footnote{We found in preliminary experiments that 10 minutes reduces noise, i.e., the variation of the statistics when re-running the same job.} and compute the arithmetic mean of each metric, categorize the value into NonIO (0), HighIO (1), and CriticalIO (4) for values below 99-percentile, up to 99.9-percentile, and above, respectively.
-The values are chosen to be 0, 1, and 4 because we arithmetically derive metrics: naturally the value of 0 will indicate that no I/O issue appears; we weight critical I/O to be 4x as important as high I/O.
+The values are chosen to be 0, 1, and 4 because we arithmetically derive metrics: naturally, the value of 0 will indicate that no I/O issue appears; we weight critical I/O to be 4x as important as high I/O.
 This strategy ensures that the same approach can be applied to other HPC systems regardless of the actual distribution of these statistics on that data center.
 After the mean value across nodes is computed for a segment, the resulting numeric value is encoded either using binary (I/O activity on the segment: yes/no) or hexadecimal representation (quantizing the numerical performance value into 0-15) which is then ready for similarity analysis.
 By pre-filtering jobs with no I/O activity -- their sum across all dimensions and time series is equal to zero, the dataset is reduced from 1 million jobs to about 580k jobs.
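Note on the hunk above: the segment coding it describes (categorize a segment mean into 0/1/4 by percentile, average across nodes, then encode as binary or as a value in 0-15) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, the percentile thresholds are passed in rather than computed, and the hexadecimal mapping (scale by 4, cap at 15) is an assumed quantization that the text does not specify.

```python
from statistics import mean

# Category weights named in the text: NonIO, HighIO, CriticalIO.
NON_IO, HIGH_IO, CRITICAL_IO = 0, 1, 4


def categorize(segment_mean, p99, p999):
    """Map one segment mean to 0, 1, or 4 using the two percentile thresholds."""
    if segment_mean < p99:
        return NON_IO
    if segment_mean <= p999:
        return HIGH_IO
    return CRITICAL_IO


def node_average(categories_per_node):
    """Mean category value across all nodes of a job for one segment."""
    return mean(categories_per_node)


def binary_code(node_means):
    """Binary coding: 1 if the segment shows any I/O activity, else 0."""
    return [1 if m > 0 else 0 for m in node_means]


def hex_code(node_means):
    """Quantize node means (range 0..4) into 0..15.

    Assumption: linear scaling by 4 with a cap at 15; the text only
    states that values are quantized into 0-15.
    """
    return [min(15, int(m * 4)) for m in node_means]
```

For example, a segment in which four nodes are categorized as 0, 1, 4, and 1 averages to 1.5, which this sketch encodes as 1 in binary and 6 in the assumed hexadecimal scheme.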
@@ -167,12 +167,12 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an
 \subsection{Algorithms for Computing Similarity}
 We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases.
 They differ in the way data similarity is defined; either the time series is encoded in binary or hexadecimal quantization, the distance measure is the Euclidean distance or the Levenshtein-distance.
-B-all determines similarity between binary codings by means of Levenshtein distance.
+B-all determines the similarity between binary codings by means of Levenshtein distance.
 B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activities are replaced by just one zero.
-Q-lev determines similarity between quantized codings by using Levenshtein distance.
+Q-lev determines the similarity between quantized codings by using Levenshtein distance.
 Q-native uses a performance-aware similarity function, i.e., the distance between two jobs for a metric is $\frac{|m_{job1} - m_{job2}|}{16}$.
 There are various options for how a longer job is embedded in a shorter job, for example, a larger input file may stretch the length of the I/O and compute phases; another option can be that more (model) time is simulated. In this article, we consider these different behavioral patterns and attempt to identify situations where the I/O pattern of a long job is contained in a shorter job. Therefore, for jobs with different lengths, a sliding-windows approach is applied which finds the location for the shorter job in the long job with the highest similarity.
-Q-phases extract phase information and performs a phase-aware and performance-aware similarity computation.
+Q-phases extracts phase information and performs a phase-aware and performance-aware similarity computation.
 The Q-phases algorithm extracts I/O phases from our 10-minute segments and computes the similarity between the most similar I/O phases of both jobs.
 
 
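Note on the hunk above: the building blocks it names (Levenshtein distance on codings, the zero aggregation of B-aggz, the per-segment distance $|m_{job1} - m_{job2}|/16$ of Q-native, and the sliding window for jobs of different lengths) can be sketched as follows. This is a minimal sketch for a single metric, not the published algorithms. The function names are hypothetical, the normalization of the Levenshtein distance into a similarity is an assumption, and no length penalty is modelled.

```python
def levenshtein(a, b):
    """Classic edit distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def lev_similarity(a, b):
    """Assumption: similarity = 1 - distance / length of the longer coding."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def aggregate_zeros(coding):
    """B-aggz preprocessing: collapse each run of zeros into a single zero."""
    out = []
    for v in coding:
        if v == 0 and out and out[-1] == 0:
            continue
        out.append(v)
    return out


def window_similarity(a, b):
    """1 minus the mean per-segment distance |a_i - b_i| / 16 (equal lengths)."""
    dist = sum(abs(x - y) / 16 for x, y in zip(a, b)) / len(a)
    return 1.0 - dist


def sliding_similarity(job1, job2):
    """Slide the shorter coding over the longer one and keep the best match."""
    short, long_ = sorted((job1, job2), key=len)
    if not short:
        return 0.0
    n = len(short)
    return max(window_similarity(short, long_[i:i + n])
               for i in range(len(long_) - n + 1))
```

In this sketch, `lev_similarity(aggregate_zeros(a), aggregate_zeros(b))` plays the role of a B-aggz-style comparison, and `sliding_similarity` that of a Q-native-style comparison for one metric.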
@@ -180,7 +180,7 @@ The Q-phases algorithm extracts I/O phases from our 10-minute segments and compu
 Our strategy for localizing similar jobs works as follows:
 \begin{itemize}
 \item A user\footnote{This can be support staff or a data center user that was executing the job.} provides a reference job ID and selects a similarity algorithm.
-\item The system iterates over all jobs of the job pool computing the similarity to the reference job using the specified algorithm.
+\item The system iterates over all jobs of the job pool, computing the similarity to the reference job using the specified algorithm.
 \item It sorts the jobs based on the similarity to the reference job.
 \item It visualizes the cumulative job similarity allowing the user to understand how job similarity is distributed.
 \item The user starts the inspection by looking at the most similar jobs first.
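Note on the hunk above: the listed steps (compute the similarity of every job in the pool to the reference job, sort, and look at how similarity is distributed) can be sketched as below. The job pool is assumed to be a dictionary from job ID to coding, and the similarity function is passed in; both are illustrative choices and not taken from the paper.

```python
def rank_jobs(reference, pool, similarity):
    """Return (similarity, job_id) pairs, most similar job first.

    `pool` is a hypothetical dict mapping job IDs to codings, and
    `similarity` is any function of two codings returning a value in [0, 1].
    """
    scored = [(similarity(reference, coding), job_id)
              for job_id, coding in pool.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def count_at_least(ranked, threshold):
    """Cumulative view: number of jobs with similarity >= threshold."""
    return sum(1 for score, _ in ranked if score >= threshold)


def top_k(ranked, k=100):
    """The k most similar jobs, e.g., the Top 100 inspected by support staff."""
    return ranked[:k]
```

Since each similarity is computed independently of the others, the loop in `rank_jobs` is the natural place to parallelize.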
@@ -188,12 +188,12 @@ Our strategy for localizing similar jobs works as follows:
 The user can decide about the criterion when to stop inspecting jobs; based on the similarity, the number of investigated jobs, or the distribution of the job similarity.
 For the latter, it is interesting to investigate clusters of similar jobs, e.g., if there are many jobs between 80-90\% similarity but few between 70-80\%.
 
-For the inspection of the jobs, a user may explore the job metadata, searching for similarities, and explore the time series of a job's I/O metrics.
+For the inspection of the jobs, a user may explore the job metadata, search for similarities, and explore the time series of a job's I/O metrics.
 
 \section{Reference Job}%
 \label{sec:refjobs}
 
-For this study, we chose the reference job called Job-M: a typical MPI parallel 8-hour compute job on 128 nodes which write time series data after some spin up. %CHE.ws12
+For this study, we chose the reference job called Job-M: a typical MPI parallel 8-hour compute job on 128 nodes that write time series data after some spin up. %CHE.ws12
 The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remember that the mean value is computed across all nodes on which the job ran.
 This coding is also used for the Q algorithms, thus this representation is what the algorithms will analyze; B algorithms merge all timelines together as described in~\cite{Eugen20HPS}.
 The figures show the values of active metrics ($\neq 0$); if few are active, then they are shown in one timeline, otherwise, they are rendered individually to provide a better overview.
@@ -222,10 +222,10 @@ Finally, the quantitative behavior of the 100 most similar jobs is investigated.
 To measure the performance for computing the similarity to the reference job, the algorithms are executed 10 times on a compute node at DKRZ which is equipped with two Intel Xeon E5-2680v3 @2.50GHz and 64GB DDR4 RAM.
 A boxplot for the runtimes is shown in \Cref{fig:performance}.
 The runtime is normalized for 100k jobs, i.e., for B-all it takes about 41\,s to process 100k jobs out of the 500k total jobs that this algorithm will process.
-Generally, the B algorithms are fastest, while the Q algorithms often take 4-5x as long.
-Q\_phases and Levenshtein based algorithm are significantly slower.
+Generally, the B algorithms are the fastest, while the Q algorithms often take 4-5x as long.
+Q\_phases and Levenshtein-based algorithms are significantly slower.
 Note that the current algorithms are sequential and executed on just one core.
-They could easily be parallelized which would then allow for an online analysis.
+They could easily be parallelized, which would then allow an online analysis.
 
 \begin{figure}
 
@@ -253,7 +253,7 @@ In the quantitative analysis, we explore the different algorithms how the simila
 The support team in a data center may have time to investigate the most similar jobs.
 Time for the analysis is typically bound, for instance, the team may analyze the 100 most similar jobs and rank them; we refer to them as the Top\,100 jobs, and \textit{Rank\,i} refers to the job that has the i-th highest similarity to the reference job -- sometimes these values can be rather close together as we see in the histogram in
 \Cref{fig:hist} for the actual number of jobs with a given similarity.
-As we focus on a feasible number of jobs, we crop it at 100 jobs (total number of jobs is still given).
+As we focus on a feasible number of jobs, we crop it at 100 jobs (the total number of jobs is still given).
 It turns out that both B algorithms produce nearly identical histograms, and we omit one of them.
 In the figures, we can see again a different behavior of the algorithms depending on the reference job.
 We can see a cluster with jobs of higher similarity (for B-all and Q-native at a similarity of 75\%).
@@ -273,7 +273,7 @@ Practically, the support team would start with Rank\,1 (most similar job, e.g.,
 
 When analyzing the overall population of jobs executed on a system, we expect that some workloads are executed several times (with different inputs but with the same configuration) or are executed with slightly different configurations (e.g., node counts, timesteps).
 Thus, potentially our similarity analysis of the job population may just identify the re-execution of the same workload.
-Typically, the support staff would identify the re-execution of jobs by inspecting job names which are user-defined generic strings.
+Typically, the support staff would identify the re-execution of jobs by inspecting job names, which are user-defined generic strings.
 
 To understand if the analysis is inclusive and identifies different applications, we use two approaches with our Top\,100 jobs:
 We explore the distribution of users (and groups), runtime, and node count across jobs.
@@ -284,8 +284,8 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job
 \paragraph{User distribution.}
 To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted.
 \Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs.
-Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev, and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
-We didn't include the group analysis in the figure as user count and group id is proportional, at most the number of users is 2x the number of groups.
+Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms.
+We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups.
 Thus, a user is likely from the same group and the number of groups is similar to the number of unique users.
 
 \paragraph{Node distribution.}
@@ -295,7 +295,7 @@ We can observe that the range of nodes for similar jobs is between 1 and 128.
 
 \paragraph{Runtime distribution.}
 The job runtime of the Top\,100 jobs is shown using boxplots in \Cref{fig:runtime-job}.
-While all algorithms can compute the similarity between jobs of different length, the B algorithms and Q-native penalize jobs of different length preferring jobs of very similar length.
+While all algorithms can compute the similarity between jobs of different lengths, the B algorithms and Q-native penalize jobs of different lengths, preferring jobs of very similar lengths.
 Q-phases is able to identify much shorter or longer jobs.
 
 \begin{figure}
@@ -325,10 +325,10 @@ We subjectively found that the approach works very well and identifies suitable
 To demonstrate this, we include a selection of job timelines and selected interesting job profiles.
 Inspecting the Top\,100 is highlighting the differences between the algorithms.
 All algorithms identify a diverse range of job names for this reference job in the Top\,100.
-The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native and Q-lev, respectively.
+The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native, and Q-lev, respectively.
 
 When inspecting their timelines, the jobs that are similar according to the B algorithms (see \Cref{fig:job-M-bin-aggzero}) subjectively appear to us to be different.
-The reason lies in the definition of the B-* similarity which aggregate all I/O statistics into one timeline.
+The reason lies in the definition of the B-* similarity, which aggregates all I/O statistics into one timeline.
 The other algorithms like Q-lev (\Cref{fig:job-M-hex-lev}) and Q-native (\Cref{fig:job-M-hex-native}) seem to work as intended:
 While jobs exhibit short bursts of other active metrics even for low similarity, we can eyeball a relevant similarity particularly for Rank\,2 and Rank\,3 which have the high similarity of 90+\%. For Rank\,15 to Rank\,100, with around 70\% similarity, a partial match of the metrics is still given.
 
@@ -417,9 +417,9 @@ While jobs exhibit short bursts of other active metrics even for low similarity,
 We introduced a methodology to identify similar jobs based on timelines of nine I/O statistics.
 The quantitative analysis shows that a diverse set of results can be found and that only a tiny subset of the 500k jobs is very similar to our reference job representing a typical HPC activity.
 The Q-lev and Q-native work best according to our subjective qualitative analysis.
-Related jobs stems from the same user/group and may have a related job name, but the approach was able to find other jobs as well.
-This was a first exploration of this methodology.
-In the future, we will expand the study comparing more jobs in order to identify the suitability of the methodology.
+Related jobs stem from the same user/group and may have a related job name, but the approach was able to find other jobs as well.
+This was the first exploration of this methodology.
+In the future, we will expand the study by comparing more jobs in order to identify the suitability of the methodology.
 
 \printbibliography%
 