Checked with Grammarly
parent 4cb07e0f85
commit ced1384734
				| @ -64,7 +64,7 @@ In particular, we sketch a methodology that utilizes temporal I/O similarity to | ||||
| Practically, we apply several previously developed time series algorithms. | ||||
| A study is conducted to explore the effectiveness of the approach by investigating related jobs for a  reference job. | ||||
| The data stem from DKRZ's supercomputer Mistral and include more than 500,000 jobs that have been executed for more than 6 months of operation.  | ||||
| Our analysis shows that the strategy and algorithms bear potential to identify similar jobs but more testing is necessary. | ||||
| Our analysis shows that the strategy and algorithms bear the potential to identify similar jobs, but more testing is necessary. | ||||
| \end{abstract} | ||||
| 
 | ||||
| 
 | ||||
| @ -84,7 +84,7 @@ The support staff should focus on workloads for which optimization is beneficial | ||||
| By ranking jobs based on their utilization, it is easy to find a job that exhibits extensive usage of computing, network, and I/O resources. | ||||
| However, would it be beneficial to investigate this workload in detail and potentially optimize it? | ||||
| For instance, a pattern that is observed in many jobs bears potential, as the blueprint for optimizing one job may be applied to other jobs as well. | ||||
| This is particularly true when running one application with similar inputs but also different applications may lead to similar behavior. | ||||
| This is particularly true when running one application with similar inputs, but also different applications may lead to similar behavior. | ||||
| Knowledge about a problematic or interesting job may be transferred to similar jobs. | ||||
| Therefore, it is useful for support staff (or a user) investigating a resource-hungry job to identify similar jobs that are executed on the supercomputer. | ||||
| 
 | ||||
| @ -93,7 +93,7 @@ Re-executing the same job will lead to slightly different behavior, a program ma | ||||
| Job names are defined by users; while a similar name may hint at a similar workload, finding other applications with the same I/O behavior would not be possible. | ||||
| 
 | ||||
| In the paper \cite{Eugen20HPS}, we developed several distance measures and algorithms for the clustering of jobs based on the time series and their I/O behavior. | ||||
| These distance measures can be applied to jobs with different runtime and number of nodes utilized but differ in the way they define similarity. | ||||
| These distance measures can be applied to jobs with different runtimes and the number of nodes utilized, but differ in the way they define similarity. | ||||
| We showed that the metrics can be used to cluster jobs; however, it remained unclear whether the method can be used by data center staff to explore similar jobs effectively. | ||||
| In this paper, we refine these algorithms slightly, include another algorithm, and apply them to rank jobs based on their temporal similarity to a reference job. | ||||
| 
 | ||||
| @ -132,7 +132,7 @@ Vampir generally supports the clustering of process timelines of a single job, a | ||||
| 
 | ||||
| %Chameleon \cite{bahmani2018chameleon} extends ScalaTrace for recording MPI traces but reduces the overhead by clustering processes and collecting information from one representative of each cluster. | ||||
| %For the clustering, a signature is created for each process that includes the call-graph. | ||||
| In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs showing the general effectiveness of the approach. | ||||
| In \cite{halawa2020unsupervised}, 11 performance metrics including CPU and network are utilized for agglomerative clustering of jobs, showing the general effectiveness of the approach. | ||||
| In \cite{rodrigo2018towards}, a characterization of the NERSC workload is performed based on job scheduler information (profiles). | ||||
| Profiles that include the MPI activities have been shown to be effective in identifying the code that is executed \cite{demasi2013identifying}. | ||||
| Many approaches for clustering applications operate on profiles for compute, network, and I/O \cite{emeras2015evalix,liu2020characterization,bang2020hpc}. | ||||
| @ -155,10 +155,10 @@ Therefore, we first need to define how a job's data is represented, then describ | ||||
| On the Mistral supercomputer at DKRZ, the monitoring system \cite{betke20} gathers nine I/O metrics for the two Lustre file systems on all nodes in ten-second intervals, together with general job metadata from the SLURM workload manager. | ||||
| The results are 4D data (time, nodes, metrics, file system) per job. | ||||
| The distance measures should handle jobs of different lengths and node count. | ||||
| In the open access article \cite{Eugen20HPS}\footnote{\scriptsize \url{https://zenodo.org/record/4478960/files/jhps-incubator-06-temporal-29-jan.pdf}}, we discussed a variety of options from 1D job-profiles to data reductions to compare time series data and the general workflow and pre-processing in detail.  | ||||
| In the open-access article \cite{Eugen20HPS}\footnote{\scriptsize \url{https://zenodo.org/record/4478960/files/jhps-incubator-06-temporal-29-jan.pdf}}, we discussed a variety of options from 1D job-profiles to data reductions to compare time series data and the general workflow and pre-processing in detail.  | ||||
| We will be using this representation. | ||||
| In a nutshell, each job executed on Mistral is partitioned into 10-minute segments\footnote{We found in preliminary experiments that 10 minutes reduces noise, i.e., the variation of the statistics when re-running the same job.}; for each segment, we compute the arithmetic mean of each metric and categorize the value into NonIO (0), HighIO (1), and CriticalIO (4) for values below the 99th percentile, up to the 99.9th percentile, and above, respectively. | ||||
| The values are chosen to be 0, 1, and 4 because we arithmetically derive metrics: naturally the value of 0 will indicate that no I/O issue appears; we weight critical I/O to be 4x as important as high I/O. | ||||
| The values are chosen to be 0, 1, and 4 because we arithmetically derive metrics: naturally, the value of 0 will indicate that no I/O issue appears; we weight critical I/O to be 4x as important as high I/O. | ||||
| This strategy ensures that the same approach can be applied to other HPC systems regardless of the actual distribution of these statistics on that data center. | ||||
| After the mean value across nodes is computed for a segment, the resulting numeric value is encoded either using binary (I/O activity on the segment: yes/no) or hexadecimal representation (quantizing the numerical performance value into 0-15) which is then ready for similarity analysis. | ||||
| By pre-filtering jobs with no I/O activity -- their sum across all dimensions and time series is equal to zero -- the dataset is reduced from 1 million jobs to about 580k jobs. | ||||
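The categorization and node aggregation described in this hunk can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the threshold arguments `p99` and `p999`, and the treatment of values lying exactly on a threshold are assumptions, and the hexadecimal quantization is left out because its mapping is not spelled out here.

```python
def categorize(value, p99, p999):
    """Map a segment's mean metric value to NonIO (0), HighIO (1), or CriticalIO (4).

    p99 and p999 are the 99th and 99.9th percentile of the metric (assumed inputs).
    """
    if value < p99:
        return 0  # NonIO
    if value <= p999:
        return 1  # HighIO
    return 4      # CriticalIO, weighted 4x as important as HighIO


def node_mean(categories):
    """Mean category across all nodes of a job for one segment."""
    return sum(categories) / len(categories)


def binary_code(mean_value):
    """Binary coding: was there any I/O activity in the segment?"""
    return 1 if mean_value > 0 else 0


def has_io(job_codes):
    """Pre-filter: a job is kept only if the sum over all its codes is non-zero."""
    return sum(job_codes) != 0
```

Because the thresholds are percentiles of the observed statistics rather than absolute values, the same functions could in principle be reused on another system by swapping in that system's percentiles.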
| @ -167,12 +167,12 @@ By pre-filtering jobs with no I/O activity -- their sum across all dimensions an | ||||
| \subsection{Algorithms for Computing Similarity} | ||||
| We reuse the B and Q algorithms developed in~\cite{Eugen20HPS}: B-all, B-aggz(eros), Q-native, Q-lev, and Q-phases. | ||||
| They differ in the way data similarity is defined: the time series is encoded using either binary or hexadecimal quantization, and the distance measure is either the Euclidean distance or the Levenshtein distance. | ||||
| B-all determines similarity between binary codings by means of Levenshtein distance. | ||||
| B-all determines the similarity between binary codings by means of Levenshtein distance. | ||||
| B-aggz is similar to B-all, but computes similarity on binary codings where subsequent segments of zero activities are replaced by just one zero. | ||||
| Q-lev determines similarity between quantized codings by using Levenshtein distance. | ||||
| Q-lev determines the similarity between quantized codings by using Levenshtein distance. | ||||
| Q-native uses a performance-aware similarity function, i.e., the distance between two jobs for a metric is $\frac{|m_{job1} - m_{job2}|}{16}$. | ||||
| There are various options for how a longer job is embedded in a shorter job; for example, a larger input file may stretch the length of the I/O and compute phases, or more (model) time may be simulated. In this article, we consider these different behavioral patterns and attempt to identify situations where the I/O pattern of a long job is contained in a shorter job. Therefore, for jobs with different lengths, a sliding-window approach is applied, which finds the location for the shorter job in the long job with the highest similarity. | ||||
| Q-phases extract phase information and performs a phase-aware and performance-aware similarity computation. | ||||
| Q-phases extracts phase information and performs a phase-aware and performance-aware similarity computation. | ||||
| The Q-phases algorithm extracts I/O phases from our 10-minute segments and computes the similarity between the most similar I/O phases of both jobs. | ||||
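The Q-native distance and the sliding-window alignment described in this hunk can be sketched for a single metric as follows. This is an illustration under stated assumptions, not the published implementation: the function names are hypothetical, per-segment similarity is assumed to be one minus the distance, and averaging over the segments of the window is an assumed aggregation.

```python
def segment_similarity(a, b):
    """Similarity of two quantized values (0-15): 1 - |a - b| / 16."""
    return 1.0 - abs(a - b) / 16.0


def window_similarity(short, window):
    """Mean segment similarity over one window (assumed aggregation)."""
    return sum(segment_similarity(a, b) for a, b in zip(short, window)) / len(short)


def best_alignment(job_a, job_b):
    """Slide the shorter coding over the longer one.

    Returns (offset, similarity) of the best-matching window.
    Both codings are assumed to be non-empty lists of integers in 0..15.
    """
    short, long_ = sorted((job_a, job_b), key=len)
    best_offset, best_sim = 0, -1.0
    for offset in range(len(long_) - len(short) + 1):
        sim = window_similarity(short, long_[offset:offset + len(short)])
        if sim > best_sim:
            best_offset, best_sim = offset, sim
    return best_offset, best_sim
```

For example, `best_alignment([0, 8, 0], [0, 0, 0, 8, 0, 0])` finds the short coding at offset 2 of the long one with a similarity of 1.0.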
| 
 | ||||
| 
 | ||||
| @ -180,7 +180,7 @@ The Q-phases algorithm extracts I/O phases from our 10-minute segments and compu | ||||
| Our strategy for localizing similar jobs works as follows: | ||||
| \begin{itemize} | ||||
|   \item A user\footnote{This can be support staff or a data center user that was executing the job.} provides a reference job ID and selects a similarity algorithm. | ||||
|   \item The system iterates over all jobs of the job pool computing the similarity to the reference job using the specified algorithm. | ||||
|   \item The system iterates over all jobs of the job pool, computing the similarity to the reference job using the specified algorithm. | ||||
|   \item It sorts the jobs based on the similarity to the reference job. | ||||
|   \item It visualizes the cumulative job similarity allowing the user to understand how job similarity is distributed. | ||||
|   \item The user starts the inspection by looking at the most similar jobs first. | ||||
| @ -188,12 +188,12 @@ Our strategy for localizing similar jobs works as follows: | ||||
| The user can decide on the criterion for when to stop inspecting jobs: based on the similarity, the number of investigated jobs, or the distribution of the job similarity. | ||||
| For the latter, it is interesting to investigate clusters of similar jobs, e.g., if there are many jobs between 80-90\% similarity but few between 70-80\%. | ||||
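The ranking strategy and the similarity distribution used as a stop criterion can be sketched as follows. This is a hedged illustration only: `rank_jobs`, `similarity_histogram`, and `toy_similarity` are hypothetical names, the job pool is assumed to be a dictionary from job ID to coding, and the toy similarity function merely stands in for one of the B or Q algorithms.

```python
from collections import Counter


def rank_jobs(reference, pool, similarity):
    """Compute the similarity of every job to the reference and sort descending."""
    scored = [(job_id, similarity(reference, coding)) for job_id, coding in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def similarity_histogram(ranked):
    """Count jobs per 10% similarity bucket (bucket 9 covers 90-100%)."""
    return Counter(min(int(sim * 10), 9) for _, sim in ranked)


def toy_similarity(a, b):
    """Placeholder similarity: fraction of positions with equal values."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


pool = {"j1": [1, 0, 1, 0], "j2": [1, 0, 0, 0], "j3": [0, 1, 0, 1]}
ranked = rank_jobs([1, 0, 1, 0], pool, toy_similarity)
histogram = similarity_histogram(ranked)
```

A user could then inspect `ranked` from the top and stop where the histogram shows a gap between densely populated buckets.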
| 
 | ||||
| For the inspection of the jobs, a user may explore the job metadata, searching for similarities, and explore the time series of a job's I/O metrics. | ||||
| For the inspection of the jobs, a user may explore the job metadata, search for similarities, and explore the time series of a job's I/O metrics. | ||||
| 
 | ||||
| \section{Reference Job}% | ||||
| \label{sec:refjobs} | ||||
| 
 | ||||
| For this study, we chose the reference job called Job-M: a typical MPI parallel 8-hour compute job on 128 nodes which write time series data after some spin up.   %CHE.ws12 | ||||
| For this study, we chose the reference job called Job-M: a typical MPI parallel 8-hour compute job on 128 nodes that write time series data after some spin up.   %CHE.ws12 | ||||
| The segmented timelines of the job are visualized in \Cref{fig:refJobs} -- remember that the mean value is computed across all nodes on which the job ran. | ||||
| This coding is also used for the Q algorithms, thus this representation is what the algorithms will analyze; B algorithms merge all timelines together as described in~\cite{Eugen20HPS}. | ||||
| The figures show the values of active metrics ($\neq 0$); if few are active, then they are shown in one timeline, otherwise, they are rendered individually to provide a better overview. | ||||
| @ -222,10 +222,10 @@ Finally, the quantitative behavior of the 100 most similar jobs is investigated. | ||||
| To measure the performance for computing the similarity to the reference job, the algorithms are executed 10 times on a compute node at DKRZ which is equipped with two Intel Xeon E5-2680v3 @2.50GHz and 64GB DDR4 RAM. | ||||
| A boxplot for the runtimes is shown in \Cref{fig:performance}. | ||||
| The runtime is normalized for 100k jobs, i.e., for B-all it takes about 41\,s to process 100k jobs out of the 500k total jobs that this algorithm will process. | ||||
| Generally, the B algorithms are fastest, while the Q algorithms often take 4-5x as long. | ||||
| Q\_phases and Levenshtein based algorithm are significantly slower.  | ||||
| Generally, the B algorithms are the fastest, while the Q algorithms often take 4-5x as long. | ||||
| Q\_phases and Levenshtein-based algorithms are significantly slower.  | ||||
| Note that the current algorithms are sequential and executed on just one core. | ||||
| They could easily be parallelized which would then allow for an online analysis. | ||||
| They could easily be parallelized, which would then allow an online analysis. | ||||
| 
 | ||||
| \begin{figure} | ||||
| 
 | ||||
| @ -253,7 +253,7 @@ In the quantitative analysis, we explore the different algorithms how the simila | ||||
| The support team in a data center may have time to investigate the most similar jobs. | ||||
| Time for the analysis is typically bounded; for instance, the team may analyze the 100 most similar jobs and rank them; we refer to them as the Top\,100 jobs, and \textit{Rank\,i} refers to the job that has the i-th highest similarity to the reference job -- sometimes these values can be rather close together, as we see in the histogram in | ||||
| \Cref{fig:hist} for the actual number of jobs with a given similarity. | ||||
| As we focus on a feasible number of jobs, we crop it at 100 jobs (total number of jobs is still given). | ||||
| As we focus on a feasible number of jobs, we crop it at 100 jobs (the total number of jobs is still given). | ||||
| It turns out that both B algorithms produce nearly identical histograms, and we omit one of them. | ||||
| In the figures, we can see again a different behavior of the algorithms depending on the reference job. | ||||
| We can see a cluster with jobs of higher similarity (for B-all and Q-native at a similarity of 75\%).  | ||||
| @ -273,7 +273,7 @@ Practically, the support team would start with Rank\,1 (most similar job, e.g., | ||||
| 
 | ||||
| When analyzing the overall population of jobs executed on a system, we expect that some workloads are executed several times (with different inputs but with the same configuration) or are executed with slightly different configurations (e.g., node counts, timesteps). | ||||
| Thus, potentially our similarity analysis of the job population may just identify the re-execution of the same workload. | ||||
| Typically, the support staff would identify the re-execution of jobs by inspecting job names which are user-defined generic strings. | ||||
| Typically, the support staff would identify the re-execution of jobs by inspecting job names, which are user-defined generic strings. | ||||
| 
 | ||||
| To understand if the analysis is inclusive and identifies different applications, we use two approaches with our Top\,100 jobs: | ||||
| We explore the distribution of users (and groups), runtime, and node count across jobs. | ||||
| @ -284,8 +284,8 @@ To confirm the hypotheses presented, we analyzed the job metadata comparing job | ||||
| \paragraph{User distribution.} | ||||
| To understand how the Top\,100 are distributed across users, the data is grouped by userid and counted. | ||||
| \Cref{fig:userids} shows the stacked user information, where the lowest stack is the user with the most jobs and the topmost user in the stack has the smallest number of jobs. | ||||
| Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev, and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms. | ||||
| We didn't include the group analysis in the figure as user count and group id is proportional, at most the number of users is 2x the number of groups. | ||||
| Jobs from 13 users are included; about 25\% of jobs stem from the same user; Q-lev and Q-native include more users (29, 33, and 37, respectively) than the other three algorithms. | ||||
| We didn't include the group analysis in the figure as user count and group id are proportional, at most the number of users is 2x the number of groups. | ||||
| Thus, a user is likely from the same group and the number of groups is similar to the number of unique users. | ||||
| 
 | ||||
| \paragraph{Node distribution.} | ||||
| @ -295,7 +295,7 @@ We can observe that the range of nodes for similar jobs is between 1 and 128. | ||||
| 
 | ||||
| \paragraph{Runtime distribution.} | ||||
| The job runtime of the Top\,100 jobs is shown using boxplots in \Cref{fig:runtime-job}. | ||||
| While all algorithms can compute the similarity between jobs of different length, the B algorithms and Q-native penalize jobs of different length preferring jobs of very similar length. | ||||
| While all algorithms can compute the similarity between jobs of different lengths, the B algorithms and Q-native penalize jobs of different lengths, preferring jobs of very similar lengths. | ||||
| Q-phases is able to identify much shorter or longer jobs. | ||||
| 
 | ||||
| \begin{figure} | ||||
| @ -325,10 +325,10 @@ We subjectively found that the approach works very well and identifies suitable | ||||
| To demonstrate this, we include a selection of job timelines and selected interesting job profiles. | ||||
| Inspecting the Top\,100 highlights the differences between the algorithms. | ||||
| All algorithms identify a diverse range of job names for this reference job in the Top\,100. | ||||
| The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native and Q-lev, respectively. | ||||
| The number of unique names is 19, 38, 49, and 51 for B-aggzero, Q-phases, Q-native, and Q-lev, respectively. | ||||
| 
 | ||||
| When inspecting their timelines, the jobs that are similar according to the B algorithms (see \Cref{fig:job-M-bin-aggzero}) subjectively appear to us to be different.  | ||||
| The reason lies in the definition of the B-* similarity which aggregate all I/O statistics into one timeline. | ||||
| The reason lies in the definition of the B-* similarity, which aggregates all I/O statistics into one timeline. | ||||
| The other algorithms like Q-lev (\Cref{fig:job-M-hex-lev}) and Q-native (\Cref{fig:job-M-hex-native}) seem to work as intended: | ||||
| While jobs exhibit short bursts of other active metrics even for low similarity, we can eyeball a relevant similarity, particularly for Rank\,2 and Rank\,3, which have a high similarity of 90+\%. For Rank\,15 to Rank\,100, with around 70\% similarity, a partial match of the metrics is still given. | ||||
| 
 | ||||
| @ -417,9 +417,9 @@ While jobs exhibit short bursts of other active metrics even for low similarity, | ||||
| We introduced a methodology to identify similar jobs based on timelines of nine I/O statistics. | ||||
| The quantitative analysis shows that a diverse set of results can be found and that only a tiny subset of the 500k jobs is very similar to our reference job representing a typical HPC activity. | ||||
| The Q-lev and Q-native algorithms work best according to our subjective qualitative analysis. | ||||
| Related jobs stems from the same user/group and may have a related job name, but the approach was able to find other jobs as well. | ||||
| This was a first exploration of this methodology.  | ||||
| In the future, we will expand the study comparing more jobs in order to identify the suitability of the methodology. | ||||
| Related jobs stem from the same user/group and may have a related job name, but the approach was able to find other jobs as well. | ||||
| This was the first exploration of this methodology.  | ||||
| In the future, we will expand the study by comparing more jobs in order to identify the suitability of the methodology. | ||||
| 
 | ||||
| \printbibliography% | ||||
| 
 | ||||
|  | ||||