Abstract
parent fd867d55f0
commit c6dd942e2c
|
@ -87,17 +87,17 @@ DKRZ --
|
||||||
One goal of support staff at a data center is to identify inefficient jobs and to improve their efficiency.
|
One goal of support staff at a data center is to identify inefficient jobs and to improve their efficiency.
|
||||||
Therefore, a data center deploys monitoring systems that capture the behavior of the executed jobs.
|
Therefore, a data center deploys monitoring systems that capture the behavior of the executed jobs.
|
||||||
While it is easy to utilize statistics to rank jobs based on the utilization of computing, storage, and network, it is tricky to find patterns in 100.000 jobs, i.e., is there a class of jobs that aren't performing well.
|
While it is easy to utilize statistics to rank jobs based on the utilization of computing, storage, and network, it is tricky to find patterns in 100.000 jobs, i.e., is there a class of jobs that aren't performing well.
|
||||||
Similarly, when support staff investigates a specific job in detail, e.g., because it is inefficient or highly efficient, it is relevant to identify related jobs.
|
Similarly, when support staff investigates a specific job in detail, e.g., because it is inefficient or highly efficient, it is relevant to identify related jobs to such a blueprint.
|
||||||
This allows staff to understand the usage of the exhibited behavior better and to assess the optimization potential.
|
This allows staff to understand the usage of the exhibited behavior better and to assess the optimization potential.
|
||||||
|
|
||||||
\medskip
|
\medskip
|
||||||
|
|
||||||
In this paper, a methodology to rank the similarity of all jobs to a reference job based on their temporal IO behavior is described.
|
In this paper, a methodology to rank the similarity of all jobs to a reference job based on their temporal I/O behavior is described.
|
||||||
Practically, we apply several previously developed time series algorithms and also utilize Kolmogorov-Smirnov to compare the distribution of the statistics.
|
Practically, we apply several previously developed time series algorithms and also utilize Kolmogorov-Smirnov to compare the distribution of the statistics.
|
||||||
A study is conducted to explore the effectivity of the approach which starts from three reference jobs and investigates related jobs.
|
A study is conducted to explore the effectiveness of the approach which starts from three reference jobs and investigates related jobs.
|
||||||
The data stems from DKRZ's supercomputer Mistral and includes more than 500.000 jobs that have been executed for more than 6 months of operation. %203 days.
|
The data stems from DKRZ's supercomputer Mistral and includes more than 500.000 jobs that have been executed for more than 6 months of operation. Our analysis shows that the strategy and algorithms are effective to identify similar jobs and revealed interesting patterns in the data.
|
||||||
%Problem with the definition of similarity.
|
It also shows the need for the community to jointly define the semantics of similarity depending on the analysis purpose.
|
||||||
Our analysis shows that the strategy and algorithms are effective to identify similar jobs and revealed interesting patterns in the data.
|
%203 days.
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
|
|
||||||
|
|
||||||
|
@ -179,7 +179,6 @@ For example, Evalix \cite{emeras2015evalix} monitors system statistics (from pro
|
||||||
PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
|
PAS2P \cite{mendez2012new} extracts the IO patterns from application traces and then allows users to manually compare them.
|
||||||
In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- similar to Fourier analysis.
|
In \cite{white2018automatic}, a heuristic classifier is developed that analyzes the I/O read/write throughput time series to extract the periodicity of the jobs -- similar to Fourier analysis.
|
||||||
The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a "risk" factor to identify IO patterns that stress the file system.
|
The LASSi tool \cite{AOPIUOTUNS19} periodically monitors Lustre I/O statistics and computes a "risk" factor to identify IO patterns that stress the file system.
|
||||||
|
|
||||||
In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded with a data center-wide deployed monitoring system.
|
In contrast to existing work, our approach allows a user to identify similar activities based on the temporal I/O behavior recorded with a data center-wide deployed monitoring system.
|
||||||
|
|
||||||
|
|
||||||
|
|