Abstract
In real signal analysis the main problem is the nonstationarity of the data, which can manifest itself in different ways. One possibility is that the signal is a mixture of different processes with different statistical properties, so the observed data should be segmented before further analysis. In this paper we propose an automatic segmentation method based on the $\alpha$-stable distribution. In the proposed procedure we estimate the parameters of the stable distribution for consecutive subsignals of a given length and then classify the parameters using the expectation-maximization algorithm. The obtained classes correspond to different segments of the signal. We apply the proposed procedure to a real vibration signal from a roadheader working in the mining industry. As a final result we obtain segments of the real signal which constitute samples of different behaviors and are related to different modes of operation of the machine.
1. Introduction
Analysis and modeling of real signals measured by advanced data acquisition systems make it possible to obtain information about the considered process. Very often the observed time series exhibits strongly nonstationary behavior because it consists of not one but multiple processes that coexist or occur one after another. In most cases those processes cannot be analyzed with the same tools because they have a completely different character and different statistical properties. Sometimes the processes can be described by the same or similar models but with different parameters. Therefore, before modeling a given signal, a preliminary segmentation should be performed.
One of the most fundamental reasons for segmentation is the extraction of parts (segments) of the signal with homogeneous properties. Segmentation is also related to finding the time points at which the signal changes its properties and switches to another regime [1-4].
There are many segmentation methods, and the number of papers in this field has increased in recent years. Segmentation techniques are very often tailored to the applications for which they were developed. Some methods are based on specific behavior of the signal in the time domain [1, 3, 5]; others are based not on the raw signal but on its transformation to other domains [6].
Signal segmentation has been applied in many areas. It is especially crucial in condition monitoring (to isolate shocks related to damage) [6], machine performance analysis (to find when the machine operates under overload, in idle mode etc.) [4], experimental physics [3], biomedical signal analysis (e.g. ECG signals) [7], speech analysis [8] and seismic signal analysis [9]. In this paper we are especially interested in the first two applications.
In this paper we propose a segmentation method based on modeling the signal by the stable distribution and on the expectation-maximization algorithm. Stable distributions are especially important for modeling data with visible peaks; it should be mentioned that for $\alpha = 2$ the stable distribution reduces to the Gaussian one. Such distributions have many applications, for example in finance, environmental engineering and condition monitoring [10]. The expectation-maximization algorithm was introduced for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models that depend on unobserved latent variables; however, it is very often applied to classification problems. In the proposed technique, applied to a raw two-channel signal, we first estimate the appropriate parameters of the stable distribution and then classify the data using the expectation-maximization algorithm and the Silhouette criterion. We apply this procedure to a real vibration signal from a roadheader working in the mining industry. The investigated machine is one of the most crucial components of the technological process in underground mining. It eliminates blasting operations in the exploitation area and ensures continuous mechanical drilling. Many factors influence the vibration signal profile (various physical and mechanical properties of the deposit, design features of the machine, depth of drilling, motion parameters and wear level of machine elements). Appropriate segmentation of the vibration signal allows identification of spectral-domain components corresponding to different processes.
2. Methodology
The segmentation procedure for the roadheader vibration signal is based on the $\alpha$-stable distribution. As mentioned, the $\alpha$-stable distribution is an extension of the Gaussian one: for $\alpha = 2$ it reduces to the normal distribution. A random variable $X$ is $\alpha$-stable distributed if its characteristic function takes the following form:
$$\varphi_X(t)=E\,e^{itX}=\begin{cases}\exp\left\{-\mathrm{\Sigma}^{\alpha}|t|^{\alpha}\left(1-i\beta\,\mathrm{sign}(t)\tan\frac{\pi\alpha}{2}\right)+i\mu t\right\}, & \alpha\ne 1,\\ \exp\left\{-\mathrm{\Sigma}|t|\left(1+i\beta\frac{2}{\pi}\,\mathrm{sign}(t)\ln|t|\right)+i\mu t\right\}, & \alpha=1,\end{cases}\qquad (1)$$
where $\alpha$ $(0<\alpha \le 2)$ is the stability parameter, $\beta$ $(-1\le \beta \le 1)$ is the asymmetry parameter, $\mathrm{\Sigma}$ ($\mathrm{\Sigma}>0$) is the scale parameter and $\mu$ $(\mu \in R)$ is the location parameter. Of the four parameters defining the family of stable distributions, most attention has been focused on the stability parameter $\alpha$. For $\alpha<2$ the $\alpha$-stable distributed random variable has heavy tails, i.e. the tails of its distribution decay according to a power law. Therefore, there is a high probability of the variable taking extreme values, which is useful in modeling impulsive signals [10]. Another distribution that can be useful in impulsive signal modeling is presented in [11].
In our segmentation procedure the raw signal is first divided into consecutive subsignals of length $N$ samples, overlapping by $K$ %. Next, for each subsignal the empirical standard deviation ($\sigma$) is calculated. Because the subsignals exhibit behavior typical of heavy-tailed data (visible peaks), we propose to model the data by the stable distribution; the further parameters passed on to the classification are $\alpha$ and $\mathrm{\Sigma}$. In the literature one can find different estimation methods that can be used here [12, 13]. In this paper we apply the regression method, which is based on the characteristic function of the considered distribution (see Eq. (1)).
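The regression idea can be illustrated with a simplified sketch. For a stable law with characteristic function $\varphi$, stability index $\alpha$ and scale $\mathrm{\Sigma}$, the modulus satisfies $|\varphi(t)| = \exp(-\mathrm{\Sigma}^{\alpha}|t|^{\alpha})$, so $\ln(-\ln|\varphi(t)|^{2}) = \ln(2\mathrm{\Sigma}^{\alpha}) + \alpha \ln|t|$ is linear in $\ln|t|$. The code below fits this line once by ordinary least squares on the empirical characteristic function. It is not necessarily the exact estimator used in the paper: the function names, the grid of $t$ values and the single-pass fit are assumptions made for illustration, $\beta$ and $\mu$ are not estimated, and the sample is assumed to have a scale of order one.

```python
import cmath
import math


def empirical_cf_modulus(sample, t):
    """Modulus of the empirical characteristic function at argument t."""
    return abs(sum(cmath.exp(1j * t * x) for x in sample)) / len(sample)


def estimate_alpha_and_scale(sample, t_grid=None):
    """Least-squares fit of ln(-ln|phi(t)|^2) = ln(2*Sigma^alpha) + alpha*ln|t|."""
    if t_grid is None:
        # Assumed grid; reasonable only when the data scale is of order one.
        t_grid = [0.1 * j for j in range(1, 11)]
    xs, ys = [], []
    for t in t_grid:
        modulus = empirical_cf_modulus(sample, t)
        if 0.0 < modulus < 1.0:  # the double logarithm needs 0 < |phi| < 1
            xs.append(math.log(abs(t)))
            ys.append(math.log(-2.0 * math.log(modulus)))
    if len(xs) < 2:
        raise ValueError("not enough usable grid points for the regression")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    if slope <= 0.0:
        raise ValueError("regression slope is not positive")
    intercept = y_mean - slope * x_mean
    alpha = min(slope, 2.0)  # the stability index cannot exceed 2
    scale = (math.exp(intercept) / 2.0) ** (1.0 / alpha)
    return alpha, scale
```

For Gaussian data the fitted slope should be close to 2, and for Cauchy data close to 1, which gives a quick sanity check of the sketch before it is applied to each subsignal.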
As a final step, the parameters ($\sigma$, $\alpha$, $\mathrm{\Sigma}$) are classified using the expectation-maximization algorithm. The data are normalized before further analysis. Note that we consider a two-channel signal and estimate the parameters for the subsignals of each channel separately; therefore the expectation-maximization algorithm operates on six parameters.
In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate of the parameters, and a maximization (M) step, which computes the parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step [14-17]. EM is frequently used for data clustering in machine learning and computer vision. In natural language processing, two prominent instances of the algorithm are the Baum-Welch algorithm and the inside-outside algorithm for unsupervised induction of probabilistic context-free grammars. In our procedure we also propose to estimate the number of clusters with the Silhouette criterion [18, 19] for a limited range of the number of clusters $k$ (in our application $k=1:5$), with the distance measure set to Euclidean.
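The alternation of E and M steps can be sketched for the simplest case, a one-dimensional Gaussian mixture. This is an illustrative simplification, not the implementation used in the paper: the procedure described here clusters six features, whereas the sketch uses a single feature, and the function names, the quantile-based initialization and the fixed iteration count are all assumptions.

```python
import math


def _responsibilities(data, weights, means, variances):
    """E step: posterior probability of each component for each observation."""
    resp = []
    for x in data:
        dens = [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        resp.append([d / total for d in dens])
    return resp


def em_gaussian_mixture_1d(data, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture; return (weights, means, variances, labels)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, 1e-9)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        resp = _responsibilities(data, weights, means, variances)
        # M step: re-estimate the parameters from the weighted observations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-9)
    # Hard assignment: each observation goes to its most probable component.
    resp = _responsibilities(data, weights, means, variances)
    labels = [r.index(max(r)) for r in resp]
    return weights, means, variances, labels
```

In a multi-feature setting the univariate densities would be replaced by multivariate ones; the structure of the two steps stays the same.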
In Fig. 1 we present the scheme of our segmentation procedure.
Fig. 1. Scheme of the segmentation procedure
3. Real signal analysis
The analyzed real data set represents the vibration signal of a roadheader, sampled at $F_s = 25$ kHz. The measurement was performed using two accelerometers placed orthogonally on the mining head’s arm and lasted 60 seconds. The analyzed two-channel data set is presented in Fig. 2.
Fig. 2. The two-channel raw roadheader vibration signal
a) Channel 1
b) Channel 2
According to the presented segmentation procedure we divide the two-channel raw signal into subsignals of 10000 observations with 50 % overlap. Next, for each subsignal the standard deviation and the two parameters of the $\alpha$-stable distribution are calculated. In Fig. 3 we present the estimated parameters for the subsignals of both analyzed channels.
Fig. 3. The estimated parameters σ, α, Σ for subsignals (segments) of the two-channel vibration signal
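The division into overlapping subsignals can be sketched as follows. The function name is hypothetical, and the sketch assumes that a trailing part of the signal shorter than one full window is dropped; the paper does not state how the end of the signal is handled.

```python
import statistics


def split_into_subsignals(signal, length, overlap_percent):
    """Return consecutive windows of `length` samples overlapping by `overlap_percent` %."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap must be in [0, 100)")
    step = int(round(length * (1 - overlap_percent / 100)))
    if step < 1:
        raise ValueError("overlap leaves no step between windows")
    # Incomplete trailing windows are dropped (assumption).
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, step)]


# Toy example: windows of 4 samples with 50 % overlap, then one sigma per window.
toy_windows = split_into_subsignals([0.0, 1.0, 4.0, 9.0, 16.0, 25.0], 4, 50)
toy_sigmas = [statistics.stdev(window) for window in toy_windows]
```

Under this convention, a 60-second recording at 25 kHz (1 500 000 samples per channel) cut into windows of 10000 samples with 50 % overlap gives a step of 5000 samples and 299 subsignals per channel.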
These six statistics form a set $\chi$ of points, one per subsignal, in a six-dimensional feature space, within which we perform clustering. The dataset $\chi$ is then centered and the variance of each dimension is set to 1. This operation makes the analysis of the number of clusters more reliable and efficient.
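The centering and variance normalization amounts to a column-wise z-score, sketched below. The function name is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```python
import statistics


def standardize(points):
    """Center each feature and scale it to unit variance (column-wise z-score)."""
    columns = list(zip(*points))
    means = [statistics.fmean(column) for column in columns]
    # Population standard deviation is used here (assumption).
    stds = [statistics.pstdev(column) for column in columns]
    if any(s == 0 for s in stds):
        raise ValueError("a constant feature cannot be scaled to unit variance")
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in points]
```

Without this step, features with large numeric ranges would dominate the Euclidean distances used later by the Silhouette criterion.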
After obtaining the set of parameters we estimate the number of clusters with the Silhouette criterion over a limited range of the number of clusters (up to $k=5$). Once the optimal number of clusters $k$ is estimated, the EM algorithm clusters the dataset $\chi$, splitting it into $k$ classes. For the analyzed signal the optimal number of clusters is equal to four. In Fig. 4 we present the division of the raw signal from the first and second channels into four clusters.
Fig. 4. Result of data clustering. In the left part, data points are displayed as features of the signal from the corresponding channel in 3-dimensional space for presentation purposes; however, they were analyzed and clustered in 6 dimensions. The data are also shown before centering and variance normalization for clear visual interpretation; top: analyzed raw signal; bottom: assignment of data points to clusters as a function of time, for comparison with the original signal above it
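The Silhouette criterion with Euclidean distance can be sketched as below. For each point, $a$ is the mean distance to the other points of its own cluster, $b$ is the smallest mean distance to the points of any other cluster, and the silhouette value is $(b-a)/\max(a,b)$; the score of a clustering is the average over all points. The function names and the dictionary of candidate labelings are hypothetical. The score is defined only for two or more clusters, so this sketch cannot score $k=1$.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette value over all points, using Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_number_of_clusters(points, labelings):
    """Pick k with the highest score; `labelings` maps k to a list of labels."""
    return max(labelings, key=lambda k: silhouette_score(points, labelings[k]))
```

Here `labelings` would hold one clustering result per candidate $k$, for instance the hard assignments returned by an EM run for each value.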
a) Channel 1
b) Channel 2
As a result of the presented procedure we obtain segments belonging to certain clusters. We can apply the technique once again to the data from a single cluster in order to identify the processes inside it, i.e. to perform the so-called “next-layer analysis”. As an example, we consider part of the third cluster, shown in red in Fig. 4. In this cluster we observe regions of denser and sparser impulses, which suggests that the segment is a mixture of different processes with different statistical properties, see Fig. 5. Using the presented methodology we separate the data into classes.
For the signal related to the third cluster we apply the introduced technique. In this case the Silhouette evaluation estimated two clusters and results of EM clustering are presented in Fig. 6.
Fig. 5. Third cluster (marked in red in Fig. 4) from channel 1
Fig. 6. Second-layer analysis of the third cluster. Distinction of two separate groups is clearly visible
a) Channel 1
b) Channel 2
In practice we can proceed with as many layers as we wish, since the number of layers can be provided as a parameter to the implemented algorithm. However, it is good practice to limit the search space of the Silhouette estimation to about 2-3 clusters for deeper layers, and to proceed with no more than 2-3 layers of analysis. Beyond these limits the results become meaningless, since the algorithm will greedily try to find clusters even among single points.
4. Summary
In this paper we have proposed a new segmentation technique which can be applied to the vibration signal from a roadheader working in the mining industry. Very harsh mining conditions, the presence of many sources of interference and many different operation modes make the signal difficult to analyze, for example from the diagnostic point of view. Therefore it is necessary to develop a novel technique for signal segmentation in order to identify different operational modes. The proposed methodology is automatic and based on the stable distribution approach, the Silhouette criterion and the expectation-maximization algorithm, which is applied in the last step of the procedure, i.e. to the classification of the appropriate parameters. As a result we obtain the time periods in which the analyzed signal has a homogeneous structure. The obtained segments correspond to different modes of operation of the machine. Furthermore, detection and parameterization of such events might help to improve the efficiency of machine usage, for example by minimizing the number and duration of segments in which the machine operates in idle or overload mode.
References

[1] Lopatka M., Laplanche C., Adam O., Motsch J.F., Zarzycki J. Non-stationary time-series segmentation based on the Schur prediction error analysis. 13th Workshop on Statistical Signal Processing, 2005, p. 251-256.
[2] Wyłomańska A., Zimroz R., Janczura J. Identification and stochastic modelling of sources in copper ore crusher vibrations. Journal of Physics: Conference Series, Vol. 628, 2015, p. 012125.
[3] Gajda J., Sikora G., Wyłomańska A. Regime variance testing – a quantile approach. Acta Physica Polonica B, Vol. 44, Issue 5, 2013, p. 1015-1035.
[4] Wyłomańska A., Zimroz R. Signal segmentation for operational regimes detection of heavy duty mining mobile machines – a statistical approach. Diagnostyka, Vol. 15, Issue 2, 2014, p. 33-42.
[5] Makowski R., Zimroz R. A procedure for weighted summation of the derivatives of reflection coefficients in adaptive Schur filter with application to fault detection in rolling element bearings. Mechanical Systems and Signal Processing, Vol. 38, 2013, p. 65-77.
[6] Obuchowski J., Wyłomańska A., Zimroz R. The local maxima method for enhancement of time-frequency map and its application to local damage detection in rotating machines. Mechanical Systems and Signal Processing, Vol. 46, 2014, p. 389-405.
[7] Azami H., Mohammadi K., Bozorgtabar B. An improved signal segmentation using moving average and Savitzky-Golay filter. Journal of Signal and Information Processing, Vol. 3, 2012, p. 39-44.
[8] Makowski R., Hossa R. Automatic speech signal segmentation based on the innovation adaptive filter. International Journal on Applied Mathematics and Computer Science, Vol. 24, 2014, p. 259-270.
[9] Popescu T. D. Signal segmentation using changing regression models with application in seismic engineering. Digital Signal Processing, Vol. 24, 2014, p. 14-26.
[10] Żak G., Obuchowski J., Wyłomańska A., Zimroz R. Application of ARMA modeling and alpha-stable distribution for local damage detection in bearings. Diagnostyka, Vol. 15, Issue 3, 2014, p. 3-11.
[11] Stefaniak P., Wyłomańska A., Obuchowski J., Zimroz R. Procedures for Decision Thresholds Finding in Maintenance Management of Belt Conveyor System Statistical Modeling of Diagnostic Data. Lecture Notes in Production Engineering, Springer, 2015, p. 391-402.
[12] Allen J. B. Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 25, Issue 3, 1977, p. 235-238.
[13] Samorodnitsky G., Taqqu M. S. Stable Non-Gaussian Random Processes. Chapman and Hall, New York, 1994.
[14] Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 39, Issue 1, 1977, p. 1-38.
[15] Sundberg R. Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics, Vol. 1, Issue 2, 1974, p. 49-58.
[16] Neal R., Hinton G. A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants. Learning in Graphical Models, MIT Press, Cambridge, MA, 1999, p. 355-368.
[17] Hastie T., Tibshirani R., Friedman J. 8.5 The EM algorithm. The Elements of Statistical Learning. Springer, New York, 2001, p. 236-243.
[18] Kaufman L., Rousseeuw P. J. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Hoboken, NJ, 1990.
[19] Rousseeuw P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, Vol. 20, Issue 1, 1987, p. 53-65.
About this article
This work is partially supported by the Statutory Grant No. S40128 and No. S50112 (G. Zak).