Abstract
Inspired by the success of convolutional neural networks (CNNs) in image classification, encoding vibration signals as images and then analyzing those images with deep learning has become a highly promising approach to bearing fault diagnosis. Building on this idea, we propose a novel approach to identifying bearing faults that combines image-interpreted signals with ensemble machine learning. In our method, each vibration signal is first encoded into two Gramian angular field (GAF) matrices. Next, the encoded results are used to train a CNN to obtain the initial decision results. Finally, we introduce the random forest regression method to learn the distribution of the initial decision results and make the final decisions on bearing faults. To verify the effectiveness of the proposed method, we designed two case analyses using Case Western Reserve University (CWRU) bearing data. One verifies the effectiveness of mapping the vibration signal to GAFs, and the other demonstrates that ensemble deep learning can improve the performance of bearing fault detection. The experimental results show that our method can effectively identify different faults and significantly outperforms the comparative approach.
Highlights
In this study, we propose a new bearing fault detection method.
The method is composed of data preprocessing, image-interpreted vibration signals, and ensemble deep learning.
We use the GAF approach to construct the data representation because GAFs contain the temporal correlation of the vibration signal.
We construct an integrated deep model to achieve a high accuracy rate in bearing fault detection.
We conclude that our method obtains better bearing fault detection performance on the CWRU dataset.
1. Introduction
In modern industries, machine health monitoring is a prerequisite for maintaining the proper operation of industrial machines. Breakdowns in industrial machines can cause huge financial losses and even pose a threat to the people who use them. Therefore, the need for better and smarter machine health-monitoring technologies has never ceased [1]. Rolling bearings are considered the most common and critical mechanical components in rotating machinery, and their health can have a significant impact on the performance, stability, and service life of the machine. Because rolling bearings usually work in harsh operating environments, they are prone to failure during operation. Failure to detect defects in time can lead to unplanned machine downtime or even catastrophic damage. Therefore, rolling bearing fault detection is essential for the safe and reliable operation of machinery and production [2].
Recently, several bearing fault recognition methods have been proposed. Learning-based recognition methods (including statistical learning methods and neural network methods) can capture mechanical fault information by learning from historical data, thereby enabling the automated analysis of bearing faults. The workflow of these methods usually includes data preprocessing, feature extraction, and classifier design. Although a well-designed classification algorithm is a prerequisite for automated bearing fault detection [3], data preprocessing and feature extraction are also important steps.
Designing manual features based on the signal mechanism is an active research area in bearing fault diagnosis. Chen et al. [4] merged bearing signal features from the time and frequency domains and then fed these features into a deep fully connected network for fault detection. Bao et al. [5] calculated the L-kurtosis feature in the envelope spectrum of a vibration signal to detect pulse periodicity. Chen et al. [6] first transformed a vibration signal into the spectrum domain and extracted the mapping amplitude entropy as a learnable feature. Zhao et al. [7] first performed empirical mode decomposition (EMD) on a vibration signal and then selected the first few mode components containing the main information of the signal to extract the sample entropy. Liu et al. [8] proposed a feature extraction method based on variational mode decomposition and singular value decomposition (SVD). Unfortunately, these manual feature extraction processes are laborious and unfriendly to machine-learning designers.
Fig. 1. The flow chart of the proposed method
Owing to the powerful feature learning ability of deep learning techniques, many researchers have attempted to introduce deep learning into the field of fault detection. Examples include convolutional neural network (CNN)-based methods [1], [2], [9], [10]–[14], sparse autoencoder (SAE)-based methods [15], [16], and recurrent neural network (RNN)-based methods [17], [18]. Because the original signal is easily affected by noise, the signal is often transformed into an amplitude-frequency domain sequence. Generally speaking, CNN models are good at learning deep features from image data; thus, encoding one-dimensional vibration signals as two-dimensional image data has attracted much attention. CNNs and their improved variants have been successfully applied to image classification because they can extract robust features directly from two-dimensional images. Many approaches to image-interpreting vibration signals have been proposed. Ding et al. [2] proposed a method for reconstructing a two-dimensional wavelet packet energy image (WPI) of the frequency space. The WPI can represent the dynamic structure of the wavelet packet energy distribution of different bearing faults. However, the WPI method combines the wavelet packet transform with a phase space reconstruction technique, which not only has high time complexity but also loses information from the original signal through multiple transformations of the representation space. Landauskas et al. [19] converted the time series to a permutation entropy (PE) pattern, which is a 2D digital image with multiscale time delays. However, this method ignores the amplitude information of the time series and, by its underlying principle, is sensitive to noise. Wang et al. [20] combined the symmetrized dot pattern (SDP) with a CNN for intelligent bearing fault diagnosis. The SDP method [21] converts the time series into polar SDP images, which retain the frequency and amplitude of the raw signals.
In [22] and [9], the time series is converted into a 2D gray image. In addition, a variety of methods use time-frequency images to represent the time series [23], [24], but time-frequency analysis is time-consuming, which makes it difficult to use for online diagnosis in practice. Wang et al. [25] encoded a time series as a GAF image and then took advantage of a deep model for image representation learning to obtain a better classification accuracy rate. The algorithm was validated on 20 time series datasets and outperformed the traditional kNN+DTW method, which had previously been considered the most effective method for time series classification.
Interpreting a time series as a GAF image has the following advantages: 1) there is no spatial transformation of the original time series, and the encoding process is performed in the original representation space. In other words, the polar encoding of the series is of linear order $O(n)$, where $n$ is the length of the time series, although forming the full $n\times n$ GAF matrix is quadratic in $n$ (see Section 5.1). 2) The principle of the approach is simple, easy to understand, and reproducible. Surprisingly, the GAF imaging method has rarely been applied to bearing fault detection problems. The GAF method has two different representations, the Gramian Angular Summation Field (GASF) and the Gramian Angular Difference Field (GADF). The GASF and GADF provide different information, so the final decision results obtained by deep learning on each are different. Given this, this study introduces the stacked generalization method [26], [27] to fuse the initial decision results based on GASF and GADF. The proposed method improves the classification accuracy rate of bearing faults and increases the reliability of the results.
The flow of the proposed approach is shown in Fig. 1. The method is composed of data preprocessing, image-interpreted vibration signals, and ensemble deep learning. The image-interpreting stage is used to better explore the information in the vibration signals. Ensemble deep learning can make full use of different levels of information to achieve better performance in bearing fault detection.
In sum, the main contributions of this study are as follows:
1) The GAF approach is introduced to encode the bearing vibration signal into GASF and GADF matrices, which contain the temporal correlation of the vibration signal. A 2D-CNN is used to learn the deep features of the images, and the stacking ensemble idea is then applied to construct an integrated deep model that achieves a high accuracy rate in bearing fault detection. This is a decision-level fusion strategy. Deep learning on GASF and GADF yields different accuracy rates, which indicates that the two encodings contain different information for classification. Therefore, building an ensemble model is a well-suited fault detection scheme.
2) We design two experiments on the CWRU dataset to evaluate the performance of bearing fault classification. The comparative results demonstrate that our method achieves better performance than the comparative method.
The rest of the paper is organized as follows: Section 2 introduces the principle of encoding time series as GAF images; Section 3 presents the preprocessing and stacking components of the proposed method; Sections 4 and 5 report performance tests on the CWRU dataset; finally, Section 6 concludes the paper.
2. Image-interpreted time series
Encoding vibration signals into images of different granularities is a popular research area in bearing fault diagnosis. Here, we introduce the GAF encoding method, which first transforms the bearing vibration signal into a 2D image and then uses a 2D-CNN to learn from the image.
2.1. GAF encoding method
Given a time series $X=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}$, each ${x}_{i}$ is rescaled into $[-1,1]$ or $[0,1]$ using min-max normalization:
$$\tilde{x}_{-1}^{i}=\frac{\left(x_{i}-\max (X)\right)+\left(x_{i}-\min (X)\right)}{\max (X)-\min (X)}, \tag{1}$$
$$\tilde{x}_{0}^{i}=\frac{x_{i}-\min (X)}{\max (X)-\min (X)}. \tag{2}$$
Then, the rescaled series $\tilde{X}$ is mapped to polar coordinates by encoding each value as the angular cosine and its timestamp as the radius using Eq. (3):
$$\varphi_{i}=\arccos \left(\tilde{x}_{i}\right),\; -1\le \tilde{x}_{i}\le 1,\; \tilde{x}_{i}\in \tilde{X}; \qquad r_{i}=\frac{t_{i}}{N},\; t_{i}\in \mathbb{N}, \tag{3}$$
where ${t}_{i}$ is the timestamp and $N$ is a constant factor. In this way, a one-dimensional time series is mapped to a two-dimensional representation. Because ${x}_{i}\in \left[-1,1\right]$ or ${x}_{i}\in \left[0,1\right]$ and $\cos \left(\varphi \right)$ is monotonic for $\varphi \in \left[0,\pi \right]$, the encoding is bijective. In other words, a given time series maps to exactly one result in the polar coordinate system. In addition, the time dependence is preserved by the $r$ coordinates.
Fig. 2. GASF images of 10 bearing statuses
Fig. 3. GADF images of 10 bearing statuses
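Once the time series has been encoded in polar coordinates, the temporal correlations within different time intervals are identified by considering the trigonometric sum/difference between each pair of points:
$$GASF=\left[\cos \left(\varphi_{i}+\varphi_{j}\right)\right]={\tilde{X}}^{\prime }\cdot \tilde{X}-{\sqrt{I-{\tilde{X}}^{2}}}^{\prime }\cdot \sqrt{I-{\tilde{X}}^{2}}, \tag{4}$$
$$GADF=\left[\sin \left(\varphi_{i}-\varphi_{j}\right)\right]={\sqrt{I-{\tilde{X}}^{2}}}^{\prime }\cdot \tilde{X}-{\tilde{X}}^{\prime }\cdot \sqrt{I-{\tilde{X}}^{2}}, \tag{5}$$
where $I$ is the unit row vector $\left[1,1,\cdots ,1\right]$.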
The GAF matrix is constructed in the original representation space of the time series. Therefore, the encoding method has significant advantages in terms of efficiency, while avoiding the loss of temporal information that occurs when the representation space is transformed [25]. Work [25] pointed out that the GAF has two significant advantages. First, the GAF matrix preserves the time dependence of the time series; that is, the elements of the GAF matrix are generated sequentially from the top left to the bottom right according to the temporal order of the original time series. Second, the GAF matrix contains temporal correlations; for example, $GAF\left(i,j\right)$ with $\left|i-j\right|=k$ represents the relative correlation, by superposition or difference of directions, with respect to the time interval $k$.
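As a minimal sketch of the encoding described above, the following pure-Python code rescales a short series to $[-1,1]$, converts each value to an angle with the arccosine, and fills the two matrices with the cosine of pairwise angle sums (GASF) and the sine of pairwise angle differences (GADF), following the definitions in [25]. The function names `rescale` and `gaf` and the toy input are illustrative assumptions, not the authors' implementation; a practical version would use vectorized array operations.

```python
import math


def rescale(series, lo=-1.0, hi=1.0):
    """Min-max scale a sequence into [lo, hi]."""
    s_min, s_max = min(series), max(series)
    span = s_max - s_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")
    return [lo + (hi - lo) * (x - s_min) / span for x in series]


def gaf(series):
    """Return (GASF, GADF) as nested lists for a 1-D sequence."""
    scaled = rescale(series)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


# Toy input (hypothetical values, not CWRU data).
gasf, gadf = gaf([0.0, 1.0, 2.0, 1.0])
```

Under these definitions the GASF is symmetric, while the GADF is antisymmetric with a zero diagonal, which is one way to see that the two images carry different information about the same series.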
In practical applications, directly encoding the time-domain bearing vibration signal into a GAF matrix is often unsatisfactory, because the bearing signal in the time domain can easily be contaminated by noise. Therefore, we encode the amplitude-spectrum sequence of the bearing signal into the GAF image. In the amplitude spectrum, the energy of useful information is concentrated in a narrow range of frequency bands, whereas the noise energy is distributed over the entire frequency band. Let $\stackrel{}{X}=\left\{\stackrel{}{{x}_{1}},\stackrel{}{{x}_{2}},\cdots ,\stackrel{}{{x}_{n}}\right\}$ be the amplitude spectrum of a bearing vibration signal; then $\stackrel{}{X}$ can be denoised according to Eq. (6):
where $\mu $ is the mean value of $X$. As the amplitude spectrum of the vibration signal is symmetric, it is sufficient to consider only the left half of the amplitude spectrum. Figs. 2 and 3 show the GASF and GADF images of 10 different bearing statuses, respectively. The class distributions of the nine fault types are shown in Table 1.
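The one-sided amplitude spectrum used as the encoder input can be sketched as follows. This is an illustration only: it uses a naive $O(n^2)$ discrete Fourier transform built from the standard library (a real pipeline would call an FFT routine), the name `amplitude_spectrum` and the toy sine input are hypothetical, and the denoising step of Eq. (6) is not reproduced.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """One-sided amplitude spectrum |DFT| for bins 0 .. n//2 - 1 (naive DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):  # a real signal has a symmetric spectrum
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff))
    return spectrum


# Toy input: 4 cycles of a sine over 32 samples, so the energy sits in bin 4.
n = 32
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
spec = amplitude_spectrum(tone)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

For this clean tone the energy is concentrated in a single bin, which matches the observation that useful information occupies a narrow band of the amplitude spectrum.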
Table 1. Class distribution of 9 types of faults
| Fault diameter | Load (HP) | Speed (rpm) | Ball | Inner race | Outer race |
|---|---|---|---|---|---|
| 0.007" | 0 | 1797 | class 1 | class 4 | class 7 |
| 0.014" | 0 | 1797 | class 2 | class 5 | class 8 |
| 0.021" | 0 | 1797 | class 3 | class 6 | class 9 |
3. Denoising method
In real applications, the number of sampling points in a time series is usually very large; therefore, it is necessary to reduce the dimensionality of the time series before GAF encoding. Because the piecewise aggregate approximation (PAA) algorithm [28] preserves the basic trend of the time series and has low time complexity, we use it to preprocess the bearing vibration signal.
PAA is a simple and effective time series smoothing algorithm that preserves the trend of the time series. Because its time complexity is low, PAA is widely used in time series analysis. Suppose the time series $X=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}$ is mapped to a new time series $\widehat{X}=\left\{\widehat{{x}_{1}},\widehat{{x}_{2}},\cdots ,\widehat{{x}_{m}}\right\}$ with $m\le n$; then $\widehat{{x}_{i}}$ can be calculated using the following formula:
$$\widehat{{x}_{i}}=\frac{m}{n}\sum_{j=\frac{n}{m}\left(i-1\right)+1}^{\frac{n}{m}i}{x}_{j}. \tag{7}$$
From Eq. (7), we find that $X$ is sequentially divided into $m$ blocks of equal size, and the mean value of each block is used to represent the block. The PAA algorithm has a certain noise reduction effect, as it uses the mean value to smooth the data. Clearly, the selection of $m$ is crucial. If $m$ is too small, the blocks are long and the smoothed result loses the original structural information; if $m$ is too large, the blocks are short and the noise reduction effect is weak. From Eq. (7), we can also see that the traditional PAA requires $n/m$ to be an integer. Extensions for the case where $n/m$ is not an integer can be found in [29] and [30].
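The block-averaging step can be sketched in a few lines of Python. The function name `paa` is an assumption made for illustration, and the sketch covers only the traditional case in which $n/m$ is an integer; it rejects other inputs rather than implementing the extensions in [29] and [30].

```python
def paa(series, m):
    """Piecewise aggregate approximation: the mean of m equal-size blocks."""
    n = len(series)
    if m <= 0 or n == 0 or n % m != 0:
        raise ValueError("this sketch requires n to be a positive multiple of m")
    block = n // m
    return [sum(series[i * block:(i + 1) * block]) / block for i in range(m)]


reduced = paa([1, 2, 3, 4, 5, 6, 7, 8], 4)
print(reduced)  # [1.5, 3.5, 5.5, 7.5]
```

Here eight samples are reduced to four block means, so the rising trend is kept while the length is halved.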
3.1. Stacking integration methodology
Bagging, boosting, and stacking are three commonly used ensemble learning methods. Bagging reduces the variance of an estimate by using voting or averaging to fuse multiple decision results [31]. Boosting can upgrade weak learners to strong learners. Unlike the parallel learning approach of bagging, boosting is a sequential framework: it trains an initial learner on the training set and then adjusts the distribution of the training samples according to that learner's results, so that instances misclassified by the previous learner receive more attention. AdaBoost [32] is a classic boosting method. Stacking is a different fusion method that is essentially representation learning: it performs second-stage learning on the results of the initial learners. Stacking has yielded strong results in many data mining competitions (e.g., data science competitions on the Kaggle platform). For example, in the solution proposed by the grand prize winner of the 2009 Netflix recommendation competition, integrating multiple initial learners was the core of the design [34].
For a classification task, the basic process of the stacking method is to apply different learning algorithms ${\mathcal{L}}_{1},\cdots ,{\mathcal{L}}_{K}$ to the dataset $D$, where each instance ${d}_{i}=\left({x}_{i},{y}_{i}\right)\in D$ consists of a feature vector ${x}_{i}$ and the corresponding label ${y}_{i}$. In the first stage, a set of base classifiers ${C}_{1},\cdots ,{C}_{K}$, where ${C}_{i}={\mathcal{L}}_{i}\left(D\right)$, is generated. In the second stage, a meta-classifier is learned from the outputs of the base classifiers. Note that the leave-one-out method or cross-validation [35] is applied to generate the training set for learning the meta-classifier [33]. In the leave-one-out method, each base classifier is trained on all examples but one, and the remaining example is used for testing. The procedure can be formalized as follows: $\forall i=1,\cdots ,n$ ($n$ is the number of examples) and $\forall t=1,\cdots ,K$, ${C}_{t}^{i}={\mathcal{L}}_{t}\left(D-{d}_{i}\right)$; the resulting base classifier is then used to classify ${d}_{i}$ by ${\tilde{y}}_{i}^{t}={C}_{t}^{i}\left({x}_{i}\right)$. Therefore, ${d}_{i}$ can be reconstructed as a new instance $\left(\left({\tilde{y}}_{i}^{1},\cdots ,{\tilde{y}}_{i}^{K}\right),{y}_{i}\right)$. The inputs of the meta-learning phase thus comprise the predictions of the base classifiers.
Because the leave-one-out method reconstructs only one sample per training run, the reconstruction step is time-consuming, whereas cross-validation predicts a predefined subset of the original sample set at a time and obtains the predictions of the base classifiers on these subsets. Thus, cross-validation is preferred when applying the stacking algorithm to big data.
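The construction of the meta-level training set can be sketched as follows. This is a toy illustration under stated assumptions: the helper names `centroid_learner` and `out_of_fold_predictions`, the nearest-class-mean base learner, the interleaved fold assignment, and the six-sample dataset are all hypothetical stand-ins and are unrelated to the 2D-CNN base learners used in this study. Setting `k` equal to the number of samples gives the leave-one-out case.

```python
def centroid_learner(feature):
    """Toy base learner L_t: nearest class mean on a single feature."""
    def fit(X, y):
        sums = {}
        for row, label in zip(X, y):
            total, count = sums.get(label, (0.0, 0))
            sums[label] = (total + row[feature], count + 1)
        means = {c: total / count for c, (total, count) in sums.items()}
        # The returned function plays the role of the trained classifier C_t.
        return lambda row: min(means, key=lambda c: abs(row[feature] - means[c]))
    return fit


def out_of_fold_predictions(X, y, learners, k):
    """Row i holds every base learner's prediction for sample i, produced by
    a model that did not see sample i during training."""
    n = len(X)
    meta = [[None] * len(learners) for _ in range(n)]
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for t, fit in enumerate(learners):
            predict = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][t] = predict(X[i])
    return meta


# Hypothetical two-feature, two-class data.
X = [[0.1, 5.0], [0.2, 4.8], [0.9, 1.0], [1.0, 1.2], [0.15, 5.1], [0.95, 0.9]]
y = [0, 0, 1, 1, 0, 1]
meta = out_of_fold_predictions(X, y, [centroid_learner(0), centroid_learner(1)], k=3)
```

Each pair `(meta[i], y[i])` then serves as one training instance for the meta-learner, so the second stage never sees predictions made on data the base learners were fitted to.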
4. Experiments
Here, we used CWRU bearing data [36] to verify the effectiveness of the proposed method. The dataset comprises multivariate vibration time series generated by the bearing test equipment shown in Fig. 4. In this study, the bearing dataset included the following four conditions: normal, outer race failure, inner race failure, and roller failure. The drive-side vibration signal was used, with a sampling rate of 48 kHz and a motor load of 0 hp. Table 1 shows the class distributions of the selected data.
Fig. 4. Test stand [36]
Table 2. Accuracy rate of 5 methods on raw CWRU
| Algorithm | GASF+2D-CNN | GADF+2D-CNN | WPI+2D-CNN | Amplitude spectrum + 1D-CNN | Ensemble method |
|---|---|---|---|---|---|
| Accuracy | 99.4 % | 99.8 % | 98.7 % | 75.6 % | 100 % |
5. Experimental design
Here, we used a 2D-CNN as the initial learner. Because the amount of usable data is small, the 2D-CNN model includes only three convolutional layers, three pooling layers, and one fully connected structure. Batch normalization [37] and dropout [38] are used to reduce the risk of overfitting the deep model. The 2D-CNN uses the cross-entropy loss function and the Adam optimization algorithm [39]. We used the random forest regression method as the meta-learner. Following common practice, the CWRU dataset was divided into a training set, a test set, and a validation set in the ratio 0.5:0.3:0.2. The training set was used to train the 2D-CNN, and the validation set was used to train the random forest regression model. The experiments were repeated 10 times under different random seeds, and the final results are the mean values over the 10 runs. Because the classes are uniformly distributed, the performance of the different algorithms can be measured well by the conventional classification accuracy rate:
$$acc=\frac{\text{number of correctly classified samples}}{\text{total number of samples}}\times 100\,\%. \tag{8}$$
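The data split and the accuracy measure can be sketched as below. The helper names `split_indices` and `accuracy`, the seed handling, and the sample count of 100 are assumptions made for illustration; the sketch shows only the bookkeeping, not the CNN or random forest training.

```python
import random


def split_indices(n, ratios=(0.5, 0.3, 0.2), seed=0):
    """Shuffle range(n) and cut it into train/test/validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    # Whatever remains after the first two cuts forms the validation set.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train, test, val = split_indices(100)
score = accuracy([0, 1, 2, 2], [0, 1, 1, 2])  # 3 of 4 correct
```

Repeating the split with ten different `seed` values and averaging the resulting scores mirrors the repeated-run protocol described above.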
5.1. Raw CWRU data
In this section, the experiments are divided into two parts. The first verifies the validity of the image interpretation of vibration signals by comparing the fault recognition performance before and after image encoding. We used a 1D-CNN model to learn from one-dimensional vibration signals and applied the 2D-CNN model to the imaged data. The second compares the performance of the proposed method with that of the existing method in [2] (hereafter referred to as WPI+2D-CNN). The topological structure of the 2D-CNN is described in the previous section. The 1D-CNN model contains a one-dimensional convolutional layer, a pooling layer, and a fully connected structure. Batch normalization and dropout are again used to reduce the risk of overfitting.
The accuracy rates on the original CWRU data are shown in Table 2. From the table, we can conclude that: 1) a vibration signal encoded as a two-dimensional image can be learned better and yields better performance; 2) the proposed ensemble method performs better than the existing method. Although WPI+2D-CNN achieves an accuracy rate close to that of our approach, it has a much higher time cost. Because GAFs are computed from an outer product of the time series, their time complexity is $\mathrm{O}\left({N}^{2}\right)$, where $N$ is the length of the time series. Work [2] does not give the time complexity of WPI, so we added a test to compare the runtimes. Table 3 shows the time consumed in imaging the raw CWRU data using GAFs and WPI. From the table, we can see that WPI takes 2938.267 seconds, more than 1,000 times longer than GASF or GADF.
Table 3. Runtime of GAFs and WPI
| | GASF | GADF | WPI |
|---|---|---|---|
| Runtime (s) | 2.289 | 2.236 | 2938.267 |
Why can fusing the knowledge of both GASF and GADF improve the accuracy rate? From Figs. 2 and 3, we can see that class 5, class 6, and class 7 have similar representations in the GASF images while differing significantly in the GADF domain. Likewise, class 2 and class 3 have similar GADF features but different GASF features. Figs. 2 and 3 thus illustrate the advantage of our method. From the perspective of information theory, fusing different representation features of the GAF can increase the information entropy of the inputs and help improve the accuracy of the learning-based prediction model.
5.2. Noise-added CWRU data
To further verify the robustness of our method, we added Gaussian white noise to the raw CWRU vibration signals at signal-to-noise ratios (SNR) of –5 dB, –2 dB, 2 dB, and 5 dB. The experimental results are shown in Table 4. From the table, we can see that: 1) the proposed method achieves better performance than WPI+2D-CNN in a noisy environment; 2) the fault identification performance of GADF+2D-CNN is close to that of GASF+2D-CNN; 3) the ensemble model fusing the GAF image information achieves the best performance at every noise level. Note that for bearing signals with complex noise, more advanced denoising methods can be chosen instead of Eq. (6). Examples include deep learning-based, wavelet shrinkage-based, SVD-based, and EMD-based methods.
In summary, we can conclude that fusing the knowledge of the GASF and the GADF can improve the performance of bearing fault diagnosis on the CWRU dataset. Considering that the time cost of building the GAF is low, the proposed method is more in line with practical application requirements.
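One common way to add Gaussian white noise at a prescribed SNR is sketched below. It assumes the usual definition $SNR_{dB}=10\log_{10}(P_{signal}/P_{noise})$ with power measured as the mean squared amplitude; the paper does not state its exact procedure, and the function name `add_awgn` and the sine test signal are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # Solve SNR_dB = 10 * log10(P_signal / P_noise) for the noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical clean signal: a sine wave standing in for a vibration record.
clean = [math.sin(2 * math.pi * 0.01 * i) for i in range(20000)]
noisy = add_awgn(clean, snr_db=-5, seed=1)
```

At –5 dB the noise power is roughly three times the signal power, which gives a sense of how severe the lowest-SNR condition in Table 4 is.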
Table 4. Accuracy rate of 4 methods on noisy CWRU
| Method | –5 dB | –2 dB | 2 dB | 5 dB |
|---|---|---|---|---|
| WPI+2D-CNN | 87.1 % | 93.2 % | 97.6 % | 97.8 % |
| GASF+2D-CNN | 99.0 % | 99.0 % | 99.1 % | 99.4 % |
| GADF+2D-CNN | 99.4 % | 99.1 % | 99.0 % | 99.6 % |
| Ensemble method | 99.8 % | 100 % | 100 % | 100 % |
6. Conclusions
This study proposed a bearing fault detection method that combines image-interpreted vibration signals with ensemble deep learning and can accurately identify bearing faults. Our method encodes one-dimensional vibration signals into two-dimensional images and then uses a 2D-CNN to obtain the initial decision results. Finally, a decision-level integration method fuses the multiple underlying decisions. Experiments on the real-world CWRU dataset show that the proposed method obtains a better recognition accuracy rate than the existing method (i.e., WPI+2D-CNN), even when Gaussian white noise is added to the original vibration signal.
Altogether, this work provides a learning-based method for bearing fault detection. Next, we plan to apply our method to other publicly available bearing failure datasets and to laboratory datasets.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 61901191), and the Shandong Provincial Natural Science Foundation (Grant No. ZR2020LZH005).
References

[1] W. Zhang, C. Li, G. Peng, Y. Chen, and Z. Zhang, “A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load,” Mechanical Systems and Signal Processing, Vol. 100, No. 2, pp. 439–453, Feb. 2018, https://doi.org/10.1016/j.ymssp.2017.06.022
[2] X. Ding and Q. He, “Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis,” IEEE Transactions on Instrumentation and Measurement, Vol. 66, No. 8, pp. 1926–1935, Aug. 2017, https://doi.org/10.1109/tim.2017.2674738
[3] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep learning algorithms for bearing fault diagnostics – a review,” in 2019 IEEE 12th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), Aug. 2019, https://doi.org/10.1109/demped.2019.8864915
[4] Z. Chen and W. Li, “Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network,” IEEE Transactions on Instrumentation and Measurement, Vol. 66, No. 7, pp. 1693–1702, Jul. 2017, https://doi.org/10.1109/tim.2017.2669947
[5] W. Bao, X. Tu, Y. Hu, and F. Li, “Envelope spectrum L-kurtosis and its application for fault detection of rolling element bearings,” IEEE Transactions on Instrumentation and Measurement, Vol. 69, No. 5, pp. 1993–2002, May 2020, https://doi.org/10.1109/tim.2019.2917982
[6] M. Chen, D. Yu, and Y. Gao, “Fault diagnosis of rolling bearings based on graph spectrum amplitude entropy of visibility graph,” (in Chinese), Journal of Vibration and Shock, Vol. 40, No. 4, pp. 23–29, 2021, https://doi.org/10.13465/j.cnki.jvs.2021.04.004
[7] Z. Zhao and S. Yang, “Sample entropy-based roller bearing fault diagnosis method,” (in Chinese), Journal of Vibration and Shock, Vol. 31, No. 64, pp. 23–29, 2021, https://doi.org/10.13465/j.cnki.jvs.2012.06.012
[8] C. Liu et al., “Rolling bearing fault diagnosis based on variational mode decomposition and fuzzy C-means clustering,” Proceedings of the Chinese Society of Electrical Engineering, Vol. 35, No. 13, pp. 1–8, Aug. 2016.
[9] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Transactions on Industrial Electronics, Vol. 65, No. 7, pp. 5990–5998, Jul. 2018, https://doi.org/10.1109/tie.2017.2774777
[10] X. Guo, L. Chen, and C. Shen, “Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis,” Measurement, Vol. 93, pp. 490–502, Nov. 2016, https://doi.org/10.1016/j.measurement.2016.07.054
[11] I. H. Ozcan, O. C. Devecioglu, T. Ince, L. Eren, and M. Askar, “Enhanced bearing fault detection using multichannel, multilevel 1D CNN classifier,” Electrical Engineering, Vol. 104, No. 2, pp. 435–447, Apr. 2022, https://doi.org/10.1007/s00202-021-01309-2
[12] J. Cao, S. Wang, X. Yue, and N. Lei, “Rolling bearing fault diagnosis of launch vehicle based on adaptive deep CNN,” (in Chinese), Journal of Vibration and Shock, Vol. 39, No. 5, pp. 97–104, 2020, https://doi.org/10.13465/j.cnki.jvs.2020.05.013
[13] S. Dong, X. Pei, W. Wu, B. Tang, and X. Zhao, “Rolling bearing fault diagnosis method based on multilayer noise reduction technology and improved convolutional neural network,” Journal of Mechanical Engineering, Vol. 57, No. 1, p. 148, 2021, https://doi.org/10.3901/jme.2021.01.148
[14] G. Jin, “Research on end-to-end bearing fault diagnosis based on deep learning under complex conditions,” University of Science and Technology of China, Hefei, 2020.
[15] S. Haidong, J. Hongkai, L. Xingqiu, and W. Shuaipeng, “Intelligent fault diagnosis of rolling bearing using deep wavelet autoencoder with extreme learning machine,” Knowledge-Based Systems, Vol. 140, No. 1, pp. 1–14, Jan. 2018, https://doi.org/10.1016/j.knosys.2017.10.024
[16] J. Sun, C. Yan, and J. Wen, “Intelligent bearing fault diagnosis method combining compressed data acquisition and deep learning,” IEEE Transactions on Instrumentation and Measurement, Vol. 67, No. 1, pp. 185–195, Jan. 2018, https://doi.org/10.1109/tim.2017.2759418
[17] L. Guo, N. Li, F. Jia, Y. Lei, and J. Lin, “A recurrent neural network based health indicator for remaining useful life prediction of bearings,” Neurocomputing, Vol. 240, No. 3, pp. 98–109, May 2017, https://doi.org/10.1016/j.neucom.2017.02.045
[18] H. Jiang, X. Li, H. Shao, and K. Zhao, “Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network,” Measurement Science and Technology, Vol. 29, No. 6, p. 065107, Jun. 2018, https://doi.org/10.1088/1361-6501/aab945
[19] M. Landauskas, M. Cao, and M. Ragulskis, “Permutation entropy-based 2D feature extraction for bearing fault diagnosis,” Nonlinear Dynamics, Vol. 102, No. 3, pp. 1717–1731, Nov. 2020, https://doi.org/10.1007/s11071-020-06014-6
[20] H. Wang, J. Xu, R. Yan, and R. X. Gao, “A new intelligent bearing fault diagnosis method using SDP representation and SE-CNN,” IEEE Transactions on Instrumentation and Measurement, Vol. 69, No. 5, pp. 2377–2389, May 2020, https://doi.org/10.1109/tim.2019.2956332
[21] X. Zhu, J. Zhao, D. Hou, and Z. Han, “An SDP characteristic information fusion-based CNN vibration fault diagnosis method,” Shock and Vibration, Vol. 2019, p. 3926963, Mar. 2019, https://doi.org/10.1155/2019/3926963
[22] H. Wang, J. Xu, R. Yan, C. Sun, and X. Chen, “Intelligent bearing fault diagnosis using multi-head attention-based CNN,” Procedia Manufacturing, Vol. 49, pp. 112–118, 2020, https://doi.org/10.1016/j.promfg.2020.07.005
[23] Y. Xu, Z. Li, S. Wang, W. Li, T. Sarkodie-Gyan, and S. Feng, “A hybrid deep-learning model for fault diagnosis of rolling bearings,” Measurement, Vol. 169, p. 108502, Feb. 2021, https://doi.org/10.1016/j.measurement.2020.108502
[24] D. Neupane, Y. Kim, and J. Seok, “Bearing fault detection using scalogram and switchable normalization-based CNN (SN-CNN),” IEEE Access, Vol. 9, pp. 88151–88166, 2021, https://doi.org/10.1109/access.2021.3089698
[25] Z. Wang and T. Oates, “Imaging time-series to improve classification and imputation,” in Proceedings of the 24th International Conference on Artificial Intelligence Marina del Ray, pp. 3939–3945, 2015.
[26] L. Breiman, “Stacked regressions,” Machine Learning, Vol. 24, No. 1, pp. 49–64, Jul. 1996, https://doi.org/10.1007/bf00117832
[27] D. H. Wolpert, “Stacked generalization,” Neural Networks, Vol. 5, No. 2, pp. 241–259, Jan. 1992, https://doi.org/10.1016/s0893-6080(05)80023-1
[28] Z. Zhu et al., “Time series mining based on multilayer piecewise aggregate approximation,” in 2016 International Conference on Audio, Language and Image Processing (ICALIP), pp. 174–179, Jul. 2016, https://doi.org/10.1109/icalip.2016.7846629
[29] J. Lin et al., “Experiencing SAX: a novel symbolic representation of time series,” Data Mining and Knowledge Discovery, Vol. 15, No. 2, pp. 107–144, Jul. 2016.
[30] Y. Huang, W. Jin, P. Ge, and B. Li, “Radar emitter signal identification based on multiscale information entropy,” (in Chinese), Journal of Electronics and Information Technology, Vol. 41, No. 5, pp. 1084–1091, 2019, https://doi.org/10.11999/jeit180535
[31] L. Breiman, “Bagging predictors,” Machine Learning, Vol. 24, No. 2, pp. 123–140, Aug. 1996, https://doi.org/10.1007/bf00058655
[32] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, Vol. 55, No. 1, pp. 119–139, Aug. 1997, https://doi.org/10.1006/jcss.1997.1504
[33] S. Džeroski and B. Ženko, “Is combining classifiers with stacking better than selecting the best one?,” Machine Learning, Vol. 54, No. 3, pp. 255–273, Mar. 2004, https://doi.org/10.1023/b:mach.0000015881.36452.6e
[34] J. Sill, G. Takacs, L. Mackey, and D. Lin, “Feature-weighted linear stacking,” ArXiv Preprint, ArXiv:0911.0460, 2009.
[35] Z. Zhou, Machine Learning. Beijing, China: Tsinghua Press, 2016.
[36] “Bearing data center.” Case Western Reserve University. https://csegroups.case.edu/bearingdatacenter/pages/download
[37] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, pp. 448–456, 2015, https://doi.org/10.48550/arxiv.1502.03167
[38] N. Srivastava et al., “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, Vol. 15, No. 1, pp. 1929–1958, 2014, https://doi.org/10.5555/2627435.2670313
[39] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” ArXiv Preprint, ArXiv:1412.6980, 2014, https://doi.org/10.48550/arxiv.1412.6980