Recent advancements of signal processing and artificial intelligence in the fault detection of rolling element bearings: a review

A rolling element bearing (REB) is a common component in household and industrial machines. Even a minor fault in this component degrades the machinery's overall operation. Such faults cause significant financial losses in industry and can lead to catastrophic failures. Therefore, even a small fault in a rolling element bearing must be detected and remedied as early as possible. Many methods for detecting REB defects have been developed in recent years, and new methods are introduced regularly. This article summarizes these methods, with a focus on vibration analysis techniques. It is intended to familiarize readers with the newest advancements in this field and to serve anyone interested in defect diagnostics of rolling element bearings.


Introduction
All industries are becoming smarter as part of the fourth industrial revolution. One of the most significant aspects of these smart industries is Prognostics and Health Management (PHM). PHM is a strategy for monitoring system health that provides comprehensive yet tailored solutions, and it essentially consists of three parts. First, fault detection establishes the presence of a fault. Second, fault diagnosis determines the type of fault and its location. Finally, prognosis estimates the component's remaining useful life. PHM improves the cost-effectiveness of industrial activities, protects the machinery, and increases the efficiency and reliability of an asset [1], [2]. This paper describes the recent advancements in the fault diagnosis of rolling element bearings.
The rolling element bearing (REB) is an indispensable machine element in rotating machinery. Its purpose is to allow easy relative rotation between a shaft and a housing while transmitting power [3]. Rolling element bearings, like any other moving component, are prone to developing faults. Fig. 1 lists the most common bearing defects. A damaged bearing can cause serious problems with the machine's operation and, in the worst case, catastrophic failure. Early diagnosis of bearing defects and correct assessment of their severity are therefore crucial for the machines' smooth operation. Many methods have been introduced in recent years to find faults in bearings, including vibration analysis, acoustic emission measurement, temperature measurement, wear debris assessment, and motor current signature evaluation. Vibration analysis and acoustic emission measurement are the most extensively employed, so these two methods are the subject of this summary of rolling element bearing fault diagnostics. Although such diagnostic procedures include several steps, this work focuses on signal processing methods. Numerous review papers have been published in this field, but new review articles are still needed because new methods are introduced regularly.
In 1999, N. Tandon and A. Choudhury [4] reviewed the techniques used for fault diagnostics of REBs; they described the vibration and acoustic measurement techniques and compared them with some recent methods. In 2014, Y. Hassan Ali et al. [5] published a summary article on fault diagnosis (FD) and condition monitoring of REBs based on AE analysis with the help of artificial intelligence. Soon afterward, two researchers [6] wrote a review article on signal processing methods for the diagnostics and prognosis of REBs, discussing the important condition monitoring tools and their capabilities in depth. In 2019, a research article [7] summarized the techniques used in FD, fault measurement, and fault modeling of REBs.
The objectives of this article are to review the latest developments in fault diagnostics and fault classification of REBs, to give a brief introduction to vibration analysis and acoustic emission measurement, to compare different signal processing techniques with their pros and cons, and to explain the role of artificial intelligence methods in the identification and classification of bearing faults. The paper is intended to help anyone interested in REB fault diagnostic approaches learn about the current state of the art. The data for this literature survey are taken from research papers published in leading journals over the last three decades. The paper is organized as follows: a brief theory of vibration and AE analysis is given in Section 2. Signal processing methods for FD of REBs are discussed in Section 3. Artificial intelligence techniques are described in Section 4. Section 5 contains the summary and discussion, and Section 6 presents the findings.

Vibration analysis (VA)
Vibration is the motion of a body that oscillates about a mean position; it can be periodic or aperiodic. To detect flaws in machines through vibration monitoring, the machine's vibration must first be recorded, and the vibration signal must then be processed to obtain useful information about the machine's health. Vibrations should be recorded close to the bearings that support the rotating shafts and, if possible, measured in three mutually perpendicular directions at each location. Vibration monitoring is preferred for the majority of rotating machinery because every dynamic machine component manifests itself in the measured vibration response at its characteristic frequencies. As a result, signal analysis of the recorded vibrations provides a significant and simple methodology for detecting flaws in operating machines [8], [9].

Acoustic emission analysis
Acoustic emission (AE) is the spontaneous release of transient elastic stress energy when a material deforms. Waves of 2 MHz or higher are emitted by materials under stress. High-frequency piezoelectric AE sensors can detect these waves on a body's surface. The AE waves may be continuous or occur in bursts, depending on the level of internal stress. The location of an internal defect can be found by triangulation, which involves placing several AE sensors on the surface of a body and recording the arrival times of the AE waves at high speed. AE is a non-destructive testing method with a variety of uses, such as health monitoring of structures, machine tools, gears, and bearings, and the assessment of tribological and wear behavior [10]-[12]. The benefits and drawbacks of vibration analysis and AE measurement are given in Table 1.

Signal processing
Data acquisition, signal processing, and fault classification are the three pillars of the fault diagnostic procedure in vibration and AE analysis, as shown in Fig. 2. Sensors and analyzers are used to collect data. The most commonly used sensors for vibration data are accelerometers, displacement transducers, and velocity transducers, while piezoelectric sensors are used to collect AE signals. The collected data are transferred to the signal processing unit for further processing. The data contain all of the relevant information about the bearing's condition, but this information is mixed with contributions from other components such as gears and shafts, and with noise. Signal processing is therefore required to separate the bearing-related content. It covers signal denoising, filtering, amplification, feature extraction, and fault identification. Commonly used methods include the wavelet transform, fast Fourier transform, envelope analysis, root mean square, empirical mode decomposition, matching pursuit, cepstrum, and Wigner-Ville distribution [13]. Feature extraction and selection can be carried out in three domains, namely the time domain, the frequency domain, and the time-frequency domain, as well as through cyclo-stationary analysis. The following sections discuss each of these in depth.

Time domain techniques
Time-domain analysis is the simplest and most fundamental signal processing technique; it describes how a signal evolves over time. In this technique, characteristic features such as the crest factor, kurtosis, peak value, and root mean square (RMS) value are examined to reveal the presence of a fault in the bearing [14], [15]. The RMS value represents the vibration's power content. The peak is the signal's greatest absolute value over the record. The crest factor is the ratio of the signal's peak value to its RMS value. Kurtosis is a statistical measure of the shape of a signal's amplitude distribution; in essence, it quantifies the "peakedness" of a random signal. Each feature's benefits and drawbacks are listed in Table 2. Some researchers [16], [17] published articles on intelligent diagnosis of REB faults using time-domain features and artificial neural networks (ANN). The RMS value, kurtosis, and negative log-likelihood parameters, obtained from the processed time-domain signals, are used as the inputs to the ANN. Time-domain analysis, however, is ineffective for higher-order systems, and it cannot capture all of the information about an REB failure. Other techniques are therefore considered next.
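The time-domain features named above can be computed directly from the samples. The following is a minimal sketch, not a method from the cited works: the function name `time_domain_features`, the sampling rate, and the synthetic "healthy" and "impulsive" signals are assumptions chosen only for illustration, and kurtosis is taken as the non-excess fourth standardized moment.

```python
import math

def time_domain_features(x):
    """Return RMS, peak, crest factor and kurtosis of a list of samples."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    crest = peak / rms
    # Kurtosis as the fourth central moment divided by the squared variance
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    kurt = m4 / (m2 * m2)
    return {"rms": rms, "peak": peak, "crest_factor": crest, "kurtosis": kurt}

# Hypothetical signals: a pure 50 Hz sine and the same sine with periodic impacts
fs = 1000  # assumed sampling rate in Hz
sine = [math.sin(2 * math.pi * 50 * k / fs) for k in range(fs)]
impulsive = list(sine)
for k in range(0, fs, 100):
    impulsive[k] += 5.0  # assumed impact amplitude

healthy = time_domain_features(sine)
faulty = time_domain_features(impulsive)
print(healthy)
print(faulty)
```

In this synthetic case the impacts raise both the crest factor and the kurtosis well above the values of the pure sine, which is the behavior these features are meant to expose.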

Frequency domain techniques
In the frequency domain, several characteristics that are not detectable in the time domain can be found. The frequency-domain approach represents a signal's amplitude as a function of frequency. Frequency-domain feature extraction and selection methods include the fast Fourier transform, power spectrum, cepstrum, and envelope spectrum. This section explains them.

Fast Fourier transform (FFT)
The Fourier transform was put forward by Joseph Fourier, a French physicist and mathematician, in the 19th century. He found that "any periodic function can be decomposed in a series of simple oscillating functions, namely sines and cosines". In other words, the Fourier transform decomposes a vibration signal into complex exponential functions at different frequencies [18]. In 2007, two researchers [19] investigated the fault diagnostics of a rolling bearing using the FFT of the Intrinsic Mode Functions (IMF) obtained with the Hilbert-Huang Transform (HHT). Their paper addressed the inefficiency of the traditional DFT and the inherent problems of the Wavelet Transform (WT), such as lengthy computation time and inappropriate resolution, when investigating non-stationary vibration signals from faulty bearings. The primary drawback of the FFT is that it is computed over the entire record, so the stationary and transient components of the signal cannot easily be distinguished.
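As an illustration of this decomposition, the sketch below computes a single-sided amplitude spectrum with NumPy's FFT routines. The sampling rate and the two tone frequencies are assumed values for a synthetic signal, not data from any of the cited studies.

```python
import numpy as np

fs = 2000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs         # one second of samples
# Hypothetical signal: two sinusoids at 120 Hz and 300 Hz
x = 1.0 * np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

spectrum = np.fft.rfft(x)                       # FFT of a real-valued signal
freqs = np.fft.rfftfreq(x.size, d=1 / fs)       # frequency of each bin in Hz
amplitude = 2 * np.abs(spectrum) / x.size       # single-sided amplitude

# Frequencies of the two largest spectral lines, in ascending order
top_two = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(top_two)
```

The spectrum shows which frequencies are present but not when they occur, which is the limitation noted above for signals with transient components.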

Envelope analysis
Envelope Analysis (EA) is a well-established technique for extracting periodic impacts from a machine's vibration or AE signal. Its primary idea is to use the envelope spectrum to suppress disturbances and accentuate the fault feature [20]. The Hilbert transform (HT) is one of the most extensively employed techniques in envelope spectrum analysis and is widely applied in fault identification and diagnostics of machine elements. McInerny et al. [21] used EA in 2003 for basic vibration signal processing to detect bearing faults. In 2017, C. Mishra et al. [22] studied the defect diagnostics of REBs using envelope analysis together with wavelet de-noising based on sigmoid function thresholding. In this technique, the vibration signal from a defective bearing was assumed to consist of a part carrying the large-scale fault features and a noise component. To extract these large-scale features from the vibration signal, a Bayesian estimator was applied after processing the wavelet coefficients.
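A minimal sketch of Hilbert-based envelope analysis follows. It builds the analytic signal with the FFT (the standard construction behind the Hilbert transform), takes its magnitude as the envelope, and inspects the spectrum of that envelope. The carrier frequency, the modulation rate `fault_rate`, and the helper names are assumptions for a synthetic amplitude-modulated signal.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies, double the positive ones."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def envelope_spectrum(x, fs):
    """Amplitude spectrum of the mean-removed envelope of x."""
    env = np.abs(analytic_signal(x))
    env = env - env.mean()
    amp = 2 * np.abs(np.fft.rfft(env)) / env.size
    freqs = np.fft.rfftfreq(env.size, d=1 / fs)
    return freqs, amp

fs = 10000                     # assumed sampling rate in Hz
t = np.arange(fs) / fs
fault_rate = 87.0              # hypothetical impact repetition rate in Hz
carrier = 3000.0               # hypothetical structural resonance in Hz
x = (1 + 0.8 * np.cos(2 * np.pi * fault_rate * t)) * np.sin(2 * np.pi * carrier * t)

freqs, amp = envelope_spectrum(x, fs)
dominant = freqs[np.argmax(amp)]
print(dominant)
```

In the raw spectrum the modulation appears only as sidebands around the carrier, whereas the envelope spectrum places the repetition rate itself at its largest line.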

Power spectrum
Power spectral techniques are widely applied in fault diagnostics of machinery. With this technique, however, it is hard to examine the phase of the signal, which carries important diagnostic information. To preserve the phase information, joint time-frequency analysis is adopted. The power spectrum is frequently selected for health monitoring techniques as it efficiently denotes the time, The authors tried to find a unique method for identifying localized defects in the inner and outer races of a rolling contact bearing through the envelope-power spectrum of the Laplace coefficient. The peaks of the power spectrum are very sensitive to shaft fluctuations, which is a drawback when processing the signal.

Cepstrum
Cepstrum analysis is a nonlinear signal processing procedure that can be used to detect, classify, and remove harmonic and sideband families in vibration and AE signals. A study [26] applied the Minimum Variance Cepstrum (MVC) to the early-stage fault diagnostics of automotive ball bearings. The authors presented MVC for the investigation of repeated impulse signals in noisy conditions. MVC is obtained from the logarithmic power spectrum, which is estimated with a minimum variance procedure. In 2019, D. Ibarra-Zarate et al. [27] proposed a methodology to extract fault information from the vibration and AE signals of faulty bearings based on the cepstrum pre-whitening technique. They found that the approach performs well in identifying bearing defects in the medium and late stages. Researchers have used different variants of the cepstrum, such as MVC and cepstrum pre-whitening, for fault diagnosis of machine elements, but the cepstrum is computationally expensive.
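To make the idea concrete, the sketch below computes the ordinary real cepstrum (the inverse FFT of the log-magnitude spectrum), which is the basic form and not the MVC or pre-whitening variants discussed above. The impact spacing, the decaying resonance, and the noise level are assumptions for a synthetic signal.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.real(np.fft.ifft(log_mag))

fs = 10000                     # assumed sampling rate in Hz
n = 2000                       # number of samples
period = 100                   # hypothetical spacing between impacts, in samples
rng = np.random.default_rng(0)

# Impulse train exciting a decaying 2 kHz resonance, plus weak noise
impacts = np.zeros(n)
impacts[::period] = 1.0
k = np.arange(200)
response = np.exp(-k / 20.0) * np.sin(2 * np.pi * 2000 * k / fs)
x = np.convolve(impacts, response)[:n] + 0.01 * rng.standard_normal(n)

ceps = real_cepstrum(x)
# Search for the strongest cepstral line near the expected impact spacing
lo, hi = period // 2, 3 * period // 2
peak_quefrency = lo + int(np.argmax(ceps[lo:hi]))
print(peak_quefrency, peak_quefrency / fs)
```

The whole family of harmonics spaced by the impact rate collapses into one cepstral line at the impact period, which is why the cepstrum is convenient for harmonic and sideband families.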

Cyclo-stationary analysis
Cyclo-stationary analysis provides a different framework for investigating periodically time-varying signals. Traditional signal processing techniques assume that the signals are stationary, whereas cyclo-stationary analysis treats the signal as periodically time-varying. Non-stationary signals are considered cyclo-stationary when some of their statistical characteristics are periodic [28], [29]. In 2013, P. K. Kankar et al. [30] investigated the defect diagnostics of REBs by combining cyclic autocorrelation and the wavelet transform. The key concept of the paper was a feature extraction scheme based on the cyclic autocorrelation of raw vibration signals from defective bearings. When the non-stationary signals underwent cyclo-stationary analysis, several distinct modulating frequencies were revealed.

Time-frequency domain techniques
The time-domain and frequency-domain analyses were discussed in Sections 3.1 and 3.2. This section discusses techniques that combine the two. The time-frequency domain describes a signal on both the time and frequency scales. A vibration signal consists of many components with different characteristics, such as linear and non-linear, or stationary and non-stationary, behavior. Time-frequency analysis is therefore essential for extracting the fault characteristics and eliminating noise components efficiently [31]. Important time-frequency techniques include the short-time Fourier transform, empirical mode decomposition, wavelet transform, empirical wavelet transform, matching pursuit, and Wigner-Ville distribution.

Short-time Fourier transform (STFT)
STFT is a basic and straightforward signal transformation technique for converting time-domain signals to the time-frequency domain. The approach multiplies the time series by a sliding window function, within which the non-stationary vibration signal can be treated as locally stationary, and then transforms each windowed segment into the frequency domain. To enhance the signs of a localized fault, the STFT spectrograms are averaged in the time-frequency domain. This step highlights the energy flow associated with the impacts between defective bearing components and improves the signal-to-noise ratio [32], [33]. In 2015, H. Gao et al. [34] studied vibration feature extraction for fault diagnostics of REBs using STFT together with non-negative matrix factorization (NMF). The researchers proposed a new time-frequency distribution (TFD) matrix factorization by merging the ideas of TFD with NMF. However, STFT has a fixed time-frequency resolution once the window size is chosen. Further investigation is therefore required to overcome this drawback in real-time applications.

Advantages and limitations of the time-frequency techniques:
EMD. Limitation: it overestimates the number of modes, so mode mixing may occur while processing the signal.
EWT. Advantages: it focuses more on the oscillating part of the signal and is computationally faster than EMD. Limitation: if the input signal consists of two chirps that overlap in both the time and frequency domains, EWT cannot distinguish between them.
Matching pursuit (MP). Advantage: it is a greedy method that finds the best-matching waveform in the signal at each iteration. Limitation: its success depends on the dictionary density; a denser dictionary improves efficiency but may increase computing time and storage space.
WVD. Advantages: it has good frequency and time resolution, and its implementation does not require a window function. Limitation: cross-term interference misleadingly indicates the presence of signal components between auto-terms.
WT. Advantages: it provides better temporal localization at high frequencies than STFT, is more versatile than STFT, and offers a wide range of wavelet functions. Limitation: choosing the mother wavelet type is challenging.
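The windowing idea behind the STFT can be sketched with NumPy alone. This is an illustrative implementation with a Hann window; the function name `stft_magnitude`, the window length, the hop size, and the test signal that switches from 50 Hz to 200 Hz are all assumptions.

```python
import numpy as np

def stft_magnitude(x, fs, win_len, hop):
    """Magnitude STFT using a Hann window of win_len samples and a fixed hop."""
    window = np.hanning(win_len)
    starts = list(range(0, x.size - win_len + 1, hop))
    frames = np.array([x[s:s + win_len] * window for s in starts])
    mags = np.abs(np.fft.rfft(frames, axis=1))       # one spectrum per frame
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = (np.array(starts) + win_len / 2) / fs    # frame centers in seconds
    return times, freqs, mags

fs = 1000                      # assumed sampling rate in Hz
t = np.arange(2 * fs) / fs
# Hypothetical non-stationary signal: 50 Hz for the first second, 200 Hz afterwards
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))

times, freqs, mags = stft_magnitude(x, fs, win_len=128, hop=64)
dominant = freqs[np.argmax(mags, axis=1)]            # strongest frequency per frame
print(dominant[0], dominant[-1])
```

With a 128-sample window at this sampling rate every frame has the same frequency spacing of fs / 128, about 7.8 Hz, and the same time span of 0.128 s. Improving one of these requires a different window length and worsens the other, which is the fixed-resolution drawback described above.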

Empirical mode decomposition (EMD)
EMD is a self-adaptive time-frequency decomposition algorithm that decomposes a vibration signal into empirical modes corresponding to the oscillation modes embedded in the signal. According to the EMD algorithm, any vibration signal can be represented as a linear superposition of these empirical modes. EMD can treat non-stationary and non-linear data because its local time scale is determined by the signal itself, and it decomposes the signal into a finite number of IMFs [35], [36]. A study [37] reviewed the application of EMD in FD of rotating machinery, covering the original EMD technique, improved EMD variants, and combinations of EMD with other techniques. In 2015, J. Ben Ali et al. [38] studied automated fault diagnostics of REBs from vibration signals. The key concept of the research was the use of EMD together with an ANN for fault diagnostics and fault classification. A Health Index (HI) was introduced and tested on the vibration signals of three bearings at different speed and torque conditions. However, EMD overestimates the number of modes, so mode mixing may occur while processing the signal.

Empirical wavelet transform (EWT)
Vibration signals consist of frequency-modulated and amplitude-modulated components. EWT is used to extract these components from vibration signals and can be applied to both fault identification and fault classification of REBs [39]. In 2019, S. N. Chegini et al. [40] wrote an article on a de-noising technique for the diagnosis of faulty bearings based on a new empirical wavelet transform. The method has two stages: de-noising and fault diagnosis. In the de-noising stage, EWT was employed to break the vibration signal down into different empirical modes. In the fault diagnosis stage, the presence of a fault and its location were identified from the envelope spectrum and kurtosis of the de-noised signal.

Matching pursuit
The matching pursuit (MP) algorithm decomposes a vibration signal into a linear expansion of waveforms selected from a dictionary of functions. These waveforms are chosen to best match the signal structures, so the matching pursuit decomposition provides an interpretation of those structures. If a signal structure does not correlate well with any dictionary function, it is sub-decomposed into several functions and its information is diluted. Matching pursuit is a greedy algorithm that, at each iteration, selects the waveform that best matches the remaining part of the signal [41]. In 2005, H. Yang et al. [42] investigated the fault diagnostics of REBs using basis pursuit. Their paper described feature extraction from the vibration signals of bearings with inner-race and outer-race defects using this time-frequency procedure. The results were easier to interpret with basis pursuit because it represented the characteristic features with excellent resolution in the time-frequency domain. However, the success of the MP algorithm depends on the density of the dictionary: a denser dictionary improves efficiency but can increase the computation time and storage space.
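The greedy selection step can be sketched as follows. The dictionary here is a small set of unit-norm cosines, which is an assumption made only to keep the example short; practical dictionaries for bearing signals are typically far denser, and the function name `matching_pursuit` is likewise illustrative.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit. dictionary holds unit-norm atoms as rows."""
    residual = signal.astype(float).copy()
    picks = []
    for _ in range(n_iter):
        corr = dictionary @ residual                 # correlation with every atom
        best = int(np.argmax(np.abs(corr)))          # best-matching atom
        coef = corr[best]
        residual = residual - coef * dictionary[best]
        picks.append((best, coef))
    return picks, residual

n = 256
t = np.arange(n)
# Hypothetical dictionary: cosines with 1 to 40 cycles over the record
atoms = np.array([np.cos(2 * np.pi * f * t / n) for f in range(1, 41)])
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

# Test signal built from two atoms with known weights
signal = 3.0 * atoms[9] + 1.5 * atoms[24]
picks, residual = matching_pursuit(signal, atoms, n_iter=2)
chosen = [index for index, _ in picks]
print(chosen, np.linalg.norm(residual))
```

Each iteration costs one correlation against the whole dictionary, so the computing time and storage grow with the dictionary density, in line with the trade-off noted above.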

Wigner Ville distribution (WVD)
WVD is a time-frequency technique belonging to the Cohen class of distributions, and it can be employed for fault diagnostics and fault classification of REBs. Its primary advantage is good resolution in both the time and frequency domains. WVD has gained significant attention in recent years for analyzing non-stationary or periodic signals, and it can be used in machine condition monitoring, structure-borne noise identification, etc. [43], [44]. In 2011, Y. Zhou et al. [45] studied the Wigner-Ville distribution based on the Cyclic Spectral Density (CSD) to investigate the cyclo-stationary vibration signals from faulty bearings. In this process, the CSD of the cyclo-stationary signals was calculated and the WVD was obtained through the inverse Fourier transform. One major disadvantage of WVD is cross-term interference: the cross-terms misleadingly indicate the presence of signal components between the auto-terms.

Wavelet transform (WT)
The wavelet transform is one of the most reliable time-frequency methods for defect diagnostics of REBs. In this method, the feature information is represented in the time-frequency domain. The most important step is to choose a suitable wavelet for the vibration signals coming from the faulty bearings. A wavelet function is a small wave with oscillating, wave-like features, and it is required for implementing the wavelet transform [46], [47]. A paper [48] reviewed the theory and applications of the wavelet transform in defect diagnosis of rotating machines. WT has four variants, CWT, DWT, WPT, and TQWT, which are explained below.

Continuous wavelet transform (CWT)
The wavelet coefficients of the CWT are indices of similarity between the chosen wavelet and the analyzed signal. The real Morlet wavelet is well known to resemble a damped oscillating waveform, and hence it matches the transient vibrations produced when a rolling element strikes a point or line defect on the bearing race. The standard way of representing the CWT is a two- or three-dimensional plot called a scalogram, which plots the modulus of the wavelet transform as a function of location over a range of scales [49], [50]. In 2009, H. Hong and M. Liang [51] studied fault severity assessment for REBs by combining the CWT with Lempel-Ziv complexity. In the first stage, the CWT identified the best characteristic features and removed noise and unwanted signal components as far as possible. In the second stage, the Lempel-Ziv complexity values were computed, and the severity of the bearing defect was measured effectively. Although the CWT is shift-insensitive, it is computationally expensive, which should be considered when using it in FD of machine elements such as REBs.

Discrete wavelet transform (DWT)
Wavelets provide time-scale information about a signal, enabling the extraction of characteristic features that vary with time. This makes wavelets well suited to analyzing transient or non-stationary signals. The discrete wavelet transform can be derived from the CWT. By removing redundant information, DWT can give results with a higher resolution at different frequencies [52]. In 2015, S. Sharma et al. [53] investigated the fault diagnostics of rolling contact bearings under variable and constant speed and load conditions. Feature extraction from the vibration signal in the time and frequency domains was performed by DWT, and feature reduction to remove redundancy was carried out by Orthogonal Fuzzy Neighborhood Discriminative Analysis (OFNDA). The classification of defects and the prediction of the condition of the elements were carried out by a Dynamic Recurrent Neural Network (DRNN). Although DWT is computationally fast, it does not further decompose the high-frequency part of the vibration signal; only the low-frequency part is split at each decomposition level. This may adversely affect the efficiency of the FD procedure.
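The level-by-level structure of the DWT can be illustrated with the Haar wavelet, the simplest case. This sketch is only an example of the decomposition scheme; the Haar wavelet, the function names, and the eight-sample input are assumptions, and the cited work does not state that it used this wavelet.

```python
import math

def haar_dwt_level(x):
    """One Haar DWT level: low-pass (approximation) and high-pass (detail) halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_dwt(x, levels):
    """Multi-level Haar DWT: only the approximation is split again at each level."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_dwt_level(approx)
        details.append(d)      # detail (high-frequency) bands are kept as they are
    return approx, details

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # hypothetical samples
approx, details = haar_dwt(x, levels=2)
print(approx)      # level-2 approximation
print(details)     # [level-1 details, level-2 details]
```

The detail band from level 1 is never subdivided, so the high-frequency half of the spectrum keeps a coarse frequency resolution. Splitting the detail bands as well gives the full binary tree of the wavelet packet transform.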

Wavelet packet transform (WPT)
Wavelet packet transform is a time-frequency analysis procedure with wide applications in the defect diagnostics of rotating machines. WPT decomposes a signal into several wavelet packets in the form of a full binary tree, and it enhances weak transients in noisy signals [54], [55]. In 2002, two researchers [56] studied the fault diagnostics of REBs using wavelet packets. In their paper, WPT was employed as a systematic means of analyzing vibration signals from defective bearings. The objectives were to provide a time-frequency decomposition of the vibration signal and to select the components that carry significant diagnostic information. Because of the flexibility of WPT and its efficient parameter selection, the method could be carried out with minimal user intervention. WPT resolves the limitation of DWT, since it decomposes the high-frequency as well as the low-frequency parts of the vibration signal, but WPT is shift-sensitive. In 2005, Z. K. Peng et al. [57] published an article on the fault diagnosis of REBs comparing an improved HHT with the wavelet transform. The researchers proposed an improved HHT method based on WPT and intrinsic mode functions. WPT served as a preprocessor that broke the vibration signal from a defective bearing down into a series of narrowband signals. IMFs were then generated with EMD, and finally the useful IMFs were retained by removing undesired IMFs in a selection process.

Tunable Q-factor wavelet transform (TQWT)
TQWT is a more recent version of the wavelet transform that, depending on the Q-factor, can break down a vibration signal into a low Q-factor component, a high Q-factor component, and a residual [58]. A. Anwarsha and T. Narendiranath Babu wrote a review article on the role of TQWT in the fault diagnosis of rolling element bearings [59]. In 2014, H. Wang et al. [60] published an article on early weak defect feature extraction for REBs using Ensemble Empirical Mode Decomposition (EEMD) together with the tunable Q-factor wavelet transform. In this technique, the collected signal is first decomposed using EEMD. TQWT is then applied to the selected intrinsic mode functions with the largest kurtosis values. Finally, envelope demodulation is applied to the selected low Q-factor component. TQWT overcomes limitations of the conventional WT and its quality factor can be tuned easily, but it is computationally expensive.

Artificial intelligence for fault classification
The preceding sections discussed the signal processing approaches used in vibration analysis. All of these approaches require the assistance of an expert. The most important aspect of modern condition monitoring, however, is the capability to identify and repair faults without human assistance. This is where artificial intelligence enters the picture. Artificial intelligence techniques are widely applied in defect diagnostics of rotating machines because of their reliability and adaptability. Furthermore, they do not require complete prior knowledge of the system, which can be hard to acquire in practice. K-nearest neighbor, the naive Bayes classifier, support vector machine, deep learning, and artificial neural network are the important artificial intelligence techniques for fault identification [61]. A comparative study [62] examined the effectiveness of ANN, SVM, and Gaussian process regression in estimating a bearing's remaining useful life. The artificial intelligence methods used in the fault diagnostics of REBs are described in this section.
Artificial intelligence is demonstrated when a machine can execute a task that was previously accomplished by a human and was assumed to require the ability to learn, reason, and solve problems. Artificial intelligence (AI) is the ability of machines to appear to think for themselves. There are two sorts of AI techniques in general utilized in REB fault diagnosis. They're called Machine Learning and Deep Learning, respectively. The following section explains these concepts.

Machine learning algorithms
Machine learning is a subset of artificial intelligence that focuses on a machine's ability to take in a collection of data and learn from it, adjusting its model as it gains a better understanding of the data being processed. Although the terms machine learning and artificial intelligence are frequently used interchangeably, machine learning is the narrower of the two. Machine learning is divided into three categories: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning learns from training data whose learning targets are labeled. It makes predictions through a learned mapping that generates an output for each input. Many model families can represent this mapping, such as decision trees, logistic regression, and support vector machines. There are two types of supervised learning: classification and regression. When the output variable is a category, such as red or blue, or faulty or healthy, the problem is a classification problem, and the model assigns each examined sample to one of the categories. The goal of regression is to predict a continuous value for each data point, for example a temperature or an age. It is therefore a method for modeling the relationship between a scalar dependent variable and one or more explanatory variables.
Unsupervised learning entails constructing an internal representation of the input, such as finding clusters or extracting features. Because the input is unlabeled and no information about the correct output is available, the algorithm must determine the patterns in the data on its own. Clustering and dimensionality reduction are two examples of unsupervised learning. Clustering is a central topic in unsupervised learning, since it involves identifying a structure or pattern in a set of uncategorized data. Dimensionality reduction is the process of lowering the number of random variables in the problem under consideration by establishing a set of principal variables.
In reinforcement learning, the information available for training is intermediate between that of supervised and unsupervised learning. Instead of training samples that show the proper output for a given input, the training data provide only a rough indication of whether or not a given action is correct. The problem is therefore one of finding the best course of action in a given situation. The following are some of the machine learning algorithms most often used in the field of REB fault diagnosis.

K-nearest neighbors (KNN)
The k-nearest neighbor technique is a simple pattern recognition method that naturally handles multi-class classification. It learns by analogy: a new sample is assigned to the class that is most common among its K nearest training samples. The method has few parameters to tune apart from K, the number of neighbors, which affects its classification accuracy [63]- [65]. KNN is highly successful in the identification of statistical patterns, and it can achieve good classification accuracy on data whose underlying distribution is unknown. One of its advantages is that it can greatly reduce the misclassification error, especially when it is trained on a large number of samples. However, a larger training dataset also makes KNN computationally expensive and increases the storage space it requires.
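The voting rule described above can be sketched in a few lines. This is a minimal illustration rather than the procedure of any cited study; the function name `knn_predict` and the (RMS, kurtosis) feature values are hypothetical and chosen only to make the example readable.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote among the labels of the k closest samples.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (RMS, kurtosis) feature pairs; the values are illustrative only.
train_x = [(0.10, 3.0), (0.12, 3.1), (0.11, 2.9),
           (0.45, 6.5), (0.50, 7.0), (0.48, 6.8)]
train_y = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]

print(knn_predict(train_x, train_y, (0.47, 6.6), k=3))
print(knn_predict(train_x, train_y, (0.13, 3.2), k=3))
```

Every prediction scans and stores the whole training set, which is where the computational and storage costs mentioned above come from.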

Support vector machine (SVM)
A support vector machine is an efficient tool for machine defect diagnostics: it can make reliable decisions from a small number of samples and has good generalization ability. In the condition monitoring process, SVM is used to find specific patterns in the acquired signals, which are subsequently categorized according to the type of machine fault [66], [67]. In 2018 two researchers [68] proposed an enhanced support vector machine classification procedure with multi-domain features. To overcome the low recognition accuracy and insufficient feature extraction of existing approaches that rely on features from a single domain, they suggested a defect classification algorithm based on an enhanced SVM with features from multiple domains. In the feature extraction stage, the fundamental properties and condition information of the vibration signals were extracted by applying statistical analysis, FFT, and Variational Mode Decomposition (VMD). In the feature selection stage, the sensitive features were selected with the Laplace Score algorithm, which improved the calculation efficiency. In the fault identification stage, a particle swarm optimization-based SVM classification model was applied. The main limitations of SVM are that training is time-consuming when the dataset is very large and that it is inherently a binary classifier, so it is not directly suited to multi-class classification.
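As a rough illustration of the binary decision an SVM makes, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the hinge loss. It is not the enhanced, PSO-tuned SVM of [68]; the training routine, the hyperparameters, and the two standardized feature values per sample are all assumptions made for the example.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2*||w||^2 + mean(hinge loss) by full-batch sub-gradient descent."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # only samples inside the margin contribute
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardized features; +1 = faulty, -1 = healthy.
xs = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
ys = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(xs, ys)
print(svm_predict(w, b, (2.5, 2.0)), svm_predict(w, b, (-1.5, -2.5)))
```

Because the output is a single sign, handling more than two fault classes requires combining several such classifiers, which reflects the binary-class limitation noted above.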

Naive Bayes (NB) classifier
The Naive Bayes classifier is a classification method for defect diagnostics of REBs that is based on the Bayes rule. In other words, the NB classifier is a supervised, probability-based classification technique, and it has achieved great popularity because of its simple classification model and good classification results. One of the advantages of the NB classifier is that it can estimate the characteristic features of the signal from a small amount of training data [69]- [71]. However, the NB classifier requires prior probabilities and relies on strong assumptions. In particular, it is rarely realistic in practice to assume that all predictor features are mutually independent.
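The role of the prior probabilities and of the independence assumption can be seen in a small Gaussian variant of the classifier. This is a sketch under stated assumptions: the Gaussian likelihood, the function names, and the (RMS, kurtosis) feature values are illustrative and are not taken from [69]- [71].

```python
import math
from collections import defaultdict

def fit_gaussian_nb(xs, ys):
    """Estimate the class prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for x, y in zip(xs, ys):
        by_class[y].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(xs), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log-posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods are simply summed.
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical (RMS, kurtosis) feature pairs; the values are illustrative only.
xs = [(0.10, 3.0), (0.12, 3.1), (0.11, 2.9), (0.45, 6.5), (0.50, 7.0), (0.48, 6.8)]
ys = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]

model = fit_gaussian_nb(xs, ys)
print(predict_gaussian_nb(model, (0.46, 6.7)))
```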

Artificial neural network (ANN)
An artificial neural network is a computational model inspired by the human brain. It can be used for information processing tasks such as data classification and pattern recognition. An ANN is an interconnected assembly of simple processing elements, or nodes. In some ANN implementations the data are processed one neuron at a time, which results in slower computation. The advantages of ANN include good learning ability, noise reduction properties, and computational capability. Nevertheless, the successful execution of ANN-based diagnostics relies heavily on an appropriate choice of network structure and on a sufficient amount of training data, which is not always available in practice [72]- [74]. Some researchers [75] published an overview of the application of ANN in defect diagnostics and fault classification of REB. Two researchers [76] compared ANN and SVM in defect classification and FD of bearings and found that SVM has some advantages over ANN in classification accuracy. In 2003 B. Samanta and K. R. Al-Balushi [77] published an article on fault diagnostics of rolling element bearings using artificial neural networks with time-domain features. The characteristic features used as ANN inputs were extracted from time-domain vibration signals of normal and faulty bearings. In a study [78] the remaining useful life of slow-speed bearings was estimated by applying a multilayer ANN and a linear regression classifier to AE signals. However, the limitations of ANN, such as hardware dependency, the unexplained functioning of the network, and the difficulty of presenting a problem to the network, need to be considered when selecting ANN for fault classification of machine elements such as REB.
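Time-domain features of the kind fed to an ANN can be computed directly from the raw vibration samples. The sketch below is only illustrative: the text does not list the features used in [77], so the choice of RMS, peak, crest factor, and kurtosis, as well as the function name and the toy signals, are assumptions.

```python
import math

def time_domain_features(signal):
    """Return common time-domain statistics of a vibration signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    variance = sum((v - mean) ** 2 for v in signal) / n
    # Kurtosis is the fourth central moment normalized by the squared variance.
    kurtosis = sum((v - mean) ** 4 for v in signal) / (n * variance ** 2)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurtosis}

# Toy signals: a smooth alternating signal and one with a single impulse.
smooth = [1.0, -1.0, 1.0, -1.0]
impulsive = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]

print(time_domain_features(smooth))
print(time_domain_features(impulsive))
```

The impulsive signal yields a higher crest factor and kurtosis than the smooth one, which is why such statistics are commonly used as inputs to a fault classifier.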

Fuzzy logic system
Fuzzy logic is a promising decision-making tool for problems that involve imprecision and uncertainty. A classical set admits only two membership values, true (1) or false (0), whereas a fuzzy set admits any degree of membership between zero and one [79]. Fuzzy sets can be used as a fault classification tool in the defect diagnosis of REB. In 2019 F. Gougam et al. [80] suggested a hybrid technique for FD of bearings that combines the EWT with a fuzzy logic system. The proposed method helped them to detect early-stage faults under variable operating conditions. Two researchers [81] put forward a new method for FD of REB based on fuzzy sets, hierarchical entropy, and support vector machines. A study [82] on rotating machinery combined fuzzy logic with an adaptive filter technique to detect faults and assess their severity. However, a major drawback of fuzzy logic is that it depends entirely on human knowledge and expertise. Moreover, the fuzzy rules in the control system must be updated regularly.
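The difference between crisp and fuzzy membership can be made concrete with triangular membership functions. In this sketch the three fuzzy sets, their boundaries, and the normalized vibration level are hypothetical; a real system would define them from expert knowledge, as noted above.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over a normalized vibration level in [0, 1].
SETS = {
    "healthy": (-0.5, 0.0, 0.5),
    "incipient fault": (0.0, 0.5, 1.0),
    "severe fault": (0.5, 1.0, 1.5),
}

def fuzzify(level):
    """Return the degree of membership of `level` in every fuzzy set."""
    return {name: triangular(level, *abc) for name, abc in SETS.items()}

print(fuzzify(0.25))
```

A level of 0.25 belongs partly to "healthy" and partly to "incipient fault", whereas a classical set would force it into exactly one category.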

Particle swarm optimization (PSO)
Particle swarm optimization is a nature-inspired, evolutionary, and stochastic optimization technique used to address computationally difficult optimization problems [83]. Y. Cheng et al. [84] used the PSO algorithm in defect identification of REB to solve for the inverse filter in a deconvolution problem. A study [85] on FD of REB combined PSO as a feature selection method with SVM as a classification method. In 2016 C. Yi et al. [86] conducted experiments on the fault feature extraction of REB using PSO and variational mode decomposition. However, PSO has a low convergence rate in the iterative process, and it easily falls into local optima when dealing with high-dimensional problems. Moreover, it requires human knowledge and expertise.
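A basic global-best PSO loop is sketched below to show how the particles share information. It is a generic textbook variant, not the configuration used in [84]- [86]; the inertia and acceleration coefficients, the swarm size, and the sphere test function are assumptions.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus attraction to the personal and the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(p):
    """Simple convex test function with its minimum at the origin."""
    return sum(v * v for v in p)

best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

In a fault diagnosis setting the objective would instead score, for example, a candidate feature subset or a set of classifier parameters.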

Deep learning architectures
With the rise of the Internet of Things, intelligent manufacturing has shifted its attention to the collection of massive amounts of data known as big data [87]. Big data analytics is the practice of analyzing large amounts of data to find hidden patterns, unknown relationships, and other insights. Its main purpose is to assist people or machines in making intelligent decisions by evaluating big data streams from many sources [88]. Deep learning has been adopted to address the prognostic and diagnostic challenges posed by the vast, diverse, high-speed, and variable big data generated by industrial systems. PHM technology based on deep learning has been used to diagnose faults and assess the health of motors, gearboxes, bearings, and other mechanical components, and it has outperformed previous methods [89]- [92].

Table 8 (fragment). Method | Advantages | Disadvantages
PSO | It has a simple premise, is straightforward to apply, is robust to control parameters, and is computationally efficient | PSO has a low convergence rate in the iterative process, and it easily falls into local optima when dealing with high-dimensional problems
Deep learning | A separate feature extractor is not required for learning features and spotting flaws, and the deep architecture allows for learning more complicated structures from data | It necessitates a significant quantity of sample data, and it is expensive and slow to train because of the complex data models, etc.

The general methods of fault diagnostics of rolling element bearings were discussed in the preceding sections. This section goes over some of the most recent techniques in this field, drawing on research publications from the most recent four years. Although the benefits of the AI approaches used in fault classification were discussed in the previous part, this section covers the most powerful deep learning techniques in more detail. Deep learning is a subset of machine learning in which a model learns to perform classification tasks directly from inputs such as images, numbers, text, or voice. Deep learning is generally implemented with neural network architectures. The term "deep" refers to the number of layers in the network; the more layers, the deeper the network. Deep neural networks can have hundreds of layers, whereas conventional neural networks have just two or three.
Deep learning is state-of-the-art because of its classification accuracy. This level of accuracy is made possible by three factors: first, easier access to large amounts of labeled data; second, increased computing power; and third, expert-built models. Inspired by biological nervous systems, a deep neural network combines numerous nonlinear processing layers, using simple elements that operate in parallel. Its structure consists of an input layer, multiple hidden layers, and an output layer. Each hidden layer takes the output of the preceding layer as its input, and the layers are linked through nodes, or neurons. The following are some of the most recent deep learning algorithms used in the fault classification of rolling element bearings.

Deep belief networks (DBN)
A DBN is a generative graphical model made up of multiple layers of latent variables, usually binary, that can represent hidden characteristics of the input observations. As in a Restricted Boltzmann Machine (RBM), the connections between the top two layers of a DBN are undirected, so a DBN with one hidden layer is just an RBM. All of the remaining connections in a DBN are directed toward the input layer. J. Tao et al. [93] wrote an article on the fault detection of rolling element bearings on the basis of deep belief networks. In 2020, S. Liu et al. [94] suggested a fault identification procedure for REB based on an enhanced convolutional DBN.

Autoencoders
An autoencoder is, in its basic form, a three-layer neural network that uses its output layer to try to recreate its input. As a result, the output layer of an autoencoder has the same number of units as the input layer. An autoencoder has two parts: the encoder lies between the input layer and the hidden layer, and the decoder lies between the hidden layer and the output layer. During the encoding phase, the input samples are usually mapped to a lower-dimensional feature space. This procedure can be repeated until the required feature dimension is obtained. In the decoding phase, the process is reversed to regenerate the original features from the lower-dimensional features. In a study [95] an autoencoder-based FD of REB using a deep graph convolutional network (DGCN) was proposed. In this method, acoustic signals were represented as graphs and fed into the DGCN, where the features were extracted. In 2018 A. Prosvirin et al. [96] studied autoencoder-based FD of bearings by applying a convolutional neural network to a kurtogram representation. In this study, they transformed the one-dimensional AE signal into a two-dimensional kurtogram representation that allows the CNN to extract high-quality features. Based on an improved stacked autoencoder (SAE) and SVM, M. Cui et al. [97] suggested a solution for the FD of rolling bearings. In [98], an SAE model for bearing fault diagnostics is built with a dynamic learning rate, which effectively overcomes the drawbacks of a fixed learning rate.

Convolutional neural network (CNN)
Convolutional neural networks are the next, and the most well-known, deep architecture. The overall design of a CNN consists of an input layer, several alternating convolution and max-pooling layers, one fully connected layer, and one classification layer. The convolutional layers and the subsampling (pooling) layers allow the network to learn filters that are specific to particular regions of the data. The convolution layers help the network preserve the spatial arrangement of the pixels in an image, while the pooling layers let it summarize the pixel information. In 2017 L. H. Wang et al. [99] wrote an article on the application of a convolutional neural network to the fault diagnostics of motors. According to that paper, the advantages of CNN are that it is highly efficient in pattern recognition, does not require pre-treatment of the input images, and adapts easily to translation of the input images. In [100], the original signal is used as the input to a network structure based on singular value decomposition (SVD) and a one-dimensional CNN (1DCNN), which allows for intelligent identification of bearing faults. Using a two-dimensional convolutional neural network (2DCNN), X. Peng et al. [101] devised a method for diagnosing rolling bearing faults. For bearing fault diagnostics, A. Khorram et al. [102] presented an end-to-end deep learning technique that combines a CNN with an LSTM.
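The convolution and pooling stages can be illustrated on a one-dimensional signal, which is the form used by 1DCNN approaches. The sketch below is only a single hand-built stage: the kernel is fixed by hand for illustration, whereas a CNN learns its kernels during training, and the toy signal and function names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation, which is what CNN convolution layers compute."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Keep the largest value in each non-overlapping window of length `size`."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# Toy signal with two impulses, and a kernel that responds to rising edges.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]

feature_map = relu(conv1d(signal, edge_kernel))
features = max_pool1d(feature_map, size=2)
print(features)  # [1.0, 0.0, 1.0, 0.0]
```

The pooled output is half the length of the feature map but still marks where each impulse occurred, which is the summarizing role of the pooling layers described above.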

Recurrent neural networks (RNN)
RNNs can be viewed as feed-forward networks unrolled over many time steps. At any given time step, a network node receives the current data input together with hidden node values that carry information from prior time steps. RNNs are special in that they can operate on a sequence of vectors across time. Sequences can be used in the input, in the output, or, in the most common situation, in both. The three types of RNN most typically used in the FD of REB are Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and Bidirectional Recurrent Neural Networks (BRNN). A classic RNN has certain drawbacks, such as the possibility of vanishing or exploding gradients during backpropagation. LSTM is an upgraded form of the classic RNN in which input, output, and forget gates are added to solve these problems, so that it can retain historical information from the input data over long sequences. A gated recurrent unit (GRU) is a gating mechanism for recurrent neural networks that can be employed in its complete form or in several simplified variations. Because a GRU does not have an output gate, it has fewer parameters than an LSTM. Bidirectional RNNs predict or label every element of a finite sequence on the basis of both the past and the future context of that element. This is accomplished by concatenating the outputs of two RNNs, one processing the sequence from left to right and the other from right to left.
To address the shortcomings of existing fault detection methods, an end-to-end intelligent fault diagnosis procedure for bearings is suggested in [103], which combines a long short-term memory network (LSTM) with a one-dimensional CNN. X. Chen et al. [104] offered a neural network with automated feature learning that accepts raw vibration signals as inputs and employs two CNNs with different kernel sizes to capture signal properties at distinct frequencies. LSTM was then employed to determine the defect type from the learned features. Using a recurrent neural network, Z. An et al. [105] proposed a novel intelligent bearing fault diagnostic framework for time-varying operating conditions. Multi-sensor bearing defect diagnostics based on a 1D convolutional LSTM network were proposed in [106].
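The input, output, and forget gates of the LSTM can be written out for a single unit with scalar input. This is a minimal sketch of the standard LSTM update, not the architecture of any cited study; the parameter names and values are hypothetical and would normally be learned during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM with a scalar input."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c

# Hypothetical parameter values, chosen only for illustration.
params = {"wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wi": 0.8, "ui": 0.2, "bi": 0.0,
          "wo": 0.6, "uo": 0.3, "bo": 0.0,
          "wg": 1.0, "ug": 0.5, "bg": 0.0}

h, c = 0.0, 0.0
for x in [0.1, 0.9, -0.3, 0.2]:  # a short toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` is updated additively through the forget and input gates, which is the mechanism that mitigates the gradient problems of the classic RNN.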

Transfer learning
Transfer learning is a concept in deep learning that involves taking the knowledge gained while addressing one problem and applying it to a different but similar problem. For instance, the skills learned while learning to distinguish cars can be applied, to some extent, to recognizing trucks. This is an exciting development in the field of deep neural networks. There are two phases. The first is pre-training, in which a network is trained with a large quantity of data so that the model learns its weights and biases. The second is fine-tuning, in which these weights are transferred to another network for testing or for training a similar new model. In this way the network can start from pre-trained weights rather than from scratch. Z. Wang et al. [107] proposed a method for the fault detection of REB with the help of transfer learning techniques. Other researchers likewise proposed a method for the fault detection of bearings based on the transfer learning approach [108].
The preceding subsections discussed several types of deep learning methods. Deep Feedforward Networks, Restricted Boltzmann Machines, Generative Adversarial Networks, Deep Reinforcement Learning, and Bayesian Deep Learning are some of the other deep learning methods available. However, this article covers only the methods used for fault classification of rolling element bearings. Deep learning algorithms perform better in the modeling of high-level data processing [109]. However, deep learning still has some drawbacks: it necessitates a significant quantity of sample data, and it is expensive and slow to train because of the complex data models.

Summary and discussion
The rolling element bearing is an indispensable component in every machine, so its fault diagnosis is very important. Data acquisition, signal processing, and fault classification are the three main components of the fault diagnostic technique. Some of the approaches used in signal processing and fault classification are described in this paper. New developments in this field emerge every day, so it is nearly impossible to write a summary article that covers all of the published papers. However, practically all of the most common methods are covered. The advantages and disadvantages of each approach are listed in the tables, making it easy for readers to understand the various methods.
Until about a decade ago, any one of the above-mentioned approaches was typically used on its own to diagnose and classify most faults. However, the current trend is to combine multiple methods to achieve superior fault classification results. Artificial Intelligence (AI) is also becoming more prevalent in this field. With the advancement of artificial intelligence, it is now possible to detect faults ahead of time without human involvement. Table 8 summarizes the benefits and drawbacks of the most generally used artificial intelligence approaches for defect identification. Today, researchers are attempting to learn more about the potential of deep learning and how to apply it to improve defect diagnostics. In addition, Table 9 summarizes the work of numerous researchers, including the main objectives, the techniques used, and the main findings.
The information for this article was gathered from research and review papers published in peer-reviewed publications. A review of over a hundred research papers reveals that there are still some gaps in this field. The majority of papers deal with a single fault, with only a few dealing with multiple faults. It is also worth noting that many articles clearly indicate the presence of a fault but do not specify its exact size or width. Likewise, among the AI-based methods, deep learning has numerous advantages, but its computational complexity remains a challenge. To address these issues, new methods must be introduced into the field.

Table 9 (fragment). Reference | Objective | Technique | Main findings
(preceding row, truncated) | | | Easy to implement, require a smaller number of data samples
[139] | Bearing fault diagnosis | Fuzzy logic + multiscale permutation entropy | Better performance, better accuracy
[35], [110] | Fault diagnosis of bearings | EMD + Hilbert Transform | Better output in analyzing various frequency ranges
[112], [113] | Early detection of defects and diagnostic monitoring in REB | High-frequency resonance technique | Suitable to detect both inner race and outer race faults at incipient stages

Conclusions
An attempt has been made in this work to summarize the various vibration analysis methodologies for fault diagnosis of rolling element bearings. New techniques emerge in this discipline every day, so writing a review paper that covers all of the methods is nearly impossible. This article summarized over a hundred peer-reviewed research publications that explained how to detect faults in rolling element bearings using time-domain, frequency-domain, time-frequency domain, cyclo-stationary analysis, and artificial intelligence-based methodologies. According to the published articles, RMS in the time domain, FFT in the frequency domain, and WT in the time-frequency domain are the most often used techniques in REB fault diagnostics. Similarly, among the AI-based approaches, SVM is the most often used methodology. However, more studies on deep learning have recently been published. The advantages and disadvantages of the aforementioned methods are outlined in separate tables in this article. This paper is therefore valuable for research students and for anyone who wants to learn more about vibration analysis methodologies for rolling element bearing fault diagnosis.