Research on road damage recognition and classification based on improved VGG-19

In recent years, methods for road damage detection, recognition and classification have achieved remarkable results, but detecting, recognizing and classifying damage both efficiently and accurately remains a problem. To address it, this paper proposes a method for constructing an improved VGG-19 model for road damage detection. Road damage images are first processed with digital image processing (DIP) techniques and then fed to an improved VGG-19 network, with the aim of raising both the recognition speed and the accuracy of the VGG-19 road damage model. The feasibility of the improved method is verified with standard performance evaluation indices for neural network models. Compared with the traditional VGG-19 model, the proposed Road damage VGG-19 recognition model shortens the training time by 79 % and the average test time by 68 %, and it improves the comprehensive performance index by 2.4 %. The research helps improve the performance of VGG-19 road damage identification models and their fit to road damage data.


Introduction
Whether a road is concrete or asphalt, various damages, deformations and other defects appear one after another after a period of service [1]. The initiation and extension of road damage reduce the performance and service life of the road [2]. Road damage can also cause serious traffic jams and accidents, with adverse social impacts. Efficient and accurate recognition, detection and classification of road damage is therefore a focus of research [3].
With the continuous development of China's economy, highway construction is becoming increasingly complex [4]. Road engineering plays an important role in national economic construction [5]. The amount of modern highway construction has risen sharply, and road structure systems have been continuously improved. At the same time, the identification and detection of road damage face major challenges. Although road damage recognition methods are widely used, most of them rely on manual visual inspection, which suffers from low detection accuracy, long time consumption and high risk. With advances in science and technology, road damage detection based on machine vision has gradually emerged. It meets the needs of most road damage identification and detection tasks, collects data on various road damages intelligently and automatically, and greatly improves the speed and accuracy of detection. Machine-vision-based detection is about 40 times as efficient as manual visual inspection, with far higher detection rates and accuracy [6][7][8][9][10].
Deep Learning (DL) is a new research direction within Machine Learning (ML), introduced to bring machine learning closer to its original goal, Artificial Intelligence (AI) [11]. With the rapid development of deep learning theory, AlexNet achieved a breakthrough in large-scale image classification [12]. Since then, Convolutional Neural Networks (CNN) and algorithms developed around large-scale visual recognition challenges have been studied in depth, and great progress has been made in image classification, object detection and semantic segmentation [13]. Deep-learning-based object detection has been widely used in biomedicine, transportation, safety, civil engineering and other fields [14]. The classification and detection of road damage has long been an active field of civil engineering research [15]. Fan R., Bocus M. J. et al. were the first researchers to use convolutional neural networks for road damage detection and recognition [16]. Taichi Yamada developed an autonomous mobile robot that uses a 2D laser scanner to acquire road damage information and can measure the depth of road damage [17]. With advances in imaging technology and the continuous improvement of photo resolution, image-based road damage recognition has gradually become mainstream [18]. Allen Zhang et al. proposed CrackNet, an efficient CNN-based architecture that performs pixel-level automatic crack detection on 3D asphalt pavement [19]. Singh, Janpreet et al. proposed a method based on Mask R-CNN (Mask Region-based Convolutional Neural Network) [20].
At present, deep-learning-based recognition and detection of road damage has achieved remarkable results, but it mainly targets cracks and ignores how road damage extends and changes form [21]. If cracks in the road surface are not treated in time, they cause structural damage, decrease the bearing capacity of the pavement, accelerate local or patchy damage, and evolve into other types of damage such as potholes and looseness [22]. Because only cracks are considered in damage identification and other damage types are ignored, deep learning models suffer from over-fitting and false detection, resulting in low accuracy and slow speed [23]. Among neural networks, VGG-19 (Visual Geometry Group-19) handles image classification well owing to its structure, and in recent years it has achieved remarkable results in structural damage identification in fields such as medicine, civil engineering and computing [24][25][26]. However, few studies apply it to pavement damage identification. The VGG-19 network uses small convolution kernels and small pooling kernels; it has a simple structure, many channels, deep layers and wide feature maps, and it is mostly used for image recognition and classification [27]. However, because it requires substantial computing resources and has many parameters, VGG-19 is time-consuming and can be imprecise in recognition, detection and classification. In addition, current road damage identification still lacks samples, which leads to insufficient training and inaccurate test results when VGG-19 is trained and tested.
This paper therefore studies a road damage recognition and classification method based on an improved VGG-19. Road damage image samples are obtained from online datasets and by camera. Digital Image Processing (DIP) techniques such as Region of Interest (ROI) extraction and Histogram Equalization (HE) are used to remove noise caused by environmental factors and to obtain grayscale road damage images. With a limited number of labeled samples, a VGG-19 network model for identifying and classifying various road damages is built in MATLAB R2022b. The performance indicators of each network model are compiled and analyzed with statistical methods, and experimental comparison yields a VGG-19 network model that trains, identifies and classifies road damage accurately and efficiently.

Road damage image processing based on improved VGG-19
To identify road damage accurately from a small number of samples, the VGG-19 network model is constructed in MATLAB. Before the model is built, the acquired road damage images must be processed to make their illumination uniform, reduce noise, eliminate irrelevant information, recover useful information, enhance the detectability of relevant features and simplify the data as far as possible, so that the resulting samples carry the information needed to identify road damage [28].

Road damage image
The road images used in this study cover cracks, potholes, rutting, looseness and normal roads; a total of 1051 road damage images were obtained. The road crack images come from the public CrackForest road damage dataset; pothole and normal images come from public datasets on CSDN; rut and looseness images come from the road damage dataset on ExtremeMart. To expand the training set, a further 150 road damage images were taken with a camera, forming a limited road damage image sample library. Fig. 1 shows randomly selected non-damaged and damaged images of each type from the library. The pavement damage in Fig. 1 has low contrast, the characteristic texture features of each damage type are not obvious, and the images are strongly disturbed by environmental noise such as vehicles and illumination. The road damage images therefore need denoising and enhancement so that they better fit the VGG-19 network model.

Road damage image processing
Directly acquired pavement damage images are of poor quality, and it is difficult for the network model to capture their feature information [29]. To make the images better suited to the VGG-19 network model, eliminate redundant noise and enhance the feature information useful for training, the images are processed with gray level change (GLC), region of interest extraction (ROI), normalization (Normalize), histogram equalization (HE) and median filtering (MF). The result is pavement damage images with high recognizability, high contrast and obvious texture features, which improves the efficiency of VGG-19 image classification and recognition. The image processing flow used in this study is shown in Fig. 2.

Gray level change
The gray level transformation of an image changes the gray value of each pixel of the source image, point by point, according to a transformation chosen to meet a target condition. Its purpose is to improve image quality and make the image display clearer. This study uses gray-scale linear transformation, one kind of gray-scale transformation: a gray-scale mapping adjusts the gray levels of the source image and thereby enhances it. The principle is to strengthen or weaken the gray levels by passing each pixel value through a specified linear function. The linear gray-scale transformation is an ordinary one-dimensional linear function:

$$y = kx + b \tag{1}$$

Let $f$ be the original gray value; the transformed gray value $g$ is then

$$g = kf + b \tag{2}$$

where $k$ is the slope of the line and $b$ is its intercept on the $y$-axis. Different values of $k$ and $b$ have different effects: when $k > 1$, the contrast of the image is enhanced, because differences between pixel values are stretched; when $k = 1$, only the brightness changes, and it is adjusted through $b$; when $0 < k < 1$, the overall contrast of the image is weakened; and when $k < 0$, bright areas of the original image become dark and dark areas become bright.
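The linear gray-scale mapping g = k·f + b (slope k, intercept b) can be illustrated with a minimal Python sketch that operates on an 8-bit grayscale image stored as a list of rows. Rounding to the nearest integer and clipping the result to [0, 255] are assumptions of this sketch, since the text does not say how out-of-range values are handled; the function name `linear_gray_transform` is hypothetical.

```python
def linear_gray_transform(image, k, b):
    """Apply g = k * f + b to every pixel of an 8-bit grayscale image.

    image: list of rows, each row a list of ints in [0, 255].
    Results are rounded and clipped to [0, 255] (assumption of this sketch).
    """
    result = []
    for row in image:
        new_row = []
        for f in row:
            g = int(round(k * f + b))
            new_row.append(max(0, min(255, g)))  # keep the 8-bit range
        result.append(new_row)
    return result


img = [[10, 100], [200, 250]]
print(linear_gray_transform(img, 1.5, 0))   # k > 1: contrast stretched
print(linear_gray_transform(img, 1, 20))    # k = 1: brightness shift only
print(linear_gray_transform(img, -1, 255))  # k < 0: negative image
```

With k = 1.5 the two brightest pixels saturate at 255, which shows why clipping, or a carefully chosen k, matters in practice.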

Region of interest extraction
After a road damage image is converted to grayscale, the part of the image containing the texture features of the damage is set as the region of interest (ROI). ROI cropping is a critical step in image processing: it reduces the amount of data in subsequent processing while eliminating some noise. A 2440×2600 window was selected to crop the ROI of the five types of road images.
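ROI cropping amounts to slicing a fixed window out of the image array. The sketch below assumes the window is centered when no offset is given, because the text states only the window size; the function name `crop_roi` and its parameters are hypothetical.

```python
def crop_roi(image, height, width, top=None, left=None):
    """Return a height x width window cut from a 2-D image (list of rows).

    When top/left are omitted the window is centred on the image; this
    placement is an assumption, since the paper only gives the window size.
    """
    rows, cols = len(image), len(image[0])
    if height > rows or width > cols:
        raise ValueError("ROI window is larger than the image")
    if top is None:
        top = (rows - height) // 2
    if left is None:
        left = (cols - width) // 2
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 4 x 5 image whose pixel value encodes its position (10 * row + column).
toy = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop_roi(toy, 2, 3))  # centred 2 x 3 window
```

For the paper's window one would call `crop_roi(image, 2440, 2600)` or `crop_roi(image, 2600, 2440)`, depending on which dimension is the height, which the text does not state.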

Road damage image normalization
Image normalization is a necessary preprocessing step, and it serves three purposes here. First, without normalization the image size does not match the input layer size of the VGG-19 network model, so the model cannot be trained. Second, if the input values are large, the gradients propagated back to the input layer also become large; large gradients require a smaller learning rate, otherwise the optimization overshoots the optimum. The choice of learning rate would then depend on the scale of the input values, whereas normalized data make the learning rate easy to select and the model easier to train. Third, normalization removes the average brightness of the image: subtracting the statistical mean of each image removes the common component and highlights individual differences.
Based on a common image normalization principle, the road damage images are normalized (Normalize) in batch, as shown in Eq. (3):

$$x' = \frac{x - \text{MinValue}}{\text{MaxValue} - \text{MinValue}} \tag{3}$$

where $x$ and $x'$ are the gray values before and after normalization, and MaxValue and MinValue are the maximum and minimum gray values of the original image. The images are then resized to 224×224 to match the input layer of the network model.
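The min-max normalization of Eq. (3) can be sketched in a few lines of Python, with MinValue and MaxValue taken from the image itself. Mapping a constant image to all zeros is an assumption of this sketch that avoids division by zero, and the function name is hypothetical; resizing to the network input size is a separate step not shown here.

```python
def min_max_normalize(image):
    """Scale gray values to [0, 1] with x' = (x - MinValue) / (MaxValue - MinValue).

    MinValue and MaxValue are taken from the image itself. A constant image
    (MaxValue == MinValue) is mapped to all zeros (assumption of this sketch).
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


print(min_max_normalize([[50, 100], [150, 250]]))
```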

Histogram equalization
The histogram plays an important role in image processing. The gray histogram of an image describes its gray-level distribution, showing directly how many pixels take each gray level. Histogram equalization (HE) enhances image contrast: it uses the cumulative distribution function to transform the histogram of an image into an approximately uniform distribution. To extend the brightness range of the original image, a mapping function is needed that spreads the original pixel values evenly over the new histogram. This mapping must satisfy two conditions: first, it must preserve the order of the original pixel values, so that the relative brightness of light and dark pixels does not change; second, it must stay within the original range, that is, its output must lie between 0 and 255.
Because an image consists of discrete pixels, histogram equalization is computed with the discrete cumulative distribution function. The mapping is given by Eq. (4):

$$s_k = (L - 1)\sum_{j=0}^{k} \frac{n_j}{N}, \quad k = 0, 1, \dots, L - 1 \tag{4}$$

where $s_k$ is the value of gray level $k$ after mapping by the cumulative distribution function, $N$ is the total number of pixels in the image, $n_j$ is the number of pixels at gray level $j$, and $L$ is the total number of gray levels in the image.
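The discrete mapping of Eq. (4) can be sketched directly in Python: count the pixels at each gray level, accumulate the counts, and scale the running fraction by L − 1. Rounding half up to the nearest integer is an assumption of this sketch, and the function name is hypothetical; it is not meant to reproduce MATLAB's own equalization routine.

```python
def histogram_equalize(image, levels=256):
    """Histogram equalization via the discrete CDF: s_k = (L - 1) * sum_{j<=k} n_j / N.

    image: list of rows of ints in [0, levels - 1].
    The mapped value is rounded half up to the nearest integer (assumption).
    """
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1                      # n_k: pixels at gray level k
    n_total = sum(hist)                       # N: total number of pixels
    mapping, cumulative = [], 0
    for n_k in hist:
        cumulative += n_k                     # running sum of n_j for j <= k
        mapping.append(int((levels - 1) * cumulative / n_total + 0.5))
    return [[mapping[p] for p in row] for row in image]


# A low-contrast patch that uses only gray levels 100 to 102.
print(histogram_equalize([[100, 100], [101, 102]]))
```

Because the cumulative sum never decreases and its final value equals N, the mapping preserves the order of pixel values and stays within [0, L − 1], which are exactly the two conditions stated above.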

Median filtering
Median filtering (MF) is a nonlinear smoothing technique based on order statistics that effectively suppresses noise. Its basic principle is to replace the value of a point in a digital image or sequence with the median of the values in a neighborhood of that point, so that the pixel value approaches the true value and isolated noise points are eliminated.
Using MATLAB's two-dimensional median filtering function (medfilt2), the road damage images are median filtered (MF) through edge padding, salt-and-pepper noise addition and data conversion. For a one-dimensional sequence $f_1, f_2, \dots, f_n$ and a window of odd length $m$, median filtering takes $m$ consecutive values from the input sequence, sorts them by size, and outputs the value at the center position. The principle is shown in Eq. (5):

$$y_i = \operatorname{Med}\{f_{i-v}, \dots, f_i, \dots, f_{i+v}\}, \quad v = \frac{m - 1}{2} \tag{5}$$

Taking the road crack image as an example, the crack image after processing is shown in Fig. 3.
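Eq. (5) can be illustrated with a short Python sketch of the one-dimensional median filter. Padding the ends of the sequence by repeating the first and last values is an assumption, since the equation does not define the borders; the function name is hypothetical.

```python
def median_filter_1d(seq, m):
    """One-dimensional median filter: y_i = Med{f_(i-v), ..., f_(i+v)}, v = (m - 1) / 2.

    The sequence ends are padded by repeating the first and last values; this
    border rule is an assumption, since Eq. (5) does not define the edges.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("window length m must be a positive odd integer")
    v = (m - 1) // 2
    padded = [seq[0]] * v + list(seq) + [seq[-1]] * v
    out = []
    for i in range(len(seq)):
        window = sorted(padded[i:i + m])  # the m values centred on f_i
        out.append(window[v])             # value at the centre position
    return out


# Gray values along one image row with two impulse ("salt" and "pepper") pixels.
row = [10, 12, 255, 11, 13, 0, 12]
print(median_filter_1d(row, 3))  # [10, 12, 12, 13, 11, 12, 12]
```

Both impulses (255 and 0) disappear while the surrounding values are barely changed. A two-dimensional filter such as medfilt2 applies the same idea over a square neighborhood, 3×3 by default.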

Fig. 3. Final image of road cracks after image processing
Fig. 3 shows that after gray level change (GLC), region of interest extraction (ROI), normalization (Normalize), histogram equalization (HE) and median filtering (MF), the crack image has uniform illumination, the contrast of interfering noise is significantly reduced, and the contrast of the crack is significantly increased. Images processed in this way are well suited to training and recognition with the neural network model, and together they form the road damage image dataset used by the model.

Improved methods of VGG-19 model
Among convolutional neural networks, VGG-19 is often used for object recognition. Each layer of VGG-19 uses the output of the previous layer to extract progressively more complex features until they are complex enough to identify objects, so each layer can be regarded as a set of local feature extractors. However, VGG-19 still has shortcomings in road damage identification: the ordinary VGG-19 model often suffers from slow training and testing, low training accuracy and over-fitting. To solve these problems, and taking the characteristics of road damage images into account, an improved model based on the VGG-19 structure is proposed: Road damage VGG-19.
The Road damage VGG-19 model is built by adjusting the batch size and adding batch normalization layers. The batch size is adjusted because it is the most important factor affecting the training speed and test accuracy of the VGG-19 model. Batch size is the number of samples used in one training iteration; it is a hyperparameter set before training, usually to a power of 2. It affects how well and how fast the model is optimized, and it directly determines GPU memory use. A batch size that is too large or too small harms the model, so an appropriate value must be chosen to balance training speed against memory capacity. During training, as the network deepens, the distribution of each layer's input values shifts toward the upper and lower ends of the value range, which makes the gradients of the lower layers vanish during back propagation; this is an important reason why deep networks converge slowly. Batch normalization standardizes the inputs of each layer back to a distribution with mean 0 and variance 1. This accelerates the convergence of training, so that the model detects and identifies road damage images faster and more accurately with a small number of labeled samples. Adding batch normalization layers yields the Road damage VGG-19 network model, whose structure is shown in Fig. 4.
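The standardization described above can be sketched for a single feature over one mini-batch. Following the usual batch normalization formulation, the normalized value is additionally scaled by a learnable γ (`gamma`) and shifted by a learnable β (`beta`); with γ = 1 and β = 0 the output has mean 0 and variance close to 1. The constant `eps` and the function name are assumptions of this sketch, and real implementations replace the batch statistics with running averages at inference time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize one feature over a mini-batch.

    x_hat = (x - mean) / sqrt(var + eps);  y = gamma * x_hat + beta
    gamma and beta are the learnable scale and shift of a real BN layer.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # batch (biased) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Activations drifting towards large values before normalization.
acts = [2.0, 4.0, 6.0, 8.0]
out = batch_norm(acts)
mean_out = sum(out) / len(out)
var_out = sum((y - mean_out) ** 2 for y in out) / len(out)
print(mean_out, var_out)
```

The small `eps` keeps the division stable when a batch has near-zero variance, which is why the output variance comes out marginally below 1.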
As Fig. 4 shows, the road damage image enters the VGG-19 input layer as input data. After five groups of convolution layers, each followed by a pooling layer, together with data normalization and the fully connected layers, the network recognizes the image as a crack image, and the final output layer outputs it as a road crack image.
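For reference, the layer layout of the standard VGG-19 can be written as a configuration list: 16 convolution layers with 3×3 kernels arranged in five groups, each group closed by a 2×2 max-pooling layer, followed by three fully connected layers, giving 19 weight layers in total. The sketch below, whose names are hypothetical, counts those layers and the spatial size left after the five pooling stages for a 224×224 input. The last fully connected layer is set to 5 outputs to match the five road classes (the ImageNet original has 1000). It describes the standard network only and does not reproduce the exact batch normalization positions of Fig. 4.

```python
# Standard VGG-19 convolutional part: integers are the output channels of
# 3x3 convolutions (stride 1, padding 1, so height and width are unchanged);
# "M" marks a 2x2 max-pooling layer with stride 2.
VGG19_FEATURES = [64, 64, "M",
                  128, 128, "M",
                  256, 256, 256, 256, "M",
                  512, 512, 512, 512, "M",
                  512, 512, 512, 512, "M"]
NUM_CLASSES = 5  # crack, pothole, rutting, looseness, normal
VGG19_CLASSIFIER = [4096, 4096, NUM_CLASSES]  # fully connected layer widths


def count_layers(features, classifier):
    """Return (convolution layers, pooling layers, fully connected layers)."""
    conv = sum(1 for item in features if item != "M")
    pool = sum(1 for item in features if item == "M")
    return conv, pool, len(classifier)


def output_spatial_size(input_size, features):
    """Height/width of the feature map after the convolutional part."""
    size = input_size
    for item in features:
        if item == "M":
            size //= 2  # each pooling layer halves the spatial size
    return size


conv, pool, fc = count_layers(VGG19_FEATURES, VGG19_CLASSIFIER)
print(conv, pool, fc, conv + fc)                  # 16 5 3 19
print(output_spatial_size(224, VGG19_FEATURES))   # 7
```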
Starting from the original VGG-19 convolutional neural network and using the same hardware, the model structure is modified by adjusting the batch size and adding batch normalization layers to improve training speed and validation accuracy. Adjusting the batch size can significantly improve the speed and validation accuracy of road damage image recognition. In addition, as the training depth increases, the distribution of the input values shifts toward the upper and lower ends of the value range; adding batch normalization layers forces the distribution of each layer's inputs back to a standard normal distribution with mean 0 and variance 1. The data in Table 1 show that Model (5) trains far more efficiently than the other models and achieves the highest validation accuracy on road damage images, so its ability to recognize road damage images exceeds that of the other models. Model (5) is therefore adopted as the VGG-19 network model of this study and named the Road damage VGG-19 road damage identification model. Because the model recognizes five road categories, its final output size is set to 5.

Improved VGG-19 network model training
The configuration of the computing device, namely its CPU and GPU, is one of the main factors affecting training time. Experimental environment: CPU: AMD Ryzen 7 6800H with Radeon Graphics, 3.20 GHz; GPU: NVIDIA GeForce RTX 3060 Laptop GPU. Training was completed in MATLAB R2022b using its deep learning framework.
To improve training speed and validation accuracy, and to strengthen the model's ability to fit and recognize road damage images, the training parameters of Models (1-5) are uniformly set to a mini-batch size (MiniBatchSize) of 64, an initial learning rate (InitialLearnRate) of 0.0001 and a validation frequency (ValidationFrequency) of 30. The mini-batch size divides the road damage image sample set into equal subsets, and training on one subset at a time speeds up training. A reasonable initial learning rate lets the model converge to a local minimum in a suitable time. The validation frequency sets how often, in iterations, the model is evaluated on the validation set; setting it reasonably makes it possible to detect and prevent over-fitting while the model is trained on road damage images.
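The roles of the mini-batch size and the validation frequency can be illustrated with a short sketch that shuffles sample indices, splits them into mini-batches of 64, and lists the iterations at which validation would run when it happens every 30 iterations. The training-set size of 960 images and the 10 epochs in the example are hypothetical, because the text reports neither the data split nor the number of epochs. The sketch keeps a final, smaller batch when the sizes do not divide evenly; deep learning frameworks differ in how they treat that case.

```python
import math
import random


def make_minibatches(num_samples, batch_size, seed=0):
    """Shuffle sample indices once and split them into consecutive mini-batches."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


def validation_iterations(num_samples, batch_size, epochs, validation_frequency):
    """Iteration numbers at which the model is checked on the validation set."""
    iterations_per_epoch = math.ceil(num_samples / batch_size)
    total_iterations = iterations_per_epoch * epochs
    return list(range(validation_frequency, total_iterations + 1, validation_frequency))


batches = make_minibatches(960, 64)              # hypothetical training-set size
print(len(batches), {len(b) for b in batches})   # 15 {64}
print(validation_iterations(960, 64, 10, 30))    # [30, 60, 90, 120, 150]
```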
Apart from these common training parameters, Models (1-5) differ fundamentally only in whether batch normalization layers are added and where they are placed, which determines training time and accuracy. The specific settings are given in Table 1 in Section 2.1.
After the VGG-19 network model training is completed, the trained model parameters are saved for subsequent model testing.

Improved VGG-19 network model testing
In the MATLAB R2022b test environment, the trained parameters of Models (1-5) are loaded and the models are run on the test set to obtain their classification and recognition accuracy. Cracks, potholes, ruts, looseness and normal roads in the test set are identified in turn, and the average time and accuracy of each identification test are recorded to form the test comparison of Models (1-5) in Table 2. Table 2 shows that the average test time of Models (1-4) exceeds 0.45 s, whereas that of Model (5), the Road damage VGG-19 road damage identification model, is only 0.19 s, the shortest of all models. The test accuracy of the Road damage VGG-19 model is 98 %, the second highest among the tested models. These results are consistent with the aims of the modification: it improves model accuracy and reduces test time.
It can be concluded that the efficiency, validation accuracy and average test time of Model (5), the Road damage VGG-19 road damage recognition model, are better than those of the other models.

Construction of network model performance evaluation index
To verify the effectiveness of the network models, the feasibility of Models (1-5) is evaluated after training and testing. Accuracy, Sensitivity, Specificity, Precision and F1 score are used as performance evaluation indicators. They depend on the true positive (TP), false negative (FN), true negative (TN) and false positive (FP) counts, which are obtained from the confusion matrices of the five models. The indicators are defined as follows.

Accuracy is the proportion of correctly predicted samples among all test samples, reflecting the overall recognition accuracy of the road damage recognition model:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

Sensitivity, also known as recall, is the proportion of positive samples that are correctly predicted, reflecting the model's ability to recognize road damage images:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$

Specificity is the corresponding measure for negative samples, the proportion of negative samples that are correctly predicted:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{8}$$

Precision is the proportion of true positives among all samples predicted as positive:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

F1 score is the harmonic mean of precision and recall. It ranges from 0 to 1, is often used as the final evaluation measure in classification competitions, and is commonly used to summarize the performance of road damage recognition models:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{10}$$
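Eqs. (6-10) translate directly into code. The sketch below computes all five indicators from the four first-level counts; returning 0.0 when a denominator is zero is an assumption of this sketch, the function names are hypothetical, and the counts in the example are made up for illustration rather than taken from the paper.

```python
def _ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, precision and F1 from TP, FN, TN, FP."""
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)                   # Eq. (6)
    sensitivity = _ratio(tp, tp + fn)                               # Eq. (7), recall
    specificity = _ratio(tn, tn + fp)                               # Eq. (8)
    precision = _ratio(tp, tp + fp)                                 # Eq. (9)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)  # Eq. (10)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical counts for illustration only (not taken from the paper).
for name, value in evaluation_metrics(tp=90, fn=10, tn=85, fp=15).items():
    print(f"{name}: {value:.4f}")
```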
A classification confusion matrix is essentially a table. By tabulating the samples, it shows directly how the predicted positives and negatives for road damage images relate to the real situation [30]. The classification results of the model are obtained from the sample data, and the positives and negatives in the data are counted, giving four basic indicators, collectively called first-level indicators. Arranging these four indicators in a table gives the classification confusion matrix, which defines TP, FN, TN and FP in terms of True/False and Positive/Negative. For a road damage recognition model, the confusion matrix records how many road damage images were classified correctly and how many were misclassified. The classification confusion matrices of Models (1-5) are shown in Fig. 5.
Fig. 5 shows that the network models have five output classes, so the classification confusion matrices of Models (1-5) are multi-class confusion matrices. Each matrix contains a real class, representing the true label, and a prediction class, representing the predicted label. Serial numbers 1-5 denote crack, pothole, rutting, looseness and normal road images respectively. Taking the confusion matrix of Model (5) as an example, the blue numbers in the 5×5 matrix are the numbers of correctly identified images of each type. The first column of the matrix corresponds to images whose real label is road crack: 292 of them are identified as road cracks, 6 as potholes, 2 as rutting, 3 as looseness and 5 as normal road. The blue percentage below the first column is the proportion of real crack images that are correctly predicted, and the yellow percentage is the proportion of real crack images incorrectly predicted as other types. From the data in each matrix, TP, FN, TN and FP can be calculated, and from them the evaluation indexes. Fig. 6 shows that the accuracy, sensitivity, precision and F1 score of Model (5), the Road damage VGG-19 road damage recognition model, are higher than those of the other four models, although its specificity is lower than that of Model (4). Overall, modifying the parameters and structure of the VGG-19 network model improves its performance.
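Because the matrices in Fig. 5 have five classes, TP, FN, TN and FP must be read one class at a time (one-vs-rest). The Python sketch below does this for a square matrix, assuming, as the description of Fig. 5 suggests, that each column holds one true class and each row one predicted class; the function name is hypothetical. The 3×3 matrix is made up for illustration. The only real numbers used are those quoted for the first column of Model (5), which are enough to give the crack-class sensitivity (292 of 308 true crack images, about 94.8 %); FP and TN for the crack class would also need the first row of that matrix, which the text does not list.

```python
def one_vs_rest_counts(cm, k):
    """TP, FN, TN, FP for class k of a square multi-class confusion matrix.

    Convention assumed here (matching the description of Fig. 5): cm[i][j] is
    the number of images whose true class is j and whose predicted class is i,
    so each column holds one true class.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[i][k] for i in range(n)) - tp  # true class k, predicted as another class
    fp = sum(cm[k][j] for j in range(n)) - tp  # predicted as k, true class is another
    tn = total - tp - fn - fp
    return tp, fn, tn, fp


# Hypothetical 3-class matrix, for illustration only.
toy_cm = [[5, 1, 0],
          [2, 6, 1],
          [0, 0, 7]]
print(one_vs_rest_counts(toy_cm, 0))  # (5, 2, 14, 1)

# First column of Model (5) in Fig. 5 (true class: road crack).
crack_column = [292, 6, 2, 3, 5]
tp = crack_column[0]
fn = sum(crack_column) - tp
print(tp, fn, round(tp / (tp + fn), 4))  # 292 16 0.9481
```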

Conclusions
This research discussed the problems in current road damage detection and recognition, which may stem both from the models and from the images themselves. It first used digital image processing techniques such as grayscale conversion, ROI cropping, histogram equalization and median filtering to remove environmental noise from road damage images and obtain images with high contrast and clear targets. It then optimized the parameters and structure of the network model, proposing five VGG-19 road damage identification schemes, which were trained and tested in turn. The Road damage VGG-19 road damage identification model, with the shortest average test time (0.19 s) and a test accuracy of 98 %, was selected as the final model. Finally, starting from the first-level evaluation indexes of the neural network models, the second- and third-level performance indexes were calculated and the comprehensive performance of the five models was evaluated.
The verification analysis shows that the accuracy, sensitivity, precision and F1 score of the proposed Road damage VGG-19 road damage recognition model are higher than those of the other models in the comprehensive performance evaluation, and its comprehensive performance is 2.4 % higher than that of the traditional VGG-19 network model. This shows that the performance of a road damage recognition model can be improved by modifying the parameters and structure of the VGG-19 neural network.
Batch normalization thus accelerates the convergence of model training and makes the training process more stable. Five schemes were designed: (1) the traditional VGG-19 model; (2) the traditional VGG-19 model with a batch normalization layer added at the end of its structure; (3) the traditional VGG-19 model with a batch normalization layer added before each activation function; (4) the traditional VGG-19 model with only two batch normalization layers added, placed before the last two fully connected layers, and the batch size changed to 64; (5) the traditional VGG-19 model modified to a 19-layer structure, the same as Model (4), with two batch normalization layers added at the same positions and a batch size of 64. The training information of the improved VGG-19 network models is shown in Table 1.

Fig. 5. Classification confusion matrix

Performance comparison of road damage recognition network models
The values of TP, FN, TN and FP are calculated from the classification confusion matrix in Fig. 5; all four relate to the model's ability to recognize road damage images. The evaluation indexes are then calculated by combining Eqs. (6-10) in Section 3.1 with these values. Accuracy represents the recognition accuracy of the model over all classes, whereas the other indicators describe performance on a single class; this study therefore computes the other indicators for the road crack class [30]. In this evaluation, TP, FN, TN and FP are first-level indicators; Accuracy, Sensitivity, Specificity and Precision, calculated from them, are second-level indicators; and F1 score, derived from the second-level indicators, is the third-level indicator. The comparison of all evaluation indexes is shown in Fig. 6.

Fig. 6. Comprehensive comparison chart of neural network model performance

Table 1. Improved VGG-19 network model training information (columns: model serial number; training time; validation accuracy)

Table 2. Improved VGG-19 network model test information (columns: model serial number; average test time; accuracy)