Indoor and outdoor multi-source 3D data fusion method for ancient buildings

Journal of Measurements in Engineering

Abstract. Ancient buildings carry important information about ancient politics, economy, culture, and customs. However, over the course of time, ancient buildings are often damaged to varying degrees, so their restoration is of great historical importance. Three non-contact measurement methods are commonly used: UAV-based oblique photogrammetry, terrestrial laser scanning, and close-range photogrammetry. Together, these methods can provide integrated three-dimensional surveys of open spaces and of the indoor and outdoor surfaces of ancient buildings, and their combined use can, in principle, provide 3D (three-dimensional) data support for the protection and repair of ancient buildings. In practice, however, the data from the three methods urgently need to be fused: if the image data are not used, real and intuitive texture information is lost, while if only image matching point clouds are used, their accuracy is lower than that of terrestrial laser scanning point clouds, and indoor components of high historical value lack digital expression. Therefore, this paper proposes a data fusion method that achieves multi-source, multi-scale 3D data fusion of indoor and outdoor surfaces. It takes the terrestrial laser point cloud as the core and registers the ground close-range image matching point cloud and the UAV oblique image matching point cloud with it, based on fine component texture features and building outline features, respectively. The method unifies the data from the three measurements in a single point cloud and realizes their high-precision fusion.
The indoor and outdoor 3D full-element point cloud formed by the proposed method constitutes a visual point cloud model for producing plans, elevations, sections, orthophotos, and other deliverables for the study of ancient buildings.


Introduction
The ancient architectural heritage represents an important part of our country's history and culture. Its effective protection and rational reuse are of far-reaching significance to the inheritance of the traditional culture of the Chinese nation and the promotion of cultural exchanges between different nationalities.

The rest of the paper is organized as follows: Section 2 reviews the current application status and related work of UAVs, station-mounted 3D laser scanners, and SLR cameras in the protection of ancient buildings, and introduces the idea of multi-source data fusion pursued in this paper; Section 3 presents the fusion method for the acquired laser point cloud and the image matching point clouds; Section 4 demonstrates the feasibility of the proposed method through a case study; and finally, Section 5 summarizes perspectives for future work, improvements, and tests.

History of issue
Chen K. et al. [9] and Sun B. [10] proposed 3D digital reconstruction methods that automatically fuse aerial and ground photographs, applied respectively to the protection of the archway of Yueling ancient village in Guilin and the Jingjiang Mausoleum in Guilin, Guangxi. In his research on Mufu in the ancient city of Lijiang, Wang R. [11] combined 3D laser scanning, Building Information Modeling (BIM), and oblique photography to detect damage to the ancient buildings of Mufu; based on the obtained point cloud data and a BIM reference model, he then simulated preventive restoration of those buildings. Song L. [12] analyzed BIM, terrestrial laser scanning, total station, and other technologies, and proposed applying them to the field of ancient building protection, where they can provide safe and reliable assistance for mapping and data retention. Yin H. et al. [13] integrated multi-source heterogeneous point clouds by combining diversified acquisition technologies, multi-view global optimization registration, the Bundle Adjustment (BA) algorithm, and a variety of professional software to help preserve ancient building information. Meng Q. et al. [14] used a combination of aerial oblique photography and 3D laser scanning to measure blind areas such as roofs and obtained complete building point cloud data. Sun B. et al. [15] proposed another 3D model reconstruction method based on multi-data complementary fusion of open space, which incorporated 3D laser scanning, to protect the "Xiao Yi Ke Feng" archway in Yueling village. Lu C. [16] proposed a 3D virtual reconstruction algorithm for Qing Dynasty ancient architecture based on image sequence fusion, using an improved 2DPCA-SIFT feature matching algorithm.
The main idea of the paper [17] is to apply multi-sensor data fusion techniques to produce more accurate height information and to combine it with OSM-derived building footprints for urban 3D reconstruction. The data sources are digital elevation models derived from optical or SAR imagery, as well as point clouds reconstructed from SAR-optical image pairs through stereogrammetry. The paper [18] proposed analysing our legacy through time via 4D reconstruction and visualization of cultural heritage; to this end, the different available metric data sources are systematized and evaluated in terms of their suitability. The paper [19] presented an overview of 3D building façade reconstruction, highlighting current research on the data and key technologies used to enrich building façades, especially methods for façade parsing and building-opening detection; fusing data from multi-modal sources and different platforms is a feasible way to minimize the occlusion of façade elements by static and dynamic elements of the urban environment. In summary, the existing methods do not provide detailed images with real and intuitive texture data: they rely on an image matching point cloud alone, which is less accurate than a terrestrial laser scanning point cloud, so indoor components of ancient buildings with high historical value lack digital expression. Therefore, this paper proposes a method that takes the terrestrial laser point cloud as the core and registers the ground close-range image matching point cloud and the UAV oblique image matching point cloud with it, based on fine component texture features and building outline features, respectively, so as to achieve indoor and outdoor multi-source, multi-scale 3D data fusion.
The multi-source data fusion method consists in fusing data obtained from multiple sources for the same target or scene and processing them according to selected rules to obtain more accurate, complete, and effective information, which can be used to synthesize images with new spatial, temporal, and spectral characteristics and thus achieve a comprehensive description of the target or scene [11]. In terms of spatial fusion, the multi-source heterogeneous data are unified into the same coordinate system to achieve registration between different data types (images and point clouds); in terms of information fusion, features such as integrity, reliability, and accuracy are combined to generate information-rich fused digital point cloud models.
(1) From the perspective of data complementarity for the outer building surface and for point cloud holes above a certain height, oblique photography from drones can be carried out from a height greater than that of the building, so the missing parts of the terrestrial laser point cloud can be compensated with the help of oblique images.
(2) From the perspective of data complementarity for the inner building surface and historical components with special significance, exquisite structure, and rich value, close-range images can be obtained through ground close-range photogrammetry to describe individual objects in detail.
In this paper, through a comprehensive comparison and analysis of three different data characteristics of the terrestrial laser point cloud, oblique image, and ground close-range image, the following fusion process is designed and proposed, as shown in Fig. 2.

Registration and fusion of laser point cloud and image matching point cloud
In research on the registration of terrestrial laser point clouds and image matching point clouds based on feature primitives (the densities of the two data sets differ considerably), the point feature primitives present in the terrestrial laser point cloud do not necessarily match those present in the image matching point cloud, and sparsifying or densifying the clouds often leads to the loss and omission of point cloud feature information [20]. As for surface primitives, the stable planar point clouds in cross-source point clouds mostly cover the façades and the ground; because the overlapping area of cross-source point clouds is limited, they can hardly yield complete wall façades, and the ground point cloud features are so scarce that it is difficult to use them alone as registration primitives [21]. As for the fusion of multiple primitives, on the one hand, different point cloud data sources induce differences in the interpretation of the semantic information of the same target; on the other hand, the fusion and extraction of multiple features often complicate the algorithm [3]. At the same time, conventional buildings have regular structures and remarkable features: there is a lot of reliable information, such as building walls, roofs, appendages, and other contour features, that meets the requirements of specificity and stability.

Image matching point cloud generation
After the close-range images are obtained by ground close-range photogrammetry, the 2D image data are used to generate a dense 3D point cloud with color information [22]-[23]. Firstly, a high-efficiency feature algorithm (Scale-Invariant Feature Transform, SIFT) is used to extract and match feature points in the acquired images. Secondly, in order to improve the accuracy and stability of the feature matching points, outliers and mismatched points are eliminated by the Random Sample Consensus (RANSAC) iterative algorithm [24] applied to the 8-point method. Then, to estimate the camera poses and recover the scene geometry with the Structure from Motion (SfM) algorithm [25], the BA algorithm [26] is applied for nonlinear least squares: it optimizes the camera parameters and 3D points to reduce or eliminate error accumulation, improve the robustness of the solution, and finally determine the 3D space points of the dense point cloud. The overall cost function of the BA algorithm is:

$$E = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m} w_{ij}\,\left\| e(T_j, p_i) \right\|^2,$$

where $e(T_j, p_i)$ is the projection error, $T_j \in SE(3)$ is the Lie group element corresponding to the pose of camera $j$, $p_i$ denotes the feature point coordinates, $n$ is the number of trajectories, $m$ is the number of viewing angles, and $w_{ij}$ is 1 when camera $j$ observes trajectory $i$ and 0 otherwise.
$z_{ij}$ is the observation generated by observing feature point $p_i$ at camera pose $T_j$, $h(T_j, p_i)$ is the observation model, and $z_{ij} - h(T_j, p_i)$ is the observation error. The SfM algorithm is a batch method of state estimation, also known as sparse reconstruction: it processes the data observed over a period of time uniformly and uses the observations repeatedly to estimate the camera poses and feature point coordinates at multiple times. At present, there are many kinds of open-source and commercial software packages based on the SfM algorithm, such as ContextCapture, Pix4D, MicMac, and Agisoft Metashape. The specific workflow is as follows:
(1) Import images. Before importing the close-range photography data, the quality of the photos should be checked to avoid problems such as poor exposure, distortion, and blur that affect the accuracy.
(2) Align the photos. After importing the images, the photos are aligned by extracting feature points and performing feature matching, and the camera poses are restored to obtain sparse point cloud data. The higher the selected accuracy, the longer the corresponding processing time.
(3) Generate a dense point cloud. The dense point cloud data is generated through the multi-view stereo matching algorithm and by determining the optimal camera pose. At the same time, the mesh can be generated according to the generated dense point cloud data, and textures can be added to obtain a 3D model.
(4) Export the data. After data processing, the generated dense point cloud data can be exported in txt, ply, las, and other supporting formats.
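The RANSAC step used above to eliminate mismatched points can be illustrated with a minimal, self-contained sketch. For brevity, a plane model stands in for the fundamental-matrix model of the 8-point method; the function name and parameters are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.05, seed=None):
    """Generic RANSAC loop: repeatedly fit a model to a minimal random
    sample and keep the model with the largest consensus (inlier) set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        # plane through the 3 sampled points
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue                        # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - sample[0]) @ normal)
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Demo: 100 near-planar points plus 20 gross outliers
rng = np.random.default_rng(0)
plane_pts = np.column_stack([rng.uniform(-1, 1, (100, 2)),
                             rng.normal(0.0, 0.01, 100)])
outliers = rng.uniform(-1, 1, (20, 3)) + np.array([0.0, 0.0, 3.0])
pts = np.vstack([plane_pts, outliers])
inliers = ransac_plane(pts, seed=1)
```

The same consensus principle applies when the fitted model is the fundamental matrix of the 8-point method rather than a plane.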

Terrestrial laser point cloud and close-range image matching point cloud registration based on texture features
Because the fine components of ancient buildings have clear and rich texture features, it is relatively easy to select feature point pairs. For the registration of laser point cloud and close-range image matching point cloud, a variety of current open-source software offer the possibility of implementation, such as CloudCompare [27], a 3D point cloud processing software based on the General Public License (GPL) open-source protocol. The main principle of alignment is to determine the target and source point clouds, select at least three pairs of feature points in the corresponding point clouds with typical texture features for coarse alignment, obtain the initial transformation parameters of the two 3D point sets, and apply the Iterative Closest Point (ICP) algorithm [28] to solve the optimal transformation matrix for further optimization to achieve fine registration.
The registration based on point-line feature primitives mainly calculates the transformation between different point cloud coordinate systems. The movement between the two coordinate systems consists of a rotation and a translation and is called a rigid body transformation. The laser point cloud data is used as the target point cloud $Q = \{q_i\}$ $(i = 1, \cdots, N)$, and the dense point cloud generated from the close-range images is used as the source point cloud $P = \{p_i\}$ $(i = 1, \cdots, M)$. The rigid body transformation of the source point cloud is:

$$p_i' = R\,p_i + t,$$

where the rotation matrix $R$ and the translation vector $t$ describe the coordinate transformation of the source point cloud, so that the new source point cloud $P'$ obtained from the source point cloud $P$ through the transformation is as close as possible to the target point cloud $Q$; that is, the number of points whose nearest distance between the two point sets is less than a threshold $\varepsilon$ is maximized. This criterion is called Largest Common Points (LCP) [29].
The LCP criterion turns point cloud registration into computing the optimal rotation $R^*$ and translation $t^*$ by minimization, so that:

$$(R^*, t^*) = \arg\min_{R,\,t} \sum_{i=1}^{n} w_i \left\| R\,p_i + t - q_i \right\|^2,$$

where $p_i$ and $q_i$ denote the corresponding point pairs in the source and target point clouds, $n$ is the number of corresponding point pairs, and $w_i$ is the weight of each pair. The specific solution process is as follows:
(1) Find the corresponding point pairs $(p_i, q_i)$ from the target point cloud and the source point cloud.
(2) Compute the weighted centroids of both point sets:

$$\bar{p} = \frac{\sum_{i=1}^{n} w_i\,p_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{q} = \frac{\sum_{i=1}^{n} w_i\,q_i}{\sum_{i=1}^{n} w_i}.$$

(3) Decentralize the point pairs found in the target point cloud and the source point cloud:

$$p_i' = p_i - \bar{p}, \qquad q_i' = q_i - \bar{q}.$$

(4) Compute the $3 \times 3$ covariance matrix:

$$S = \sum_{i=1}^{n} w_i\, p_i'\, q_i'^{\mathsf T}.$$

(5) Compute the singular value decomposition $S = U \Sigma V^{\mathsf T}$.
(6) Compute the optimal rotation as $R^* = V\,\mathrm{diag}\big(1, 1, \det(V U^{\mathsf T})\big)\,U^{\mathsf T}$.
(7) Compute the optimal translation as $t^* = \bar{q} - R^* \bar{p}$.
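The weighted least-squares solution outlined above (weighted centroids, decentralized covariance, optimal rotation and translation) is the standard weighted Kabsch/SVD procedure; a compact numpy sketch, with illustrative names rather than the authors' code, is:

```python
import numpy as np

def kabsch_weighted(P, Q, w=None):
    """Solve min_{R,t} sum_i w_i ||R p_i + t - q_i||^2 for corresponding
    source points P and target points Q (weighted Kabsch/SVD procedure)."""
    w = np.ones(len(P)) if w is None else np.asarray(w, float)
    # weighted centroids
    p_bar = (w[:, None] * P).sum(0) / w.sum()
    q_bar = (w[:, None] * Q).sum(0) / w.sum()
    # decentralize
    Pc, Qc = P - p_bar, Q - q_bar
    # 3x3 covariance matrix S = sum_i w_i p'_i q'_i^T
    S = (w[:, None] * Pc).T @ Qc
    # SVD of S and optimal rotation (det guard rules out reflections)
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # optimal translation
    t = q_bar - R @ p_bar
    return R, t

# Demo: recover a known rigid transform from exact correspondences
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
R_true, _ = np.linalg.qr(A)
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0                   # make it a proper rotation
t_true = np.array([0.5, -1.0, 2.0])
P = rng.normal(size=(50, 3))
Q = P @ R_true.T + t_true
R_est, t_est = kabsch_weighted(P, Q)
```

With exact, noise-free correspondences the known rotation and translation are recovered to machine precision.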

Terrestrial laser point cloud and oblique image matching point cloud registration based on outline features
For the registration of the terrestrial laser point cloud and the oblique image matching point cloud, this paper builds on [31], where registration is realized with the building roof contour as the line feature constraint. We improve the roof contour boundary estimation algorithm and introduce the point-to-plane ICP algorithm [32], together with a new strategy for removing mismatched points, to improve the registration efficiency and accuracy. After obtaining the oblique image matching point cloud and the terrestrial laser point cloud, the main registration workflow is shown in Fig. 3.

Building outline primitive extraction
Based on the clear oblique images collected by the UAV, the dense point cloud is obtained through dense image matching. After preprocessing the original image matching point cloud with several steps such as stitching and denoising, the Cloth Simulation Filter (CSF) algorithm [33] is applied to separate ground points from non-ground points. The Conditional Euclidean Clustering algorithm [34] in the Point Cloud Library (PCL) is then used to remove vegetation point clouds from the non-ground points based on the number of points per cluster. After the point cloud of the main building is obtained, the conventional roof point cloud segmentation method removes the wall point cloud, but when the building façade of the image matching point cloud is missing or severely deformed, this method is not applicable. Statistically, the elevation values of the roof point cloud are usually larger than those of other locations, so the point cloud segmentation of the ancient building roof can be achieved by setting a height limit. First, the least squares fitting algorithm [35] is used to compute the plane equation of the building ground point cloud. Then, the height from the roof to the ground is manually measured several times, and the mean height is compared with the point-to-plane distance d, after which the roof segmentation is completed. Since the fitted plane is not completely flush with the ground, the mean height is reduced by 0.5 meters before being compared with d. The specific process is shown in Fig. 4.
(1) Remove the duplicate points in the 2D roof plane point cloud, build a tree for the deduplicated point cloud, and find the point with the smallest coordinate value in the search point set. Take point P1 in the above figure as an example: P1 is the starting point (red point) and current point of the boundary estimation (Fig. 5(a)).
(2) After finding the current point, perform a neighborhood search for it and find its k nearest points. When k = 3, the nearest neighbor points are P2, P3, and P4. These three nearest neighbor points are used as candidate points for the next boundary point search, and the current point is connected with all candidate points. If the left direction of the horizontal axis is taken as the positive direction of the coordinate axis, point P2 is the point with the largest direction angle in the positive direction among all the points connected to P1 (Fig. 5(b)). Point P2 is then set as the next starting point and current point for boundary estimation.
(3) Connect the two boundary points P1 and P2 with a line (black solid line), perform a neighborhood search for the new current point, and find its k nearest points. When k = 3, the nearest neighbor points are P3, P4, and P5; these three points are used as candidate points for the next boundary point search. Each is connected with the current point to find the point whose connecting line forms the largest direction angle with the previous boundary segment P1-P2, and this point is set as the next starting point and current point of the boundary estimation.
(4) Step (3) is iterated over all points until the traversal returns to the starting point P1, where the boundary estimation ends and all boundary points are obtained (Fig. 5(c)).
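The height-threshold roof segmentation described above (least-squares ground-plane fit, then comparison of point-to-plane distances against the measured mean height minus 0.5 m) can be sketched as follows; the function and parameter names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fit_ground_plane(ground_pts):
    """Least-squares plane z = a*x + b*y + c through the ground points."""
    A = np.column_stack([ground_pts[:, 0], ground_pts[:, 1],
                         np.ones(len(ground_pts))])
    coeffs, *_ = np.linalg.lstsq(A, ground_pts[:, 2], rcond=None)
    return coeffs                           # (a, b, c)

def segment_roof(points, ground_pts, mean_height, margin=0.5):
    """Keep points whose orthogonal distance to the fitted ground plane
    exceeds (mean_height - margin); the margin compensates for the plane
    not lying exactly on the true ground."""
    a, b, c = fit_ground_plane(ground_pts)
    d = np.abs(a * points[:, 0] + b * points[:, 1] - points[:, 2] + c)
    d = d / np.sqrt(a * a + b * b + 1.0)    # normalize to metric distance
    return points[d > mean_height - margin]

# Demo: flat ground at z = 0, walls below z = 7, roof at z = 8
rng = np.random.default_rng(2)
ground = np.column_stack([rng.uniform(-10, 10, (200, 2)), np.zeros(200)])
walls = np.column_stack([rng.uniform(-5, 5, (100, 2)),
                         rng.uniform(0.0, 7.0, 100)])
roof = np.column_stack([rng.uniform(-5, 5, (100, 2)), np.full(100, 8.0)])
cloud = np.vstack([ground, walls, roof])
roof_seg = segment_roof(cloud, ground, mean_height=8.0)
```

With a measured mean height of 8 m and the 0.5 m margin, only points farther than 7.5 m from the fitted ground plane survive, isolating the roof.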

Coarse registration based on building contours
From the outline point cloud of the building roof, the transformation matrix for coarse registration is obtained through the centroid constraint and Principal Component Analysis (PCA) [36]. Firstly, for the extracted building contours of the image matching point cloud and of the terrestrial laser point cloud, the centroid constraint is established to find the centroid difference, which determines the translation parameters of the coarse registration transformation matrix. The usual formula for calculating the center-of-mass coordinates of an object is:

$$\bar{x} = \frac{\sum_{i=1}^{n} m_i\,x_i}{\sum_{i=1}^{n} m_i},$$

where $x_i = (x_i, y_i, z_i)$, $i = 1, 2, 3, \cdots$ are the coordinates of each mass point and $m_i$ is the mass of that point. To calculate the centroid of a point cloud, let $m_i = 1$ in the above equation, so that the centroid coordinates of the point cloud are given by Eq. (11):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

According to this equation, the centroid coordinates of the source and target point clouds are calculated separately, and the translation matrix required for coarse registration is determined from the difference of the two centroids. Then the PCA algorithm is used to reduce the dimensionality of the building contour point cloud and construct the local coordinate system, as shown in Fig. 6.
(1) For the $k$ nearest neighbors of a point in the point cloud $P = (p_1, p_2, \cdots)$, the corresponding covariance matrix can be calculated by Eq. (12):

$$C = \frac{1}{k}\sum_{i=1}^{k} (p_i - \bar{p})(p_i - \bar{p})^{\mathsf T},$$

where $k$ is the number of neighboring points and $\bar{p}$ is the centroid of the point cloud found in the previous section.
(2) According to Eq. (13), the three eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$ and the corresponding eigenvectors $v_1$, $v_2$, $v_3$ are obtained:

$$C v_j = \lambda_j v_j, \quad j = 1, 2, 3.$$

(3) The eigenvectors corresponding to the three eigenvalues are orthogonal to each other: eigenvector $v_3$ points in the direction of the normal vector of the fitted plane of the point cloud, and the plane formed by eigenvectors $v_1$ and $v_2$ is orthogonal to $v_3$. The three eigenvectors can therefore ideally serve as the three axes $x$, $y$, $z$ of the local coordinate system of the point cloud, achieving both data dimensionality reduction and local coordinate system construction. Finally, the vectors of the source and target point clouds representing the corresponding coordinate axes are substituted into the Rodrigues formula [37] to calculate the rotation matrix.
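The centroid-plus-PCA coarse registration of this section can be sketched in a few lines. Note that PCA eigenvector signs are ambiguous, so a practical implementation must additionally resolve axis flips; this illustrative sketch (names are assumptions, not the authors' code) ignores that step:

```python
import numpy as np

def pca_frame(points):
    """Local frame via PCA: covariance eigenvectors, sorted by
    descending eigenvalue, serve as the local x/y/z axes."""
    centroid = points.mean(axis=0)
    d = points - centroid
    C = d.T @ d / len(points)
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
    order = np.argsort(eigvals)[::-1]      # lambda_1 >= lambda_2 >= lambda_3
    return centroid, eigvals[order], eigvecs[:, order]

def coarse_register(source, target):
    """Rotation from aligning the two PCA frames, translation from the
    centroid difference (eigenvector sign flips are ignored here)."""
    c_s, _, F_s = pca_frame(source)
    c_t, _, F_t = pca_frame(target)
    R = F_t @ F_s.T                        # maps source axes onto target axes
    t = c_t - R @ c_s
    return R, t

# Demo: an anisotropic cloud and a rigidly moved copy of it
rng = np.random.default_rng(3)
src = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])
A = rng.normal(size=(3, 3))
R_true, _ = np.linalg.qr(A)
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0                   # make it a proper rotation
t_true = np.array([1.0, -2.0, 0.5])
tgt = src @ R_true.T + t_true
R_est, t_est = coarse_register(src, tgt)
```

The recovered rotation matches the true one up to per-axis sign flips, which is sufficient as an initial value for the subsequent fine registration.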

Point-to-plane ICP algorithm for fine registration
After coarse registration, previous methods directly use the ICP algorithm for fine registration; because they ignore the differences in density and acquisition perspective between the two cross-source point clouds [38]-[40], the registration often falls into a local optimum. The point-to-plane ICP algorithm [32] alleviates this problem: instead of minimizing the distance between points, it iteratively minimizes the distance from each point to the tangent plane of its corresponding point. The core diagram of the algorithm is shown in Fig. 7. Due to the influence of gimbal jitter, the camera shooting angle, and dense matching errors in the image data acquisition stage, the horizontal point accuracy of the generated image matching point cloud is markedly higher (by about a factor of three) than its vertical point accuracy [41]. Therefore, the ground point cloud separated during ground filtering and the main point cloud of the building are both used to ensure the fine registration. First, the main point cloud of the building is merged back with the ground point cloud; then the coarse registration transformation matrix is applied to the merged point cloud. With the two cross-source point clouds roughly aligned, the point-to-plane ICP algorithm is used to complete the fine registration.
In Fig. 7, the red line indicates the source point cloud and the green line the target point cloud; $q_i$ is a point on the target point cloud, $p_i$ is a point on the source point cloud, and $d_i$ is the distance from a point on the source point cloud to the tangent plane of its corresponding point on the target point cloud. For the point-to-plane ICP algorithm, the error function is designed by minimizing the sum of squared point-to-plane distances as in Eq. (14):

$$E(T) = \sum_{i=1}^{n} \big( (T p_i - q_i) \cdot n_i \big)^2,$$

where $n_i$ is the normal vector at the point $q_i$, and $T$ is the $(4 \times 4)$ transformation matrix consisting of the rotation matrix $R$ and the translation vector $t$. The registration methods above solve the problems of different geometric reference frames, inconsistent scales, and inconsistent point cloud densities, so that the registered data has high-precision three-dimensional coordinate information, diverse perspective and scale features, and rich spectral information.
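A single linearized point-to-plane iteration (Eq. (14) under a small-angle approximation of the rotation) can be sketched as follows; the synthetic correspondences and all names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def point_to_plane_step(P, Q, N):
    """One linearized point-to-plane ICP iteration: minimize
    sum_i ((R p_i + t - q_i) . n_i)^2 with R ~ I + [omega]_x, i.e. solve
    the linear least-squares system A x = -b for x = (omega, t)."""
    A = np.hstack([np.cross(P, N), N])      # row i: (p_i x n_i, n_i)
    b = np.einsum('ij,ij->i', P - Q, N)     # current point-to-plane residuals
    x, *_ = np.linalg.lstsq(A, -b, rcond=None)
    wx, wy, wz = x[:3]
    R = np.array([[1.0, -wz,  wy],
                  [ wz, 1.0, -wx],
                  [-wy,  wx, 1.0]])         # small-angle rotation matrix
    return R, x[3:]

# Synthetic correspondences: random points and normals, with the target
# generated by a known small-angle transform, so one step recovers it.
rng = np.random.default_rng(4)
P = rng.normal(size=(200, 3))
N = rng.normal(size=(200, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)
w_true = np.array([0.02, -0.01, 0.015])
t_true = np.array([0.05, -0.03, 0.02])
R_true = np.array([[1.0, -w_true[2], w_true[1]],
                   [w_true[2], 1.0, -w_true[0]],
                   [-w_true[1], w_true[0], 1.0]])
Q = P @ R_true.T + t_true
R_est, t_est = point_to_plane_step(P, Q, N)
```

In a real ICP loop, this step is repeated with re-estimated correspondences and normals until the point-to-plane error converges.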

Data collection and preprocessing
In this paper, data for ancient building restoration were collected. These data mainly concern a temple, which was taken as the core survey object and has a total construction area of 658.29 square meters. In order to fully understand the surrounding environment of the scenic spot and the current state of preservation of the temple's roof, the whole-range oblique images of the scenic spot were first obtained through aerial photography by UAV (Fig. 8(a)); then, highly overlapping, sharp ground close-range photos of the temple pillar bases were captured with a digital camera (Fig. 8(b)); finally, the indoor and outdoor building structures of the temple were scanned with a terrestrial laser scanner to obtain a high-precision terrestrial laser point cloud (Fig. 8(c)). The basic information about the UAV data acquisition is presented in Table 2. When using ContextCapture, several steps are needed to generate the dense image matching point cloud from the UAV oblique images, such as aerial triangulation, calculation of exterior orientation elements, and generation of sparse and dense point clouds, as shown in Fig. 9(a). A Canon EOS 70D was used as the close-range image acquisition device. There are two main methods of shooting close-range images: upright photography and cross-direction photography [42]. The objects of the ground close-range images are mainly the pillar bases in the temple, which are small, but the surface of each pillar base is engraved with different carvings whose texture is complex and exquisite, so cross-direction photography was adopted. After filtering and preprocessing the raw images, the authors used Agisoft Metashape to generate the dense point cloud of the pillar bases from the ground close-range images, as shown in Fig. 9(b). The basic information about the camera used for ground close-range photogrammetry of the pillar bases is presented in Table 3. A Faro Focus 3D X130 was used as the acquisition equipment for the terrestrial laser point cloud data (Fig. 9(c)). When scanning a building of a certain volume, multiple scanning stations have to be set up. After scanning, the point clouds of the multiple stations are stitched first. Then the noise points and outliers contained in the point cloud (such as walking tourists and distant vegetation) are processed manually and with the Statistical Outlier Removal (SOR) algorithm to complete the filtering. The basic information on terrestrial laser point cloud data acquisition and pre-processing is shown in Table 4, where the SOR parameters are the number of points used for mean distance estimation and the standard deviation multiplier threshold.
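The SOR filtering step, parameterized exactly by the two values named above (number of neighbors for mean distance estimation, standard deviation multiplier threshold), can be sketched as follows; the brute-force neighbor search and all names are illustrative:

```python
import numpy as np

def sor_filter(points, k=8, n_std=1.0):
    """Statistical Outlier Removal: compute each point's mean distance to
    its k nearest neighbours, then discard points whose mean distance
    exceeds (global mean + n_std * global standard deviation)."""
    # brute-force pairwise distances (a k-d tree is used in practice)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    mean_d = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    threshold = mean_d.mean() + n_std * mean_d.std()
    return points[mean_d <= threshold]

# Demo: a dense cluster plus five isolated far-away points
rng = np.random.default_rng(5)
cluster = rng.normal(0.0, 0.1, (200, 3))
strays = np.array([[10.0, 0, 0], [0, 10.0, 0], [0, 0, 10.0],
                   [-10.0, 0, 0], [0, -10.0, 0]])
cloud = np.vstack([cluster, strays])
clean = sor_filter(cloud, k=8, n_std=1.0)
```

Isolated points such as walking tourists or distant vegetation have much larger mean neighbor distances than the dense building surface and are discarded.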

Data fusion
The dense point cloud data of the pillar base in the temple, obtained from the close-range images, offer higher texture detail. This point cloud is matched with the terrestrial laser point cloud using the registration method based on the above-mentioned point-line texture primitives. The experiment adopts the Align and ICP tools in CloudCompare with the following steps: 1) load the laser point cloud and the close-range image matching point cloud, 2) select at least three pairs of feature points in the target and source point clouds, and 3) perform coarse and fine registration of the two point clouds. The registered point cloud thereby supplements the missing detailed texture of the terrestrial laser point cloud (Fig. 10).

As for the fusion of the oblique image matching point cloud and the terrestrial laser point cloud, after ground and vegetation filtering, the building contour primitive extraction and registration of the two cross-source point clouds are completed according to the technical process described above. Firstly, based on the respective roof contour boundaries, the covariance matrices, eigenvalues, and corresponding eigenvectors of the two point clouds are calculated separately with the PCA algorithm, which in turn determines the respective local coordinate systems of the two point clouds (Fig. 11, roof boundary and local coordinate system of the point cloud).
Secondly, the difference in the centroids of the two point clouds is calculated to determine the translation parameters. Then the vectors representing the coordinate axes are substituted into the Rodrigues formula [37] to calculate the rotation matrix and complete the coarse registration of the roof boundaries, as shown in Fig. 12. In this figure, the blue line indicates the image matching point cloud after rotation and translation, and the red line indicates the terrestrial laser point cloud; the two contour point clouds are roughly aligned after registration, providing a good initial position for the fine registration. Before the fine registration, the following steps are performed: 1) the main point cloud of the building is merged back with the ground point cloud (see Fig. 13), 2) the merged point cloud is transformed by the coarse registration transformation matrix, so that the two cross-source point clouds are roughly aligned. The visualization of the fine registration by the point-to-plane ICP algorithm after 500 iterations is shown in Fig. 14.

Accuracy analysis for close-range image matching point cloud and terrestrial laser point cloud
In order to explore the influence of different parameter settings on point cloud accuracy and computation time when using commercial software, the parameters are set to four levels (highest, high, medium, and low) when generating a dense point cloud from the ground close-range images of the inner pillar bases of the temple. The low-level point cloud model is used as the benchmark for comparison. For 48 close-range images of one pillar base, the differences in photo alignment time, dense point cloud generation time, total time, and Root Mean Square (RMS) re-projection error under the different parameter settings are shown in Table 5. As the table shows, each time the model parameters are raised by one level, the photo alignment time and the dense point cloud generation time almost double, while the RMS re-projection error of the resulting point cloud continues to shrink and the accuracy of the point cloud model continues to improve. A comparison of the accuracy of a local pillar base model under the four parameter settings, together with the registration, is shown in Fig. 15. Because the inner pillar bases of the temple occupy a small area and are rich in texture detail, fewer close-range photos are needed than oblique images. Therefore, where the hardware permits, the highest model parameter settings can be used when generating dense point clouds, to preserve the accuracy of the detailed texture features.
To analyze the registration accuracy between the close-range image matching point cloud and the terrestrial laser point cloud, the Align and ICP tools in CloudCompare are used to complete the registration. The root mean square error (RMSE) of the Euclidean distances between nearest neighbors of the source and target point clouds is chosen as the evaluation index of registration accuracy. It is calculated according to Eqs. (16)-(17):

$$d_i = \left\| q_i - p_i' \right\|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2},$$

where $n$ is the total number of corresponding points after registration and $d_i$ is the Euclidean distance between the $i$-th point in the target point cloud and its nearest neighbor $p_i'$ in the registered source point cloud. The best RMSE between the dense point cloud generated with the highest parameter settings and the laser point cloud is 0.216495 cm after fine registration.
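The RMSE index of Eqs. (16)-(17) can be computed as follows (brute-force nearest-neighbor search for clarity; names are illustrative):

```python
import numpy as np

def registration_rmse(source, target):
    """RMSE of the Euclidean distances between each target point and its
    nearest neighbour in the registered source point cloud."""
    # brute-force nearest neighbour (a k-d tree is used in practice)
    dist = np.linalg.norm(target[:, None, :] - source[None, :, :], axis=2)
    d = dist.min(axis=1)                    # d_i for every target point
    return np.sqrt((d ** 2).mean())

# Demo: a source shifted by 0.3 along x relative to the target gives
# an RMSE of 0.3
target = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
source = target + np.array([0.3, 0.0, 0.0])
```

Identical clouds yield an RMSE of zero; any residual misalignment shows up directly in the index.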

Accuracy analysis for oblique image matching point cloud and terrestrial laser point cloud
For an analysis of the registration accuracy between the oblique image matching point cloud and the terrestrial laser point cloud, a total of 6 points on the building and the ground are selected, as shown in Fig. 16. Different removal ratios for mismatched points are tested after fine registration. Compared with 0 % (no removal), the registration error is smallest when the removal ratio is set to 15 % (Table 6). Meanwhile, the registration accuracy of the ground point cloud at all three locations is higher than that of the building point cloud, which is closely related to the horizontal and vertical point accuracy of image matching point clouds and to the principle of the point-to-plane ICP algorithm. The average optimal registration error at the three ground locations (Table 6) is 2.1964 cm, and the average improvement in registration accuracy after removing mismatched points at the optimal ratio is 20.2 % (Table 7). To verify the feasibility of the method described in subsection 3.3 of this paper, we added a Guangzhou ancient building dataset with fewer ground points for roof contour feature extraction and registration accuracy analysis, as shown in Fig. 17, and compared the results of the two datasets with manual registration in CloudCompare (Table 8). The RMSEs obtained in CloudCompare are 3.6438 cm and 12.6334 cm respectively, while the RMSEs obtained with the proposed registration method, after removing mismatched points, are 2.2016 cm and 10.0381 cm respectively. Compared with CloudCompare, the registration accuracy of both datasets is improved by our method.
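The removal of a fixed ratio of the worst correspondences can be sketched as a trimmed RMSE: sort the nearest-neighbor distances, discard the largest fraction as presumed mismatches, and evaluate the error on the rest. This is a minimal sketch under the assumption that mismatches dominate the distance tail (the function name is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def trimmed_rmse(source: np.ndarray, target: np.ndarray, trim_ratio: float = 0.15) -> float:
    """RMSE after discarding the worst `trim_ratio` fraction of
    nearest-neighbor correspondences, treated as mismatched points."""
    d, _ = cKDTree(source).query(target)           # nearest-neighbor distances
    d = np.sort(d)                                 # mismatches end up in the tail
    keep = max(1, int(round(len(d) * (1.0 - trim_ratio))))
    return float(np.sqrt(np.mean(d[:keep] ** 2)))
```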

Discussion
The experiments show that the proposed method is effective, but several aspects remain to be discussed: -Data collection. Because the buildings in the temple dataset are located in mountainous areas with many occlusions around them (for example, tall trees), the image matching point cloud generated from the existing oblique images does not match the actual shape of the buildings at the occluded positions. The same gaps occur for high-rise buildings, such as the school mentioned above. Therefore, as data accumulate, low-altitude UAV flights can be considered for supplementary shots at such locations; these supplementary shots would then be added to the oblique images for dense image matching, and the method could be verified by additional experiments.
-Data processing. 1. Feature primitive extraction: The building outline features used in the coarse registration stage of this paper mainly comprise the roofs of ancient buildings. Although they serve as feature primitives for coarse registration of the considered datasets, buildings of many different styles may appear in the same dataset. At present, registration based on other feature primitives has not been studied here, and the rigor of the classification and extraction of feature primitives needs to be improved.
2. Point cloud filtering: Redundant point clouds that cannot be removed by ground and vegetation filtering, such as flower beds, independent appendages outside buildings, and bulky vegetation, still need to be selected and deleted manually. This reduces the degree of automation of the method to a certain extent. In the future, a more refined point cloud segmentation algorithm should be introduced to improve the automation of the registration method used in this paper.
3. Data processing volume: Before feature primitive extraction and point cloud filtering, a point cloud sparsification (thinning) operation is often performed on the dataset to improve computing efficiency. As some points are deleted, certain characteristics of the eliminated points are inevitably lost in the thinning process. This is a problem shared by most current point cloud processing methods; the common workaround of existing methods is to improve computer hardware performance.
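The sparsification step mentioned above is commonly realized as voxel-grid thinning, keeping one representative point per cell. A minimal sketch, assuming a centroid-per-voxel scheme (the function name and voxel size are illustrative, not the specific tool used in the paper):

```python
import numpy as np

def voxel_thin(points: np.ndarray, voxel: float) -> np.ndarray:
    """Thin an (N, 3) point cloud by keeping the centroid of each
    occupied voxel cell of edge length `voxel`."""
    keys = np.floor(points / voxel).astype(np.int64)        # integer cell index per point
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(float)             # points per occupied cell
    out = np.zeros((len(counts), points.shape[1]))
    for dim in range(points.shape[1]):                      # centroid = mean per cell
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out
```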
The registration accuracy of the method used in this paper depends to a certain extent on the robustness of the point-to-plane ICP algorithm. When searching for the tangent plane at the nearest point, an overly sparse point cloud may increase the number of mismatched point pairs. The specific relationship between cross-source point cloud density difference and registration accuracy can be studied in follow-up work, and the fine registration algorithm can be improved accordingly.
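As a rough illustration of the residual that point-to-plane ICP minimizes, the sketch below computes |(p_i − q_i)·n_i| for each source point p_i, its nearest target point q_i, and the unit normal n_i of the tangent plane at q_i. The helper name and the SciPy-based neighbor search are assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def point_to_plane_residuals(source: np.ndarray, target: np.ndarray,
                             target_normals: np.ndarray) -> np.ndarray:
    """Per-point point-to-plane residuals |(p_i - q_i) . n_i|, where q_i
    is the nearest target point to p_i and n_i its unit tangent-plane normal."""
    _, idx = cKDTree(target).query(source)          # nearest target point per source point
    diff = source - target[idx]
    return np.abs(np.einsum('ij,ij->i', diff, target_normals[idx]))
```

Note how a sparse target cloud makes the nearest point, and hence the tangent plane, a poorer local approximation of the true surface, which is the sensitivity discussed above.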

Conclusions
This paper takes a temple in a scenic spot in Shandong Province as an example, combining terrestrial laser scanning, UAV oblique photography, and close-range photography to collect multi-source, multi-scale 3D data of the ancient building. It realizes air-ground integrated fusion of these data through the registration of the close-range photography point cloud, the oblique photography point cloud, and the terrestrial laser point cloud according to the building roof profile and point-line features. Considering that the building roof is missing from the terrestrial laser point cloud, and that the building facade in the image matching point cloud is missing or unrealistic, the authors created a point cloud model with the proposed method. It overcomes, to a certain degree, the point cloud holes of any single type of point cloud and completes the building facade and roof information. At the same time, the image matching point cloud assigns its RGB information to the laser point cloud, and the colored point cloud formed from the two point sets enriches the attributes of the point cloud model. The collection and fusion of multi-source data can improve the accuracy and speed of point cloud acquisition of ancient buildings and provide basic point cloud data for subsequent production of 3D models, plans, etc., thereby advancing the digitization of ancient buildings and the protection of historical buildings.
Based on the existing research results, in order to improve the registration accuracy and efficiency of the method applied in this paper, future work will consider: 1. Dense matching of multi-angle oblique images. Specifically, exploring whether images from manual low-altitude aerial photography, jointly involved in the dense matching of aerial imagery, can complete the match and increase the number of points in obscured areas. 2. Selection and comparison of feature primitives. From the perspective of surface primitives or multi-primitive fusion, mining various types of potential registration primitives, such as doors and balconies, can be considered so that the registration method can be applied to more complex scenes. 3. Registration accuracy. The fine registration of the method used in this paper needs to be carried out in the presence of a large-area planar point cloud (for example, a ground point cloud); datasets containing a large range of low-rise buildings can be introduced in the future to verify the feasibility of the method.
Xiaoyu Zhao received a bachelor's degree in Geographic Information Science from Taiyuan University of Technology, Shanxi, China, in 2020. Since 2021, she has been a Master's student at Beijing University of Civil Engineering and Architecture, Beijing, China. Her research interests include 3D point cloud registration and point cloud processing. In this paper she was responsible for data processing.
Haocheng Zhang received a bachelor's degree in GIS from Beijing University of Civil Engineering and Architecture, Beijing, China, in 2021. Since 2021, he has been a Master's student at Beijing University of Civil Engineering and Architecture, Beijing, China. His research interests include deep learning and 3D modeling. In this paper he was responsible for data collection and pre-processing.
Xiaohang Zhou received a bachelor's degree in spatial information and digital technology from Henan University of Technology, Zhengzhou, China, in 2020. Since 2021, she has been a Master's student at Beijing University of Civil Engineering and Architecture, Beijing, China. Her research interests include point cloud registration and slope deformation monitoring. In this paper she was responsible for data processing.