Supplementation of synthetic object replicas for increasing precision of microrobot trajectory keypoints

Artificial neural networks are becoming more popular with the development of artificial intelligence. These networks require large amounts of data to function effectively, especially in the field of computer vision. The quality of an object detector is primarily determined by its architecture, but the quality of its training data is also important. In this study, we explore the use of a novel data set enhancement technique to improve the performance of the YOLOv5 object detector. Overall, we investigate three methods: first, a novel approach that uses synthetic object replicas to augment an existing real data set without changing its size; second, rotation augmentation as a data set propagation technique, combined with this approach; and third, supplementation of only the one required class. The solution proposed in this article improves the data set with the help of supplementation and augmentation, and it lowers the influence of imbalanced data sets by supplementing them with synthetic yeast cell replicas. We also determine average supplementation values to establish what percentage of the data set is most effective to supplement.


Introduction
Computer vision in micro-robotics [1] is a challenging task, as images and video streams in such settings are usually low-resolution, which can make it difficult to identify objects of various classes [2], [3]. Furthermore, gathering real yeast cell images can be a difficult and very time-consuming task, as it includes preparing a yeast solution and manually acquiring data in a controlled environment [4]. Therefore, we explore whether a combination of raw and supplemented synthetic data can improve the precision of yeast cell detection and ease the data preparation process.
With the recent progress in the field of artificial intelligence [5], new perspectives have arisen in numerous directions. One of the most studied and applied topics is computer vision, for example, in industrial robot control [6], [7]. Even though the precision of AI-based computer vision solutions has improved remarkably in the last decade [8], [9], several drawbacks can be observed when training data is hard to obtain. One of the problems in this case is class imbalance in the training data sets. Such situations arise in many real-world problems, for example, in disease diagnostics, where cases of the disease are usually rare compared to healthy samples [10]. Most machine learning algorithms are designed to maximize accuracy and work best when the number of samples in each class is about equal. Therefore, class imbalance typically leads to comparatively high accuracy for the majority class but poor detection of the minority class [11]. Currently, this challenge is tackled by oversampling and different learning techniques [12], generative adversarial networks [13], [1], or fully synthetic data sets [15], [16]. In this article, we explore how class supplementation with synthetic object replicas can increase the precision of object detection. In essence, the problem addressed here concerns low-ranking classes that are poorly detected by the object detector or not detected at all. The aim is to increase the detection precision of these low-performing classes by supplementing synthetic replicas into existing images. Moreover, we analyze how augmentation of synthetic yeast cells can further increase the precision of this task.
In addition, the detection precision is directly connected to the precision of the micro-robot trajectory, because key points in the trajectory, e.g., the manipulation, approach and retreat positions, directly depend on the results of the yeast cell detection task.

Object replica supplementation
As many machine learning algorithms are designed to maximize overall accuracy, minority classes tend to achieve remarkably lower results. In this research, the number of under-represented objects in the training data sets is artificially increased to improve the accuracy of the object detector. The data sets are additionally augmented to further increase accuracy. These two techniques form the basis of this article and complement each other in improving accuracy.

Data generation and augmentation
For image generation, the free and open-source 3D creation software Blender is used. It supports the entire 3D pipeline: modelling, rigging, animation, simulation, rendering, compositing and motion tracking, as well as video editing and 2D animation. The generation process can be divided into three phases: creating preforms, altering those preforms, and blurring. All phases of the process are visualized in Fig. 2 using yeast cell images as a reference.
In the first phase, shown in Fig. 2(a), preforms are created with a size and proportions similar to those of the real objects seen in real-world data. Each preform outline consists of many small points so that it can easily be altered in the following phase.
The second phase, Fig. 2(b), is the alteration of the preform, which makes the data appear more realistic. The object is altered by many different manipulations in an automated process, but only to a degree that keeps it as close to real data as possible.
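The study performs preform creation and alteration in Blender. As a rough illustration only, the following NumPy sketch mimics the same idea in 2D: a circular preform built from outline points, followed by a bounded random alteration (slight stretching, smooth radial jitter and rotation). The function names and the stretch and jitter magnitudes are hypothetical choices, not values from the study.

```python
import numpy as np

def make_preform(radius_px: float, n_points: int = 64) -> np.ndarray:
    """Circular preform: an outline of n_points vertices centred at the origin."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([radius_px * np.cos(angles), radius_px * np.sin(angles)], axis=1)

def alter_preform(outline: np.ndarray, rng: np.random.Generator,
                  max_stretch: float = 0.15, max_jitter: float = 0.08) -> np.ndarray:
    """Randomly deform a preform while keeping it close to a real cell outline.

    The magnitudes are hypothetical; they bound how far the shape may drift.
    """
    n = len(outline)
    # 1) Slight anisotropic stretch turns the circle into an ellipse.
    sx, sy = 1.0 + rng.uniform(-max_stretch, max_stretch, size=2)
    out = outline * np.array([sx, sy])
    # 2) Smooth radial jitter from a few low-frequency harmonics,
    #    scaled so that |jitter| <= max_jitter at every vertex.
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    harmonics = (2, 3, 5)
    jitter = sum(rng.uniform(-1.0, 1.0) * np.sin(k * t + rng.uniform(0.0, 2.0 * np.pi))
                 for k in harmonics)
    jitter *= max_jitter / len(harmonics)
    out = out * (1.0 + jitter)[:, None]
    # 3) Random in-plane rotation (does not change distances from the centre).
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return out @ rot.T
```

Bounding every manipulation keeps each altered outline within a predictable radius band around the preform, which corresponds to altering the object "only to a degree" as described above.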
The third phase, Fig. 2(c), consists of blurring the object and placing it on the real image. Blurring blends the synthetic object into the real background so that the result appears as realistic as possible. This can be done in many ways, for example, with dedicated algorithms or image editing software. Because the appropriate blurring depends on the specific generated image, better results are obtained with an object detector that relies more on the shape of objects and their relationships than on specific colours. Round objects have an advantage over other shapes for rotation augmentation: when rotated, they stay within their bounding box, so the original size and boundary remain the same. The augmentation process used for the data set in this article is illustrated in Fig. 3.
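The blurring-and-placement phase can be approximated with simple alpha compositing. The sketch below assumes grayscale NumPy arrays and a binary object mask; it softens both the mask and the replica with a separable Gaussian blur before blending the replica into the real image. The function names and the blur strength `sigma` are illustrative assumptions, since the study does not specify its exact blurring method.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def composite(background, patch, mask, top, left, sigma=1.5):
    """Blend a grayscale synthetic patch into a real grayscale image.

    mask is 1 inside the object and 0 outside. Blurring the mask feathers the
    edges, and blurring the patch softens the replica itself, so that it
    blends into the surrounding background.
    """
    out = background.astype(float).copy()
    h, w = patch.shape
    alpha = np.clip(blur(mask, sigma), 0.0, 1.0)
    soft_patch = blur(patch, sigma)
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * soft_patch + (1.0 - alpha) * region
    return out
```

Pixels outside the pasted region are left untouched, and inside it every output value is a convex combination of the background and the softened replica, so no new intensity extremes are introduced.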

Supplementation
Synthetic object replicas can be supplemented into real-world images in several ways, including the three-dimensional gravitational method, manual insertion, and fully automated supplementation. This paper focuses on the three-dimensional gravitational method, which involves several steps carried out in Blender. The first step is importing real, labelled images into Blender, as shown in Fig. 4(a, d). The labelled objects in these images are then automatically converted into passive 3D objects, in either cube or sphere form, as visible in Fig. 4(b, e, c, f). This conversion ensures that the synthetic objects do not overlap with the labelled objects.
The next step is adding the synthetic data, generated according to the approach described in Section 2.2.1. In the three-dimensional gravitational method, synthetic objects are spawned above the image and dropped using a gravity simulation. Blender's physics engine calculates the motion of these objects under gravity, taking their mass into account. The previously extruded, labelled objects act as obstacles, so the synthetic objects do not overlap with them, while the physics simulation adds a factor of randomness to the supplementation process. Blender also allows the physics simulation to be customized, including the strength of gravity, the friction of surfaces, and the collision properties of objects, which enables users to fine-tune it to achieve the desired results.
In the final step, the generated data is rendered as an image for further use. The imported background image is deleted, leaving only the synthetic objects on a transparent background. These images are then blurred, placed on their respective real images, and saved. Fig. 4 illustrates the supplementation process using screenshots from Blender. Fig. 4(a) shows the original image from the microscope, while Fig. 4(b, c) and (e, f) show the tagged objects in the 3D environment with their bounding volumes in cube and sphere form, respectively, stretched along the axis. Fig. 4(d, e, f) shows how the 3D environment is used to add object replicas into existing images, rotated by 20 degrees along the and axes.
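Converting labelled objects into passive 3D obstacles largely amounts to mapping normalized YOLO bounding boxes onto the scene plane that holds the imported image. The plain-Python sketch below shows only that coordinate mapping. The `PassiveCollider` type, the plane size and the extrusion height are hypothetical, and the actual creation of Blender objects (through the `bpy` API, with the rigid-body type set to passive) is not shown.

```python
from dataclasses import dataclass

@dataclass
class PassiveCollider:
    """Hypothetical description of one passive obstacle in the 3D scene."""
    class_id: int
    shape: str                            # "cube" or "sphere"
    center: tuple[float, float, float]    # scene units
    size: tuple[float, float, float]      # full extents along x, y, z

def label_to_collider(line, plane_w, plane_h, shape="cube", height=1.0):
    """Map one YOLO label line onto a scene plane of size plane_w x plane_h.

    The plane is assumed to be centred at the origin; image y points down,
    scene y points up, hence the sign flip. The object is extruded along z
    by `height` so that dropped replicas collide with it.
    """
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    x = (xc - 0.5) * plane_w
    y = (0.5 - yc) * plane_h
    sx, sy = w * plane_w, h * plane_h
    if shape == "sphere":
        sx = sy = max(sx, sy)             # bounding sphere diameter in the plane
    elif shape != "cube":
        raise ValueError(f"unsupported shape: {shape}")
    return PassiveCollider(int(cls), shape, (x, y, height / 2.0), (sx, sy, height))
```

The sphere form uses the larger box side as its diameter, so it always covers the labelled object; the cube form follows the box exactly.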
While the three-dimensional gravitational method is effective, it is not the only approach for supplementing synthetic object replicas into real-world images. The manual method requires the user to insert each synthetic object replica by hand, which is time-consuming. The fully automated method, on the other hand, is algorithmic: it automatically inserts the generated synthetic objects into free spaces in the image.
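The fully automated method is not detailed in the text. One simple way to realise it is rejection sampling: draw random positions for each replica and accept a position only if its bounding box does not overlap any labelled or previously placed box. The sketch below assumes pixel boxes in `(x_min, y_min, x_max, y_max)` form; the function names and the `max_tries` limit are hypothetical.

```python
import random

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def place_in_free_space(img_w, img_h, occupied, sizes, max_tries=1000, seed=None):
    """Rejection-sample non-overlapping positions for replicas of the given sizes.

    occupied: boxes of labelled objects already in the real image.
    sizes: (width, height) of each synthetic replica to insert.
    Returns the boxes of the replicas that could be placed.
    """
    rng = random.Random(seed)
    taken = list(occupied)
    placed = []
    for w, h in sizes:
        for _ in range(max_tries):
            x0 = rng.uniform(0.0, img_w - w)
            y0 = rng.uniform(0.0, img_h - h)
            candidate = (x0, y0, x0 + w, y0 + h)
            if not any(boxes_overlap(candidate, b) for b in taken):
                taken.append(candidate)
                placed.append(candidate)
                break
        # A replica with no free spot after max_tries is silently skipped.
    return placed
```

Unlike the gravity simulation, this approach does not produce physically plausible clustering, but it is fast and guarantees that replicas never cover labelled objects.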

Training setup
A wide variety of machine learning algorithms are currently available for detecting objects in images. At the beginning of the study, YOLOv5 was one of the most popular real-time, single-stage object detection algorithms, offering one of the best trade-offs between AP and FPS [17], which is why it was used for all of the following experiments. YOLOv5 is a family of object detection architectures and models pretrained on the COCO data set. Training and evaluation were performed on Linux workstations with NVIDIA A100 GPUs.

Experimental setup
The experimental setup addresses an important issue in automated equipment for living cell manipulation, where the detection of living cells largely dictates the precision of the whole system. To perform a manipulation task, the end-effector of the automated equipment, in this case a microrobot, must be positioned close to the cell; therefore, the exact location of the cell must be found, typically with micrometre precision. Living cells are mostly placed in growth media, which are transparent, making the object detection task more difficult. Furthermore, gathering real yeast cell images for training purposes can be a difficult and very time-consuming task. It includes preparing a yeast solution, which often requires more effort and expertise from the researchers than the research itself. The data also has to be acquired manually in a controlled environment, which can leave some classes under-represented and thus result in an overall worse data set. Yeast (Saccharomyces cerevisiae) is a very common, inexpensive, commercially available live-cell material. In addition, yeast cells are mechanically very strong [18] and can survive in harsh conditions; they can hibernate in the dry state and revive in favourable ambient conditions. These cells are about 5 micrometres in size and round in shape, which is why we chose these live single-cell organisms. Yeast cells thus seem very suitable objects for training manipulation systems and trialling various manipulation methods.
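To make the link between detection and trajectory key points concrete, the sketch below converts a detected bounding box from pixels into stage coordinates and derives approach, manipulation and retreat positions above the cell centre. The scale factor (micrometres per pixel), the image origin on the stage, the working height and the clearance height are all hypothetical calibration values, and the image and stage axes are assumed to be aligned.

```python
from dataclasses import dataclass

@dataclass
class TrajectoryKeypoints:
    approach: tuple[float, float, float]      # micrometres, stage frame
    manipulation: tuple[float, float, float]
    retreat: tuple[float, float, float]

def box_center_um(box_px, um_per_px, origin_um):
    """Centre of a pixel box (x_min, y_min, x_max, y_max) in stage micrometres.

    Assumes the image and stage axes are aligned and the image has a uniform scale.
    """
    x0, y0, x1, y1 = box_px
    return (origin_um[0] + (x0 + x1) / 2.0 * um_per_px,
            origin_um[1] + (y0 + y1) / 2.0 * um_per_px)

def keypoints_from_detection(box_px, um_per_px, origin_um, z_work_um, z_clear_um):
    """Approach and retreat above the cell, manipulation at the working height."""
    x, y = box_center_um(box_px, um_per_px, origin_um)
    above = (x, y, z_work_um + z_clear_um)
    return TrajectoryKeypoints(approach=above, manipulation=(x, y, z_work_um), retreat=above)
```

With a hypothetical scale of 0.5 µm per pixel, shifting the detected box by 2 pixels moves every key point by 1 µm, which shows why detection precision translates directly into trajectory precision at this scale.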

Data acquisition
A four-axis micro-robot of original design with scanning electrochemical microscopy capabilities was used for the experiments. It consists of the following core modules: a mechanical manipulating system, an optical system, a motion controller, an electrochemical signal reader, and a main control unit. The main controller, implemented on a PC, controls the overall system, runs the user interface, and generates the robot movement trajectory and measurement commands, which are executed by the motion controller and the signal reader. Lower-level controllers and devices perform the more specific tasks assigned to them. The hardware consisted of the originally designed manipulating system combined with the microscope. The mechanical system of the developed microscope has four degrees of freedom and is based on the kinematic scheme of a typical orthogonal manipulator, such as a 3D printer or a similar CNC machine. An optical microscope made of cast iron was used as the housing for the device to ensure high thermal stability and sufficient stiffness. During the redesign, new precisely controlled drives were installed, and a precisely machined fixing point on the microscope base keeps their positional deviation minimal. As a result, an orthogonal manipulation system with a movable table (X- and Y-axes) and two parallel Z-axes was developed. The first Z-axis controls the focal distance of the optical microscope. The second one moves the electrode, or any other sensor or gripper, up and down. The optical microscope and the measuring electrode are mounted on separate parallel axes because they need to move asynchronously. The X- and Y-axes have a resolution of about 1 micrometre and control the movements of the table on which the test specimen is placed. The Z-axes have a resolution of about 0.75 micrometres.
The high accuracy and resolution of the drives are ensured by micrometre's pitch ball-screw drives driven by stepper motors operating in 1/256 micro-step mode and by advanced control methods. Thus, by combining commercially available modules, relatively low-cost components, design and software solutions proven in other fields, and an original control and data fusion algorithm, automated scanning of microscopic biological or technical objects was achieved and used in the data acquisition process.

Yeast cell classes
The primary class, and the highest-priority class to detect in this study, is the individual yeast cell (YC_individual). Other classes of these objects are derived from this class as a result of biological processes. The different derivations are listed in Fig. 5. In essence, every combination contains an individual yeast cell; the combinations differ in how the remaining yeast cells are addressed. Overall, the impacts of the various classes on the detection results are investigated and compared with the results achieved by supplementing under-represented classes.
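Class imbalance can be quantified directly from YOLO label files by counting object instances per class. The sketch below does this and, as one possible target, computes how many replicas each class would need to match the most frequent class. The study does not state that it balances classes to the majority count, and the numeric class IDs and their mapping to names such as YC_individual are data-set specific assumptions.

```python
from collections import Counter
from pathlib import Path

def read_label_lines(label_dir):
    """Yield every line of every YOLO .txt label file in label_dir."""
    for path in sorted(Path(label_dir).glob("*.txt")):
        yield from path.read_text().splitlines()

def class_counts(label_lines):
    """Count object instances per numeric class ID (first field of each line)."""
    counts = Counter()
    for line in label_lines:
        fields = line.split()
        if fields:                       # skip empty lines
            counts[int(fields[0])] += 1
    return counts

def replicas_to_majority(counts, class_ids):
    """Replicas each class would need to match the most frequent class (one possible target)."""
    target = max(counts.get(c, 0) for c in class_ids)
    return {c: target - counts.get(c, 0) for c in class_ids}
```

For a real data set, `class_counts(read_label_lines("labels/train"))` (a hypothetical path) would give the per-class totals that a class-distribution table summarizes.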

Tests and results
Three approaches were explored in this research: supplementation of under-represented classes, supplementation of under-represented classes with rotational augmentation, and supplementation of classes of interest.

Data sets
The data sets were prepared in different ratios to experimentally evaluate whether the ratio has an impact on the precision of object detection. Initially, the training data set consists of 70 raw images only. In every iteration, the percentage of supplemented images in the data set is increased by 10 % until the data set consists of 100 % supplemented images. The same gradual replacement is also applied to the validation data set. In total, 200 raw images were used: 70 for training (unaltered, raw images), 70 for supplementation, 30 for validation, and 30 for testing.
Fig. 6. Images for the under-represented classes data set
The under-represented classes data set was supplemented with objects from three under-represented classes: YC_individual, YC_parent, and YC_child. The number of objects per image is two YC_individual, five YC_parent and five YC_child objects. For the class-of-interest approach, images were supplemented with objects representing the YC_individual class. Fig. 7(a) shows a raw image from the small-scale objects data set, Fig. 7(b) an image from the small-scale objects data set with supplemented objects, Fig. 7(c) a raw image from the large-scale objects data set, and Fig. 7(d) an image from the large-scale objects data set with supplemented objects.
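The gradual replacement schedule can be written as a small generator. The sketch assumes equally long, index-aligned lists of raw and supplemented images (70 each in this study) and keeps the data set size constant while the supplemented share grows in fixed steps; which particular raw images are swapped out at each step is an assumption here.

```python
from typing import Iterator, Sequence

def supplementation_schedule(raw: Sequence[str], supplemented: Sequence[str],
                             step: float = 0.1) -> Iterator[tuple[float, list[str]]]:
    """Yield (ratio, images) with a growing share of supplemented images.

    The data set size stays constant: at ratio r, the first round(r * n)
    raw images are swapped for supplemented ones (an assumed ordering).
    """
    if len(raw) != len(supplemented):
        raise ValueError("raw and supplemented lists must have equal length")
    n = len(raw)
    n_steps = round(1.0 / step)            # 10 steps for a 10 % increment
    for i in range(n_steps + 1):
        k = round(i * n / n_steps)         # number of supplemented images
        yield i / n_steps, list(supplemented[:k]) + list(raw[k:])
```

With 70 images and a 10 % step this yields 11 training sets (0 %, 10 %, ..., 100 %), each still containing 70 images; the same generator could be applied to the validation split, or used with `step=0.2` for a coarser schedule.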

Evaluation metrics
Detection is primarily evaluated using mean Average Precision (mAP). mAP@50 is the mean average precision at an IoU (Intersection over Union) threshold of 0.5. mAP@50-95 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Precision = True Positives / (True Positives + False Positives). Recall = True Positives / (True Positives + False Negatives). Metrics such as mAP@50 and mAP@50-95 make it possible to quickly see improvements and give a complete picture of the effectiveness of data sets [19]. mAP@50 and mAP@50-95 are absolute metrics, while precision and recall are relative metrics. mAP@50 is the official VOC metric [20], and mAP@50-95 is the official COCO metric [21].
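The metrics above can be sketched for a single class in a single image: IoU between boxes, precision and recall from counts, average precision with greedy confidence-ordered matching and all-point interpolation, and mAP@50-95 as the mean over the ten IoU thresholds 0.50, 0.55, ..., 0.95. This is a simplified illustration; YOLOv5 and the COCO evaluation tools use their own matching and interpolation details, so their numbers can differ slightly.

```python
import numpy as np

IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)   # 0.50, 0.55, ..., 0.95 for mAP@50-95

def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def average_precision(preds, gts, iou_thr):
    """AP for one class in one image.

    preds: list of (confidence, box); gts: list of ground-truth boxes.
    Predictions are matched greedily in order of decreasing confidence to the
    unmatched ground truth with the highest IoU >= iou_thr.
    """
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    for i, (_, box) in enumerate(preds):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp[i] = 1.0
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    rec = cum_tp / len(gts)
    prec = cum_tp / np.maximum(cum_tp + cum_fp, np.finfo(float).eps)
    # All-point interpolation: make precision monotonically non-increasing,
    # then sum the areas of the resulting recall steps.
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))

def map_50_95(preds, gts):
    """Mean of AP over the ten IoU thresholds (single class)."""
    return float(np.mean([average_precision(preds, gts, t) for t in IOU_THRESHOLDS]))
```

A prediction that overlaps its ground truth with IoU 0.77 counts as a hit at six of the ten thresholds (0.50 to 0.75), so it scores 1.0 on mAP@50 but only 0.6 on mAP@50-95. This is why mAP@50-95 is the stricter measure of localization quality.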

Results
To train and test the effects of supplementation, the training and validation data sets were also supplemented and augmented by rotation, so that the effect of each individual data set could be verified and isolated. In total, three subsets were evaluated to study different aspects of supplementation and their respective results. For supplementation of classes of interest, two data sets with objects of different sizes (the small-scale and large-scale objects data sets) were also studied, and YOLO models were trained and evaluated on them. Fig. 8 shows the results of supplementing the data set with replicas of under-represented classes, measured with mAP@50. Supplementation without rotation augmentation is shown with dashed lines, and supplementation with fivefold rotation augmentation with solid lines. Supplementation was done in steps of 10 %. Although precision increases on average across all classes when rotational augmentation is applied, the largest increase is observed for the under-represented classes. Fig. 9 shows the results of supplementing the data set with synthetic replicas of under-represented classes, such as YC_individual, YC_parent, and YC_child, measured with mAP@50-95. Here, the validation data set was also supplemented, and the supplementation step was 20 %. All variations in which the validation data set is supplemented with synthetically generated yeast cells show higher precision than the baseline, where neither the training nor the validation data was supplemented.
ROBOTIC SYSTEMS AND APPLICATIONS. JUNE 2023, VOLUME 3, ISSUE 1
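The rotation augmentation benefits from the round shape of the cells: when an image is rotated about its centre, the centre of a round object moves, but its bounding box keeps the same width and height. The sketch below updates normalized YOLO labels accordingly for a rotation that keeps the canvas size, using the same matrix convention as OpenCV's `getRotationMatrix2D` (positive angles rotate counter-clockwise); boxes that would leave the image are dropped. The five evenly spaced angles are a hypothetical choice, since the exact angles of the fivefold augmentation are not given, and the image pixels themselves would be rotated separately with the same matrix.

```python
import math

def rotate_round_labels(labels, angle_deg, img_w, img_h):
    """Rotate YOLO labels of round objects about the image centre.

    labels: list of (class_id, x_center, y_center, width, height), normalized to [0, 1].
    Uses the same matrix as OpenCV's getRotationMatrix2D(center, angle, 1.0).
    Width and height are kept unchanged, which is only valid for round objects.
    """
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    cx, cy = img_w / 2.0, img_h / 2.0
    rotated = []
    for cls, xc, yc, w, h in labels:
        x, y = xc * img_w - cx, yc * img_h - cy   # pixel offsets from the centre
        xr = a * x + b * y + cx
        yr = -b * x + a * y + cy
        half_w, half_h = w * img_w / 2.0, h * img_h / 2.0
        # Keep the box only if it still lies fully inside the image.
        if half_w <= xr <= img_w - half_w and half_h <= yr <= img_h - half_h:
            rotated.append((cls, xr / img_w, yr / img_h, w, h))
    return rotated

def fivefold_rotation(labels, img_w, img_h, angles=(0, 72, 144, 216, 288)):
    """Hypothetical 5x augmentation: one label set per rotation angle."""
    return {angle: rotate_round_labels(labels, angle, img_w, img_h) for angle in angles}
```

For non-round objects, the rotated bounding box would have to be recomputed from the rotated corners and would generally grow, which is the advantage of round objects noted in the data generation section.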

Results of the supplementation of classes of interest
This subsection describes the results of an approach in which only objects of the class of interest are supplemented. The primary, highest-priority class to detect is the individual yeast cell, labelled YC_individual. There are also two other classes: YC_group, where multiple yeast cells are close together, and YC_part, for partially visible yeast cells. Supplementation is done only for the YC_individual class. The distribution of classes is shown in Fig. 5 and in Table 2. Fig. 10 shows the results of the experiment in which real-world images were supplemented with synthetic yeast cell replicas in steps of 10 %. Solid lines show the results for the large-scale objects data set, dashed lines those for the small-scale objects data set, and grey lines the mean results. Overall, an improvement is visible from the first supplementation step. Notably, the large-scale data set achieved a remarkably higher precision than the small-scale data set.

Conclusions
Imbalanced data sets can lead to decreased detection precision for minority classes. In real-world scenarios, such class imbalance is a well-known problem, and possible solutions vary across use cases. Data sets in which one of the classes must be detected with priority are also common. In the experiments performed in this article, we approach these problems by supplementing real data with synthetically generated object replicas.
For small-scale data sets, supplementation with synthetic object replicas is a difficult and time-consuming task. Small-scale data sets require a careful, almost individual approach to supplementing each image with object replicas to achieve good results, and the supplemented images must be as realistic as possible. The smaller the objects in a data set, the more supplemented objects are needed for better results. Supplementing a data set of large-scale objects with synthetic object replicas is a much more efficient task, and the results are also more rewarding with a smaller number of supplemented object replicas.
In the experimental setup, we test our approach on yeast cells. As it is rather difficult to obtain real yeast cell images, class imbalance naturally arises in such data sets; however, the detection precision should be equal across all classes. In experiments with real-world data, the imbalance is observed, and the detection precision for minority classes is consequently rather poor. By supplementing the real-world data with additional synthetic objects from originally under-represented classes, we observe an increase in the overall precision of object detection. Supplementing the data with under-represented classes and objects, as well as using rotational augmentation, improved the precision of the object detection task, and the improvement was greater for the data set that included rotational augmentation. When the data was small and not augmented, also supplementing the validation data set had a significant impact on the improvement. However, when the data set was larger, the effect of supplementing the validation data set was less pronounced. Overall, the supplementation and augmentation techniques were effective for all data classes and data sets.
Precision improvements were also achieved when only the class of interest was supplemented: with a supplementation step of 10 %, improvements were observed at each step. Adding synthetic data to real data requires testing to find the peak of precision, as adding more synthetic images can result in lower precision. Although the tests were performed with yeast cells, the proposed approach is object-agnostic and can be extended to other use cases by generating the respective images. Overall, in this study, we have achieved more precise object detection, which in turn increases the precision of microrobot trajectory key points.
corresponding author on reasonable request.
Modris Laizans has been working at the Institute of Electronics and Computer Science since 2019. Graduated from Riga Technical University (Riga, Latvia) in 2021 with a professional master's degree in smart electronic systems. Currently a research assistant at the institute and a Ph.D. student at RTU. The main research topics are computer vision and artificial neural networks.
Janis Arents has been working at the Institute of Electronics and Computer Science since 2016. Graduated from Riga Technical University (Riga, Latvia) in 2018 with a professional master's degree in electrical engineering. Currently a researcher at the institute and a Ph.D. student at RTU. The main research topics relate to the use of innovative digital technologies in industrial applications, including computer vision and artificial intelligence-based systems for industrial process automation.
Oskars Vismanis is a master's student in the Faculty of Computer Science and Information Technology at Riga Technical University, Riga, Latvia, and a research assistant at the Institute of Electronics and Computer Science. His current research interests include robotics, multi-agent systems, and adaptive technology.