Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics December 5-8, 2017, Macau SAR, China
Study on Visual Navigation System for Intelligent Robot in the Unstructured Environment

Qimeng Tan1, Qing-qing Wei1, Ruqi Ma1, Lei Chen1, Yaobing Wang1, Yimin Lin2

1 Beijing Key Laboratory of Intelligent Space Robotic System Technology and Applications, Beijing Institute of Spacecraft System Engineering, CAST, Beijing 100094, China
[email protected]
2 Institute of Optical Communication & Optoelectronics, Beijing University of Posts & Telecommunications, Beijing 100876, China
Abstract—An autonomous visual navigation system is of great significance for an intelligent robot in the unstructured environment, as it is capable of completing a series of missions, i.e., image capture, processing, analysis, understanding and decision-making, without any a priori knowledge. This article proposes a framework of the visual navigation system for an intelligent robot in the unstructured environment, covering the systematic composition, the workflow and the key algorithms: a matching algorithm with adaptive weighted filtering computed by a hierarchical clustering approach; a local optimization algorithm for object detection based on a random ferns classifier and Hough voting; an online learning algorithm for object tracking based on a random ferns classifier and clustering segmentation; and an algorithm for obstacle avoidance based on the calculation of the maximum inscribed circle centered at the regional centroid. Finally, all the presented algorithms are validated in a practical experiment in which the robot independently delivers a specified object through an obstacle passage. The proposed visual system handles difficult cases in the unstructured environment, for example changes in object appearance, scale and pose, partial occlusion and poor illumination.

Keywords—intelligent robot; visual navigation; stereo correspondence; object detection; object tracking; obstacle avoidance
I. INTRODUCTION

Autonomous visual navigation, which requires a series of abilities, i.e., image capture, processing, analysis, understanding and decision-making, has always been one of the most important research topics in the field of intelligent robots, especially in unknown environments. DeSouza and Kak [1] reviewed the past 20 years of development of visual navigation technology for robots, whose application environments can be classified into two categories: structured environments and unstructured environments. The former can provide some a priori knowledge to the visual system in order to represent typical features of the environment; the latter refers to an unknown environment about which the visual system can obtain no information in advance. At present, more and more achievements have been made in the control theory and approaches for robot
visual navigation under the structured environment. Several representative systems developed abroad [2-6] serve as examples. LAGR (Learning Applied to Ground Robots), sponsored by DARPA from 2004 to 2008, utilized two stereo cameras and a GPS to navigate autonomously in the outdoor field. The BigDog robot developed by Boston Dynamics adopted Bumblebee stereo sensors to reconstruct 3D scenes in 2005. A visual navigation system composed of a stereo camera and a texture projector was devised in 2009 for the PR2 robot launched by Willow Garage, enabling autonomous navigation and obstacle avoidance according to the principle of active vision measurement. The biomimetic robot iCub, sponsored by the European ITALK (Integration and Transfer of Action and Language Knowledge) program, employed stereo cameras to recognize optical road signs on the highway in 2010. The robot Atlas developed by Boston Dynamics can carry out emergency missions such as search and rescue with the help of a laser radar and stereo cameras, demonstrated in 2012. The robot dog AIBO ERS-7, launched by SONY, uses a camera to recognize the features of a football field and guide the robot to kick the football into the goal rapidly and accurately. Besides, many domestic scholars and experts have also achieved progress in visual navigation for intelligent robots. The first Chinese lunar rover, Yutu, launched by the Beijing Institute of Spacecraft System Engineering, carries two sets of stereo cameras: one set, fixed on the mast of the rover for visual navigation, captures stereo images from different views so as to reconstruct the 3D surrounding environment within a range of 20 meters by image registration; the other set is responsible for obstacle detection at close range in order to support path planning of the rover. The autonomous driving vehicle Surui, developed jointly by the Beijing Institute of Technology and the BYD automobile company, can travel on a standard track and avoid obstacles freely using visual sensors and ranging radar, and an intelligent welding robot developed by Shanghai Jiao Tong University applies visual sensors to finish welding missions autonomously. In conclusion, all the above navigation systems are designed to satisfy specific scenes and special requirements, and are not qualified for autonomous navigation in unknown complex environments. The existing research on robot vision theory, approaches and technology is still incapable of analyzing and understanding visual information
This work is supported by the National Natural Science Foundation of China (No. 61231018).
978-1-5386-3742-5/17/$31.00 © 2017 IEEE
as quickly as a human in real time, which limits their application and development in more general environments. To overcome this limitation, a framework of the visual navigation system is proposed in this article, focusing on several key technical problems, namely stereo correspondence, object detection and tracking, and obstacle avoidance, to ensure that the intelligent robot is capable of real-time object tracking and autonomous guidance in the unknown environment.

II. COMPOSITION

Fixed on the end effector of the intelligent robot, the visual navigation system is mainly composed of three parts: a stereo camera, an LED light source and a controlling box, as shown in Fig. 1.

Fig. 1. The sketch of hardware configuration for the visual navigation system.

• The stereo camera is a binocular camera with an image resolution of 352×288 pixels, a frame rate of 25 fps and a baseline distance of 40 mm.

• The LED light source provides assistant illumination for the stereo camera in the unknown environment.

• The controlling box consists of a power supply module and an information controlling module. The former provides power for the stereo camera and the light source; the latter is responsible for controlling the camera to capture and transmit the video images, and for controlling the working state of the LED light source.

III. WORKFLOW

Fig. 2. Schematic diagram of workflow for the visual navigation system.

Fig. 2 illustrates the workflow of the visual navigation system for the intelligent robot, which can be generalized into the following steps (a pseudocode sketch follows the list):

1) After the calibration of intrinsic and extrinsic parameters, the stereo cameras capture left and right images of the unknown scene synchronously.

2) A dense 3D disparity map of the unknown scene is obtained by a local matching algorithm with adaptive weighted filtering computed by a hierarchical clustering approach.

3) Obstacles are detected and avoided by an algorithm based on the calculation of the maximum inscribed circle centered at the regional centroid. The results are introduced into the path planning module to provide guidance for the robot.

4) On the 2D images or the 3D disparity map, the object of interest is detected automatically by a local optimization algorithm based on a random ferns classifier and Hough voting.

5) Once the object is detected successfully, its position, estimated by an online learning tracking algorithm based on a random ferns classifier and clustering segmentation for local optimization, is introduced into the next module; otherwise, the preset motion planning for the robot is carried out.

6) A safe and passable path for the intelligent robot is calculated by combining the locations of the object and the obstacles, and is sent to the robot controlling system so that the spatial pose of the end effector can be adjusted to approach the object gradually.
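Read as a control loop, the workflow above might be sketched as follows; every name here (match_stereo, passable_region, detect_object, track, plan_path and the robot interface) is a placeholder for the corresponding component described in Section IV, not an actual API.

```python
def navigation_step(stereo_rig, robot, state):
    """One pass of the workflow in Fig. 2 (illustrative sketch only)."""
    left, right = stereo_rig.capture()           # step 1: synchronized capture
    disparity = match_stereo(left, right)        # step 2: dense disparity map
    region = passable_region(disparity)          # step 3: obstacle detection/avoidance
    if state.tracking:
        obj = track(state, left)                 # step 5: online tracking once detected
    else:
        obj = detect_object(left, disparity)     # step 4: random ferns + Hough voting
    if obj is None:
        robot.execute_preset_motion()            # step 5: fallback preset motion planning
    else:
        path = plan_path(obj, region)            # step 6: fuse object and obstacle locations
        robot.move_end_effector(path)            # approach the object gradually
    return obj, region
```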
IV. ALGORITHMS

A. Stereo correspondence

The key to stereo correspondence [7] is to reduce the noise mixed into the disparity map and simplify the computational complexity as much as possible, while still preserving the edge details of the object. According to the similarity of local image features, a matching algorithm with adaptive weighted filtering computed by a hierarchical clustering approach is proposed for dense stereo correspondence, which can be generalized as follows.

1) An initial model of the matching cost between the two images is computed by introducing the absolute differences of the image intensity and the corresponding gradient of every pixel within the local window:

$M(u,d) = a \cdot \min\left[\left|I_L(u) - I_R(u-d)\right|, \tau_1\right] + (1-a) \cdot \min\left[\left|\nabla_x I_L(u) - \nabla_x I_R(u-d)\right|, \tau_2\right]$ (1)

where $d$ denotes the disparity of image pixel $u$, $M(u,d)$ is the initial matching cost corresponding to $u$ and $d$, $a$ is a weighting coefficient, $I_L$ and $I_R$ denote the left and right images respectively, $\nabla_x$ is the gradient of every pixel in the x-direction, and $\tau_1$ and $\tau_2$ are truncation values for the image intensity and the gradient respectively.
2) The initial matching cost is then aggregated by weighted filtering in order to suppress the noise disturbance as much as possible:

$C(i,d) = \dfrac{\sum_{j \in N_i} W(i,j)\, M(j,d)}{\sum_{j \in N_i} W(i,j)}$ (2)

where $i$ and $j$ are pixel indices, $N_i$ represents the matching window around the $i$-th pixel, and $W(i,j)$ is the weight of the filtering model, which can be derived as

$W(i,j) = \exp\!\left(-\dfrac{\|i-j\|^2}{2\sigma_s^2}\right)\exp\!\left(-\dfrac{|I_i - I_j|^2}{2\sigma_r^2}\right)$ (3)

where $I_i$ and $I_j$ refer to the image intensities of the $i$-th and $j$-th pixels respectively, and $\sigma_s$ and $\sigma_r$ are two constants corresponding to the spatial difference and the intensity difference between the two pixels.

Suppose there are $K$ sampling sets $\{m_1^i, m_2^i, \ldots, m_K^i\}$ relating to the image intensity $I_i$ of the $i$-th pixel; then (3) can be formulated further as

$W(i,j) = C \exp\!\left(-\dfrac{\|i-j\|^2}{2\sigma_s^2}\right)\sum_{n=1}^{K}\exp\!\left(-\dfrac{|I_i - m_n^i|^2}{2\sigma_r^2}\right)\exp\!\left(-\dfrac{|m_n^i - I_j|^2}{2\sigma_r^2}\right)$ (4)

Here a hierarchical clustering approach, designed to generate the above $K$ sampling sets according to the similar intensities of neighboring pixels, can lower the computational complexity of the intensity calculation efficiently. It is assumed that the sampling value of the $i$-th pixel within the $n$-th category is the same as that of its neighboring $j$-th pixel:

$m_n^i = m_n^j$ (5)

Substituting (4) and (5) into (2), the aggregation can be rewritten as

$C(i,d) = \dfrac{\sum_{n=1}^{K}\exp\!\left(-\dfrac{|I_i - m_n^i|^2}{2\sigma_r^2}\right)\sum_{j \in N_i}\exp\!\left(-\dfrac{\|i-j\|^2}{2\sigma_s^2}\right)\exp\!\left(-\dfrac{|m_n^j - I_j|^2}{2\sigma_r^2}\right) M(j,d)}{\sum_{n=1}^{K}\exp\!\left(-\dfrac{|I_i - m_n^i|^2}{2\sigma_r^2}\right)\sum_{j \in N_i}\exp\!\left(-\dfrac{\|i-j\|^2}{2\sigma_s^2}\right)\exp\!\left(-\dfrac{|m_n^j - I_j|^2}{2\sigma_r^2}\right)}$ (6)

The above equation expresses the aggregating model of the matching cost through filtering, which can reduce the disturbing noise of the original disparity map effectively.

3) According to (6), a dense disparity map can be generated by searching for the optimal disparity that minimizes the matching cost of every pixel within the entire image:

$D(u,v) = \arg\min_{d \in L} C(u,v,d)$ (7)

where $(u,v)$ indicates the pixel coordinate in the image, $L$ is the set of disparity levels, and $C(u,v,d)$ represents the aggregated matching cost corresponding to the pixel $(u,v)$ and its disparity $d$.

4) The above dense disparity map is mixed with several mismatching pixels that do not satisfy the smoothness constraint. This is solved by a new method of disparity refinement, divided into two steps: firstly, after obtaining another disparity map computed with the right image as reference, all the pixels in both disparity maps are classified into a stable region $S$ and an unstable region $U$ according to the left-right consistency constraint; secondly, a fine disparity space volume $DSV$ is calculated as

$DSV(u,v,d) = \begin{cases} d - D(u,v), & (u,v) \in S \\ 0, & (u,v) \in U \end{cases}$ (8)
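To make the pipeline concrete, the following is a minimal NumPy sketch of the cost construction (1), a brute-force adaptive-weight aggregation (2)-(3) and the winner-take-all selection (7); it deliberately omits the hierarchical-clustering acceleration of (4)-(6) and the refinement (8), and all parameter values are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def disparity_map(left, right, d_max=32, a=0.6, tau1=30.0, tau2=10.0,
                  win=4, sigma_s=4.0, sigma_r=10.0):
    """left, right: rectified grayscale float images of equal shape.
    Returns the winner-take-all disparity map (illustrative only)."""
    H, W = left.shape
    gl = np.gradient(left, axis=1)    # horizontal gradient of the left image
    gr = np.gradient(right, axis=1)   # horizontal gradient of the right image

    # Equation (1): truncated intensity and gradient differences per disparity.
    cost = np.empty((d_max, H, W))
    for d in range(d_max):
        ad = np.full((H, W), tau1)    # unmatched pixels keep the truncation value
        gd = np.full((H, W), tau2)
        ad[:, d:] = np.minimum(np.abs(left[:, d:] - right[:, :W - d]), tau1)
        gd[:, d:] = np.minimum(np.abs(gl[:, d:] - gr[:, :W - d]), tau2)
        cost[d] = a * ad + (1.0 - a) * gd

    # Equations (2)-(3): adaptive weighted aggregation over a (2*win+1)^2 window.
    pad = win
    lp = np.pad(left, pad, mode='edge')
    cp = np.pad(cost, ((0, 0), (pad, pad), (pad, pad)), mode='edge')
    num = np.zeros_like(cost)
    den = np.zeros((H, W))
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            w_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))     # spatial term
            Ij = lp[pad + dy:pad + dy + H, pad + dx:pad + dx + W]         # shifted intensities
            w = w_s * np.exp(-((left - Ij) ** 2) / (2.0 * sigma_r ** 2))  # range term
            num += w * cp[:, pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            den += w

    # Equation (7): winner-take-all over the aggregated cost volume.
    return np.argmin(num / den, axis=0)
```

A symmetric pass with the right image as reference would supply the second disparity map needed for the left-right consistency check in step 4.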
B. Object detection

Object detection and tracking have always been among the challenging problems [8-11] of autonomous visual navigation, especially in difficult cases such as varying object pose, partial occlusion and complex illumination.
A local optimization algorithm based on a random ferns classifier and Hough voting is proposed to detect the object of interest automatically and exactly. It can be divided into the following steps:

1) After collecting a number of positive and negative samples, the positions of local regions in the positive samples are computed relative to the reference center of the object of interest so as to generate a codebook encoding the local binary features of the object; together with a pyramid scanning window, the codebook is used to scan all local regions and describe them as object or background. The codebook is of great significance for improving the precision of the classification.

2) The random ferns classifier is trained to recognize the presence or absence of the object according to the generated codebook of local properties. The output of every fern in the classifier is drawn as a frequency distribution histogram in order to construct a Hough map encoding the mathematical relationship between the local binary features of the object and the Hough parameter space, with rotation and scale invariance taken into account. Once an unknown sample occurs, it is introduced to update the random ferns classifier.

3) According to the Hough map, Hough voting is carried out in a 3D Hough space built from the image coordinates and a scale factor. A probabilistic voting model is formulated to estimate the location of the object as a local maximum at the specified scale and image coordinates. Subsequently, the local binary features and their corresponding voting results are saved in the vessel Θ.

4) Local maxima are searched by the mean-shift clustering approach, and their coordinates in the 3D Hough space describe the object to be detected.
5) All the regions of the object are back-projected and acquired by introducing the local maxima and their corresponding local binary features stored in the vessel Θ. This provides a precise object region, free of background noise, which is used to retrain the classifier and to guide the extraction of object features in the next frame.
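As a rough illustration of steps 1) to 4), the sketch below implements a toy random-fern and Hough-voting detector on grayscale patches; the fern tests, patch size, vote accumulation and peak search are simplified assumptions for illustration, not the authors' exact design (step 4 proper uses mean-shift clustering rather than a global argmax).

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 16                 # local region size (assumed)
N_FERNS, N_TESTS = 10, 8   # toy classifier size

# Each fern is a set of random pixel-pair intensity comparisons inside a patch.
ferns = rng.integers(0, PATCH, size=(N_FERNS, N_TESTS, 4))

def fern_codes(patch):
    """Binary code of one patch under every fern (the local binary feature)."""
    codes = []
    for tests in ferns:
        bits = 0
        for y1, x1, y2, x2 in tests:
            bits = (bits << 1) | int(patch[y1, x1] > patch[y2, x2])
        codes.append(bits)
    return codes  # one integer code per fern

# Training (steps 1-2): for object patches, store the offset of each patch to
# the object reference center, keyed by fern code (the codebook entry).
votes = [{} for _ in range(N_FERNS)]   # code -> list of (dy, dx) offsets

def train(patch, offset):
    for f, c in enumerate(fern_codes(patch)):
        votes[f].setdefault(c, []).append(offset)

def detect(image, stride=4):
    """Hough voting (step 3) and maximum search (step 4) at a single scale."""
    H, W = image.shape
    acc = np.zeros((H, W))
    for y in range(0, H - PATCH, stride):
        for x in range(0, W - PATCH, stride):
            for f, c in enumerate(fern_codes(image[y:y+PATCH, x:x+PATCH])):
                for dy, dx in votes[f].get(c, ()):
                    cy, cx = y + dy, x + dx
                    if 0 <= cy < H and 0 <= cx < W:
                        acc[cy, cx] += 1.0   # cast a vote for the object center
    # Global peak as a crude stand-in for mean-shift local-maximum search.
    return np.unravel_index(np.argmax(acc), acc.shape), acc
```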
C. Object tracking

Subsequently, an online learning algorithm for object tracking is presented based on a random ferns classifier and clustering segmentation, which can be summarized as follows.

1) In the first frame, the user selects some local binary features of the object as the positive samples and random background information as the negative samples, which are used to train the initial classifier.

2) The above detection algorithm is applied to locate the object of interest in each frame in real time, drawing a colored bounding box; in addition, the pose variation or motion of the object is estimated between neighboring frames.

3) According to the motion constraints and the accurate binary features, the appearance of the object of interest is distinguished from the background quickly and exactly, and the result is introduced into the classifier for updating.

The algorithm is suitable for solving the problem of sample drifting in the process of online tracking.
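A hedged sketch of this online update loop follows, reusing the toy fern detector from the previous sketch (PATCH, detect and train). Restricting detection to a window around the previous position realizes the motion constraint, and feeding only confident peaks back as new positive samples is what limits sample drifting; the radius and conf_min values are invented for illustration.

```python
def track_object(frames, init_pos, radius=24, conf_min=5.0):
    """Online tracking: detect near the previous location, then retrain the
    ferns on the accepted region (illustrative sketch only)."""
    y, x = init_pos
    path = []
    for image in frames:
        H, W = image.shape
        # Motion constraint: restrict voting to a window around the last position.
        y0, y1 = max(0, y - radius), min(H, y + radius)
        x0, x1 = max(0, x - radius), min(W, x + radius)
        (py, px), acc = detect(image[y0:y1 + PATCH, x0:x1 + PATCH])
        if acc[py, px] >= conf_min:          # accept only confident peaks
            y, x = y0 + py, x0 + px
            # Online update: the tracked patch becomes a new positive sample.
            train(image[y:y + PATCH, x:x + PATCH], (0, 0))
        path.append((y, x))
    return path
```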
D. Obstacle avoidance

The algorithm for obstacle avoidance mainly consists of the following steps.

1) Introduce a 3D disparity map of the scene from the stereo correspondence module.

2) According to practical experience, the disparity threshold for segmenting the passable region is set to 30. If the disparity is larger than the threshold, the region is considered an obstacle region; otherwise, it is considered a passable region. Subsequently, the passable region is represented by extracting the maximum connected region, which suppresses the noise remaining in the disparity results.

3) Decide whether a passable region has been detected: if yes, go to the next step; if not, new stereo images need to be captured again.

4) Once the passable region has been determined, its maximum inscribed circle is drawn centered at the regional centroid so as to calculate the diameter of the circle.

5) Combining the structural parameters of the visual system, the computed diameter is compared with the size of the cross section of the robot end effector. If the former is larger than the latter, the region is considered safe for the robot to move forward; otherwise the region is not safe, and the visual system has to capture new stereo images and compute again.
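Assuming the disparity map is available as an 8-bit image and OpenCV is at hand, a minimal sketch of steps 2) to 5) might look as follows; the end-effector diameter is a placeholder, and the inscribed circle is sized by the distance-transform value at the regional centroid.

```python
import cv2
import numpy as np

DISP_THRESH = 30          # disparity threshold from step 2
EFFECTOR_DIAMETER = 60.0  # end-effector cross section in pixels (placeholder)

def passable_region(disparity):
    """Steps 2)-5): segment, keep the largest passable component, and size the
    maximum inscribed circle centered at the regional centroid."""
    # Step 2: disparity above the threshold means a near obstacle.
    passable = (disparity <= DISP_THRESH).astype(np.uint8)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(passable)
    if n <= 1:
        return None                        # step 3: nothing passable, re-capture
    # Largest connected component, skipping label 0 (background/obstacle).
    idx = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    mask = (labels == idx).astype(np.uint8)
    cx, cy = centroids[idx]
    # Step 4: the distance transform at the centroid gives the radius of the
    # largest circle centered there that stays inside the region.
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    diameter = 2.0 * dist[int(round(cy)), int(round(cx))]
    # Step 5: compare against the end-effector cross section.
    return (cx, cy), diameter, diameter > EFFECTOR_DIAMETER
```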
V. EXPERIMENTS AND RESULTS

Fig. 3. Photograph of experiment for delivering object by robot.

As shown in Fig. 3, an experiment in which the intelligent robot delivers an object autonomously has been carried out to verify the validity of the above algorithms for stereo correspondence, object detection and tracking, and obstacle avoidance. The experiment can be divided into two main steps.

1) The visual navigation system guides the end effector of the robot to approach the object gradually, based on the position of the object estimated by object detection and tracking.

2) The visual navigation system calculates a safe and passable path along which the robot, grasping the object, can get through a narrow passage.

In this experiment, the intelligent robot is a VS6577G mechanical arm developed by DENSO, whose maximum operating range is 850 mm with a repeatable positioning accuracy of ±0.03 mm. The visual navigation system is fixed on the flange of the robot; its 3D size is 50×50×25 mm and its weight is approximately 250 g.

A. Object Detection

First of all, samples of the objects of interest, namely a calculator, a box and a cup, need to be collected as a training set, as shown in Fig. 4. The sample numbers of the 3 objects are 84, 66 and 56 respectively. The view of the camera is changed constantly to cover all the features of the samples. The negative samples are extracted from the background, and their number is equal to that of the positive samples.
Fig. 4. Typical cases for training sample sets of 3 different objects.

Fig. 5 illustrates the testing set, composed of 3 complicated scenes with partial occlusion, poor illumination and scale variation, used to verify the validity and robustness of the algorithm for object detection.
Fig. 5. Testing scenes: (a) partial occlusion; (b) partial occlusion + poor illumination; (c) partial occlusion + scale variation.

Fig. 6. Results of object detection. The three columns refer to the detection results of the 3 objects (calculator (a), box (b), cup (c)) in terms of Hough voting, back-projection and final detection in the 3 different scenes (1)(2)(3) specified in Fig. 5 respectively.

Experimental results have shown that the proposed algorithm of object detection achieves good performance against disturbances such as partial shadow, poor environmental illumination and scale variation.

B. Object Tracking

Fig. 7 illustrates the process of object tracking by the proposed online learning algorithm. All the frames are captured by the left camera and show the robot getting close to the object (the calculator in the yellow bounding box) gradually.

Fig. 7. Result of object tracking. (a)~(e) refer to Position 1 to Position 5.
Subsequently, the 3D coordinates of the object in the coordinate systems of both the visual navigation system and the robot are calculated separately by introducing the calibrated parameters, as listed in Table I.
TABLE I. 3D COORDINATES OF THE OBJECT CORRESPONDING TO DIFFERENT POSITIONS (MM)

Position | Coordinate system of visual navigation system (mm) | Coordinate system of robot (mm)
1 | (-1.58, -28.06, 211.48) | (40.09, 409.18, 400.86)
2 | (-35.58, -24.59, 183.94) | (40.13, 409.16, 400.83)
3 | (-50.01, -33.14, 154.27) | (40.13, 408.66, 400.85)
4 | (-66.64, -50.87, 125.68) | (40.15, 409.08, 400.80)
5 | (-93.98, -44.72, 94.77) | (40.13, 408.74, 400.91)
C. Obstacle Avoidance

In this experiment, a narrow passage [12] is designed for delivering the object by the robot, which consists of 6 typical regions, as shown in Fig. 8. Region 1 is the entrance; Region 2 is a right-angle corner leading up to the next region; Region 3 is another right-angle corner turning right to the next region; Region 4 is an obstacle with a high step to be stridden over; Region 5 is another right-angle corner leading down to the next region; and the last region is the exit. The height and depth of the passage are both 25 mm, which can accommodate the end effector of the robot passing through successfully.

Fig. 8. Photograph of narrow passage.
Fig. 9 illustrates the original left image, the 3D disparity map and the passable path for all the regions in the narrow passage. As shown in Table II, the position of every passable region is computed by the visual navigation system and then converted into 3D coordinates in the coordinate system of the robot by introducing the calibrated parameters. Once no passable region can be detected, the robot has to change the view of the stereo cameras constantly until another passable path is found.

Fig. 9. Results of all the specified regions for obstacle avoidance. The first column refers to the original left image, the second column to the disparity map, and the third column to the passable region. (a) Region 1; (b) Region 2; (c) Region 3; (d) Region 4; (e) Region 5; (f) Region 6.
TABLE II. POSITIONAL RESULTS FOR ROBOT IN THE DIFFERENT REGIONS

Region | Image coordinate (pixel) | Coordinate system of robot (mm)
Region 1 | (256, 120) | (36.64, -17.01, 236.95)
Region 2 | (240, 144) | (27.35, -3.07, 236.95)
Region 3 | —— | ——
Region 4 | (268, 145) | (43.61, -2.49, 236.95)
Region 5 | (243, 145) | (29.09, -2.49, 236.95)
Region 6 | (235, 134) | (24.45, -8.88, 236.95)
The experimental results have demonstrated that the visual navigation system can calculate a passable path accurately on the basis of the 3D disparity map, even when it is mixed with some noise disturbance, as shown in Fig. 9 (b), (d) and (e). Besides, Fig. 9 (c) describes an obstacle with a high step where no accessible region can be concluded, which requires the visual system to scan the surrounding environment again.

VI. CONCLUSIONS

In this paper, a new visual navigation system for an intelligent robot has been proposed by introducing its systematic composition, workflow and key algorithms: a matching algorithm with adaptive weighted filtering computed by a hierarchical clustering approach, a local optimization algorithm for object detection based on a random ferns classifier and Hough voting, an online learning algorithm for object tracking based on a random ferns classifier and clustering segmentation, and an algorithm for obstacle avoidance based on the calculation of the maximum inscribed circle centered at the regional centroid. The first algorithm achieves dense stereo correspondence with good accuracy and efficiency. The second algorithm for object detection is suitable for difficult cases such as partial occlusion, poor illumination and scale variation. The third algorithm for object tracking can solve the problem of sample drifting effectively. Experimental results have verified the validity and robustness of all the above algorithms, ensuring that the robot is capable of detecting and tracking objects and avoiding obstacles autonomously and accurately with the help of the visual navigation system, especially in the unstructured environment.

REFERENCES

[1] DeSouza G. N. and Kak A. C. Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(2): 237-267.
[2] Jackel L. D., Krotkov E., Perschbacher M., et al. The DARPA LAGR program: Goals, challenges, methodology, and phase I results. Journal of Field Robotics, 2007.
[3] Srinivasan M., Thurrowgood S. and Soccol D. Competent vision and navigation systems. IEEE Robotics & Automation Magazine, 2009, 16(3): 59-71.
[4] Kendoul F. Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems. Journal of Field Robotics, 2012, 29(2): 315-378.
[5] Hu W., Tan T., Wang L., et al. A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2004, 34(3): 334-352.
[6] Cai G., Chen B. M. and Lee T. H. An overview on development of miniature unmanned rotorcraft systems. Frontiers of Electrical and Electronic Engineering in China, 2010, 5(1): 1-14.
[7] Martin M. C. Evolving visual sonar: Depth from monocular images. Pattern Recognition Letters, 2006, 27(11): 1174-1180.
[8] Prasad D. K. Survey of the problem of object detection in real images. International Journal of Image Processing, 2012, 6(6): 441.
[9] Szeliski R. Computer Vision: Algorithms and Applications. Springer, 2011: 226-229.
[10] Witten I. H. and Frank E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005: 114-118.
[11] Wang Q., Chen F., Xu W., et al. An experimental comparison of online object-tracking algorithms. Proc. SPIE 8138, Wavelets and Sparsity XIV, 2011: 81381A.
[12] Lenser S. and Veloso M. Visual sonar: Fast obstacle avoidance using monocular vision. IROS, 2003, 1: 886-891.