Diego – the humanoid biped

Speech

Finding a Red Ball and Taking It

by on Jan.23, 2011, under Motion, Speech, Vision

After proving that the robot’s arm articulation is precise enough and that the vision system works well at finding and tracking a ball, I decided to take it a step further and combine sensing and actuating. Actually a few steps further, but before explaining all the steps the robot performs, let’s see the video:

The robot performs the following steps in this video:

1. Speech recognition of all commands (SAPI 5.1).
2. Looking for the red ball with its stereoscopic cameras.
3. Finding the ball’s image and stereo-matching it using the sum of absolute differences algorithm.
4. Calculating the distance of the ball using triangulation.
5. Calculating the Cartesian (x,y,z) position of the ball using trigonometry.
6. Performing inverse kinematics (IK) to determine joint angles to get to the ball.
7. Actuating the servos to take the ball.

Let’s focus on some of the new algorithms.

The red ball is found in the left image by locating the “most red” area in it. Pure red is found in the image by the following formula calculated for each pixel:

diff(x,y)=red(x,y)-blue(x,y)-green(x,y)

The ‘diff’ signal has its highest values in areas that contain only the red color component. An adjustable threshold is then applied to cut off areas of other colors.
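As a minimal sketch of this step in plain Python: the code below applies the diff formula to every pixel and thresholds the result. The function names, the threshold value of 100, and the use of a centroid to pick the ball's center are assumptions made for illustration, not details taken from the robot's code.

```python
def red_diff(pixel):
    """diff = red - blue - green for one (r, g, b) pixel."""
    r, g, b = pixel
    return r - b - g


def find_red_centroid(image, threshold=100):
    """Return the (x, y) centroid of pixels whose diff exceeds the threshold.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Returns None when no pixel passes the threshold.
    Using the centroid as the ball center is an assumption of this sketch.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if red_diff(pixel) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

Note that a white pixel (255, 255, 255) scores -255, so bright but colorless areas are rejected even though their red channel is at its maximum.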

After this is done, we commence with stereo matching. This is done along so-called epipolar lines. Since the cameras’ optical axes in this system are parallel, the epipolar lines are horizontal. This means that the ball’s image will appear in the right image at the same y-coordinate as in the left image, but translated along the x axis. The search along the x axis is done using the 2D sum of absolute differences between the left and right images. The area in the right image that most resembles the area in the left image is declared stereo-matched.
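The horizontal search can be sketched as follows. This is an illustrative version only: it assumes grayscale images stored as lists of rows, a square patch, and an exhaustive scan of the whole row; the function names are hypothetical.

```python
def sad(left, right, lx, rx, y, size):
    """Sum of absolute differences between two size x size grayscale patches.

    The left patch has its top-left corner at (lx, y), the right one at (rx, y).
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(left[y + dy][lx + dx] - right[y + dy][rx + dx])
    return total


def match_along_row(left, right, lx, y, size):
    """Slide along the same row of the right image and keep the lowest SAD.

    Returns (best_x, best_cost). Only x varies because the epipolar
    lines are assumed to be horizontal.
    """
    width = len(right[0])
    best_x, best_cost = None, None
    for rx in range(width - size + 1):
        cost = sad(left, right, lx, rx, y, size)
        if best_cost is None or cost < best_cost:
            best_x, best_cost = rx, cost
    return best_x, best_cost
```

The difference between `lx` and the returned `best_x` is the disparity that feeds the distance calculation.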

Next, knowing the cameras’ focal lengths and the position of the center of the ball in both images, we can calculate the angles at which the cameras see the object. By also knowing the inter-camera distance, we can triangulate the distance of the object.
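A small sketch of the triangulation, under stated assumptions: both cameras share the same focal length (expressed in pixels) and image center, the left camera sits at the origin with the right camera one baseline to its right, and the result is the depth measured perpendicular to the line joining the cameras. The names and the numbers in the usage note are made up for illustration.

```python
import math


def view_angle(x_pixel, center_x, focal_px):
    """Horizontal angle (radians) between the optical axis and the ray to the pixel."""
    return math.atan2(x_pixel - center_x, focal_px)


def triangulate_depth(x_left, x_right, center_x, focal_px, baseline):
    """Depth of the object for two parallel cameras separated by `baseline`.

    Returns None when the rays do not converge in front of the cameras.
    """
    a_left = view_angle(x_left, center_x, focal_px)
    a_right = view_angle(x_right, center_x, focal_px)
    spread = math.tan(a_left) - math.tan(a_right)
    if spread <= 0:
        return None
    return baseline / spread
```

For example, with an image center of 160, a focal length of 300 pixels and a 0.06 m baseline, a ball seen at x = 190 in the left image and x = 160 in the right image gives `triangulate_depth(190, 160, 160, 300, 0.06)`, about 0.6 m.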

After the direct line distance is calculated, trigonometry is used to figure out the Cartesian distances in the x, y and z directions. These are fairly complex calculations, since the head’s tilt and yaw angles must be accounted for too.
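The core of that conversion can be sketched like this. The axis convention (x forward, y to the left, z up), the sign of the angles, and the `head_offset` parameter are all assumptions of this example; the real calculation also has to fold in where the ball sits within the image.

```python
import math


def ball_position(distance, yaw, tilt, head_offset=(0.0, 0.0, 0.0)):
    """Convert a line-of-sight distance plus head angles to (x, y, z).

    Assumed convention: x forward, y left, z up; yaw is positive to the
    left and tilt is positive upward, both in radians. `head_offset` is a
    hypothetical translation from the head to the frame of interest.
    """
    horizontal = distance * math.cos(tilt)  # projection onto the ground plane
    x = horizontal * math.cos(yaw)
    y = horizontal * math.sin(yaw)
    z = distance * math.sin(tilt)
    ox, oy, oz = head_offset
    return (x + ox, y + oy, z + oz)
```

With the head looking straight ahead (`yaw = 0`, `tilt = 0`) the ball lies on the x axis at the measured distance, which makes a quick sanity check.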

Knowing the x, y and z coordinates of the ball relative to the arm’s origin at the shoulder, we need to figure out the angles of the arm’s joints needed to reach the object. This problem is called inverse kinematics. It is an iterative process in which the angles are adjusted until the hand reaches the ball.
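To illustrate what an iterative solution looks like, here is a sketch using cyclic coordinate descent (CCD) on a planar two-link arm. CCD is one common iterative method chosen for this example; it is not necessarily the method used on the robot, and the planar arm, link lengths and tolerance are simplifying assumptions.

```python
import math


def forward(angles, lengths):
    """Joint positions of a planar chain with its base at the origin."""
    x = y = total = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        total += angle  # each joint angle is relative to the previous link
        x += length * math.cos(total)
        y += length * math.sin(total)
        points.append((x, y))
    return points


def solve_ik(target, lengths, angles=None, tolerance=1e-3, max_iter=200):
    """Adjust joint angles one at a time until the hand is near the target."""
    angles = list(angles) if angles else [0.0] * len(lengths)
    for _ in range(max_iter):
        hand = forward(angles, lengths)[-1]
        if math.hypot(hand[0] - target[0], hand[1] - target[1]) < tolerance:
            break
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            jx, jy = points[i]
            hx, hy = points[-1]
            to_hand = math.atan2(hy - jy, hx - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            # Rotate joint i so the hand swings onto the joint-to-target line.
            angles[i] += to_target - to_hand
    return angles
```

Calling `solve_ik((1.2, 0.8), [1.0, 1.0])` returns two joint angles; passing them back through `forward` shows how close the hand ended up to the target.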

Finally, once the angles are calculated, the robot moves its servos to them and grabs the ball.


Speech Recognition and Generation on Roboard

by on Jan.19, 2011, under Speech

The new Roboard single-board computer provides enough processing power to implement higher-level cognitive functions on my robot. One of these functions is speech. Since I installed Windows XP on the Roboard and used Microsoft Visual Studio for code development, I also started using Windows’ own Speech API (SAPI) for speech recognition and generation. As the input device I use the well-amplified microphone on the Minoru 3D stereoscopic vision system. As the output device I use a regular non-amplified speaker. Speech recognition works quite well almost out of the box when the recognition grammar is limited to a small number of phrases. This setup is adequate for my current needs.

