Robot user interface application
by paloskar on Apr.16, 2012, under Software
I created this test application for demonstrating the capabilities of the new robot class model. The video gives a quick overview of the system (the volume might be a bit low):
You can download this application at the following links:
- source code only – 17kB zip
- visual studio 2005 project, dependencies and executable – 66MB zip
The second option is a VS2005 project with all the needed dependencies. It also contains the compiled and linked executable robot_class.exe in the release folder. This zip file should be unpacked to C:\ for the Festival library to work properly (it creates its own sub-folders). It was tested under Windows Vista.
Generalized robot class model – transition to C++
by paloskar on Apr.14, 2012, under Software
After a long time developing my robot’s software in C, I decided to make the transition to C++. I realized that a class system would let me structure the code better. This is why I developed a new object system, which looks like this in UML (click on it for a larger view):
In the demo/test application, the robot class is controlled directly by the user interface class. The robot class aggregates member objects for controlling different functionalities: the speech system, vision system, kinematics model. The vision class consists of two or more camera classes, which in turn contain classes for blob manipulation. The vision class is also responsible for triangulation to calculate distances of objects in the robot’s field of view. The kinematics class describes the kinematic model of the robot using arrays of joint structures and segments. It also contains a member function for calculating direct kinematics.
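To make the direct kinematics part concrete, here is a minimal sketch of how a chain of joints and segments could be evaluated. It assumes a planar chain rooted at the origin; the names Joint, Segment, Point2D and Kinematics are hypothetical stand-ins and are not taken from the downloadable source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the joint and segment structures in the post.
struct Joint   { double angle; };   // rotation relative to the previous segment, radians
struct Segment { double length; };  // rigid link that follows the joint

struct Point2D { double x, y; };

class Kinematics {
public:
    Kinematics(const std::vector<Joint>& joints, const std::vector<Segment>& segments)
        : joints_(joints), segments_(segments) {
        assert(joints_.size() == segments_.size());  // one segment per joint
    }

    // Direct (forward) kinematics: accumulate the joint angles along the
    // chain and sum the segment vectors to get the end-effector position.
    Point2D direct() const {
        Point2D p = {0.0, 0.0};
        double heading = 0.0;
        for (std::size_t i = 0; i < joints_.size(); ++i) {
            heading += joints_[i].angle;
            p.x += segments_[i].length * std::cos(heading);
            p.y += segments_[i].length * std::sin(heading);
        }
        return p;
    }

private:
    std::vector<Joint> joints_;
    std::vector<Segment> segments_;
};
```

A real arm needs the same accumulation in three dimensions, with a rotation matrix per joint instead of a single heading angle.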
Moving on to open-source libraries
by paloskar on Apr.14, 2012, under Software
I am a strong believer in open-source software. To live up to this belief, I replaced Microsoft SpeechAPI with PocketSphinx speech recognition and Festival speech synthesis libraries. This brings me up to 4 different libraries for this project:
- GTK – the GIMP Toolkit for creating the graphical user interface,
- OpenCV – by WillowGarage for computer vision,
- PocketSphinx – by CMU for speech recognition,
- Festival – by the University of Edinburgh for speech synthesis.
Two Half Steps Make One Whole Step
by paloskar on Jan.23, 2011, under Motion
Installing the new Roboard processing unit allowed much more precise and sophisticated servo movements than the TI micro-controller did before. Thanks to this the robot was able to complete this step:
Next, the software for saving and loading these steps needs to be improved to make it easier to develop new repeatable walking patterns.
Finding a Red Ball and Taking It
by paloskar on Jan.23, 2011, under Motion, Speech, Vision
After proving that the robot’s arm articulation is precise enough and that the vision system works well in finding and tracking a ball, I decided to take it a step further and combine sensing and actuating. Actually a few steps further, but before explaining all the steps the robot goes through, let’s see the video:
The robot performs the following steps in this video:
1. Speech recognition of all commands (SAPI 5.1)
2. Looking for the red ball with its stereoscopic cameras.
3. Finding the ball’s image and stereo-matching it using the sum of absolute differences algorithm.
4. Calculating the distance of the ball using triangulation.
5. Calculating the Cartesian (x,y,z) position of the ball using trigonometry.
6. Performing inverse kinematics (IK) to determine joint angles to get to the ball.
7. Actuating the servos to take the ball.
Let’s focus on some of the new algorithms.
The red ball is found in the left image by locating the “most red” area in it. Pure red is found in the image by the following formula calculated for each pixel:
diff(x,y)=red(x,y)-blue(x,y)-green(x,y)
The ‘diff’ signal will have its highest values in areas that contain only the red color component. Then, an adjustable threshold is applied to cut off areas of other colors.
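The formula and threshold can be sketched as follows. This is an illustration only: it uses a plain pixel struct instead of OpenCV image types, and the names Rgb, redDiff and redMask are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel; the real code would read OpenCV image data.
struct Rgb { std::uint8_t r, g, b; };

// diff = red - blue - green, computed in int so it can go negative.
inline int redDiff(const Rgb& p) {
    return static_cast<int>(p.r) - static_cast<int>(p.b) - static_cast<int>(p.g);
}

// Binary mask: 1 where diff exceeds the adjustable threshold, 0 elsewhere.
std::vector<std::uint8_t> redMask(const std::vector<Rgb>& image, int threshold) {
    std::vector<std::uint8_t> mask(image.size(), 0);
    for (std::size_t i = 0; i < image.size(); ++i) {
        if (redDiff(image[i]) > threshold) mask[i] = 1;
    }
    return mask;
}
```

Note that white (255, 255, 255) scores -255 even though its red channel is saturated, which is why the difference works better than looking at the red channel alone.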
After this is done, we commence with stereo matching. This is done along so-called epipolar lines. Since the cameras’ optical axes in this system are parallel, the epipolar lines are horizontal. This means that the ball’s image will appear in the right image at the same y-coordinate as in the left image, but translated along the x axis. The search along x is done using the 2D sum of absolute differences between the left and right images. The area in the right image that most resembles the area in the left image is declared stereo-matched.
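A minimal sketch of the search along a horizontal epipolar line, assuming 8-bit grayscale images and a square comparison block. GrayImage, blockSad and matchAlongRow are hypothetical names, and the caller is assumed to keep the block inside both images.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Row-major 8-bit grayscale image (a hypothetical minimal container).
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Sum of absolute differences between two size x size blocks whose top-left
// corners are (lx, y) in the left image and (rx, y) in the right image.
long blockSad(const GrayImage& left, const GrayImage& right,
              int lx, int rx, int y, int size) {
    long sad = 0;
    for (int dy = 0; dy < size; ++dy)
        for (int dx = 0; dx < size; ++dx)
            sad += std::abs(static_cast<int>(left.at(lx + dx, y + dy)) -
                            static_cast<int>(right.at(rx + dx, y + dy)));
    return sad;
}

// Slide the block along the same row of the right image and return the
// x position with the lowest SAD (the first one if several tie).
int matchAlongRow(const GrayImage& left, const GrayImage& right,
                  int lx, int y, int size) {
    int bestX = 0;
    long bestSad = std::numeric_limits<long>::max();
    for (int rx = 0; rx + size <= right.width; ++rx) {
        long sad = blockSad(left, right, lx, rx, y, size);
        if (sad < bestSad) { bestSad = sad; bestX = rx; }
    }
    return bestX;
}
```

The difference between lx and the returned position is the disparity that the triangulation step works from.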
Next, knowing the camera focal lengths and the positions of the center of the ball in both images, we can calculate the angles at which the cameras see the object. By also knowing the inter-camera distance, we can triangulate the distance of the object.
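For parallel cameras the triangulation reduces to a few lines. The sketch below assumes an idealized pinhole model with the focal length expressed in pixels, the same image center column for both cameras, and no lens distortion; all function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Viewing angle (radians) of a pixel column relative to the optical axis.
double viewAngle(double xPixel, double centerX, double focalPx) {
    return std::atan((xPixel - centerX) / focalPx);
}

// Depth along the optical axes for two parallel cameras separated by
// `baseline`. With the left camera at 0 and the right camera at +baseline,
// tan(aL) - tan(aR) = baseline / depth.
double triangulateDepth(double xLeft, double xRight, double centerX,
                        double focalPx, double baseline) {
    double aL = viewAngle(xLeft, centerX, focalPx);
    double aR = viewAngle(xRight, centerX, focalPx);
    double spread = std::tan(aL) - std::tan(aR);
    assert(spread > 0.0);  // the object must be in front of both cameras
    return baseline / spread;
}

// Straight-line distance from a camera to the object, given the depth and
// that camera's viewing angle.
double lineOfSightRange(double depth, double angle) {
    return depth / std::cos(angle);
}
```

With a focal length of 500 pixels, a baseline of 0.06 m and a disparity of 30 pixels, the depth comes out at 500 * 0.06 / 30 = 1 m.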
After the direct line distance is calculated, trigonometry is used to figure out the Cartesian distances in the x,y and z directions. These are fairly complex calculations, since the head’s tilt and yaw angles must be accounted for too.
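The core of that conversion can be sketched as below. The axis convention (x forward, y left, z up), the sign of the angles and the head-to-shoulder offset are all assumptions made for the illustration, and the sketch treats the ball as lying on the head’s line of sight, ignoring its offset within the image.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Convert a line-of-sight range plus head yaw and tilt (radians) into
// Cartesian coordinates in the head frame.
// Assumed convention: x forward, y left, z up; positive yaw turns left,
// positive tilt looks up.
Vec3 toCartesian(double range, double yaw, double tilt) {
    Vec3 p;
    p.x = range * std::cos(tilt) * std::cos(yaw);
    p.y = range * std::cos(tilt) * std::sin(yaw);
    p.z = range * std::sin(tilt);
    return p;
}

// Shift a point from the head frame into the shoulder frame, given the
// (hypothetical, measured) position of the head in the shoulder frame.
Vec3 translate(const Vec3& p, const Vec3& headInShoulderFrame) {
    Vec3 q = {p.x + headInShoulderFrame.x,
              p.y + headInShoulderFrame.y,
              p.z + headInShoulderFrame.z};
    return q;
}
```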
Knowing the x, y and z coordinates of the ball relative to the arm’s origin in the shoulder, we need to figure out the angles in the arm’s joints to reach the object. This problem is called inverse kinematics. It is an iterative process in which the angles are adjusted until the hand reaches the ball.
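The post does not say which iterative method is used, so the sketch below swaps in cyclic coordinate descent (CCD), one common choice, on a simplified planar arm. Each pass rotates one joint at a time so that the hand swings toward the target. All names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };

// Positions of every joint plus the hand for a planar chain rooted at the
// origin; angles[i] is relative to the previous segment.
std::vector<Point2D> chainPoints(const std::vector<double>& angles,
                                 const std::vector<double>& lengths) {
    std::vector<Point2D> pts(1, Point2D{0.0, 0.0});
    double heading = 0.0;
    for (std::size_t i = 0; i < angles.size(); ++i) {
        heading += angles[i];
        Point2D prev = pts.back();
        pts.push_back(Point2D{prev.x + lengths[i] * std::cos(heading),
                              prev.y + lengths[i] * std::sin(heading)});
    }
    return pts;
}

// Cyclic coordinate descent: working from the last joint back to the first,
// rotate each joint so the hand lines up with the target as seen from that
// joint. Returns true once the hand is within `tolerance` of the target.
bool solveIk(std::vector<double>& angles, const std::vector<double>& lengths,
             Point2D target, double tolerance, int maxIterations) {
    for (int it = 0; it < maxIterations; ++it) {
        for (std::size_t k = angles.size(); k-- > 0;) {
            std::vector<Point2D> pts = chainPoints(angles, lengths);
            Point2D joint = pts[k];
            Point2D hand = pts.back();
            double toHand = std::atan2(hand.y - joint.y, hand.x - joint.x);
            double toTarget = std::atan2(target.y - joint.y, target.x - joint.x);
            angles[k] += toTarget - toHand;
        }
        Point2D hand = chainPoints(angles, lengths).back();
        if (std::hypot(target.x - hand.x, target.y - hand.y) < tolerance) return true;
    }
    return false;  // target out of reach or not converged
}
```

A real arm would also need joint limits and a three-dimensional chain, both of which are left out here.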
Finally, once the angles are calculated, the robot moves its joints to them and grabs the ball.
Object Recognition and Tracking
by paloskar on Jan.19, 2011, under Head, Vision
In the video below my robot performs a couple of things. First he recognizes speech. With the limited grammar it works great. Then he visually recognizes objects of the chosen color in his field of view. Finally he turns his head towards where the recognized object is moving.
As mentioned in an earlier post, speech recognition is done using Windows’ Speech API, while generation is done using Windows Text To Speech (TTS).
Object recognition is based on detecting an area with the color given by speech recognition. There is no stereoscopy involved in this case. Video acquisition and image processing are done using the famous OpenCV library. It provides all the necessary routines that make it easy to implement any computer vision idea. Here I separate the given color from the image, find the area’s center and then call servo control functions of the Roboard to turn the robot’s head in order to keep the red or blue ball in the center of the field of view.
Speech Recognition and Generation on Roboard
by paloskar on Jan.19, 2011, under Speech
The new Roboard single board computer provided enough processing power for implementing higher level cognitive functions on my robot. One of these functions is speech. Since I installed Windows XP on the Roboard and used Microsoft Visual Studio for code development, I also started using Windows’ own Speech API (SAPI) for speech recognition and generation. As the input device I use the well amplified microphone on the Minoru 3D stereoscopic vision system. As the output device I use a regular non-amplified speaker. Speech recognition works quite well almost out of the box when the recognition grammar is limited to a number of phrases. This setup is adequate for my current needs.
Arm and Hand Development
by paloskar on Jan.19, 2011, under Arms, Servos
One of my long-time favorite topics in robotics is visual servoing. In order to be able to do this, my robot needed fairly sophisticated arms and hands. Each of the old arms had 2 degrees of freedom: one in the shoulder and another in the elbow. The new construction sports 5 DOFs: two in the shoulder and one each in the elbow, wrist and hand.
The upper three servos are regular sized Hitec HS-475HB, while the lower two (wrist and hand) are micro sized Hitec HS-85BB. The frame of the arm is built of Lynxmotion Servo Erector Set elements, just like the legs. The hand is a gripper device produced by Robix and sold by Lynxmotion.
After the construction was completed, I wanted to test the hand’s precision. I connected the arm servos to the Roboard unit and sent a series of pre-programmed movements to grab a small red ball.
This worked out quite well, as you can see in the video above, which encouraged me to continue with my visual servoing ambitions.
Stereoscopic Vision System for the Robot
by paloskar on Jan.18, 2011, under Electronics, Head, Vision
Since the new processing unit for my robot, the Roboard RB-100, has USB ports, I can start using standard PC devices to fulfill different robot functions. This is so much easier than writing the device driver code myself. In this light I got hold of a very interesting off-the-shelf PC stereoscopic camera system: the Minoru 3D. I must thank the company, Promotion and Display Technologies, for taking this brave step and marketing such a device. This is how it comes when you buy it:

Minoru 3D consists of two web cameras in one sleek housing. For my purposes, I needed only the electronics that would allow me to connect two webcams to a single USB port. This is why I stripped down the housing and came up with this:
Finally, here is how it looks when the camera system is installed on the robot’s head on a pair of pan and tilt servo motors:
I found the Minoru 3D to be a great robotics component. It works very well under Windows and delivers good image quality. Conveniently, it also contains a well amplified microphone, which can be useful in speech recognition applications. As a hint for anyone who might want to use this device in a similar fashion: when trying to take the plastic housing apart, there is no need to break it. The whole thing is held together with a concealed screw on the back of the neck of the Minoru housing. In future posts I will talk more about how I used this newly acquired vision system to add to my robot’s sensing capabilities.
Single Board Computer Intellect
by paloskar on Jan.18, 2011, under Electronics
The TI micro-controller I used for managing the robot’s walk proved to fall short in processing power. This is why I started looking for a replacement that would allow plenty of MIPS for controlling the servos and for other, higher level cognitive tasks, and decided to get a small single board computer (SBC). Generic SBCs did not seem to have dedicated outputs for controlling servos. Because of this, I decided to go with an SBC designed specifically for robots: the Roboard RB-100. It runs at 1 GHz and has 256 MB of RAM. It has plenty of PWM servo outputs, COM ports, I2C, USB, SPI, RS-485, A/D inputs, LAN, etc. I mounted the SBC on the back of the robot.

The Roboard has a mini PCI slot that can be fitted with a graphics card, wireless card or any other standard mini PCI device. As mass storage, a Micro SD card can be used. The device is fully PC compatible. I installed a regular version of Windows XP on it. I develop programs in the Microsoft Visual Studio IDE on a remote PC and transfer the compiled output files to the device through LAN using Windows Remote Desktop. I write applications in C and use GTK for defining the user interface. The Roboard proved to be a very good choice. It worked exactly as advertised without any issues. I would recommend it to anyone looking for a robot development platform that will provide enough processing power for almost any imaginable task. I will write more about the higher level processing capabilities that I implemented in future posts.
