Teaching a vehicle to see in real time

It’s easy for you to instantly identify a stop sign. Night or day, fair weather or rain, directly in front of you or skewed to the side, your brain still tells you that you’re seeing a stop sign. Teaching a computer to make that same connection regardless of the conditions, however, presents a challenge.

Building a scalable network that detects and identifies objects as fast as your brain starts with the vision of the vehicle. Forward-facing cameras and radar will soon be standard equipment in all cars. Those cameras will see everything we see, naturally, but they have to be connected to a system that can accurately identify the things in the field of view.

As the first step, the navigation system must be taught what to look for. Teaching a computer to identify a particular sign is a significant task. To reliably learn a new sign, for example, any system will require example pictures of that sign in all conditions, and from all possible angles. This results in hundreds of thousands of images of the same sign.

HERE begins with a representative set of images and amplifies the data using data augmentation and synthetic data techniques. Those images are used to train the models to identify the signs in question. Using these models, HERE built an extended database of geo-located signs.
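To make the idea of data augmentation concrete, here is a minimal sketch in Python. It is not HERE's pipeline, which the article does not describe. It assumes a tiny grayscale image stored as nested lists, and the function names (adjust_brightness, shift_right, augment) are hypothetical. It shows how one example image can be turned into several variants that imitate different lighting and framing.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clamping to 0-255 (imitates day vs. night)."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def shift_right(img: Image, pixels: int, fill: int = 0) -> Image:
    """Move the image content right by `pixels`, padding with `fill` (imitates an off-centre sign)."""
    if pixels <= 0:
        return [row[:] for row in img]
    return [
        [fill] * min(pixels, len(row)) + row[: max(0, len(row) - pixels)]
        for row in img
    ]


def augment(img: Image, count: int, seed: int = 0) -> List[Image]:
    """Produce `count` randomly varied copies of `img` with the same dimensions."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(count):
        factor = rng.uniform(0.4, 1.6)  # assumed brightness range
        shift = rng.randint(0, 2)       # assumed maximum shift in pixels
        variants.append(shift_right(adjust_brightness(img, factor), shift))
    return variants


sign = [[10, 200, 10], [200, 255, 200], [10, 200, 10]]
variants = augment(sign, count=5)
```

A production system would typically also vary rotation, blur, weather effects and perspective, and would work on full-colour images. The brightness and shift ranges above are placeholders chosen for illustration.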

The next task is identifying those objects in real time.

Consider the stop sign example from before. Imagine that stop sign is in front of a vehicle that can take a picture of the intersection in front of it. Unfortunately, that single picture is of limited use. A 2D picture of an intersection may contain a multitude of important objects: the stop sign, crosswalk markers, turn arrows painted on the road, the middle lane divider, the posted speed limit, and quite a bit more. The system’s AI might detect these objects, but their distance and relative position would be extremely difficult to determine from one picture.

To solve this, the HERE system takes multiple pictures of the environment – at a rate of 20-30 pictures per second. As a vehicle moves, each picture is paired with the GPS location data. The navigation system then triangulates where objects are in the scene, which transforms a 2D image into a 3D environment model.
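The geometry behind that step can be sketched in a few lines of Python. This is a simplified top-down (2D) illustration, not HERE's implementation: it assumes the two vehicle positions have already been converted from GPS coordinates to a local frame in metres, and that each picture yields a bearing to the sign measured counterclockwise from the x-axis. The function name triangulate is hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) in metres, in an assumed local frame


def triangulate(p1: Point, bearing1: float,
                p2: Point, bearing2: float) -> Optional[Point]:
    """Intersect two sight lines, each given by a position and a bearing in radians.

    Returns the intersection point, or None when the lines are (nearly)
    parallel and no reliable position can be computed. This sketch does not
    check whether the intersection lies behind the vehicle.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of the directions
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first sight line
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# A sign at (10, 5) seen from two positions 4 m apart along the road.
first = (0.0, 0.0)
second = (4.0, 0.0)
estimate = triangulate(first, math.atan2(5, 10), second, math.atan2(5, 6))
```

With 20-30 pictures per second, a real system would have many such sight lines per object and would combine them, for example by least squares, rather than intersecting a single pair. Measurement noise in both the position and the bearing is ignored here.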

In this example, we have a database of objects that can be identified in any environment. That database is paired with location data, and a 3D view of the car’s environment. These two pieces, joined together in a scalable distributed network, are how a HERE autonomous car can orient itself on the road, and respond instantly to changing conditions.

The ecosystem of information extends well beyond this example. When differences between the HD Live Map and a car’s data are detected, the information must be added to the cloud. That new information has to be processed at the edge, then distributed to other vehicles when deemed necessary. This enables Self Healing Maps, and it’s how HERE is continuing to enable an autonomous world.
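As a rough illustration of what detecting such differences could involve, the sketch below compares the signs stored in a map with the signs a car has just observed. It is an assumption-laden example, not the HD Live Map interface: the tuple layout, the 2-metre matching tolerance, and the function name diff_signs are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Sign = Tuple[str, float, float]  # (sign type, x, y) with x/y in metres, local frame


def diff_signs(map_signs: List[Sign], observed: List[Sign],
               tolerance_m: float = 2.0) -> Dict[str, List[Sign]]:
    """Report observed signs missing from the map, and map signs not observed.

    Two signs match when they share a type and lie within `tolerance_m`
    of each other. Each map sign can be matched at most once.
    """
    unmatched = list(map_signs)
    added = []
    for obs in observed:
        match = next(
            (m for m in unmatched
             if m[0] == obs[0]
             and math.hypot(m[1] - obs[1], m[2] - obs[2]) <= tolerance_m),
            None,
        )
        if match is None:
            added.append(obs)        # seen by the car, absent from the map
        else:
            unmatched.remove(match)  # confirmed, nothing to report
    return {"added": added, "missing": unmatched}


stored = [("stop", 0.0, 0.0), ("speed_limit_50", 10.0, 0.0)]
seen = [("stop", 0.5, 0.3), ("yield", 20.0, 0.0)]
changes = diff_signs(stored, seen)
```

In this example the stop sign is confirmed, the yield sign is reported as added, and the speed limit sign is reported as missing. Only a report like this, rather than the raw pictures, would need to be sent on for processing.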