Here’s a thought experiment: Imagine if you were in charge of designing the sensor system of an autonomous machine. What would you put on there?
Let’s think it through together. For starters, this autonomous machine will probably need some way to perceive its surroundings. Humans do this primarily through vision, so let’s slap a camera on there. A camera captures images of visible light over a certain field of view. We can point one in the primary direction of motion to see where this machine is going.
However: we don’t have a great sense of geometry with just a camera. There could be holes in our path that just blend into the scenery, or obstacles that would be hard to detect without some knowledge of the topography. We can compensate for this by placing a depth sensor alongside the camera. This adds a ton of valuable information when moving at slow speeds.
However! What happens when this machine moves at high speeds? The depth sensor’s effective range is only about 5m, give or take; that isn’t far enough to react at highway speeds. Let’s compensate for this short range by putting a LiDAR unit on top. This gives a fast, long-range measurement of the entire surrounding environment.
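To put the short-range problem in numbers, here is a back-of-the-envelope sketch that converts a sensor’s range into seconds of warning at constant speed. The 5m and 100m ranges come from the figures in this post; the 108 km/h (30 m/s) highway speed is an assumed example, and the calculation ignores braking distance and processing latency, both of which only make things worse.

```python
def time_to_reach(range_m: float, speed_kmh: float) -> float:
    """Seconds until a vehicle moving at a constant speed covers `range_m`."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return range_m / speed_ms


# Assumed highway speed of 108 km/h; ranges taken from the post's rough figures.
for name, sensor_range_m in [("depth sensor", 5.0), ("LiDAR", 100.0)]:
    warning_s = time_to_reach(sensor_range_m, 108.0)
    print(f"{name}: {warning_s:.2f} s of warning")
```

At that speed, 5m of depth range amounts to well under a quarter of a second of warning, while 100m of LiDAR range buys a few seconds.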
Even though other engineers might not go through the exact same thought process that we did above, it’s still the case that these three modalities are probably the most common starting point for any autonomous device. One will find the combination everywhere in service robots, autonomous vehicles, etc. Therefore, I’ll refer to these three sensor modalities collectively as the Common Sensors: Sensoria Communia.
💡 Or Sensoria Vulgaria, depending on the era.
And it’s no wonder the Common Sensors are the go-to sensing combo. With these three modalities, we’ve seemingly solved the problems of short- and long-range vision at both slow and high speeds. They offer a superior blend of form, function, and practicality for the budding roboticist. They’re so complementary that when someone deviates from the formula, headlines are made (even if they come back around later).
We can see this more clearly by comparing them against one another in a few key metrics, both technical and commercial.
I’ve gone ahead and assigned every modality a value of 1-10 for each metric, with a 1 being “least favorable” and 10 being “most favorable”. Here’s how they compare:
This is subjective, of course, but even with my rough estimates, it’s easy to see that these modalities complement one another quite well.
Let’s take the OV7251, one of the most popular cameras sold today, as an example of a Common camera. It has a “focusing range” of 65mm → infinity, but it’s a fixed-focus lens, so it can’t perform equally well at all distances. This makes its “range” somewhat lacking. Combined with its modest fidelity (640x480 pixels) and FOV (86.5° diagonal), its capacity for autonomous applications seems limited.
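One way to see why 640x480 pixels over an 86.5° diagonal FOV feels limited is to estimate how much of the scene a single pixel covers. The sketch below assumes an ideal pinhole camera with square pixels and no lens distortion (a real OV7251 module depends on the lens fitted to it), derives the focal length in pixels from the diagonal FOV, and computes the footprint of one pixel at the image center.

```python
import math


def focal_length_px(width_px: int, height_px: int, diag_fov_deg: float) -> float:
    """Pinhole focal length, in pixels, implied by a diagonal field of view."""
    half_diag_px = math.hypot(width_px, height_px) / 2.0
    return half_diag_px / math.tan(math.radians(diag_fov_deg) / 2.0)


def pixel_footprint_m(distance_m: float, focal_px: float) -> float:
    """Approximate scene width covered by one pixel at the image center."""
    return distance_m / focal_px


# Resolution and diagonal FOV of the OV7251 as quoted in the text.
f = focal_length_px(640, 480, 86.5)
for d in (1.0, 10.0, 50.0):
    print(f"{d:>4.0f} m: ~{pixel_footprint_m(d, f) * 100:.1f} cm per pixel")
```

Under these assumptions, one pixel spans roughly 2.4 cm at 10m, so a small obstacle a few tens of meters away covers only a handful of pixels.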
But not so fast! What this camera is really good at is delivering reliable, constant data, regardless of environmental conditions. We can therefore use it to supplement our other sensors: it can add visible-light data to the depth maps produced by an Intel RealSense D435i (a Common depth sensor), or fuse with the point cloud created by a Velodyne Puck (a Common LiDAR).
Likewise, a Velodyne Puck delivers very sparse data: 16 beams sweeping a full circle around the puck. Granted, those sweeps return data from over 100m away, which accounts for its great range rating, yet this data by itself isn’t enough to consistently identify objects, paths, or intent in the environment. Paired with the Intel RealSense D435i’s dense depth data, that problem goes away. One can identify potential hazards within the RealSense’s line of sight, then keep monitoring those hazards with the Velodyne Puck as the machine moves around (given a clever enough mapping approach).
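To make “very sparse” concrete, the sketch below estimates the vertical spacing between neighboring beam hits on a flat wall facing the sensor. It assumes a VLP-16-style layout: 16 beams spread evenly over a 30° vertical field of view (-15° to +15°, 2° apart), as in Velodyne’s published VLP-16 spec. Other units differ, so check the datasheet before relying on these numbers.

```python
import math

# Assumed VLP-16-style layout: 16 beams, 2 degrees apart, from -15 to +15 degrees.
BEAM_ELEVATIONS_DEG = [-15.0 + 2.0 * i for i in range(16)]


def vertical_gaps_m(wall_distance_m, elevations_deg=BEAM_ELEVATIONS_DEG):
    """Vertical spacing between adjacent beam hits on a flat wall at the given distance."""
    heights = [wall_distance_m * math.tan(math.radians(e)) for e in sorted(elevations_deg)]
    # Spacing between each pair of neighboring beams, bottom to top.
    return [upper - lower for lower, upper in zip(heights, heights[1:])]


for d in (10.0, 50.0, 100.0):
    gaps = vertical_gaps_m(d)
    print(f"{d:>5.0f} m: smallest gap {min(gaps):.2f} m, largest gap {max(gaps):.2f} m")
```

With this layout, even the tightest spacing, near the horizon, grows to about 3.5m at 100m range, so an object the size of a person could be hit by a single beam or slip between rows entirely.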
There are all sorts of ways these modalities compensate for one another; suffice it to say, they are comfortable technical complements. On the other hand, if they were truly complementary, one would expect the overlap of every modality graph to fill the metrics chart completely. That would amount to an all-seeing sensor system capable of vision in any environment… which the Commons obviously do not provide. But is that possible in the first place?
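The “overlap” argument can be stated a little more precisely: combining modalities gives, for each metric, the best rating any one of them achieves, and any metric where that best rating falls below 10 is a gap no member of the combo fills. Here is a minimal sketch of that idea. It simplifies fusion to “take the best sensor per metric”, and the metric names and ratings in `example` are made-up placeholders for illustration, not the ratings from this post’s charts.

```python
from typing import Dict

# modality -> metric -> rating on the 1-10 scale
Scores = Dict[str, Dict[str, int]]


def combined_envelope(scores: Scores) -> Dict[str, int]:
    """Best rating any modality achieves on each metric (the union of the charts)."""
    metrics = {m for per_modality in scores.values() for m in per_modality}
    return {m: max(s.get(m, 0) for s in scores.values()) for m in sorted(metrics)}


def uncovered(scores: Scores, target: int = 10) -> Dict[str, int]:
    """Metrics where even the best modality falls short of `target`."""
    return {m: v for m, v in combined_envelope(scores).items() if v < target}


# Hypothetical placeholder ratings, for illustration only.
example = {
    "camera": {"range": 4, "fidelity": 7, "fov": 5},
    "depth": {"range": 3, "fidelity": 8, "fov": 6},
    "lidar": {"range": 9, "fidelity": 4, "fov": 10},
}
print(uncovered(example))
```

Only a metric where some modality reaches the target drops out of the result; everything left over is a blind spot for the combo as a whole.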
Is it, indeed? Let’s leave that question hanging and look at the other side of this sensing coin: Commercial Metrics.
Same deal here. I’ve assigned a value of 1-10 based on my own experience. Here’s how they measure up:
Let’s take a closer look at LiDAR, the most lackluster commercial performer of the Commons. For starters, LiDAR is an expensive modality. A secondhand “like new” condition Ouster OS0-32, which has a 32-beam sweep, goes for nearly $2000 on eBay; a brand new one will cost over $5500 from a third-party retailer. A secondhand Velodyne VLP-16, which is no longer manufactured, goes for a more modest $100-$500, but comes with no guarantees of performance, functionality, or support.
Why am I showing these third-party sellers, instead of the manufacturers themselves? Because one can’t find the price of a LiDAR sensor on a manufacturer website! There’s a sales pitch between a prospective buyer and that pricing quote, so that the manufacturer can gauge the buyer’s scale and commitment. Companies like Ouster and Velodyne make much of their money selling to autonomous vehicle OEMs; if a prospect isn’t buying hundreds of units, it’s probably not worth their time. And who could blame them? It’s a competitive environment with a lot of market share to capture… but it sure doesn’t help anyone else trying to get these integrated into their systems.
🚨 This includes manufacturers like Luminar that claim to price their LiDAR at $500-$1000, because that price is still reserved for Volvo, not me.
This literal cost to entry is probably why advancements in LiDAR software have traditionally been the domain of academia and industry. There are very few open-source packages that deal with LiDAR in a meaningful, algorithmic way; instead, most deal with just the visualization or representation of LiDAR data in a system, like ROS or Foxglove. This lowers the Engineering Ease of the modality quite a bit, especially when integrated with the rest of the Commons. How can an engineer expect to move fast with LiDAR if every project has to start from scratch?
Depth is much better equipped for commercial use, though its cost and throughput can be shaky at times. For instance, I’ve been going on about the Intel RealSense D435i for most of this post. Though I don’t have solid evidence of this, I’m convinced it’s the most ubiquitous commercial depth sensor on the market today, making it a deserved member of the Commons. Shockingly, though, Intel itself stated publicly in August 2021 that it was shutting down the RealSense division. This led to a rush of orders, reservations, and off-market deals by robotics companies to secure any and all RealSense devices before supply dried up. Intel later course-corrected a bit, stating that it would continue making stereo depth modules, but the announcement made the robotics community realize how heavily it relied on a single manufacturer for depth sensing. The ease of tech development via the RealSense SDK didn’t make up for the fact that a key sensor supply was now in doubt.
Cameras seem to be strong in every commercial facet. Indeed, their usefulness and ubiquity in autonomy can’t be overstated. However, their impressive chart hides the fact that most of these advantages came from developments within the autonomy community itself. Open-source tools like ROS and OpenCV treat cameras with reverence, devoting a majority of their resources to working with images and video streams. This makes it incredibly easy to engineer a proof-of-concept camera system (maxing out Engineering Ease), which in turn lowers the barrier to entry for camera-based products (maxing out Proven Business Case).
The ubiquity of the cameraphone has also dramatically decreased the cost and increased the manufacturing throughput of the most common camera systems. In fact, cameras like the OV7251 are sold in bulk at prices less than $30 a pop and are themselves integrated into larger sensing packages like the Luxonis OAK-D-Lite. Cameras have been given a step up commercially compared to all other sensing systems, and it shows.
But, as the classic saying goes, cameras can’t carry an autonomous system by themselves. The Common Sensors are common because of their complementary technical abilities. As an autonomy developer, one would hope that all three modalities would max out their commercial metrics: a super-cheap, reliable, easy-to-integrate sensor combo that can be deployed anywhere. This just isn’t the case.
Could it be?
It’s clear that the Common Sensors, though complementary, aren’t a sensing panacea. Anyone who has attempted to build a system like this for a truly autonomous solution will find that it still falls short. There are blind spots; data throughput issues; connectivity issues; communication issues; data fidelity issues; sensor fusion issues; algorithmic issues; etc., etc., etc.
However, I have good news to share: there are more sensing modalities.
That’s right! There are many, many ways of sensing the world around us. Sure, they aren’t used nearly as often as the Commons (or else they’d be in the Commons), but they hit some of the same sweet spots as cameras, depth, and LiDAR, along with bringing their own unique approaches to autonomous sensing. I call these the Unknown Sensors: Sensoria Obscura.
💡 a.k.a. the Dark Sensors, if translating literally. But, um, don’t.
I’m going to be diving into these Sensoria Obscura over the coming weeks and months, using this fancy 8-point metric system to break them down and build them up. These are some of the most exciting sensing modalities for autonomy; they just haven’t had their time yet, whether that’s due to the slow pace of technology, or the commercial concerns of industry. However, they hold a great deal of promise for advancing the state of the art.
Stay tuned as Tangram Vision dives into some of the coolest sensing tech today!