A Portrait Of The Author As An Event Camera Owner. Image captured by a Samsung SmartThings Vision camera, June 2020.
Two years ago, the computer vision community was abuzz with anticipation for a new type of sensor that promised a revolution in speed and efficiency.
These new sensors are called event cameras, also known as dynamic vision sensors (DVS) or neuromorphic cameras.
Yet two years later, many of us are now wondering why event cameras seem to have joined Spinal Tap in the “Where Are They Now?” file.
The idea behind event cameras is compelling: a camera that works very much like the human eye. In doing so, it can react quicker, under more circumstances, while transmitting much less data. But before we can understand how an event camera works, let’s first review traditional cameras.
The fundamental manner in which most cameras work hasn’t changed much since the very first daguerreotype images were captured in the mid 19th century. A shutter would open, and light would hit a substrate that would then capture an image. In the 19th century, that substrate was silver-plated copper. In 2020, that substrate is a light-sensitive chip.
In a traditional camera, the shutter opens intermittently to let light in, which the substrate then captures as an image. In the context of video cameras, this is done multiple times per second to capture motion. The speed with which a motion camera can open and close its shutter becomes a limiting (or, if you’re a glass-half-full kind of person, enabling) factor that determines camera performance.
Once an image is captured on the substrate, something must be done to it. From 1840 to around 1960, that “something” typically meant bathing the substrate in noxious chemicals to develop the images that subsequently became photographic prints and movies. Starting in the mid 1960s, digital recording formats allowed the substrate to record in ones and zeroes. Enter the second limiting (or enabling) factor: data transmission and storage. The earliest 2-inch magnetic video tape from the 1960s required quite a lot of space to store an hour’s worth of color video:
Today, we’re not as concerned with local storage as we are with cloud data transfer and remote storage. True, compression has allowed us to lower the amount of data that a typical camera transmits. On the flip side, this has led system designers to increase the number of cameras, increase their data quality, and increase their usage. Back to square one with data constraints!
So, to recap, typical cameras:
Now how do event cameras compare?
To start with, event cameras have no shutter. But, like a CMOS camera, they do have a chip with hundreds of thousands of tiny, light-sensitive pixels. Each one of those pixels is programmed to record data under a simple set of conditions:
If either of the first two conditions occurs, that pixel transmits the new value. If the third condition occurs, it sends no new value.
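As a rough illustration, that per-pixel rule can be sketched in a few lines of Python. This assumes the three conditions are "got brighter", "got darker", and "no meaningful change". It is a simplified model, not any vendor's implementation: the `Event` tuple, the `update_pixels` function, and the `threshold` value are hypothetical names and numbers chosen for the example, and real sensors typically respond to changes in log brightness in analog circuitry rather than comparing stored integers.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    x: int
    y: int
    t_us: int      # timestamp in microseconds
    polarity: int  # +1 = got brighter, -1 = got darker


def update_pixels(reference: List[List[int]],
                  frame: List[List[int]],
                  t_us: int,
                  threshold: int = 10) -> List[Event]:
    """Compare each pixel with the last value it reported.

    `reference` holds the brightness each pixel last reported and is
    updated in place. `threshold` is an assumed sensitivity, not a
    value taken from any real sensor.
    """
    events = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            delta = value - reference[y][x]
            if delta >= threshold:       # condition 1: brighter
                events.append(Event(x, y, t_us, +1))
                reference[y][x] = value
            elif delta <= -threshold:    # condition 2: darker
                events.append(Event(x, y, t_us, -1))
                reference[y][x] = value
            # condition 3: no meaningful change, so the pixel stays silent
    return events


# Usage: a 2x2 sensor where one pixel brightens and one darkens.
last_reported = [[100, 100], [100, 100]]
new_light = [[100, 130], [60, 105]]
for event in update_pixels(last_reported, new_light, t_us=1000):
    print(event)
```

Only the two pixels whose brightness moved past the threshold produce output; the other two send nothing at all.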
Each pixel in an event camera operates independently. That means that event cameras have microsecond responsiveness. That is one millionth of a second. One millionth. That 240fps camera in your iPhone doesn’t seem so fast now, huh?
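To put that comparison in numbers, here is a back-of-the-envelope calculation. The 1-microsecond figure is simply the "one millionth of a second" quoted above, used as an assumed round number rather than a measured spec, and `frame_interval_us` is a helper name made up for this example.

```python
def frame_interval_us(fps: float) -> float:
    """Time between consecutive frames, in microseconds."""
    return 1_000_000 / fps


EVENT_RESPONSE_US = 1.0  # assumed: "one millionth of a second"

interval = frame_interval_us(240)
print(f"240 fps frame interval: {interval:.0f} us")
print(f"Ratio to a 1 us event response: {interval / EVENT_RESPONSE_US:.0f}x")
```

At 240 frames per second, a frame-based camera waits a little over 4,000 microseconds between samples, so anything that happens between two frames goes unseen until the next one.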
On top of that laser-fast speed, the data that an event camera sends is, by default, compressed. Any pixel that doesn’t register a change sends no value, just like a typical image compression algorithm. But, better yet, those pixels that do send data can send a minuscule packet that barely even registers.
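A quick sketch shows how the data savings depend on how much of the scene is changing. All of the numbers here are assumptions for illustration: a 640x480 sensor, 1 byte per pixel for a full frame, and a hypothetical 8-byte event packet (coordinates, timestamp, and polarity). Real sensors and encodings will differ.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Size of one full, uncompressed frame."""
    return width * height * bytes_per_pixel


def event_bytes(changed_pixels: int, bytes_per_event: int = 8) -> int:
    """Size of the event stream when only `changed_pixels` report."""
    return changed_pixels * bytes_per_event


WIDTH, HEIGHT = 640, 480                 # assumed sensor resolution
full_frame = frame_bytes(WIDTH, HEIGHT)  # every pixel is sent

# Suppose 2% of the pixels (6,144 of 307,200) see a change.
sparse = event_bytes(changed_pixels=6144)

print(f"Full frame:   {full_frame} bytes")
print(f"Event stream: {sparse} bytes")
print(f"Savings:      {full_frame / sparse:.2f}x")
```

Under these assumptions the event stream is several times smaller than a single frame. The advantage shrinks as more of the scene changes: with an 8-byte event and a 1-byte pixel, events stop being cheaper once more than one pixel in eight fires.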
So, to recap, event cameras:
Great question. Where are they? Well, they still exist. But it appears that those companies who may choose to implement them in a mass-produced product are still in “wait and see” mode.
As a result, there are very few companies still actively producing event cameras. Our current contenders are:
There has also been activity by two of the larger consumer electronics companies, Samsung and Sony. Samsung is the only company to date that has tried to mass market an event-camera-powered product to consumers, with the SmartThings Vision camera.
The SmartThings Vision camera was marketed briefly by Samsung Australia as a home surveillance camera that offered greater levels of privacy. It was pulled from the market after just a short time on sale.
With the first consumer event camera product a failure, event camera companies are turning to industrial use cases to find a market. Robotic vision. Object scanning. Security. So far, there have been few takers who are willing to change their system designs from the current industry-standard CMOS designs to event camera designs. So what gives?
We’ll start with the least obvious factor. If you’d like to manufacture a custom CMOS-based camera, there is an embarrassment of contract manufacturers ready to develop that camera to your precise specifications. Chances are, the specific components you need are already being mass produced to a high specification of quality and performance.
Want to develop an event-camera-based system? Good luck finding a manufacturer with the willingness to transfer a profitable CMOS semiconductor assembly line to what they’ll believe to be a brand new, commercially unproven technology. And, if you do, it’ll cost you an arm and a leg (or, in camera terms, an eyeball?) to get them manufactured, and those costs will need to be transferred to your customers.
In consumer markets, this will make your product price-prohibitive. In industrial markets, it will relegate your product to niche applications, which will suppress demand, thereby keeping costs perpetually high, which continues to suppress demand, which…well, you get the idea. Until a radical breakthrough in demand occurs, this vicious cycle will limit supply chain participants’ interest in producing the components that are unique to event cameras.
To build on one of the key points from the previous factor, event cameras simply aren’t cheap. The least expensive models that are broadly available cost in the thousands of dollars each. That severely limits applications to those products that can absorb such a massive additional cost in their aggregate bill of materials.
Our focus market of robotics already struggles to justify the addition of a single multi-thousand-dollar LiDAR unit, and can only do so because of the sheer utility that these long-range, 360° sensors add to robotic platforms. Event cameras will absolutely outperform even high dynamic range (HDR) CMOS cameras in robotic contexts — but not to the degree required to justify their extra cost.
Industries like robotics are at the cutting edge of technology. However, the hardware design decisions that robotics companies make tend to be conservative.
Because of the complexity of robotic platforms, proven and reliable components are of paramount importance in system design. Event cameras are simply too new and exotic to be seriously considered for a robotic platform, for example.
This is also reflected in a key leading indicator for what imaging technologies will receive the most industry attention in the near future: academic research. Peruse the paper submissions for key computer vision conferences like CVPR or ICCP, and you’ll see very few focused on event cameras, and a plethora focused on other modalities like CMOS and depth.
While few CMOS cameras (even HDRs) can come close to the levels of performance offered by a typical event camera, the performance gap in key areas has been closing.
Modern CMOS cameras offer ever-faster shutters, higher pixel density and higher dynamic ranges that make them “good enough” for applications where event cameras shine. And they are inexpensive enough that multiple CMOS cameras can be utilized in single purpose applications, collectively overcoming some of the deficiencies that would otherwise occur in a single CMOS deployment.
So why spend thousands of dollars on a single event camera when you could design a good enough CMOS-based system that costs…$150? $30? Even under $10?
Some of event cameras’ inherent advantages have been whittled away by factors that are exogenous to camera design. Consider data usage.
A decade ago, available bandwidth was scarce in wired scenarios (for instance, an IP camera used for security in a retail store) and even scarcer in wireless scenarios (a drone with an inspection camera transmitting over a 3G network). Today, data pipes are much larger, data transmission speeds have radically accelerated with LTE and 5G rollout, and data compression algorithms have progressed as well. For robotic platforms that process data locally, hosts have gotten much more powerful, too. Therefore, data efficiency is no longer as great an advantage for event cameras as it once was.
True, greater amounts of data transmission impact other areas of system performance (for instance, battery life and available compute). However, in most applications, the aggregate data requirements for above-average application performance are now aligned with transmission and processing thresholds that preserve battery life and compute resources.
So will we ever see broad-based deployment of event cameras? In our estimation, the answer is somewhere between “not for another few years” and “not at all.” Current CMOS technology is simply good enough (and getting better still) to win the argument against justifying the massive investments that would be required for a large shift in markets to event cameras.
That’s not to say that event cameras can’t find a market at all. They are still incredibly powerful, as evidenced by some incredible demos that show just what they are capable of:
Drone dodgeball, demonstration by the University of Zurich Robotics and Perception Group
But will they? Or, like our friends Nigel, David, and Derek from Spinal Tap, will it be shark sandwiches for event cameras?