# Deriving Derivatives in Perception

###### March 8, 2022
Tutorial

# Introduction

In [our tutorial on Creating Camera Calibration From Scratch](https://www.tangramvision.com/blog/calibration-from-scratch-using-rust-part-1-of-3), we explored how to create a system to calibrate a camera with a relatively simple camera model. We showed how to frame camera calibration as an optimization problem and use a solver like Gauss-Newton to reach a solution.

Like in that example, many optimization problems in computer vision lack a [closed form solution](https://mathworld.wolfram.com/Closed-FormSolution.html#:~:text=An%20equation%20is%20said%20to,not%20be%20considered%20closed%2Dform.), so gradient-based iterative optimization is usually necessary. In those cases, you'll have to compute derivatives and their multi-dimensional analogues: [Jacobians](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant).

This is all well and good until your camera models (and thus your cost function) become so complex that deriving correct Jacobians becomes difficult to do by hand. Camera models with a large number of parameters (such as those that model lens distortion) and complex interplay between those parameters are particularly challenging and error prone.

Even if you manage to come up with an expression, the potential for mistakes remains high, and writing reliable tests to check your implementation may be difficult.

# Alternatives To Hand-Built Jacobians

So what do we do? There are a few options: avoiding computing some of the derivatives, numerical differentiation, automatic differentiation, and symbolic differentiation. Let’s explore each of these options to understand why they may (or may not) be the best alternative to deriving Jacobians the hard way. We’ll also finish this article with a Jupyter Notebook which shows how to generate a Jacobian symbolically using SymPy. If you want to see that now, skip down to the bottom.

## Alternative One: Avoid Computing (Some) Derivatives

In many cases, it's desirable to have the fast converging behavior of [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method), which requires second derivatives. Yet frequently, the second derivatives and [Hessians](https://en.wikipedia.org/wiki/Hessian_matrix) (the matrix of second derivatives) of such problems are either too complicated to derive or too computationally complex to calculate.

What *is* reasonable is to compute first derivatives and Jacobians. The [Gauss-Newton algorithm](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm) uses a [Taylor series approximation](https://en.wikipedia.org/wiki/Taylor_series) to come up with an approximate expression for the Hessian that only uses the Jacobian, without any second derivatives. Other [Quasi-Newton methods](https://en.wikipedia.org/wiki/Quasi-Newton_method) use first derivatives combined with the finite differences between iterations to improve an approximation of the Hessian over iterations.
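To make the Gauss-Newton idea concrete, here is a minimal sketch, assuming a least-squares cost of the form ½‖r‖² with residual vector r and Jacobian J. Under that assumption the gradient is JᵀR (written `J.T @ r` below) and the Hessian is approximated by `J.T @ J`. The exponential model and the names `residuals`, `jacobian`, and `gauss_newton` are hypothetical, chosen only for illustration.

```python
import numpy as np

def residuals(params, x, y):
    """Residuals of the hypothetical model y = a * exp(b * x)."""
    a, b = params
    return a * np.exp(b * x) - y

def jacobian(params, x):
    """First derivatives of each residual with respect to (a, b)."""
    a, b = params
    e = np.exp(b * x)
    return np.column_stack((e, a * x * e))

def gauss_newton(params, x, y, iterations=20):
    params = np.asarray(params, dtype=float)
    for _ in range(iterations):
        r = residuals(params, x, y)
        J = jacobian(params, x)
        # J.T @ J stands in for the Hessian; no second derivatives are needed.
        step = np.linalg.solve(J.T @ J, -J.T @ r)
        params = params + step
    return params

# Noise-free synthetic data generated with a = 2.0, b = -1.5.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
estimate = gauss_newton([1.0, 0.0], x, y)
```

A production solver would add a convergence check and some form of step control; this sketch runs a fixed number of iterations to keep the Hessian approximation front and center.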

You can even choose an optimization algorithm that reduces or eliminates the use of derivatives. Since many of these algorithms rely on a greater degree of approximation, they may come at some cost: convergence that takes more iterations, slower runtime, or a greater likelihood of divergence. Some techniques (e.g. [evolutionary algorithms](https://en.wikipedia.org/wiki/Evolutionary_algorithm), [particle swarm optimization](https://en.wikipedia.org/wiki/Particle_swarm_optimization)) don't even directly approximate the local shape of the optimization landscape, instead relying on other heuristics to guide a more direct search of the parameter space for optima.

Other techniques attempt to approximate the information provided by our derivatives. For instance, many optimization techniques that use derivatives have a "Secant" version (e.g. Secant Gauss Newton, Secant Dogleg) that uses the finite differences between iterations as a stand-in for a derivative.

Because of the trade-offs listed above and the availability of better alternatives, these types of approximations aren’t often the best choice for many computer vision problems.

## Alternative Two: Numerical Differentiation

The value of a derivative at a given point can be calculated using [numerical differentiation](https://en.wikipedia.org/wiki/Numerical_differentiation). Calculating a numerical derivative essentially amounts to using the [definition of a derivative](https://en.wikipedia.org/wiki/Derivative#Definition). Instead of calculating a limit as the step size approaches zero, we simply choose a small step size. This approximation is exact for linear and constant functions and can work well for functions with relatively low curvature in the neighborhood of the input and step size. For functions with higher curvature, the approximation error grows and the result can be of low quality.

For functions with multi-dimensional input, the process must be repeated for each input dimension to generate a Jacobian, which can be expensive if function evaluation is also expensive.
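As a rough illustration of that per-dimension loop, here is a minimal forward-difference sketch. The helper `numerical_jacobian`, the default step of `1e-6`, and the toy `project` function (a pinhole projection without intrinsics) are all assumptions made for this example rather than anything from a particular library.

```python
def numerical_jacobian(f, x, step=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m evaluated at x."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        # One extra function evaluation per input dimension.
        shifted = list(x)
        shifted[j] += step
        f_shifted = f(shifted)
        for i in range(len(fx)):
            jac[i][j] = (f_shifted[i] - fx[i]) / step
    return jac

def project(point):
    """Toy pinhole projection of a 3D point onto the normalized image plane."""
    X, Y, Z = point
    return [X / Z, Y / Z]

J = numerical_jacobian(project, [0.5, -0.25, 2.0])
```

Note that a function with n inputs costs n + 1 evaluations per Jacobian here, which is where the expense comes from when the function itself is slow.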

The choice of finite step size can be difficult because two opposing forces push it in opposite directions. The definition of the derivative suggests using a very small step size, since the step size approaches zero in the ideal case. However, using excessively small step sizes with finite precision on a computer will produce highly inaccurate results. The two function values being subtracted are nearly equal, so most of their significant digits cancel, and dividing by the tiny step size amplifies the round-off error that remains. Conversely, making the step size large moves the approximation away from its theoretical underpinnings, meaning we'll simply be calculating some [secant](https://en.wikipedia.org/wiki/Secant_line#Curves) instead of an approximation of the derivative.
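The trade-off is easy to see in a small experiment. This sketch differentiates `math.sin` at an arbitrarily chosen point with three illustrative step sizes and compares each result against the known derivative, `math.cos`. The specific step sizes are assumptions picked to show the effect, not recommendations.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with a single forward step of size h."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)

# Large step: we are really measuring the slope of a secant.
err_large = abs(forward_difference(math.sin, x0, 1e-1) - exact)
# Moderate step: truncation and round-off errors are roughly balanced.
err_moderate = abs(forward_difference(math.sin, x0, 1e-8) - exact)
# Tiny step: the subtraction cancels nearly every significant digit.
err_tiny = abs(forward_difference(math.sin, x0, 1e-15) - exact)

for label, err in [("1e-1", err_large), ("1e-8", err_moderate), ("1e-15", err_tiny)]:
    print(f"h = {label:>5}: absolute error = {err:.3e}")
```

With double precision, the moderate step should come out several orders of magnitude more accurate than either extreme.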

Because camera models are typically highly non-linear and better alternatives are available, numerical differentiation also generally isn't a great choice in computer vision applications.

## Alternative Three: Automatic Differentiation

Automatic Differentiation is a class of techniques which can simultaneously evaluate a function and its derivative (precisely, not an approximation) at a certain input. The programmer does not provide a formula for the derivative, nor is a formula explicitly calculated (that would be Symbolic Differentiation; see below). They simply write the function they want a derivative of and, upon evaluating the function, an AutoDiff system will come up with the value of the function and its derivative. While this might leave the programmer in the dark as to what the expression for the derivative is, the value of the derivative at a given iteration is frequently all that is needed to implement an optimization algorithm.
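One common way to do this is forward-mode AutoDiff with dual numbers, where every value carries its derivative along with it. The following is a deliberately tiny sketch of that idea, supporting only addition, multiplication, and sine. The `Dual` class and the helpers `_lift` and `dual_sin` are hypothetical names for this example and do not reflect the API of any particular AutoDiff library.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule, applied numerically rather than symbolically.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def _lift(x):
    """Treat plain numbers as constants (derivative of zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def dual_sin(x):
    """Chain rule for sine."""
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

def f(x):
    # The programmer only writes the function itself.
    return x * x * dual_sin(x) + 3.0 * x

# Seed the input with derivative 1.0 to get df/dx alongside f(x).
result = f(Dual(2.0, 1.0))
print(result.value, result.deriv)
```

No expression for the derivative of `f` ever exists in this program; only its value at `x = 2.0` is produced, which is the distinction from Symbolic Differentiation.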

There are a handful of different ways to do automatic differentiation and many implementations in many programming languages. Sometimes these require the use of a separate mathematics library which can become relatively intrusive, requiring a lot of changes to existing code. Others are more streamlined: the [Ceres optimization library's AutoDiff](http://ceres-solver.org/automatic_derivatives.html) feature makes use of template programming tricks to make the process comparatively unobtrusive.

Some languages even have Automatic Differentiation as a first class language feature. The [Enzyme LLVM plugin](https://enzyme.mit.edu/) hopes to provide AutoDiff at the compiler level, enabling AutoDiff as an extension to any language targeting that compiler (e.g. [oxide-enzyme](https://github.com/rust-ml/oxide-enzyme)).

There are a few caveats to consider. There can be issues with evaluating derivatives at discontinuities. Handling this may require additional intervention by the programmer. AutoDiff algorithms are best suited to operating on pure functions. It can be difficult for AutoDiff algorithms to reason about values introduced through statefulness (I/O, static and global variables, etc.). A lot of these problems apply to the other classes of differentiation techniques as well, but the relatively hands-off nature of Automatic Differentiation can more easily lead to these problems being overlooked.

Automatic Differentiation is a great choice for those doing machine learning, computer vision or other optimization problems. Automatic differentiation will provide a precise evaluation of the derivative at a given value and is often just as fast as a derivative function written by a programmer. Keeping the caveats in mind, those with easy access to AutoDiff should consider it as the first thing to try given the high potential upside in developer productivity and low potential downside in performance.

## Alternative Four: Symbolic Differentiation

Symbolic Differentiation involves entering the function in question into a computer algebra package. The algebra package mechanically performs many of the same techniques (expression simplifications, the chain rule, lists of differentiation rules etc.) that a human might use to calculate an expression for the derivative. After the expression is calculated it may be evaluated from the optimization algorithm directly if possible or, if the Symbolic Differentiation tool is stand-alone, the expression can be transcribed into your library or program.
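Both workflows can be sketched with SymPy. In this example we assume a simple polynomial radial distortion function with hypothetical coefficients `k1` and `k2`, differentiate it symbolically, and then either evaluate the result directly from Python or emit C source text that could be transcribed into another program.

```python
import sympy as sp

r, k1, k2 = sp.symbols("r k1 k2", real=True)

# Hypothetical radial distortion model: distorted radius as a function of r.
distorted = r * (1 + k1 * r**2 + k2 * r**4)

# The algebra package applies the differentiation rules for us.
d_distorted = sp.expand(sp.diff(distorted, r))
print(d_distorted)

# Option 1: evaluate the expression directly from Python.
evaluate = sp.lambdify((r, k1, k2), d_distorted, "math")
print(evaluate(0.5, 0.1, 0.01))

# Option 2: generate source text to transcribe into another code base.
c_source = sp.ccode(d_distorted)
print(c_source)
```

Either way, the derivative is an explicit expression that you can read, review, and test, which is not the case with Automatic Differentiation.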

Given the advantages of Automatic Differentiation, it may not be clear why you'd choose Symbolic Differentiation. One obvious reason is that automatic differentiation simply may not be available for whatever reason on the platform you’re using for your optimization. Or perhaps, you simply want to see the full expression of the derivative for your own understanding.

Automatic Differentiation can also have edge cases (e.g. [removable discontinuities](https://mathworld.wolfram.com/RemovableDiscontinuity.html)) that are more easily addressed in a function you write explicitly, but doing so requires knowing the derivative expression. There are even some cases where the expression resulting from Symbolic Differentiation may be faster to evaluate than doing so with Automatic Differentiation. For example, there may be some constraints on your inputs (e.g. they’re all non-negative, or fall into some range) that you’re unable to express in the algebra package. These constraints can be used to simplify the output expression which results in a function that’s faster to evaluate.
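As a hedged illustration of the removable discontinuity case, consider sin(x)/x, chosen here purely as a familiar example. Its derivative is undefined at zero as written, but the limit exists. With the symbolic expression in hand, we can ask SymPy for the limit and a short Taylor expansion, then hand-write a function (the hypothetical `d_sinc` below, with an assumed switching threshold) that uses the expansion near zero.

```python
import math
import sympy as sp

x = sp.symbols("x", real=True)
f = sp.sin(x) / x
df = sp.diff(f, x)

# The derivative has a removable discontinuity at x = 0.
limit_at_zero = sp.limit(df, x, 0)
# Low-order Taylor expansion of the derivative around zero.
taylor = sp.series(df, x, 0, 4).removeO()
print(limit_at_zero, taylor)

def d_sinc(value, threshold=1e-4):
    """Derivative of sin(x)/x that stays well defined near zero."""
    if abs(value) < threshold:
        # Taylor terms of the derivative: -x/3 + x^3/30.
        return -value / 3.0 + value**3 / 30.0
    return (value * math.cos(value) - math.sin(value)) / value**2

print(d_sinc(0.0), d_sinc(1.0))
```

The threshold is a judgment call that trades truncation error in the expansion against cancellation error in the closed form, so it is worth testing for your own function.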

Therefore, Symbolic Differentiation can also be a great choice for those doing machine learning, computer vision, or other optimization problems as an alternative to Automatic Differentiation.

---

# Computing Derivatives Using SymPy

At this point we’ve described the four alternatives to hand-building a Jacobian, and we’ve concluded that Automatic Differentiation and Symbolic Differentiation are the best alternatives for those working in machine learning, computer vision, and other areas of optimization. Let’s put this into practice in code so you can see directly how this might be implemented. For this exercise, we’ll move forward with Symbolic Differentiation.

> As always, all the code found here is also hosted on the [Tangram Visions Blog Repository](https://gitlab.com/tangram-vision/oss/tangram-visions-blog).

We’ve built a Jupyter Notebook which shows how to generate a Jacobian symbolically using SymPy, a symbolic math package for Python. Follow along as we dive in...
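As a taste of what this looks like, here is a minimal sketch. It is not the notebook's code, and it assumes a simplified camera model: a pinhole projection with a single focal length `f`, a principal point (`c_x`, `c_y`), and one radial distortion coefficient `k_1`. All of these symbol names are chosen for illustration.

```python
import sympy as sp

# A 3D point in the camera frame and the assumed camera parameters.
X, Y, Z = sp.symbols("X Y Z", real=True)
f, cx, cy, k1 = sp.symbols("f c_x c_y k_1", real=True)

# Normalized image coordinates and a one-term radial distortion factor.
xn, yn = X / Z, Y / Z
r2 = xn**2 + yn**2
scale = 1 + k1 * r2

projection = sp.Matrix([f * scale * xn + cx,
                        f * scale * yn + cy])

# Jacobian of the projected pixel with respect to the camera parameters (2x4).
params = sp.Matrix([f, cx, cy, k1])
J = projection.jacobian(params)
sp.pprint(J)

# Common subexpression elimination helps when transcribing the result by hand.
replacements, reduced = sp.cse(J)

# Turn the symbolic Jacobian into a plain Python function for use in a solver.
J_fn = sp.lambdify((X, Y, Z, f, cx, cy, k1), J.tolist(), "math")
J_num = J_fn(0.2, -0.1, 2.0, 500.0, 320.0, 240.0, 0.1)
```

The numeric inputs on the last line are arbitrary test values. In an optimization loop, a function like `J_fn` would be evaluated once per observation at the current parameter estimate.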

# In Conclusion

In this article we discussed the role of derivatives and their higher-dimensional analogues, Jacobians, as they pertain to the field of computer vision. We reviewed at a high level the tools available for computing derivatives and provided an in-depth example using SymPy.

Building your own optimization system can be a rewarding process. With that said, we’d be remiss if we didn’t note that it is also a process that can take a highly experienced computer vision or perception engineer months to complete (ask us how we know). Tangram Vision’s flagship calibration module, [TVCal](https://www.tangramvision.com/sdk/multimodal-calibration), uses some of the above techniques to generate complex calibrations in a matter of seconds.