A Different Way To Think about Plane Fitting

We're using plane normals in not-so-normal ways

Paul Schroeder, Senior Perception Engineer

Jul 23, 2025

We've put together a Jupyter notebook for this blog post. It can be found here: https://colab.research.google.com/drive/144pDQJARtzZ_GotbrLOVJfMM_ZU6QgYJ#scrollTo=a6234ff7-4fba-495b-89d7-2b3cd490188d

As a 3D computer vision professional, I've been asked to fit a plane to a set of 3D points many times in my career. In fact, the take-home question in the interview that led to my first job out of school was on this very topic. The plane fitting problem arises naturally in computer vision: sometimes the plane itself is the salient feature of interest, or we're looking for the ground in a LiDAR scan so it can be rotated and viewed in a legible way. Either way, planes are a common enough primitive that it's worth considering how we model and fit them. While this is well-trod territory, our recent blog post about optimization, 3D rays, and Lie groups got me thinking about the problem a bit differently.

This article assumes some understanding of 3D computer vision, linear algebra and least-squares optimization. Of particular interest is the use of manifolds to represent quantities like 3D rotations and transformations.

Fitting Planes the "Usual" Way

First, let's briefly consider how we parameterize a plane in 3D. A common way is to specify the direction of the plane's normal, represented as a 3D unit vector, and the distance along that normal at which we find the plane.

Therefore, a point $x\in\mathbf{R}^3$ on the plane satisfies the following equation:

$$ \hat{n}^Tx - d = 0 $$

With this parameterization, there's a sign ambiguity: if we flip the sign of the normal and also flip the sign of the distance, we describe the same plane.
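As a quick sanity check, here's a minimal NumPy sketch (with a made-up normal and distance) showing the plane equation and the sign ambiguity in action:

```python
import numpy as np

# A hypothetical plane: unit normal n_hat and signed distance d along it.
n_hat = np.array([0.0, 0.0, 1.0])
d = 2.0

# A point on that plane satisfies n_hat . x - d = 0.
x = np.array([3.0, -1.0, 2.0])
print(n_hat @ x - d)        # 0.0

# Flipping the sign of both the normal and the distance describes the same plane.
print((-n_hat) @ x - (-d))  # also 0.0
```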

We can now use this formulation to fit a plane to a sample of points, like the example point cloud generated in the accompanying notebook.

The way I've always done this is with a technique I learned in school. First, we calculate the centroid of all the points and subtract it from every point, giving us a zero-mean set of points centered about the origin.

$$ x_{centroid} = \frac{1}{n}\sum_{i=1}^{n}x_i, \qquad x_i \in \mathbf{R}^3 $$

Next we want to analyze the resulting zero-mean points to identify the plane normal. We can perform what is essentially Principal Components Analysis (PCA) to do so. A plane sits in three dimensions, but it is fundamentally a 2D object. Therefore, you'd expect most or all of the spread, or variance, of a planar 3D point cloud to be "explained" by a 2D subspace. PCA identifies orthogonal directions in the data and weights each by how much of the data's variance lies along it. Practically, PCA amounts to an eigendecomposition of the sample covariance matrix: the eigenvectors give us these directions, and the associated eigenvalues tell us how much of the data's variance lies along each.

$$ COV_{sample}= \frac{1}{n}\sum_{i=1}^{n}(x_i - x_{centroid})(x_i - x_{centroid})^T \in \mathbf{R}^{3\times3} $$

So for the ideal noiseless plane fitting problem, we'd expect two non-zero eigenvalues and one zero eigenvalue. The eigenvectors associated with the non-zero eigenvalues will span the subspace formed by the plane; the eigenvector associated with the zero eigenvalue, i.e. the one that spans the null-space, will correspond to the plane’s normal. In practice, the presence of noise will cause the otherwise zero eigenvalue associated with the plane's normal to have some positive value, but for a mostly planar point cloud it will be dominated by the other two eigenvalues.

You can see this at work in the figure below. The eigenvectors are shown as colored arrows and are scaled by their associated eigenvalues. Our two large-magnitude eigenvalues/vectors are represented in red and green; they correspond to the dimensions of most "spread" in the point distribution. The blue vector, meanwhile, only captures a fraction of the variance in this distribution. That blue vector is our plane normal!

So that explains how to fit $\hat{n}$. To fit $d$, take a point known to be on the plane (such as the centroid, assuming there isn't a lot of noise or bias), plug it into the plane equation above, and solve.
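Here's a minimal NumPy sketch of the PCA fit described above (not the notebook's exact code):

```python
import numpy as np

def fit_plane_pca(points):
    """Fit a plane to an (N, 3) array of points via PCA.

    Returns (n_hat, d) such that points on the plane satisfy n_hat . x - d = 0.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)   # 3x3 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    n_hat = eigvecs[:, 0]                       # direction of least variance = normal
    d = n_hat @ centroid                        # the centroid lies on the plane
    return n_hat, d

# Example: noisy samples from the plane z = 0.1x + 0.2y + 1
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 0.1 * xy[:, 0] + 0.2 * xy[:, 1] + 1.0 + rng.normal(scale=0.01, size=200)
pts = np.column_stack([xy, z])
print(fit_plane_pca(pts))
```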

What To Do When the “Usual” Breaks Down

PCA is a very effective technique. It's simple to implement, easy to explain, and works in the presence of noise, provided there aren't outliers (i.e. the noise is roughly zero-mean Gaussian). To handle outliers, you can wrap this technique in RANSAC, which is effective since fitting planes to small samples and testing for model fitness are both fast, simple operations. But RANSAC has its drawbacks, a big one being that the time needed to fit the model grows with the outlier rate, so it can be slow in the presence of many outliers. There are robust versions of PCA, but as far as I understand, they're not straightforward extensions of the base algorithm.
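For concreteness, here's a minimal RANSAC sketch wrapped around the `fit_plane_pca` helper from the previous snippet; the iteration count and inlier threshold are illustrative, not tuned:

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.02, seed=0):
    """Toy RANSAC loop: sample minimal sets, keep the largest consensus set, refit."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        n_hat, d = fit_plane_pca(sample)        # three points define a candidate plane
        dist = np.abs(points @ n_hat - d)       # point-to-plane distances
        inliers = dist < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set for the final estimate.
    return fit_plane_pca(points[best_inliers])
```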

What I'd really like is a way to turn this into a parametric, unconstrained, non-linear least-squares problem. Least squares and non-linear least squares are the foundation of so many classic 3D vision techniques (ICP, Bundle Adjustment, Zhang's Calibration, KLT Tracking, Pose Graph Optimization; I could go on).

There are many advantages to using non-linear least-squares:

  • There's a good chance that in a given computer vision system you're already solving iterative, non-linear least squares problems, meaning the machinery to set up and optimize them is already in place.

  • It's straightforward to jointly estimate multiple parameters at the same time, producing more consistent results. For example, if we decided to make the 3D sample points parameters themselves (because they're, e.g., the result of triangulating between two views) and optimize them alongside the plane parameters, we may produce a more consistent result than estimating the 3D points first and the plane parameters separately afterward.

  • Extending the problem to weighted least squares opens up the possibility of downweighting low-confidence inputs (e.g. the 3D sample points) and introducing priors on parameters (e.g. the plane normal).

  • It’s relatively easy to use robust cost functions to introduce outlier rejection to a non-linear least squares problem.

Plane Fitting the Least-Squares Way

Posing the least squares plane fitting problem is straightforward. We'll use a point-to-plane cost function, not unlike what you see with some variants of ICP.

$$ \underset{\hat{n},d}{\text{argmin}}\sum_{i}\left\lVert \hat{n}^Tx_i-d \right\rVert_{2}^{2} $$

You may notice that this is actually a linear cost function. The trouble is that the normal needs to be a unit vector. So without further modifications, this becomes a constrained optimization problem, and that is a whole can of worms. We’ve run into this problem before when trying to optimize over 3D transformations. Rotation matrices and quaternions have similar constraints that make them unsuitable for use directly in optimization problems.

Unit Vector As Manifold?

The solution is to optimize on-manifold. For rotations, SO(3)’s status as a Lie group makes this relatively easy. We just need an expression for the exponential- and logarithm-maps and some corresponding jacobians, and we’re good to go.

So can we do the same with a 3D unit vector? The space of 3D unit vectors is the "unit 2-sphere", or S2, which corresponds to an ordinary sphere as you or I would picture it. However, as a previous blog post (see "Symmetry of the Sphere") points out, S2 is not a Lie group. Recall that the Hairy Ball Theorem states "… that there is no nonvanishing continuous tangent vector field on even-dimensional n-spheres". S2 is an n-sphere of dimension 2, so the theorem unfortunately applies.

However, just because we're not optimizing over a Lie group doesn't mean we're stuck. You absolutely can optimize over a manifold that's not a Lie group, but we'll save that for another article. Let's see if we can leverage rotations instead.

Optimizing Over a Subset of Rotations

Even before learning about S2's disqualification from being a Lie group and how to work around that, I had the following geometric intuition about how to use manifolds to reason about 3D unit vectors: we can parameterize a unit vector by starting with a canonical unit vector, say the Z axis, and rotating it so that it points in a different direction. We can then use the rotation itself to parameterize the resulting unit vector.

This animation shows how a vector (red) in the XY plane (gray) maps to a point on the unit sphere (green) by rotating (purple) the Z axis (blue).

Our only issue is that a general rotation in 3D has three degrees of freedom, but a unit vector has just two, making this choice an overparameterization. The trick is to recognize that a rotation about the canonical unit vector doesn't change that vector. So we can think of S2 as mapping to an equivalence class of SO(3): each member of S2 has a corresponding class of rotations that move the canonical vector to that point on the sphere, allowing for any roll around the canonical vector.

Practically speaking, this means we can fit unit vectors (like plane normals) by optimizing on the SO(3) manifold. When optimizing on SO(3), which is a Lie group, one typically works with the local tangent-space coordinates (this would be a good time to review "A micro Lie theory"), of which there are three. In this case, with the Z axis as our starting point, we set the third component to zero since it has no effect on the resulting vector: it would only rotate about the canonical vector itself, which leaves a point on that axis unchanged. This gives us two parameters ($r_x, r_y$) with which to characterize the unit vector.
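A minimal sketch of this mapping using SciPy's `Rotation` (the helper name `unit_vector_from_tangent` is ours, not from the notebook):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def unit_vector_from_tangent(r_x, r_y):
    """Map 2D tangent-space coordinates to a 3D unit vector.

    exp([r_x, r_y, 0]) is an SO(3) rotation; applying it to the canonical
    Z axis gives the unit vector. The zeroed third component is the roll
    about Z, which has no effect on where Z ends up.
    """
    R = Rotation.from_rotvec([r_x, r_y, 0.0])    # exponential map (axis-angle)
    return R.apply([0.0, 0.0, 1.0])

print(unit_vector_from_tangent(0.0, 0.0))         # [0, 0, 1]
print(unit_vector_from_tangent(np.pi / 2, 0.0))   # 90 deg about X -> [0, -1, 0]
```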

Unit Vector As Manifold (Really)

The local tangent-space coordinates for manifolds representing spatial rotations have the very useful property of being interpretable as an axis-angle representation of a rotation: the magnitude of the tangent-space coordinate is the angle, and the direction is the axis about which the rotation occurs. Hopefully it's clear that the tangent space of the Z axis in 3D is the XY plane. Thus, we can interpret the 2D tangent-space coordinates of our rotation as a vector in the XY plane about which we rotate the Z axis.

Any vector in the XY plane maps to a corresponding 3D unit vector. The mapping isn’t unique since we can keep wrapping around the sphere, but it is “smooth”. A change in the tangent space coordinates can be related to a change in the resulting unit vector through the jacobian of the exponential map. The big trick to optimizing on-manifold is recognizing that the tangent space is just an unconstrained vector space, so we get to do all the things we’re used to doing: calculus, optimization techniques like Gauss-Newton etc.
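To back up the smoothness claim, here's a quick finite-difference probe of the map from tangent coordinates to the unit vector (reusing `unit_vector_from_tangent` from the previous snippet); an actual solver would use the analytic jacobian of the exponential map instead:

```python
import numpy as np

def numeric_jacobian(r_x, r_y, eps=1e-6):
    """3x2 jacobian of the unit vector with respect to (r_x, r_y), via central differences."""
    J = np.zeros((3, 2))
    for j in range(2):
        delta = np.zeros(2)
        delta[j] = eps
        plus = unit_vector_from_tangent(r_x + delta[0], r_y + delta[1])
        minus = unit_vector_from_tangent(r_x - delta[0], r_y - delta[1])
        J[:, j] = (plus - minus) / (2 * eps)
    return J

# At the origin of the tangent plane the columns are [0, -1, 0] and [1, 0, 0]:
# nudging r_x tilts the normal toward -Y, nudging r_y tilts it toward +X.
print(numeric_jacobian(0.0, 0.0))
```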

Fitting the Plane

Let’s see how to put all this into practice. Recall the original optimization problem:

$$ \underset{\hat{n},d}{\text{argmin}}\sum_{i}\left\lVert \hat{n}^Tx_i-d \right\rVert_{2}^{2} $$

Let's replace $\hat{n}$ with the rotation of the Z axis we described earlier:

$$ \underset{R,d}{\text{argmin}}\sum_{i}\left\lVert (R\hat{z})^Tx_i-d \right\rVert_{2}^{2} $$

We're not actually going to optimize over a rotation matrix $R$ directly, so let's introduce $r_x, r_y$, which parameterize our SO(3)/SO(2) rotation, and the exponential map $\text{exp}(\cdot)$, which maps those parameters to a rotation matrix (or quaternion, etc.) that we use to apply the rotation.

$$ \underset{r_x, r_y,d}{\text{argmin}}\sum_{i}\left\lVert (\text{exp}([r_x \; r_y \; 0]^T) \hat{z})^Tx_i-d \right\rVert_{2}^{2} $$

Finally, our parameters $r_x, r_y, d$ form an unconstrained parameter vector, and the problem is a regular, unconstrained, non-linear least squares problem. To solve it, we could use something like Gauss-Newton. We just need to evaluate the residual vector (the part inside the norm) and the jacobian of the residual with respect to the parameters.
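Here's a minimal sketch of that solve with `scipy.optimize.least_squares`, reusing the `pts` array from the PCA sketch above; for brevity it lets SciPy estimate the jacobian by finite differences, whereas the notebook supplies an analytic one:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def plane_residuals(params, points):
    """Point-to-plane residuals for the parameter vector [r_x, r_y, d]."""
    r_x, r_y, d = params
    n_hat = Rotation.from_rotvec([r_x, r_y, 0.0]).apply([0.0, 0.0, 1.0])
    return points @ n_hat - d

# Reusing `pts` from the PCA example above.
result = least_squares(plane_residuals, x0=[0.0, 0.0, 0.0], args=(pts,))
r_x, r_y, d = result.x
n_hat = Rotation.from_rotvec([r_x, r_y, 0.0]).apply([0.0, 0.0, 1.0])
print(n_hat, d)
```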

In the Jupyter notebook for this post, I generated some random planar points and showed how to use the above technique to fit a plane. The optimization problem is solved with SciPy's least squares solver: simply pass in the residual, the jacobian, and the data points.

A plane fit to some points.
The same points and fit plane viewed on-edge

Now we can try adding some gross outliers. As you can see, our fit plane no longer lines up very well with the data:

Both the PCA-based and the non-robust least-squares approaches struggle to fit the plane well in the presence of outliers

However, now that we're using least squares, we can easily make use of robust least squares fitting. Simply by changing one parameter, I can instruct SciPy to use a robust cost function ("loss" in their parlance), giving us robustness to outliers with minimal effort and little added computational cost.
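A sketch of that one-parameter change, reusing `plane_residuals` from above (`pts_outliers` here stands in for the contaminated point set, and the `f_scale` value is illustrative):

```python
# Same residuals as before, but with a robust loss. 'soft_l1' and 'huber'
# are both built into scipy.optimize.least_squares; f_scale sets the residual
# magnitude at which the loss starts to flatten.
robust_result = least_squares(plane_residuals, x0=[0.0, 0.0, 0.0],
                              args=(pts_outliers,),   # hypothetical contaminated data
                              loss="soft_l1", f_scale=0.05)
```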

Same points with original fit (blue disc) and robust fit (reddish disc)
Same points and fits viewed on edge

Even if you're working outside a batteries-included library like SciPy, adding a robust cost to an iterative least squares system is fairly straightforward, and certainly simpler than turning PCA into Robust PCA.

On to S2…

We’ve shown that plane fitting can be an on-manifold least squares problem. Despite S2 (the space of 3D unit vectors) not being a Lie group, we can still use SO(3) (which is a Lie group) to, in effect, fit a 3D unit vector and thus a plane normal.

In Part II, we’ll use the same plane fitting problem to show how to optimize over S2 more directly, even though it is not a Lie group.
