Derivative as local linearization

Published: 02 May 2026

How can continuous change be replaced by linear structure locally?

What is the minimal structure that captures infinitesimal behavior?

A curve may be represented by an equation, and a moving point may be recorded at different positions, but continuous change asks for something more precise than comparison between two separated states.

The previous step introduced the central idea: curved behavior can often be understood by looking for linear behavior under local magnification.

Now that idea needs a name.

The derivative is the local linear approximation to a changing process at a point.

That sentence is deliberately structural. It does not begin with a formula for slope, even though slope is the first representation most people meet. Slope is what the derivative looks like when both the input and output are one-dimensional and coordinates have already been chosen. The deeper object is a linear map that describes how small input changes produce first-order output changes.

Start with a function f. At an input x, change the input by a small amount h. The output changes from f(x) to f(x + h), so the actual output displacement is

f(x + h) - f(x)

The derivative asks whether this displacement has a stable linear part.

If there is a linear map L such that, for small h, the output change is captured to first order by L(h), then L is the derivative of f at x.

x ──f──▶ f(x)
│          │
│ local    │ local
│ linear   │ linear
▼          ▼
x+h ──▶ f(x)+L(h)

The diagram says that the actual change of the function is being replaced, locally, by a linear rule. The point x is sent to f(x). A nearby point x + h is approximated by sending the small displacement h through the linear map L.

The curved process remains curved. The derivative records the linear part visible at one point.
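A small numerical sketch can make this concrete. The function f(x) = x^3, the point x = 2, and the step sizes below are illustrative assumptions, not taken from the text; the candidate linear map at that point is h |-> 12 h.

```python
def f(x):
    # Example nonlinear function (an assumption chosen for illustration).
    return x ** 3

def L(h, x=2.0):
    # Candidate linear part at x = 2: h -> 3 * x**2 * h = 12 * h.
    return 3 * x ** 2 * h

x = 2.0
rows = []
for h in (0.1, 0.01, 0.001):
    actual = f(x + h) - f(x)   # true output displacement
    linear = L(h)              # first-order prediction from the linear map
    rows.append((h, actual, linear, actual - linear))

for h, actual, linear, gap in rows:
    print(f"h={h:<6} actual={actual:.6f} linear={linear:.6f} gap={gap:.6f}")
```

The gap between the actual displacement and the linear prediction shrinks much faster than h does, which is what it means for the linear part to capture the change to first order.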

This is why the derivative belongs naturally after linear algebra. A nonlinear function may have no global linearity at all. It may bend, accelerate, flatten, or turn. But near a well-behaved point, its first-order behavior can still be organized by a linear map.

For a real-valued function of one real variable, that linear map is especially simple. Every linear map from the real line to itself has the form

h |-> a h

for some number a. That number is the familiar derivative value, often written f'(x).

In that setting, the derivative is represented by a slope.

But the slope is a coordinate-level presentation of the structural fact. The derivative itself is the map

h |-> f'(x) h

which takes an input displacement and returns the corresponding first-order output displacement.

This distinction becomes unavoidable in higher dimensions. If a function sends points in R^n to points in R^m, a small input change is a vector, and the first-order output change is another vector. The derivative at a point is then a linear map from input displacements to output displacements. Once bases are chosen, that map can be represented by a matrix, often the Jacobian matrix.
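The following sketch illustrates the higher-dimensional case under stated assumptions: the map f from R^2 to R^2, the base point, and the helper names jacobian and apply are all hypothetical choices made for illustration. The matrix is estimated by central differences in the standard basis and then used as a linear map on a displacement vector.

```python
def f(p):
    # Example map from R^2 to R^2 (an assumption for illustration).
    x, y = p
    return (x * y, x + y ** 2)

def jacobian(func, p, eps=1e-6):
    # Approximate the Jacobian matrix by central differences, one column per input axis.
    n = len(p)
    m = len(func(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(p)
        plus[j] += eps
        minus = list(p)
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def apply(J, h):
    # Apply the linear map represented by J to a displacement vector h.
    return [sum(J[i][j] * h[j] for j in range(len(h))) for i in range(len(J))]

p = (1.0, 2.0)
h = (0.01, -0.02)

J = jacobian(f, p)
predicted = apply(J, h)  # first-order output displacement
actual = [a - b for a, b in zip(f((p[0] + h[0], p[1] + h[1])), f(p))]

print("matrix:", J)
print("predicted:", predicted)
print("actual:   ", actual)
```

The array of numbers in J depends on the chosen coordinates; what the comparison checks is the coordinate-free claim that the linear map sends the input displacement to a good first-order estimate of the output displacement.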

The same discipline from linear algebra returns again:

the derivative is the structure
the matrix is a representation after coordinates and bases have been chosen
the invariant content is the local linear behavior, not the particular array of numbers used to write it

This keeps calculus from collapsing into coordinate manipulation. Coordinates let us compute derivatives, just as they let us compute in analytic geometry. But the structural idea is independent of any one coordinate description.

At a point on a curve, the tangent line is the geometric face of the same idea. The curve may bend globally, but near the point the tangent line captures its first-order direction. If we magnify the curve closer and closer to the point, the tangent is what the curve increasingly resembles, provided the curve is differentiable there.

The tangent line is therefore more than a visual aid. It is the geometric expression of local linearization.

curve near x
     )
    )
---*---------  tangent
   x

In one dimension, this tangent is determined by a slope. In higher-dimensional settings, the local linear object may be a tangent plane or a linear map between tangent spaces. The basic pattern remains the same: replace a curved process near a point by the linear structure that best captures its first-order behavior.

The word "best" has to be handled carefully. Calculus uses an approximation, but not every approximate line deserves to be called a derivative. The linear part must absorb all first-order behavior, leaving only an error that becomes negligible compared with the size of the input change as the input change shrinks.

In compact form, the requirement can be written as

f(x + h) = f(x) + L(h) + error(h)

where the error shrinks faster than h itself: the ratio error(h) / |h| tends to zero as h tends to zero.
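As a worked instance of this relation, take the example f(x) = x^2 (chosen here for illustration, not taken from the text). Expanding the square separates the linear part from the error directly:

```latex
\begin{aligned}
f(x + h) &= (x + h)^2 = x^2 + 2xh + h^2, \\
L(h) &= 2xh, \qquad \mathrm{error}(h) = h^2, \\
\frac{\mathrm{error}(h)}{h} &= h \longrightarrow 0 \quad \text{as } h \to 0.
\end{aligned}
```

The linear part is h |-> 2x h, and the leftover term h^2 vanishes even after division by h.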

That error condition is the boundary between ordinary approximation and differentiation. Many lines may look reasonable from far away. The derivative is the one whose remaining error disappears at the correct scale under local refinement.
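A short sketch of this boundary, assuming the example f(x) = x^3 at x = 1 and a deliberately wrong competing slope of 3.5 (both are illustrative choices, as is the helper name relative_error). It compares the leftover error, divided by the size of h, for the true slope and for the competitor.

```python
def f(x):
    # Example function (an assumption for illustration); its slope at x = 1 is 3.
    return x ** 3

def relative_error(slope, x, h):
    # Size of the leftover error compared with the size of h.
    error = f(x + h) - f(x) - slope * h
    return abs(error) / abs(h)

x = 1.0
steps = (0.1, 0.01, 0.001, 0.0001)

derivative_ratios = [relative_error(3.0, x, h) for h in steps]  # true slope
competitor_ratios = [relative_error(3.5, x, h) for h in steps]  # nearby but wrong slope

for h, d, c in zip(steps, derivative_ratios, competitor_ratios):
    print(f"h={h:<7} derivative: {d:.6f}   competitor: {c:.6f}")
```

For the true slope the ratio tends to zero. For the competing line it settles near 0.5, the gap between the two slopes, so a first-order error remains no matter how far the step is refined.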

The full rigor of that shrinking process belongs to analysis, where limits, convergence, and completeness become central objects. At the calculus level, the structural role is already clear: the derivative is the linear part that survives infinitesimal inspection.

This explains several familiar facts at once.

If a function is already linear, its derivative is the same linear map everywhere. There is no hidden curvature to remove. The global rule and the local rule agree.

If a function is nonlinear, the derivative can change from point to point. Each point may have its own local linear description. Calculus therefore studies not just one linear approximation, but how local linear approximations vary across a domain.
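These two cases can be checked numerically. The sketch below assumes two example functions, 3x and x^2, and a hypothetical helper slope that estimates the derivative value by a central difference.

```python
def slope(func, x, h=1e-6):
    # Central difference estimate of the number that represents the derivative at x.
    return (func(x + h) - func(x - h)) / (2 * h)

def linear(x):
    # Already linear: the local rule should match the global rule everywhere.
    return 3.0 * x

def quadratic(x):
    # Nonlinear: the local linear description should depend on the point.
    return x ** 2

points = (-1.0, 0.0, 2.0)
linear_slopes = [slope(linear, x) for x in points]
quadratic_slopes = [slope(quadratic, x) for x in points]

print("linear:   ", linear_slopes)
print("quadratic:", quadratic_slopes)
```

The linear function reports approximately the same slope at every point, while the quadratic reports a different slope at each one (approximately -2, 0, and 4 here).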

If a function has a corner, cusp, jump, or wild local behavior, the derivative may fail to exist. The failure means that no single linear map captures the first-order behavior at that point. The local magnification does not settle into a coherent linear structure.
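The corner case can be sketched with the absolute value function at zero, an example assumed here for illustration along with the helper name one_sided_quotients. The difference quotients from the left and from the right are computed separately.

```python
def one_sided_quotients(func, x, h):
    # Difference quotients approaching x from the left and from the right.
    right = (func(x + h) - func(x)) / h
    left = (func(x - h) - func(x)) / (-h)
    return left, right

quotients = [one_sided_quotients(abs, 0.0, h) for h in (0.1, 0.01, 0.001)]

for left, right in quotients:
    print(f"left={left:+.1f}  right={right:+.1f}")
```

The left quotient stays at -1 and the right quotient stays at +1 however small h becomes, so no single linear map h |-> a h accounts for both sides, and the derivative does not exist at that point.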

This gives differentiability a structural meaning.

A function is differentiable at a point when its behavior near that point can be compressed into a linear map plus higher-order error.

The derivative also changes what counts as an invariant.

Under local linearization, the exact curved path is not preserved. What is preserved is first-order behavior: rate, direction, tangency, and the linear relation between small changes in input and small changes in output. These are local invariants. They are attached to a point and describe how the function behaves there, rather than across an entire interval or region.

This local nature is new. Earlier structures often came with global transformations. A group operation applied to the whole object. A linear map acted on an entire vector space. A rigid motion moved a whole geometric figure while preserving distance and angle.

The derivative is attached to a point.

That pointwise attachment matters because it creates a layered structure. There is a global function f, and over each suitable point x there is a local linear map Df_x. The function moves points. The derivative moves infinitesimal displacements near those points.

global level:  x ──f──▶ f(x)

local level:   h ──Df_x──▶ Df_x(h)

The global and local levels must fit together. The derivative belongs to the local level, but it is not independent of the global function. It is extracted from how the function behaves near a point.

This is the structural heart of differential calculus.

The objects are points in spaces where continuous variation can be inspected locally, together with functions between such spaces. The local morphisms are linear approximations attached to points. What composes at this level are the local linear maps, and their composition must be compatible with composition of the original functions. The invariants are first-order behavior, tangent direction, local rate, and the linear relation between infinitesimal input and output changes. The defining relation is local approximation:

f(x + h) = f(x) + L(h) + higher-order error

with L linear and the error negligible at first order. Equality of derivative values is therefore equality of local linear behavior, while equality of formulas depends on representation. Global accumulation, integration, and the rigorous foundation of limits remain outside the frame for now.

The derivative is the linear map that replaces change locally.

Once this is clear, the next question is forced by composition.

If f carries x to y, and g carries y to z, then g ∘ f carries x to z. Each function may have its own derivative at the relevant point.

How should the derivative of the composite be related to the derivatives of the pieces?

And why must local linearization respect composition?
