In mathematics and computer science, a canonical, normal, or standardform of a mathematical object is a standard way of presenting that object as a mathematical expression. Often, it is one which provides the simplest representation of an object and allows it to be identified in a unique way. The distinction between "canonical" and "normal" forms varies from subfield to subfield. In most fields, a canonical form specifies a unique representation for every object, while a normal form simply specifies its form, without the requirement of uniqueness.[1]
The canonical form of a positive integer in decimal representation is a finite sequence of digits that does not begin with zero. More generally, for a class of objects on which an equivalence relation is defined, a canonical form consists in the choice of a specific object in each class. For example:
In computer science, and more specifically in computer algebra, when representing mathematical objects in a computer, there are usually many different ways to represent the same object. In this context, a canonical form is a representation such that every object has a unique representation (with canonicalization being the process through which a representation is put into its canonical form).[2] Thus, the equality of two objects can easily be tested by testing the equality of their canonical forms.
Despite this advantage, canonical forms frequently depend on arbitrary choices (like ordering the variables), which introduce difficulties for testing the equality of two objects resulting on independent computations. Therefore, in computer algebra, normal form is a weaker notion: A normal form is a representation such that zero is uniquely represented. This allows testing for equality by putting the difference of two objects in normal form.
Canonical form can also mean a differential form that is defined in a natural (canonical) way.
Definition
Given a set S of objects with an equivalence relationR on S, a canonical form is given by designating some objects of S to be "in canonical form", such that every object under consideration is equivalent to exactly one object in canonical form. In other words, the canonical forms in S represent the equivalence classes, once and only once. To test whether two objects are equivalent, it then suffices to test equality on their canonical forms.
A canonical form thus provides a classification theorem and more, in that it not only classifies every class, but also gives a distinguished (canonical) representative for each object in the class.
Formally, a canonicalization with respect to an equivalence relation R on a set S is a mapping c:S→S such that for all s, s1, s2 ∈ S:
s1Rs2 if and only if c(s1) = c(s2) (decisiveness), and
sRc(s) (representativeness).
Property 3 is redundant; it follows by applying 2 to 1.
In practical terms, it is often advantageous to be able to recognize the canonical forms. There is also a practical, algorithmic question to consider: how to pass from a given object s in S to its canonical form s*? Canonical forms are generally used to make operating with equivalence classes more effective. For example, in modular arithmetic, the canonical form for a residue class is usually taken as the least non-negative integer in it. Operations on classes are carried out by combining these representatives, and then reducing the result to its least non-negative residue.
The uniqueness requirement is sometimes relaxed, allowing the forms to be unique up to some finer equivalence relation, such as allowing for reordering of terms (if there is no natural ordering on terms).
A canonical form may simply be a convention, or a deep theorem. For example, polynomials are conventionally written with the terms in descending powers: it is more usual to write x2 + x + 30 than x + 30 + x2, although the two forms define the same polynomial. By contrast, the existence of Jordan canonical form for a matrix is a deep theorem.
History
According to OED and LSJ, the term canonical stems from the Ancient Greek word kanonikós (κανονικός, "regular, according to rule") from kanṓn (κᾰνών, "rod, rule"). The sense of norm, standard, or archetype has been used in many disciplines. Mathematical usage is attested in a 1738 letter from Logan.[3] The German term kanonische Form is attested in a 1846 paper by Eisenstein,[4] later the same year Richelot uses the term Normalform in a paper,[5] and in 1851 Sylvester writes:[6]
"I now proceed to [...] the mode of reducing Algebraical Functions to their simplest and most symmetrical, or as my admirable friend M. Hermite well proposes to call them, their Canonical forms."
In the same period, usage is attested by Hesse ("Normalform"),[7]Hermite ("forme canonique"),[8]Borchardt ("forme canonique"),[9] and Cayley ("canonical form").[10]
"In Mathematics, denotes a form, usually the simplest or most symmetrical, to which, without loss of generality, all functions of the same class can be reduced."
Examples
Note: in this section, "up to" some equivalence relation E means that the canonical form is not unique in general, but that if one object has two different canonical forms, they are E-equivalent.
Large number notation
Standard form is used by many mathematicians and scientists to write extremely large numbers in a more concise and understandable way, the most prominent of which being the scientific notation.[11]
The symbolic manipulation of a formula from one form to another is called a "rewriting" of that formula. One can study the abstract properties of rewriting generic formulas, by studying the collection of rules by which formulas can be validly manipulated. These are the "rewriting rules"—an integral part of an abstract rewriting system. A common question is whether it is possible to bring some generic expression to a single, common form, the normal form. If different sequences of rewrites still result in the same form, then that form can be termed a normal form, with the rewrite being called a confluent. It is not always possible to obtain a normal form.
Lambda calculus
A lambda term is in beta normal form if no beta reduction is possible; lambda calculus is a particular case of an abstract rewriting system. In the untyped lambda calculus, for example, the term does not have a normal form. In the typed lambda calculus, every well-formed term can be rewritten to its normal form.
In graph theory, a branch of mathematics, graph canonization is the problem of finding a canonical form of a given graph G. A canonical form is a labeled graph Canon(G) that is isomorphic to G, such that every graph that is isomorphic to G has the same canonical form as G. Thus, from a solution to the graph canonization problem, one could also solve the problem of graph isomorphism: to test whether two graphs G and H are isomorphic, compute their canonical forms Canon(G) and Canon(H), and test whether these two canonical forms are identical.
Computing
In computing, the reduction of data to any kind of canonical form is commonly called data normalization.
In the field of software security, a common vulnerability is unchecked malicious input (see Code injection). The mitigation for this problem is proper input validation. Before input validation is performed, the input is usually normalized by eliminating encoding (e.g., HTML encoding) and reducing the input data to a single common character set.
Other forms of data, typically associated with signal processing (including audio and imaging) or machine learning, can be normalized in order to provide a limited range of values.
^In some occasions, the term "canonical" and "normal" can also be used interchangeably, as in Jordan canonical form and Jordan normal form (see Jordan normal form on MathWorks).
^The term 'canonization' is sometimes incorrectly used for this.