Fundamentals of Lambda Calculus
Lambda calculus is a formal system to study computable functions based on variable binding and substitution. Introduced in the 1930s by Alonzo Church, it is (in its typed form) the fundamental concept of functional programming languages like Haskell and Scala. Although the topic might seem very theoretical, some basic knowledge in lambda calculus can be very helpful to understand these languages, and where they originated from, much better. The goal of this article is to introduce some basic concepts of lambda calculus, which later on can be mapped to real world usage scenarios with functional programming languages.
Functions
Since lambda calculus is all about computable functions, a basic understanding of functions and its properties is useful.
A function, in its mathematical sense, describes the relation between a set of possible input and a set of possible output values. It associates values in the input set, the domain of a function, to exactly one value of the output set, the codomain of the function.
To give an example: The square function, i.e. a function which maps any real value to its product with itself, is usually notated like this:
$$ f: \mathbb{R} \rightarrow \mathbb{R} \newline x \mapsto x^2 $$
Here, one can see that the domain and codomain of the function are the real values $\mathbb{R}$. The function itself is defined by the equation $x \mapsto x^2$, which shows that each input value is mapped to the square value of itself.
Already mentioned was one important property for functions: The same input value for a function leads always to the same result. Multiple possible output values for a given input value are not allowed under this definition.
Notation
Lambda calculus is defined by three basic concepts: expressions, variables and applications.
The term expressions is the most broad one: it can be composed of names (also referred to as variables) and functions.
Functions in lambda calculus have a head, represented with the lambda symbol $\lambda$ plus a following variable name, and a body. The body of a function is an expression as well and separated from the head with a dot. Therefore, a simple function can be notated like this:
$$ \lambda x.x $$
In this given function, the variable $x$ is referred to as bound, since the identifier also appears in the function body. Variables occurring in the function’s body but not in the head are referred to as free variables, as $y$ in the following example:
$$ \lambda x.y $$
When a function is applied to a given argument, the bound variable, and each occurrence in the function’s body, will have the value of this argument. The term application therefore is used when a function is applied to an argument.
A lambda calculus term can be expressed in BackusNaur Form (BNF) as (see Rojas, 1998):
<expression> := <name>  <function>  <application>
<function> := λ <name>.<expression>
<application> := <expression> <expression>
Further notation details
When dealing with lambda calculus some further conventions about the notation are important to know:

Omitting outermost parentheses $$ (\lambda x.x) \equiv \lambda x.x $$

Application of functions to arguments is left associated, i.e. grouping takes place from left to right. This is especially useful with multiple arguments $$ \begin{aligned} \underbrace{\vphantom{\frac{1}{2},\frac{1}{3}}\lambda x.x}_{Function} \enspace \underbrace{\vphantom{\frac{1}{2},\frac{1}{3}}1}_{Argument} &\equiv \lambda x.x \enspace 1 \cr[2.5em] ((fx)y)z &\equiv f \thinspace x\thinspace y\thinspace z \end{aligned} $$

The body of a function extends to the right as far as possible
$$ \lambda x.xy \equiv \lambda x.(xy) \not\equiv (\lambda .x)y $$
$\alpha$Conversion
In lambda calculus the actual name or symbol used for bound variables carries no meaning, there is no semantic behind the variable name. Therefore:
$$ \lambda x.x \equiv \lambda y.y \equiv \lambda z.z $$
Renaming variables in lambda calculus is referred to as $\alpha$conversion. In the above example the functions are considered $\alpha$equivalent.
Although it seems like a rather simple concept, there are some pitfalls to consider. In the following example the two functions are not $\alpha$equivalent:
$$ (\lambda x.x+y) \not\equiv (\lambda y.y+y) $$
This is, because the former free variable $y$ is now a bound variable.
$\beta$Reduction
When applying a function to an argument, all occurrences of bound variables in the function’s body are replaced with the corresponding value.
Let’s apply the value $1$ to the function $\lambda x.x$:
$$ \lambda x.x\enspace 1 \newline \lambda 1.1 $$
The head of a function is only there to bind variables; it tells us which identifier we have to replace inside the function’s body. Since the bound variable $x$ is now applied to the argument $1$ the head can be eliminated:
$$ \lambda x.x\enspace 1 \newline \lambda 1.1 \newline 1 $$
The process of applying functions to arguments is referred to as $\beta $reduction and the function $\lambda x.x$ is the identity function; it just returns the given input value.
The $\beta$ normal form
The term $\beta$reduction describes the process of applying arguments to functions. When it is not possible to apply further arguments, the term is reduced to its $\beta$ normal form. Therefore, the term
$$ \lambda x.x \enspace \lambda y.y $$
is not reduced to its $\beta$ normal form. Applying the argument results in
$$ \lambda y.y $$
with no arguments left to apply: The term is now reduced to its $\beta$ normal form.
Why is this relevant? In the functional programming paradigm, a computer program is composed of computable functions which are reduced further and further until the program completes. In this regard, the $\beta$ normal form can be seen as an executed program.
The identity function
Since $\lambda x.x$ is now established as the identity function, it should return any given input value. As the argument of a function can be an expression as well, a corresponding example could look like:
$$ (\lambda x.x)\enspace (\lambda a.a) \newline[0.2em] \lbrack x := \lambda a.a \rbrack \newline[0.2em] (\lambda (\lambda a.a).(\lambda a.a)) \newline[0.2em] \lambda a.a $$
The bracket notation $\lbrack … \rbrack$ is usually used to indicate with what value the bound variable is substituted.
Currying  Application of multiple arguments
In lambda calculus a function has exactly one bound variable and can be applied to one argument. Functions expecting multiple arguments therefore have to be notated with a nested head. A function taking two arguments looks like this:
$$ \lambda x.(\lambda y.xy) $$
This notation is usually simplified as:
$$ \lambda xy.xy $$
Arguments applied to functions are processed from left to right, i.e. the first argument eliminates the outer $\lambda$, the second argument the next $\lambda$ and so on.
Giving an example, this process looks like this:
$$ \lambda x.(\lambda y.xy) \enspace 1 \thinspace 2 \newline[0.2em] \lbrack x := 1 \rbrack \newline[0.2em] \lambda 1.(\lambda y.1y) \enspace 2 \newline[0.2em] \lbrack y := 2 \rbrack \newline[0.2em] \lambda 1.(\lambda 2.12) \newline[0.2em] 12 $$
This process of applying multiple arguments to a function is referred to as currying, named after Haskell Brooks Curry.
Formally speaking, currying is a transformation from a function taking multiple arguments into a sequence of functions taking one argument.
$$ f: (X \times Y) \rightarrow Z \newline $$
The function $f$ relates ordered pairs of the Cartesian product $X \times Y$ to $Z$. Currying transforms this function into the following form:
$$ g: X \rightarrow (Y \rightarrow Z) $$
That is, the function $g$ takes an argument of $X$ and returns a function relating $Y$ to $Z$.
Combinators
Combinators are a class of lambda expressions which only combine arguments they are applied to. They are not introducing new (free) variables or other data.
Therefore, the following examples are combinators, as every term in the body occurs in the head:
$$ \lambda x.x \newline[0.2em] \lambda xy.y \newline[0.2em] \lambda yx.xy $$
While these are not:
$$ \lambda x.xy \newline[0.2em] \lambda x.y $$
Divergence  Never ending reduction
In computer science a computation is diverging if it never terminates. In lambda calculus this means that an expression can be reduced, but never reaches its $\beta$ normal form.
An example can be given with:
$$ \lambda x.xx \enspace \lambda x.xx \newline[0.2em] [x := \lambda x.xx] \newline[0.2em] \lambda (\lambda x.xx).(\lambda x.xx)(\lambda x.xx) \newline[0.2em] (\lambda x.xx)(\lambda x.xx) $$
Applying the argument $\lambda x.xx$ to each $x$ in the term $\lambda x.xx$ and omitting the head after the application results in the expression from the beginning. If a lambda expression can actually be reduced to its $\beta$ normal form it is converging.
Basic arithmetic in lambda calculus
Lambda calculus is capable to represent arithmetic operations, as it is expected from a formal system used to study computability. In order to that, first a representation of numbers has to be derived using the lambda calculus building block: the function. Moving along, some examples for basic arithmetic operations are given in the following sections.
Church encoding
In lambda calculus the only primitive is the function. Therefore, instead of notating natural numbers $\mathbb{N}$ with $1, 2, …$, a functional representation is required. This can be done using Church numerals.
First a short recap how the natural numbers $\mathbb{N}$ can be constructed: Starting from $0$, a successor function is needed to yield $1$. Applying the successor function $n$ times leads to the natural numbers:
$$ \begin{aligned} 0 \cr 1 & = \text{\small SUCC} \enspace 0 \cr 2 & = \text{\small SUCC}(\text{\small SUCC} \enspace 0) \cr &… \end{aligned} $$
Church numerals are representing the natural numbers $0, 1, 2, …$ with functions in the following form: $$ \begin{aligned} 0 &\equiv \lambda y.(\lambda x.x) \equiv \lambda yx.x \cr 1 &\equiv \lambda yx.y(x) \cr 2 &\equiv \lambda yx.y(y(x)) \cr &… \end{aligned} $$
The number $n\in\mathbb{N}$ is therefore defined as a function with two arguments, which applies the first argument $n$ times to its second argument.
The successor function
The successor function is defined as:
$$ \text{\small SUCC} \equiv \lambda wyx.y(wyx) $$
Applying the successor function to the Church numeral representation of $0$ therefore leads to:
$$ \lambda wyx.y(wyx) \enspace \lambda yx.x \newline[0.2em] \lambda yx.y((\lambda yx.x)yx) \newline[0.2em] \lambda yx.y(\lambda x.x)x \newline[0.2em] \lambda yx.y(x) \equiv 1 $$
Addition
Addition can be represented with the successor function. For example, calculating $2+1$ can be done with applying the successor function two times to $1$.
$$ \begin{aligned} 2 + 1 &\equiv \underbrace{\vphantom{\frac{1}{2},\frac{1}{3}}(\lambda yx.y(yx))}_{\equiv \thinspace 2} \enspace \underbrace{\vphantom{\frac{1}{2},\frac{1}{3}}(\lambda wmn.m(wmn))}_{\equiv \thinspace \text{SUCC}} \enspace \underbrace{\vphantom{\frac{1}{2},\frac{1}{3}}(\lambda uv.uv)}_{\equiv \thinspace 1} \cr &= (\lambda x.(\lambda wmn.m(wmn))((\lambda wmn.m(wmn))x)) \enspace (\lambda uv.uv) \cr &= ((\lambda wmn.m(wmn)((\lambda wmn.m(wmn))(\lambda uv.uv))) \cr &\equiv \text{\small SUCC} (\text{\small SUCC}(1)) \cr &\equiv 3 \end{aligned} $$
Note that in the example above $ \alpha $conversion was used to assign different variable names for each term. This makes the process of applying functions to complex arguments more feasible as identifying which variable is bound to which head is easier.
Multiplication
The multiplication operator can be defined as:
$$ \text{\small MULT} \equiv \lambda xyz.x(yz) $$
The product of $2$ by $2$ can then be calculated using the corresponding Church numeral as:
$$ \begin{aligned} 2*2 &\equiv \lambda xyz.x(yz) \thinspace 2\thinspace2 \cr &= \lambda z.2(2z) \cr &= \lambda z.(\lambda yx.y(y(x)))((\lambda uv.u(u(v)))z) \cr &= \lambda z.(\lambda yx.y(y(x)))((\lambda v.z(z(v)))) \cr &= \lambda z.(\lambda x.(\lambda v.z(z(v)))((\lambda v.z(z(v)))(x))) \cr &= \lambda z.(\lambda x.(\lambda v.z(z(v)))((\lambda v.z(z(v)))(x))) \cr &= \lambda z.(\lambda x.(\lambda v.z(z(v)))((z(z(x))))) \cr &= \lambda z.(\lambda x.z(z(z(z(x))))) \cr &= \lambda zx.z(z(z(z(x)))) \cr &\equiv 4 \end{aligned} $$
Typed lambda calculus
Lambda calculus, in the form discussed in this article, is a typefree theory. That means, that every argument may be applied to a function without further type restrictions. The identity function $\lambda x.x$ for example just returns any given input expression, even with the identity function itself as its input:
$$ \lambda x.x \enspace 1 = 1 \newline \lambda x.x \enspace \lambda y.y = \lambda y.y $$
For real life functional programming languages, it is actually the typed variant of lambda calculus which is used as the theoretical foundation. In this theory a lambda term can have a type assigned to it. To assign a type to the identity function one can write:
$$ id: A \rightarrow A $$
This reveals, that if the input value of the identity function is of type $A$ it returns an output with the type $A$ as well.
Going into more detail is out of scope of this article, but it should also be noted that there are different varieties or flavors of the typed lambda calculus with different implications. The original papers of Curry (1934) and Church (1940) allowed for the differentiation of two programming paradigms: implicit and explicit typing. In the former one can write a program without type annotations at all and it is the compiler that has to figure out which type a term gets assigned, while in the latter types have to be explicitly stated.
References
 A Formulation of the Simple Theory of Types  A. Church
 A Tutorial Introduction to the Lambda Calculus  R. Rojas, FU Berlin
 An Unsolvable Problem of Elementary Number Theory  A. Church
 BackusNaur Form (BNF)
 Church numerals
 Codomain of a function
 Domain of a function
 Functionality in combinatory logic  H. Curry
 Haskell Curry
 Haskell Programming Language
 Introduction to Lambda Calculus  H. Barendregt, E. Barendsen
 Lecture Notes on the Lambda Calculus  P. Selinger, Dalhousie University
 Scala Programming Language