A brief introduction to Lambda Calculus, Interaction Combinators, and how they are used to parallelize operations on Bend / HVM.
If you are reading this article, you probably heard recently about Bend, a new programming language that aims to be massively parallel without you having to worry about thread creation and other common parallel programming concerns.
If you do not know what I am talking about, watch the video below:
They claim “it feels like Python, but scales like CUDA”. As an enthusiast of parallel programming, it caught my attention immediately. After some reading, I found that Bend is powered by HVM (Higher-Order Virtual Machine), the runtime where the magic happens. That is, in a Bend program, the Bend code is compiled into HVM, which does some magic to run this program in an inherently parallel manner. In some way, all operations that can be parallelized are automatically parallelized by this runtime.
Straight away, I wanted to learn how all of the HVM magic happens. How can all of this be possible? After some reading, I learned that the magic behind HVM is mostly based on Interaction Combinators, a model of computation based on graphs and graphical rules developed by Yves Lafont in the 1990s. So, I opened Lafont’s paper, flipped through some pages and saw this:
I felt like I was in that movie Arrival, where the aliens try to communicate with us using a strange symbolic language.
That’s when I closed the laptop and gave up on trying to understand that.
A while later, when I turned on my machine again, those symbols were there, staring at me, as if they were asking me to be understood.
After a lot of reading, watching videos and alien help, I somehow started to understand this thing.
The purpose of this article is to briefly clarify how all the HVM magic happens and facilitate your further understanding by explaining some common terms you might find during your learning journey. In order to do that, we need to first learn some basic concepts.
The Lambda Calculus is a formal system in mathematical logic created by Alonzo Church in the 1930s. Its purpose was to investigate some aspects of logic theory from a purely mathematical point of view. Church was aiming to define what is computability in mathematical terms (what can be calculated using a set of fundamental rules). Let’s start:
You probably have already used Lambda Calculus before. For example, imagine a function to multiply a number by two:
In Python, you can express a named function for that like this:

def multiply_by_two(x):
    return 2 * x

print(multiply_by_two(2))
# 4
But you can also express that using lambdas, which are basically an anonymous function:
multiply_by_two_lambda = lambda x: x * 2

print(multiply_by_two_lambda(2))
# 4
So, let’s go back to mathematics. In Lambda Calculus, you express this same function using the notation λx.2x, where x is the parameter and 2x is the body.
This is called an abstraction. An abstraction λx.t denotes an anonymous function that takes a single input variable x and returns t. For example, λx.(x²+2x) is an abstraction representing the function f defined by f(x) = x²+2x. So, an abstraction basically defines a function but does not invoke it.
You can also have a term like λx.(x+y), which is the definition of f(x) = x+y. Here, y has not been defined yet. The expression λx.(x+y) is a valid abstraction and represents a function that adds its input to the yet-unknown y.
If using λx.2x defines a function, (λx.2x)a “calls” a function with argument “a”. That is, we basically substitute the variable “x” with “a”.
f(x) = 2x
f(2) = 4
This is the same as:
λx.2x
(λx.2x)2 = 4
This is called an application. We are “applying” the abstraction (λx.2x) to the number 2.
You can also apply a lambda expression to another lambda expression, such as nested functions:
And you want g(f(x)):
You can express this using lambda expressions:
λx.2x
λx.x³
=> (λx.x³)(λx.2x)
Do not try to solve it for now, first understand the notation, and further I will show you how to solve it!
It’s important to not confuse the parenthesis. For example:
1 — λx.((λx.x)x) is an abstraction (function definition).
2 — (λx.(λx.x))x is an application (function application).
In Example 1, we are defining a function λx.B, where B is the expression (λx.x)x, which is the anonymous function λx.x applied to the input x.
In Example 2, we are applying the anonymous function λx.(λx.x) to the input x.
Applications can also be represented as f x (applying function f to the variable x).
We can also represent functions with n parameters using Lambda Calculus. This can be done by using nested functions, each taking a single parameter: f(x,y,z) → λx.λy.λz
Thus, f(x, y, z) = x + y + z can be expressed by the abstraction:
λx.λy.λz.(x + y + z).
Using this abstraction we can construct applications:
(λx.λy.λz.(x + y + z))1 2 3 => 6
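This currying trick maps directly onto Python’s nested lambdas. A quick sketch (the names here are mine, just for illustration):

```python
# λx.λy.λz.(x + y + z) as nested single-parameter lambdas
add3 = lambda x: lambda y: lambda z: x + y + z

# (λx.λy.λz.(x + y + z)) 1 2 3 — apply one argument at a time
print(add3(1)(2)(3))  # 6
```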
When studying Lambda Calculus, there are also two common terms you might find:
Alpha conversion (α-conversion) and Beta reduction (β-reduction)
Alpha conversion
When evaluating more complex lambda expressions, you may obtain an expression like this:
(λx.(λx.x+x)x)
In this expression, the inner x could be mistakenly interpreted as the outer x. In order to avoid that, we can rename the inner variable x:
(λx.(λy.y+y)x)
This process is what is called α-conversion. The name sounds complex, but it is just this simple renaming of a variable to avoid confusion.
λx.x → λy.y (α-conversion)
Both expressions represent the same function. The α-conversion does not change the function’s behavior, just the variable name.
Beta reduction
β-reduction is simply the process of calculating the result from an application of a function to an expression. For instance:
(λx.xy)z
On the output xy, substitute every occurrence of x by z
= zy
You might also find the following notation:
(λ param . output)input => output (param := input) => result
This basically means that to get the result, you look at the output and substitute every occurrence of param by the input. In the previous expression:
(λx.xy)z => (xy)(x := z) => zy
Example
Going back to our example f(x) = 2x; g(x) = x³ and we want g(f(1)).
In order to not mix up terms incorrectly, we can rewrite:
f(x) = 2x and g(y) = y³
Then, we substitute f within g:
g(f(1)) = (f(1))³
=> g(f(1)) = (2*1)³
=> 2³ = 8.
Now with Lambda Calculus:
λx.2x
λx.x³
=> (λx.x³)((λx.2x)1)
First, apply α-conversion in order to not mix up things:
(λy.y³)((λx.2x)1)
Then, β-reduction on the inner most expression (λx.2x)1:
(λ param . output)input => output (param := input) => result
(λx.2x)1 => 2x (x := 1) => 2*1 = 2.
Then, β-reduction again on the resulting expression (λy.y³)2:
(λ param . output)input => output (param := input) => result
(λy.y³)2 => y³(y := 2) => 2³ => 8.
We got the same result! That’s brilliant right?
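If you want to double-check this composition with Python lambdas (a small sanity check, names mine):

```python
# f(x) = 2x and g(x) = x³ as anonymous functions
f = lambda x: 2 * x
g = lambda x: x ** 3

# g(f(1)) mirrors (λx.x³)((λx.2x)1)
print(g(f(1)))  # 8
```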
_________________________________________________________________
If you’re starting to feel confused at this point, please don’t close the article!! I understand it can be challenging at first, but I promise you, when you sleep today, you will wake up with things more clear! So, keep reading and enjoy the rest of the article 🙂
_________________________________________________________________
Some years after the Lambda Calculus, Alan Turing introduced the concept of Turing machines, an abstract mathematical model of a computer capable of simulating any algorithmic process that can be described mathematically. Building on the work of both Church and Turing, it was established that there is a theoretical equivalence between Lambda Calculus and Turing machines. This equivalence means that, even though Lambda Calculus has no built-in numbers or booleans, any problem computable by a Turing machine can also be expressed in Lambda Calculus terms. Thus, we can express any computable algorithm using Lambda Calculus! Let’s understand how this can be done.
Numbers
I mentioned that Lambda Calculus does not have numbers, only lambda expressions. But then how could I have written things like λx.(x+2) before?
Well, I lied to you…
But don’t get angry, it was only to facilitate understanding
Now, let’s understand how Church represented numbers using only lambda expressions:
The Church representation of numerals is a bit complicated to understand at the beginning, but it will get clearer as we go.
The Church numeral n is defined as a function that takes a function f and returns the application of f to its argument n times.
0: λf.λx.x (applies f to x 0 times)
1: λf.λx.f x (applies f to x 1 time)
2: λf.λx.f(f x) (applies f to x 2 times)
3: λf.λx.f(f(f x)) (applies f to x 3 times)
and so on…
It seems confusing, but after some thought, it starts to make sense. The Church numeral n simply means to do anything n times.
A good way to illustrate this is to remember that the idea of numbers comes from the process of counting. For example, imagine a staircase with 20 steps. When it is said that to climb the stairs you have to go up 20 steps, it means that you will climb one step 20 times, right? That’s exactly the idea of Church encoding: you have a function f that means “go up one step”, and if you want to express the idea of 20 steps, you apply f 20 times.
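We can make the staircase analogy concrete in Python. Here is a small sketch (variable names are mine) where the Church numeral 3 applies a “go up one step” function three times:

```python
# "Go up one step": a concrete choice for the function f
step = lambda height: height + 1

# Church numeral 3: λf.λx.f(f(f x))
three = lambda f: lambda x: f(f(f(x)))

# Applying step 3 times starting from 0 counts to 3
print(three(step)(0))  # 3
```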
Numerical Operations
After defining the Church numerals, we can define the numerical operations. The first one is the successor function s, which increments a Church numeral by 1. Thus, the successor is a function that takes the Church numeral representing the number n and returns the Church numeral representing n+1.
For example, if λf.λx.f(f x) represents the number 2, applying the successor function s to it gives λf.λx.f(f(f x)) (the Church numeral for 3).
The successor function is defined as follows:
s = λn.λf.λx.f((n f) x), where n is the Church numeral being incremented.
Let’s analyze it:
- Input: n (a Church numeral), f (a function), and x (an argument).
- Application of n: the term (n f) x represents the application of the function f to the argument x n times.
- Additional application: the term f((n f) x) applies f one more time to the result of (n f) x.
If the Church numeral n means to do something n times, s n means to do something n+1 times.
Let’s, for example, apply the successor function to the Church numeral for 2:
Church numeral for 2: λf.λx.f(f x)
Applying the successor to this expression:
We know that s(n) = λn.λf.λx.f((n f) x)
Our n = 2 = λf.λx.f(f x)
Thus, we apply the successor function on that:
s(λf.λx.f(f x)) = ( λn.λf.λx.f((n f) x) )( λf.λx.f(f x) )
Using the first β-reduction on the application expression:
(λ param . output)input => output (param := input) => result
( λn.λf.λx.f((n f) x) )( λf.λx.f(f x) ) => λf.λx.f((n f) x) (n := λf.λx.f(f x))
=> λf.λx.f(((λf.λx.f(f x)) f) x)
Now, let’s analyze the inner application expression:
((λf.λx.f(f x)) f) x
The term in parentheses is the Church numeral 2, right? And it can be read as:
Given a function f, apply f 2 times to its argument, which is x.
Therefore, ((λf.λx.f(f x)) f) x turns into f(f x).
Substituting in our expression λf.λx.f(((λf.λx.f(f x)) f) x), we get:
λf.λx.f(f(f x)), which is exactly the Church numerical representation of the number 3!
So, we just defined the successor lambda operation. Using this idea, if we define 0 = λf.λx.x, we can obtain the other Church numerals by applying the successor function repeatedly:
1 = s 0
2 = s(s 0)
3 = s(s(s 0))
…
We can take advantage of these functions to implement other operations such as addition and multiplication.
The addition of two numbers m + n is defined as:
ADD(m, n) = λm.λn.λf.λx.(m f)((n f) x)
Thus, if we define m and n as the Church numerical representations of 3 and 4, respectively, and then apply this ADD function, we will get the Church numerical representation of 7.
The same logic applies to multiplication of two numbers m * n:
MUL(m, n) = λm.λn.λf.λx.m (n f)
Try applying them yourself anytime!
Booleans
Before we get into the Church definitions, let’s think of booleans as a selection operation: among two options A and B, depending on some condition, we select either A or B.
IF (CONDITION) THEN (RESULT A) ELSE (RESULT B).
For example, during some app execution, if we want to use booleans to change the background color of the screen:
“red_theme = True”
This will only be useful when at some other part of the program we do some selection:
background_color = IF red_theme THEN red ELSE white.
Thus, all we need from booleans is some manner of conditionally selecting between two options.
Based on that, in Lambda Calculus, the Church definition of true and false are defined as functions of two parameters:
- true chooses the first parameter.
- false chooses the second parameter.
TRUE = λx.λy.x
FALSE = λx.λy.y
It seems strange, right? But let’s define some boolean operations and see how it goes:
NOT: Takes a Boolean and returns the opposite.
NOT = λp. p FALSE TRUE
This means: “Take a Boolean function p and apply it to the two parameters FALSE and TRUE.”
Remember the definition of booleans in Church encoding? TRUE returns the first parameter and FALSE returns the second parameter. Thus:
→ If p is TRUE, it returns FALSE.
→ If p is FALSE, it returns TRUE.
AND: Takes two Booleans and returns TRUE if both are TRUE, otherwise FALSE.
AND = λp.λq.p q p
This means: “Take two Boolean functions p and q. Apply p to q and p.”
Let’s try it in practice:
AND TRUE FALSE = (λp.λq.p q p) TRUE FALSE:
Given p = TRUE and q = FALSE, it returns TRUE FALSE TRUE:
=> TRUE FALSE TRUE = (λx.λy.x) FALSE TRUE
TRUE selects the first of its two parameters, which is FALSE:
(λx.λy.x) FALSE TRUE = FALSE
The definitions of the other boolean operations such as OR, XOR and others follow the same idea.
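To see these definitions run, here is a small Python sketch (the helper names are my own, in the same spirit as the numeral code later in this article):

```python
# Church booleans: two-parameter selectors
TRUE = lambda x: lambda y: x    # λx.λy.x — picks the first parameter
FALSE = lambda x: lambda y: y   # λx.λy.y — picks the second parameter

NOT = lambda p: p(FALSE)(TRUE)     # λp.p FALSE TRUE
AND = lambda p: lambda q: p(q)(p)  # λp.λq.p q p
OR = lambda p: lambda q: p(p)(q)   # λp.λq.p p q

# Decode a Church boolean into a Python bool for inspection
to_bool = lambda b: b(True)(False)

print(to_bool(NOT(TRUE)))         # False
print(to_bool(AND(TRUE)(FALSE)))  # False
print(to_bool(OR(FALSE)(TRUE)))   # True
```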
Practice
Now, let’s use some Lambda Calculus in practice:
# Lambda function abstraction
def L(f):
    return f

# Church numeral 0
ZERO = L(lambda f: L(lambda x: x))

# Successor function: λn.λf.λx.f (n f x)
SUCC = L(lambda n: L(lambda f: L(lambda x: f(n(f)(x)))))

# Addition: λm.λn.λf.λx.m f (n f x)
ADD = L(lambda m: L(lambda n: L(lambda f: L(lambda x: m(f)(n(f)(x))))))

# Multiplication: λm.λn.λf.m (n f)
MUL = L(lambda m: L(lambda n: L(lambda f: m(n(f)))))

# Convert integer to Church numeral
def to_church(n):
    if n == 0:
        return ZERO
    else:
        return SUCC(to_church(n - 1))

# Helper function to compare Church numerals
def church_equal(church_number_1, church_number_2):
    f = lambda x: x + 1
    return church_number_1(f)(0) == church_number_2(f)(0)
church_two = to_church(2)
church_three = to_church(3)
church_five = to_church(5)
church_six = to_church(6)
# Successor of 2 is 3
assert church_equal(SUCC(church_two), church_three)
# 2 + 3 = 5
assert church_equal(ADD(church_two)(church_three), church_five)
# 2 * 3 = 6
assert church_equal(MUL(church_two)(church_three), church_six)
print("All tests passed.")
As you can see, we are performing numerical operations using only lambda functions!! Also, by extending this with lambda boolean logic, we could implement if/else, loops and even an entire programming language solely with lambda functions! Amazing right?
Okay, now after this brief introduction to Lambda Calculus, we can go to the next topic of our journey.
Before going directly to Interaction Combinators, let’s first learn another earlier work by Yves Lafont: Interaction Nets. This foundation will make understanding Interaction Combinators easier.
Interaction Nets are a model of computation created by Yves Lafont in 1990. They use graph-like structures and a set of interaction rules to represent algorithms.
The first thing we need to define is a cell. A cell consists of a symbol (e.g. α), a principal port and n auxiliary ports, as represented in the image below:
When a cell has n = 0 auxiliary ports, it is represented as follows:
By connecting a set of cells through wires on their ports, we construct a net. For example, below is a net with cells α, β and γ, with arities n = 2, 1 and 0, respectively.
Note that a wire can connect two ports of the same cell and that a net is not necessarily connected. Also, in this example there are three free ports: x, y and z.
Whenever a pair of cells is connected through their principal ports, there will be an interaction. An interaction is a rule that modifies the net. Pairs connected through their principal ports and ready to interact are called active pairs (or redexes).
In the previous example, there are two possible interactions (active pairs) in the first iteration.
After applying these rules, the net will be modified. We then repeatedly apply the rules to the resulting nets until we reach an irreducible form, when no more interaction rules can be applied. This process of repeatedly applying interaction rules is also known as reduction.
An interaction system is constructed from a set of interaction rules that can be applied without ambiguity. That is, if we define an interaction rule for an active pair (αi, αj), it is applied in the same way wherever (αi, αj) appears.
After this brief explanation, let’s do some practice.
Building an interaction system for arithmetics
Let’s build an interaction system for doing arithmetic. In order to create it, let’s first forget our basic intuition about numbers and try to create a system that can model the natural numbers. In 1889, Giuseppe Peano introduced axioms to formalize the natural numbers, similar to how Euclid defined his axioms for geometry. Peano’s axioms enable an infinite set to be generated from a finite set of symbols and rules. Using these axioms, he defined a finite set of symbols and rules to model the natural numbers and their arithmetic properties:
0 → Symbolizes the number zero
s(n) → Represents the successor function. It returns the next natural number.
Using s and 0 we can define the natural numbers, as we have previously seen during lambda calculus studies:
1 = s(0)
2 = s(s(0))
3 = s(s(s(0)))
and so on…
+ → Represents addition. It is a function recursively defined as:
Base case: 0 + a = a
Recursion: a + s(b) = s(a+b)
For example:
a + 3:
= a + s(2)
= s(a+2)
= s(a+s(1))
= s(s(a+1))
= s(s(a+s(0)))
= s(s(s(a+0)))
= s(s(s(a)))
×: Represents multiplication. It is a function recursively defined as:
Base case: b × 0 = 0
Recursion: s(a) × b = (a × b) + b
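Before moving to Lafont’s graphical version, it may help to execute Peano’s recursive rules directly. Here is a small Python sketch (my own encoding: 0 as an empty tuple, s(n) as a tagged pair; the multiplication base case is taken on the first factor so the recursion terminates):

```python
ZERO = ()  # the symbol 0

def s(n):
    # Successor: wraps a numeral one more time
    return ("s", n)

def add(a, b):
    # Base case: a + 0 = a; Recursion: a + s(b) = s(a + b)
    if b == ZERO:
        return a
    return s(add(a, b[1]))

def mul(a, b):
    # Base case: 0 × b = 0; Recursion: s(a) × b = (a × b) + b
    if a == ZERO:
        return ZERO
    return add(mul(a[1], b), b)

def to_int(n):
    # Count the nested successors to read the result back
    return 0 if n == ZERO else 1 + to_int(n[1])

two = s(s(ZERO))
print(to_int(mul(two, two)))  # 4
```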
Inspired by this, Yves Lafont built an interaction system to model natural numbers and arithmetic. Let’s understand it:
First, he defined cells for the s and 0 symbols:
Then, the cell for the addition operation:
It seems strange, I know, but I promise it will make sense shortly.
If all natural numbers can be expressed using only the symbols 0 and successor s, for addition we need to define just two interaction rules: how an addition interacts with successor and with 0. Therefore, Lafont introduced the two following interaction rules:
Compare these rules with Peano’s equations for addition; they are exactly the same expressions:
0 + y = y
s(x) + y = s(x + y)
Now, let’s understand the interaction rules for multiplication. The cell for multiplication is defined as follows:
Now, take a look at Peano’s equations:
y × 0 = 0
y × s(x) = (y × x) + y
Note that the first equation “erases” the y variable (y appears on the left side of the equation and does not appear on the right side). In the second equation, y is “duplicated”: it appears in both a multiplication and an addition.
Thus, two other symbols are needed: ε (eraser) and δ (duplicator).
The idea of these symbols is that a net representing a natural number will be erased when connected to the principal port of ε, and duplicated when connected to the principal port of δ. Now, the multiplication rules can be represented as follows:
Try to reflect on how they mirror Peano’s expressions:
y × 0 = 0
y × s(x) = (y × x) + y
The interaction rules for duplicator and eraser with successor and 0 are defined as follows:
Thus, we have a set of six symbols {0, s, +, ×, δ, ε} with the following set of eight interaction rules: {(s, +), (0, +), (s, ×), (0, ×), (s, δ), (0, δ), (s, ε), (0, ε)}. Let’s analyze them in practice for the operation 2 × 2.
If you take a look, there is an active pair (s, ×) to which we can apply Rule #3.
Therefore, the operation is solved by applying the interaction rules until we reach an irreducible form:
Take a look at the final form that we have reached: s(s(s(s 0))).
It is exactly the definition of the numeral 4, the result of 2 × 2! Amazing, right? After some manipulation of strange symbols, we could solve an arithmetic operation!
But why do such a complicated thing? What are the advantages of solving problems using these manipulations?
Lafont’s nets have an interesting property: if a net μ can reduce in one step to two different possible nets v or v’, then v and v’ reduce in one step to a common net ξ.
The consequence of this confluence property is that if a net μ reduces to v in n steps, then any sequence of reductions will reach v in n steps. In other words, the order in which the interaction rules are applied does not matter: the net will reach the same form in the same number of steps!
Do you see the power of this property? If the order of interactions doesn’t matter, we can apply them in parallel!
For instance, on our previous 2 × 2 operation, instead of applying the rules one by one, we could parallelize them at moments like this:
In an actual execution, both rules could be run in two separate threads, without concerns about thread collisions and other common issues related to parallelism. And that’s one of the core principles on which HVM/Bend is founded: all operations that can be parallelized are inherently parallelized!
Now that we understand interaction nets, let’s take one more step. Earlier in this article, I mentioned that HVM was based on Interaction Combinators, so let’s understand how these concepts relate.
Based on his earlier work on Interaction Nets, Yves Lafont created the Interaction Combinators. The idea was to represent computation using a minimal set of primitives, called combinators. While interaction nets use graph rewriting to model computation explicitly, interaction combinators refine this by distilling it down to a fundamental combinatory core. This shift provides a more abstract, yet equally expressive, framework for describing computational processes.
For interaction combinators, Lafont defined three symbols (also called combinators): γ (constructor), δ (duplicator) and ε (eraser).
Using these three combinators, a total of only six rules were created. These rules are divided into:
commutation — when two cells of different symbols interact (γδ, γε, δε);
annihilation — when two cells of the same symbol interact (γγ, δδ, εε).
The rules are defined below:
Therefore, using only these six rules you can model any computable algorithm! Amazing, right?
However, the HVM runtime uses a variant of Lafont’s interaction combinators, called Symmetric Interaction Combinators (SIC) (Mazza, 2007). This variant is a simplified version that uses the same rewrite rule for all of its symbols:
As you can see, the only difference is that the γγ and δδ rules are now alike. The crucial confluence property is maintained, preserving the parallelization capability.
From now on, we will be using the SIC rules in our examples, so focus on them.
Lambda Calculus → Symmetric Interaction Combinators
Now you may be asking “How can I write programs using that? How to transform my Python function into interaction combinators drawings?”
I mentioned before that you can represent any computable algorithm using lambda calculus right?
Now another information: you can transform lambda calculus into interaction combinators!
Thus, any program can be transformed into lambda calculus, then transformed into interaction combinators, run in parallel and then be transformed back!
So, let’s understand how you can translate lambdas to interaction combinators!
Lambda expressions ( λ ) and applications ( @ ) can be expressed using a constructor γ. For instance, a lambda expression λx.y can be expressed as follows:
And for a given application f x, we can express it as follows:
Using these representations, we can express the identity expression λx.x (given x, return x itself):
Now, imagine we want to do the application (λx.x)y:
If we reduce the expression (λx.x)y, we should get y as the result. Let’s see what we get using the SIC rules.
Notice that when an application is applied to a lambda expression, there will be an active pair that we can reduce! In this case, we apply the interaction rule γγ. Also, from now on, we will use a circle to identify the final result we are interested in.
As you can notice, (λx.x)y was correctly reduced to y! Amazing, right?
Now, imagine we want to express λf.ff (given f, apply f to itself). As you can notice, the parameter f is duplicated on the body. That’s when the duplicator (δ) comes into action! We can use duplicators to copy (duplicate) values:
Let’s go back to our expression λf.ff. First, identify that this is an expression that given the input f, it outputs the application f applied to f itself. Therefore, it can be expressed as follows:
Beyond duplication, variables can also disappear. For instance, let’s take the Church numeral 0 := λf.λx.x. This expression can be read as “given two variables f and x, return x”. As you can notice, the variable f is not used in the output. If we tried to represent this using SIC with our current knowledge, we would obtain:
The f wire is left floating. Something seems wrong, right? That’s why we have the eraser ε! In order to represent this disappearing variable, we do:
In summary, we can handle Lambda Calculus with Symmetric Interaction Combinators using:
Now that we covered these transformations, we are able to perform more complex operations.
Church numbers
Let’s draw some Church numbers!
Before we go further, try to replicate this yourself! Get a paper and start drawing! For instance, let’s try to draw together the Church number four: λf.λx.f(f(f(f x))).
The first thing we draw is the outer lambda expression λf.____
Then, the second lambda expression __.λx.____:
Now, we need to draw the applications (@). But first, notice that we have f repeated four times. Therefore, we need to copy (duplicate) f three more times (so we need three duplicators in sequence):
Now that we have four copies of f, we can draw the applications of f to f in sequence!
Using the same strategy, we can easily construct other expressions.
Successor
Let’s implement the successor function. It is given by λn.λf.λx.f((n f) x).
Let’s apply SUCC to the number 0 and analyze what we get.
Let’s apply the interaction rules. In order to facilitate readability, I will draw duplicators δ as black cells and constructors γ as white cells:
Well, we should have reached the Church numeral 1, right? What went wrong? Take a look at the eraser ε connected to the auxiliary port of the duplicator δ (in black):
This eraser makes the left auxiliary port redundant: any information passed through that side of the duplicator will be erased. For any cell that interacts with this duplicator, the left copy will be erased.
So we can remove this redundant duplicator and connect the wire directly:
And voila! After reducing SUCC(0) we got exactly the Church number 1, as expected!
Let’s apply SUCC again, now to the number 1, and see if we get the number 2:
We got exactly the Church number 2! Amazing, right?
Addition
So far, we have only performed sequential reductions. Let’s do a more complex one, such as addition, to visualize the full parallelization potential of interaction combinators. Below is the SIC representation of addition: ADD(m, n) = λm.λn.λf.λx.(m f)((n f) x).
Let’s calculate ADD 1 1:
Executing the reductions:
Take a look at this step: there are two active pairs! In cases like this, we can reduce both in parallel. In a real program, we could run them in two different threads.
Let’s continue:
After reducing ADD 1 1 we got exactly the representation of the Church number 2!
And that’s how operations are parallelized using Interaction Combinators: at each step, if there are multiple active pairs, all of them can run in different threads.
In this post we covered the basic concepts of lambda calculus and interaction combinators, and how they are combined to parallelize operations. I hope I could give you a brief explanation of how Bend/HVM works; for more information, please visit their website.
Also, follow me here and on my LinkedIn profile to stay updated on my latest articles!
Lafont’s Interaction Combinators paper
Interaction combinators tutorial 1