The Order of Things
How we solved which objects to initialize first when writing bindings
Simon Rydell
The problem
We’ve all done it, written std::vector
before #include <vector>
and seen the annoying little red squiggly lines appear. Or maybe we forgot to include the header to a user defined class
or, my personal favourite, messed up the order of defining internal classes in some giant legacy file. The order in which we define things matter in C++
, and the same is true when writing bindings for C++
. Tolc
is a bindings compiler, so it automatically writes bindings to other languages for you. In this post we will take a look at how we go from an unordered AST (Abstract Syntax Tree) of some C++
to an ordered list of objects to be defined.
Tolc
Tolc
has three parts: a Parser
which creates an AST using LLVMs libtooling, a frontend
which takes the AST and generates some bindings, and some input/output piping in the form of an executable to tie everything together.
The solution
Take a look at the functions f
and g
:
struct Data {};
Data f();
Data g();
f
clearly needs to be declared afterData
, but it does not put any restrictions on g
. The same is true when defining bindings to, for instance Objective-C
, where the corresponding code might look like:
@interface Data : NSObject
@end
Data* f();
Data* g();
Swapping around the function declarations with the @interface
produces a compiler error.
So functions might depend on something to be defined, but nothing depends on a function (at least when speaking about declarations). The same is true for global, and member variables. So in a dependency graph, these are all leaves.
Types (struct
and enum
) on the other hand may depend on another type, and be dependent on:
struct Data {
enum class Inner {
I0, I1
};
};
Data::Inner f();
In this case Inner
depends on Data
and f
depends on Inner
. So they can be defined in order as [Data, Inner, f]
.
To make it easier to work with, lets number each of them with a size_t
, and give them a set<size_t>
to define their dependencies:
After this the valid ordering is [0, 1, 2]
. Now we can create the whole dependency graph in the form of a vector<set<size_t>> graph
where graph[id]
corresponds to the dependencies of id
. We just need to find an array vector<size_t> order
such that the order of the elements matches a valid definition order of the objects. I can almost feel the itching of the leetcode readers.
Lets quickly remind ourselves of the goal; we are laying out objects (functions, structs, etc.) in one single file such that every type will be defined before they are used. Now, lets take a look at the dependencies, set<size_t> deps
, of one of these objects. We will have two cases:
-
It has no dependencies (
deps.empty() == true
) -
It has dependencies (
deps.empty() == false
)
In the first case, it doesn’t matter when the object is defined as long as it is only done so once. Lets keep track of which objects we have defined so far in set<size_t> defined
:
if (deps.empty()
&& !defined.contains(id)) {
// No dependencies
// Never defined before
// => Define it
order.push_back(id);
defined.insert(id);
}
}
On the other hand, if there are dependencies and they have all been defined before, we can add the object to the back of the order
, otherwise we need to wait for them to be defined:
if (all_of(
deps.begin(),
deps.end(),
[&defined](auto dep) {
return
defined.contains(dep);
})) {
// Dependencies are defined
// => we can add it
order.push_back(id);
defined.insert(id);
}
Now we do this for each of the deps
in the graph
, sprinkle in some error checks about making progress, and we get the final solution:
optional<vector<size_t>>
createDefinitionOrder(
vector<set<size_t>> const& graph) {
set<size_t> defined;
vector<size_t> order;
while (order.size() != graph.size()) {
// Closer to a solution?
bool hasProgressed = false;
size_t id = 0;
for (auto const& deps : graph) {
if (deps.empty()
&& !defined.contains(id)) {
// No dependencies
// Never added before
// => Add it
order.push_back(id);
defined.insert(id);
hasProgressed = true;
} else {
if (all_of(deps.begin(),
deps.end(),
[&defined](auto dep) {
return defined.contains(dep);
})) {
// All dependencies defined
// => we can add it
order.push_back(id);
defined.insert(id);
hasProgressed = true;
}
}
id++;
}
if (!hasProgressed) {
// Circular dependency
return nullopt;
}
}
return order;
}
This will then be used by Tolc
to produce bindings that define objects in the right order. You can find the full source code of the Parser
project where this solution was implemented here.
Tolc
is an open source bindings compiler aiming to make libraries written in C++
trivial to use from other languages (currently python
and WebAssembly
are supported, but more are soon to come). If you need help with your language bindings or any other cross language problems, don’t hesitate to contact us.
Thanks for reading!
Edit: As suggested by reddit user u/sQGNXXnkceeEfhm
, whenever an id
is added, we might be able to add any of the ones dependent on it. Therefore, we can speed up the algorithm by keeping track of the reverse graph as well. Going from O(V^2)
to O(V + E)
, where V
is the number of objects and E
is the number of connections. I’m keeping the post the same since the complexity of the code went up a bit.