The Order of Things
How we solved which objects to initialize first when writing bindings
Simon Rydell

The problem

We’ve all done it, written std::vector before #include <vector> and seen the annoying little red squiggly lines appear. Or maybe we forgot to include the header to a user defined class or, my personal favourite, messed up the order of defining internal classes in some giant legacy file. The order in which we define things matter in C++, and the same is true when writing bindings for C++. Tolc is a bindings compiler, so it automatically writes bindings to other languages for you. In this post we will take a look at how we go from an unordered AST (Abstract Syntax Tree) of some C++ to an ordered list of objects to be defined.

Tolc

Tolc has three parts: a Parser which creates an AST using LLVMs libtooling, a frontend which takes the AST and generates some bindings, and some input/output piping in the form of an executable to tie everything together.

The solution

Take a look at the functions f and g:

struct Data {};

Data f();
Data g();

f clearly needs to be declared afterData, but it does not put any restrictions on g. The same is true when defining bindings to, for instance Objective-C, where the corresponding code might look like:

@interface Data : NSObject
@end

Data* f();
Data* g();

Swapping around the function declarations with the @interface produces a compiler error.

So functions might depend on something to be defined, but nothing depends on a function (at least when speaking about declarations). The same is true for global, and member variables. So in a dependency graph, these are all leaves.

Dependency graph

Types (struct and enum) on the other hand may depend on another type, and be dependent on:

struct Data {
  enum class Inner {
    I0, I1
  };
};

Data::Inner f();

In this case Inner depends on Data and f depends on Inner. So they can be defined in order as [Data, Inner, f].

Dependency graph

To make it easier to work with, lets number each of them with a size_t, and give them a set<size_t> to define their dependencies:

Dependency graph

After this the valid ordering is [0, 1, 2]. Now we can create the whole dependency graph in the form of a vector<set<size_t>> graph where graph[id] corresponds to the dependencies of id. We just need to find an array vector<size_t> order such that the order of the elements matches a valid definition order of the objects. I can almost feel the itching of the leetcode readers.

Lets quickly remind ourselves of the goal; we are laying out objects (functions, structs, etc.) in one single file such that every type will be defined before they are used. Now, lets take a look at the dependencies, set<size_t> deps, of one of these objects. We will have two cases:

  1. It has no dependencies (deps.empty() == true)
  2. It has dependencies (deps.empty() == false)

In the first case, it doesn’t matter when the object is defined as long as it is only done so once. Lets keep track of which objects we have defined so far in set<size_t> defined:

if (deps.empty()
     && !defined.contains(id)) {
    // No dependencies
    // Never defined before
    // => Define it
    order.push_back(id);
    defined.insert(id);
  }
}

On the other hand, if there are dependencies and they have all been defined before, we can add the object to the back of the order, otherwise we need to wait for them to be defined:

if (all_of(
      deps.begin(),
      deps.end(),
      [&defined](auto dep) {
        return
          defined.contains(dep);
      })) {
  // Dependencies are defined
  // => we can add it
  order.push_back(id);
  defined.insert(id);
}

Now we do this for each of the deps in the graph, sprinkle in some error checks about making progress, and we get the final solution:

optional<vector<size_t>>
createDefinitionOrder(
  vector<set<size_t>> const& graph) {

  set<size_t> defined;
  vector<size_t> order;

  while (order.size() != graph.size()) {
    // Closer to a solution?
    bool hasProgressed = false;

    size_t id = 0;
    for (auto const& deps : graph) {
      if (deps.empty()
          && !defined.contains(id)) {
        // No dependencies
        // Never added before
        // => Add it
        order.push_back(id);
        defined.insert(id);
        hasProgressed = true;
      } else {
        if (all_of(deps.begin(),
                   deps.end(),
                   [&defined](auto dep) {
                     return defined.contains(dep);
                   })) {
          // All dependencies defined
          // => we can add it
          order.push_back(id);
          defined.insert(id);
          hasProgressed = true;
        }
      }
      id++;
    }

    if (!hasProgressed) {
      // Circular dependency
      return nullopt;
    }
  }
  return order;
}

This will then be used by Tolc to produce bindings that define objects in the right order. You can find the full source code of the Parser project where this solution was implemented here.

Tolc is an open source bindings compiler aiming to make libraries written in C++ trivial to use from other languages (currently python and WebAssembly are supported, but more are soon to come). If you need help with your language bindings or any other cross language problems, don’t hesitate to contact us.

Thanks for reading!


Edit: As suggested by reddit user u/sQGNXXnkceeEfhm, whenever an id is added, we might be able to add any of the ones dependent on it. Therefore, we can speed up the algorithm by keeping track of the reverse graph as well. Going from O(V^2) to O(V + E), where V is the number of objects and E is the number of connections. I’m keeping the post the same since the complexity of the code went up a bit.