Blog about stuff

Compiling WebAssembly with LLVM/Clang

2018/02/19

As you might already know, the recommended tool for compiling WebAssembly is Emscripten. Emscripten can do nearly everything for you: it provides libc, an OS emulation layer, many of the common libraries and even an HTML template with all the necessary JavaScript boilerplate. If you just want to get something working quickly, it is the solution for you.

What I wanted, however, was all the control I could get, and to learn the whole platform along the way. I wanted it to work just like compiling C/C++ code on x86, so that I know exactly where every bit of code came from. This is exactly what the brand new Clang + LLVM backend gives me: I compile C files into .o files and link them together into an executable. The downside is that right now you get pretty much nothing, not even libc.

Getting llvm, lld and Clang

I’m not sure if there are any packaged binaries available right now. Luckily we can compile the newest version ourselves. To get the sources and build them you’ll need Subversion, CMake, a compiler and a linker. On Windows that means Visual Studio; on a Unix-like system it means gcc, g++ and ld.

mkdir llvm-wasm
cd llvm-wasm
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
svn co http://llvm.org/svn/llvm-project/lld/trunk lld
cd ../..
mkdir llvm-build
cd llvm-build
cmake -G "Unix Makefiles" -DLLVM_TARGETS_TO_BUILD= -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly ../llvm

If you’re on Windows, change “Unix Makefiles” to a Visual Studio generator; for the latest Visual Studio that would be “Visual Studio 15 2017 Win64”.

To build it on a Unix-like system:

make -j 4
(you can drop -j 4 if you want to build with only 1 thread)

On Windows the most convenient way to build is to open the generated .sln file, which is somewhere in the llvm-build folder we created earlier.

Be aware that compiling this requires a lot of RAM. I tried to build it on an Ubuntu VM with 6GB of RAM and the linker just crashed. Eventually I ended up making a VM with 4GB of RAM + 10GB of swap space and ran the build without the “-j 4” overnight while I slept. On a modern physical system you should not have to worry. The build can take anywhere from 15 minutes to a few hours depending on how beefy your system is.

Once you have everything compiled, your files will be in llvm-build/bin. Typing the whole path to the executables is inconvenient, so you might want to add that bin folder to your PATH. What I ended up doing was creating symbolic links in my ~/bin to the executables I needed. This way I avoid conflicts with the Clang compiler I already have installed.

ln -s ~/llvm-wasm/llvm-build/bin/clang ~/bin/clang-wasm
ln -s ~/llvm-wasm/llvm-build/bin/lld ~/bin/lld-wasm
ln -s ~/llvm-wasm/llvm-build/bin/llvm-ar ~/bin/llvm-ar-wasm

Now I can use clang-wasm, lld-wasm and llvm-ar-wasm anywhere I want. To avoid confusion I’m going to use the original names in the examples.

We’re ready to do our first test compilation.

Let’s compile something

Create program.c with the following content:

int test(int value)
{
    int second = 5;
    return value * second;
}

Now let’s compile our super basic program.

clang --target=wasm32-unknown-unknown-wasm program.c -c
lld -flavor wasm -export test program.o -o program.wasm --no-entry

Now we need to write the HTML and JavaScript boilerplate to get our code running. You’re also going to need an HTTP server to serve the files. I’m assuming you have one already or know how to set one up.

Move your program.wasm to your server and create the following index.html:

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>WebAssembly test</title>

<script>
var importObject = {
    env: 
    {
        alertStringInJavascript: function(strPtr)
        {
            alert(importObject.Asc2Str(strPtr));
        }
    }
};

fetch('program.wasm').then(response => 
    response.arrayBuffer()
).then(bytes =>
    WebAssembly.instantiate(bytes, importObject)
).then(results => {
    importObject = results.instance;

    importObject.Asc2Str = function Asc2Str(ptr) {
        var str = '';		  
        var i8 = new Uint8Array(this.exports.memory.buffer);
        for(var i=0; 1; i++) {
            var ch = i8[ptr + i];
            if (!ch) 
                return str;
            str += String.fromCharCode(ch);
        }
        return null;
    };

    var testResult = importObject.exports.test(11);
    alert("test returned " + testResult);

    //importObject.exports.test2();
});
	
</script>

</head>

<body>
WebAssembly test...
</body>

</html>

I don’t know anything about JavaScript, so excuse my poor code. When you open index.html in your browser you should see “test returned 55” as expected. That was JavaScript code calling a WebAssembly function. Now let’s call a JavaScript function from WebAssembly. If you look at the JavaScript code you can see that I’ve already included the function we’re going to call (alertStringInJavascript). Let’s edit our program.c:

void alertStringInJavascript(const char *string);

int test(int value)
{
	int second = 5;
	return value * second;
}

void test2()
{
    alertStringInJavascript("Hello from WebAssembly!");
}

We also need to edit the linking command a bit:

lld -flavor wasm -export test -export test2 program.o -o program.wasm --no-entry --allow-undefined

The --allow-undefined switch lets us link with undefined symbols, which will then be resolved against our importObject in JavaScript. Now uncomment the test2 function call in the JavaScript. When you refresh the page you should see the “Hello from WebAssembly!” alert box. I also included the Asc2Str function in the JavaScript, which converts a C string to a JavaScript string. Pointers work exactly as you would expect: they are just offsets into the byte array that represents the WebAssembly virtual machine’s memory. All we have to do is read characters from the pointer onwards and stop once we hit a 0.
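
As a rough sketch of the same idea as Asc2Str, here is a variant that first looks for the terminating 0 and then decodes the bytes in one go with TextDecoder, which should also cope with UTF-8 text. The names readCString and demoMemory are made up for this example, and demoMemory is a standalone WebAssembly.Memory standing in for the module’s exported memory, so treat it as an illustration and not as the code the page above uses.

```javascript
// Hypothetical helper (not part of the page above): read a NUL-terminated
// C string that starts at byte offset `ptr` inside a memory buffer.
function readCString(memoryBuffer, ptr) {
    const bytes = new Uint8Array(memoryBuffer);
    let end = ptr;
    // Walk forward until we hit the terminating 0 or the end of memory.
    while (end < bytes.length && bytes[end] !== 0) {
        end++;
    }
    // Decode the bytes between ptr and the terminator as UTF-8.
    return new TextDecoder('utf-8').decode(bytes.subarray(ptr, end));
}

// Stand-in for the module's memory: one 64 KiB page with a string
// written by hand at offset 16, followed by a 0 byte.
const demoMemory = new WebAssembly.Memory({ initial: 1 });
const demoView = new Uint8Array(demoMemory.buffer);
const demoText = 'Hello from WebAssembly!';
for (let i = 0; i < demoText.length; i++) {
    demoView[16 + i] = demoText.charCodeAt(i);
}
demoView[16 + demoText.length] = 0;

const decoded = readCString(demoMemory.buffer, 16);
console.log(decoded);
```

With a real module you would pass the exported memory’s buffer and the pointer you got from C in place of demoMemory.buffer and 16.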

This is neat but…

Unfortunately, if you try to #include something you’d expect to be there, like string.h, you’ll find that nothing is really there. At the time of writing you have to either write your own libc or use someone else’s. The only standalone libc I’ve found is musl ported to WebAssembly. Musl implements the whole libc, but it needs quite a few syscalls to do so. If you look at the source code you will find that it has been written with Unix systems in mind. To make it work in WebAssembly you would have to write an OS emulation layer for those syscalls. At the time of writing, the repository I linked implements only a few of them. Some syscalls, mmap especially, will be troublesome, because WebAssembly currently only supports linear allocation through memory.grow(), which is roughly equivalent to the brk/sbrk syscalls on Unix. Emscripten also uses musl, but if you look at the source you’ll see that there are thousands of lines of code to emulate all the syscalls.
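
To make the brk/sbrk comparison a bit more concrete, here is a minimal sketch using the JavaScript-side WebAssembly.Memory object. The sbrkSketch function and the programBreak variable are hypothetical names of mine and not part of any libc. The sketch only handles growing, since memory is handed out in 64 KiB pages at the end of the linear memory and cannot be given back.

```javascript
const WASM_PAGE_SIZE = 65536; // WebAssembly memory grows in 64 KiB pages

// Standalone memory for the demo: starts at 1 page, may grow up to 4.
const heap = new WebAssembly.Memory({ initial: 1, maximum: 4 });

// Hypothetical "program break": first byte past the memory handed out so far.
let programBreak = heap.buffer.byteLength;

// sbrk-like sketch: reserve `increment` more bytes (positive values only)
// and return the old break, or -1 if the memory cannot grow any further.
function sbrkSketch(increment) {
    const oldBreak = programBreak;
    const newBreak = oldBreak + increment;
    const available = heap.buffer.byteLength;
    if (newBreak > available) {
        const pages = Math.ceil((newBreak - available) / WASM_PAGE_SIZE);
        try {
            heap.grow(pages); // throws a RangeError past the maximum
        } catch (e) {
            return -1;
        }
    }
    programBreak = newBreak;
    return oldBreak;
}

const first = sbrkSketch(100);                  // needs one more page
const second = sbrkSketch(100);                 // fits in the page we already got
const tooBig = sbrkSketch(10 * WASM_PAGE_SIZE); // exceeds the 4 page maximum
console.log(first, second, tooBig, heap.buffer.byteLength);
```

An allocator like dlmalloc only needs this kind of “give me more bytes at the end” primitive, which is why the sbrk model fits and mmap does not.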

What I decided to do was to write my own subset of libc. I don’t really use much of it. If I need some part I can always copy it in from an open source project with an appropriate license. For malloc I use dlmalloc. I compile everything I need to .o files and archive them together with llvm-ar. Then I can just pass the .a file to the linker (lld). The obvious downside is that it’s a lot of work to use libraries that make heavy use of libc. The benefit is that you get really small programs and you know exactly where everything came from.

Wrapping it up

Those were my experiences using WebAssembly with the LLVM backend. I hope someone finds them useful.