I love the concept -- I've often wished that lean languages like Lua had more support for static typing, especially given the potential performance benefits.
I also love the focus on performance. I'm curious if you've considered using a tail call design for the interpreter. I've found this to be the best way to get good code out of the compiler: https://blog.reverberate.org/2021/04/21/musttail-efficient-i... Unfortunately it's not portable to MSVC.
In that article I show that this technique was able to match Mike Pall's hand-coded assembly for one example he gave of LuaJIT's interpreter. Mike later linked to the article as a new take for how to optimize interpreters: https://github.com/LuaJIT/LuaJIT/issues/716#issuecomment-854...
I did experiment with a few different dispatch methods before settling on the one in Bolt now, though not with tail calls specifically. The approach I landed on was largely chosen because, in my testing, it competes with computed-goto solutions while also compiling on MSVC, but I'm absolutely open to trying other things out.
There’s one thing that tail calls do that no other approach to interpreters outside assembly really can, and that is decent register allocation. Current compilers only ever try to allocate registers for a function at a time, and somehow that invariably leads them to do a bad job when given a large blob of a single interpreter function. This is especially true if you don’t isolate your cold paths into separate functions marked uninlineable (and preferably preserve_all or the like). Just look at the assembly and you’ll usually find that it sucks.
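To make the cold-path point concrete, here is a minimal sketch, assuming a made-up `Value` type and a made-up `add` operation (neither is from Bolt or any real VM). The idea is only that the rare case lives in its own non-inlined function so its spills don't pollute the hot body; `noinline` and `cold` are GCC/Clang attributes, and the guard makes them disappear elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value type: ints are the common case, doubles the rare one. */
typedef struct { int is_int; int64_t i; double d; } Value;

/* Cold path: kept out of line so its register pressure stays out of the
   hot interpreter body. The attributes are GCC/Clang-specific. */
#if defined(__GNUC__)
__attribute__((noinline, cold))
#endif
static Value add_slow(Value a, Value b) {
    assert(!(a.is_int && b.is_int));   /* only reached for mixed/float operands */
    double x = a.is_int ? (double)a.i : a.d;
    double y = b.is_int ? (double)b.i : b.d;
    Value r = { 0, 0, x + y };
    return r;
}

/* Hot path: int + int stays tiny; anything else is handed to add_slow. */
static inline Value add_values(Value a, Value b) {
    if (a.is_int && b.is_int) {
        Value r = { 1, a.i + b.i, 0.0 };
        return r;
    }
    return add_slow(a, b);
}
```

Whether this actually helps depends on the compiler and the surrounding function, so it is still worth reading the generated assembly.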
(Whether the blob uses computed gotos or loop-switch is less important these days, because Clang for example is often smart enough to actually replicate your dispatch in the loop-switch case, avoiding the indirect branch prediction problem that in the past meant computed gotos were preferable. You do need to verify that this optimization actually happens, though, because it can be temperamental sometimes[1].)
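For anyone who hasn't seen the two forms side by side, a rough sketch with three invented toy opcodes (not Bolt's instruction set). Under GCC/Clang it uses labels-as-values so each handler ends in its own indirect jump; elsewhere it falls back to the plain loop-switch, which has one shared dispatch point unless the compiler replicates it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_HALT };   /* hypothetical toy opcodes */

/* Runs a tiny stack program and returns the top of stack at OP_HALT. */
static int64_t run_dispatch(const uint8_t *pc) {
    int64_t stack[16];
    size_t sp = 0;
#if defined(__GNUC__)
    /* Computed goto (GNU extension): one indirect branch per handler.
       Note there is no bounds check on the opcode in this variant. */
    static void *const table[] = { &&do_push, &&do_add, &&do_halt };
    #define DISPATCH() goto *table[*pc++]
    DISPATCH();
do_push:
    assert(sp < 16);
    stack[sp++] = (int64_t)*pc++;     /* operand byte follows the opcode */
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
    #undef DISPATCH
#else
    /* Portable loop-switch with a single dispatch point. */
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: assert(sp < 16); stack[sp++] = (int64_t)*pc++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      return -1;
        }
    }
#endif
}
```

Feeding it `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT }` returns 5 on either path.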
By contrast, tail calls with the most important interpreter variables turned into function arguments (that are few enough to fit into registers per the ABI—remember to use regparm or fastcall on x86-32) give the compiler the opportunity to allocate registers for each bytecode’s body separately. This usually allows it to do a much better job, even if putting the cold path out of line is still advisable. (Somehow I’ve never thought to check if it would be helpful to also mark those functions preserve_none on Clang. Seems likely that it would be.)
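A minimal sketch of that shape, again with invented toy opcodes and only two state variables (`pc` and `sp`) passed as arguments. The `MUSTTAIL` macro is my own name: it expands to `__attribute__((musttail))` where the compiler reports support and to nothing otherwise, in which case this is an ordinary call that the optimizer may or may not turn into a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL   /* plain call: relies on the optimizer, stack may grow */
#endif

enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };   /* hypothetical toy opcodes */

/* Hot interpreter state travels as arguments so it can live in registers. */
typedef int64_t Handler(const uint8_t *pc, int64_t *sp);

static Handler op_push, op_add, op_halt;
static Handler *const handlers[OP_COUNT] = { op_push, op_add, op_halt };

/* Every handler ends by tail-calling the handler for the next opcode. */
#define NEXT() MUSTTAIL return handlers[*pc](pc + 1, sp)

static int64_t op_push(const uint8_t *pc, int64_t *sp) {
    *sp++ = (int64_t)*pc++;           /* operand byte follows the opcode */
    NEXT();
}

static int64_t op_add(const uint8_t *pc, int64_t *sp) {
    sp--;
    sp[-1] += sp[0];
    NEXT();
}

static int64_t op_halt(const uint8_t *pc, int64_t *sp) {
    (void)pc;
    return sp[-1];                    /* top of stack is the result */
}

static int64_t run_tail(const uint8_t *code) {
    int64_t stack[16];                /* no overflow checks in this sketch */
    assert(code != NULL);
    return handlers[*code](code + 1, stack);
}
```

Each handler is a small function of its own, so the register allocator only ever sees one opcode body at a time. A real VM would pass more state (and bounds-check the stack), but all handlers must keep the same signature for `musttail` to apply.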
From my research into the subject the easiest way to implement it would be a 'musttail' macro which falls back to a trampoline for compilers which don't support it. The problem then becomes having the function call overhead (assuming the compiler can't figure out what's going on and do tail-call optimizations anyway) on the unsupported systems with each and every opcode which is probably slower than just a Big Old Switch -- which, apparently, modern compilers are pretty good at optimizing.
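Roughly what the trampoline fallback could look like, as a sketch with made-up names (`VM`, `Next`, `run_trampoline`) and the same kind of toy opcodes. Each handler returns the next handler instead of calling it, and a small loop does the bouncing, which is exactly where the per-opcode call and return overhead comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };   /* hypothetical toy opcodes */

typedef struct VM { const uint8_t *pc; int64_t *sp; } VM;

/* A handler returns the next handler to run, wrapped in a struct because
   a C function type cannot refer to itself directly. */
typedef struct Next { struct Next (*fn)(VM *vm); } Next;

static Next op_push(VM *vm);
static Next op_add(VM *vm);
static Next op_halt(VM *vm);

static Next (*const handlers[OP_COUNT])(VM *) = { op_push, op_add, op_halt };

/* Decode the next opcode and hand its handler back to the loop. */
static Next next_op(VM *vm) {
    Next n = { handlers[*vm->pc++] };
    return n;
}

static Next op_push(VM *vm) { *vm->sp++ = (int64_t)*vm->pc++; return next_op(vm); }
static Next op_add(VM *vm)  { vm->sp--; vm->sp[-1] += vm->sp[0]; return next_op(vm); }
static Next op_halt(VM *vm) { Next stop = { NULL }; (void)vm; return stop; }

static int64_t run_trampoline(const uint8_t *code) {
    int64_t stack[16];                /* no overflow checks in this sketch */
    VM vm = { code, stack };
    assert(code != NULL);
    Next n = next_op(&vm);
    while (n.fn != NULL)
        n = n.fn(&vm);                /* a full call and return per opcode */
    return vm.sp[-1];
}
```

The state also moves from arguments into a struct in memory here, so this variant gives up the register-allocation benefit as well as paying for the extra returns.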
The VM I've been poking at is I/O bound so the difference (probably) isn't even measurable over the overhead of reading a file. I went with a pure 'musttail' implementation but didn't do any sort of performance measurements so who knows if it's better or not.
You may be interested in Luau, which is the gradually-typed dialect of Lua maintained by Roblox. The game Alan Wake 2 also used it for level scripting.
I like 99% of this, and the thing I don't like is in the very first line of the example:
> import abs, epsilon from math
IMHO it's wrong to put the imported symbols first, because the same symbol could come from two different libraries and mean different things. So the library name is pretty important, and putting it last (and burying it after a potentially long list of imported symbols) just feels wrong.
I get that it has a more natural-language vibe this way, but there's a really good reason that most of the languages I know put the package/module name first:
import packageName.member; // java
from package import symbol; # python
use Module 'symbol'; # perl
The OP seems to be asking for the Python order of the import statement because it allows for simpler auto-completion when typing it:
from math import square_root as sqrt, abs as absolute
from math import * as not_math
In a format like this, your language service can open up `math` immediately after the `from math` and start auto-completing the various types inside math on the other side of the `import`.
Whereas with `import abs from math`, you often type `import`, have no auto-complete for what comes next, maybe type ` from math`, then cursor back to after the `import` to get auto-completion hints.
It's very similar to the arguments about how the SQL syntax is backwards for good auto-complete and a lot of people prefer things like PRQL or C# LINQ that take an approach like `from someTable where color = 'Red' select name` (rather than `select name from someTable where color = 'Red'`).
Oooh, bikeshedding! To me your `import math with x as y` reads like "import all of math, making all of its symbols visible, just renaming some of them". That's different from the intended "from math, import only x (maybe with a renaming)".
Put the category first so it makes it easy to skim and sort dependencies. You're never going to organise your dependencies based on what the individual functions, types or sub-packages are called, and sorting based on something that ends up in a more or less random place at the end of a line just seems obtuse.
FYI "the embedded scene" is likely to be interpreted as "embedded systems" rather than "embedded interpreters" even by people who know about embedded interpreters, especially since all the languages you give as an example have been attempted for use on those targets (micropython, lua, and even typescript)
True. I misread it as being for embedded, especially with the term "real-time" in the mix. Then later when there was no ARM or RISC-V support I became very confused.
I was quite excited by the description and then I noted that Bolt heavily relies on double floating point numbers. I am quite disappointed because this doesn't allow me to use Bolt in my context: embedded systems where floating point numbers are rarely supported... So I realized that I misinterpreted `embedded`.
I appreciate the followup here. The brainfuck interpreter isn't meant to be a benchmark notably, it's a naive implementation for the sake of the example.
I did spot some poor code in the Bolt version of nbody that can be changed (the usage of `.each()` in the hot loop is creating loads of temporary iterators, that's the memory difference.)
luajit -joff does perform better even with this change, but I observe closer to 15% than a 2x difference
This looks so familiar that it got me thinking: who is collating all of the languages that are being invented? I must see two dozen a year on HN. I'm not dissing OP, but I've seen so many languages I'm not sure if I'm having deja vu, or vuja de.
Outperforming languages in its class is doing some heavy lifting here. Missing comparison to wasm interpreter, any of the java or dot net interpreters, the MLs, any lisps etc.
Compiling to register bytecode is legitimate as a strategy, but it's not the fast one, as the author knows, so they probably shouldn't be branding the language as fast at this point.
It might be a fast language. Hard to tell from a superficial look, depends on how the type system, alias analysis and concurrency models interact. It's not a fast implementation at this point.
> This means functions do not need to dynamically capture their imports, avoiding closure invocations, and are able to linearly address them in the import array instead of making some kind of environment lookup.
That is suspect, could be burning function identifiers into the bytecode directly, not emitting lookups in a table.
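My reading of the two options, as a sketch only: nothing here reflects Bolt's actual bytecode, and `Module`, `call_indexed` and `call_burned` are names I made up. The first variant is the quoted "linearly address them in the import array"; the second writes the resolved function pointer into the instruction stream so the call needs no table at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t (*NativeFn)(int64_t);

static int64_t twice(int64_t x)  { return 2 * x; }
static int64_t negate(int64_t x) { return -x; }

/* Variant 1: the instruction carries a small index into a per-module array. */
typedef struct { NativeFn imports[2]; } Module;

static int64_t call_indexed(const Module *m, const uint8_t *operand, int64_t arg) {
    uint8_t slot = operand[0];
    assert(slot < 2);
    return m->imports[slot](arg);     /* one extra load through the module */
}

/* Variant 2: the resolved pointer was written into the bytecode at link
   time, so the handler reads it straight from the operand bytes. */
static int64_t call_burned(const uint8_t *operand, int64_t arg) {
    NativeFn fn;
    memcpy(&fn, operand, sizeof fn);  /* alignment-safe read of the pointer */
    return fn(arg);
}
```

The burned-in form saves a load per call but makes the bytecode position-dependent and larger, so it is a trade-off rather than a free win.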
Likewise the switch on the end of each instruction is probably the wrong thing, take a look at a function per op, forced tailcalls, with the interpreter state in the argument registers of the machine function call. There's some good notes on that from some wasm interpreters, and some context on why from luajit if you go looking.
The class is "embeddable interpreted scripting language", which is not quite the same thing as just an interpreter.
Embedded interpreters are those designed to be embedded into a C/C++ program (often a game) as a scripting language. They typically have as few dependencies as possible, try to be lightweight and focus on making it really easy to interop between contexts.
The comparison hits many of the major languages for this usecase. Though it probably should have included mono's interpreter mode, even if nobody really uses it since mono got AoT
If functions don't have a return signature, does that mean everything must be satisfied in the compilation step?
What about memory management/ownership? This would imply that everything must be copy by value in each function callsite, right? How to use references/pointers? Are they supported?
I like the matchers, which look similar to Rust, but I dislike the error handling because it is neither implicit nor explicit, and therefore will be painful to debug in larger codebases, I'd imagine.
Do you know about Koka? I don't like its syntax choices much but I think that an effect based error type system might integrate nicely with your design choices, especially with matchers as consumers.
I'm very much in favor of authors choosing unique names for programming languages because there's still plenty of good names up for grabs without having to step on someone's toes. If the project is dead, that's one thing; the data-race one was a research project and hasn't had any activity in 5 years. BOLT last modified in 2014.
But beariish/bolt and boltlang/bolt were started in the same year and are still under active development. With boltlang/bolt obviously snagging the namespace first, I think they should have claim to the name for now. That said, neither seems to have registered any domains, so whoever gets bolt-lang.org/com/net first will probably have an easier time defending a claim.
I see your benchmarks compare against other interpreted languages "in its class".
We read here a couple days ago about Q which is compiled. Bolt claims to "plow through code at over 500kloc/thread/second". Q claims to compile in milliseconds--so fast that you can treat it like a script.
Bolt and Q are both newborns. Perhaps you could include each other in your benchmarks to give each other a little publicity.
I might be missing this, but I'm not seeing anything about how the type system handles (or doesn't) polymorphism - generics, traits, that sort of thing. Is that in there?
I don't understand why people still choose the syntax `import xxx from yyy` in the current year.
It is a major source of complaints in languages like Python or JavaScript, because it makes autocomplete work poorly.
1. Ask - the author is very much available, right here in this comment section they made specifically for such a prospect.
2. Contribute - Code the change you wish to see in the world. Follow the OP’s example, and do something about it.
> It is a major source of complaining for languages like python or javascript
Dynamically typed languages have more difficulties with autocomplete in general, Bolt is statically typed so you shouldn't automatically assume the same difficulties carry over.
Really impressive, great job! I was interested to see how you had solved Result type and that seems quite developer-friendly—no wrappers just value & error union. I should try it out to see how it's to write if I can run it on ARM64. I wish Godot Script looked like this.
Super cool! Just now I am building something where I am trying to use mlua to make it scriptable with Lua. But the biggest pain point right now is trying to generate type annotations for LuaLS based on my rust structs. I will look into whether bolt could be interesting for the project.
Really cool! Roughly how much memory does it take to include it in an engine? Also I'm really interested in the process of creating these really fast scripting languages, have you written anything about how you wrote Bolt?
Bolt's memory usage in most cases hovers right around Lua 5.4/Luau in my own testing, but maybe I should include a few memory benchmarks to highlight that more. It does notably have a higher memory overhead during compilation than other languages in this class though.
As for writeups, I'm working on putting out some material about the creation of Bolt and my learnings now that it's out there.
Seems to be the worst of both worlds: mutable by default, and one must add a "const" keyword to the "let", disincentivizing const behavior by making it super verbose (e.g. the "final" problem from Java)
This looks awesome. Would you have any data on the performance of a large number of invocations of small scripts? I am wondering about the startup overhead for every script run, which the 500kloc/s figure may not capture well.
It depends on your exact use case, I'm not 100% sure what you're asking. There is some overhead for invoking the compiler on a per-script basis. If you're parsing once but running a script many times, Bolt provides some tools (like reusing a preallocated thread object) to amortize that cost.
We have a server which uses Lua based script plugins. They are usually a few hundred to a few thousand lines and get invoked via APIs. I was trying to figure out how Bolt will behave in such a context and whether we could replace the Lua based plugin engine with this.
This is really cool, but it's not portable to macOS or AArch64 yet, and that kind of portability unfortunately is what appeals to me about an embeddable scripting language.
The only way to have any idea of how long a language might still be around is to look at how long it's already been around. From this perspective, you can only use older languages. The benchmarks show that Lua (and the Luau and LuaJIT variants) is actually very competitive, so I'd stick with one of those.
In the end, weight is a kind of strength, and popularity is a kind of quality. It looks promising but you can't expect long-term support until there's more contributors and users
At this point it is too early to know. Even JavaScript took like 20 years to catch on
As of right now no - my primary target when developing this was realtime and games in particular since that's what I know best, but if there's a real target in embedded that's certainly something that could be explored.
Function return type inference is funny but I don't think it's that great of a feature. It makes it harder for a library's consumer to know how to properly use a function, and it also makes it harder for the maintainer to not break backwards compatibility inadvertently. Anyway, I'm all for experimenting.
There's nothing stopping a library author from explicitly annotating return types wherever a stable interface is important, the idea is more for smaller functions or callbacks to make use of this. Perhaps I'll make the examples clearer to reflect the intention.
Perhaps it makes more sense to say that exported function interfaces be explicit. That forces you to document the api and more carefully consider changes.
It’s great in TypeScript. In TypeScript your source can have inferred types returned, and then use a build step to produce resolved typings files (.d.ts) for distribution that have the fully specified type.
Python 3.14 also added support for this tail-call style of interpreter dispatch and got a modest performance win from it: https://blog.reverberate.org/2025/02/10/tail-call-updates.ht...
[1] https://blog.nelhage.com/post/cpython-tail-call/
http://www.emulators.com/docs/nx25_nostradamus.htm
TypeScript is the notable exception to putting the module name first, though I almost never type out imports manually anymore.
See https://guide.elm-lang.org/webapps/modules (scroll down to "Using Modules") for examples
"In case of conflict or convenience, you can give modules an alias as well."
That said, somehow I do not believe it is faster than LuaJIT. We will see.
Just checked with nbody:
[1] https://koka-lang.github.io/koka/doc/index.html
Functions do have a return signature.
It looks like the author chose to show off the feature of return type inference in the short example README code, rather than the explicit case.
https://github.com/Beariish/bolt/blob/main/doc/Bolt%20Progra...
Bolt: a language with in-built data-race freedom! (recent discussion: https://news.ycombinator.com/item?id=23122973) - https://github.com/mukul-rathi/bolt
Bolt: A programming language for rapid application development - https://github.com/boltlang/Bolt
BOLT: a programming language that was desinged for begining programmers who have never seen code before in their life - https://sourceforge.net/projects/boltprogramming/files
made me instantly lose interest in the language.
On the feature side, is there any support for breakpoints or a debugging server, and if not is it planned?
I noticed that `let`-declared variables seem to be mutable. I'd strongly recommend against that. Add a `var` keyword.
https://github.com/Beariish/bolt/blob/0.1.0/doc/Bolt%20Progr...
Usually I feel like that's bare minimum before I'd like to try and play around with a language
Quickly scanned the programming guide, but wasn't able to find it. Did I miss a section?
https://github.com/Beariish/bolt/blob/main/examples/error_ha...
But I think OP was asking about whatever the hell this means by "native error callback" https://github.com/Beariish/bolt/blob/0.1.0/doc/Bolt%20Stand...
(https://www.compuphase.com/pawn/pawn.htm)
My main concern about a new language is not performance, syntax, or features, but long term support and community.
Is it deterministic like Lua?