pansa2 3 days ago

Luau seems to be significantly more complex than Lua - I'm not sure it can still be called "small". Looking at the relative size of the implementations: Luau's is 120,000 lines of C++ [0], an order of magnitude larger than Lua 5.1's 14,000 lines of C.

But I think that complexity is unavoidable for a gradually- or statically-typed language. Any language with a reasonably-complete type system is inevitably going to be much more complex than a dynamically-typed scripting language.

[0] Counting *.cpp files in the "Analysis", "AST", "Compiler" and "VM" directories

parenwielder 3 days ago | parent | next [-]

Lua (and to a somewhat lesser extent Luau) are small in terms of the learned surface of the value language, not necessarily in terms of lines of code. That being said, any runtime use of the language needn't depend on Analysis, which is the biggest compilation unit by far.

Probably also worth mentioning that Analysis currently contains two full type system implementations because we've spent the better part of the past three years building a new type system to address a lot of the fundamental limitations and architectural issues that we'd run into after years of working on the original one. The new one is not without issues still, but is definitely very usable, and gets better and better every week. At some point in the future, we will clip the old one altogether, and that'll shave off some 30,000 lines of code or something.

pizlonator 3 days ago | parent [-]

Is the primary goal of the type system performance, or dev productivity, or something else?

If performance, are you comparing against a baseline that does the classic Self-style optimizations (like what JS engines do)?

FWIW I don't consider LuaJIT to be an interesting baseline because of what I've heard about it being overtuned to specific benchmarks that the author(s) care about. (Too much of an attitude where, if your code doesn't run well, it's your fault.)

parenwielder 3 days ago | parent [-]

I would say that the primary interest in building the type system is in supporting a better developer experience, both in terms of productivity (better autocomplete, go-to definition, etc.) and in terms of correctness (identifying bugs earlier, which is why we're actively concerned with having a _sound_ type system). There's a pretty limited surface of areas where we accept unsoundness today (outside of casting) and they're all connected to limitations of the type system that we've been working to resolve.

Longer-term, there's definitely some interest in how we could leverage types to support more optimized code, but it's been a notoriously difficult problem for gradually-typed languages in general, see [Is sound gradual typing dead?](https://dl.acm.org/doi/pdf/10.1145/2837614.2837630)

pizlonator 3 days ago | parent [-]

Gotcha. For some reason I was under the impression that getting perf from gradual typing had been a goal of y'all's.

> primary interest in building the type system is in supporting a better developer experience

Yeah this is the right reason to do it :-)

> there's definitely some interest in how we could leverage types to support more optimized code, but it's been a notoriously difficult problem for gradually-typed languages in general

I know. I still get folks asking why TypeScript types can't be used to make JS fast and it's quite exhausting to explain this.

Hard to imagine gradual types beating what you get from the combo of PICs, speculative JITing, and static analysis.

(I worked on JavaScriptCore's optimizer, hence why I'm curious about where y'all are coming from on this)
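The "PICs" mentioned above can be illustrated with a toy monomorphic inline cache (a sketch of the idea only; the class names and `CallSite` abstraction are invented for illustration, and real engines extend this to polymorphic caches with several entries plus JIT speculation):

```python
# Toy monomorphic inline cache: each dynamic call site remembers the
# last receiver type it saw and the method it resolved to, so repeated
# calls on the same type skip the dynamic lookup entirely.
class CallSite:
    """One dynamic call site, e.g. `obj.area()` at a given source line."""
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None

    def call(self, receiver, *args):
        if type(receiver) is not self.cached_type:
            # Cache miss: do the full lookup, then remember the result.
            self.cached_type = type(receiver)
            self.cached_method = getattr(type(receiver), self.method_name)
        # Cache hit: subsequent calls with the same receiver type reuse
        # the resolved method without another lookup.
        return self.cached_method(receiver, *args)

class Square:
    def __init__(self, s): self.s = s
    def area(self): return self.s * self.s

class Circle:
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r * self.r

site = CallSite("area")
site.call(Square(3))   # slow path, then caches Square.area
site.call(Square(4))   # fast path: cached
site.call(Circle(1))   # receiver type changed: slow path again
```

The point of the technique is that most call sites are monomorphic in practice, so the cached fast path dominates, and a speculative JIT can then compile the fast path inline under a type guard.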

thomasmg 3 days ago | parent | prev | next [-]

I fully agree. Lua and Luau are impressive, sure, but they are not really "small" or "simple", in my view. I don't think the complexity is unavoidable, however. There are many programming languages that are much simpler but at the same time very expressive. I'm currently working on one of them, named "Bau" [1], and I started working on a Lua-inspired VM [2] for a subset of this language. There are many languages like mine, most of them incomplete and not really popular, discussed in [3].

[1] https://github.com/thomasmueller/bau-lang [2] https://github.com/thomasmueller/bau-lang/blob/main/src/test... [3] https://www.reddit.com/r/ProgrammingLanguages/

kragen 3 days ago | parent | prev | next [-]

I agree that static and especially gradual typing add complexity, but it's a very much smaller amount of complexity than we're talking about here, so in fact it is very common to encounter dynamically-typed scripting languages that are much more complex than some languages with excellent type systems.

I think you can implement a Hindley–Milner type checker in about a page of code, not the 2000 pages of code you're talking about.

I'm not sure what you mean by "complete". H–M is complete in the sense that it's decidable and doesn't leave any holes: programs that check statically are guaranteed not to have type errors at runtime. It handles higher-order functions and parametric polymorphism (generics) out of the box, it doesn't suffer from null pointers, and it can even handle mutability. And it's fully inferrable. There are various extensions to make it more expressive (GADTs, typeclasses, subtyping, linearity, tagged arguments) but even the basic HM system is already a lot more powerful than something like the type system of C or Java 1.7.
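To give a sense of scale for the "page of code" claim, here is a toy Algorithm W-style inferencer for a tiny lambda calculus. This is a sketch only: the representation (strings for type variables, tuples for compound types) is invented for illustration, there is no occurs check, and it omits let-polymorphism, which a full HM checker adds via generalization.

```python
# Minimal HM-style type inference for a tiny lambda calculus.
# Expressions: ints, variable names, ("lam", x, body), ("app", f, arg).
# Types: strings are type variables; tuples like ("int",) and
# ("fun", t1, t2) are concrete types.
import itertools

fresh = (f"t{i}" for i in itertools.count())  # supply of type variables

def resolve(t, subst):
    """Follow substitution chains; rebuild compound types."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return tuple(resolve(x, subst) for x in t)
    return t

def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):                      # a is a type variable
        return {**subst, a: b}
    if isinstance(b, str):                      # b is a type variable
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

def infer(expr, env, subst):
    """Return (type, subst) for expr under the typing environment env."""
    if isinstance(expr, int):
        return ("int",), subst
    if isinstance(expr, str):                   # variable reference
        return env[expr], subst
    if expr[0] == "lam":
        _, x, body = expr
        tx = next(fresh)
        tbody, subst = infer(body, {**env, x: tx}, subst)
        return ("fun", tx, tbody), subst
    if expr[0] == "app":
        _, f, arg = expr
        tf, subst = infer(f, env, subst)
        targ, subst = infer(arg, env, subst)
        tres = next(fresh)
        subst = unify(tf, ("fun", targ, tres), subst)
        return resolve(tres, subst), subst
    raise ValueError(expr)
```

For example, `infer(("app", ("lam", "x", "x"), 1), {}, {})` infers `("int",)` for the identity applied to an int, while applying a non-function like `("app", 1, 2)` raises a unification error.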

pansa2 3 days ago | parent [-]

> it is very common to encounter dynamically-typed scripting languages that are much more complex than some languages with excellent type systems.

You're right, that statement was too general. Python is a dynamically-typed scripting language (if you exclude external tools like MyPy), and is one of the most complex languages out there.

I should have been more specific: "Any language with a reasonably-complete type system is inevitably going to be much more complex than Lua".

> I agree that static and especially gradual typing add complexity, but it's a very much smaller amount of complexity than we're talking about here [...] I think you can implement a Hindley–Milner type checker in about a page of code

aw1621107's comment shows that Luau's type checker (the "Analysis" directory) is ~60% of the project's code. Maybe there are languages where the equivalent is just a single page, but even then, type checking makes a language implementation more complex in other ways as well.

For example, Luau's AST implementation alone is 75% the size of the whole of Lua 5.1. By deferring type-checking to runtime, Lua can avoid the need to build an AST at all: the compiler can go straight from source code to bytecode.
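The single-pass idea can be sketched like this: a toy recursive-descent compiler (the grammar and opcodes are invented for illustration, not Lua's actual ones) that emits stack bytecode directly as it parses, with no tree in between.

```python
# Toy single-pass compiler in the spirit of Lua's: the parser emits
# stack-machine bytecode as it goes, never building an AST.
def compile_expr(src):
    """Compile e.g. '1 + 2 * 3' straight to bytecode tuples."""
    tokens = src.replace("+", " + ").replace("*", " * ").split()
    pos = 0
    code = []

    def number():
        nonlocal pos
        code.append(("PUSH", int(tokens[pos])))
        pos += 1

    def term():                 # term := number ('*' number)*
        nonlocal pos
        number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            number()
            code.append(("MUL",))   # emitted immediately, mid-parse

    def expr():                 # expr := term ('+' term)*
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            term()
            code.append(("ADD",))

    expr()
    return code

def run(code):
    """A minimal stack VM for the bytecode above."""
    stack = []
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]
```

Because operands are already on the virtual stack when an operator finishes parsing, the opcode can be emitted on the spot; that is what lets a one-pass compiler skip the tree entirely.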

kragen 2 days ago | parent [-]

Building an AST is also not a large amount of code, especially if you already have GC and some kind of runtime type discrimination. And building an AST simplifies a lot of other things you might want to do, not just type checking. You're speaking as if compilers were invented in the Obama presidency because previously computers didn't have enough memory for all their code, and were invariably many-person projects.

But, in fact, writing a compiler, with an AST and static types, is a common single-person term project for undergraduates, and it's something people have been doing for 70 years, since computers used magnetic drums and acoustic delay lines for RAM. We've learned techniques since then that make it easier, which is why undergraduates can do it now.

One notable example is Stephen C. Johnson's "portable C compiler", which was the main C compiler in the late 70s and early 80s. By my count the current version of it at https://github.com/PortableCC/pcc is a bit under 50,000 lines of code, including C, some kind of C++, and Fortran frontends and backends for 18 architectures, but not including lex and yacc. I just built it here on my phone, which required implementing getw() and putw() (maybe I don't know the right feature test macros) and #including <strings.h> with the "s" in aarch64/local2.c. The executables total about 320K, including C and C++ (no Fortran) and only aarch64.

lifthrasiir 3 days ago | parent | prev | next [-]

I have written a Lua type checker in the past and have delved deep into the Lua source code. For that reason I can say that Lua is particularly densely coded; Lua's 14K LoC would be something more like 30--40K LoC when coded in the normal, less dense way. Lua is not necessarily small; it's just kinda concise.

aw1621107 3 days ago | parent | prev [-]

To further elaborate, here's a more detailed breakdown of tokei's line counts for each of the directories you list + the CodeGen directory:

- Analysis: 62821 lines of C++ code, 9254 lines of C headers

- Ast: 8444 lines of C++, 2582 lines of C headers

- CodeGen: 21678 lines of C++, 4456 lines of C headers

- Compiler: 7890 lines of C++, 542 lines of C headers

- VM: 16318 lines of C++, 1384 lines of C headers

Compare to Lua 5.1, which tokei says has 11104 lines of C and 1951 lines of C headers in the src/ directory.

HaroldCindy 3 days ago | parent [-]

To be fair, both `Analysis` (the type-checker, not necessary at runtime or compile time) and `CodeGen` (the optional JIT engine) have no equivalent in PUC-Rio Lua.

If you look purely at the VM and things necessary to compile bytecode (AST, Compiler and VM) then the difference in code size isn't as stark.

Having worked with both Lua 5.1 and Luau VM code, Luau's codebase is a heck of a lot nicer to work on than the official Lua implementation even if it is more complex in performance-sensitive places. I have mixed feelings on the structural typing implementation, but the VM itself is quite good.

implicit 3 days ago | parent | next [-]

Further, these extra components are easy to omit if you don't want to use them.

The REPL that we offer in the distribution doesn't include any of the analysis logic and it's just 1.7 MB once compiled (on my M1 MacBook). I'm not sure how much smaller it gets if you omit CodeGen.

Luau can be pretty small if you need it to be.

kragen 3 days ago | parent [-]

Did you say just 1.7MB? For the REPL alone? Or is that with a bunch of heavyweight libraries?

The Lua REPL executable on my cellphone is 0.17 megabytes.

kragen 2 days ago | parent [-]

Also for the first ten years I used computers I was using all kinds of REPLs on computers that didn't have 1.7MB of RAM. On my first computer, which had one floppy drive and no hard drive, 1.7MB would have been 17 floppy disks, or 26 times the size of its RAM. So I'm kind of unconvinced by this stance that 1.7MB is a small REPL.

I mean, it's smaller than bash? But even your mom is smaller than bash.

aw1621107 3 days ago | parent | prev [-]

> If you look purely at the VM and things necessary to compile bytecode (AST, Compiler and VM) then the difference in code size isn't as stark.

I suspected as much, but I didn't want to guess since I'm not familiar with either codebase. Thanks for the info!