Remix.run Logo
weregiraffe 5 days ago

Now write a Python compiler in 500 lines of C.

bluGill 5 days ago | parent | next [-]

I could probably do it - but you wouldn't like it. My dictionaries would be a linked-list, looking for a key becomes a linear search... (if you gave me C++ I'd use std::map) I'm assuming you will allow me to use the C standard library, if I have to implement strlen or malloc in that 500 lines of C I'm not sure I can pull that off. 500 lines is aggressive, but IOCCC gives me plenty of tricks to get the line count down and the language isn't that big. I'm also going to assume 100% valid python code is fed in, if there is a bug or error of any sort that is undefined behavior.

Note that most of what makes python great isn't the language, it is the library. I believe that large parts of the python library are also written in C (for speed), and thus you won't be able to use my 500 line python compiler for anything useful because you won't have any useful libraries.

dekhn 5 days ago | parent | next [-]

A hash table in C is about 30 lines of code, so I don't think you have to stick to linked lists for dictionaries.

threeducks 4 days ago | parent | next [-]

9 lines seem to be sufficient (assuming string keys and int values).

    // Hashtable definition
    #include <string.h>
    #define N 1024
    int* map_ptr(const char **keys, int *values, const char *key){
        size_t h = 0, c = 0, i;
        for (const char *c = key; *c; c++) h = h * 33 + *(unsigned char*)c;
        for (i = h % N; c < N && keys[i] && 0 != strcmp(keys[i], key); i = (i + 1) % N, c++);
        if (!keys[i]) keys[i] = key;
        return 0 == strcmp(keys[i], key) ? &values[i] : NULL;
    }

    // Example usage
    const char *keys[N];
    int values[N];

    #include <stdio.h>

    int main(){
        // Set some values
        *map_ptr(keys, values, "one") = 1;
        *map_ptr(keys, values, "two") = 2;
        *map_ptr(keys, values, "three") = 3;

        // Retrieve values
        printf("one: %i\n", *map_ptr(keys, values, "one"));
        printf("two: %i\n", *map_ptr(keys, values, "two"));
        printf("three: %i\n", *map_ptr(keys, values, "three"));

        return 0;
    }
ludocode 5 days ago | parent | prev | next [-]

Indeed, a decent closed hash table is maybe 30 lines. An open hash table with linear probing is even less, especially if you don't need to remove entries. It's almost identical to a linear search through an array; you just change where you start iterating.

In my first stage Onramp linker [1], converting linear search to an open hash table adds a grand total of 24 bytecode instructions, including the FNV-1a hash function. There's no reason to ever linear search a symbol table.

[1]: https://github.com/ludocode/onramp/blob/develop/core/ld/0-gl...

bluGill 5 days ago | parent [-]

a linear search may be faster because it is cache and branch prediction frienly. Benchmarks on real world data is needed to make a final call.

bluGill 5 days ago | parent | prev [-]

Sure but a linear search is 5. when my limit is 500 lines of C I don't dare spare those lines.

jlokier 4 days ago | parent | prev [-]

If you can search (and do other operations) on a linked list, and the lists are long enough to matter, you can trivially speed it up to a fixed-size hash table with hardly any code changes.

This:

    entry = list_search(key, list);
becomes:

    entry = list_search(key, lists[hash(key) % (sizeof(lists)/sizeof(lists[0]))]);
This:

    list_add(key, entry, list);
becomes:

    list_add(key, entry, lists[hash(key) % (sizeof(lists)/sizeof(lists[0]))]);
etc.

A simple hash(), not high quality but good enough for non-adversarial inputs, can be a one-liner.

A good dictionary does more than this of course, especially dynamic sizing, but this simple change can radically speed up simple linked list code when the lists are long, you don't have a hash table implemention you can use, and you want to keep the code change very small.

The same principle can be used with other things than lists too.

bluGill 4 days ago | parent [-]

Those would be useful optimizations once the simplest thing works. However the goal here is 500 lines- not fast, correct, maintainable code. If I was to write this for real world use I wouldn't start with C in the first place.

jlokier 9 hours ago | parent [-]

That's why I described a not very well known trick that requires just +1 line, the definition of hash(). It's neither fast nor maintainable, it's just a neat hack that gives you some of the performance of hash tables for negligible source code overhead. Perfect for tiny, self-contained programs like this one.

wyldfire 5 days ago | parent | prev | next [-]

A python VM that consumes bytecode might be doable in not-ludicrous-amounts of C. Not 500 lines I suppose. But something manageable I think? Especially if you targeted the older releases.

jonjacky 5 days ago | parent [-]

In the CPython reference interpreter, that VM can be found at

https://github.com/python/cpython/blob/main/Python/ceval.c

It's 3619 lines. It's explained in this 515 line file:

https://github.com/python/cpython/blob/main/InternalDocs/

For comparison, there is a pure Python bytecode interpreter, its VM is here:

https://github.com/nedbat/byterun/blob/master/byterun/pyvm2....

It's 1043 lines.

nickpsecurity 5 days ago | parent | prev | next [-]

Maybe 500 lines of Pythonic, macro-heavy C. If the macros' LOC don't count. Maybe.

TZubiri 5 days ago | parent | prev [-]

Not to be that guy, but Python is an interpreted language.

That said, I guess technically you could make something that compiles python to an executable? This is hacker news after all

vidarh 5 days ago | parent | next [-]

A language is not inherently interpreted or compiled.

Some languages are more or less easy to compile efficiently and without embedding a JIT compiler, but any language can be compiled.

For Python in particular, there are already compilers.

If you want a nightmarish language to compile, look at Ruby. There are compilers even for Ruby.

nurettin 5 days ago | parent [-]

Python has the same amount of nightmare. Maybe even more. You can add static class and instance accessors at runtime, it supports full monkey patching just like Ruby does. You can meta program modules, classes, objects, you can decorate classes and functions, declare functions and lambdas anywhere. "compilers" usually disallow monkey business and compile only a subset.

vidarh 4 days ago | parent [-]

While my (half finished, buggy) Ruby compiler is definitely a subset, it allows the monkey business. There are ways to do it reasonably, but making it fast gets a lot harder when you do...

nurettin 4 days ago | parent [-]

Best in Class seems to be JRuby, but it needs a JVM in between.

zahlman 5 days ago | parent | prev [-]

This isn't even correct for the one specific reference implementation you're presumably thinking of. It is just as much "compiled" as Java or C#, just that the compilation is often done on the fly. (Although IIRC C# does some additional trickery to pretend that its bytecode is "real" executable code, wrapped in a standard .exe format.) Presumably you've noticed the __pycache__ folders containing .pyc files; those are compiled bytecode. When you install Python, typically the standard library will all be precompiled (at least in part, so that those bytecode files can be created by an admin user now, and used by a standard user later). There is an interpretive REPL environment, but that works by doing the same bytecode-compilation each time.