peter_d_sherman 2 days ago

>"This wasn’t a fully transparent codebase, though. Like many production appliances, a large portion of the Python logic was shipped only as compiled .pyc files."

Observation: One of the great virtues of Python is that when someone runs a Python program, they are typically running an interpreter over plain, readable source files, which means the person running the program usually has its entire source code!

But that's not the case here!

.pyc, aka "pre-compiled, pre-tokenized, pre-digested" aka "obfuscated" python -- one of the roots of this problem -- is both a blessing and a curse!

It's a blessing because it lets the interpreter cache the result of the parse/compile ("digestion") stage as bytecode, so modules load faster on later runs (the code doesn't execute any faster once loaded, but startup does) -- a very definite blessing!
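As a rough illustration of that caching step, here is a minimal sketch that compiles a throwaway module with the standard library's `py_compile` and checks where the cached bytecode lands. The module name `demo_mod` and the helper `compile_demo_module` are made up for the example.

```python
import importlib.util
import os
import py_compile
import tempfile


def compile_demo_module():
    """Write a tiny source file, compile it, and return (source_path, pyc_path)."""
    workdir = tempfile.mkdtemp()
    source_path = os.path.join(workdir, "demo_mod.py")  # hypothetical module name
    with open(source_path, "w") as f:
        f.write("ANSWER = 42\n")
    # py_compile.compile returns the path of the bytecode file it wrote.
    pyc_path = py_compile.compile(source_path, doraise=True)
    return source_path, pyc_path


source_path, pyc_path = compile_demo_module()
print(pyc_path)
# cache_from_source reports where the interpreter expects the cached bytecode.
print(importlib.util.cache_from_source(source_path) == pyc_path)
```

On a later import of the same module, the interpreter can reuse that file instead of recompiling the source, as long as the source has not changed.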

But it's also (equal-and-oppositely!) -- a curse!

It's a curse because as the author of this article noted above, it allows Python codebases to be obfuscated -- in whole or in parts!

Of course, this is equally true of any compiled language. With C, for example, one typically runs the compiled binaries, and compiled binaries are obfuscated/non-transparent by their nature. So this is nothing new!

Now, I am not a Python expert. But maybe there's a Python interpreter switch which says 'do not run any pre-digested cached/obfuscated code from .pyc files, and stop the run and emit an error message if any are encountered'.

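I know there's a Python switch (-B, or the PYTHONDONTWRITEBYTECODE environment variable) that prevents compiled bytecode from being written out as .pyc files. Of course, the problem with this approach is that imports are typically slower, since every module gets recompiled on each run, and it does nothing to stop .pyc files that already exist from being loaded...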
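For reference, the same effect as the `-B` command-line switch can be had from inside a program through `sys.dont_write_bytecode`. A minimal sketch; note that it only affects imports that happen after the flag is set, and it only stops new .pyc files from being written, not existing ones from being read.

```python
import sys

# Same effect as starting the interpreter with -B or setting
# PYTHONDONTWRITEBYTECODE: modules imported from here on are compiled
# in memory and no .pyc file is written for them.
sys.dont_write_bytecode = True

import json  # imported normally, but no new bytecode cache is written for it

print(sys.dont_write_bytecode)
```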

So, what's the solution? Well, pre-created (downloaded) .pyc files that ship without the corresponding Python source code are roughly the equivalent of the "binary blobs" aka "binary black boxes" that ship with proprietary software.
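One way to spot such blobs ahead of time is simply to walk a directory tree and list every .pyc that has no .py next to it. The sketch below is an assumption-laden illustration, not an existing tool: `find_sourceless_pyc` is a made-up helper, and it skips `__pycache__` directories on the assumption (per PEP 3147) that bytecode cached there is ignored when its source file is missing. It does not look inside zip archives or frozen modules.

```python
import os


def find_sourceless_pyc(root):
    """Return sorted paths of .pyc files under root with no sibling .py file."""
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Cached bytecode in __pycache__ is only used when the source exists,
        # so it is not a "blob" in the sense discussed here; don't descend.
        if "__pycache__" in dirnames:
            dirnames.remove("__pycache__")
        for name in filenames:
            if name.endswith(".pyc"):
                source = os.path.join(dirpath, name[:-1])  # foo.pyc -> foo.py
                if not os.path.exists(source):
                    orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)


if __name__ == "__main__":
    for path in find_sourceless_pyc("."):
        print("no source for:", path)
```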

Of course, some software publishers that do not believe in open-source / transparent software might argue that such binary blobs protect their intellectual property... and if a huge capital investment is necessary to produce a piece of software, then such arguments are not entirely wrong...

Getting back to Python (or more broadly, any interpreted language that has the same pre-compilation/obfuscation capability), what I'd love to see is a runtime switch, maybe called something like '-t' or '-transparent'. If it's passed to the interpreter before running a program and the interpreter then encounters a .pyc (or the equivalent in whatever format that language uses, call it "pre-tokenized", "pre-parsed", "pre-compiled" or whatever), it immediately stops execution and reports an error to the end-user, giving the exact file and line number in the last good source file that tried to load it!
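Something close to this can already be approximated in user code with an import hook. The following is only a sketch of a narrower variant of the idea: it refuses modules that would be loaded from bytecode alone and leaves ordinary cached bytecode (where the source is present) untouched. `TransparentOnlyFinder` and `install_transparent_only` are hypothetical names; the `importlib` classes they rely on are from the standard library. It does not catch modules that were imported before the hook was installed, or bytecode-only modules inside zip archives.

```python
import importlib.abc
import importlib.machinery
import sys


class TransparentOnlyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical guard: refuse modules that would load from bytecode alone."""

    def find_spec(self, fullname, path=None, target=None):
        # Ask the standard path-based finder what it would load.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path, target)
        if spec is not None and isinstance(
            spec.loader, importlib.machinery.SourcelessFileLoader
        ):
            raise ImportError(
                f"refusing to import {fullname!r}: only bytecode found at {spec.origin}",
                name=fullname,
                path=spec.origin,
            )
        # Returning None lets the regular finders handle everything else.
        return None


def install_transparent_only():
    """Put the guard first on sys.meta_path (safe to call more than once)."""
    if not any(isinstance(f, TransparentOnlyFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, TransparentOnlyFinder())
```

Calling `install_transparent_only()` at the very start of a program means a later `import` of a source-less module raises `ImportError`, and the traceback points at the import statement that asked for it, which is roughly the "last good source file" report described above.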

This would allow easy discovery of such "black box" / "binary blob" non-transparent dependencies!

(Note to Future Self: Put this feature in any future interpreters you write... :-) )