derefr a day ago

> The issue with "mining" the rom for shaders is they're not defined in a consistent way across the games.

I don't want to be snippy, but I don't think you understood the rest of the paragraph you're attempting to rebut here, since this is exactly (part of) what I said myself. (I wouldn't blame you if you didn't understand it; the concept of "concolic execution" is probably familiar to only ~50,000 people worldwide, most of them doing capital-S Serious static analysis for work in cryptanalysis, automated code verification, etc.)

To re-explain without the jargon: you wouldn't be "mining" the shaders as data-at-rest; rather, you'd be "running" the ROM under a semi-symbolic (symbolic+concrete — concolic!) interpreter, one that traverses all possible code-paths "just enough" times to see all "categories of states" (think: a loop's base-case vs its inductive case vs its breakout case.) You'd do this so that, for each "path of states" that reaches an instruction that tells the console's GPU "this here memory, this is a shader now", the interpreter could (rough code sketch after the list):

1. look back at the path that reached the instruction;

2. reconstruct a sample concrete execution trace (i.e. one with all irrelevant, non-branch-determinant values fixed to placeholders); and then

3. concretely "replay" that execution trace, using the emulator itself (but with no IO peripherals hooked up, and at maximum speed, and with no need for cycle-accurate timed waits since inter-core scheduling is pre-determined in the trace);

4. which would, as a side-effect, "construct" each piece of shader object-code into memory — at a place where the interpreter is expecting it, given the symbolic "formula" node that the interpreter saw passed into the instruction ("formula node": an AST subtree built out of SSA-instruction branch-nodes and static-value leaf-nodes, referenced by a versioned Single-Static-Information cell, aliasable into slices within CPU-register ADTs, or into a layered+sparse memory-cell interval-tree ADT);

5. so that the interpreter can then pause concrete emulation at the same "load this as a shader" instruction; reach into the emulator's memory where the "formula node" said to look; and grab the shader object-code out.
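
To make steps 1-5 concrete, here's a rough Python sketch of the driver loop. Every class, method, and constant name in it (SymbolicInterpreter, Emulator, GPU_SHADER_LOAD_OPCODES, and so on) is made up for illustration; this is the shape of the pipeline, not any real tool's API:

    # Hypothetical driver for the "mine shaders by concolic replay" idea.
    # None of these classes exist as written; they stand in for a concolic
    # engine plus a headless, deterministic emulator core.

    def extract_shaders(rom_path):
        interp = SymbolicInterpreter(rom_path)   # symbolic + concrete ("concolic") CPU model
        shaders = []

        # Explore code paths "just enough" to cover each category of state;
        # a path ends when it hits a "this memory is a shader now" GPU command.
        for path in interp.explore(stop_at=GPU_SHADER_LOAD_OPCODES):
            # Steps 1 + 2: walk the path back and pin every value that didn't
            # determine a branch to a placeholder, giving one concrete trace.
            trace = path.concretize(irrelevant_values="placeholder")

            # Step 3: replay the trace in a plain emulator -- no IO peripherals,
            # no timed waits, thread scheduling already fixed by the trace.
            emu = Emulator(rom_path, io=None, schedule=trace.schedule)
            emu.replay(trace)

            # Steps 4 + 5: the replay has constructed the shader object-code in
            # RAM where the symbolic "formula node" said it would be; read it out.
            addr, size = path.formula_at(trace.final_pc).resolve()
            shaders.append(emu.memory.read(addr, size))

        return shaders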

If you know how the AFL fuzzer works, you could think of this as combining "smart fuzzing" (i.e. looking at the binary and using it to efficiently discover the "constraint path" of branch-comparison value-ranges that reaches each possible state); with a graph-path-search query that "directs" the fuzzer down only paths that reach states we're interested in (i.e. states that reach a GPU shader load instruction); and with an observing time-travelling debugger/tracer connected, to then actually execute the discovered "interesting" paths up to the "interesting" point, to snapshot the execution state at that point and extract "interesting" data from it.
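
The "direct the fuzzer only down interesting paths" part is just a reachability pre-pass over the control-flow graph plus pruning during exploration. Here's a self-contained toy version (the CFG is a plain dict of successor lists; real engines work over path constraints, not bare node IDs):

    # Toy directed search: only follow branches that can still reach a
    # "GPU shader load" instruction, pruning everything else up front.
    from collections import defaultdict

    def predecessors(cfg):
        preds = defaultdict(set)
        for node, succs in cfg.items():
            for s in succs:
                preds[s].add(node)
        return preds

    def nodes_that_can_reach(cfg, targets):
        # Reverse breadth-first search from the target instructions.
        preds = predecessors(cfg)
        reachable, frontier = set(targets), list(targets)
        while frontier:
            node = frontier.pop()
            for p in preds[node]:
                if p not in reachable:
                    reachable.add(p)
                    frontier.append(p)
        return reachable

    def directed_explore(cfg, entry, targets, max_revisits=2):
        useful = nodes_that_can_reach(cfg, targets)
        stack = [[entry]]
        while stack:
            path = stack.pop()
            node = path[-1]
            if node in targets:
                yield path                    # an "interesting" path: hand it to the replayer
                continue
            for succ in cfg.get(node, []):
                # Prune branches that can never reach a target, and cap loop
                # revisits so each "category of state" is only seen a few times.
                if succ in useful and path.count(succ) < max_revisits:
                    stack.append(path + [succ])

The max_revisits cap is the crude stand-in for the "just enough times to see all categories of states" bit; a real concolic engine decides that from the path constraints rather than from a fixed count.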

---

Or, at least, that's how it works in the ideal case.

(In the non-ideal case, it's something you can't resolve because the "formula" contains nodes that reference things the interpreter can't concretely emulate without combinatoric state-space explosion — e.g. "what was loaded from this save file created by an earlier run of the game process"; or maybe "what could possibly be in RAM here" when the game uses multiple threads and IPC, and relies on the console OS to pre-emptively schedule those threads, so that "when a message arrives to a thread's IPC inbox" becomes non-deterministic. So this wouldn't work for every game. But it could work for some. And perhaps more, if you can have your concolic interpreter present a more-stable-than-reality world by e.g. "coercing processors into a fake linear clock that always pulses across the multiple CPU cores in a strict order each cycle"; or "presenting a version of the console's OS/BIOS that does pre-emptive thread scheduling deterministically"; etc.)
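
A minimal sketch of what I mean by "presenting a more-stable-than-reality world", assuming hypothetical emulated-thread objects with an inbox list, a blocked_on_ipc flag, and a step() method (none of this is a real emulator API):

    # Hypothetical shim: replace the console OS's pre-emptive scheduler with a
    # fixed round-robin one inside the interpreter. Each emulated thread gets
    # the same instruction budget in the same order every "cycle", and IPC
    # messages are delivered FIFO at slice boundaries, so "when does the
    # message arrive" has exactly one answer per trace instead of exploding
    # the state space.

    SLICE = 10_000  # instructions per time-slice; any fixed constant will do

    def deterministic_run(threads, done):
        while not done():
            for thread in threads:                       # strict, fixed order
                if thread.blocked_on_ipc and thread.inbox:
                    thread.deliver(thread.inbox.pop(0))  # FIFO delivery, no races
                if not thread.blocked_on_ipc:
                    thread.step(SLICE)                   # run a bounded slice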

jamesgeck0 a day ago

Many games have unique shaders all the way up to the end of the game. Reconstructing the path of states required to get an individual sample from the end of a 100-hour JRPG in this manner seems like it would be hilariously computationally expensive?