I have no specific knowledge, but another approach would be to integrate more unusual very-long-instruction-word micro-instructions, like large scale matrix functions, algorithm encode/decode functions, and very long vector operations.

As I recall, Transmeta's CPU could accept x86 instructions because the software translator, called Code Morphing Software (like Rosetta), would decompose the x86 instruction into a set of steps over a very-long-instruction-word. VLIW's design is such that all of the instructions went into separate, parallel pipelines. Each pipeline had specific set of abilities. Think, the first three pipelines might be able to do integer arithmetic, but 3 and 4 can do floats. Also, the CPU implemented a commit/rollback concept which allowed it cause "faults," like branch miss-predictions, interrupts, and instruction faults. This allowed the Transmeta CPU to emulate the x86 beyond just JIT compilations. In theory, it could emulate any other CPU. They tried going after Intel (and failed); but, I think they would have been better off trying go after any one trying to jump start a new architecture.

Part of the reason why CPUs aren't good at GPU activities is because the instructions are expected to have pretty small, definite set of inputs and outputs (registers), use a reasonable number of CPU cycles, and must devote logic to ensure a fault can be unwound (CPU doesn't crash). FPGs are cool because you can essentially have wholly independent units with their own internal state. The little units can be wired any way desired. The problem with FPGs is all that interconnect means a lot of capacitance in the lines, so much slower clock speeds.

So, maybe they are trying to strike a balance. They have targeted instructions are more FPG-like, like "perform algorithm." The instruction receives a set of flags that defines which algorithms to use and in what order (use vector as 8-bit integers, mask with 0x80, compute 16bit checksum) and a vector register. You can loading vectors and running them then finally "read perform algorithm result" with flag "get compute 16bit checksum." FPG-like and registers aren't "polluted" with intermediate state.

▲

6SixTy 7 months ago | parent [-]

Transmeta's whole elevator pitch was a power efficient CPU that through translation software, happened to run x86 instructions so there's no porting nonsense necessary. Only issue was that they made it 1/x efficient, at 1/x the speed.

Interesting fact is that the guy who architected Transmeta's CPUs also worked on Itanium and Russia's Elbrus CPUs. The Elbrus is sort of a spiritual successor to Transmeta's efforts at this translation thing, but it is very much aimed as a hardware root of trust solution to sandboxing software rather than a genuine effort at competing in foreign markets.

	▲	one_even_prime 7 months ago \| parent [-]
		Who was that guy?