written-beyond a day ago

Could you go into a little more depth about how what they're building is any different from an FPGA?

FPGAs are basically a matrix of interconnected MUXs and LUTs, providing whatever functionality a designer may require, as long as it fits on the die.
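To make the "MUXs and LUTs" point concrete, here is a minimal software sketch of the idea, not a model of any real FPGA fabric. The names `make_lut`, `mux`, and `half_adder` are made up for illustration. A LUT is just a stored truth table indexed by its input bits, so "programming" it means choosing the table contents.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Build an n-input LUT by precomputing func's truth table."""
    # Rows are ordered (0,0,...), (0,0,...,1), ..., so the first input is the MSB.
    table = [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

    def lut(*bits):
        index = 0
        for bit in bits:
            index = (index << 1) | bit
        return table[index]

    return lut


def mux(select, a, b):
    """2-to-1 multiplexer: pass a when select is 0, b when select is 1."""
    return b if select else a


# "Configure" two LUTs, then wire them together into a half adder.
xor_lut = make_lut(lambda a, b: a ^ b, 2)
and_lut = make_lut(lambda a, b: a & b, 2)


def half_adder(a, b):
    """Return (sum, carry) using only the configured LUTs."""
    return xor_lut(a, b), and_lut(a, b)
```

The same two LUTs could be reconfigured into any other pair of 2-input functions just by changing the table, which is the sense in which the fabric is "whatever a designer may require."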

cdumler a day ago | parent | next [-]

I have no specific knowledge, but another approach would be to integrate more unusual very-long-instruction-word micro-instructions, like large scale matrix functions, algorithm encode/decode functions, and very long vector operations.

As I recall, Transmeta's CPU could accept x86 instructions because the software translator, called Code Morphing Software (like Rosetta), would decompose each x86 instruction into a set of steps packed into a very-long-instruction-word. VLIW's design is such that all of the instructions in a word go into separate, parallel pipelines. Each pipeline has a specific set of abilities. Think: the first three pipelines might be able to do integer arithmetic, but pipelines 3 and 4 can do floats. Also, the CPU implemented a commit/rollback concept which allowed it to deal with "faults," like branch mispredictions, interrupts, and instruction faults. This allowed the Transmeta CPU to emulate the x86 beyond just JIT compilation. In theory, it could emulate any other CPU. They tried going after Intel (and failed), but I think they would have been better off going after anyone trying to jump-start a new architecture.
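A toy sketch of the commit/rollback idea described above, and not Transmeta's actual mechanism: a bundle of operations runs against a shadow copy of the registers, and the result is committed only if nothing faults. All names here (`run_bundle`, `add`, `div`) are hypothetical, and a Python `ArithmeticError` stands in for a hardware fault.

```python
def run_bundle(registers, bundle):
    """Run every operation in a bundle against a shadow copy of the registers.

    Commit the shadow copy only if no operation faults; otherwise roll back
    by discarding it, leaving the committed registers untouched.
    """
    shadow = dict(registers)          # working (uncommitted) state
    try:
        for operation in bundle:
            operation(shadow)
    except ArithmeticError:           # stands in for a hardware fault
        return False                  # rollback: the shadow copy is dropped
    registers.update(shadow)          # commit
    return True


def add(dst, a, b):
    def operation(regs):
        regs[dst] = regs[a] + regs[b]
    return operation


def div(dst, a, b):
    def operation(regs):
        regs[dst] = regs[a] // regs[b]   # ZeroDivisionError if regs[b] == 0
    return operation


regs = {"r0": 6, "r1": 0, "r2": 0}
faulting = run_bundle(regs, [add("r2", "r0", "r0"), div("r1", "r0", "r1")])
clean = run_bundle(regs, [add("r2", "r0", "r0")])
```

The first bundle faults on the divide, so even its successful `add` is thrown away; the second bundle commits and `r2` becomes 12. After a rollback, a translator can retry the same code more conservatively.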

Part of the reason why CPUs aren't good at GPU activities is that each instruction is expected to have a pretty small, definite set of inputs and outputs (registers) and to use a reasonable number of CPU cycles, and the CPU must devote logic to ensuring a fault can be unwound (so the CPU doesn't crash). FPGAs are cool because you can essentially have wholly independent units with their own internal state. The little units can be wired any way desired. The problem with FPGAs is that all that interconnect means a lot of capacitance in the lines, and so much slower clock speeds.

So, maybe they are trying to strike a balance. They have targeted instructions that are more FPGA-like, like "perform algorithm." The instruction receives a vector register and a set of flags that define which algorithms to use and in what order (use vector as 8-bit integers, mask with 0x80, compute 16-bit checksum). You can keep loading vectors and running them, then finally "read perform algorithm result" with the flag "get compute 16-bit checksum." It's FPGA-like, and registers aren't "polluted" with intermediate state.
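Purely as an illustration of that speculation, here is a minimal sketch of what such a "perform algorithm" unit might look like from the software side. Everything in it is assumed: the class name `AlgorithmUnit`, the flag names, and the choice of a simple 16-bit additive sum as the "checksum" are all invented, and none of it comes from the article.

```python
class AlgorithmUnit:
    """Hypothetical unit that keeps intermediate state out of the registers."""

    # Per-element transforms selectable by flag (illustrative only).
    ELEMENT_OPS = {
        "mask_0x80": lambda value: value & 0x80,
    }

    def __init__(self, flags):
        # Flags are applied in the order given; "checksum16" is a reduction.
        self.flags = list(flags)
        self.checksum = 0             # internal state, never in a register

    def perform(self, vector):
        """Treat the vector as 8-bit integers and run the configured steps."""
        for value in bytes(vector):
            for flag in self.flags:
                if flag in self.ELEMENT_OPS:
                    value = self.ELEMENT_OPS[flag](value)
                elif flag == "checksum16":
                    # Assumed checksum: running sum truncated to 16 bits.
                    self.checksum = (self.checksum + value) & 0xFFFF

    def read_result(self, which):
        """Model of 'read perform algorithm result' with a selector flag."""
        if which == "checksum16":
            return self.checksum
        raise ValueError(f"unknown result: {which}")


unit = AlgorithmUnit(["mask_0x80", "checksum16"])
unit.perform([0x80, 0x7F, 0xFF])      # masked to 0x80, 0x00, 0x80
unit.perform([0x81])                  # masked to 0x80
result = unit.read_result("checksum16")
```

Here `result` is 384 (three masked bytes of 0x80). The caller only ever sees the input vectors and the final value, which is the "registers aren't polluted" property.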

6SixTy a day ago | parent [-]

Transmeta's whole elevator pitch was a power-efficient CPU that, through translation software, happened to run x86 instructions, so no porting nonsense was necessary. The only issue was that they got it down to 1/x the power at 1/x the speed.

Interesting fact is that the guy who architected Transmeta's CPUs also worked on Itanium and Russia's Elbrus CPUs. The Elbrus is sort of a spiritual successor to Transmeta's efforts at this translation thing, but it is very much aimed as a hardware root of trust solution to sandboxing software rather than a genuine effort at competing in foreign markets.

one_even_prime a day ago | parent [-]

Who was that guy?

sillywalk a day ago | parent | prev | next [-]

If you mean about the Ubitium, then no - other than what's in the article.

If you mean more in depth about MAJC, then also no - I read an Ars Technica article about it (and Itanium) around 25 years ago, when it came out, and also the Wikipedia page.

I have no EE or CPU design background; I'd imagine most people would know far more than me. I just remembered the 'generic instruction unit' from MAJC and wondered if this was something superficially similar but at the processor 'core' level.

imtringued a day ago | parent | prev [-]

Actually, FPGAs are a mix of everything nowadays. They have programmable logic consisting of LUTs and flip-flops in CLBs with integrated carry chains, connection boxes and routing switches, configurable SRAM blocks known as block RAM or sometimes UltraRAM, DSP blocks providing configurable arithmetic units, PLLs, conventional ARM cores, memory controllers, high-speed transceivers and finally also VLIW cores for machine learning inference. Notice how a lot of the silicon area is actually taken up by hard silicon that can be connected to the programmable logic. The problem with the largest FPGAs is that you will reach the point where you are swimming in LUTs and the chip area is better spent on e.g. more memory or other hard-wired logic like a processor core.