Same as the failure of Itanium VLIW instructions: you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime.

Also, additional information on instructions costs instruction bandwidth and I-cache.

▲

david-gpu an hour ago | parent [-]

> you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime

That is very context-dependent. In high-performance code having explicit control over caches can be very beneficial. CUDA and similar give you that ability and it is used extensively.

Now, for general "I wrote some code and want the hardware to run it fast with little effort from my side", I agree that transparent caches are the way.

	▲	10000truths 35 minutes ago \| parent [-]
		x86 provides this control with non-temporal load/store instructions.