AMD - Advanced Micro Devices, Inc

TomB16 · Jun 14, 2025

Nerd alert.

20FP is a complex term that nobody here is likely to understand, including me. I had to do some reading. It's marketing shorthand for 20 TOPS of FP16-like throughput.

They have created an FP16 shorthand for cases in which a large block of numbers share the same exponent. This is an extremely common case. AMD's optimization is to note the exponent only once for a block and store only the mantissas (mantissae?). This makes both block loading and numerical operations faster by using this form of data compression.
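
For anyone who wants to see the shared exponent idea in action, here is a minimal sketch in Python. It is a generic block floating point scheme, not AMD's actual format: the 8-bit mantissa width and the names `bfp_encode` and `bfp_decode` are assumptions made for illustration.

```python
import math

def bfp_encode(values, mantissa_bits=8):
    """Encode a block of floats as one shared exponent plus integer mantissas.

    mantissa_bits is an assumed width for illustration, not a vendor spec.
    """
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp gives largest == m * 2**shared_exp with 0.5 <= m < 1
    _, shared_exp = math.frexp(largest)
    # Scale every value so the largest one fits a signed mantissa
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def bfp_decode(shared_exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

exp, mants = bfp_encode([1.0, 0.5, -0.25, 0.75])
print(exp, mants)              # 1 [64, 32, -16, 48]
print(bfp_decode(exp, mants))  # [1.0, 0.5, -0.25, 0.75]
```

The exponent is stored once for the whole block, so each value costs only its mantissa bits. Values much smaller than the largest one in the block lose precision, which is the trade-off of this kind of compression.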

Incel nerd alert.

I'll bet nobody knows this. Not even engineers. You're all way too young.

In the very early days of computing, Digital Equipment Corporation (later DEC) took a big chunk of the scientific community away from IBM by having double the floating point precision. Digital would not have succeeded without it. They accomplished double the precision with hidden bit normalization.

Hidden bit normalization is a clever trick of the binary number system.

In any floating point representation, you have a mantissa and an exponent. The mantissa holds the bits of significance. Since it's both binary and represented with scientific notation, the digit left of the binary point will always be "1" for any nonzero value. If that digit were "0", the point should be shifted to the right and the exponent decremented. In other words, it would be an error of normalization. Since the mantissa is always "1.x", the "1." was not stored. The bit that freed up was used to increase precision by one binary digit.
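
The same trick survives in modern hardware, so it is easy to poke at. The sketch below uses IEEE 754 single precision (not the old DEC format) and only handles normal numbers. The function names are made up for illustration.

```python
import struct

def decompose_float32(x):
    """Split a float into the sign, exponent and fraction fields of IEEE 754 binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 stored exponent bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored mantissa bits, leading 1 not stored
    return sign, exponent, fraction

def rebuild_normal(sign, exponent, fraction):
    """Rebuild the value of a normal number by restoring the hidden bit."""
    significand = (1 << 23) | fraction   # 24 bits of precision from 23 stored bits
    value = significand * 2.0 ** (exponent - 127 - 23)
    return -value if sign else value

print(decompose_float32(1.5))                    # (0, 127, 4194304)
print(rebuild_normal(*decompose_float32(1.5)))   # 1.5
```

Only 23 mantissa bits are stored, yet the arithmetic works with 24 because the leading 1 is implied.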

Tell your wives and girlfriends. If that piece of computing history gets you laid, report back immediately and provide details.

TomB16 · Jun 14, 2025

Here is a finer point on the AMD NPU topic that should be of interest to investors.

The MI350/355 is promised for Q3 of this year. That is extremely soon. That's when it is scheduled to ship to partners, not customers.

AMD's ROCm is much further along than just about anyone realizes with v7. Last I heard, it was not quite feature complete but it could easily be so by now. It's going to take some time to stabilize ROCm. It will need a lot of debugging and optimization. That will take time. That's why I suggested next year for the MI350.

If you pay attention, people who criticize AMD software will either not cite a version and cast shade with a broad brush, or they will cite how poor ROCm v5 is. It's going to take some time to break through the lies and demonstrate competence but I believe they are on their way. I also believe AMD is likely to achieve their time target. There may be a bug or two that needs to be worked out but I think it will be ready.

BUT... MI400 is scheduled to come out in 2026. It will probably be somewhat early in 2026 with volume late in the first half.

The MI400 is an absolute juggernaut. It scales to 1024 GPUs, 20TB/s of memory bandwidth.

AMD claims 300GB/s of "scale out bandwidth" for MI400. It remains to be known what this means. PCIe v6 x16 would only be about 128GB/s in each direction. Vendors like to double that number for full duplex connections so that would be 256. The rest must come from Infinity Fabric or some new interconnect.

Whatever the case, I do not expect nVidia to answer with superior performance. I expect they will soon be playing catch up.

The real fulcrum, on which AMD NPU dominance will need to be built, is ROCm. If ROCm is not good, AMD is dead in the NPU space. If ROCm is good, I expect AMD hardware to compete strongly with nVidia within 12 months.

Lastly, I don't know what to think about AMD's Helios rack architecture, in terms of customer buy-in. I suspect there will be some resistance, since the 19 inch rack has been standard longer than I've been alive. Also, I'm old. The double width rack makes a lot of sense but it breaks from an entrenched standard, so watch this space. I suspect it will dominate. There is a bit of efficiency to being able to house 1024 GPUs in a single rack.

MI400 will out-scale nVidia's Hopper but I trust MI400 will be competing with son-of-Hopper by next year, so it's a moving target.

This seems to be the bulk of the semi industry profit flow for the next few years. Whoever takes AI will be the most profitable.

TomB16 · Jul 4, 2025

It appears Zen 6 will be the hot gaming chip for 18 months. Zero chance Intel can catch up or even get close.

I'm not sure what this means. The PC gaming market is worth many billions of dollars and AMD has killed Intel with nVidia now taking a severe beating. This will net AMD some nice revenue but I believe the primary objective is the AI market where AMD continues to be the underdog. AMD is a credible player in AI with some great looking projects that could turn the tide but they are well off the lead, right now. Right now is what matters because PowerPoint decks are not part of base reality.

TomB16 · Jul 15, 2025

It's confirmed, Zen 6 is well out of the first couple of rounds of engineering samples and is now in the hands of technology partners. They're already working aggressively on clock ramps.

A few posts ago, I lamented that AMD hasn't picked up the mantle of leading X86_64 architecture. Zen 7 is going to have a whole bunch of new stuff, including instruction set extensions. Zen 7 in 2028 is going to bury Intel. Intel needs to slim down and stop making massive bets on regaining the lead so they can survive as a value chip maker.

Forget ARM. AMD will own the world.

TomB16 · Jul 29, 2025

I've been looking at RDNA 5 and things have become pretty clear. AMD is not trying to catch nVidia. Their goal is to beat them.

They will have massive parallelism with 70 cores for GPU parts. They are specifying 68 cores, so they can have up to 2 failed cores and still ship the part. This is a brilliant response to N3C defect rates to maintain yield. They are thinking huge.

Top end AI dies will probably have 192 cores. Energy efficiency will be way, way up. I have strong confidence AMD energy efficiency will achieve quite a few wins for the Radeon platform.

With fat busses and multi-channel LDDR7, they will drop Infinity cache. No need. They will have massive, native, bandwidth. That should improve latency a whole lot.

There are RDNA 5 engineering samples in testing, right now. Release looks like early 2026 but I doubt they are committed to any dates, at this time. They're probably waiting for ROCm and their platforms to mature. I highly doubt they are waiting on GPU hardware. They could have next gen GPUs in the channel in four months.

Features and performance are often dropped before release. If AMD can achieve half of what they're working on, they will dominate nVidia for some time.

nVidia's next generation looks like it is targeting 1H2027.

TomB16 · Jul 29, 2025

FWIW, Zen 6 looks like it will be a significant step forward. There is no question, AMD will be the unrivaled CPU leader.

However, the work AMD is doing on Radeon is far more exciting, from a business perspective.

If AMD can eat nVidia's lunch, that will reshape the IT landscape.

TomB16 · Jul 29, 2025

One last point. Zen 6 will be the last evolution of the Ryzen architecture. They might call the follow on processor Zen 7 but it will be a new architecture.

TomB16 · Aug 5, 2025

July CPU sales in Germany were 94% AMD. AMD is also dominating in other countries but in most countries, including the US, it's impossible to get specific metrics of all sales. The best you can do is get metrics from specific vendors, which I do.

Intel is hosed with no knob to turn off the cleansing flow.

TomB16 · Aug 31, 2025

I've been studying AMD's HELIOS rack design and I missed an important design cue that changes everything. I'm now 75% confident AMD will beat nVidia in AI efficiency, cost, and scale in 2026.

RDNA 5 will put AMD ahead of nVidia with substantially more efficient CU cores, more cores per die, and multi-die with an inventory of symbiotic dies to construct just about anything. The specs are so far beyond where nVidia is now, it seems unlikely nVidia will be able to respond. 80% chance AMD will have substantially more powerful nodes than anything nVidia can respond with, next generation.

HELIOS is a big step forward over the rest of the industry. Their double width rack is a unicorn but it will allow them to have larger clusters with more nodes in a switching fabric. This is a tremendous advantage.

So many disparaging things have been written about ROCm that it's difficult to imagine it can all be wrong but I'm reasonably confident it is. ROCm is the wild card that will decide if AMD is propelled to upper middle of the pack or to the lead. I suspect ROCm will have some teething pains but, if AMD can bring it to bear on AI, AMD will dominate.

ROCm is designed to scale to roughly as many nodes as CUDA. The key is the thread block limit. Thread block matrices cannot scale to an entire farm. AMD's strategy to focus on node and cluster performance will make thread blocks substantially more efficient. This is the key.

There are efficiency problems at scale. 10 nodes cannot do 10x the work of 1 node. It's become clear, AMD is concentrating on node level performance. This is counterintuitive, since AMD is currently in second place in node power but this is a nuanced victory with AMD and nVidia having different strengths. RDNA 5 is designed to change the landscape. It's not just more shaders and adding as many CUs as a die shrink will allow.
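
To put a rough number on why 10 nodes don't do 10x the work, here is a sketch using Amdahl's law. That is a textbook model chosen for illustration, not anything AMD has published, and the 10% serial fraction is a hypothetical value.

```python
def amdahl_speedup(nodes, serial_fraction):
    """Speedup over one node when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

# Hypothetical workload where 10% of the time is communication or other serial work
for nodes in (1, 10, 100):
    print(nodes, round(amdahl_speedup(nodes, 0.10), 2))
```

Under this model, the non-parallel share of the work caps the speedup no matter how many nodes are added, which is one reason faster individual nodes matter.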

There is a lot to execute here. HELIOS is in place and a big win. RDNA 5 and ROCm are not in place, although they are both in late alpha or early beta. AMD appears to have a world-beating strategy. Now it's down to the people who do the work to put AMD in the lead. Time will tell.

TomB16 · Aug 31, 2025

Building on the previous post but a slightly different topic, AMD's RDNA 5 GPU die is designed around the maximum possible die size. They have bet their AI future on TSMC being able to scale their N3 node with strong yields.

Using max die size would normally produce dismal yields and massive discards. AMD's strategy is to have a high number of compute units and deactivate units which fail QC. In this way, a GPU with 88 cores will be specified to have 80 cores so AMD can ship a part with up to 8 failed cores, roughly 9% of the total.

As I understand, any wafer is expected to have 3 inclusions in an extremely well optimized lithography production line. It sounds like the numbers have been tremendously higher lately on leading edge nodes. Inclusions are foreign particles that damage the circuit structure. Dies that contain an inclusion are traditionally thrown out. This is why max die size is directly tied to yield. AMD has moved yield performance from the physical tier to the functional tier with their individual core QC strategy.
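
Here is a rough sketch of why spare cores help so much. It assumes every core fails independently with the same probability and that defects only land in cores. Both are simplifications, and the 2% failure rate is a made-up number, not AMD or TSMC data.

```python
from math import comb

def ship_probability(total_cores, required_cores, core_fail_prob):
    """Chance a die has at least required_cores working cores (binomial model)."""
    spare = total_cores - required_cores
    return sum(
        comb(total_cores, k)
        * core_fail_prob ** k
        * (1 - core_fail_prob) ** (total_cores - k)
        for k in range(spare + 1)   # k = number of failed cores we can tolerate
    )

p = 0.02  # hypothetical per-core failure probability
print("all 88 must work:", ship_probability(88, 88, p))
print("80 of 88 is enough:", ship_probability(88, 80, p))
```

With no spares, every core has to come out perfect. With 8 spares, the die only gets thrown out if 9 or more cores fail, which is far less likely under this model.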

While this sounds trivial, it isn't easy to be able to activate/deactivate cores and retain performance.
