AMD - Advanced Micro Devices, Inc

Discussion in 'Stock Message Boards NYSE, NASDAQ, AMEX' started by Stockaholic, Mar 31, 2016.

  1. TomB16

    TomB16 Well-Known Member

    Joined:
    Jun 22, 2018
    Messages:
    4,819
    Likes Received:
    2,972
    Nerd alert.

    20FP is a complex term that nobody here is likely to understand, including me. I had to do some reading. It's marketing shorthand for 20 TOPS of FP16-like throughput.

    They have created an FP16 shorthand for cases in which a large block of numbers shares the same exponent. This is an extremely common case. AMD's optimization is to store the exponent only once per block and keep only the mantissas (mantissae?) for the individual numbers. This form of data compression makes both block loading and numerical operations faster.
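
    For anyone who wants to see the shared exponent idea in code, here is a minimal sketch. The function names, the 8 bit mantissa width, and the block size are my own assumptions for illustration. This is not AMD's actual format, just the general technique of one exponent per block plus small integer mantissas.

```python
import math

def bfp_encode(values, mantissa_bits=8):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas.

    mantissa_bits is an assumed width for illustration, not a vendor figure.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp returns max_abs = m * 2**e with 0.5 <= m < 1, so every |v| < 2**e.
    _, shared_exp = math.frexp(max_abs)
    # One bit is reserved for the sign, leaving mantissa_bits - 1 magnitude bits.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest integer and clamp so the largest value cannot overflow.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def bfp_decode(shared_exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and the mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

block = [0.5, -0.25, 0.125, 0.75]
exp, mants = bfp_encode(block)
restored = bfp_decode(exp, mants)
```

    With values of similar magnitude, like the block above, nothing is lost. The trade-off shows up when one value is much smaller than the largest value in its block: it gets rounded toward zero, because the shared exponent is set by the largest value.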


    Incel nerd alert.

    I'll bet nobody knows this. Not even engineers. You're all way too young.

    In the very early days of computing, Digital Equipment Corporation (later DEC) took a big chunk of the scientific community away from IBM by having double the floating point precision. Digital would not have succeeded without it. They accomplished double the precision with hidden bit normalization.

    Hidden bit normalization is a clever trick of the binary number system.

    In any floating point representation, you have a mantissa and an exponent. The mantissa holds the bits of significance. Since it's both binary and represented with scientific notation, the digit left of the binary point will always be "1" for any nonzero number. If that digit were "0", the point should be shifted to the right and the exponent decremented. In other words, it would be an error of normalization. Since the mantissa is always "1.x", the "1." was not stored. The bit that was freed up was used to increase precision by one binary digit.
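
    You can still see the hidden bit today. Here is a rough sketch using IEEE 754 single precision as a modern stand-in, since it uses the same trick. It does not model DEC's original layout, and the function names are mine. It only handles normalized numbers, not zero, subnormals, infinities, or NaN.

```python
import struct

def decode_float32(x):
    """Split an IEEE 754 single-precision value into sign, exponent, and stored fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored bits, leading "1." not included
    return sign, exponent, fraction

def rebuild_normal(sign, exponent, fraction):
    """Rebuild a normalized value by putting the implicit leading 1 back."""
    significand = (1 << 23) | fraction   # 24 bits of precision from 23 stored bits
    value = significand * 2.0 ** (exponent - 127 - 23)
    return -value if sign else value

sign, exponent, fraction = decode_float32(6.5)   # 6.5 is 1.625 * 2**2
roundtrip = rebuild_normal(sign, exponent, fraction)
```

    The stored fraction for 6.5 only holds the ".625" part. The "1." comes back when the value is rebuilt, which is the extra bit of precision for free.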

    Tell your wives and girlfriends. If that piece of computing history gets you laid, report back immediately and provide details.
     
    #2781 TomB16, Jun 14, 2025 at 2:34 PM
    Last edited: Jun 14, 2025 at 2:41 PM
  2. TomB16

    Here is a finer point on the AMD NPU topic that should be of interest to investors.

    The MI350/355 is promised for Q3 of this year. That is extremely soon. That's when it is scheduled to ship to partners, not customers.

    AMD's ROCm is much further along than just about anyone realizes with v7. Last I heard, it was not quite feature complete but it could easily be so by now. It will still need a lot of debugging and optimization before it stabilizes, and that will take time. That's why I suggested next year for the MI350.

    If you pay attention, people who criticize AMD software will either not cite a version and throw shade with a broad brush, or they will cite how poor ROCm v5 is. It's going to take some time to break through the lies and demonstrate competence but I believe they are on their way. I also believe AMD is likely to hit their time target. There may be a bug or two that needs to be worked out but I think it will be ready.

    BUT... MI400 is scheduled to come out in 2026. It will probably be somewhat early in 2026 with volume late in the first half.

    The MI400 is an absolute juggernaut. It scales to 1024 GPUs, 20TB/s of memory bandwidth.

    AMD claims 300GB/s of "scale out bandwidth" for MI400. It remains to be seen what this means. PCIe v6 x16 would only be about 128GB/s in each direction (64GB/s is the PCIe v5 figure). Vendors like to double that number for full duplex connections, so that would be 256. The rest must come from Infinity Fabric or some new interconnect.
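
    As a rough check on the PCIe arithmetic, here is a small sketch that works out raw x16 bandwidth from the per-lane transfer rate (32 GT/s for PCIe v5, 64 GT/s for v6). It ignores encoding and protocol overhead, so real throughput is a bit lower. The function names are mine, and the 300GB/s figure is just the claim quoted in this post.

```python
# Per-lane transfer rates in GT/s for recent PCIe generations.
PCIE_GTS = {4: 16, 5: 32, 6: 64}

def pcie_raw_gbytes(gen, lanes=16):
    """Raw one-direction bandwidth in GB/s, ignoring encoding and protocol overhead."""
    return PCIE_GTS[gen] * lanes / 8   # 8 bits per byte

def unexplained_gbytes(claimed, gen, lanes=16):
    """Bandwidth left over after subtracting a full duplex (doubled) PCIe link."""
    return claimed - 2 * pcie_raw_gbytes(gen, lanes)

gen5_one_way = pcie_raw_gbytes(5)         # 64.0
gen6_one_way = pcie_raw_gbytes(6)         # 128.0
gap_vs_gen6 = unexplained_gbytes(300, 6)  # 44.0
```

    Even counting both directions of a PCIe v6 x16 link, there is a gap between that and the claimed number, so some other link has to be involved.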

    Whatever the case, I do not expect nVidia to answer with superior performance. I expect they will soon be playing catch up.

    The real fulcrum, on which AMD NPU dominance will need to be built, is ROCm. If ROCm is not good, AMD is dead in the NPU space. If ROCm is good, I expect AMD hardware to compete strongly with nVidia within 12 months.

    Lastly, I don't know what to think about AMD's Helios rack architecture, in terms of customer buy-in. I suspect there will be some resistance, since the 19 inch rack has been standard longer than I've been alive. Also, I'm old. The double width rack makes a lot of sense but it breaks from an entrenched standard, so watch this space. I suspect it will dominate. There is a bit of efficiency to being able to house 1024 GPUs in a single rack.

    MI400 will out-scale nVidia's Hopper but I trust MI400 will be competing with son-of-Hopper by next year, so it's a moving target.

    This seems to be the bulk of the semi industry profit flow for the next few years. Whoever takes AI will be the most profitable.
     
    #2782 TomB16, Jun 14, 2025 at 3:00 PM
    Last edited: Jun 14, 2025 at 6:52 PM
