Is it just memory bandwidth? Or is it that AMD isn’t supported well enough by PyTorch for most products? Or some combination of the two?

  • Naz@sh.itjust.works · 1 year ago

    I’ve gotten LLaMA running locally using CLBlast on an AMD GPU, with the CPU working simultaneously (basically an APU execution pathway).
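
    For reference, a minimal sketch of that kind of setup via the llama-cpp-python bindings, assuming the package was compiled against CLBlast (the model path and layer/thread counts below are hypothetical, just to show the split between GPU and CPU):

    ```python
    # Minimal sketch: llama-cpp-python with a CLBlast-enabled build,
    # e.g. installed via CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-7b.Q4_K_M.gguf",  # hypothetical local model file
        n_gpu_layers=32,  # layers offloaded to the AMD GPU through CLBlast
        n_threads=8,      # remaining layers run on the CPU (the "APU pathway")
    )

    out = llm("Q: Why is AMD behind in ML software? A:", max_tokens=100)
    print(out["choices"][0]["text"])
    ```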

    AMD is seriously slacking when it comes to machine learning: the hardware is uber-powerful, but, just like everyone complains, the software isn’t there.

    ROCm doesn’t even work on Windows, FFS.

    You can run models on almost anything, but token generation is extremely slow: something like 0.2-0.6 tokens per second, so a minimally coherent response of 100 tokens can mean waiting upwards of 5 minutes. That’s abysmal.
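
    To put those rates in perspective, a quick back-of-the-envelope calculation (plain Python, using the figures quoted above):

    ```python
    # Wait times for a minimally coherent 100-token reply at the quoted rates.
    MIN_COHERENT_TOKENS = 100

    for rate in (0.2, 0.6):  # tokens per second
        minutes = MIN_COHERENT_TOKENS / rate / 60
        print(f"{rate} tok/s -> {minutes:.1f} min for {MIN_COHERENT_TOKENS} tokens")

    # 0.2 tok/s -> 8.3 min; 0.6 tok/s -> 2.8 min
    ```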