  • The training is sophisticated, but at inference time it really is just a text prediction machine. Technically token prediction, but you get the idea.

    This happens for every single token/word: you input your system prompt, context, and user input, then the output starts.

    The

    Feed the entire context back in and add the reply “The” at the end.

    The capital

    Feed everything in again with “The capital”

    The capital of

    Feed everything in again…

    The capital of Austria

    It literally works like that, which sounds crazy :)

    The only control you have as a user is over the sampling: temperature, top-k and so on. But that just softens and randomizes how deterministically the next token is picked.

    Edit: I should add that tool and subagent use makes this approach a bit more powerful nowadays. But it all boils down to text prediction again; even the tools are described to the model in plain text.
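The loop described above can be sketched in a few lines. This is a toy, not a real LLM: a real model computes the next-token distribution with a neural network, while the hard-coded table here (tokens and probabilities all made up) just stands in so the loop structure is visible.

```python
import random

# Stand-in for the model: map the full context so far to a
# probability distribution over candidate next tokens.
NEXT_TOKEN_TABLE = {
    "": {"The": 0.9, "A": 0.1},
    "The": {"capital": 0.8, "city": 0.2},
    "The capital": {"of": 1.0},
    "The capital of": {"Austria": 0.7, "France": 0.3},
    "The capital of Austria": {"<eos>": 1.0},
}

def next_token_probs(context):
    return NEXT_TOKEN_TABLE.get(context, {"<eos>": 1.0})

def sample(probs, temperature=1.0, top_k=None):
    # Sampling is the only knob the user has: temperature reshapes the
    # distribution, top-k truncates it to the k most likely tokens.
    items = sorted(probs.items(), key=lambda kv: -kv[1])
    if top_k is not None:
        items = items[:top_k]
    weights = [p ** (1.0 / temperature) for _, p in items]
    r = random.random() * sum(weights)
    for (token, _), w in zip(items, weights):
        r -= w
        if r <= 0:
            return token
    return items[-1][0]

def generate(prompt, temperature=0.01):
    context = prompt
    while True:
        token = sample(next_token_probs(context), temperature)
        if token == "<eos>":
            return context
        # Append the new token and feed the WHOLE thing back in.
        context = (context + " " + token).strip()

print(generate(""))  # near-greedy at low temperature: "The capital of Austria"
```

With the temperature turned up, the same loop starts picking lower-probability tokens now and then, which is all that "creativity" settings really do.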


  • Decent sized for what?

    Creative writing and roleplay? Plenty of options, but I try to fit them into my 16 GB of VRAM, as otherwise it’s too slow for my liking.

    Coding/complex tasks? No, that would need 128 GB and upwards, and it would still be awfully slow. Unless you use a Mac with unified memory.

    For image and video generation you’d want to fit the model into GPU VRAM again; system RAM would be way too slow.
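As a rough back-of-envelope check on those sizes (illustrative rule of thumb only, ignoring KV cache and runtime overhead, which come on top):

```python
def weight_gb(params_billion, bits_per_weight):
    # Rule of thumb: weight memory in GB ~= parameters (in billions)
    # * bits per weight / 8. Context/KV cache and overhead are extra.
    return params_billion * bits_per_weight / 8

# A ~13B model quantized to 4 bits: ~6.5 GB of weights, so it fits a
# 16 GB card with room left for context.
print(weight_gb(13, 4))   # 6.5
# A ~70B model at FP16: ~140 GB, i.e. the "128 GB and upwards" class
# (or a Mac with a lot of unified memory).
print(weight_gb(70, 16))  # 140.0
```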



  • You might genuinely be using it wrong.

    At work we have a big push to use Claude, but as a tool, not a developer replacement. And it’s working pretty damn well when properly set up.

    Mostly using Claude Sonnet 4.6 with Claude Code. It’s important to run /init and check the output; that will produce a CLAUDE.md file describing your project (which always gets added to your context).

    Important: review everything the AI writes; this is not a hands-off process. For bigger changes use planning mode and split tasks up: the smaller the task, the better the output.

    Claude Code automatically uses subagents to fetch information, e.g. API documentation. Nowadays it’s extremely rare for it to hallucinate something that doesn’t exist. It might use outdated info and need a nudge, like after the recent upgrade to .NET 10 (but just adding that info to the project context file is enough).
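For illustration, a CLAUDE.md is just a plain Markdown description of the project. A trimmed-down sketch might look like this (all names and details hypothetical; /init generates a project-specific version that you should review and edit):

```markdown
# CLAUDE.md

## Project
ExampleShop: an ASP.NET Core web API plus a React frontend.

## Build & test
- `dotnet build` / `dotnet test` from the repo root
- Frontend: `npm ci && npm test` in `web/`

## Conventions
- Target framework is .NET 10 (recently upgraded; prefer current APIs)
- All new endpoints need integration tests
```

This is the kind of place where the ".NET 10" nudge mentioned above lives, so you don’t have to repeat it every session.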



  • Maybe it has changed again, but I gave it a try in the past, first when 16 GB was a lot, then again when 32 GB was a lot. I always thought “Not filling up the RAM anyway, might as well disable it!”

    Yeah, no, Windows is not a fan. You get random “running out of memory” errors, even though with 16 GB I still had 3-4 GB of RAM free.

    Some apps require the page file, and so do crash dumps. So I just set it to a fixed size (32 GB min and max) on my 64 GB machine.
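If you want a fixed-size page file from the command line instead of the System Properties dialog, something along these lines works on Windows (run in an elevated prompt; 32768 MB = 32 GB, matching the setup above; note that wmic is deprecated on newer builds):

```shell
:: Turn off automatic page file management
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False

:: Pin the page file to a fixed 32 GB (min = max), then reboot
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=32768,MaximumSize=32768
```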





  • Prices have already stabilized and even had some dips. The 2x32 GB 6000 CL30 kit that I bought for 200€ in early 2025 spiked to 900€. Now it costs 800€, and in recent weeks it was even possible to get it for 609€ during a dip.

    Don’t forget that Chinese manufacturers are entering the DDR5 market; it will take a year or two, but more supply is coming.

    When the AI bubble pops (which can still take a while), memory prices will also plummet. The main reason prices are up is that the memory companies are being careful this time around: last time they built factories and raised supply, only to get burned. Now they are hedging their bets instead of expanding.


  • Of course they already have stock; the original launch date was Q1 2026, which ends in three weeks. They might not have a ton of devices, but they do have stock, the same way they have been sending out dev kits.

    RAM prices have spiked since November/December; before that it was cheap. I got my 2x32 GB 6000 CL30 kit for 200€.

    Valve has the issue that they could deliver the first batch for “cheap”, but every unit they produce afterwards suddenly costs a chunk more (like 200-300€). So it’s either a paper launch, or they’d have to raise the price above the Index, which wouldn’t lead to good PR.