Is it just me or do the clocks frequently break or change appearance without the page being refreshed?
Edit: never mind, I skipped past the sentence explaining that every minute, the site prompts LLMs for a new solution. It’s hilariously sad how LLMs aren’t able to be consistent from one prompt to another.
When I tried it, Kimi K2 was surprisingly consistent and not even as bad as the others. Occasionally the numbers or hands (I couldn’t really tell which) were positioned a bit off; for example, the seconds hand would appear to be horizontal but the 9 or 3 would be slightly below or slightly above the hand. But whoever can center a div may throw the first stone, and it’s not going to be me for sure.
if every single token is, at the end, chosen by random dice roll (and they are) then this is exactly what you’d expect.
that’s a massive oversimplification
not really. If the system outputs a probability distribution, then by definition, you’re picking somewhat randomly. So not really a simplification
Apologies for the late reply, but it turns out I can’t let that sit. Sorry for the rant, but I work in RL and saying “it’s just dice rolls” is insulting to my entire line of work. :(
A probability distribution is not the same as a random dice roll. Dice rolls are uniformly and independently random, whereas the probability distributions for LLMs are conditional on the context and the model’s learned parameters. Additionally, all modern LLMs use top-k and top-p sampling, which filters the probability distribution down to only high-confidence words, so the probability of it choosing to say random garbage is exactly zero.
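To make the filtering step concrete, here is a minimal sketch of top-k followed by top-p filtering on a toy distribution. The vocabulary, logits, and function names are made up for illustration, and real inference stacks differ in the details; the point is only that a token outside the kept set ends up with probability exactly zero, while the choice among the kept tokens is still a weighted random draw.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p_filter(probs, k, p):
    """Keep the k most likely tokens, then the shortest prefix of those
    whose cumulative mass reaches p, and renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered

def sample(probs, rng):
    """Weighted random draw of one token index."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical toy vocabulary and logits, not taken from any real model.
vocab = ["12", "3", "6", "9", "banana"]
logits = [2.0, 1.5, 1.0, 0.5, -3.0]

filtered = top_k_top_p_filter(softmax(logits), k=4, p=0.9)
rng = random.Random(0)
draws = [vocab[sample(filtered, rng)] for _ in range(1000)]
print("banana" in draws)  # the filtered-out token is never drawn
```

Changing `k` or `p` changes which tokens survive, but whatever survives is still sampled according to its renormalized weight.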
The issues with LLMs have nothing to do with their sampling from random distributions. That’s just a minor part of their training, and some LLMs don’t even do random sampling since they use tree search. The issues with LLMs are the result of people trying to teach them intelligence using behavior cloning on a corpus of human words and images. Words can’t encode wisdom, only knowledge. Wisdom can only be gained through lived experience.
How well do you think you would perform if you were born into a cave, forced to read a thousand dictionaries in order with no context, and then your only interaction with the outside world was a single question from a single human, and then you died? If you ask me, the LLMs are doing surprisingly well given their “lived experiences”.
“conditional on the context and the model’s learned parameters.” you seem to be under the wrong impression that “random dice roll” == “random dice roll from a uniform distribution”. I didn’t say that. If it outputs a probability distribution, which it does, then you sample it randomly according to that distribution, not a uniform one.
As for your last paragraph: I wasn’t, I didn’t do that, and if that’s all the system can do then people should stop claiming it is even remotely intelligent. Whatever the excuses, the systems aren’t (and won’t be getting) there. If you’re trying to get me to empathize with a couple of matrices, then you’re not going to succeed.
I don’t care if you get offended because someone else doesn’t like your line of work. I think what you do is actively harmful to humanity. I also dislike weapons manufacturers; how they feel about it is irrelevant. You’re no different.
This is my favorite

The last one, Kimi K2, has been consistently good as long as I’ve been looking at it. That’s pretty impressive.
The rest are hilarious!
Haha, I found myself thinking the same thing, and then caught myself, realizing all the other LLMs on this page had lowered the bar immensely for what I’m considering impressive.
I thought the same and then Kimi K2 came up with a clock that has two 12s and no 11…
By far the best, but still off. These three were loaded in the same order as I post them:



I dig the square clock, and am now sad that the numbers can’t be put into the corners on a real clock. Unless they’re shifted from the usual position.
Cartier found a workaround quite some time ago, and maybe they weren’t even the first to design a square ‘clock’:
(The Roman numerals are nice, but notice the ‘circle’ between the numerals and the hands, almost like the circle from the AI)
That one is pretty good, though the Roman numerals are rather busy and uneven.
This one is closer, though now I have to wonder if all non-square rectangular clocks have an old-timey whiff for me, or it’s just the border here:

This is also impressive:




