scruiser@awful.systems to TechTakes@awful.systems • Apple: ‘Reasoning’ AIs fail hard if they actually have to think • English • 13 points • 2 days ago

Another thing that’s been annoying me about responses to this paper… lots of promptfondlers are suddenly upset that we are judging LLMs by arbitrary puzzle-solving capabilities… as opposed to the arbitrary and artificial benchmarks they love to tout.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 15th June 2025 • English • 21 points • 2 days ago

So, I’ve been spending too much time on subreddits with a heavy promptfondler presence, such as /r/singularity, and the reddit algorithm keeps recommending me subreddits with even more unhinged LLM hype. One annoying trend I’ve noted is that people constantly conflate LLM-hybrid approaches, such as AlphaGeometry or AlphaEvolve (or even approaches that don’t involve LLMs at all, such as AlphaFold), with LLMs themselves. From there they act like of course LLMs can [insert things LLMs can’t do: invent drugs, optimize networks, reliably solve geometry exercises, etc.].
Like I saw multiple instances of commenters questioning/mocking/criticizing the recent Apple paper using AlphaGeometry as a counterexample. AlphaGeometry can actually solve most of the problems without an LLM at all: the LLM component replaces a set of heuristics that make suggestions on proof approaches, while the majority of the proof work is done by a symbolic AI working within a rigid formal proof system.
I don’t really have anywhere I’m going with this; it’s just something I’ve noticed that I don’t want to waste the energy repeatedly re-explaining on reddit, so I’m letting a primal scream out here to get it out of my system.
scruiser@awful.systems to TechTakes@awful.systems • Apple: ‘Reasoning’ AIs fail hard if they actually have to think • English • 7 points • 2 days ago

Just one more training run, bro. Just gotta make the model bigger, then it can do bigger puzzles, obviously!
scruiser@awful.systems to TechTakes@awful.systems • Apple: ‘Reasoning’ AIs fail hard if they actually have to think • English • 30 points • 2 days ago

The promptfondlers on places like /r/singularity are trying so hard to spin this paper. “It’s still doing reasoning, it just somehow mysteriously fails when its reasoning gets too long!” or “LRMs improved with an intermediate number of reasoning tokens” or some other excuse. They are missing the point that short and medium-length “reasoning” traces are potentially the result of pattern memorization. If the LLMs are actually reasoning and aren’t just pattern-memorizing, then extending the number of reasoning tokens proportionately with the task length should let the LLMs maintain performance on the tasks instead of catastrophically failing. Because this isn’t the case, Apple’s paper is evidence for what big names like Gary Marcus and Yann LeCun, along with many pundits and analysts, have been repeatedly saying: LLMs achieve their results through memorization, not generalization, and especially not out-of-distribution generalization.
scruiser@awful.systems to TechTakes@awful.systems • Deep in Mordor where the shadows lie: Dystopian tales of that time when I sold out to Google • English • 8 points • 4 days ago

A surprising number of the commenters seem to be at least considering the intended message… which makes the number of comments failing at basic reading comprehension that much more absurd by contrast (seriously, it’s absurd how many comments somehow missed that the author was living in and working from Brazil, and felt it didn’t reflect badly on them to say as much in the HN comments).
scruiser@awful.systems to TechTakes@awful.systems • Manifest 2025 Update: Still A Tech Bro Eugenics Conference • English • 7 points • 4 days ago

> I struggle to think of a good reason why such prominent figures in politics and tech would associate themselves with such an event.

There is no good reason, but there is an obvious bad one: these prominent figures have racist sympathies (if they aren’t “outright” racist themselves) and, between a lack of empathy and a position of privilege, don’t care about the negative effects of boosting racist influencers.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 8th June 2025 • English • 8 points • 5 days ago

I’ve been waiting for this. I wish it had happened sooner, before DOGE could do as much damage as it did, but better late than never. Donald Trump isn’t going to screw around, and, ironically, DOGE has shown you don’t need congressional approval or actual legal authority to screw over people funded by the government, so I am looking forward to Donald screwing over SpaceX or Starlink’s government contracts. On the return-fire end… Elon doesn’t have that many ways of properly screwing with Trump; even if he has stockpiled blackmail material, I don’t think it will be enough to turn MAGA against Trump. Still, I’m somewhat hopeful this will lead to larger infighting between the techbro alt-righters and the Christofascist alt-righters.
scruiser@awful.systems to TechTakes@awful.systems • OpenAI engineers are flocking to its rival Anthropic. “They let us huff our own farts,” says one • English • 20 points • 5 days ago

- “tickled pink” is a saying for finding something humorous
- “BI” is Business Insider, the outlet that ran the linked article
- “chuds” is a term for online alt-right losers
- OFC: of fucking course
- “more dosh” means more money
- “AI safety and alignment” is the standard thing we sneer at here: making sure the coming future acausal robot god is a benevolent god. Occasionally reporters misunderstand it, or more PR-savvy promptfarmers misrepresent it, to mean stuff like stopping LLMs from saying racist shit or giving you recipes that would accidentally poison you, but that isn’t its central meaning. (To give the AI safety and alignment cultists way too much charity, making LLMs not say racist shit or give harmful instructions has been something of a spin-off application of their plans and ideas to “align” AGI.)
scruiser@awful.systems to TechTakes@awful.systems • Deep in Mordor where the shadows lie: Dystopian tales of that time when I sold out to Google • English • 8 points • 6 days ago

I’ve seen articles and blog posts picking at bits and pieces of Google’s rep (lots of articles and blogs on their role in the ongoing enshittification, and I recall one article on Google rejecting someone on the basis of a coding interview despite that person being the creator and maintainer of a very useful open-source library, although that article was more a criticism of coding interviews and the mystique of FAANG companies in general), but many of these criticisms portray the problems as a more recent thing, and I haven’t seen as thorough a takedown as mirrorwitch’s essay.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 8th June 2025 • English • 9 points • 7 days ago

It is definitely of interest; it might be worth making it a post of its own. It’s a good reminder that even before Google cut the phrase “don’t be evil”, they were still a megacorporation, just with a slightly nicer veneer.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 8th June 2025 • English • 8 points • 9 days ago

Yeah, the commitment might be only a token amount of money as a deposit, or maybe even less than that. A sufficiently reliable and cost-effective (which will have to include fuel and maintenance costs) supersonic passenger plane doesn’t seem impossible in principle? Maybe cryptocurrency, NFTs, LLMs, and other crap like Theranos have given me low standards on startups: at the very least, Boom is attempting to make something that is possible in principle (for within an OOM of their requested funding), and that, if it actually works, would be neither useless nor criminal and would solve a real (if niche) need. I wouldn’t be that surprised if they eventually produce a passenger plane… a decade from now, well over the originally planned budget target, that is too costly to fuel and maintain for all but the most niche clientele.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 8th June 2025 • English • 7 points • 9 days ago

I just now heard about it here. Reading about it on Wikipedia… they had a mathematical model that said their design shouldn’t generate a sonic boom audible from ground level, but it was possible their mathematical model wasn’t completely correct, so building a 1/3-scale prototype (apparently) validated their model? It’s possible their model won’t be right about their prospective design, but if it was right about the 1/3 scale, then that is good evidence their model will be right? idk, I’m not seeing much that is sneerable here; it seems kind of neat. Surely they wouldn’t spend the money on the 1/3-scale prototype unless they actually needed the data (as opposed to it being a marketing ploy, or worse yet, a ploy for more VC funds)… surely they wouldn’t?

iirc about the Concorde (one of only two supersonic passenger planes), it isn’t so much that supersonic passenger planes aren’t technologically viable, it’s more a question of economics (with some additional issues around noise pollution and other environmental concerns). Limits on their flight paths because of the sonic booms were one of the problems with the Concorde, so at least Boom won’t have that problem. And as to the other questions… Boom Supersonic’s webpage directly addresses them, though not in any detail, but at least they address them…
Looking for some more skeptical sources… this website seems interesting: https://www.construction-physics.com/p/will-boom-successfully-build-a-supersonic . They point out some big problems with Boom’s approach. Boom is designing both its own engine and its own plane, and the costs are likely to run into the limits of their VC funding even assuming nothing goes wrong. And even if they get a working plane and engine, the safety, cost, and reliability needed for a viable supersonic passenger plane might not be met. And… the XB-1 didn’t actually reach Mach 2.2 and was retired after only a few flights. Maybe it was a desperate ploy for more VC funding? Or maybe it had some unannounced issues? Okay… I’m seeing why this is potentially sneerable. There is a decent chance they entirely fail to deliver a plane with the VC funding they have, and even if they get that far, it is likely to fail as a commercially viable passenger plane. Still, there is some possibility they deliver something… so eh, wait and see?
scruiser@awful.systems to TechTakes@awful.systems • ChatGPT o3 found a Linux Kernel vulnerability. “The future” has an 8% success rate, and a 28% chance of false positives. • English • 12 points • 9 days ago

As the other comments have pointed out, an automated search for this category of bugs (done without LLMs) would do the same job much faster, with far fewer computational resources, and without any bullshit or hallucinations in the way. The LLM isn’t actually a value-add compared to existing tools.
scruiser@awful.systems to TechTakes@awful.systems • ChatGPT o3 found a Linux Kernel vulnerability. “The future” has an 8% success rate, and a 28% chance of false positives. • English • 43 points • 10 days ago

> Of course, part of that wiring will be figuring out how to deal with the signal-to-noise ratio of ~1:50 in this case, but that’s something we are already making progress at.
This line annoys me… LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 1st June 2025 • English • 6 points • 10 days ago

Loose Mission Impossible Spoilers
The latest Mission Impossible movie features a rogue AI as one of the main antagonists. But on the other hand, the AI’s main powers are lies, fake news, and manipulation, and it only gets as far as it does because people allow fear to make them manipulable, and because it relies on human agents to do a lot of its work. So in terms of promoting the doomerism narrative, I think the movie could actually be taken as opposing the conventional doomer narrative, in favor of a calm, moderate, internationally coordinated response (the entire plot could have been derailed by governments agreeing on mutual nuclear disarmament before the AI subverted them) against AIs that ultimately have only moderate power.
Adding to the post-LLM-hype predictions: I think that after the LLM bubble pops, “Terminator”-style rogue-AI movie plots don’t go away, but take on a different spin. Rogue AIs’ strengths are going to be narrower, their weaknesses are going to get more comical and absurd, and idiotic human actions are going to be more of a factor. For weaknesses it will be less “failed to comprehend love” or “cleverly constructed logic bomb breaks its reasoning” and more “forgets what it was doing after getting drawn into too long of a conversation”. For human actions it will be less “its makers failed to anticipate a completely unprecedented sequence of bootstrapping and self-improvement” and more “its makers disabled every safety and granted it every resource it asked for in the process of trying to make an extra dollar a little bit faster”.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 1st June 2025 • English • 12 points • 14 days ago

This connection hadn’t occurred to me before, but the Starship Troopers scenes (in the book) where they claim to have mathematically rigorous proofs about various moral statements or actions or societal constructs remind me of how Eliezer has a decision theory in mind with all sorts of counterintuitive claims (it’s mathematically valid to never, ever give in to any blackmail or threats or anything adjacent to them), but hasn’t actually written out his decision theory in rigorous, well-defined terms that can pass peer review or be used to figure out anything beyond some pre-selected toy problems.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 1st June 2025 • English • 10 points • 15 days ago

There are parts of the field that have major problems, like the sorts of studies that get done on 20 student volunteers and then get turned into a pop-psychology factoid that gets tossed around and over-generalized while the original study fails to replicate, but there are parts that are actually good science.
scruiser@awful.systems to TechTakes@awful.systems • Stubsack: weekly thread for sneers not worth an entire post, week ending 1st June 2025 • English • 7 points • 15 days ago

I wouldn’t say even that part works so well, given how Mt. Moon is such a major challenge even with all the features like that.
Example #“I’ve lost count” of LLMs ignoring instructions and operating like the bullshit-spewing machines they are.