Maybe so… but I feel like this is one of the biggest mistakes in the post-WWII order. Imagine what things would be like if the UN had more teeth (and no Security Council).
Why would people want to be on Twitter for posts like this?
It’s like joining a real-life club where the more of a jerk you are, the more speaking time you get.
“You know this war we’re slow-walking? That we keep almost negotiating a ceasefire to? Yeah, let’s keep sabotaging that, and start another war, just in case.”
I also kinda sympathize with American allies looking on from the outside now, as the U.S. is no doubt going to support this unconditionally, because… yeah.
I’m really shocked at how many people (apparently) shell out for OnlyFans subscriptions and such.
I swear I’m the sane one here, not the insane one.
How to turn your country into North Korea, 101.
Good! Try the IQ3_M, IQ3_XS, and IQ3_XXS quantizations as well, especially if you try a 14B, as they “squeeze” the model into less space better than the Q3_K quantizations.
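In case it helps, this is roughly how I pull one specific quant out of a GGUF repo with huggingface_hub. The repo and filename here are placeholders, not a specific upload; swap in whatever quant you actually want:

```python
# Hypothetical example: download just the IQ3_XXS file from a GGUF repo.
# repo_id and filename are placeholders for whatever upload you're using.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="someuser/Some-14B-Instruct-GGUF",   # placeholder repo
    filename="some-14b-instruct.IQ3_XXS.gguf",   # IQ3_XXS: the smallest of the three
)
print(path)  # local cache path you can point llama.cpp at
```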
Yeah, I’m liking the 32B as well. If you are looking for speed just for utilitarian Q/A, you might want to keep a DeepSeek Coder V2 Lite GGUF on hand, as it’s uber fast partially offloaded.
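For the “uber fast partially offloaded” part, this is roughly what I mean with llama-cpp-python. The filename and layer count are placeholders; tune n_gpu_layers until the model just fits in your VRAM:

```python
# Rough sketch, not exact settings: partial GPU offload with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-coder-v2-lite.Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=20,  # placeholder: offload what fits, run the rest on CPU
    n_ctx=4096,       # a small context keeps the KV cache cheap for quick Q/A
)

out = llm("Q: How do I reverse a list in Python?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```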
I like how much it obsesses over your post’s engagement metrics. I feel like that creates toxic incentives, but… shrug. What do I know?
Same issue here.
I think it’s just a temporary issue, but I really will stop using Reddit if old.reddit.com goes down. There are niches that just aren’t on here… but that UI.
A Qwen 2.5 14B IQ3_M should completely fit in your VRAM, with longish context, with acceptable quality.
An IQ4_XS will just barely overflow but should still be fast at short context.
And while I have not tried it yet, the 14B is allegedly smart.
Also, what I do on my PC is hook up my monitor to the iGPU so the GPU’s VRAM is completely empty, lol.
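Anyway, a minimal sketch of what “completely fit in your VRAM, with longish context” looks like in llama-cpp-python. The filename and context length are assumptions; n_ctx is the first thing to shrink if you overflow:

```python
# Sketch: full GPU offload with a longish context. Filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-14B-Instruct-IQ3_M.gguf",  # placeholder filename
    n_gpu_layers=-1,   # -1 = offload every layer to the GPU
    n_ctx=16384,       # "longish" context; drop this first if you run out of VRAM
)
```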
Oh, and you HAVE to try the new Qwen 2.5 14B.
The whole lineup is freaking sick, with the 32B outscoring Llama 3.1 70B in a lot of benchmarks, and in personal use it feels super smart.
You can try a smaller IQ3 imatrix quantization to speed it up, but 22B is indeed tight for 8GB.
If someone comes out with an AQLM for it, it might completely fit in VRAM, but I’m not sure it would even work for a Pascal card TBH.
Especially if you’re mega rich.
“Well, one lesson I’ve learned is that just because I say something to a group and they laugh doesn’t mean it’s going to be all that hilarious as a post on X,” he said in a follow-up post early Monday. “Turns out that jokes are WAY less funny if people don’t know the context and the delivery is plain text.”
I’ve known people like this in real life, who’d say something horrible and follow it up with “It’s just a joke,” but only when they ‘lose’ and get called out on it.
They’re slimy jerks, and it’s utterly miserable to even be around them. And I don’t understand why so many would worship/follow Elon and dwell on Twitter for it.
It’s still everywhere in my news/internet diet.
It’s bleeding, for sure, but it’s big. It’s gone bad. But I think it’s premature to say its collapse is a good thing, because it just won’t go away.
It’s not dead, though; it’s still linked from everywhere, from big news to niche communities, because it still has that critical mass and inertia.
And I hate to be cynical about the Fediverse, but realistically, what replaces Twitter, at least here in the US? Discord? No, thanks; I’d at least rather have information be public.
I’m speaking as someone who has never used Twitter, but I can’t ignore it, as much as I’d like to.
The behavior is configurable just like it is on Linux; UAC can be set to require a password every time.
But I think it’s not set this way by default because many users don’t remember their passwords, lol. If you think I’m kidding, you should meet my family…
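For reference, this is the knob I mean, as far as I know; the key and value names here are from memory rather than official docs, so treat it as a sketch:

```python
# Sketch (key/value names from memory, not official docs): read the UAC
# admin elevation policy. 1 = prompt for credentials (a password every time,
# sudo-style); the usual consumer default is 5 (consent prompt, no password).
import winreg

KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as k:
    value, _ = winreg.QueryValueEx(k, "ConsentPromptBehaviorAdmin")
print("ConsentPromptBehaviorAdmin =", value)
```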
Also, scripts can do plenty without elevation, on Linux or Windows.
The problem is that splitting models up over a network, even over LAN, is not super efficient. The entire set of weights needs to be run through for every token (half a word or so).
And the other problem is that Petals just can’t keep up with the crazy dev pace of the LLM community. Honestly they should dump it and fork or contribute to llama.cpp or exllama, as TBH no one wants to split up Llama 2 (or even Llama 3) 70B and sit a generation or two behind, on a base instruct model instead of a finetune.
Even the horde has very few hosts relative to users, even though hosting a small model on a 6GB GPU would get you lots of karma.
The diffusion community is very different, as the output is one image and even the largest open models are much smaller. LoRA usage is also standardized there, while it is not in LLM land.
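For anyone curious what Petals usage actually looks like, it’s roughly this. I’m going from memory of their README, and the model name is just a placeholder for whatever the public swarm happens to host:

```python
# Hedged sketch of Petals distributed inference, from memory of the README.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

name = "petals-team/StableBeluga2"  # placeholder: any model the swarm hosts
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoDistributedModelForCausalLM.from_pretrained(name)  # layers run on remote peers

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)  # every token round-trips the swarm
print(tokenizer.decode(outputs[0]))
```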
TBH this is a great space for modding and local LLMs / LLM “hordes”.
Isn’t that a massive security risk?
Like, what if the U.S. was using Roscosmos satellite links in drones? I’d certainly be raising an eyebrow.