sh.itjust.works down?

Master@lemm.ee · 1 year ago

sh.itjust.works down?

TheDude@sh.itjust.works · 1 year ago

Hey all,

There have been a few issues over the last couple of weeks. Here’s a rundown of some of them.

Broken images: A few weeks ago we had issues with broken images. This was due to me migrating our local image storage to object storage (like AWS S3). When I made the switch, it broke all old images, as they needed to be migrated. Pictrs at that time did not support concurrent uploads, which means the migration would have taken days or weeks to complete, during which time the image service would have been offline. Instead, I waited for a newer version that supported concurrent connections and did the migration in about 30-40 minutes one evening.
5xx server errors: Some of you may have experienced a Lemmy page with an error code on it. This was due to me trying to implement an additional proxy to shield the instance and mitigate future risks. While rolling this out I hit a few blockers that caused downtime as we worked to rectify them. I’m glad to say that as of today the proxy is in place.

In addition to the above, the lemmyverse, and especially this instance, has been under bot attack almost daily. These attacks are eating resources and causing query floods.

Lemmy is still very young and in its early phases. In time these issues will slowly go away.

P.S. You don’t have to worry about me leaving you guys hanging.

eestileib@sh.itjust.works · 1 year ago

The Dude Abides.

Zaphodquixote@sh.itjust.works · 1 year ago

S’alright. You’re the man, the dude

Dodecahedron December@sh.itjust.works · 1 year ago

They were down but aren’t anymore. This is going to happen from time to time for various reasons, but most importantly (and this is not an advert or endorsement for centralized services like reddit):

These instances are run by small teams, maybe even one person per instance. By “run by” I mean the admins who actually host and support the instance’s hosting environment, not moderators, though that’s an important task too.
At reddit or other for-profit companies, multiple teams monitor multiple data centers’ worth of servers. They have 24/7 tech support crews, dashboards, alarms, alerts, escalation procedures drafted by other teams, and people they can escalate problems to, usually including a decent-sized team at the physical data center because of the number of servers they buy. They can afford all this from advertising income because the site is popular enough, which is why it’s much rarer to see these services go down.

But so many things can and do fail, including:

updates (dependencies, breaking updates, “this should just have worked but it didn’t, why?!”)
server issues (too many memes and now the disk has runeth over)
one server that gets overloaded or is in a data center that has a network failure, or a hardware failure on the server where the virtual server is hosted
account got hacked
0-day exploit targeted directly at this server
DoS or DDoS attack
The admin has a day job that keeps the lights on at home and at the Lemmy instance, and has to spend time doing it.

Speaking from experience, but not with lemmy in particular.

xylene@sh.itjust.works · 1 year ago

I had a new one this morning!

Dog bumps into server cabinet, pushing it against the wall, kinking the fiber optic cable that the DNS server uses.

loaExMachina@sh.itjust.works · 1 year ago

Sometimes, shit just doesn’t work.