- This status page actually identified the outage: https://hackernews.onlineornot.com/
- Pages by Hund and Statuspal did not show the outage.
- The last post before the outage was https://news.ycombinator.com/item?id=46301823 (1:39:59 PM GMT). The last comment was https://news.ycombinator.com/item?id=46301848 (1:41:54 PM GMT).
- There was an average of ~4 seconds per comment just prior to the outage. Based on this, HN likely went down at 1:41:58 PM GMT.
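The estimate above is simple arithmetic: take the timestamp of the last comment and add one average gap between comments. A minimal sketch in Python, assuming the roughly 4-second gap quoted above; the function name `estimate_outage_start` is made up for illustration:

```python
from datetime import datetime, timedelta, time

def estimate_outage_start(last_comment: str, avg_gap_seconds: float) -> time:
    """Estimate the outage start as last comment time + one average gap."""
    # strptime without a date defaults to 1900-01-01; only the time of day is used.
    last = datetime.strptime(last_comment, "%I:%M:%S %p")
    return (last + timedelta(seconds=avg_gap_seconds)).time()

print(estimate_outage_start("1:41:54 PM", 4))  # 13:41:58
```

This is only as good as the average: a single unusually long gap between comments would shift the estimate by the same amount.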
(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)
In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
I'll add more as we find out more, but it probably won't be till later this afternoon PST.
Enjoy your deserved sleep and if for a couple of hours it's down, so be it.
Thanks for your continued service!
We all knew that but I haven't seen any confirmation before this.
I hope it doesn't change (much).
So don’t beat yourself up please.
When I worked for “SaaS unicorn” we typically had multiple levels of escalation, and acknowledging would have done nothing because the alarm would continue firing until fixed. Not sure what’s changed in 15 years of ops; I had assumed it would be better now. I can’t imagine silencing an alert totally by acknowledging it if it’s still occurring.
I’m totally fine with how you handled it, if anything I am thankful. But that seems to be a system I would improve if I had the time.
“mute” is different than “resolve” to me, and both should exist. (Where mute is an acknowledgement of an issue as ongoing.)
(Might be wise though to have PagerDuty configured to re-alert if the outage persists.)
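The "keep firing until it is actually fixed" behaviour described above can be sketched roughly as follows. This is a generic illustration in Python, not PagerDuty's API or configuration; `Incident`, `should_page`, and the 30-minute `reack_after` window are all hypothetical names and values:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Incident:
    acked_at: Optional[float] = None  # seconds; None means never acknowledged
    resolved: bool = False            # set by a human, not by the health check

def should_page(incident: Incident, now: float, still_failing: bool,
                reack_after: float = 1800.0) -> bool:
    """Decide whether to page again, trusting the health check over the human."""
    if not still_failing:
        return False          # the check passes, nothing to do
    if incident.resolved:
        return True           # marked resolved but the check still fails: page again
    if incident.acked_at is None:
        return True           # nobody has acknowledged yet
    # An acknowledgement only mutes the alert for a limited window.
    return now - incident.acked_at >= reack_after
```

The design choice is that neither "acknowledge" nor "resolve" can silence an alert permanently while the underlying check is still failing.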
HN is important, but it's unlikely much harm could be done before morning.
(Source: Lost a lot of sleep at one place, enough to realize that sleep interruption and deficit have significant costs.)
https://downforeveryoneorjustme.com/hacker-news
This website collected many reports: the last count I saw was 52 within a short time frame, and the maximum seems to have been 118.
> In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
It's okay, I suppose. Have you figured out who is crawling Hacker News so much, though? Was it a DDoS attack or an AI company trying to get data? Hacker News offers an API, and I'm sure there are datasets for it too, so it's interesting that they would crawl anyway, though we all know the reasons, as they have been discussed here.
Sometimes I could not open the comment section: opening it from a new private window gave a blank page with "... We're sorry" or something along those lines. It works when opening normally.
Logging in in the private window seems to resolve the issue. Can you take a look at this if possible?
https://news.ycombinator.com/item?id=5229522
Re: traffic, dang said (2022):
https://news.ycombinator.com/item?id=33454140
I took it as a good reminder that the hard part is the human part: that high-overhead features and UI fripperies are nice but not necessary (or sufficient) to keep a community healthy and vibrant over the decades.
(And on the subject of the human side, if you didn’t catch Anna Wiener’s 2019 profile, it’s here:
https://www.newyorker.com/news/letter-from-silicon-valley/th... )
The most interesting number is the 1300 submissions because that hasn't grown since 2011 - it just fluctuates. Everything else has been growing more or less linearly for a long time, which is how we like it.
I find that surprising, as 2011-2022 covers an exponential rise in SEO spam and "growth hackers" attempting to drive traffic and links.
Or was 1,300 the number of non-flagged submissions?
A lot of people out here designing their blogs like it's 1989.
1. Blame: The first thing to do is to point the finger. That doesn't mean analysing the technical issue, which can delay this step and limit your options, but figuring out who is politically easiest to blame. Often, that's the new guy, but outside contractors and vendors without good connections are also a common solution. Even if you are technically responsible for hiring them, you can always push them under the bus with a little skill. This small sacrifice helps unify, focus, and motivate the rest of the team.
2. Emotion: Inject your emotion into the situation and make that the implicit, but indisputable priority. Particularly, outrage and anger - This is completely _____. These people are utterly _____ (I'd use all caps, but that's not allowed on HN). Make sure everyone's attention is over their shoulder, on your emotion, and infect the team with it. Threats are an effective tool here - this is a crisis, and anyone who is calm is not emotionally engaged. Otherwise, they won't care enough about this problem - without you driving them, they probably wouldn't care much at all. Anyway, you don't have time for niceties like empathy or even basic respect.
3. Speed: Responsiveness to stakeholders is very important. People need answers now. Give them answers they want to hear, outcomes they will be comfortable with. Don't worry if different groups hear different things. Your team will find a way to make it all work - that's their job.
4. Communication: Good communication is essential. Make sure you clearly tell your team what they should be doing; repeat it several times to prevent misunderstanding. Especially people with experience can have minds of their own; keep them on track. The situation is a crisis so you can't take any risks; stay on top of them and everything they do, and give input if you're not certain they are doing exactly what you would be doing.
5. Victimhood: Find a way to turn the tables: Make it about you, and how you're the victim here, and feed the fire with more outrage. With this and outrage, nobody will undermine the team by challenging your ideas or authority, which is the most essential component of a successful outcome. Remember, without you this all falls apart.
Have I missed anything?
There is an official dump which doesn't even require parsing HTML at all: https://console.cloud.google.com/marketplace/details/y-combi...
Of course, they'd better restore service after they wake up naturally, because I need my HN dose. But it's not worth losing sleep over it.
How does this happen?
Not the person you are asking. Bot operators have an incentive to make crawlers look as much like a human as possible so they do not get blocked. Some of them fail miserably and some nearly succeed. That makes it trivial to accidentally block a real person. I am personally fine with that given I do not pay for this site and have no SLA or contract with it.
My pager noise: https://www.soundjay.com/transportation/sounds/train-crossin...
That will not only wake the dead, it'll wake me no matter how asleep I am.
I used to work on Motorola Minitor 5 pagers. Looks like they recently released their newest model, the Minitor 7.
I wonder if pagers are still used in hospitals? I imagine so.
[1] - rel="nofollow"
Try opening HN -> it's down, better check HN to see everyone talking about a major website being down -> Try opening HN -> loop
That was a few hours ago. I'm glad this loop is broken.
/s
In all fairness though, mine is the same as in the original comment: just pressing n autocompletes to https://news.ycombinator.com/
https://www.proginosko.com/leechblock/
You'll still open new tabs and go to HN, but you'll be reminded quickly, and every day can be downtime day \o/ (for you, personally)
It's like they say: "Your demons will comfort you when no one else will. That's why it's so hard to get rid of them"
You'd go through that effort when you could have just stopped, though?
If I could snap my fingers and break toxic habits and patterns, I would have done so decades ago :)
That's so refreshing in terms of being a user-focused feature, and yet it stands in sharp contrast against today's engagement-hyperfocused climate. I never would have thought to look on a website's own settings page to limit my access to that same website.
I love it, thank you for pointing me to this!
I know dang basically works tirelessly to not change the format in order to not induce those addictive patterns
but yet here we all are
It's understandable to be addicted. Lol.
I visit this place multiple times a day.
I can't find the link, but there was a post about how to "be nice" and it was a revelation to a worrying number of "geniuses" on here. Bear in mind the sum total of the advice was "be nice, don't be rude".
2. your characterization of the article sounds uncharitable
3. my point isn't exactly that this is necessarily the smartest place
Almost every (non-troll) online community that is relatively peaceful and has some semblance of moderation to remove flamewars thinks of itself as "the best community". Usually as compared to reddit, though if it's on reddit they will compare themselves to some other (hated) sub.
It's a fact of the internet. Every online community thinks of itself as the smartest, most thoughtful, most civilized. HN is no exception.
It goes without saying HN is not the smartest or most thoughtful online community. It's just... ok. Not the worst, not the best. Certainly NOT the place with the smartest people, though some smart people frequent it. As a regular, you can soon figure out HN's unspoken rules, blindspots, and areas where the group opinion is more likely to be accurate.
How does that go without saying? Name some others then, compare and contrast. As-is your argument is just posturing.
No need, because whether an online community is more thoughtful or smarter than another is very subjective. Almost by definition, HN is not it. Extraordinary claims require extraordinary evidence, and all that. Of course, by internet law, HN (or a subset of its members) considers itself to be the smartest, most thoughtful online community.
There are communities I like better, which are smarter and more thoughtful, but I've no desire to argue with you.
> As-is your argument is just posturing
Nah. Hard pass. Nice try though!
But also, people like me. Be careful what you choose to believe on this website
This was especially obvious during Covid, I even stopped visiting because the comment section was so crazy.
Nice joke!
At least, I hope it was a joke...
... but I still cannot tell if the original commenter was sarcastic or not! ;)
Did it like 5 times during that 1h-ish outage. :(
https://x.com/HNStatus
Is there a better place to check, beyond a basic down detector that may provide more insight or signal that the outage is acknowledged?
(Basically whenever you see an x.com link just change it to xcancel.com and avoid the nonsense.)
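The link swap described above is a plain hostname substitution. A minimal sketch in Python using the standard library's `urllib.parse`; the function name `to_xcancel` is made up for illustration, and it only handles the `x.com` and `www.x.com` hosts:

```python
from urllib.parse import urlsplit, urlunsplit

def to_xcancel(url: str) -> str:
    """Rewrite an x.com link to the same path on xcancel.com; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ("x.com", "www.x.com"):
        parts = parts._replace(netloc="xcancel.com")
    return urlunsplit(parts)

print(to_xcancel("https://x.com/HNStatus"))  # https://xcancel.com/HNStatus
```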
Seems to reset it on the web view, too.
I didn't read the post text, it's identified there haha, my bad! I wish the post text wasn't grey, I gloss over it too easily.
I suppose you could also just clear your HN cookies in a regular browsing window, but then when they fix it you'd have to log in again.
That's not so useful when news.ycombinator.com is having problems.
Maybe ycombinator does have an official status page somewhere, but it is not easy to find if that is the case.
I believe it's because they accept user reports.
https://hackernews.onlineornot.com/incidents/yaz-eOJeARBL
https://downforeveryoneorjustme.com/hacker-news
Strangely, nothing from Statuspal, which is the first Google result
https://hacker-news.statuspal.io/
on edit: ok others pointed out it was cached pages I saw. explains it.
https://x.com/paulg/status/1953289830982664236?s=46
It did work without being logged on. The auth service appeared to be down as the log in attempt (just showing the page) failed.
It's down about 8.4 minutes per week. On 26% of days it doesn't work at least once, and on 12% of days it has more than one consecutive failed check. The longest uptime streak was 24 days.
I've been keeping track for exactly 2 years (to the day!) because I was surprised that it seemed briefly down for me on a daily basis. Was I getting unlucky and hitting it every time, or was it just down very often? Nobody posted anything so I started answering the question for myself :p
I've been meaning to post the tracker to HN but there's a pesky bug I want to fix: the "is it currently down" stat. I don't know how this is beyond me but something in the code bugs out. So this is my first time posting about it
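For readers curious how per-day figures like the ones quoted above could be derived from a log of periodic checks, here is a rough sketch in Python. It is not the tracker's actual code: `daily_failure_stats` is a hypothetical name, the input format (time-sorted `(timestamp, ok)` pairs) is an assumption, and the sample data is made up.

```python
from collections import defaultdict
from datetime import datetime

def daily_failure_stats(checks):
    """Return (share of days with any failed check,
               share of days with two or more failed checks in a row).

    checks: (datetime, ok) pairs, sorted by time.
    """
    by_day = defaultdict(list)
    for ts, ok in checks:
        by_day[ts.date()].append(ok)
    if not by_day:
        raise ValueError("no checks recorded")
    days = len(by_day)
    any_fail = sum(1 for results in by_day.values() if not all(results))
    # A failure run that spans midnight is not counted by this simple version.
    consecutive = sum(
        1 for results in by_day.values()
        if any(not a and not b for a, b in zip(results, results[1:]))
    )
    return any_fail / days, consecutive / days

# Made-up sample: one clean day, then one day with two failures in a row.
sample = [
    (datetime(2024, 1, 1, 0, 0), True),
    (datetime(2024, 1, 2, 0, 0), False),
    (datetime(2024, 1, 2, 0, 5), False),
]
print(daily_failure_stats(sample))  # (0.5, 0.5)
```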
It's not that much different from HN, come to think of it.
(ha, ha)
I'm sure it's a coincidence but it started working again shortly after emailing hn@ycombinator.com
I'm still impressed nonetheless.
I'd like to know what caused the outage and how it could have been prevented, for learning purposes.
You can just look at them, turn on showdead in your profile and you'll see a bunch of flag-killed comments in this discussion by whatevermrfukz. No need for a plugin or scraper.
Anyway, glad to see you back.
Paris 1812.
Cheers from France.
HN was down about an hour ago.
Glad to see it back!
Cheers.
Working with full dates in the HTML and doing a tiny JavaScript that calculates the "minutes ago" would actually be a neat improvement.
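The arithmetic such a script would do is small. A sketch of it, written in Python rather than JavaScript, with a hypothetical function name `time_ago` and made-up timestamps; a browser version would do the same calculation against the full date embedded in the HTML:

```python
from datetime import datetime, timezone

def time_ago(posted: datetime, now: datetime) -> str:
    """Format the gap between a full timestamp and the current time as 'N units ago'."""
    seconds = max(0, int((now - posted).total_seconds()))
    minutes = seconds // 60
    if minutes < 60:
        value, unit = minutes, "minute"
    elif minutes < 60 * 24:
        value, unit = minutes // 60, "hour"
    else:
        value, unit = minutes // (60 * 24), "day"
    return f"{value} {unit}{'' if value == 1 else 's'} ago"

posted = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up example time
print(time_ago(posted, datetime(2025, 1, 1, 12, 5, tzinfo=timezone.utc)))  # 5 minutes ago
```

Because the label is computed when the page is viewed, a cached copy of the page would still show the correct age.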
After more than an hour I thought, "wow this is pretty harsh" and "so much of my exposure to learning things is directly tied to HN posts". I was lost lol.
Edit: Now it happens again. Knee-jerk defenses all the way down.
Being "voted to -2" doesn't necessarily mean you were wrong (it often correlates though). People might just think it wasn't relevant in whatever context you posted it in
I often find it hard to tell what makes people think something I write is not helpful (or sometimes also a comment someone else made) and thus appreciate comments that clarify constructively. It can also help to ask for clarification if you're particularly surprised about the votes on a given post
That’s not what happens in practice.