Posted by todsacerdoti 12/17/2025

Log level 'error' should mean that something needs to be fixed (utcc.utoronto.ca)
482 points | 299 comments | page 5
Too 12/20/2025|
Agree with the post. The job of blackbox is to turn probes into metrics. If a probe fails, that should just become a probe_success=0 metric. Blackbox did its job and should not log an error.
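
A minimal sketch of that idea, not blackbox's actual code: a failed probe becomes a `probe_success` value of 0 and is recorded at debug level only. The function names `run_probe` and `failing_check` and the logger name are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("prober")  # hypothetical logger name


def run_probe(check):
    """Run a probe callable and report the outcome as a metric.

    `check` is a hypothetical callable that raises when the target is down.
    """
    try:
        check()
        success = 1
    except Exception as exc:
        # The target failed, not the prober, so this is not an error-level event.
        logger.debug("probe failed: %s", exc)
        success = 0
    return {"probe_success": success}


def failing_check():
    """Stand-in for a probe whose target is unreachable."""
    raise ConnectionError("target unreachable")
```

Calling `run_probe(failing_check)` returns `{"probe_success": 0}` without writing anything at error level; whether that outcome matters is left to whoever alerts on the metric.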
Kinrany 12/20/2025||
Why are logs usually assumed to be for human consumption only? It seems weird to me that log storage usually exists outside of the system and isn't a general purpose message bus.
BiraIgnacio 12/20/2025||
It means something is wrong, yes. Now, whether it's worth fixing (granted, most of the time it is) is another story.
leni536 12/20/2025||
I make error logs fail happy path functional/integration tests for the backend applications I'm currently writing.
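
One way this could look with Python's standard `logging` module, as a rough sketch rather than the commenter's actual setup: a handler collects records at ERROR or above while the code under test runs, and the test asserts that none were collected. `ErrorCollector`, `run_and_collect_errors`, and the logger name `app` are hypothetical.

```python
import logging


class ErrorCollector(logging.Handler):
    """Collects records at ERROR or above so a test can assert none occurred."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def run_and_collect_errors(func, logger_name="app"):
    """Run `func` and return the messages of any error-level records it logged."""
    logger = logging.getLogger(logger_name)
    collector = ErrorCollector()
    logger.addHandler(collector)
    try:
        func()
    finally:
        # Always detach the handler so later tests start clean.
        logger.removeHandler(collector)
    return [record.getMessage() for record in collector.records]
```

A happy-path test would then end with something like `assert run_and_collect_errors(handle_request) == []`, where `handle_request` stands for the code under test.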
plandis 12/20/2025||
I agree. Error or higher should result in an alarm and indicate that some corrective action needs to be taken.
mkoubaa 12/20/2025||
To me it's always a neat trick when you're not allowed to use print() in production code
mycall 12/20/2025||
Severity is the value and you set thresholds based on context of the error type.
29athrowaway 12/20/2025||
Input errors do not need fixing, so no.
lanstin 12/21/2025|
If they cause your customers to ditch your product, but calling them and saying "your calls are all getting 4xx because you are not putting the state code into the call parameters" would keep them as customers, then you would be wise to make that call.
dolmen 12/21/2025||
But first ensure that the input error is properly reported to the client in the response body (ideally in a structured way), so the client could have figured it out on their own.

If a fix is needed on your side for this matter, having a conversation with a customer might be useful before breaking more stuff. ("We have no state code in EU. Why is that mandatory?").
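
A small sketch of what a structured input error might look like, using the missing state code from the example above. The function name `validate_request`, the parameter name `state_code`, and the JSON field names are assumptions for illustration, not any particular API's format.

```python
import json


def validate_request(params, required=("state_code",)):
    """Return (status, body), with a structured error when parameters are missing."""
    missing = [name for name in required if not params.get(name)]
    if missing:
        body = {
            "error": "missing_parameters",  # machine-readable code (hypothetical)
            "missing": missing,             # exactly which parameters to add
            "message": "Required parameters are missing: " + ", ".join(missing),
        }
        return 400, json.dumps(body)
    return 200, json.dumps({"ok": True})
```

With a body like this, a client seeing a 4xx can tell which parameter to add without anyone having to phone them.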

lanstin 12/23/2025||
If you are trying to sell a product, it is sometimes useful to solve people's problems for them, rather than counting on them to figure them out on their own.
azov 12/20/2025|
If my system doesn’t work - I want to be alerted. If a notification was supposed to be sent but wasn’t - it’s an error regardless of whether it wasn’t sent because of a bug in my code or an external service being down. It may be a warning if I’m still retrying, but if I gave up - it’s an error.

“External service down, not my problem, nothing I can do” is hardly ever the case - e.g. you may need to switch to a backup provider, initiate a support call, or at least try to figure out why it’s down and for how long.
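
The warning-while-retrying, error-on-giving-up rule could be sketched like this. `send_with_retries` and the logger name are hypothetical, and delays between attempts are left out to keep the example short.

```python
import logging

logger = logging.getLogger("notifier")  # hypothetical logger name


def send_with_retries(send, attempts=3):
    """Call `send` up to `attempts` times; warn while retrying, error on giving up."""
    for attempt in range(1, attempts + 1):
        try:
            send()
            return True
        except Exception as exc:
            if attempt < attempts:
                # Still retrying: the notification may yet go out.
                logger.warning("send failed (attempt %d/%d), retrying: %s",
                               attempt, attempts, exc)
            else:
                # Gave up: the notification was not sent, whatever the cause.
                logger.error("notification not sent after %d attempts: %s",
                             attempts, exc)
    return False
```

The error is logged whether the cause was a local bug or an outage at the provider, since either way someone may need to act.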