CrowdStrike Releases the Root Cause Analysis (RCA)

Published 2024-08-06

All Comments (9)
  • @Aksel_16
    - The error first occurred when a mismatch happened between what was validated (21 inputs) and what was provided after the validation (only 20 inputs). - The testing phase for a non-wildcard passed (undetected). It was introduced on July 19, 2024. We can safely say it was a faulty testing process and a miscommunication between the "Content Validator" and the "Content Interpreter" that left input 21 in an "out-of-bounds" state. I would say this, though: thank you, CrowdStrike, for releasing the RCA quickly and in depth. It told me that you have the professional skills needed to solve problems, but it is clearly a mismanagement problem that could have been avoided with closer inspection. Give your skilled developers time and let them check all of your systems manually. Don't rush updates, always ask "What did we miss?", and double-check.
  • @DQSoft
    If I am understanding this correctly, it was a very deterministic error: if you install the channel file, you get the error 100% of the time. So the incident could have been avoided by just installing the update on one test machine before releasing it to the "valued customers". Code inspection and "synthetic" testing are great for detecting errors before you deploy code, but you still have to do testing in actual conditions on a limited scale before pushing an update to thousands of systems.
  • @parrottm76262
    All I can say is they should have caught this. I worked in software dev and testing for a major telephone carrier before I retired, and that environment is even more critical than CrowdStrike. We had to prove our test suite was as bulletproof as humanly possible. One set of testing was done by, and sourced from, our call center. These techs basically had no training on what they were testing, and we wanted it this way (by the way, this wasn't the only testing gate). If something was going to break, these techs could do it. Not on purpose, of course. If a piece of software required any external touching, reading, or writing of ANY file, you were expected to come up with a way to break it, then test for that later 14 ways from Sunday, to borrow a phrase. Sorry, I'm ranting a bit, but an outage like this was preventable, IMO.
  • @Aikurisu
    I suppose the big question most folks have after this debacle is: can they be trusted that this won't happen again? I... probably wouldn't bet on it.
  • @vh9network
    All these companies, including the one I work for, should not be using CrowdStrike, or all running the same AV/firewall software in general. The very idea of it just sounds like a disaster waiting to happen, and when it happens, as it did, it will affect the large majority running the same tool. This is what happens when businesses/corporations go cheap and all rely on the same service.
  • @TriggerJim88
    Do their little white lies and subtle credit hogging for any minor role in rectifying their own mistake make you feel better? Alongside your own constructive criticism and support, of course. A little complimentary misdirection is key in selling sincerity and hedging some of the public criticism leading up to the court case. :/ Crazy that a mismatch in the number of data points snowballs into ultimately nobody getting any at all...?
  • @liaminwales
    The real question is why only a handful of companies control the space. Kind of scary when one player has so much control. If one day MS has the same problem, it's going to be a nightmare. Well, we kind of had the same problem in the early W10 days~
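
For readers following the 21-versus-20 discussion in the comments above (@Aksel_16, @DQSoft), here is a minimal Python sketch of how a count mismatch like that can stay hidden and then fail every time. It is only an illustration under stated assumptions: the function names, the `"*"` wildcard marker, and the list-based data layout are all hypothetical and are not taken from CrowdStrike's code. Python raises an `IndexError` where a language without bounds checking would read out-of-bounds memory.

```python
# Hypothetical illustration only: names and data layout are assumptions,
# not CrowdStrike's actual implementation.
WILDCARD = "*"


def match_unchecked(criteria, inputs):
    """Compare each non-wildcard criterion with the input at the same index.

    No length check is done, so a criterion whose index is past the end of
    `inputs` raises IndexError (the Python analogue of an out-of-bounds read).
    """
    for i, pattern in enumerate(criteria):
        if pattern == WILDCARD:
            continue  # a wildcard never reads the input, so a missing one goes unnoticed
        if inputs[i] != pattern:
            return False
    return True


def match_checked(criteria, inputs):
    """Same comparison, but refuse to run when the two lengths disagree."""
    if len(criteria) > len(inputs):
        raise ValueError(
            f"criteria expect {len(criteria)} inputs, only {len(inputs)} supplied"
        )
    return match_unchecked(criteria, inputs)


if __name__ == "__main__":
    inputs = ["value"] * 20                   # only 20 inputs supplied
    all_wildcards = [WILDCARD] * 21           # 21 fields, last one is a wildcard
    last_is_literal = [WILDCARD] * 20 + ["x"]  # 21 fields, last one is not

    print(match_unchecked(all_wildcards, inputs))  # passes: field 21 is never read
    try:
        match_unchecked(last_is_literal, inputs)
    except IndexError as exc:
        print("out of bounds:", exc)
    try:
        match_checked(all_wildcards, inputs)
    except ValueError as exc:
        print("rejected up front:", exc)
```

In this sketch the unchecked matcher only fails once a non-wildcard value lands in the 21st field, and then it fails on every run, which matches the "deterministic" point made above. The checked variant rejects the mismatch regardless of what the 21st field contains.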