What is site reliability engineering? The creator of the first site reliability engineering (SRE) program, Benjamin Treynor Sloss at Google, described it this way: Site reliability engineering is what happens when you ask a software engineer to design an operations team. What does that mean? Unlike traditional system administrators, site reliability engineers (SREs) apply solid software engineering principles to their day-to-day work. For laypeople, a clearer definition might be: Site reliability engineering is the discipline of building and supporting modern production systems at scale. SREs are responsible for maximizing reliability, performance, availability, latency, efficiency, monitoring, emergency response, change management, release planning, and capacity planning for both infrastructure and software. ... SREs should be spending more time designing solutions than applying band-aids. A general guideline is for SREs to spend 50% of their time on engineering work, such as writing code and automating tasks. When an SRE is on call, time should be split about evenly: roughly 25% managing incidents and 25% on operations duty.
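The time guideline above can be turned into hours with a little arithmetic. The sketch below is only an illustration: the 40-hour week, the function name `sre_time_split`, and the category labels are assumptions that do not come from the excerpt, and it reads the two 25% figures as together making up the non-engineering half of the week.

```python
def sre_time_split(hours_per_week: float = 40.0) -> dict[str, float]:
    """Split a working week using the 50/25/25 guideline from the excerpt.

    The 40-hour default is an assumption made for illustration only.
    """
    if hours_per_week < 0:
        raise ValueError("hours_per_week must be non-negative")

    # Shares taken from the guideline: half engineering work, the rest
    # divided between incident management and operations duty.
    shares = {
        "engineering": 0.50,
        "incident_management": 0.25,
        "operations_duty": 0.25,
    }
    return {activity: hours_per_week * share for activity, share in shares.items()}


if __name__ == "__main__":
    for activity, hours in sre_time_split().items():
        print(f"{activity}: {hours:.1f} hours")
```

Under those assumptions, a 40-hour week works out to 20 hours of engineering work and 10 hours each for incidents and operations duty.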
Over the past year, Trail of Bits was engaged by the Defense Advanced Research Projects Agency (DARPA) to examine the fundamental properties of blockchains and the cybersecurity risks associated with them. DARPA wanted to understand those security assumptions and determine to what degree blockchains are actually decentralized. To answer DARPA’s question, Trail of Bits researchers performed analyses and meta-analyses of prior academic work and of real-world findings that had never before been aggregated, updating prior research with new data in some cases. They also did novel work, building new tools and pursuing original research. The resulting report is a 30-thousand-foot view of what’s currently known about blockchain technology. Whether these findings affect financial markets is out of the scope of the report: our work at Trail of Bits is entirely about understanding and mitigating security risk. The report also contains links to the substantial supporting and analytical materials. Our findings are reproducible, and our research is open-source and freely distributable. So you can dig in for yourself.
At first, the shift in security strategy went from protecting a single castle to a “multiple castle” approach. In this scenario, you’d treat each salesperson’s laptop as a sort of satellite castle. SaaS vendors and cloud providers played into this idea, trying to convince potential customers not that they needed an entirely different way to think about security, but rather that, by using a SaaS product, they were renting a spot in the vendor’s castle. The problem is that once you have so many castles, the interconnections become increasingly difficult to protect. And it’s harder to say exactly what is “inside” your network versus what is hostile wilderness. Zero trust assumes that the castle system has broken down completely, so that each individual asset is a fortress of one. Everything is always hostile wilderness, and you operate under the assumption that you can implicitly trust no one. It’s not an attractive vision for society, which is why we should probably retire the castle-and-moat metaphor, because it does make sense to eliminate the human concept of trust in our approach to cybersecurity and treat every user as potentially hostile.
Disrupting attacks in their early stages is critical for all sophisticated attacks but especially human-operated ransomware, where human threat actors seek to gain privileged access to an organization’s network, move laterally, and deploy the ransomware payload on as many devices in the network as possible. For example, with its enhanced AI-driven detection capabilities, Defender for Endpoint managed to detect and incriminate a ransomware attack early in its encryption stage, when the attackers had encrypted files on fewer than four percent (4%) of the organization’s devices, demonstrating improved ability to disrupt an attack and protect the remaining devices in the organization. This instance illustrates the importance of the rapid incrimination of suspicious entities and the prompt disruption of a human-operated ransomware attack. ... A human-operated ransomware attack generates a lot of noise in the system. During this phase, solutions like Defender for Endpoint raise many alerts upon detecting multiple malicious artifacts and behavior on many devices, resulting in an alert spike.
The first rule of cybersecurity is to treat everything as if it’s vulnerable because, of course, everything is vulnerable. Every risk management course, security certification exam, and audit mindset emphasizes that there is no such thing as a 100% secure system. Arguably, the entire cybersecurity field is founded on this principle. ... The third law of cybersecurity, originally popularized as one of Brian Krebs’ 3 Rules for Online Safety, aims to minimize attack surfaces and maximize visibility. While Krebs was referring only to installed software, the ideology supporting this rule has expanded. For example, many businesses retain data, systems, and devices they don’t use or need anymore, especially as they scale, upgrade, or expand. This is like that old, beloved pair of worn-out running shoes that sits in a closet. This excess can present unnecessary vulnerabilities, such as a decades-old exploit discovered in some open source software. ... The final law of cybersecurity states that organizations should prepare for the worst. This is perhaps truer than ever, given how rapidly cybercrime is evolving. The risks of a zero-day exploit are too high for businesses to assume they’ll never become the victims of a breach.
At a very high level, Google defines the core of SRE principles and practices as an ability to ‘embrace risk.’ Site reliability engineers balance the organizational need for constant innovation and delivery of new software with the reliability and performance of production environments. The practice of SRE grows as the adoption of DevOps grows because they both help balance the sometimes opposing needs of the development and operations teams. Site reliability engineers inject processes into the CI/CD and software delivery workflows to improve performance and reliability, but they also know when to sacrifice stability for speed. By working closely with DevOps teams to understand the critical components of their applications and infrastructure, SREs can also learn which components are non-critical. Creating transparency across all teams about the health of their applications and systems can help site reliability engineers determine a level of risk they can feel comfortable with. The level of service availability you want, and the performance issues you can reasonably allow, will also depend on the type of service you support.
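One common way to make a "level of risk" concrete is to convert an availability target into the amount of downtime it permits over a period. The sketch below is an illustration and not something the excerpt prescribes: the function name `allowed_downtime_minutes`, the 30-day window, and the sample targets are all assumptions.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability target permits.

    availability_target is a fraction such as 0.999 for "three nines".
    The 30-day default period is an assumption made for illustration.
    """
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in the range (0, 1]")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    total_minutes = period_days * 24 * 60
    # Whatever fraction of the period is not covered by the target
    # is the downtime the team has agreed it can tolerate.
    return total_minutes * (1.0 - availability_target)


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.2%} availability allows about {minutes:.1f} minutes of downtime per 30 days")
```

A service with a looser target has more room to trade stability for speed, which is one way to read the point above that acceptable risk depends on the type of service.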
If you’re thinking the solution sounds a lot like serverless, you’re not wrong, but Matt Butcher, co-founder and CEO at Fermyon, says that instead of forcing a function-based programming paradigm, the startup decided to use WebAssembly, a much more robust programming environment, originally created for the browser. Using WebAssembly solved a bunch of problems for the company including security, speed and efficiency in terms of resources. “All those things that made it good for the browser were actually really good for the cloud. The whole isolation model that keeps WebAssembly from being able to attack the hosts through the browser was the same kind of [security] model we wanted on the cloud side,” Butcher explained. What’s more, a WebAssembly module could download really quickly and execute instantly to solve any performance questions, and finally instead of having a bunch of servers that are just sitting around waiting in case there’s peak traffic, Fermyon can start them up nearly instantly and run them on demand.
According to Trevett, the new forum will not concern itself with philosophical debates about what the metaverse will be in 10 to 20 years’ time. However, he thinks the metaverse is “going to be a mixture of the connectivity of the web, some kind of evolution of the web, mixed in with spatial computing.” He added that spatial computing is a broad term, but here refers to “3D modeling of the real world, especially in interaction through augmented and virtual reality.” “No one really knows how it’s all going to come together,” said Trevett. “But that’s okay. For the purposes of the forum, we don’t really need to know. What we are concerned with is that there are clear, short-term interoperability problems to be solved.” Trevett noted that there are already multiple standards organizations for the internet, including of course the W3C for web standards. What MSF is trying to do is help coordinate them when it comes to the evolving metaverse. “We are bringing together the standards organizations in one place, where we can coordinate between each other but also have good close relationships with the industry that [is] trying to use our standards,” he said.
Technology adoption, as part of a digital transformation initiative, is generally of a greater scale and impact than what most are accustomed to, primarily because we are looking not only to revamp parts of our IT enterprise, but also to introduce brand new technology architecture environments composed of a combination of heavy-duty systems. In addition to the due diligence that comes with planning for and incorporating new technology innovations, with digital transformation initiatives we need to be extra careful not to be lured into over-automation. The reengineering and optimization of our business processes in support of enhancing productivity and customer-centricity need to be balanced with practical considerations and the opportunity to first prove that a given enhancement is actually effective with our customers before building further enhancements upon it. If we automate too much too soon, it will be painful to roll back, both financially and organizationally. Laying out a phased approach helps avoid this.
Quote for the day:
"Real leadership is being the person others will gladly and confidently follow." -- John C. Maxwell