Showing posts with label SRE. Show all posts

Daily Tech Digest - January 16, 2025

How DPUs Make Collaboration Between AppDev and NetOps Essential

While GPUs have gotten much of the limelight due to AI, DPUs in the cloud are having an equally profound impact on how applications are delivered and network functions are designed. The rise of DPU-as-a-Service is breaking down traditional silos between AppDev and NetOps teams, making collaboration essential to fully unlock DPU capabilities. DPUs offload network, security, and data processing tasks, transforming how applications interact with network infrastructure. AppDev teams must now design applications with these offloading capabilities in mind, identifying which tasks can benefit most from DPUs—such as real-time data encryption or intensive packet processing. ... AppDev teams must explicitly design applications to leverage DPU-accelerated encryption, while NetOps teams need to configure DPUs to handle these workloads efficiently. This intersection of concerns creates a natural collaboration point. The benefits of this collaboration extend beyond security. DPUs excel at packet processing, data compression, and storage operations. When AppDev and NetOps teams work together, they can identify opportunities to offload compute-intensive tasks to DPUs, dramatically improving application performance. 


The CFO may be the CISO’s most important business ally

“Cybersecurity is an existential threat to every company. Gone are the days where CFOs could only be fired if they ran out of money, cooked the books, or had a major controls outage,” he said. “Lack of adequate resourcing of cybersecurity is an emerging threat to their very existence.” This sentiment reflects the reality that for most organizations cyber threat is the No. 1 business risk today, and this has significant implications for the strategic survival of the enterprise. It’s time for CISOs and CFOs to address the natural barriers to their relationship and develop a strategic partnership for the good of the company. ... CISOs should be aware of a few key strategies for improving collaboration with their CFO counterparts. The first is reverse mentoring. Because CFOs and CISOs come from differing perspectives and lead domains rife with terminology and details that can be quite foreign to the other, reverse mentoring can be important for building a bridge between the two. In such a relationship, the CISO can offer insights into cybersecurity, while simultaneously learning to communicate in the CFO’s financial language. This mutual learning creates a more aligned approach to organizational risk. Second, CISOs must also develop their commercial perspective.


Establishing a Software-Based, High-Availability Failover Strategy for Disaster Mitigation and Recovery

No one should be surprised that cloud services occasionally go offline. If you think of the cloud as “someone else’s computer,” then you recognize there are servers and software behind it all. Someone else is doing their best to keep the lights on in the face of events like human error, natural disasters, and DDoS and other types of cyberattacks. Someone else is executing their disaster response and recovery plan. While the cloud may well be someone else’s computer, when there is a cloud outage that affects your operations, it is your problem. You are at the mercy of someone else to restore services so you can get back online. It doesn’t have to be that way. Cloud-dependent organizations can adopt strategies that allow them to minimize the risk that someone else’s outage will knock them offline. One such strategy is to take advantage of hybrid or multi-cloud architecture to achieve operational resiliency and high availability through service redundancy via SANless clustering. Normally a storage area network (SAN) uses local storage to configure clustered nodes on-premises, in the cloud, and to a disaster recovery site. It’s a proven approach, but because it is hardware dependent, it is costly in terms of dollars and computing resources, and comes with additional management demands.


Trusted Apps Sneak a Bug Into the UEFI Boot Process

UEFI is a kind of sacred space — a bridge between firmware and operating system, allowing a machine to boot up in the first place. Any malware that invades this space will earn a dogged persistence through reboots, by reserving its own spot in the startup process. Security programs have a harder time detecting malware at such a low level of the system. Even more importantly, by loading first, UEFI malware will simply have a head start over those security checks that it aims to avoid. Malware authors take advantage of this order of operations by designing UEFI bootkits that can hook into security protocols, and undermine critical security mechanisms like UEFI Secure Boot or HVCI, Windows' technology for blocking unsigned code in the kernel. To ensure that none of this can happen, the UEFI Boot Manager verifies every boot application binary against two lists: "db," which includes all signed and trusted programs, and "dbx," including all forbidden programs. But when a vulnerable binary is signed by Microsoft, the matter is moot. Microsoft maintains a list of requirements for signing UEFI binaries, but the process is a bit obscure, Smolár says. "I don't know if it involves only running through this list of requirements, or if there are some other activities involved, like manual binary reviews where they look for not necessarily malicious, but insecure behavior," he says.
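The db/dbx check described above can be pictured with a small Python sketch. It is a deliberate simplification and not how firmware implements Secure Boot: real db and dbx entries can hold certificates as well as hashes, whereas this sketch assumes both lists are plain sets of SHA-256 digests. The function names and sample binaries are hypothetical.

```python
import hashlib
from typing import Set


def sha256_hex(binary: bytes) -> str:
    """Return the SHA-256 digest of a boot binary as a hex string."""
    return hashlib.sha256(binary).hexdigest()


def is_boot_allowed(binary: bytes, db: Set[str], dbx: Set[str]) -> bool:
    """Allow a binary only if it is trusted (db) and not forbidden (dbx).

    Assumption: both lists are modeled as sets of hex digests.
    The forbidden list is checked first, so a revocation always wins.
    """
    digest = sha256_hex(binary)
    if digest in dbx:
        return False
    return digest in db


# Hypothetical sample binaries, for illustration only.
trusted = b"trusted bootloader image"
vulnerable = b"signed but vulnerable boot application"
unknown = b"unsigned binary"

# Both the good and the vulnerable binary are on the trusted list.
db = {sha256_hex(trusted), sha256_hex(vulnerable)}
empty_dbx: Set[str] = set()
updated_dbx = {sha256_hex(vulnerable)}

print(is_boot_allowed(vulnerable, db, empty_dbx))    # passes until revoked
print(is_boot_allowed(vulnerable, db, updated_dbx))  # blocked once in dbx
print(is_boot_allowed(unknown, db, empty_dbx))       # never trusted
```

The sketch shows why a signed but vulnerable binary is a problem: it passes the trusted-list check until its entry is added to the forbidden list.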


How CISOs Can Build a Disaster Recovery Skillset

In a world of third-party risk, human error, and motivated threat actors, even the best prepared CISOs cannot always shield their enterprises from all cybersecurity incidents. When disaster strikes, how can they put their skills to work? “It is an opportunity for the CISO to step in and lead,” says Erwin. “That's the most critical thing a CISO is going to do in those incidents, and if the CISO isn't capable of doing that or doesn't show up and shape the response, well, that's an indication of a problem.” CISOs, naturally, want to guide their enterprises through a cybersecurity incident. But disaster recovery skills also apply to their own careers. “I don't see a world where CISOs don't get some blame when an incident happens,” says Young. There is plenty of concern over personal liability in this role. CISOs must consider the possibility of being replaced in the wake of an incident and potentially being held personally responsible. “Do you have parachute packages like CEOs do in their corporate agreements for employability when they're hired?” Young asks. “I also see this big push of not only … CISOs on the D&O insurance, but they're also starting to acquire private liability insurance for themselves directly.”


Site Reliability Engineering Teams Face Rising Challenges

While AI adoption continues to grow, it hasn't reduced operational burdens as expected. Performance issues are now considered as critical as complete outages. Organizations are also grappling with balancing release velocity against reliability requirements. ... Daoudi suspects that there are a series of contributing factors that have led to the unexpected rise in toil levels. The first is AI systems maintenance: AI systems themselves require significant maintenance, including updating models and managing GPU clusters. AI systems also often need manual supervision due to subtle and hard-to-predict errors, which can increase the operational load. Additionally, the free time created by expediting valuable activities through AI may end up being filled with toilsome tasks, he said. "This trend could impact the future of SRE practices by necessitating a more nuanced approach to AI integration, focusing on balancing automation with the need for human oversight and continuous improvement," Daoudi said. Beyond AI, Daoudi also suspects that organizations are incorrectly evaluating toolchain investments. In his view, despite all the investments in inward-focused application performance management (APM) tools, there are still too many incidents, and the report shows a sentiment for insufficient observability instrumentation.


The Hidden Cost of Open Source Waste

Open source inefficiencies impact organizations in ways that go well beyond technical concerns. First, they drain productivity. Developers spend as much as 35% of their time untangling dependency issues or managing vulnerabilities — time that could be far better spent building new products, paying down technical debt, or introducing automation to drive cost efficiencies. ... Outdated dependencies compound the challenge. According to the report, 80% of application dependencies remain un-upgraded for over a year. While not all of these components introduce critical vulnerabilities, failing to address them increases the risk of undetected security gaps and adds unnecessary complexity to the software supply chain. This lack of timely updates leaves development teams with mounting technical debt and a higher likelihood of encountering issues that could have been avoided. The rapid pace of software evolution adds another layer of difficulty. Dependencies can become outdated in weeks, creating a moving target that’s hard to manage without automation and actionable insights. Teams often play catch-up, deepening inefficiencies and increasing the time spent on reactive maintenance. Automation helps bridge this gap by scanning for risks and prioritizing high-impact fixes, ensuring teams focus on the areas that matter most.
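As a rough illustration of the kind of automation described above, the following Python sketch flags dependencies that have gone more than a year without an upgrade and orders them so the riskiest come first. The `Dependency` fields, the 365-day threshold, and the sample data are all hypothetical; a real scanner would pull this information from a package manifest and a vulnerability database.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Dependency:
    name: str
    last_upgraded: date
    known_vulnerabilities: int  # assumed to come from a vulnerability feed


def stale_dependencies(deps: List[Dependency], today: date,
                       max_age_days: int = 365) -> List[Dependency]:
    """Return dependencies not upgraded within max_age_days.

    Results are ordered by vulnerability count (highest first),
    then by age (oldest first), as a simple prioritization rule.
    """
    stale = [d for d in deps if (today - d.last_upgraded).days > max_age_days]
    return sorted(stale, key=lambda d: (-d.known_vulnerabilities, d.last_upgraded))


# Hypothetical inventory, for illustration only.
inventory = [
    Dependency("libalpha", date(2023, 3, 1), 0),
    Dependency("libbeta", date(2024, 11, 20), 2),
    Dependency("libgamma", date(2022, 6, 15), 3),
]

for dep in stale_dependencies(inventory, today=date(2025, 1, 16)):
    print(dep.name, dep.last_upgraded.isoformat(), dep.known_vulnerabilities)
```

Note that `libbeta` is skipped despite its known vulnerabilities because it was upgraded recently; a production tool would likely treat vulnerabilities and staleness as separate signals.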


The Virtualization Era: Opportunities, Challenges, and the Role of Hypervisors

Choosing the most appropriate hypervisor requires thoughtful consideration of an organization’s immediate needs and long-term goals. Scalability is a crucial factor, as the selected solution must address current workloads and seamlessly adapt to future demands. A hypervisor that integrates smoothly with an organization’s existing IT infrastructure reduces the risks of operational disruptions and ensures a cost-effective transition. Equally important is the financial aspect, where businesses must look beyond the initial licensing fees to account for potential hidden costs, such as staff training, ongoing support, and any necessary adjustments to workflows. The quality of support the vendor provides, coupled with the strength of the user community, can significantly influence the overall experience, offering critical assistance during implementation and beyond. For many businesses, partnering with Managed Service Providers (MSPs) brings an added layer of expertise, ensuring that the chosen solution delivers maximum value while minimizing risk. The ongoing evolution and transformation of the virtualization market presents both challenges and opportunities. As the foundation for IT efficiency and flexibility, hypervisors remain central to these changes.

 

DORA’s Deadline Looms: Navigating the EU’s Mandate for Threat Led Penetration Testing

It’s hard to defend yourself if you have no idea what you’re up against, and history and countless news stories are evidence that trying to defend against all manner of digital threat is a fool’s errand. As such, the first step to approaching DORA compliance is profiling not only the threat actors that target the financial services sector, but specifically which actors, and by what Tactics, Techniques, and Procedures (TTPs), you are likely to be attacked. However, before you can determine how an actor may view and approach you, you need to know who you are. So, the first profile that must be built is of your own business. Not just financial services, but what sector/aspect, what region, and finally what is the specific risk profile based on the critical assets in organizational, and even partner, infrastructures. The second profile begins with the current population of known actors that target the financial services industry. It then narrows to the actors known to be aligned with the specific targeting profile. From there, leveraging industry standard models such as the MITRE ATT&CK framework, a graph is created of each actor/group’s understood goals and TTPs, including their traditional and preferred methods of access and exploitation, as well as their capabilities for evasion, persistence and command and control.
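The two-profile approach can be sketched in Python as a filter followed by a simple technique-to-actor mapping. This is only an illustrative model under stated assumptions: the actor names, sectors, and regions are hypothetical, the technique IDs are used purely as labels in the MITRE ATT&CK style, and real threat intelligence would carry far more detail than sector and region.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BusinessProfile:
    sector: str
    region: str


@dataclass
class ThreatActor:
    name: str
    sectors: Set[str]
    regions: Set[str]
    techniques: Set[str]  # ATT&CK-style technique IDs used as labels


def relevant_actors(profile: BusinessProfile,
                    actors: List[ThreatActor]) -> List[ThreatActor]:
    """Narrow the known actors to those matching the business profile."""
    return [a for a in actors
            if profile.sector in a.sectors and profile.region in a.regions]


def technique_graph(actors: List[ThreatActor]) -> Dict[str, List[str]]:
    """Map each technique to the actors known to use it."""
    graph: Dict[str, List[str]] = {}
    for actor in actors:
        for technique in sorted(actor.techniques):
            graph.setdefault(technique, []).append(actor.name)
    return graph


# Hypothetical data, for illustration only.
profile = BusinessProfile(sector="retail banking", region="EU")
known_actors = [
    ThreatActor("ActorA", {"retail banking", "insurance"}, {"EU"}, {"T1566", "T1078"}),
    ThreatActor("ActorB", {"retail banking"}, {"APAC"}, {"T1190"}),
    ThreatActor("ActorC", {"retail banking"}, {"EU", "US"}, {"T1566", "T1486"}),
]

shortlist = relevant_actors(profile, known_actors)
for technique, names in sorted(technique_graph(shortlist).items()):
    print(technique, names)
```

Techniques shared by several shortlisted actors are natural candidates to prioritize when scoping a threat-led penetration test.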


With AGI looming, CIOs stay the course on AI partnerships

“The immediate path for CIOs is to leverage gen AI for augmentation rather than replacement — creating tools that help human teams make smarter, faster decisions,” Nardecchia says. “There are very promising results with causal AI and AI agents that give an autonomous-like capability and most solutions still have a human in the loop.” Matthew Gunkel, CIO of IT Solutions at the University of California at Riverside, agrees that IT organizations should keep moving forward regardless of the growing delta between AI technology milestones and actual AI implementations. ... “The rapid advancements in AI technology, including projections for AGI and ACI, present a paradox: While the technology races ahead, enterprise adoption remains in its infancy. This divergence creates both challenges and opportunities for CIOs, employees, and AI vendors,” Priest says. “Rather than speculating on when AGI/ACI will materialize, CIOs would be best served to focus on what preparation is required to be ready for it and to maximize the value from it.” Sid Nag, vice president at Gartner, agrees that CIOs should train their attention on laying the foundation for AI and addressing important matters such as privacy, ethics, legal issues, and copyright issues, rather than focus on AGI advances.



Quote for the day:

"When you practice leadership, the evidence of the quality of your leadership is known from the type of leaders that emerge out of your leadership" -- Sujit Lalwani

Daily Tech Digest - July 10, 2024

How platform teams lead to better, faster, stronger enterprises

Platform teams are uniquely equipped to optimize resource allocation because they sit in between developers and the cloud infrastructure and compute that developers need, and are able to maximize the efficiency and effectiveness of software development processes. With their unique set of skills and expertise, they effectively collaborate with other teams, including developers, data scientists, and operations teams, to accurately understand their needs and pain points. Using a product approach, platform teams remove barriers for developers and operations teams by offering shared services for developer self-service, enabling faster modernization within organizational boundaries and automation to simplify the management of applications and Kubernetes clusters in the cloud. Fostering a culture of innovation, platform teams play a crucial role in keeping the organization at the forefront of emerging trends and technologies. This enables enterprises to provide innovative solutions that set them apart in the market.


Developing An AI Use Policy

An AI Use Policy is designed to ensure that any AI technology used by your business is done so in a safe, reliable and appropriate manner that minimises risks. It should be developed to inform and guide your employees on how AI can be used within your business. ... Perhaps the most important part for the majority of your employees, set specific do’s and don’ts for inputs and outputs. This is to ensure compliance with data security, privacy and ethical standards. For example, “Don’t input any company confidential, commercially sensitive or proprietary information”, “Don’t use AI tools in a way that could inadvertently perpetuate or reinforce bias” and “Don’t input any customer or co-worker’s personal data”. For outputs, guidance can reiterate to staff the potential for misinformation or ‘hallucinations’ generated by AI. Consider rules such as “Clearly label any AI generated content”, “Don’t share any output without careful fact-checking” or “Make sure that a human has the final decision when using AI to help make a decision which could impact any living person”.


Synergy between IoT and blockchain transforming operational efficiency

The synergy between the two technologies is integral to achieving Industry 4.0 goals, including digital transformation, decentralised connectivity, and smart industry advancements. Via this integration, organisations can achieve real-time visibility into production operations, optimise supply chain processes, and enhance overall efficiency. ... In regulated industries like pharmaceutical manufacturing, where compliance is crucial, integrating IoT and Blockchain lets companies onboard suppliers to upload raw material info, batch numbers, and quality checks to a blockchain ledger. IoT devices automate data acquisition during manufacturing and storage, ensuring data integrity and transparency. In smart city ecosystems, local authorities share data with service providers for waste management, traffic updates, and more. Traffic data from sensors can be securely uploaded to a blockchain, where third-party services like food delivery and ridesharing can access it to optimise operations. Logistics companies use IoT systems to gather data on location and handling, which is uploaded to a blockchain ledger to track goods, estimate delivery time, and provide real-time updates.


Ignore Li-ion fire risks at your peril

Li-ion batteries are prone to destructive and hard-to-control fires. There have been several reported incidents in data centers, some of which have led to serious outages, but they are not well-documented or systematically studied. ... A commonly held view is that Li-ion’s fire risk in the data center is overstated, partly as a result of marketing by vendors of alternative chemistries such as salt and nickel-zinc. If these products are promoted as a “safe” alternative, then it will (it is speculated) create a perception that Li-ion is “unsafe.” After assessing the evidence, examining the science, and hearing from data center operators at recent member meetings, Uptime Institute is taking a cautious and practicable stance at this point. While it is true that Li-ion batteries have a higher risk of fire compared with other chemistries, and these fires are particularly problematic, Uptime Institute engineers do not think Li-ion batteries should be rejected out of hand. ... Data center builders and operators should carefully consider the benefits of Li-ion batteries alongside the risks. As well as the obvious risk of serious fires, there are financial and reputational risks in preparing for, avoiding, and responding to such incidents.


More than a CISO: the rise of the dual-titled IT leader

Dual-title roles give CISOs new levers to work with and more scope to drive strategic integration and alignment of cybersecurity within the organization. ... Belknap finds having his own team of engineers puts him in a stronger position when working with partners. When looking for support or assistance with a project, his team will have already built something, reducing the amount of work needed from the partner team. “This means we can lean on them to be responsible for the things that only they can do. I don’t have to pull them into the work that only I can do or the work that’s not aligned to their expertise,” he says. These dual-title roles also recognize how CISOs are increasingly operating as technology leaders and operators of the organization, according to Adam Ely, head of digital products at Fidelity Investments who was formerly the firm’s CISO and has a long history in security. Ely says that as CISOs typically work across an organization, know how the business lines work, and are day-to-day leaders of people and technology as well as crisis managers, it stands them in good stead for dual-title or more senior positions. 


You Can’t Wish Away Technology Complexity

Every business succeeds because of technology. Every person gets paid by technology. The value of our currency itself is about technology. Of course, it is not only about technology. But tell that to the CFO or CLO. When it is about finance, there is very little pushback in saying it is all about money. When it is about legal, there is no pushback about it being about law. I’ve noticed only technologists pull back and say, “You’re right, it’s not about technology.” ... See, what people often forget is that technology complexity is cool on multiple levels. It gives us the ability to make different choices for stakeholders and customers (I mean real customers, not stakeholders that think they are customers – note to business stakeholders, you and I get our paychecks from the same place, you are not my customer. Our customer is my customer). But while this complexity allows for choice, it also creates a dependency on understanding those choices. Or a dependency on a professional who does. I don’t pretend to understand medicine. That is why I ask doctors what to do.


Electronic Health Record Errors Are a Serious Problem

The exposure of healthcare records, in even minor ways, leaves patients highly vulnerable. “I never reached out to this woman [whose records were entered into my father’s], but I had all her contact information. I could have gone to her house and handed her the copy of the results I had found in my dad’s records,” Hollingsworth says. ... Data aggregators pose a further risk. These organizations may collect deidentified data to perform analyses on population-level health issues for both healthcare organizations and insurance companies. “Are they following the same security standards that we follow in the health care transaction world?” Ghanayem asks. “I don’t know.” ... Clear distinctions between important information fields must be made to cut down on adjacency errors. Concise patient summaries at the beginning of each record and usable search features may increase usability and decrease frustration that leads to the introduction of errors. And refining when alerts are issued can decrease alert fatigue, which may lead providers to simply ignore alerts even when they are valid.


Diversifying cyber teams to tackle complex threats

To make a significant change and deliver a more diverse cyber workforce, we need to focus on leadership and change our language and processes for recruitment. This takes courage and is the biggest challenge organizations face. Having a diverse team helps others see it is a place for them. It isn’t just about attracting talent; it’s also about openness and retaining talent. Organizations need to help individuals from diverse backgrounds to see themselves as role models who need to be out shouting about the opportunities within the sector. Diversity fosters a sense of belonging and inclusivity making the cybersecurity field more attractive to a wider range of individuals. When potential recruits see relatable role models within a team, it breaks down the traditional and somewhat homogenous perception of cybersecurity. This inclusivity is crucial for attracting talent from underrepresented groups, particularly women and minority groups, who may not have traditionally seen themselves in cybersecurity roles. A diverse team with strong role models creates a positive feedback loop. 


Nanotechnology and SRE: Pioneering Precision in Performance

Nanotechnology offers the opportunity to transform SRE at the atomic level — addressing individual tasks, subtasks, and tickets. For example, extra-sensitive nanosensors can continuously monitor system performance metrics, including temperature, voltage, and processing load. When placed in data centers, these sensors enable real-time data collection and analysis, detecting electrical and mechanical issues before they escalate and extending the lifespan of technological components. Nanobots can be programmed to address hardware issues and routine maintenance tasks. Together, these technologies can integrate into a self-healing and continuously improving system in line with SRE principles. ... Nanotechnology can potentially transform SRE, leading to enhanced system reliability and performance. Nanotechnology-enabled solutions can allow more precise monitoring, optimization, and real-time improvements, supporting the key pillars of SRE. At the same time, the foundational principles of SRE can be applied to ensure the reliability of advanced nanotechnology systems. 


Three Areas Where AI Can Make a Huge Difference Without Significant Job Risk

Doing a QC job can be annoying because even though the job is critical to the outcome, your non-QC peers and management treat you like a potentially avoidable annoyance. You stand in the way of shipping on time and at volume, potentially delaying or even eliminating performance-based bonuses. We are already discovering that to assure the quality of an AI-driven coding effort, a second AI is needed to assure the quality of the result because people just don’t like doing QC on code, particularly those who create it. ... In short, properly applied AI could highlight and help address problems that are critically reducing a company’s ability to perform to its full potential and preventing it from becoming a great place to work. ... Calculating an employee’s contribution and then using it to set compensation transparently should significantly reduce the number of employees who feel they are being treated unfairly by eliminating that unfairness or by showing them a path to improve their value and thus positively impact their pay.



Quote for the day:

"When you stop chasing the wrong things you give the right things a chance to catch you." -- Lolly Daskal