Daily Tech Digest - July 18, 2024

The Critical Role of Data Cleaning

Data cleaning is a crucial step that eliminates irrelevant data, identifies outliers and duplicates, and fixes missing values. It involves removing errors, inconsistencies and, sometimes, even biases from raw data to make it usable. While buying pre-cleaned data can save resources, understanding the importance of data cleaning is still essential. Inaccuracies can significantly impact results. In many cases, until low-value data is removed, the rest remains hardly usable. Cleaning works as a filter, ensuring that the data that passes through to the next step is more refined and relevant to your goals. ... At its core, data cleaning is the backbone of robust and reliable AI applications. It guards against inaccurate and biased data, ensuring AI models and their findings are on point. Data scientists depend on data cleaning techniques to transform raw data into a high-quality, trustworthy asset. ... Interestingly, LLMs that have been properly trained on clean data can play a significant role in the data cleaning process itself. Their advanced capabilities enable them to automate and enhance various data cleaning tasks, making the process more efficient and effective.
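The three cleaning steps the excerpt names (duplicates, outliers, missing values) can be sketched in a few lines of pandas. This is a minimal illustration on a made-up dataset; the column names, the 0–120 age rule, and median imputation are assumptions for the example, not a prescribed pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical raw dataset with a duplicate row, an entry-error outlier,
# and a missing value.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, 29, 29, 310, np.nan],   # 310 is an implausible entry error
    "spend": [120.0, 85.5, 85.5, 40.0, 60.0],
})

# 1. Remove exact duplicate rows.
clean = raw.drop_duplicates()

# 2. Drop implausible outliers (here: a simple domain rule on age),
#    while keeping rows whose age is merely missing.
clean = clean[clean["age"].isna() | clean["age"].between(0, 120)].copy()

# 3. Fill missing values with a simple imputation (median age).
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```

Real pipelines add domain-specific validation and bias checks on top, but the filter metaphor from the excerpt is exactly this: each step lets less, but better, data through.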


What Is Paravirtualization?

Paravirtualization builds upon traditional virtualization by offering extra services, improved capabilities or better performance to guest operating systems. With traditional virtualization, organizations abstract the underlying resources and present them to the guest via virtual machines, so guests can run as is, says Greg Schulz, founder of the StorageIO Group, an IT industry analyst consultancy. However, those virtual machines use all of the resources assigned to them, meaning there is a great deal of idle time, even though it doesn’t appear so, according to Kalvar. Paravirtualization uses software instructions to dynamically size and resize those resources, Kalvar says, turning VMs into bundles of resources. They are managed by the hypervisor, a software component that manages multiple virtual machines on a computer. ... One of the biggest advantages of paravirtualization is that it is typically more efficient than full virtualization because the hypervisor can closely manage and optimize resources between different operating systems. Users can manage the resources they consume on a granular basis. “I’m not buying an hour of a server, I’m buying seconds of resource time,” Kalvar says.
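The "bundles of resources" idea can be made concrete with a toy model: a hypervisor that grows and shrinks a VM's allocation on the fly and bills per second of resource time. This is purely a conceptual sketch, not a real hypervisor API; the `Hypervisor` class and its methods are invented for illustration.

```python
class Hypervisor:
    """Toy model: VMs as dynamically resizable bundles of resources."""

    def __init__(self, total_cpus: int):
        self.free_cpus = total_cpus
        self.vms: dict[str, int] = {}              # vm name -> CPUs currently held
        self.usage: dict[str, float] = {}          # vm name -> CPU-seconds billed

    def resize(self, vm: str, cpus: int) -> None:
        """Grow or shrink a VM's allocation on the fly (the paravirtual idea)."""
        held = self.vms.get(vm, 0)
        if cpus - held > self.free_cpus:
            raise RuntimeError("not enough free CPUs")
        self.free_cpus -= cpus - held
        self.vms[vm] = cpus

    def bill(self, vm: str, seconds: float) -> None:
        """Charge for seconds of resource time, not hours of a server."""
        self.usage[vm] = self.usage.get(vm, 0.0) + self.vms[vm] * seconds

hv = Hypervisor(total_cpus=8)
hv.resize("web", 4)   # busy period: the web VM holds 4 CPUs
hv.bill("web", 30)    # 30 s at 4 CPUs -> 120 CPU-seconds
hv.resize("web", 1)   # idle period: shrink instead of letting CPUs sit idle
hv.bill("web", 30)    # 30 s at 1 CPU  ->  30 CPU-seconds
print(hv.usage["web"])  # 150.0 CPU-seconds, versus 240 with a static 4-CPU VM
```

With a static allocation the VM would be billed for all four CPUs the whole time; resizing during the idle period is what makes Kalvar's "seconds of resource time" framing possible.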


Leaked Access Keys: The Silent Revolution in Cloud Security

The challenge for service accounts is that MFA does not work, and network-level protection (IP filtering, VPN tunneling, etc.) is not consistently applied, primarily due to complexity and costs. Thus, service account key leaks often enable hackers to access company resources. While phishing is unusual in the context of service accounts, leaks are frequently the result of developers unintentionally posting keys online, often alongside code fragments that reveal the account to which they apply. ... Now, Google has changed the game with its recent policy change. If an access key appears in a public GitHub repository, GCP deactivates the key, regardless of whether applications crash as a result. Google's announcement marks a shift in the risk and priority tango. Gone are the days when patching vulnerabilities could take days or weeks. Welcome to the fast-paced cloud era. Zero-second attacks after credential leaks demand zero-second fixes. Preventing an external attack becomes more important than avoiding crashing customer applications – at least, that is Google's position.
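The kind of scanning Google and GitHub perform can be sketched as simple pattern matching over repository content. The patterns below are illustrative only; real scanners (gitleaks, GitHub secret scanning, Google's own checks) use far larger, vendor-maintained rule sets and verify candidate matches against the issuing provider.

```python
import re

# Two illustrative credential signatures: a GCP service account JSON key
# (which embeds a PEM private key) and an AWS-style access key ID.
PATTERNS = {
    "gcp_service_account": re.compile(
        r'"private_key"\s*:\s*"-----BEGIN PRIVATE KEY-----'),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def find_leaked_keys(text: str) -> list[str]:
    """Return the names of credential patterns found in a blob of text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

snippet = ('{"type": "service_account", '
           '"private_key": "-----BEGIN PRIVATE KEY-----\\n..."}')
print(find_leaked_keys(snippet))  # ['gcp_service_account']
```

Running a check like this in a pre-commit hook or CI pipeline catches keys before they ever reach a public repository, which is cheaper than reacting after a provider deactivates them.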


Juniper advances AI networking software with congestion control, load balancing

On the load balancing front, Juniper has added support for dynamic load balancing (DLB) that selects the optimal network path and delivers lower latency, better network utilization, and faster job completion times. From the AI workload perspective, this results in better AI workload performance and higher utilization of expensive GPUs, according to Sanyal. “Compared to traditional static load balancing, DLB significantly enhances fabric bandwidth utilization. But one of DLB’s limitations is that it only tracks the quality of local links instead of understanding the whole path quality from ingress to egress node,” Sanyal wrote. “Let’s say we have a CLOS topology and server 1 and server 2 are both trying to send data flows, called flow-1 and flow-2, respectively. In the case of DLB, leaf-1 only knows the local links’ utilization and makes decisions based solely on the local switch quality table, where local links may be in a perfect state. But if you use GLB (global load balancing), you can understand the whole path quality, where congestion issues are present within the spine-leaf level.”
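The local-versus-whole-path distinction Sanyal describes can be shown with a toy leaf-spine model. The path names and utilization numbers are invented for illustration, and the selection rules are deliberately simplified (real DLB/GLB track queue depth and link quality continuously, not one static number per hop).

```python
# Toy two-tier (leaf-spine) fabric. Each candidate path from the ingress
# leaf to the egress leaf is a list of per-hop link utilizations in [0, 1].
paths = {
    "via_spine1": [0.10, 0.90],  # local uplink quiet, spine1->egress congested
    "via_spine2": [0.20, 0.30],  # slightly busier locally, but clean end to end
}

def dlb_choice(paths: dict[str, list[float]]) -> str:
    """DLB-style decision: look only at the first (local) hop."""
    return min(paths, key=lambda p: paths[p][0])

def glb_choice(paths: dict[str, list[float]]) -> str:
    """GLB-style decision: judge each path by its worst hop end to end."""
    return min(paths, key=lambda p: max(paths[p]))

print(dlb_choice(paths))  # 'via_spine1' -- looks best locally, hits congestion
print(glb_choice(paths))  # 'via_spine2' -- avoids the congested spine link
```

The local view picks the path with the quiet uplink and runs straight into the congested spine link; the whole-path view accepts a slightly busier first hop to avoid it, which is exactly the failure mode of DLB that the quote describes.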


Impact of AI Platforms on Enhancing Cloud Services and Customer Experience

AI platforms enable businesses to streamline operations and reduce costs by automating routine tasks and optimizing resource allocation. Predictive analytics, powered by AI, allows for proactive maintenance and issue resolution, minimizing downtime and ensuring continuous service availability. This is particularly beneficial for industries where uninterrupted access to cloud services is critical, such as finance, healthcare, and e-commerce. ... AI platforms are not only enhancing backend operations but are also revolutionizing customer interactions. AI-driven customer service tools, such as chatbots and virtual assistants, provide instant support, personalized recommendations, and seamless user experiences. These tools can handle a wide range of customer queries, from basic information requests to complex problem-solving, thereby improving customer satisfaction and loyalty. The efficiency and round-the-clock availability of AI-driven tools make them invaluable for businesses. By the year 2025, it is expected that AI will facilitate around 95% of customer interactions, demonstrating its growing influence and effectiveness.


2 Essential Strategies for CDOs to Balance Visible and Invisible Data Work Under Pressure

Short-termism under pressure is a common mistake, resulting in an unbalanced strategy. How can we, as data leaders, successfully navigate such a scenario? “Working under pressure and with limited trust from senior management can force first-time CDOs to commit to an unbalanced strategy, focusing on short-term, highly visible projects – and ignoring the essential foundation.” ... The desire to invest in enabling topics stems from the balance between driving and constraining forces. Senior management tends to ignore enabling topics because they rarely contribute directly to the bottom line; they can be a black box to a non-technical person, and they require multiple teams to collaborate effectively. On the other hand, Anne knew that the same people eagerly anticipated the impact of advanced analytics such as GenAI and were worried about potential regulatory risks. With the knowledge of the key enabling work packages and the motivating forces at play, Anne had everything she needed to argue for and execute a balanced, long-term data strategy that does not ignore the “invisible” work required.


Gen AI Spending Slows as Businesses Exercise Caution

Generative AI has advanced rapidly over the past year, and organizations are recognizing its potential across business functions. But businesses have now taken a cautious stance on gen AI adoption due to steep implementation costs and concerns about hallucinations. ... This trend reflects a broader shift away from the AI hype, and while businesses acknowledge the potential of this technology, they are also wary of the associated risks and costs, according to Michael Sinoway, CEO, Lucidworks. "The flattened spending suggests a move toward more thoughtful planning. This approach ensures AI adoption delivers real value, balancing competitiveness with cost management and risk mitigation," he said. ... Concerns about implementation costs, accuracy and data security have increased considerably in 2024. The number of business leaders with concerns about implementation costs has grown 14-fold, and those concerned about response accuracy have grown fivefold. While concerns about data security have increased only threefold, security remains the biggest worry.


CIOs are stretched more than ever before — and that’s a good thing

“Many CIOs have built years of credibility and trust by blocking and tackling the traditional responsibilities of the role,” she adds. “They’re now being brought to the conversation as business leaders to help the organization think through transformational priorities because they’re functional experts like any other executive in the C-suite.” ... “Boards want technology to improve the top and bottom line, which can be a tough balance, even if it’s one that CIOs are getting used to managing,” says Nash Squared’s White. “On the one hand, they’re being asked to promote innovation and help generate revenue, and on the other, they’re often charged with governance and security, too.” The importance of technology will only continue to increase. Gen AI, for example, will make it possible to boost productivity while reducing costs. CyberArk’s Grossman expects that the central role of digital leaders in exploiting these emerging technologies will make high-level CIOs even more important in the future.


What Is a Sovereign Cloud and Who Truly Benefits From It?

A sovereign cloud is a cloud computing environment designed to help organizations comply with regulatory rules established by a particular government. This often entails ensuring that data stored within the cloud environment remains within a specific country. But it can also involve other practices, as we explain below. ... For one thing, cost. In general, cloud computing services on a sovereign cloud cost more than their equivalents on a generic public cloud. The exact pricing can vary widely depending on a number of factors, such as which cloud regions you select and which types of services you use, but in general, expect to pay a premium of at least 15% to use a sovereign cloud. A second challenge of using sovereign clouds is that in some cases your organization must undergo a vetting process to use them because some sovereign cloud providers only make their solutions available to certain types of organizations — often, government agencies or contractors that do business with them. This means you can't just create a sovereign cloud account and start launching workloads in a matter of minutes, as you could in a generic public cloud.


Securing datacenters may soon need sniffer dogs

So says Len Noe, tech evangelist at identity management vendor CyberArk. Noe told The Register he has ten implants – passive devices that are observable with a full-body X-ray but invisible to most security scanners. Noe explained he's acquired swipe cards used to access controlled premises, cloned them in his implants, and successfully entered buildings by just waving his hands over card readers. ... Noe thinks hounds are therefore currently the only reliable means of finding humans with implants that could be used to clone ID cards. He thinks dogs should be considered because attackers who access datacenters using implants would probably walk away scot-free. Noe told The Register that datacenter staff would probably notice an implant-packing attacker before they access sensitive areas, but would then struggle to find grounds for prosecution because implants aren't easily detectable – and even if they were, the information they contain is considered medical data and is therefore subject to privacy laws in many jurisdictions.



Quote for the day:

"Leadership is liberating people to do what is required of them in the most effective and humane way possible." -- Max DePree
