Daily Tech Digest - November 02, 2024

Cisco takes aim at developing quantum data center

On top of the quantum network fabric effort, Cisco is developing a software package that covers the best approaches to entanglement distribution, protocols, and routing algorithms, which the company is building into a protocol stack and compiler called Quantum Orchestra. “We are developing a network-aware quantum orchestrator, which is this general framework that takes quantum jobs in terms of quantum circuits as an input, as well as the network topology, which also includes how and where the different quantum devices are distributed inside the network,” said Hassan Shapourian, Technical Leader, Cisco Outshift. “The orchestrator will let us modify a circuit for better distributability. Also, we’re going to decide which logical QVC [quantum variational circuit] to assign to which quantum device and how it will communicate with which device inside a rack.” “After that we need to schedule a set of switch configurations to enable end-to-end entanglement generations [to ensure actual connectivity]. And that involves routing as well as resource management, because we’re going to share resources, and eventually the goal is to minimize the execution time or minimize the switching events, and the output would be a set of instructions to the switches,” Shapourian said.
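The orchestrator's workflow as described (circuit partitions plus network topology in, device assignments and switch instructions out) can be sketched roughly as below. This is a hypothetical illustration, not Cisco's Quantum Orchestra code; the `Partition`, `Device`, `assign_devices`, and `schedule_switches` names are invented for the example.

```python
# Hypothetical sketch of a network-aware orchestrator: place circuit partitions
# onto quantum devices, then emit switch instructions for cross-device links.
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    qubits: int          # logical qubits this sub-circuit needs

@dataclass
class Device:
    name: str
    capacity: int        # physical qubits available on this QPU

def assign_devices(partitions, devices):
    """Greedy placement: largest partitions go to the device with the most free qubits."""
    assignment, free = {}, {d.name: d.capacity for d in devices}
    for p in sorted(partitions, key=lambda p: -p.qubits):
        target = max(free, key=free.get)
        if free[target] < p.qubits:
            raise RuntimeError(f"no device can host {p.name}")
        assignment[p.name] = target
        free[target] -= p.qubits
    return assignment

def schedule_switches(assignment, entangling_pairs):
    """Emit one switch instruction per cross-device entangling interaction."""
    return [
        {"op": "connect", "ports": (assignment[a], assignment[b])}
        for a, b in entangling_pairs
        if assignment[a] != assignment[b]
    ]

partitions = [Partition("qvc0", 5), Partition("qvc1", 3)]
devices = [Device("qpu-rack1", 8), Device("qpu-rack2", 8)]
placement = assign_devices(partitions, devices)
print(placement)
print(schedule_switches(placement, [("qvc0", "qvc1")]))
```

A real scheduler would also optimize the objectives Shapourian mentions (minimizing execution time or switching events); the greedy pass here only shows the shape of the inputs and outputs.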


How CIOs Can Fix Data Governance For Generative AI

When you look at it from a consumption standpoint, the enrichment of AI happens as you start increasing the canvas of data it can pick up, because it learns more. That means it needs very clean information. It needs [to be] more accurate, because if you push in something rough, it’s going to be all trash. Traditional AI ensured that we started cleaning the data, and metadata told us if there was more data available. AI has pushed people to create more metadata and classification, clean the data, reduce duplicates, and ensure that the data sets complement one another rather than being redundant. It’s cleaner, it’s more current, it’s real-time. Gen AI has gone a step further. If you want to make it contextually rich and pull more RAG into these kinds of solutions, you need to know exactly where the data sits today. You need to know exactly what is in the data to create a RAG pipeline that is clean enough to generate very accurate answers. Consumption is driving behavior. In multiple ways, it is actually driving organizations to start thinking about categorization, access controls, and governance. [An AI platform] also needs to know the history of the data. All these things have started happening now because this is very complex.
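As a rough illustration of the governance points above, the sketch below tags every record with lineage, classification, and freshness metadata, and lets a RAG retrieval step consider only data the caller is cleared to access. The record fields, clearance levels, and toy keyword scoring are assumptions made for the example, not anything prescribed in the interview.

```python
# Illustrative governance-aware retrieval step for a RAG pipeline: each record
# carries lineage and classification metadata, and retrieval filters by clearance
# before ranking. The ranking is a toy keyword overlap; a real pipeline would
# use embeddings.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    source: str           # where the data sits today (lineage)
    classification: str   # e.g. "public", "internal", "restricted"
    last_refreshed: str   # how current the data is

def retrieve(query, records, clearance):
    """Return top-k context records the caller is allowed to see."""
    levels = {"public": 0, "internal": 1, "restricted": 2}
    visible = [r for r in records if levels[r.classification] <= levels[clearance]]
    q = set(query.lower().split())
    ranked = sorted(visible, key=lambda r: -len(q & set(r.text.lower().split())))
    return ranked[:3]

records = [
    Record("Q3 churn rose 4% in EMEA", "crm.warehouse.churn", "internal", "2024-10-30"),
    Record("Public pricing page copy", "cms.site.pricing", "public", "2024-09-01"),
]
for r in retrieve("why did churn rise in Q3", records, clearance="internal"):
    print(r.source, "->", r.text)
```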


Here’s the paper no one read before declaring the demise of modern cryptography

With no original paper to reference, many news outlets searched the Chinese Journal of Computers for similar research and came up with this paper. It wasn’t published in September, as the news article reported, but it was written by the same researchers and referenced the “D-Wave Advantage”—a type of quantum computer sold by Canada-based D-Wave Quantum Systems—in the title. Some of the follow-on articles bought the misinformation hook, line, and sinker, repeating incorrectly that the fall of RSA was upon us. People got that idea because the May paper claimed to have used a D-Wave system to factor a 50-bit RSA integer. Other publications correctly debunked the claims in the South China Morning Post but mistakenly cited the May paper and noted the inconsistencies between what it claimed and what the news outlet reported. ... It reports using a D-Wave-enabled quantum annealer to find “integral distinguishers up to 9-rounds” in the encryption algorithms known as PRESENT, GIFT-64, and RECTANGLE. All three are symmetric encryption algorithms built on a substitution-permutation network (SPN) structure.
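For readers unfamiliar with the structure, here is a toy sketch of a single SPN round: key mixing, a 4-bit S-box applied to each nibble, then a bit permutation. The S-box, permutation, and 16-bit block size are made up for brevity and are not the actual parameters of PRESENT, GIFT-64, or RECTANGLE.

```python
# Toy substitution-permutation network (SPN) round on a 16-bit block.
# All constants below are illustrative, not taken from any real cipher.
SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]  # bit i -> PERM[i]

def spn_round(state: int, round_key: int) -> int:
    """One round: key mix, nibble-wise substitution, then bit permutation."""
    state ^= round_key                                   # AddRoundKey
    subbed = 0
    for i in range(4):                                   # SubNibbles
        nib = (state >> (4 * i)) & 0xF
        subbed |= SBOX[nib] << (4 * i)
    permuted = 0
    for i in range(16):                                  # PermuteBits
        bit = (subbed >> i) & 1
        permuted |= bit << PERM[i]
    return permuted

state = 0x1234
for rk in (0xA5A5, 0x0F0F, 0x3C3C):                      # three toy round keys
    state = spn_round(state, rk)
print(hex(state))
```

Real SPN ciphers iterate many such rounds with a proper key schedule; the integral distinguishers mentioned in the paper exploit structural properties across a limited number of these rounds.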


AI Has Created a Paradox in Data Cleansing and Management

When asked about the practices required to maintain a cleansed data set, Perkins-Munn says it is critical to think about enhancing data cleaning and quality management. Delving further, she states that there are many ways to maintain it over time and discusses a few, including AI algorithms for automated data profiling and anomaly detection. Particularly in the case of unsupervised learning models, AI algorithms automatically profile data sets and detect anomalies or outliers. Continuous data monitoring is one ongoing way to keep data clean. She also mentions intelligent data matching and deduplication, wherein machine learning algorithms improve the accuracy and efficiency of data matching and deduplication processes. Beyond those, fuzzy matching algorithms can identify and merge duplicate records even when they contain minor variations or errors. Moving forward, Perkins-Munn states that for effective data management, organizations must prioritize where to start with data cleansing, and there is no one-method-fits-all approach to it. She advises focusing on cleaning the data that directly impacts the most critical business processes or decisions, thus ensuring quick, tangible value.
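A minimal sketch of the fuzzy deduplication idea, using only the Python standard library, might look like the following. The 0.85 similarity threshold and the sample records are illustrative choices, not recommendations from Perkins-Munn.

```python
# Fuzzy deduplication sketch: merge records whose string similarity exceeds a threshold.
from difflib import SequenceMatcher

def similar(a: str, b: str) -> float:
    """Similarity ratio between two records, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedupe(records, threshold=0.85):
    """Keep the first occurrence of each near-duplicate record."""
    kept = []
    for rec in records:
        if not any(similar(rec, k) >= threshold for k in kept):
            kept.append(rec)
    return kept

rows = [
    "Acme Corp, 12 Main St, Springfield",
    "ACME Corp., 12 Main Street, Springfield",   # near-duplicate with minor variations
    "Globex Inc, 9 Elm Ave, Shelbyville",
]
print(dedupe(rows))   # the second row is merged into the first
```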


A brief summary of language model finetuning

For language models, there are two primary goals that a practitioner will have when performing fine tuning: Knowledge injection: Teach the model how to leverage new sources of knowledge (not present during pretraining) when solving problems. Alignment (or style/format specification): Modify the way in which the language model surfaces its existing knowledge base; e.g., abide by a certain answer format, use a new style/tone of voice, avoid outputting incorrect information, and more. Given this information, we might wonder: Which fine-tuning techniques should we use to accomplish either (or both) of these goals? To answer this question, we need to take a much deeper look at recent research on the topic of fine tuning. ... We don’t need tons of data to learn the style or format of output; large amounts of data are only needed to learn new knowledge. When performing fine tuning, it’s very important that we know which goal—either alignment or knowledge injection—we are aiming for. Then, we should put benchmarks in place that allow us to accurately and comprehensively assess whether that goal was accomplished or not. Imitation models failed to do this, which led to a bunch of misleading claims/results!
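As a concrete (and hypothetical) illustration of the alignment case, the sketch below fine-tunes a small causal language model on a handful of question/answer pairs with LoRA adapters, reflecting the point that style and format can be learned from little data. The model name, hyperparameters, and example pairs are placeholders, not a recipe from the post.

```python
# Minimal alignment-style supervised fine-tuning with LoRA adapters.
# Assumes the transformers and peft libraries; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only low-rank adapters are trained, so a few curated examples can be enough
# to teach format/tone (alignment); injecting new knowledge needs far more data.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

pairs = [
    ("Q: What is RAM?", "A: Short-term working memory for a computer."),
    ("Q: What is a CPU?", "A: The chip that executes instructions."),
    ("Q: What is a GPU?", "A: A processor specialized for parallel math."),
]
opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
model.train()
for epoch in range(3):
    for prompt, answer in pairs:
        batch = tok(prompt + "\n" + answer, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss   # causal LM loss
        loss.backward()
        opt.step()
        opt.zero_grad()
```

Per the post's advice, a benchmark for the intended goal (answer format adherence here, rather than factual recall) should be run before and after training to confirm the fine tune actually achieved it.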

Bridging Tech and Policy: Insights on Privacy and AI from IndiaFOSS 2024

Global communication systems are predominantly managed and governed by major technology corporations, often referred to as Big Tech. These organizations exert significant influence over how information flows across the world, yet they lack a nuanced understanding of the socio-political dynamics in the Global South. Pratik Sinha, co-founder of Alt News, spoke about how this gap in understanding can have severe consequences, particularly when it comes to issues such as misinformation, hate speech, and the spread of harmful content. ... The FOSS community is uniquely positioned to address these challenges by collaboratively developing communication systems tailored to the specific needs of various regions. Pratik suggested that by leveraging open-source principles, the FOSS community can create platforms (such as Mastodon) that empower users, enhance local governance, and foster a culture of shared responsibility in content moderation. In doing so, they can provide viable alternatives to Big Tech, ensuring that communication systems serve the diverse needs of communities rather than being controlled by a handful of corporations with a limited understanding of local complexities.


Revealing causal links in complex systems: New algorithm reveals hidden influences

In their new approach, the engineers took a page from information theory—the science of how messages are communicated through a network, based on a theory formulated by the late MIT professor emeritus Claude Shannon. The team developed an algorithm to evaluate any complex system of variables as a messaging network. "We treat the system as a network, and variables transfer information to each other in a way that can be measured," Lozano-Durán explains. "If one variable is sending messages to another, that implies it must have some influence. That's the idea of using information propagation to measure causality." The new algorithm evaluates multiple variables simultaneously, rather than taking on one pair of variables at a time, as other methods do. The algorithm defines information as the likelihood that a change in one variable will be accompanied by a change in another. This likelihood—and therefore, the information that is exchanged between variables—can get stronger or weaker as the algorithm evaluates more of the system's data over time. In the end, the method generates a map of causality that shows which variables in the network are strongly linked.
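A simple stand-in for the idea (not the MIT team's actual algorithm, which considers multiple variables jointly rather than pairwise) is to estimate how much information each variable's present sends to every other variable's future, for example via lagged mutual information, and collect the scores into a causality map:

```python
# Illustrative pairwise stand-in for information-based causality mapping.
import numpy as np

def lagged_mutual_info(x, y, lag=1, bins=8):
    """Mutual information between x[t] and y[t+lag], from a joint histogram estimate."""
    x, y = x[:-lag], y[lag:]
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = np.roll(a, 1) + 0.1 * rng.normal(size=2000)   # b is driven by a with lag 1
c = rng.normal(size=2000)                          # c is independent
series = {"a": a, "b": b, "c": c}

causality_map = {
    (src, dst): round(lagged_mutual_info(series[src], series[dst]), 3)
    for src in series for dst in series if src != dst
}
print(causality_map)   # the (a, b) link should stand out from the rest
```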


Proactive Preparation: Learning From CrowdStrike Chaos

You can’t plan for every scenario. However, having contingency plans can significantly minimise disruption if worst-case scenarios occur. Clear guidance, such as knowing who to speak to, and when, during an outage, can help financial organisations quickly identify faults in their supply chains and restore services. ... Contractual obligations with software suppliers provide an added layer of protection if issues arise. These put a legally binding agreement in place that suppliers will handle the issue effectively. Escrow agreements are also key. They protect the critical source code behind applications by keeping a current copy in escrow and can help organisations manage risk if a supplier can no longer provide software or updates. ... Supply chains are complex. Software providers also rely on their own suppliers, creating an interconnected web of dependencies. Organisations in the sector should understand their suppliers’ contingency plans to handle disruptions in their wider supply chain. Knowing these plans provides peace of mind that suppliers are also prepared for disruptions and have effective steps in place to minimise any impact.


AI Drives Major Gains for Big 3 Cloud Giants

"Over the last four quarters, the market has grown by almost $16 billion, while over the previous four quarters the respective figure was $10 billion," John Dinsdale, chief analyst at Synergy Research Group, wrote in a statement. "Given the already massive size of the market, we are seeing an impressive surge in growth." ... The Azure OpenAI Service emerged as a particular bright spot, with usage more than doubling over the past six months. AI-based cloud services overall are helping Microsoft's cloud business. ... According to Pichai, Google Cloud's success is focused around five strategic areas. First, its AI infrastructure demonstrated leading performance through advances in storage, compute, and software. Second, the enterprise AI platform, Vertex, showed remarkable growth, with Gemini API calls increasing nearly 14 times over a six-month period. ... Looking ahead, AWS plans increased capital expenditure to support AI growth. "It is a really unusually large, maybe once-in-a-lifetime type of opportunity," Jassy said about the potential of generative AI. "I think our customers, the business, and our shareholders will feel good about this long term that we're aggressively pursuing it."


GreyNoise: AI’s Central Role in Detecting Security Flaws in IoT Devices

GreyNoise’s Sift is powered by large language models (LLMs) that are trained on a massive amount of internet traffic – including traffic targeting IoT devices – and can identify anomalies that traditional systems could miss, they wrote. They said Sift can spot new anomalies and threats that haven’t been identified or don’t fit the signatures of known threats. The honeypot analyzes real-time traffic, combines it with the vendor’s proprietary datasets, and then runs the data through AI systems to separate routine internet activity from possible threats, whittling down what human researchers need to focus on and delivering faster, more accurate results. ... The discovery of the vulnerabilities highlights the larger security issues for an IoT environment that numbers 18 billion devices worldwide this year and could grow to 32.1 billion by 2030. “Industrial and critical infrastructure sectors rely on these devices for operational efficiency and real-time monitoring,” the GreyNoise researchers wrote. “However, the sheer volume of data generated makes it challenging for traditional tools to discern genuine threats from routine network traffic, leaving systems vulnerable to sophisticated attacks.”
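To make the triage idea concrete, here is a hedged sketch that flags outlier traffic sessions with an isolation forest so analysts review only the escalated handful. GreyNoise's actual Sift pipeline is LLM-based and proprietary; the feature columns and contamination rate below are invented for the example.

```python
# Illustrative anomaly triage: separate routine traffic from outliers so humans
# review fewer events. Not GreyNoise's Sift, just a generic isolation forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Toy features per session: [requests/min, distinct ports probed, payload bytes]
routine = rng.normal(loc=[30, 2, 500], scale=[5, 1, 50], size=(500, 3))
scans = rng.normal(loc=[300, 40, 80], scale=[30, 5, 20], size=(5, 3))
traffic = np.vstack([routine, scans])

model = IsolationForest(contamination=0.02, random_state=0).fit(traffic)
flags = model.predict(traffic)                 # -1 = anomalous, 1 = routine
suspicious = traffic[flags == -1]
print(f"{len(suspicious)} of {len(traffic)} sessions escalated to analysts")
```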



Quote for the day:

"If you're not confused, you're not paying attention." -- Tom Peters
