Daily Tech Digest by Kannan Subbiah: Daily Tech Digest

How LLMs made their way into the modern data stack in 2023

Beyond helping teams generate insights and answers from their data through text inputs, LLMs are also handling traditionally manual data management and the data efforts crucial to building a robust AI product. In May, Intelligent Data Management Cloud (IDMC) provider Informatica debuted Claire GPT, a multi-LLM-based conversational AI tool that allows users to discover, interact with and manage their IDMC data assets with natural language inputs. It handles multiple jobs within the IDMC platform, including data discovery, data pipeline creation and editing, metadata exploration, data quality and relationships exploration, and data quality rule generation. Then, to help teams build AI offerings, California-based Refuel AI provides a purpose-built large language model that helps with data labeling and enrichment tasks. A paper published in October 2023 also shows that LLMs can do a good job at removing noise from datasets, which is also a crucial step in building robust AI. Other areas in data engineering where LLMs can come into play are data integration and orchestration.

Corporate governance in 2023: a year in review

2023 has seen a continuing trend of more responsibilities for directors. Often, this responsibility comes from regulators; sometimes, it comes from investors or other stakeholders. One thing is certain, though: directors are rapidly losing any remaining wiggle room to be “rubber-stamp” individuals. Modern board roles carry serious accountability; many directors are starting to appreciate that and adhere to new standards. The trouble is that the new standards sometimes overstretch directors, so much so that we now have concerns about overboarding, exhaustion, and undue stress. How will that play out if the trend of more responsibility continues? ... The board dismissed the evidently popular CEO Sam Altman in a decision made behind closed doors with utmost secrecy. And as the world’s attention predictably turned their way, they could give no answers. Soon, Altman was rehired after around 70% of the company’s staff threatened to resign and join Microsoft (a significant OpenAI investor). The board subsequently agreed to undergo a major reshuffle for more accountability and transparent decision-making.

Quantum Computing’s Hard, Cold Reality Check

The problem isn’t just one of timescales. In May, Matthias Troyer, a technical fellow at Microsoft who leads the company’s quantum computing efforts, co-authored a paper in Communications of the ACM suggesting that the number of applications where quantum computers could provide a meaningful advantage was more limited than some might have you believe. “We found out over the last 10 years that many things that people have proposed don’t work,” he says. “And then we found some very simple reasons for that.” The main promise of quantum computing is the ability to solve problems far faster than classical computers, but exactly how much faster varies. There are two applications where quantum algorithms appear to provide an exponential speed up, says Troyer. One is factoring large numbers, which could make it possible to break the public key encryption the internet is built on. The other is simulating quantum systems, which could have applications in chemistry and materials science. Quantum algorithms have been proposed for a range of other problems including optimization, drug design, and fluid dynamics.

Navigating the Data Landscape: The Crucial Role of Data Governance in Today’s Business Environment

Data quality management has become increasingly important as the volume of data grows exponentially day by day. Organizations can protect their data with policies and procedures, ensure that they follow all the rules and regulations, and hire people who understand the data being collected and what it means to the company, but if that data isn’t high quality, the organization may still get the short end of the stick. Maybe you’re three weeks late for a TikTok trend, or you miss out on a whole subset of customers because of a misstep in your collection methods. Either way, the lost profit and the lost chance to build on that data in the future could prove pivotal. Ensuring that your organization has processes to monitor and improve data quality on a continuous basis will save time and money in the long run. Despite its importance, implementing effective data governance comes with challenges. Organizations often face resistance to change, cultural barriers, and the complexity of managing diverse data sources.
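
As a rough illustration of what continuous data quality monitoring might look like in practice, the sketch below computes two common indicators, per-field completeness and duplicate rate, over a batch of records. The function name `quality_report`, the record layout, and the choice of metrics are assumptions made for this example; they are not taken from the article.

```python
def quality_report(records, required_fields):
    """Summarize completeness and duplication for a batch of dict records.

    Hypothetical helper for illustration only; real pipelines usually
    track many more dimensions (freshness, validity, consistency, ...).
    """
    total = len(records)
    if total == 0:
        return {"rows": 0, "completeness": {}, "duplicate_rate": 0.0}

    # Completeness: share of rows where the field is present and non-empty.
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total

    # Duplicate rate: share of rows whose required-field values repeat
    # an earlier row exactly.
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(r.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "rows": total,
        "completeness": completeness,
        "duplicate_rate": duplicates / total,
    }


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
    {"id": 3, "email": None},
]
report = quality_report(sample, ["id", "email"])
```

Running a check like this on every load and alerting when a metric crosses an agreed threshold is one simple way to make quality monitoring continuous rather than occasional.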

Choosing Between Message Queues and Event Streams

There are numerous distinctions between technologies that allow you to implement event streaming and those that you can use for message queueing. To highlight them, I will compare Apache Kafka and RabbitMQ. I’ve chosen Kafka and RabbitMQ specifically because they are popular, widely used solutions providing rich capabilities that have been extensively battle-tested in production environments. ... Message queueing and event streaming can both be used in scenarios requiring decoupled, asynchronous communication between different parts of a system. For instance, in microservices architectures, both can power low-latency messaging between various components. However, going beyond messaging, event streaming and message queueing have distinct strengths and are best suited to different use cases. ... Message queueing is a good choice for many messaging use cases. It’s also an appealing proposition if you’re early in your event-driven journey; that’s because message queueing technologies are generally easier to deploy and manage than event streaming solutions.
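
One core difference between the two models can be sketched in a few lines. In the toy code below, a queue hands each message to a single consumer and then discards it, while a stream keeps an append-only log that any number of consumers read at their own offsets and can replay. The class and method names (`MessageQueue`, `EventStream`, `poll`, `replay`) are invented for this illustration and do not correspond to the RabbitMQ or Kafka client APIs; real brokers add acknowledgements, partitions, consumer groups, and retention policies on top.

```python
from collections import deque


class MessageQueue:
    """Toy queue: a message goes to one consumer and is then removed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Return the oldest message and delete it; None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class EventStream:
    """Toy append-only log: events are retained; consumers track offsets."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # consumer id -> index of next unread event

    def publish(self, event):
        self._log.append(event)

    def poll(self, consumer_id):
        # Return the events this consumer has not seen yet, then advance
        # its offset. The log itself is left untouched.
        start = self._offsets.get(consumer_id, 0)
        events = self._log[start:]
        self._offsets[consumer_id] = len(self._log)
        return events

    def replay(self, consumer_id, offset=0):
        # Rewind a consumer so that earlier events can be read again.
        self._offsets[consumer_id] = offset


queue = MessageQueue()
stream = EventStream()
for order in ("order-1", "order-2"):
    queue.publish(order)
    stream.publish(order)

# Queue: each message is consumed once and is then gone.
queue_reads = [queue.consume(), queue.consume(), queue.consume()]

# Stream: independent consumers each see every event, and can re-read them.
billing_reads = stream.poll("billing")
analytics_reads = stream.poll("analytics")
stream.replay("billing")
billing_again = stream.poll("billing")
```

The retained log is what makes replay and multiple independent readers possible, at the cost of managing storage and offsets; the queue is simpler because consumed messages need no further bookkeeping.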

5G and edge computing: What they are and why you should care

Instead of relying solely on large, high-powered cell towers (as 4G does), 5G will run off both those towers and a ton of small cell sites that can be clustered together. This is how 5G achieves its population density. 5G is also supposed to be more energy efficient. As such, the communications component of IoT devices won't drain as much power, resulting in longer battery life for connected devices. There's also a ton of AI and machine learning in 5G implementations. 5G nodes and interface devices are deployed on the edge, away from central hubs. They utilize AI and machine learning to analyze communications performance, and use AI to bandwidth-shape communications, to wring as much performance out of the hardware as possible. You're familiar with the term "cloud computing." We've all used cloud services, services that run on a server someplace rather than on our desktop computers or mobile devices. The cloud, of course, isn't really a cloud. Amazon, Google, Facebook, Microsoft, and others operate massive data centers packed with thousands upon thousands of servers. Soft and fluffy, the cloud is not.

Stolen Booking.com Credentials Fuel Social Engineering Scams

Social engineering expert Sharon Conheady said this type of trickery remains extremely difficult to repel, because of the customer-first nature of hospitality. Many public-facing people in such organizations, such as receptionists, are "trained to help people - that's their job," and of course they're going to bend over backwards to try to meet apparent customers' demands, Conheady said in an interview at this month's Black Hat Europe conference in London. Help desks remain another frequent target. "I had a client lately who asked me to call the help desk and obtain BitLocker keys," she said, referring to a recent penetration test. "Every single one of the help desk agents gave us the BitLocker key." That prompted her to ask: Do these personnel even know what a BitLocker key is, and why they shouldn't share it? The client said they didn't know. While training people in customer-facing roles can help, Conheady said the only truly effective approach would be to put in place strong technical controls to outright prevent and block such attacks.

Significantly Improving Security Posture: A CMMI Case Study

“Phoenix Defense has led the way in adopting CMMI Security best practices for nearly two decades, and now included the Security best practices,” says Kris Puthucode, Certified CMMI High Maturity Lead Appraiser at Software Quality Center LLC. “This adoption has yielded quantifiable benefits, enhancing security posture across Mission, Personnel, Physical, Process, and Cybersecurity domains. Additionally, incorporating Virtual work best practices has standardized virtual meetings and events, boosting efficiency.” Phoenix Defense has been a CMMI Performance Solutions Organization since 2005, first achieving Maturity Level 5 in 2020. ... Before adopting CMMI Security and Managing Security Threats and Vulnerabilities Practice Areas in the model, Phoenix Defense had a closed network with no outward-facing applications and relied on a third-party vendor to monitor threats and spam. They did not fully and quantitatively track attacks against the networks or other data flows, and they required a more robust approach to properly ensure network security.

5 common data security pitfalls — and how to avoid them

While regulations like GDPR and SOX set standards for data security, they are merely starting points and should be considered table stakes for protecting data. Compliance should not be mistaken for complete data security, as robust security involves going beyond compliance checks. The fact is that many large data breaches have occurred in organizations that were fully compliant on paper. Moving beyond compliance requires actively identifying and mitigating risks rather than just ticking boxes during audits. ... Data is one of the most valuable assets for any organization. And yet, the question, “Who owns the data?” often leads to ambiguity within organizations. Clear delineation of data ownership and responsibility is crucial for effective data governance. Each team or employee must understand their role in protecting data to create a culture of security. ... Unpatched vulnerabilities are one of the easiest targets for cyber criminals. This means that organizations face significant risks when they can’t address public vulnerabilities quickly. Despite the availability of patches, many enterprises delay deployment for various reasons, which leaves sensitive data vulnerable.

Outmaneuvering AI: Cultivating Skills That Make Algorithms Scratch Their Head

Reasoning, the intellectual ninja of skills, is all about slicing through misinformation, assumptions, and biases to get to the heart of the matter. It’s not just drawing conclusions, but thinking about how we do that. This skill is the brain’s bouncer, keeping cognitive fallacies and hasty generalizations at bay. We humans, bless our hearts, are prone to jumping on the bandwagon or seeing patterns where there are none (like seeing a face on Mars or believing in hot streaks at Vegas). These mental shortcuts, or heuristics, can lead us astray, making reasoning not just useful but essential. AI is trained on our past reasoning reflected in old works. But it can’t reason on its own — at least not yet. Consider a business deciding whether to invest in a new technology. Without proper reasoning, they might follow the hype (everyone else is doing it!) or rely on gut feelings (it just feels right!). But with reasoning, they dissect the decision, weigh the evidence, consider alternatives, and make a choice that’s not just good on paper, but good in reality.

Quote for the day:

"Whether you think you can or you think you can’t, you’re right." -- Henry Ford

Daily Tech Digest - December 23, 2023