Tech Bytes - Daily Digest: Daily Tech Digest

OpenAI, others pushing false narratives about LLMs, says Databricks CTO

“There are definitely the larger providers, like OpenAI, Google, and so on; they have this narrative – and they’re talking in a lot of places about how – first of all, this stuff is super dangerous, not in the sense of a disruptive technology, but even in the sense of ‘it might be evil and whatever’,” Zaharia told ITPro during an interview at Databricks AI and Data Summit 2023. “It’s very sci-fi. OpenAI – that’s exactly the narrative they’re pushing – but others as well. Anytime someone talks about AI alignment or whatever, it’s often from this angle: Watch out, it might be evil. They’re also saying how it’s a huge amount of work to train [models]: It’s super expensive – don’t even try it. I’m not sure either of those things are true.” Zaharia cited MosaicML – the startup Databricks recently acquired for $1.3 billion – as having trained a large language model (LLM) with 30 billion parameters that’s competitive with GPT-3, and “probably cost like ten to 20 times less” to train.

Ransomware: recovering from the inevitable

There’s no doubt that businesses’ cybersecurity teams are under an immense amount of pressure in the battle against ransomware, but they can only go so far alone. There must be an awareness that it simply can’t be stopped at the source, and that defending against ransomware takes a combination of people, processes and technology. The digital world can appear complex – especially in the case of large enterprise structures – so it can be helpful to stress that the digital world and the real world are not that different. Digital protections such as patching systems, multi-factor authentication and data protection, along with the risk of insider threats, all have real-world counterparts: open windows that need to be locked at night, double locking your front door, locking away vital items in a safe, and opportunistic break-ins through unlocked windows or doors. However, whilst using a combination of people, processes and technology to minimise attacks is key, some will inevitably slip through the cracks, which is where recovery comes into play.

AI Foundation launches AI.XYZ to give people their own AI assistants

The platform enables users to design their own AI assistants that can safely support them in both personal and professional settings. Each AI is unique to its creator and can assist with tasks such as note-taking, email writing, brainstorming, and offering personalized advice and perspectives. Unlike generic AI assistants such as ChatGPT or those from companies like Amazon, Google, and Apple, each AI assistant designed on AI.XYZ belongs exclusively to its creator, knows the person’s values and goals, and provides more personalized help. The company sees a significant opportunity for workplaces and enterprises to provide each of their employees with their own AIs. ... AI.XYZ is available in public beta and can be accessed on the web with an invitation code. Creators can interact with their AIs through text, voice, and video. A free subscription to AI.XYZ allows users to get started creating their own AI, while a premium subscription for $20 per month unlocks additional capabilities and customization options. The AI Foundation has collaborated with top research institutions like the Technical University of Munich to create “sustainable AI” for everyone.

TDD and the Impact on Security

Outside-In Test-Driven Development (TDD) is an approach to software development that starts with high-level acceptance tests or end-to-end tests that demonstrate the desired behaviour of the system from the point of view of its users or external interfaces. It is also commonly referred to as behaviour-driven development (BDD). With Outside-In TDD, the development process begins with writing a failing acceptance test that describes the desired behaviour of the system. This test is usually written from the perspective of a user or of a high-level component interacting with the system. The test is expected to fail initially, as the system does not yet have the required functionality. Once the first acceptance test has been written, the next step is to write a failing unit test for the smallest possible unit of code that contributes to passing the acceptance test. This unit test defines the desired behaviour of a specific module or component within the system. The unit test fails because the corresponding code still needs to be implemented.
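
The outside-in order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names register_user and is_strong_password, and the password rule itself, are hypothetical and do not come from the article. The acceptance test would be written first and would fail, then the unit test, and only then the implementation shown at the bottom.

```python
# Step 1: acceptance test, written first from the user's point of view.
# It fails until register_user (hypothetical) exists and behaves correctly.
def test_user_can_register_with_strong_password():
    users = {}
    assert register_user(users, "alice", "correct-horse-42") is True
    assert "alice" in users


def test_registration_rejects_weak_password():
    users = {}
    assert register_user(users, "bob", "short1") is False
    assert "bob" not in users


# Step 2: unit test for the smallest unit the acceptance test depends on.
# It also fails until is_strong_password (hypothetical) is implemented.
def test_password_strength_rule():
    assert is_strong_password("correct-horse-42")
    assert not is_strong_password("short1")             # too short
    assert not is_strong_password("nodigitsatallhere")  # no digit


# Step 3: the minimal implementation that makes both levels of test pass.
def is_strong_password(password: str) -> bool:
    """Assumed rule: at least 12 characters and at least one digit."""
    return len(password) >= 12 and any(ch.isdigit() for ch in password)


def register_user(users: dict, name: str, password: str) -> bool:
    """Add the user if the name is free and the password is strong."""
    if name in users or not is_strong_password(password):
        return False
    # Illustration only: a real system would store a salted hash.
    users[name] = password
    return True
```

The functions follow the test_ naming convention, so a runner such as pytest would collect them if the code were saved to a file. Writing the rejection case as an acceptance test is one way the security requirement becomes part of the specification rather than an afterthought.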

Wasm: 5 things developers should be tracking

One of Wasm’s biggest draws is its cross-platform portability. Wasm is a neutral binary format that can be shoved in a container and run anywhere. This is key in our increasingly polyglot hardware and software world. Developers hate compiling to multiple different formats because every additional architecture (x86, Arm, Z, Power, etc.) adds to your test matrix, and exploding test matrices is a very expensive problem. QE is the bottleneck for many development teams. With Wasm, you have the potential to write applications, compile them once, test them once, and deploy them on any number of hardware and software platforms that span the hybrid cloud, from the edge to your data center to public clouds. A developer on a Mac could compile a program into a Wasm binary, test it locally, and then confidently push it out to all of the different machines that it’s going to be deployed on. All of these machines will already have a Wasm runtime installed on them, one that is battle tested for that particular platform, thereby making the Wasm binaries extremely portable, much like Java.

Getting Started with Data Literacy: Two Tips for Success

How should an enterprise get started? Langer says he “came to the inescapable conclusion that data literacy must start with leaders. Data literacy isn't just for the rank-and-file.” As a litmus test when he starts talking to organizations, he asks about their leaders' commitment to data literacy. “I ask them, ‘Is your organization willing to send your leaders to training -- managers, executives, the C-suite, all of them?’ If not, which is often the case, that probably tells you everything that you need to know, because data literacy is very much a cultural transformation. If your leaders aren't all in, then there's almost no point in getting started, to be frank. If employees see their managers not exhibiting a data literacy mindset and data literacy behaviors, they will revert to business as usual.” Langer admits to receiving pushback; executives wonder if data literacy is needed because newer technologies such as no-code/low-code or generative AI already make it easier to gain insights.

How Data Observability Helps Shift Left Your Data Reliability

When you consider data observability, the term “shift left” refers to a proactive strategy that involves incorporating observability practices at the early stages of the data lifecycle. This concept draws inspiration from software development methodologies and emphasizes the importance of addressing potential issues and ensuring high quality right from the start. When applied to data observability, shifting left entails integrating observability practices and tools into the data pipeline and infrastructure right from the outset. This approach avoids treating observability as an afterthought or implementing it only in later stages. The primary goal is to identify and resolve data quality, integrity, and performance issues as early as possible, thereby minimizing the likelihood of problems propagating downstream. ... Taking a proactive approach to address data incidents early on enables organizations to mitigate the potential impact and cost associated with data issues.
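
As a rough illustration of checking data at the point of ingestion rather than downstream, the sketch below validates each record before it enters a pipeline. Everything in it is an assumption made for the example: the field names order_id and amount, the validation rules, and the CheckReport structure are hypothetical and not taken from the article or from any particular observability tool.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: fields every incoming record must carry.
REQUIRED_FIELDS = ("order_id", "amount")


@dataclass
class CheckReport:
    passed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def check_record(record: dict) -> Optional[str]:
    """Return a reason string if the record is bad, otherwise None."""
    for name in REQUIRED_FIELDS:
        if record.get(name) is None:
            return f"missing {name}"
    amount = record["amount"]
    if not isinstance(amount, (int, float)) or amount < 0:
        return "invalid amount"
    return None


def ingest(records) -> CheckReport:
    """Check every record on entry so bad rows never reach later stages."""
    report = CheckReport()
    for record in records:
        reason = check_record(record)
        if reason is None:
            report.passed.append(record)
        else:
            report.rejected.append((record, reason))
    return report


report = ingest([
    {"order_id": 1, "amount": 9.5},
    {"order_id": 2},                  # amount is missing
    {"order_id": 3, "amount": -1},    # amount is negative
])
```

Here report.passed holds the one valid record and report.rejected holds the other two with their reasons, which could be logged or alerted on before anything propagates downstream.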

Architecting Real-Time Analytics for Speed and Scale

Apache Druid has emerged as the preferred database for real-time analytics applications due to its high performance and ability to handle streaming data. With its support for true stream ingestion and efficient processing of large data volumes in sub-second timeframes, even under heavy loads, Apache Druid excels in delivering fast insights on fresh data. Its seamless integration with Apache Kafka and Amazon Kinesis further solidifies its position as the go-to choice for real-time analytics. When choosing an analytics database for streaming data, considerations such as scale, latency, and data quality are crucial. Key requirements include the ability to handle the full scale of event streaming, ingest and correlate multiple Kafka topics or Kinesis shards, support event-based ingestion, and ensure data integrity during disruptions. Apache Druid not only meets these criteria but also provides additional capabilities.

Why business leaders must tackle ethical considerations as AI becomes ubiquitous

When it comes to ethical AI, there is a true balancing act. The industry as a whole has differing views on what is deemed ethical, making it unclear who should make the executive decision on whose ethics are the right ethics. However, perhaps the question to ask is whether companies are being transparent about how they are building these systems. This is the main issue we are facing today. Ultimately, although supporting regulation and legislation may seem like a good solution, even the best efforts can be thwarted in the face of fast-paced technological advancements. The future is uncertain, and it is very possible that in the next few years, a loophole or an ethical quagmire may surface that we could not foresee. This is why transparency and competition are the ultimate solutions to ethical AI today. Currently, companies compete to provide a comprehensive and seamless user experience. For example, people may choose Instagram over Facebook, Google over Bing, or Slack over Microsoft Teams based on the quality of experience.

ChatGPT, compliance, and the impending wave of AI-fuelled content

Despite its convincing rhetoric, ChatGPT is, at times, deeply flawed. Quite simply, its statements can’t always be trusted. This is a reasonably devastating indictment for a tool which invites such vehement scrutiny, and has been acknowledged by OpenAI, who admit that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.” ChatGPT has a vast wealth of knowledge because it was trained on all manner of web content, from books and academic articles to blog posts and Wikipedia entries. Alas, the internet is not a domain renowned for its factual integrity. Furthermore, ChatGPT doesn’t actually connect to the internet to track down the information it needs to respond. Instead, it simply repeats patterns it has seen in its training data. In other words, ChatGPT arrives at an answer by making a series of guesses, which is part of the reason it can argue wrong answers as if they were completely true, and give different (incorrect) answers to the same questions.

Quote for the day:

"The mediocre leader tells. The good leader explains. The superior leader demonstrates. The great leader inspires." -- Buchholz and Roth

Daily Tech Digest - July 02, 2023