OpenAI, others pushing false narratives about LLMs, says Databricks CTO
“There are definitely the larger providers, like OpenAI, Google, and so on; they
have this narrative – and they’re talking in a lot of places about how – first
of all, this stuff is super dangerous, not in the sense of a disruptive
technology, but even in the sense of ‘it might be evil and whatever’,” Zaharia
told ITPro during an interview at the Databricks Data + AI Summit 2023. “It’s very
sci-fi.” “OpenAI – that’s exactly the narrative they’re pushing – but others as
well. “Anytime someone talks about AI alignment or whatever, it’s often from
this angle: Watch out, it might be evil. They’re also saying how it’s a huge
amount of work to train [models]: It’s super expensive – don’t even try it. “I’m
not sure either of those things are true.” Zaharia cited MosaicML – the startup
Databricks recently acquired for $1.3 billion – as having trained a large
language model (LLM) with 30 billion parameters that’s competitive with GPT-3,
and “probably cost like ten to 20 times less” to train.
Ransomware: recovering from the inevitable
There’s no doubt that businesses’ cybersecurity teams are under an immense
amount of pressure in the battle against ransomware, but they can only go so far
alone. There must be an awareness that it simply can’t be stopped at the source,
and that defending against ransomware takes a combination of people, processes
and technology. The digital world can appear complex – especially in the
case of large enterprise structures – so it can be helpful to stress that the
digital world and the real world are not that different. Digital protections
such as patching systems, multi-factor authentication, data protection and the
risk of insider threats all have real-world counterparts: open windows that need to be locked at night, double-locking your front door, locking away vital items in a safe, and opportunistic break-ins through unlocked windows or doors.
However, whilst using a combination of people, processes and technology to
minimise attacks is key, some will inevitably slip through the cracks, which is
where recovery comes into play.
AI Foundation launches AI.XYZ to give people their own AI assistants
The platform enables users to design their own AI assistants that can safely
support them in both personal and professional settings. Each AI is unique to
its creator and can assist with tasks such as note-taking, email writing,
brainstorming, and offering personalized advice and perspectives. Unlike
generic AI assistants from companies like Amazon, Google, and Apple, or tools like ChatGPT,
each AI assistant designed on AI.XYZ belongs exclusively to its creator, knows
the person’s values and goals, and provides more personalized help. The
company sees a significant opportunity for workplaces and enterprises to
provide each of their employees with their own AIs. ... AI.XYZ is available in
public beta and can be accessed on the web with an invitation code. Creators
can interact with their AIs through text, voice, and video. A free
subscription to AI.XYZ allows users to get started creating their own AI,
while a premium subscription for $20 per month allows additional capabilities
and customization options. The AI Foundation has collaborated with top
research institutions like the Technical University of Munich to create
“sustainable AI” for everyone.
TDD and the Impact on Security
Outside-In Test-Driven Development (TDD) is an approach to software
development that emphasizes starting the development process first by creating
high-level acceptance tests or end-to-end tests that demonstrate the desired behaviour of the system from the point of view of its users or external interfaces. It is also commonly referred to as behaviour-driven development (BDD). With Outside-In TDD, the development process begins with writing a failing acceptance test that describes the desired behaviour of the system.
This test is usually written from a user's perspective or a high-level
component interacting with the system. The test is expected to initially fail
as the system does not yet have the required functionality. Once the failing acceptance test is in place, the next step is to write a failing unit test for the smallest possible unit of code needed to make the acceptance test pass. This unit test defines the desired behaviour of a specific module or
component within the system. The unit test fails because the corresponding
code still needs to be implemented.
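To make that sequence concrete, here is a minimal pytest sketch of the outside-in flow. The GreetingService class, the format_greeting helper, and the greeting behaviour itself are hypothetical, invented purely for illustration:

    # Hypothetical outside-in TDD sequence, runnable with pytest.

    # Step 1: the acceptance test, written first from the user's point of
    # view. Before any implementation exists, this test fails.
    def test_user_receives_personalised_greeting():
        service = GreetingService()
        assert service.greet("Ada") == "Hello, Ada!"

    # Step 2: a failing unit test for the smallest unit of code needed to
    # make the acceptance test pass.
    def test_format_greeting_inserts_name():
        assert format_greeting("Ada") == "Hello, Ada!"

    # Step 3: just enough implementation to turn both tests green.
    def format_greeting(name: str) -> str:
        return f"Hello, {name}!"

    class GreetingService:
        def greet(self, name: str) -> str:
            return format_greeting(name)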
Wasm: 5 things developers should be tracking
One of Wasm’s biggest draws is its cross-platform portability. Wasm is a
neutral binary format that can be shoved in a container and run anywhere. This
is key in our increasingly polyglot hardware and software world. Developers
hate compiling to multiple different formats because every additional
architecture (x86, Arm, Z, Power, etc.) adds to your test matrix, and an exploding test matrix is a very expensive problem. QE is the bottleneck for
many development teams. With Wasm, you have the potential to write
applications, compile them once, test them once, and deploy them on any number
of hardware and software platforms that span the hybrid cloud, from the edge
to your data center to public clouds. A developer on a Mac could compile a
program into a Wasm binary, test it locally, and then confidently push it out
to all of the different machines that it’s going to be deployed on. All of
these machines will already have a Wasm runtime installed on them, one that is
battle tested for that particular platform, thereby making the Wasm binaries
extremely portable, much like Java.
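As a small illustration of that compile-once, run-anywhere idea, the sketch below executes a tiny hand-written Wasm module from Python using the Bytecode Alliance's wasmtime bindings (pip install wasmtime). The add function is a made-up example; the same module bytes would run unchanged on any host with a runtime installed:

    from wasmtime import Engine, Store, Module, Instance

    # A tiny module in WebAssembly text format exporting add(a, b) -> a + b.
    # The compiled bytes are architecture-neutral: x86, Arm, etc. all run them.
    WAT = """
    (module
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
    """

    engine = Engine()
    store = Store(engine)
    module = Module(engine, WAT)        # compile once, for every platform
    instance = Instance(store, module, [])
    add = instance.exports(store)["add"]
    print(add(store, 2, 3))             # -> 5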
Getting Started with Data Literacy: Two Tips for Success
How should an enterprise get started? Langer says he “came to the inescapable
conclusion that data literacy must start with leaders. Data literacy isn't
just for the rank-and-file.” As a litmus test when he starts talking to
organizations, he asks about their leaders' commitment to data literacy. “I
ask them, ‘Is your organization willing to send your leaders to training --
managers, executives, the C-suite, all of them?’ If not, which is often the
case, that probably tells you everything that you need to know, because data
literacy is very much a cultural transformation. If your leaders aren't all
in, then there's almost no point in getting started, to be frank. If employees
see their managers not exhibiting a data literacy mindset and data literacy
behaviors, they will revert to business as usual.” Langer admits to receiving
pushback; executives wonder if data literacy is needed because newer
technology such as no-code/low-code or generative AI already makes it easier to
gain insights.
How Data Observability Helps Shift Left Your Data Reliability
When you consider data observability, the term “shift left” refers to a
proactive strategy that involves incorporating observability practices at the
early stages of the data lifecycle. This concept draws inspiration from
software development methodologies and emphasizes the importance of addressing
potential issues and ensuring high quality right from the start. When applied
to data observability, shifting left entails integrating observability
practices and tools into the data pipeline and infrastructure right from the
outset. This approach avoids treating observability as an afterthought or
implementing it only in later stages. The primary goal is to identify and
resolve data quality, integrity, and performance issues as early as possible,
thereby minimizing the likelihood of problems propagating downstream. ...
Taking a proactive approach to address data incidents early on enables
organizations to mitigate the potential impact and cost associated with data
issues.
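As a rough sketch of what that looks like in code (the check names, schema, and ingest function below are hypothetical, not from the article), quality checks run at the very start of the pipeline and halt ingestion before bad data can propagate:

    from dataclasses import dataclass

    @dataclass
    class CheckResult:
        name: str
        passed: bool
        detail: str = ""

    def validate_batch(rows: list[dict]) -> list[CheckResult]:
        """Observability checks applied at ingestion time, before any
        transformation -- the 'shift left' step."""
        null_ids = sum(1 for r in rows if r.get("id") is None)
        return [
            CheckResult("non_empty", bool(rows), f"{len(rows)} rows"),
            CheckResult("no_null_ids", null_ids == 0,
                        f"{null_ids} rows missing id"),
        ]

    def ingest(rows: list[dict]) -> None:
        # Fail fast at the start of the pipeline so issues never reach
        # downstream models and dashboards.
        failures = [c for c in validate_batch(rows) if not c.passed]
        if failures:
            raise ValueError(f"halted: {[c.name for c in failures]}")
        # ... transformations and loading would follow here ...

    ingest([{"id": 1}, {"id": 2}])   # passes; a batch with null ids halts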
Architecting Real-Time Analytics for Speed and Scale
Apache Druid has emerged as the preferred database for real-time analytics
applications due to its high performance and ability to handle streaming data.
With its support for true stream ingestion and efficient processing of large
data volumes in sub-second timeframes, even under heavy loads, Apache Druid
excels in delivering fast insights on fresh data. Its seamless integration
with Apache Kafka and Amazon Kinesis further solidifies its position as the
go-to choice for real-time analytics. When choosing an analytics database for
streaming data, considerations such as scale, latency, and data quality are
crucial. The ability to handle the full scale of event streaming, ingest and
correlate multiple Kafka topics or Kinesis shards, support event-based
ingestion, and ensure data integrity during disruptions are key requirements.
Apache Druid not only meets these criteria but goes above and beyond to
deliver on these expectations and provide additional capabilities.
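For a concrete flavour of that Kafka integration, here is a hedged sketch of registering a streaming-ingestion supervisor through Druid's REST API. The datasource name, topic, broker address, and column names are placeholders, and a production spec would carry more tuning options:

    import json
    import urllib.request

    # Minimal Kafka supervisor spec: Druid consumes the topic continuously
    # and makes events queryable as they arrive.
    supervisor_spec = {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": "clickstream",        # placeholder datasource
                "timestampSpec": {"column": "ts", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["user_id", "page"]},
                "granularitySpec": {"segmentGranularity": "hour",
                                    "queryGranularity": "none"},
            },
            "ioConfig": {
                "topic": "clickstream-events",      # placeholder topic
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": "localhost:9092"},
                "useEarliestOffset": True,
            },
        },
    }

    req = urllib.request.Request(
        "http://localhost:8888/druid/indexer/v1/supervisor",  # Druid router
        data=json.dumps(supervisor_spec).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())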
Why business leaders must tackle ethical considerations as AI becomes ubiquitous
When it comes to ethical AI, there is a true balancing act. The industry as a
whole has differing views on what is deemed ethical, making it unclear who
should make the executive decision on whose ethics are the right ethics.
However, perhaps the question to ask is whether companies are being
transparent about how they are building these systems. This is the main issue
we are facing today. Ultimately, although supporting regulation and
legislation may seem like a good solution, even the best efforts can be
thwarted in the face of fast-paced technological advancements. The future is
uncertain, and it is very possible that in the next few years, a loophole or
an ethical quagmire may surface that we could not foresee. This is why
transparency and competition are the ultimate solutions to ethical AI today.
Currently, companies compete to provide a comprehensive and seamless user
experience. For example, people may choose Instagram over Facebook, Google
over Bing, or Slack over Microsoft Teams based on the quality of
experience.
ChatGPT, compliance, and the impending wave of AI-fuelled content
Despite its convincing rhetoric, ChatGPT is, at times, deeply flawed. Quite
simply, its statements can’t always be trusted. This is a fairly devastating indictment for a tool that attracts such intense scrutiny, and one OpenAI itself has acknowledged, admitting that “ChatGPT sometimes writes
plausible-sounding but incorrect or nonsensical answers.” ChatGPT has a vast
wealth of knowledge because it was trained on all manner of web content, from
books and academic articles to blog posts and Wikipedia entries. Alas, the
internet is not a domain renowned for its factual integrity. Furthermore,
ChatGPT doesn’t actually connect to the internet to track down the information
it needs to respond. Instead, it simply repeats patterns it has seen in its
training data. In other words, ChatGPT arrives at an answer by making a series
of guesses, which is part of the reason it can argue wrong answers as if they
were completely true, and give different (incorrect) answers to the same
questions.
Quote for the day:
"The mediocre leader tells The good
leader explains The superior leader demonstrates The great leader inspires."
-- Buchholz and Roth