Quote for the day:
"Small daily improvements over time lead
to stunning results." -- Robin Sharma

Streaming requires a new mindset. You must reason about event time versus
processing time, manage watermarking and windowing, and guarantee exactly-once
semantics even when things change midstream. These design patterns must be
built into your pipelines from the beginning. ... Agentic AI stretches the
typical data engineer’s streaming data skill set because it is no longer about
a single model running in isolation. Today, we see networks of perception
agents, reasoning agents and execution agents working together, each handling
tasks and passing insights to the next in real time. If you know only how to
schedule batch ETL jobs or deploy an inference server, you’re missing a core
skill: how to build high-throughput, low-latency pipelines that keep these
agents reliable and responsive in production. ... A single slow or broken
stream can cause cascading failures in multiagent systems. Use schema
registries, enforce data contracts and apply exactly-once semantics to
maintain trust in your streaming infrastructure. ... Communication presents
another challenge. Data scientists often discuss “precision” as a metric that
data engineers must translate into reality. Implement evaluation scores like
factual consistency checks, entity precision comparisons and human-in-the-loop
review pipelines.
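
As a quick illustration of the event-time, watermarking, and windowing concerns above, here is a minimal sketch using PySpark Structured Streaming; the Kafka topic, schema, and field names are illustrative assumptions, not from the article:

```python
# Minimal sketch of event-time windowing with a watermark in PySpark
# Structured Streaming. The Kafka topic, schema, and field names are
# illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("agent-insights").getOrCreate()

schema = StructType([
    StructField("agent_id", StringType()),
    StructField("insight", StringType()),
    StructField("event_time", TimestampType()),  # when the agent emitted it
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "agent-insights")  # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Reason about event time, not processing time: tolerate events arriving up
# to 10 minutes late, then count insights per agent in 5-minute windows.
counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("agent_id"))
    .count()
)

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```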
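
And to make the "precision" translation concrete, a data engineer might wire a check like the following into a review pipeline; the function and example values are hypothetical:

```python
# Hypothetical sketch of an entity-precision check for an extraction agent.
# Precision = |predicted entities that are correct| / |predicted entities|.
def entity_precision(predicted: set[str], reference: set[str]) -> float:
    """Share of predicted entities that also appear in the reference set."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)

# Example: the agent extracted three entities and two are correct -> 0.67.
predicted = {"Acme Corp", "Berlin", "2024-01-01"}
reference = {"Acme Corp", "Berlin"}
print(round(entity_precision(predicted, reference), 2))  # 0.67
```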

Anthropic said the first environment it developed tests an agent’s ability to
complete an alignment audit for an intentionally misaligned model. For this
test, the researchers placed the investigator agent in the environment and
equipped it with tools similar to those human teams use. The researchers instructed
the agent to identify the root cause of the misalignment and determine the
specific behaviors that were misaligned. According to the researchers, the
investigator agent successfully found the root cause of the issues 10-13% of the
time. Still, this performance improved to 42% “when using a super-agent approach
that aggregates findings across multiple investigations.” ... The next step in
most alignment audits is typically a behavioral evaluation, which determines
whether the model consistently misbehaves in various settings. To test its
evaluation agent, the researchers designed 10 models “given system prompt
instructions to exhibit a specific behavior, such as excessive deference to the
user.” They ran the agent five times per model and found that it correctly
identified and flagged at least one quirk per model. However, the agent sometimes
failed to identify specific quirks consistently. It had trouble evaluating
subtle quirks, such as self-promotion and research-sandbagging, as well as
quirks that are difficult to elicit, like the Hardcode Test Cases quirk.

As enterprises race to operationalize AI, the challenge isn't only about
building and deploying large language models (LLMs); it's also about integrating
them seamlessly into existing API ecosystems while maintaining enterprise-level
security, governance, and compliance. Apigee is committed to leading you on this
journey. Apigee streamlines the integration of gen AI agents into applications
by bolstering their security, scalability, and governance. While the Model
Context Protocol (MCP) has emerged as a de facto method of integrating discrete
APIs as tools, the journey of turning your APIs into these agentic tools is
broader than a single protocol. This post highlights the critical role of your
existing API programs in this evolution and how ... Leveraging MCP services
across a network requires specific security constraints. Perhaps you would like
to add authentication to your MCP server itself. Once you’ve authenticated calls
to the MCP server, you may want to authorize access to certain tools depending on
the consuming application. You may want to provide first-class observability
information to track which tools are being used and by whom. Finally, you may
want to ensure that whatever downstream APIs your MCP server supplies tools
for also have the minimum security guarantees outlined above.
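
As a schematic sketch of those constraints (not Apigee's implementation; the keys, consumer names, and allowlists are hypothetical), an MCP-style tool gateway might authenticate each call, authorize tools per consuming application, and log who used what:

```python
# Hypothetical sketch: authenticate calls to an MCP-style tool server,
# authorize tools per consuming application, and log tool usage.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mcp-gateway")

API_KEYS = {"key-123": "support-bot", "key-456": "finance-agent"}  # demo only

# Which tools each registered consumer may invoke.
TOOL_ALLOWLIST = {
    "support-bot": {"search_tickets", "summarize_ticket"},
    "finance-agent": {"get_invoice"},
}


def authenticate(api_key: str) -> str:
    """Map an API key to a consumer identity, or reject the call."""
    if api_key not in API_KEYS:
        raise PermissionError("unauthenticated call to MCP server")
    return API_KEYS[api_key]


def authorize_and_observe(consumer: str, tool: str) -> None:
    """Enforce the per-consumer tool allowlist and emit observability data."""
    if tool not in TOOL_ALLOWLIST.get(consumer, set()):
        log.warning("denied: consumer=%s tool=%s", consumer, tool)
        raise PermissionError(f"{consumer} may not call {tool}")
    log.info("allowed: consumer=%s tool=%s", consumer, tool)  # who used what


def handle_tool_call(api_key: str, tool: str, args: dict) -> dict:
    consumer = authenticate(api_key)
    authorize_and_observe(consumer, tool)
    return {"tool": tool, "args": args, "status": "dispatched"}  # stub dispatch


print(handle_tool_call("key-123", "search_tickets", {"query": "login error"}))
```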

A skill is a single ability, such as the ability to write a message or analyze a
spreadsheet and trigger actions from that analysis. An agent independently
handles complex, multi-step processes to produce a measurable outcome. We
recently announced an expanded network of Joule Agents to help foster autonomous
collaboration across systems and lines of business. This includes out-of-the-box
agents for HR, finance, supply chain, and other functions that companies can
deploy quickly to help automate critical workflows. AI front-runners, such as
Ericsson, Team Liquid, and Cirque du Soleil, also create customized agents that
can tackle specific opportunities for process improvement. Now you can build
them with Joule Studio, which provides a low-code workspace to help design,
orchestrate, and manage custom agents using pre-defined skills, models, and data
connections. This can give you the power to extend and tailor your agent network
to your exact needs and business context. ... Another way to become an AI
front-runner is to tackle fragmented tools and solutions by putting in place an
open, interoperable ecosystem. After all, what good is an innovative AI tool if
it runs into blockers when it encounters your other first- and third-party
solutions?

The most difficult part of this transformation wasn’t the technology but getting
people to collaborate in new ways, which required a greater focus on stakeholder
alignment and change management. So my colleague first established a strong
governance structure. A steering committee with leaders from key functions like
IT, operations, finance, and merchandising met biweekly to review progress and
resolve conflicts. This wasn’t a token committee, but a body with authority. If
there were any issues with data exchange between marketing and supply chain,
they were addressed and resolved during the meetings. By bringing all
stakeholders together, we were also able to identify discrepancies early on. For
example, when we discovered a new feature in the inventory system could slow
down employee workflows, the operations manager reported it, and we immediately
adjusted the rollout plan. Previously, such issues might not have surfaced
until after the full rollout, triggering finger-pointing between IT and business
departments. The next step was to focus on communication and
culture. From previous failed projects, we knew that sending a few emails wasn’t
enough, so we tried a more personal approach. We identified influential
employees in each department and recruited them as change champions.

HumanEval and SWE-bench have taken hold in the ML community, and yet, as
indicated above, neither is necessarily reflective of LLMs’ competence in
everyday software engineering tasks. I conjecture that one reason is the
difference in the two communities’ points of view! The ML community prefers
large-scale, automatically scored benchmarks, as long as there is a “hill
climbing” signal to improve LLMs. The business imperative for LLM makers to
compete on popular leaderboards can relegate the broader user experience to a
secondary concern. On the other hand, the software engineering community needs
benchmarks that capture specific product experiences closely. Because curation
is expensive, the scale of these benchmarks is sufficient only to get a
reasonable offline signal for the decision at hand (A/B testing is always
carried out before a launch). Such benchmarks may also require a complex setup
to run, and their scoring is sometimes not automated, but these shortcomings can
be acceptable given the smaller scale. For exactly these reasons, such benchmarks
are not useful to the ML community. Much is lost due to these differing points of
view. It is an interesting question as to how these communities could
collaborate to bridge the gap between scale and meaningfulness and create evals
that work well for both communities.

When a quantum computer successfully handles a task that would be practically
impossible for current computers, this achievement is referred to as quantum
advantage. However, this advantage does not apply to all types of problems,
which has led scientists to explore the precise conditions under which it can
actually be achieved. While earlier research has outlined several conditions
that might allow for quantum advantage, it has remained unclear whether those
conditions are truly essential. To help clarify this, researchers at Kyoto
University launched a study aimed at identifying both the necessary and
sufficient conditions for achieving quantum advantage. Their method draws on
tools from both quantum computing and cryptography, creating a bridge between
two fields that are often viewed separately. ... “We were able to identify the
necessary and sufficient conditions for quantum advantage by proving an
equivalence between the existence of quantum advantage and the security of
certain quantum cryptographic primitives,” says corresponding author Yuki
Shirakawa. The results imply that if quantum advantage does not exist, then the
security of almost all cryptographic primitives previously believed to be secure
is broken. Importantly, these primitives are not limited to quantum cryptography;
they also include widely used conventional cryptographic primitives as well as
rapidly evolving post-quantum ones.

With increasing social and regulatory pressure, reluctance by a company to
reveal emissions is ill-received. For example, in Europe the Corporate
Sustainability Reporting Directive (CSRD) currently requires large businesses to
publish their emissions and other sustainability data points. Opaque
sustainability reporting undermines environmental commitments and distorts the
reference points necessary for net zero progress. How can organisations work
toward a low-carbon future when their measurement tools are incomplete or
unreliable? The issue is particularly acute regarding Scope 3 emissions. These
are the emissions generated indirectly along the supply chain by a company’s
vendors, including emissions from technology infrastructure like data centres,
and they often account for the largest share of a company’s carbon footprint.
... It sounds grim, but there is some cause for optimism. Most companies are in
a better position than they were five years ago and acknowledge that their
measurement capabilities have improved. We need to accelerate the momentum of
this progress to ensure real action. Earth Overshoot Day is a reminder that
climate reporting for the sake of accountability and compliance only covers the
basics. The next step is to use emissions data as benchmarks for real-world
progress.

Building resilience isn’t just about buying more tech, it’s about making data
more trustworthy, shareable, and actionable. That’s where global data standards
play a critical role. The most agile supply chains are built on a shared
framework for identifying, capturing, and sharing data. When organizations use
consistent product and location identifiers, such as GTINs (Global Trade Item
Numbers) and GLNs (Global Location Numbers) respectively, they reduce ambiguity,
improve traceability, and eliminate the need for manual data reconciliation.
With a common data language in place, businesses can cut through the noise of
siloed systems and make faster, more confident decisions. ... Companies further
along in their digital transformation can also explore advanced data-sharing
standards like EPCIS (Electronic Product Code Information Services) or RFID
(radio frequency identification) tagging, particularly in high-volume or
high-risk environments. These technologies offer even greater visibility at the
item level, enhancing traceability and automation. And the benefits of this kind
of visibility extend far beyond trade compliance. Companies that adopt global
data standards are significantly more agile. In fact, studies show that 58% of
companies with full standards adoption say they manage supply chain agility
“very well,” compared with just 14% of those with no plans to adopt standards.
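
As a small, concrete example of what a shared identifier scheme buys you (an illustration, not from the article): GS1 identifiers such as GTINs end in a mod-10 check digit, so a receiving system can reject mistyped or corrupted codes before they contaminate downstream data:

```python
# Validate GS1 identifiers (GTIN-8/12/13/14) via their mod-10 check digit.
def gs1_check_digit(payload: str) -> int:
    """Check digit for the payload digits: the rightmost payload digit
    weighs 3, then weights alternate 1, 3, 1, ... moving left."""
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(payload))
    )
    return (10 - total % 10) % 10


def is_valid_gtin(gtin: str) -> bool:
    """Recompute the final digit and compare it to the one supplied."""
    return (
        gtin.isdigit()
        and len(gtin) in (8, 12, 13, 14)
        and gs1_check_digit(gtin[:-1]) == int(gtin[-1])
    )


print(is_valid_gtin("4006381333931"))  # True: a commonly cited example GTIN-13
print(is_valid_gtin("4006381333932"))  # False: check digit does not match
```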

When we build autonomous systems and allow them to make decisions for us, we
enter a strange world of ethical limbo. A self-driving car forced to choose
between protecting the driver or a pedestrian in a potentially fatal crash will
have far more time than a human to make its choice. But what factors influence
that choice? ... It’s not just the AI systems
shaping the narrative, raising some voices while quieting others. Organisations
made up of ordinary flesh-and-blood people are doing it too. Irish cognitive
scientist Abeba Birhane, a highly regarded researcher of human behaviour, social
systems, and responsible and ethical artificial intelligence, was recently asked
to give a keynote at the AI for Good Global Summit. According to her own reports
on Bluesky, a meeting was requested just hours before she presented her keynote: “I
went through an intense negotiation with the organisers (for over an hour) where
we went through my slides and had to remove anything that mentions ‘Palestine’
‘Israel’ and replace ‘genocide’ with ‘war crimes’…and a slide that explains
illegal data torrenting by Meta, I also had to remove. In the end, it was either
remove everything that names names (Big Tech particularly) and remove logos, or
cancel my talk.”