Quote for the day:
"You are not a team because you work together. You are a team because you trust, respect and care for each other." -- Vala Afshar
How to automate the testing of AI agents
Experts view testing AI agents as a strategic risk management function that
encompasses architecture, development, offline testing, and observability for
online production agents. ... “Testing agentic AI is no longer QA, it is
enterprise risk management, and leaders are building digital twins to stress
test agents against messy realities: bad data, adversarial inputs, and edge
cases,” says Srikumar Ramanathan ... “Agentic systems are non-deterministic and
can’t be trusted with traditional QA alone; enterprises need tools that trace
reasoning, evaluate judgment, test resilience, and ensure adaptability over
time,” says Nikolaos Vasiloglou ... Part of the implementation strategy
will require integrating feedback from production back into development and test
environments. Although testing AI agents should be automated, QA engineers will
need to develop workflows that include reviews from subject matter experts and
feedback from other end users. “Hierarchical scenario-based testing, sandboxed
environments, and integrated regression suites—built with cross-team
collaboration—form the core approach for test strategy,” says Chris Li ... Mike
Finley says, “One key way to automate testing of agentic AI is to use
verifiers, which are AI supervisor agents whose job is to watch the work of
others and ensure that they fall in line. Beyond accuracy, they’re also looking
for subtle things like tone and other cues. If we want these agents to do human
work, we have to watch them like we would human workers.”
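A minimal sketch of that verifier pattern in Python: a supervisor model grades
a worker agent's draft for accuracy and tone, and the draft is regenerated with
the reviewer's notes until it passes or a human is pulled in. The call_llm()
helper is a hypothetical stand-in for any chat-completion client, and the
rubric and threshold are illustrative.

```python
# Sketch of the "verifier" pattern: a supervisor agent reviews a worker
# agent's output against a rubric before it is accepted.
# call_llm() is a hypothetical stand-in for a real chat-completion client.
import json

RUBRIC = """Review the draft below. Score 1-5 for factual accuracy and for
tone (professional, on-brand). Reply as JSON:
{"accuracy": int, "tone": int, "notes": str}"""

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def verify(task: str, draft: str, threshold: int = 4) -> dict:
    """Ask the supervisor model to grade a worker agent's draft."""
    review = call_llm(f"{RUBRIC}\n\nTask: {task}\n\nDraft:\n{draft}")
    scores = json.loads(review)
    scores["approved"] = min(scores["accuracy"], scores["tone"]) >= threshold
    return scores

def run_with_verification(task: str, max_attempts: int = 3) -> str:
    """Regenerate until the verifier approves, or escalate to a human."""
    for _ in range(max_attempts):
        draft = call_llm(f"Complete this task:\n{task}")
        result = verify(task, draft)
        if result["approved"]:
            return draft
        task += f"\n\nReviewer notes to address: {result['notes']}"
    raise RuntimeError("Verifier rejected all drafts; route to human review")
```

AI For Proactive Risk Governance In Today’s Uncertain Landscape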
Emerging risks are no longer confined to familiar categories like credit or
operational performance. Instead, leaders are contending with a complex web of
financial, regulatory, technological and reputational pressures that are
interconnected and fast-moving. This shift has made it harder for executives
to anticipate vulnerabilities and act before risks escalate into real business
impact. ... The sheer volume of evolving requirements can overwhelm compliance
teams, increasing the risk of oversight gaps, missed deadlines or inconsistent
reporting. For many organizations, the challenge is not simply keeping up but
proving to regulators and stakeholders that governance practices are both
proactive and defensible. ... As businesses evaluate their options to get
ahead of risk, AI is top of the list. But not all AI is created equal, and
paradoxically, some approaches may introduce added risk. General-purpose large
language models can be powerful tools for information synthesis, but they are
not designed to deliver the accuracy, transparency and auditability required
for high-stakes enterprise decisions. Their probabilistic nature means outputs
can at times be incomplete or inaccurate. ... Every AI output must be
explainable, traceable and auditable. Executives need to understand the
reasoning behind the recommendations they present to boards, regulators or
shareholders. Defensible AI ensures that decisions can withstand scrutiny,
fostering both compliance and trust between human and machine.
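As one concrete reading of "explainable, traceable and auditable," here is a
minimal Python sketch of an append-only decision log: every output is stored
with the prompt, sources, and model version that produced it, plus a hash so
later tampering is detectable during an audit. All names are illustrative
rather than any particular product's API.

```python
# Minimal sketch of an audit trail for AI-assisted decisions: every output
# is recorded with the inputs, sources, and model version that produced it,
# so a recommendation can later be traced and defended.
import hashlib, json, time
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    timestamp: float
    model_version: str
    prompt: str
    sources: list        # citations/retrieval hits behind the answer
    output: str
    record_hash: str = ""

def log_decision(model_version: str, prompt: str, sources: list, output: str,
                 path: str = "ai_audit.jsonl") -> AuditRecord:
    rec = AuditRecord(time.time(), model_version, prompt, sources, output)
    # Hash the record so later tampering is detectable during an audit.
    payload = json.dumps(asdict(rec), sort_keys=True).encode()
    rec.record_hash = hashlib.sha256(payload).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(asdict(rec)) + "\n")
    return rec
```

Navigating India's Data Landscape: Essential Compliance Requirements under the DPDP Act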
The Digital Personal Data Protection Act, 2023 (DPDP Act) marks a pivotal shift
in how digital personal data is managed in India, establishing a framework that
simultaneously recognizes the individual's right to protect their personal data
and the necessity for processing such data for lawful purposes. For any
organization—defined broadly to include individuals, companies, firms, and the
State—that determines the purpose and means of processing personal data (a "Data
Fiduciary" or DF), compliance with the DPDP Act requires strict adherence to
several core principles and newly defined rules. Compliance with the DPDP
Act is like designing a secure building: it requires strong foundational
principles, robust security systems, specific safety features for vulnerable
occupants (Child Data rules), specialized certifications for large structures,
and a clear plan for Data Erasure. Organizations must begin planning now, as the
core operational rules governing notice, security, child data, and retention
come into force eighteen months after the publication date of the DPDP Rules in
November 2025. ... DFs must implement appropriate technical and organizational
measures. These safeguards must include techniques like encryption, obfuscation,
masking, or the use of virtual tokens, along with controlled access to computer
resources and measures for continued processing in case of compromise, such as
data backups.
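A toy Python sketch of two of the named safeguards, masking and virtual
tokens. The in-memory vault is a stand-in for a real, access-controlled token
store; a production system would also encrypt the stored mapping and the data
backups themselves.

```python
# Illustrative sketch of two safeguards named in the DPDP Rules: masking a
# value for display, and replacing it with a virtual token for processing.
# The in-memory vault is a stand-in for a real, access-controlled token store.
import secrets

_VAULT: dict[str, str] = {}   # token -> original value (encrypted at rest in practice)

def mask(value: str, visible: int = 4) -> str:
    """Show only the last `visible` characters, e.g. on a support screen."""
    return "*" * max(0, len(value) - visible) + value[-visible:]

def tokenize(value: str) -> str:
    """Swap personal data for a random token; the mapping stays in the vault."""
    token = "tok_" + secrets.token_hex(8)
    _VAULT[token] = value
    return token

def detokenize(token: str) -> str:
    """Controlled re-identification; in production this would be gated by ACLs."""
    return _VAULT[token]

print(mask("9876543210"))       # ******3210
print(tokenize("9876543210"))   # tok_... (safe to pass to downstream systems)
```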
Doomed enterprise AI projects usually lack vision
CIOs and other IT decision-makers are under pressure from boards and CEOs who
want their companies to be “AI-first” operations; that runs the risk of moving
too fast on execution rather than choosing the right projects, said Steven Dickens,
principal analyst at Hyperframe Research. Smart leaders are cautious and
pragmatic and focused on validated value, not jumping the gun on
mission-critical processes. “They are ring-fencing pilot projects to low-risk,
high-impact areas like internal code generation or customer service triage,”
Dickens said. ... In this experimental period, organizations viewing AI as a way
to reimagine business will take an early lead, Tara Balakrishnan, associate
partner at McKinsey, said in the study. “While many see leading indicators from
efficiency gains, focusing only on cost can limit AI’s impact,” Balakrishnan
wrote. Scalability, project costs, and talent availability also play key roles
in moving proof-of-concept projects to production. AI tools are not just plug
and play, said Jinsook Han, chief strategy and agentic AI officer at Genpact.
While companies can experiment with flashy demos and proofs of concept, the
technology also needs to be usable and relevant, Han said. ... Many AI projects
fail because they are built atop legacy IT systems, Han said, adding that
modifying a company’s technology stack, workflows, and processes will maximize
what AI can do. Humans also still need to oversee AI projects and outcomes —
especially when agentic AI is involved, Han said.
GenAI vs Agentic AI: From creation to action — What enterprises need to know
Generative AI and Agentic AI are two separate – but often interrelated –
paradigms. Generative AI excels in authoring or creating content from prompts,
while Agentic AI involves taking autonomous actions to achieve objectives in
complex workflows that involve multiple steps. ... Agentic AI is the next step
in the advance of data science – from construction to self-execution. Agentic
systems act as
intelligent digital workers capable of managing a vast array of complex
multi-step workflows. In banking and financial services, Agentic AI enables
autonomous function for trading and portfolio management. Given a strategic
objective like “maximize return within an acceptable risk parameter,” it can
perform autonomously by monitoring market signals and executing trading
decisions, rebalancing assets and adjusting portfolios, all in real time. ... The
difference between Generative AI and Agentic AI is starting to fade. We are
heading toward a future in which generative models serve as the “thinking
engine” of agentic systems. It will not be Generative AI versus Agentic AI.
Intelligent
systems will reason, create and act across business ecosystems. For this to
happen, there will be a need for interoperable systems and common standards.
Frameworks such as the Model Context Protocol (MCP) and metadata standards
like AgentFacts are already laying the groundwork for a transparent,
plug-and-play agent ecosystem that provides trust, transparency, and safe
collaboration for agents across platforms.
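Returning to the trading example above, a minimal sketch of the control loop
such an agent implies: perceive, decide, act, with a hard risk guardrail and
human escalation. get_market_signals(), estimate_risk(), propose_rebalance(),
and execute_orders() are hypothetical adapters to real market-data and broker
APIs, not any specific library.

```python
# Sketch of the agentic control loop described above: given a strategic
# objective, the agent monitors signals, decides, and acts, until the risk
# limit stops it. All adapter functions below are hypothetical.
import time

MAX_RISK = 0.15          # acceptable risk parameter (e.g., portfolio volatility)

def get_market_signals() -> dict: ...
def estimate_risk(portfolio: dict, signals: dict) -> float: ...
def propose_rebalance(portfolio: dict, signals: dict) -> dict: ...
def execute_orders(orders: dict) -> None: ...

def agent_loop(portfolio: dict, interval_sec: int = 60) -> None:
    """Perceive -> decide -> act, with a risk guardrail and human escalation."""
    while True:
        signals = get_market_signals()                  # perceive
        orders = propose_rebalance(portfolio, signals)  # decide
        if estimate_risk(portfolio, signals) > MAX_RISK:
            raise RuntimeError("Risk limit breached: defer to a human trader")
        execute_orders(orders)                          # act
        time.sleep(interval_sec)
```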
Pushing the thermal envelope
“When new data centers are designed today, instead of relying solely on the
grid, they are integrating on-site power stations with their facilities. These
on-site generators function like traditional power stations, and as heat
engines, they produce substantial byproduct heat,” Hannah explains. This
high-grade, abundant heat opens new possibilities. Technologies such as
absorption chillers, historically underutilized in data centers due to
insufficient heat, can now be deployed effectively when coupled with BYOP
systems. This flexibility extends to operational optimization as well. ... The
digital twin methodology allows engineers to create theoretical models of
systems to simulate responses and tune control algorithms accordingly.
Operational or production-based digital twins extend this approach by using
field and system data to continuously improve model accuracy over time. ... The
thermal chain and power train now operate less as separate systems and more as
partners in a shared ecosystem, each dependent on the other for optimal
performance. This growing synergy extends beyond technology, driving closer
collaboration between traditionally separate teams across design, engineering,
manufacturing, and operations. “The growth is so incredible that customers are
looking for products and systems they can deploy quickly – solutions that are
easy to install, reliable, densified, cost-effective, and efficient,” says
Hannah. “Right now, speed of deployment is the priority.”
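As a small taste of the digital twin workflow Hannah describes, here is a toy
first-order thermal model used to tune a proportional cooling gain entirely in
simulation before it touches real equipment. All parameters are illustrative;
a production-based twin would instead refit them continuously from field data.

```python
# Toy "digital twin" in the spirit described above: a first-order thermal
# model of a cooling loop, used offline to tune a proportional control gain.
# All parameters are illustrative, not measured values.

def simulate(gain: float, setpoint: float = 24.0, steps: int = 600,
             heat_load_kw: float = 50.0, thermal_mass: float = 400.0) -> float:
    """Return max deviation from setpoint (deg C) under a step heat load."""
    temp, worst = setpoint, 0.0
    for _ in range(steps):                      # 1-second simulation steps
        cooling_kw = gain * (temp - setpoint)   # proportional chiller response
        temp += (heat_load_kw - cooling_kw) / thermal_mass
        worst = max(worst, abs(temp - setpoint))
    return worst

# Sweep candidate gains in simulation and keep the best-behaved one; a
# production twin would refit thermal_mass etc. from live sensor data instead.
best_gain = min((g / 10 for g in range(5, 100)), key=simulate)
print(best_gain, simulate(best_gain))
```

Cloud Services Face Scrutiny Under the Digital Markets Act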
Today, European authorities announced three new market investigations into
cloud-computing services under the Digital Markets Act (DMA), as EU leaders
gather in Berlin for the Summit on European Digital Sovereignty — an event
billed as a push for an “independent, secure and innovation-friendly digital
future for Europe.” Two investigations will assess whether Amazon Web Services
(AWS) and Microsoft’s Azure should be designated as gatekeepers, despite
apparently “not meeting the DMA gatekeeper thresholds for size, user number and
market position.” A third investigation is to assess if the DMA is best placed
to “effectively tackle practices that may limit competitiveness and fairness in
the cloud computing sector in the EU.” ... Europe is increasingly concerned
about data security and sovereignty, spurred in part by the Trump
administration’s ongoing hostility to the EU and the powers granted by the CLOUD
Act (Clarifying Lawful Overseas Use of Data Act), which allows US law
enforcement to obtain data stored abroad, even data concerning non-US citizens.
Fears of a potential “kill switch” have pushed digital sovereignty up the EU
agenda, with some member states switching away from the biggest cloud providers
and adopting European alternatives. However, to switch away from US providers at
scale may require competition law enforcement and regulation. The European
Commission has passed the Data Act, which requires cloud providers to eliminate
switching charges by 2027 and bans “technical, contractual and organisational
obstacles” to switching to another provider.
IBM readies commercially valuable quantum computer technology
According to Chong, Loon puts a separate layer on the chip, going
three-dimensional, allowing connections between qubits that aren’t immediate
neighbors. Even separate chips, the ones contained in the boxes at the base of
those giant cryogenic chandelier-shaped refrigerators, can be linked together,
says IBM’s Crowder. In fact, that’s already possible with Nighthawk. “You can
think of it as wires going between the boxes at the bottom,” Crowder says.
“Nighthawk is designed to be able to do that, and it’ll also be used to connect
the fault-tolerant modules in the large-scale fault-tolerant system as well.”
“That is a big announcement for the industry,” says IDC analyst Heather West.
“Now we’re seeing ways to actually begin scaling these systems without squeezing
thousands or hundreds of thousands of qubits on a chip.” It’s a misperception
that quantum computing isn’t beneficial and can’t be used today. Organizations
should already be thinking about how they will use quantum computing, especially
if they expect to be able to get a competitive edge from it, West says. “Waiting
until the technology advances further could be detrimental because the learning
curve that you need to be able to understand quantum and to program quantum
algorithms is quite high,” West says. It’s difficult to develop these skills
internally, and difficult to bring them into an organization. And then there’s
the time it takes to develop use cases and figure out new workflows.
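That learning curve is easy to sample firsthand: the canonical first program,
a two-qubit Bell state, is only a few lines in Qiskit, but the concepts behind
it (superposition, entanglement, measurement) are where the real climb starts.
A minimal version, run on a local simulator:

```python
# A first step on the quantum learning curve: a two-qubit Bell state in
# Qiskit, run on a local simulator (pip install qiskit qiskit-aer).
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2)
qc.h(0)            # put qubit 0 into superposition
qc.cx(0, 1)        # entangle qubit 1 with qubit 0
qc.measure_all()

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=1000).result().get_counts()
print(counts)      # ~50/50 split between '00' and '11': the qubits are correlated
```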
Why modular AI is emerging as the next enterprise architecture standard
LLMs are remarkable, but they are not inherently aligned with enterprise control
frameworks. Without a way to govern the reasoning and retrieval pathways,
organizations place themselves at risk of unpredictable outputs — and
unpredictable headlines. ... The modular approach I explored is built on two
ideas: small language models and retrieval-augmented generation. SLMs focus on
specific domains rather than being trained to handle everything. Because they
are compact and specialized, they can run on more common infrastructure and
offer predictable performance. Instead of forcing one model to understand every
topic in the enterprise, SLMs stay close to the context they are responsible
for. ... Together, SLMs and RAG form a system where intelligence is both
efficient and explainable. The model contributes language understanding, while
retrieval ensures accuracy and alignment with business rules. It’s an approach
that favors control and clarity over brute-force scale — exactly what large
organizations need when AI decisions must be defended, not just delivered. ...
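A minimal sketch of that SLM-plus-RAG pairing: retrieval supplies the facts
and the citations, and the small model supplies the language. embed(),
vector_search(), and slm_generate() are hypothetical adapters to your own
embedding store and domain model.

```python
# Sketch of the SLM + RAG pairing described above: retrieval grounds the
# answer, the small domain model phrases it, and the sources travel with the
# output. The three adapter functions are hypothetical.

def embed(text: str) -> list[float]: ...
def vector_search(query_vec: list[float], top_k: int = 3) -> list[dict]: ...
def slm_generate(prompt: str) -> str: ...

def answer(question: str) -> dict:
    """Ground the model's answer in retrieved passages and keep the citations."""
    passages = vector_search(embed(question))
    context = "\n\n".join(p["text"] for p in passages)
    prompt = (
        "Answer using ONLY the context below. If the context is not "
        f"sufficient, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    # Returning sources alongside the text is what makes the output auditable.
    return {"answer": slm_generate(prompt), "sources": [p["id"] for p in passages]}
```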
At the heart of this approach is what I call a semantic layer: a coordination
surface where AI agents reason only over the business context and data sources
assigned to them. This layer defines three critical elements: what information
an agent can access; how its decisions are validated; and when it should
escalate or defer to humans. In this design, smaller language models are used
where focus matters more than size.
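One way such a semantic layer could be expressed is as a declarative policy
object covering the three elements above; the sketch below is illustrative
rather than a specific product's API.

```python
# Sketch of the "semantic layer" contract: each agent is registered with the
# data it may touch, how its output is validated, and when it must defer to a
# human. The schema is illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentPolicy:
    name: str
    allowed_sources: set[str]                       # what it can access
    validate: Callable[[str], bool]                 # how decisions are checked
    escalate_below_confidence: float = 0.8          # when humans take over

invoice_agent = AgentPolicy(
    name="invoice-triage",
    allowed_sources={"erp.invoices", "vendor.master"},
    validate=lambda out: "amount" in out,           # stand-in business rule
)

def guard(policy: AgentPolicy, source: str, confidence: float, output: str) -> str:
    """Enforce the policy around every agent action."""
    if source not in policy.allowed_sources:
        raise PermissionError(f"{policy.name} may not read {source}")
    if confidence < policy.escalate_below_confidence or not policy.validate(output):
        return "ESCALATE_TO_HUMAN"
    return output
```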
The long conversations that reveal how scammers work
The slow cadence is what scammers use to build trust. The study shows how
predictable that progression is when viewed at scale. Early messages tend to
focus on small talk, harmless questions, light personal details, and daily
routines. These early exchanges often contain subtle checks to see if the target
is human. Some scammers ask directly. “By the way, there are a lot of fake
people here, are you a real person” is one of the lines captured in the study.
... That distance between the greeting and the attempted cash-out is the core
challenge in studying long-game fraud. Scammers send photos of meals or walks,
talk about family, and bring up current events to lay the groundwork for later
requests. Scammers often sent images; audio and video were less common, and
when used, they tended to appear at moments when scammers wanted to strengthen
the sense of presence. The researchers found that 20 percent of
conversations included selfie requests, and more than half of those requests
took place on WhatsApp. ... Long-haul scams do not rely on high urgency. They
rely on comfort, familiarity, and patience. This is a different challenge than
technical support scams or prize scams. Defenders need to detect slow moving
risk signals before money leaves accounts. The study also shows the scale
challenge. Manual research that covers weeks of dialog is difficult to sustain.
The researchers address this by blending an LLM with a workflow that pulls in
human reviewers at key points.
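A rough sketch of what that blended workflow could look like: a model scores
long conversations for slow-building risk signals, and borderline cases are
routed to a human reviewer rather than being auto-flagged. llm_risk_score()
and the signal list are hypothetical.

```python
# Sketch of the LLM-plus-human workflow described above: a model screens long
# conversations for slow-building risk signals, and uncertain cases go to a
# human reviewer. llm_risk_score() and SIGNALS are illustrative stand-ins.

SIGNALS = ["are you a real person", "selfie", "move to whatsapp", "investment"]

def llm_risk_score(conversation: list[str]) -> float:
    """Stand-in for an LLM call that rates scam likelihood in [0, 1]."""
    ...

def screen(conversation: list[str]) -> str:
    """Route a conversation to flag, human review, or continued monitoring."""
    text = " ".join(conversation).lower()
    keyword_hits = sum(s in text for s in SIGNALS)
    score = llm_risk_score(conversation)
    if score > 0.9 or keyword_hits >= 3:
        return "flag"                 # clear long-game pattern
    if score > 0.5 or keyword_hits >= 1:
        return "human_review"         # the key points where reviewers step in
    return "monitor"                  # keep watching the slow cadence
```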