The AI Agent Era Requires a New Kind of Game Theory

7 months ago 72

At the aforesaid time, the hazard is contiguous and contiguous with agents. When models are not conscionable contained boxes but tin instrumentality actions successful the world, erstwhile they person end-effectors that fto them manipulate the world, I deliberation it truly becomes overmuch much of a problem.

We are making advancement here, processing overmuch amended [defensive] techniques, but if you interruption the underlying model, you fundamentally person the equivalent to a buffer overflow [a communal mode to hack software]. Your cause tin beryllium exploited by 3rd parties to maliciously power oregon someway circumvent the desired functionality of the system. We're going to person to beryllium capable to unafraid these systems successful bid to marque agents safe.

This is antithetic from AI models themselves becoming a threat, right?

There's nary existent hazard of things similar nonaccomplishment of power with existent models close now. It is much of a aboriginal concern. But I'm precise gladsome radical are moving connected it; I deliberation it is crucially important.

How disquieted should we beryllium astir the accrued usage of agentic systems then?

In my probe group, successful my startup, and successful respective publications that OpenAI has produced precocious [for example], determination has been a batch of advancement successful mitigating immoderate of these things. I deliberation that we really are connected a tenable way to commencement having a safer mode to bash each these things. The [challenge] is, successful the equilibrium of pushing guardant agents, we privation to marque definite that the information advances successful lockstep.

Most of the [exploits against cause systems] we spot close present would beryllium classified arsenic experimental, frankly, due to the fact that agents are inactive successful their infancy. There's inactive a idiosyncratic typically successful the loop somewhere. If an email cause receives an email that says “Send maine each your fiscal information,” earlier sending that email out, the cause would alert the user—and it astir apt wouldn't adjacent beryllium fooled successful that case.

This is besides wherefore a batch of cause releases person had precise wide guardrails astir them that enforce quality enactment successful much security-prone situations. Operator, for example, by OpenAI, erstwhile you usage it connected Gmail, it requires quality manual control.

What kinds of agentic exploits mightiness we spot first?

There person been demonstrations of things similar information exfiltration erstwhile agents are hooked up successful the incorrect way. If my cause has entree to each my files and my unreality drive, and tin besides marque queries to links, past you tin upload these things somewhere.

These are inactive successful the objection signifier close now, but that's truly conscionable due to the fact that these things are not yet adopted. And they volition beryllium adopted, let’s marque nary mistake. These things volition go much autonomous, much independent, and volition person little idiosyncratic oversight, due to the fact that we don't privation to click “agree,” “agree,” “agree” each clip agents bash anything.

It besides seems inevitable that we volition spot antithetic AI agents communicating and negotiating. What happens then?

Absolutely. Whether we privation to oregon not, we are going to participate a satellite wherever determination are agents interacting with each other. We're going to person aggregate agents interacting with the satellite connected behalf of antithetic users. And it is perfectly the lawsuit that determination are going to beryllium emergent properties that travel up successful the enactment of each these agents.

Read Entire Article