{"id":14566,"date":"2025-08-05T12:19:10","date_gmt":"2025-08-05T12:19:10","guid":{"rendered":"https:\/\/newestek.com\/?p=14566"},"modified":"2025-08-05T12:19:10","modified_gmt":"2025-08-05T12:19:10","slug":"nvidia-patches-critical-triton-server-bugs-that-threaten-ai-model-security","status":"publish","type":"post","link":"https:\/\/newestek.com\/?p=14566","title":{"rendered":"Nvidia patches critical Triton server bugs that threaten AI model security"},"content":{"rendered":"<div>\n<div id=\"remove_no_follow\">\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<section class=\"wp-block-bigbite-multi-title\">\n<div class=\"container\"><\/div>\n<\/section>\n<p>A surprising attack chain in Nvidia\u2019s Triton Inference Server, starting with a seemingly minor memory-name leak, could allow full remote server takeover without user authentication.<\/p>\n<p>Security researchers from Wiz have discovered a chain of critical vulnerabilities in the popular open-source platform for running AI models at scale.<\/p>\n<p>\u201cWhen chained together, these flaws can potentially allow a remote, unauthenticated attacker to gain complete control of the server, achieving remote code execution (RCE),\u201d Wiz researchers Ronen Shustin and Nir Ohfeld said in a blog post. 
\u201cThis poses a critical risk to organizations using Triton for AI\/ML, as a successful attack could lead to the theft of valuable AI models, exposure of sensitive data, manipulating the AI model\u2019s responses and a foothold for attackers to move deeper into a network.\u201d<\/p>\n<p>The researchers disclosed the findings to Nvidia, which has now patched the three vulnerabilities behind the attack chain: an information-disclosure flaw, missing input validation, and a remote code execution (RCE) flaw.<\/p>\n<h2 class=\"wp-block-heading\"><a><\/a>Leaky error to total server control<\/h2>\n<p>Triton is a universal inference server that supports major AI frameworks like PyTorch and TensorFlow through modular backends. Each backend handles models from a specific framework, and Triton routes inference requests accordingly. Inference requests are calls made to a trained AI model to make decisions or predictions on new, real-world data.<\/p>\n<p>The attack chain starts with a crafted inference request that triggers an error in Triton\u2019s Python backend, leaking the full shared-memory key in an error message. That key, meant to stay private, is then abused via Triton\u2019s shared-memory API (intended for performance), giving attackers arbitrary read\/write access to internal backend memory.<\/p>\n<p>\u201cTriton offers a user-friendly shared memory feature for performance,\u201d researchers <a href=\"https:\/\/www.wiz.io\/blog\/nvidia-triton-cve-2025-23319-vuln-chain-to-ai-server\" target=\"_blank\" rel=\"noreferrer noopener\">said<\/a> about the API. \u201cA client can use this feature to have Triton read input tensors from, and write output tensors to, a pre-existing shared memory region. 
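The named-region mechanic the researchers describe can be illustrated with Python's standard-library multiprocessing.shared_memory module. This is a minimal sketch of how named shared memory behaves in general, not Triton's implementation, and the region name below is invented for the demo: any process that learns the name can attach with full read\/write access, which is why a leaked key is so dangerous.

```python
from multiprocessing import shared_memory

# Create a named region, as a client would before registering it with a
# server. The name 'triton_demo_region' is invented for this sketch.
region = shared_memory.SharedMemory(name='triton_demo_region', create=True, size=64)
region.buf[:5] = b'input'

# Any process that learns the name can attach with full read/write access;
# this is why leaking an internal region's key in an error message matters.
intruder = shared_memory.SharedMemory(name='triton_demo_region')
intruder.buf[:5] = b'evil!'

tampered = bytes(region.buf[:5])  # the original handle sees the tampered bytes
print(tampered)  # b'evil!'

intruder.close()
region.close()
region.unlink()
```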
This process avoids the costly transfer of large amounts of data over the network and is a documented, powerful tool for optimizing inference workloads.\u201d<\/p>\n<p>The vulnerability stems from the API failing to verify whether a shared memory key points to a valid user-owned region or a restricted internal one. From there, memory corruption or manipulation of inter-process communication (IPC) structures opens the door to full remote code execution.<\/p>\n<h2 class=\"wp-block-heading\" id=\"this-could-matter-to-ai-everywhere\">This could matter to AI everywhere<\/h2>\n<p>Wiz researchers focused their analysis on Triton\u2019s Python backend, citing its popularity and central role in the system. While it handles models written in Python, it also serves as a dependency for several other backends, meaning models configured under different frameworks may still rely on it during parts of the inference process.<\/p>\n<p>If exploited, the vulnerability chain could let an unauthenticated attacker remotely take control of Triton, potentially leading to stolen AI models, leaked sensitive data, tampered model outputs, and lateral movement within the victim\u2019s network.<\/p>\n<p>Nvidia has previously <a href=\"https:\/\/nvidianews.nvidia.com\/news\/nvidia-announces-major-updates-to-triton-inference-server-as-25-000-companies-worldwide-deploy-nvidia-ai-inference\" target=\"_blank\" rel=\"noreferrer noopener\">said<\/a> its AI inference platform is used by more than 25,000 customers, including tech heavyweights like Microsoft, Capital One, Samsung Medison, Siemens Energy, and Snap. On Monday, the company published a <a href=\"https:\/\/nvidia.custhelp.com\/app\/answers\/detail\/a_id\/5687\" target=\"_blank\" rel=\"noreferrer noopener\">security advisory<\/a> detailing the flaws, tracked as CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, along with patches. 
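The root cause described earlier, an API that honors any shared-memory key rather than only keys a client explicitly registered, can be sketched as a simple allowlist lookup. The function and key names below are hypothetical illustrations, not Nvidia's actual code.

```python
# Hypothetical sketch of the ownership check the shared-memory API lacked:
# the server should honor only keys a client has explicitly registered.
registered_keys = {'client_input_0', 'client_output_0'}  # client-registered regions

def resolve_shm_key(key: str) -> str:
    """Return the key only if it names a client-registered region."""
    if key not in registered_keys:
        # Internal backend regions (whose names an attacker may have leaked)
        # never appear in the registry, so they are rejected here.
        raise PermissionError(f'shared-memory key not registered: {key}')
    return key

print(resolve_shm_key('client_input_0'))       # accepted: a registered key

try:
    resolve_shm_key('internal_backend_shm_0')  # a leaked internal key is refused
except PermissionError as err:
    print(err)
```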
Users are advised to upgrade both Nvidia Triton Inference Server and the Python backend to version 25.07 to fully mitigate the issues.<\/p>\n<p>Model-serving infrastructures like Triton are becoming a critical attack surface as AI adoption scales. In October 2023, inference endpoints from <a href=\"https:\/\/www.lasso.security\/blog\/1500-huggingface-api-tokens-were-exposed-leaving-millions-of-meta-llama-bloom-and-pythia-users-for-supply-chain-attacks\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a> and <a href=\"https:\/\/www.csoonline.com\/article\/654332\/new-critical-ai-vulnerabilities-in-torchserve-put-thousands-of-ai-models-at-risk.html\">TorchServe<\/a> suffered flaws that created significant exposure risks.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A surprising attack chain in Nvidia\u2019s Triton Inference Server, starting with a seemingly minor memory-name leak, could allow full remote server takeover without user authentication. Security researchers from Wiz have discovered a chain of critical vulnerabilities in the popular open-source platform for running AI models at scale. 
\u201cWhen chained together, these flaws can potentially allow a remote, unauthenticated attacker to gain complete control of the&#8230; <\/p>\n<p class=\"more\"><a class=\"more-link\" href=\"https:\/\/newestek.com\/?p=14566\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-14566","post","type-post","status-publish","format-standard","hentry","category-uncategorized","is-cat-link-borders-light is-cat-link-rounded"],"_links":{"self":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts\/14566","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14566"}],"version-history":[{"count":0,"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts\/14566\/revisions"}],"wp:attachment":[{"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14566"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14566"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14566"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}