{"id":15418,"date":"2026-01-07T05:01:35","date_gmt":"2026-01-07T05:01:35","guid":{"rendered":"https:\/\/newestek.com\/?p=15418"},"modified":"2026-01-07T05:01:35","modified_gmt":"2026-01-07T05:01:35","slug":"automated-data-poisoning-proposed-as-a-solution-for-ai-theft-threat","status":"publish","type":"post","link":"https:\/\/newestek.com\/?p=15418","title":{"rendered":"Automated data poisoning proposed as a solution for AI theft threat"},"content":{"rendered":"<div>\n<div id=\"remove_no_follow\">\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<section class=\"wp-block-bigbite-multi-title\">\n<div class=\"container\"><\/div>\n<\/section>\n<p>Researchers have developed a tool that they say can make stolen high-value proprietary data used in AI systems useless, a solution that CSOs may have to adopt to protect their sophisticated large language models (LLMs).<\/p>\n<p><a href=\"https:\/\/arxiv.org\/abs\/2601.00274\" target=\"_blank\" rel=\"noreferrer noopener\">The technique<\/a>, created by researchers from universities in China and Singapore, is to inject plausible but false data into what\u2019s known as a knowledge graph (KG) created by an AI operator. A knowledge graph holds the proprietary data used by the LLM.<\/p>\n<p>Injecting poisoned or adulterated data into a data system for protection against theft isn\u2019t new. What\u2019s new in this tool \u2013 dubbed AURA (Active Utility Reduction via Adulteration)\u2013 is that authorized users have a secret key that filters out the fake data so the LLM\u2019s answer to a query is usable. If the knowledge graph is stolen, however, it\u2019s unusable by the attacker unless they know the key, because the adulterants will be retrieved as context, causing deterioration in the LLM\u2019s reasoning and leading to factually incorrect responses. 
<\/p>\n<p>The researchers say AURA degrades the performance of unauthorized systems to an accuracy of just 5.3%, while maintaining 100% fidelity for authorized users, with \u201cnegligible overhead,\u201d defined as a maximum query latency increase of under 14%. They also say AURA is robust against various sanitization attempts by an attacker, retaining 80.2% of the adulterants injected for defense, and the fake data it creates is hard to detect.<\/p>\n<p>Why is all this important? Because KGs often contain an organization\u2019s highly sensitive intellectual property (IP), they are a valuable target. <\/p>\n<h2 class=\"wp-block-heading\" id=\"mixed-reactions-from-experts\">Mixed reactions from experts<\/h2>\n<p>However, the proposal has been greeted with skepticism by one expert and with caution by another.<\/p>\n<p>\u201cData poisoning has never really worked well,\u201d said <a href=\"https:\/\/www.schneier.com\/blog\/about\/\" target=\"_blank\" rel=\"noreferrer noopener\">Bruce Schneier<\/a>, chief of security architecture at Inrupt Inc., and a fellow and lecturer at Harvard\u2019s Kennedy School. \u201cHoneypots, no better. This is a clever idea, but I don\u2019t see it as being anything but an ancillary security system.\u201d<\/p>\n<p><a href=\"https:\/\/josephsteinberg.com\/cybersecurityexpertjosephsteinberg\/\" target=\"_blank\" rel=\"noreferrer noopener\">Joseph Steinberg<\/a>, a US-based cybersecurity and AI consultant, disagreed, saying, \u201cin general this could work for all sorts of AI and non-AI systems.\u201d<\/p>\n<p>\u201cThis is not a new concept,\u201d he pointed out. \u201cSome parties have been doing this [injecting bad data for defense] with databases for many years.\u201d For example, he noted, a database can be watermarked so if it is stolen and some of its contents are later used \u2013 a fake credit card number, for example \u2014 investigators knows where that piece of data came from. 
Unlike watermarking, however, which puts one bad record into a database, AURA poisons the entire database, so if it\u2019s stolen, it\u2019s useless.<\/p>\n<p>AURA may not be needed in some AI models, he added, if the data in the KG isn\u2019t sensitive. The unanswered question is what the real-world trade-off between application performance and security would be if AURA is used.<\/p>\n<p>He also noted that AURA doesn\u2019t solve the problem of an undetected attacker interfering with the AI system\u2019s knowledge graph, or even its data.<\/p>\n<p>\u201cThe worst case may not be that your data gets stolen, but that a hacker puts bad data into your system so your AI produces bad results and you don\u2019t know it,\u201d Steinberg said. \u201cNot only that, you now don\u2019t know which data is bad, or which knowledge the AI has learned is bad. Even if you can identify that a hacker has come in and done something six months ago, can you unwind all the learning of the last six months?\u201d<\/p>\n<p>This is why Cybersecurity 101 \u2013 defense in depth \u2013 is vital for AI and non-AI systems, he said. AURA \u201creduces the consequences if someone steals a model,\u201d he noted, but whether it can jump from a lab to the enterprise has yet to be determined.<\/p>\n<h2 class=\"wp-block-heading\" id=\"knowledge-graphs-101\">Knowledge graphs 101<\/h2>\n<p>A bit of background about knowledge graphs: LLMs use a technique called Retrieval-Augmented Generation (RAG) to search for information based on a user query and provide the results as additional reference for the AI system\u2019s answer generation. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\" target=\"_blank\" rel=\"noreferrer noopener\">In 2024, Microsoft introduced GraphRAG<\/a> to help LLMs answer queries needing information beyond the data on which they have been trained. 
GraphRAG uses LLM-generated knowledge graphs to improve performance and lower the odds of hallucinations in answers when performing discovery on private datasets such as an enterprise\u2019s proprietary research, business documents, or communications.<\/p>\n<p>The proprietary knowledge graphs within GraphRAG systems make them \u201ca prime target for IP theft,\u201d just like any other proprietary data, says the research paper. \u201cAn attacker might steal the KG through external cyber intrusions or by leveraging malicious insiders.\u201d<\/p>\n<p>Once an attacker has successfully stolen a KG, they can deploy it in a private GraphRAG system to replicate the originating system\u2019s powerful capabilities, avoiding costly investments, the research paper notes.<\/p>\n<p>Unfortunately, the low-latency requirements of interactive GraphRAG make strong cryptographic solutions, such as homomorphic encryption of a KG, impractical. \u201cFully encrypting the text and embeddings would require decrypting large portions of the graph for every query,\u201d the researchers note. \u201cThis process introduces prohibitive computational overhead and latency, making it unsuitable for real-world use.\u201d<\/p>\n<p>AURA, they say, addresses these issues, making stolen KGs useless to attackers.<\/p>\n<h2 class=\"wp-block-heading\" id=\"ai-is-moving-faster-than-ai-security\">AI is moving faster than AI security<\/h2>\n<p>As the use of AI spreads, CSOs have to remember that artificial intelligence and everything needed to make it work also make it much harder to recover from bad data being put into a system, Steinberg noted.<\/p>\n<p>\u201cAI is progressing far faster than the security for AI,\u201d Steinberg warned. \u201cFor now, many AI systems are being protected in similar manners to the ways we protected non-AI systems. 
That doesn\u2019t yield the same level of protection, because if something goes wrong, it\u2019s much harder to know if something bad has happened, and it\u2019s harder to get rid of the implications of an attack.\u201d<\/p>\n<p>The industry is trying to address these issues, as the researchers observe in their paper. One useful reference, they note, is the US National Institute of Standards and Technology (NIST) <a href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/NIST.AI.600-1.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">AI Risk Management Framework<\/a>, which emphasizes the need for robust data security and resilience, including the importance of developing effective KG protection.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Researchers have developed a tool that they say can make stolen high-value proprietary data used in AI systems useless, a solution that CSOs may have to adopt to protect their sophisticated large language models (LLMs). 
The technique, created by researchers from universities in China and Singapore, is to inject plausible but false data into what\u2019s known as a knowledge graph (KG) created by an AI&#8230; <\/p>\n<p class=\"more\"><a class=\"more-link\" href=\"https:\/\/newestek.com\/?p=15418\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-15418","post","type-post","status-publish","format-standard","hentry","category-uncategorized","is-cat-link-borders-light is-cat-link-rounded"],"_links":{"self":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts\/15418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15418"}],"version-history":[{"count":0,"href":"https:\/\/newestek.com\/index.php?rest_route=\/wp\/v2\/posts\/15418\/revisions"}],"wp:attachment":[{"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newestek.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}