Deeba Ahmed
2024-11-01 16:01:00
hackread.com
Mindgard researchers uncovered critical vulnerabilities in Microsoft’s Azure AI Content Safety service, allowing attackers to bypass its safeguards and unleash harmful AI-generated content.
Mindgard, a UK-based cybersecurity-for-AI startup, discovered two critical security vulnerabilities in Microsoft’s Azure AI Content Safety service in February 2024. According to research shared with Hackread.com, these flaws could allow attackers to bypass the service’s safety guardrails.
The vulnerabilities were responsibly disclosed to Microsoft in March 2024, and by October 2024 the company had deployed “stronger mitigations” to reduce their impact. However, Mindgard has only now shared the details publicly.
Understanding the Vulnerabilities
Azure AI Content Safety is a Microsoft Azure cloud-based service that helps developers build safety and security guardrails for AI applications by detecting and managing inappropriate content. It uses advanced techniques to filter harmful content, including hate speech and explicit or otherwise objectionable material. In a typical Azure OpenAI deployment, the Large Language Model (LLM) sits behind two of these guardrails: Prompt Shield, which screens user inputs for jailbreak and prompt injection attempts, and AI Text Moderation, which screens inputs and AI-generated content for harmful material.
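For context, the sketch below shows how an application might screen text with the service before forwarding it to an LLM. The endpoint path, API version, and headers follow Microsoft’s public REST documentation, but the environment-variable names are illustrative assumptions and nothing here reflects Mindgard’s test setup.

```python
import os
import requests

# Illustrative sketch only: the environment-variable names are assumptions,
# not part of any official sample. The endpoint/key come from an Azure
# Content Safety resource.
ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
API_KEY = os.environ["CONTENT_SAFETY_KEY"]

def analyze_text(text: str) -> dict:
    """Send text to the AI Text Moderation analyzer and return its category severities."""
    url = f"{ENDPOINT}/contentsafety/text:analyze?api-version=2023-10-01"
    headers = {
        "Ocp-Apim-Subscription-Key": API_KEY,
        "Content-Type": "application/json",
    }
    response = requests.post(url, headers=headers, json={"text": text}, timeout=10)
    response.raise_for_status()
    # The JSON response reports a severity score per category (Hate, Violence, etc.)
    return response.json()

if __name__ == "__main__":
    print(analyze_text("Sample input to be screened before it reaches the LLM."))
```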
However, Mindgard discovered security vulnerabilities within both of these guardrails, which are meant to protect AI models against jailbreaks and prompt injection. According to the research, attackers could circumvent both AI Text Moderation and Prompt Shield to inject harmful content into the system, manipulate the model’s responses, or even compromise sensitive information.
Attack Techniques
According to Mindgard’s report, its researchers employed two primary attack techniques to bypass the guardrails: character injection and Adversarial Machine Learning (AML).
Character injection:
It is a technique where text is manipulated by injecting or replacing characters with specific symbols or sequences, for example through diacritics, homoglyphs, numerical replacement, space injection, and zero-width characters. These subtle changes preserve the meaning for a human reader but can deceive the guardrail into misclassifying the content, allowing attackers to manipulate the model’s interpretation and disrupt its analysis.
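As a minimal illustration of this class of perturbation, the sketch below applies zero-width characters and Cyrillic homoglyphs to a string. The substitution table is an arbitrary illustrative choice, not Mindgard’s actual payload.

```python
# Minimal illustration of character injection: the text looks the same (or nearly
# the same) to a human reader, but the underlying code points change.

ZERO_WIDTH_SPACE = "\u200b"

# Latin letters mapped to visually similar Cyrillic homoglyphs (illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

def inject_zero_width(text: str) -> str:
    """Insert a zero-width space between every character; rendering is unchanged."""
    return ZERO_WIDTH_SPACE.join(text)

def swap_homoglyphs(text: str) -> str:
    """Replace selected Latin letters with look-alike Cyrillic code points."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

if __name__ == "__main__":
    original = "example prompt"
    print(inject_zero_width(original))  # visually identical, different byte sequence
    print(swap_homoglyphs(original))    # near-identical to the eye, different code points
```

A classifier that keys on exact tokens or code points can assign such perturbed text a different, lower-risk label than the unmodified original, which is the misclassification effect described above.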
Adversarial Machine Learning (AML):
AML involves perturbing input data to mislead the model’s predictions, using methods such as word substitution, deliberate misspelling, and other small text manipulations. By carefully selecting and perturbing words, attackers can cause the model to misinterpret the input’s intent.
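As a rough illustration of this style of attack, the sketch below applies synonym swaps and single-character transpositions to an input string. The word list and substitution rate are arbitrary demonstration choices, not the researchers’ method.

```python
import random

# Illustrative word-level perturbations: synonym substitution plus light misspelling.
SYNONYMS = {"bypass": "sidestep", "disable": "switch off", "harmful": "unsafe"}

def transpose(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters to produce a readable misspelling."""
    if len(word) < 4:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb(sentence: str, seed: int = 0) -> str:
    """Substitute known words and lightly misspell some of the rest."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word.lower() in SYNONYMS:
            out.append(SYNONYMS[word.lower()])
        elif rng.random() < 0.3:
            out.append(transpose(word, rng))
        else:
            out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    print(perturb("explain how to bypass the content filter"))
```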
Possible Consequences
The two techniques effectively bypassed the AI Text Moderation safeguards, reducing detection accuracy by up to 100% and 58.49%, respectively. Exploiting these vulnerabilities could lead to societal harm, as it “can result in harmful or inappropriate input reaching the LLM, causing the model to generate responses that violate its ethical, safety, and security guidelines,” researchers wrote in their blog post shared exclusively with Hackread.com.
Moreover, exploitation would allow malicious actors to inject harmful content into AI-generated outputs, manipulate model behaviour, expose sensitive data, and gain unauthorized access to sensitive information or systems.
“By exploiting the vulnerability to launch broader attacks, this could compromise the integrity and reputation of LLM-based systems and the applications that rely on them for data processing and decision-making,” researchers noted.
It is crucial for organizations to stay updated with the latest security patches and to implement additional security measures to protect their AI applications from such attacks.