OUTPUT.DD

How safe is AI? Toward Trustworthy AI

Room 1020

Type Demo

Degree program / Chair / Company
Chair of Scalable Software Architectures for Data Analytics / ScaDS.AI Dresden/Leipzig

Presenter Himanshu Beniwal

Website https://scads.ai

As AI systems grow more pervasive, questions of safety and trust demand urgent, practical answers. This session presents five research efforts spanning the AI safety landscape: IITGnGPT, a classifier that identifies AI-generated content across model families; UnityAI-Guard 1.0 and 2.0, toxicity detectors for low-resource Indian languages covering 17 fine-grained harm categories across 12 languages; a demonstration of backdoor attacks on YOLO, text classifiers, generative models, and translation systems, which reveals a systemic adversarial threat surface; and SangrahaTox, a benchmark dataset for auditing multimodal models for stereotypes, bias, and toxicity. Together, these projects chart a path from detecting what AI produces, to exposing how it can be manipulated, to measuring the biases it silently encodes, offering both diagnostic tools and a broader framework for building trustworthy AI.
