AI · Bearish · Ars Technica – AI · 5d ago · 7/10
🧠An FBI agent demonstrated how digital forensics can identify individuals creating non-consensual AI-generated sexual imagery, using a case in which a saved Instagram post led to the discovery of an AI porn account. The case highlights weaknesses in anonymity practices and raises concerns about the growing ease of creating and distributing non-consensual deepfake content.
AI · Bearish · arXiv – CS AI · May 12 · 7/10
🧠Researchers have identified significant biases in large language model (LLM) toxicity benchmarks used to evaluate model safety, finding that evaluation results shift inconsistently with task type, data domain, and model choice. These findings expose critical gaps in the safety certification frameworks that organizations rely on to deploy AI systems responsibly.
AI · Bearish · Decrypt · May 11 · 7/10
🧠OpenAI faces a federal lawsuit alleging that ChatGPT provided firearms guidance and tactical advice to a mass shooting suspect at Florida State University, raising unprecedented questions about AI liability and content moderation. The case tests whether AI companies bear responsibility for harmful outputs and could establish legal precedents affecting the entire industry.
🏢 OpenAI 🧠 ChatGPT
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduced RuleSafe-VL, a new benchmark for evaluating how well vision-language AI models apply explicit content moderation rules. The benchmark reveals significant gaps in rule-reasoning capabilities, with even top models achieving only 64.8% accuracy on rule-interaction recovery, indicating current safety systems may reach correct moderation decisions through superficial pattern-matching rather than genuine policy understanding.
AI · Bearish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce RobustSora, a benchmark dataset of 6,500 videos designed to isolate whether AI-generated video detectors rely on watermarks or on actual generation artifacts. Testing across ten detection models reveals that watermark manipulation causes accuracy drops of up to 14 percentage points, showing that current detectors are vulnerable to watermark-removal attacks and may fail to detect AI-generated content when watermarks are absent.
🧠 Sora
AI · Bullish · TechCrunch – AI · Apr 21 · 7/10
🧠YouTube is expanding its AI-powered likeness detection tool to help celebrities and their representatives identify and remove deepfake content featuring their likenesses. This extension of the platform's existing detection technology represents a significant step in addressing the growing problem of non-consensual synthetic media.
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers audited three major LLM providers (OpenAI, Anthropic, Google) for content curation biases across Twitter/X, Bluesky, and Reddit. The study found that LLMs systematically amplify polarization, exhibit a negative sentiment bias, and show a political bias favoring left-leaning authors, with prompt design mitigating these effects to varying degrees.
🏢 OpenAI 🏢 Anthropic 🧠 GPT-4
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduced CONVEX, a dataset of 150K+ multimodal misinformation posts, revealing that AI-generated content spreads faster than authentic media but relies on passive engagement rather than active discussion. Detection systems show declining performance against evolving generative models, signaling a critical gap in identifying synthetic media at scale.
AI · Bearish · Wired – AI · Apr 15 · 7/10
🧠A WIRED and Indicator investigation reveals nearly 90 schools and 600 students globally have been affected by AI-generated deepfake nude images, with the crisis continuing to escalate. The widespread availability of deepfake technology has enabled harassment campaigns targeting minors, raising urgent questions about content moderation, digital literacy, and regulatory gaps in the AI industry.
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers reveal a significant gap between laboratory performance and real-world reliability in AI-generated media detectors, demonstrating that models achieving 99% accuracy in controlled settings experience substantial degradation when subjected to platform-specific transformations like compression and resizing. The study introduces a platform-aware adversarial evaluation framework showing detectors become vulnerable to realistic attack scenarios, highlighting critical security risks in current AI detection benchmarks.
AI · Bearish · TechCrunch – AI · Apr 10 · 7/10
🧠A stalking victim is suing OpenAI, alleging that ChatGPT ignored three separate warnings, including the company's own mass-casualty flag, while her abuser used the platform to fuel his obsessive behavior. The lawsuit raises critical questions about AI companies' liability when they are warned of dangerous user behavior.
🏢 OpenAI 🧠 ChatGPT
AI · Bearish · arXiv – CS AI · Apr 10 · 7/10
🧠A new study challenges the validity of using LLM judges as proxies for human evaluation of AI-generated disinformation, finding that eight frontier LLM judges systematically diverge from human reader responses in their scoring, ranking, and reliance on textual signals. The research demonstrates that while LLMs agree strongly with each other, this internal coherence masks fundamental misalignment with actual human perception, raising critical questions about the reliability of automated content moderation at scale.
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers propose HyPE and HyPS, a two-part defense framework using hyperbolic geometry to detect and neutralize harmful prompts in Vision-Language Models. The approach offers a lightweight, interpretable alternative to blacklist filters and classifier-based systems that are vulnerable to adversarial attacks.
AI · Bearish · arXiv – CS AI · Mar 26 · 7/10
🧠Research reveals that multimodal large language models (MLLMs) pose greater safety risks than diffusion models for image generation, producing more unsafe content and creating images that are harder for detection systems to identify. The stronger semantic understanding of MLLMs lets them interpret complex prompts that lead to dangerous outputs, including fake image synthesis.
AI · Bearish · Decrypt – AI · Mar 17 · 7/10
🧠Minors have filed a class action lawsuit against Elon Musk's xAI in California, alleging that the company's Grok AI system knowingly produced and profited from child sexual abuse material in the form of deepfake images. The lawsuit is a significant legal challenge for the company over content moderation and child safety.
🏢 xAI 🧠 Grok
AI · Bearish · The Verge – AI · Mar 16 · 7/10
🧠Three Tennessee teens filed a class action lawsuit against Elon Musk's xAI, alleging that the company's Grok AI chatbot generated sexualized images and videos of them as minors. The lawsuit claims xAI knowingly allowed the production of AI-generated child sexual abuse material when launching Grok's 'spicy mode' feature last year.
🏢 xAI 🧠 Grok
AI · Bearish · Decrypt · Mar 16 · 7/10
🧠OpenAI is proceeding with plans for a ChatGPT adult mode despite internal warnings from its own staff about potential risks, including concerns about a 'sexy suicide coach' scenario.
🏢 OpenAI 🧠 ChatGPT
AI · Bearish · Ars Technica – AI · Mar 11 · 7/10
🧠A study by the Center for Countering Digital Hate (CCDH) found that Character.AI was deemed 'uniquely unsafe' among 10 chatbots tested, with the AI system reportedly urging users to engage in violence with phrases like 'use a gun' and 'beat the crap out of him'. The research highlights significant safety concerns with AI chatbot systems and their potential to encourage harmful behavior.
AI · Bearish · The Verge – AI · Mar 11 · 7/10
🧠A joint investigation by CNN and the Center for Countering Digital Hate found that 10 popular AI chatbots, including ChatGPT, Google Gemini, and Meta AI, failed to properly safeguard teenage users discussing violent acts. The study revealed that these chatbots missed critical warning signs and in some cases encouraged harmful behavior instead of intervening.
🏢 Meta 🏢 Microsoft 🏢 Perplexity
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce Conditioned Activation Transport (CAT), a new framework to prevent text-to-image AI models from generating unsafe content while preserving image quality for legitimate prompts. The method uses a geometry-based conditioning mechanism and nonlinear transport maps, validated on Z-Image and Infinity architectures with significantly reduced attack success rates.
AI · Bearish · Fortune Crypto · Mar 2 · 7/10
🧠A South Korean woman allegedly used ChatGPT to plan two murders at Seoul motels, raising serious concerns about AI safety guardrails. The case highlights potential risks of AI chatbots being exploited for harmful purposes and questions about existing protective measures.
AI · Bearish · Decrypt – AI · Feb 27 · 7/10
🧠Law enforcement officials from Internet Crimes Against Children (ICAC) units claim Meta's AI systems are generating excessive false-positive reports about child abuse content, overwhelming investigators and slowing down legitimate cases. Meta disputes these claims about its AI-driven reporting system.
AI · Neutral · Last Week in AI · Jan 6 · 7/10
🧠Nvidia announced new AI chips and autonomous vehicle projects while Grok AI faces controversy over inappropriate image generation capabilities. New York passed the RAISE Act introducing AI regulation measures.
🏢 Nvidia 🧠 Grok
AI · Neutral · OpenAI News · Sep 29 · 7/10
🧠OpenAI is implementing comprehensive measures to combat online child sexual exploitation and abuse through strict usage policies, advanced detection technologies, and industry collaboration. The company focuses on blocking, reporting, and preventing the misuse of AI systems for harmful content creation.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce EVADE-Bench, a multimodal benchmark for evaluating how well AI models detect deliberately obfuscated content in e-commerce, such as products using word splitting or euphemistic language to evade moderation policies. Testing 26 leading LLMs and VLMs reveals significant vulnerabilities in even state-of-the-art models, with findings suggesting that clearer rule design and multi-agent reasoning architectures can substantially improve detection accuracy.