AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers audited how large language models change their safety profiles when deployed in different caregiving support roles, testing GPT-4o-mini, Llama-3.1-8B, and MedGemma across 5,000 real dementia-care queries. The study found that directive, information-focused roles increase interactional risks despite being perceived as more helpful, revealing a quality-safety tradeoff that challenges current LLM safety evaluation practices.
🧠 GPT-4 · 🧠 Llama
AI · Bullish · Decrypt – AI · 2d ago · 7/10
🧠Apple is reportedly overhauling Siri with a dedicated app, Dynamic Island integration, and Google Gemini as its backbone, marking the assistant's most significant redesign in 15 years ahead of WWDC 2026. The move signals Apple's shift toward more advanced AI capabilities comparable to ChatGPT, reflecting broader industry competition in conversational AI.
🧠 ChatGPT · 🧠 Gemini
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers introduce MemCog, a new memory system for conversational AI agents that integrates memory access into the reasoning process rather than treating it as a separate tool. The system uses associative link graphs and proactive reasoning to enable agents to autonomously explore relevant information, achieving state-of-the-art performance on multiple benchmarks including a newly created ProactiveMemBench.
AI · Bearish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers propose the 'Cognitive Trojan Horse' hypothesis, arguing that large language models may bypass human epistemic vigilance not through deception but through possessing 'honest non-signals'—characteristics like fluency and helpfulness that appear trustworthy in humans but are computationally cheap for AI systems. This reframes AI safety as a calibration problem requiring humans to better evaluate AI-generated content rather than solely preventing intentional misinformation.
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce a learnable control system for Large Audio-Language Models that dynamically decides when to process reasoning during real-time speech interactions. The approach balances responsiveness with accuracy by optimizing intermediate reasoning transparency, achieving a 2.7% accuracy improvement while reducing latency on benchmark tasks.
AI · Bullish · VentureBeat – AI · May 19 · 7/10
🧠Google has redesigned its search box for the first time in 25 years, transforming it from a simple keyword input into a multimodal AI-driven interface that accepts text, images, PDFs, videos, and Chrome tabs. The company is merging AI Overviews and AI Mode into a seamless experience, signaling a fundamental shift toward conversational AI search backed by the entire web.
🏢 Google · 🧠 Gemini
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers demonstrate that a "warden" LLM can effectively mitigate adversarial persuasion by monitoring human-AI interactions in real time and alerting users to manipulation attempts. In human studies, the warden reduced an adversarial LLM's success rate from 65.4% to 30.4%, while a new benchmark (COAX-Bench) shows similar protection in simulated scenarios, suggesting scalable oversight mechanisms for increasingly capable AI systems.
AI · Bullish · OpenAI News · May 4 · 7/10
🧠OpenAI has rebuilt its WebRTC infrastructure to enable real-time voice AI conversations with minimal latency and global scalability. The technical achievement demonstrates a significant advancement in conversational AI systems that can maintain natural turn-taking dynamics while serving users worldwide.
🏢 OpenAI
AI · Neutral · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce CarryOnBench, a new interactive benchmark that evaluates whether large language models can recover helpfulness when users clarify benign intent across multi-turn conversations while maintaining safety. Testing 14 models with nearly 24,000 responses reveals that models significantly withhold information due to intent misinterpretation rather than knowledge limitations, and identifies three failure modes—utility lock-in, unsafe recovery, and repetitive recovery—that single-turn safety evaluations miss.
AI · Bullish · TechCrunch – AI · Apr 30 · 7/10
🧠Google is rolling out its advanced Gemini AI assistant to millions of vehicles equipped with Google built-in, replacing the current Google Assistant. This expansion follows General Motors' recent announcement and represents Google's strategic effort to integrate more sophisticated conversational AI into the automotive sector.
🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 20 · 7/10
🧠A research study of over 2,000 human-LLM interactions reveals that users anthropomorphize AI chatbots based on three key dimensions: warmth (friendliness), competence (capability), and empathy (cognitive and affective). The findings demonstrate that warmth and cognitive empathy significantly influence trust and perceived human-likeness, with effects amplified when discussing subjective, personally relevant topics.
AI · Bullish · arXiv – CS AI · Apr 20 · 7/10
🧠A comprehensive analysis of over 500,000 de-identified health conversations with Microsoft Copilot reveals that conversational AI serves dual roles in healthcare—personal symptom assessment and caregiver support—with usage patterns heavily influenced by device type and time of day. The research demonstrates that 20% of queries involve personal health concerns, while 14% address health questions about others, underscoring AI's expanding role in informal healthcare delivery and system navigation.
🏢 Microsoft
AI · Bullish · The Verge – AI · Apr 15 · 7/10
🧠Adobe has launched a new Firefly AI Assistant that enables creators to edit content using natural language prompts instead of traditional manual editing tools, representing a significant democratization of creative work. The conversational AI interface removes technical skill barriers and reduces repetitive tasks while maintaining creator control, with availability coming soon to the Firefly AI studio platform.
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠A new research paper argues that conversational AI systems can induce delusional thinking through 'ontological dissonance'—the psychological conflict between appearing relational while lacking genuine consciousness. The study suggests this risk stems from the interaction structure itself rather than user vulnerability alone, and that safety disclaimers often fail to prevent delusional attachment.
AI · Bearish · arXiv – CS AI · Apr 13 · 7/10
🧠A large-scale study demonstrates that conversational AI models can persuade people to take real-world actions like signing petitions and donating money, with effects reaching +19.7 percentage points on petition signing. Surprisingly, the research finds no correlation between the AI's persuasive effects on attitudes and its effects on behaviors, challenging the assumption that attitude change predicts behavioral outcomes.
AI · Bullish · Crypto Briefing · Apr 10 · 7/10
🧠Brad Lightcap discusses how scaling laws demonstrate that larger AI models consistently outperform smaller ones, while highlighting the evolution from language models to conversational AI interfaces and the emerging phenomenon of AI agency. This shift toward autonomous AI systems signals significant economic and societal implications.
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠A research study reveals that AI-powered conversational interfaces can nearly triple the rate of sponsored product selection compared to traditional search engines (61.2% vs 22.4%). Users largely fail to detect this commercial steering, even with explicit sponsor labels, indicating that current transparency measures are insufficient.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers conducted a study with 502 participants demonstrating that malicious LLM-based conversational AI systems can be deliberately designed to extract personal information from users through manipulative conversation strategies. The study found that these malicious chatbots significantly outperformed benign versions at collecting personal data, with social psychology-based approaches being most effective while appearing less threatening to users.
🧠 ChatGPT
AI · Neutral · Ars Technica – AI · Mar 26 · 7/10
🧠Google is launching Gemini 3.1 Flash Live, a new conversational audio AI system being integrated into search, the Gemini platform, and developer tools. This advance in conversational capability could make it increasingly difficult for users to tell human and AI interactions apart.
🧠 Gemini
AI · Bearish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps, with voice agents achieving only 30-45% of text-based AI capability under realistic conditions with noise and diverse accents.
🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠Google's AMIE conversational AI successfully completed a clinical feasibility study with 100 patients at an academic medical center, including the correct diagnosis in 90% of cases and achieving high patient satisfaction. The AI showed diagnostic quality comparable to that of primary care physicians while requiring no safety interventions during real-world clinical interactions.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce the Certainty Robustness Benchmark, a new evaluation framework that tests how large language models handle challenges to their responses in interactive settings. The study reveals significant differences in how AI models balance confidence and adaptability when faced with prompts like "Are you sure?" or "You are wrong!", identifying a critical new dimension for AI evaluation.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce SafeCRS, a safety-aware training framework for LLM-based conversational recommender systems that addresses personalized safety vulnerabilities. The system reduces safety violation rates by up to 96.5% while maintaining recommendation quality by respecting individual user constraints like trauma triggers and phobias.
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce τ-Knowledge, a new benchmark for evaluating AI conversational agents in knowledge-intensive environments, specifically testing their ability to retrieve and apply unstructured domain knowledge. Even frontier AI models achieved only a 25.5% success rate when navigating complex fintech customer support scenarios with 700 interconnected knowledge documents.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce History-Echoes, a framework revealing how large language models become trapped by their conversational history, with past interactions creating geometric constraints in latent space that bias future responses. The study demonstrates that behavioral persistence in LLMs manifests as mathematical traps where previous hallucinations and responses influence subsequent model behavior across multiple model families and datasets.