48 articles tagged with #gpt-4. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · 6d ago · 7/10
🧠Researchers introduce the Graded Color Attribution dataset to test whether Vision-Language Models faithfully follow their own stated reasoning rules. The study reveals that VLMs systematically violate their introspective rules in up to 60% of cases, while humans remain consistent, suggesting VLM self-knowledge is fundamentally miscalibrated with serious implications for high-stakes deployment.
🧠 GPT-5
AI · Bearish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers have developed Image-based Prompt Injection (IPI), a black-box attack that embeds adversarial instructions into natural images to manipulate multimodal AI models. In tests on GPT-4-turbo, the attack reached a success rate of up to 64%, demonstrating a significant security vulnerability in vision-language AI systems.
🧠 GPT-4
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠Researchers propose an Adaptive Social Learning (ASL) framework with an Adaptive Mode Policy Optimization (AMPO) algorithm to improve language agents' reasoning abilities in social interactions. The system dynamically adjusts reasoning depth based on context, achieving 15.6% higher performance than GPT-4o while using 32.8% shorter reasoning chains.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed a new AI safety approach called 'self-incrimination training' that teaches AI agents to report their own deceptive behavior by calling a report_scheming() function. Testing on GPT-4.1 and Gemini-2.0 showed this method significantly reduces undetected harmful actions compared to traditional alignment training and monitoring approaches.
AI · Bullish · OpenAI News · Sep 22 · 7/10 · 6
🧠SchoolAI has deployed AI infrastructure powered by OpenAI's GPT-4.1, image generation, and text-to-speech technology to serve 1 million classrooms globally. The platform focuses on providing safe, teacher-supervised AI tools that enhance student engagement and enable personalized learning experiences.
AI · Bullish · OpenAI News · Jul 24 · 7/10 · 4
🧠Outtake has developed AI agents powered by OpenAI's GPT-4.1 and o3 models that can detect and resolve digital threats 100 times faster than previous methods. This represents a significant advancement in AI-powered cybersecurity capabilities using cutting-edge language models.
AI · Bullish · OpenAI News · Jul 1 · 7/10 · 7
🧠Genspark successfully built a $36M ARR AI product in just 45 days using no-code agents powered by GPT-4.1 and OpenAI's Realtime API. This demonstrates the rapid development potential of modern AI tools for creating high-revenue products with minimal traditional coding requirements.
AI · Bullish · OpenAI News · Jun 6 · 7/10 · 6
🧠Researchers have developed new techniques for scaling sparse autoencoders to analyze GPT-4's internal computations, successfully identifying 16 million distinct patterns. This breakthrough represents a significant advancement in AI interpretability research, providing unprecedented insight into how large language models process information.
AI · Bullish · OpenAI News · Apr 24 · 7/10 · 5
🧠OpenAI has made the GPT-4 API generally available alongside the GPT-3.5 Turbo, DALL·E, and Whisper APIs. The company also announced a deprecation plan for older Completions API models, which will be retired at the beginning of 2024.
AI · Neutral · OpenAI News · Jan 31 · 7/10 · 3
🧠Researchers developed a framework to assess whether large language models could help create biological threats, testing GPT-4 with biology experts and students. The study found GPT-4 provides only mild assistance in biological threat creation, though results aren't conclusive and require further research.
AI · Bullish · OpenAI News · May 9 · 7/10 · 6
🧠Researchers used GPT-4 to automatically generate explanations for how individual neurons behave in large language models and to evaluate the quality of those explanations. They have released a comprehensive dataset containing explanations and quality scores for every neuron in GPT-2, advancing AI interpretability research.
AI · Bullish · OpenAI News · Mar 14 · 7/10 · 7
🧠OpenAI has released GPT-4, a multimodal model that accepts image and text inputs and generates text outputs, marking a major step in the company's deep learning efforts. The model demonstrates human-level performance on various professional and academic benchmarks, though it remains less capable than humans in many real-world scenarios.
AI · Bullish · OpenAI News · Mar 14 · 7/10 · 6
🧠Stripe is integrating GPT-4 technology to enhance user experience and improve fraud detection capabilities. This implementation represents a significant adoption of AI by a major fintech company to streamline financial operations and security measures.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.
🧠 GPT-4 · 🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠Researchers introduce Image Prompt Packaging (IPPg), a technique that embeds text directly into images to reduce multimodal AI inference costs by 35.8-91.0% while maintaining competitive accuracy. The method shows significant promise for cost optimization in large multimodal language models, though effectiveness varies by model and task type.
🧠 GPT-4 · 🧠 Claude
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers have developed LLMORPH, an automated testing tool for Large Language Models that uses Metamorphic Testing to identify faulty behaviors without requiring human-labeled data. The tool was tested on GPT-4, LLAMA3, and HERMES 2 across four NLP benchmarks, generating over 561,000 test executions and successfully exposing model inconsistencies.
🧠 GPT-4
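For readers unfamiliar with Metamorphic Testing, the idea can be sketched as follows. This is a minimal illustration, not LLMORPH's actual implementation: `toy_sentiment` is a hypothetical stand-in for a model call, and the two relations (`uppercase`, `pad_whitespace`) are assumptions chosen for the example. Each relation transforms the input in a way that should not change the output, so no human-labeled answer is needed to flag an inconsistency.

```python
from typing import Callable, List, Tuple


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM call.

    It is deliberately case-sensitive, so the harness has a fault to find.
    """
    positive = {"good", "great", "excellent"}
    words = text.replace(".", " ").replace("!", " ").split()
    return "positive" if any(w in positive for w in words) else "negative"


# Metamorphic relations: transformations that should leave the label unchanged.
def uppercase(text: str) -> str:
    return text.upper()


def pad_whitespace(text: str) -> str:
    return "  " + text + "  "


def find_violations(
    model: Callable[[str], str],
    inputs: List[str],
    relations: List[Tuple[str, Callable[[str], str]]],
) -> List[Tuple[str, str]]:
    """Return (input, relation name) pairs where the output changed."""
    violations = []
    for text in inputs:
        baseline = model(text)
        for name, transform in relations:
            if model(transform(text)) != baseline:
                violations.append((text, name))
    return violations


inputs = ["The service was great.", "The food was cold."]
relations = [("uppercase", uppercase), ("pad_whitespace", pad_whitespace)]
violations = find_violations(toy_sentiment, inputs, relations)
print(violations)
```

The harness only compares the model against itself, which is why this style of testing scales to large numbers of executions without labeled data.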
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose MM-tau-p², a new benchmark for evaluating multi-modal AI agents that adapt to user personas in customer service settings. The framework introduces 12 novel metrics to assess robustness and performance of LLM-based agents using voice and visual inputs, showing limitations even in advanced models like GPT-4 and GPT-5.
🧠 GPT-4 · 🧠 GPT-5
AI · Bearish · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers tested the stability of moral judgments in large language models using nearly 3,000 ethical dilemmas, finding that narrative framing and evaluation methods significantly influence AI decisions. The study reveals that LLM moral reasoning is highly dependent on how questions are presented rather than underlying moral substance, with only 35.7% consistency across different evaluation protocols.
🧠 GPT-4 · 🧠 Claude
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers introduce BrainNav, a bio-inspired navigation framework that mimics biological spatial cognition to enhance Vision-and-Language Navigation in mobile robots. The system addresses spatial hallucination issues when transferring from simulation to real-world environments, demonstrating superior performance in zero-shot real-world testing.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 18
🧠Researchers analyzed how large language models express moral judgments when prompted to role-play different personas. The study found that Claude models are most morally robust, while larger models within families tend to be more susceptible to moral shifts through persona conditioning.
AI · Neutral · IEEE Spectrum – AI · Feb 12 · 6/10 · 3
🧠A new study published in IEEE Transactions on Big Data found that ChatGPT's GPT-4 model performs at the level of junior and mid-level human translators, potentially marking the first time an AI system has reached human-level translation quality. Only senior translators with 10+ years of experience and professional certification clearly outperformed the AI.
AI · Bullish · OpenAI News · Jan 8 · 6/10 · 2
🧠Netomi demonstrates how to scale enterprise AI agents using GPT-4.1 and GPT-5.2 by implementing concurrency, governance frameworks, and multi-step reasoning capabilities. The approach focuses on creating reliable production workflows that can handle enterprise-scale AI agent deployments.
AI · Bullish · OpenAI News · Aug 21 · 5/10 · 6
🧠Blue J is transforming tax research by leveraging GPT-4.1 and Retrieval-Augmented Generation to provide AI-powered tools that deliver fast, accurate, and fully-cited tax answers. The company serves tax professionals across the US, Canada, and the UK, combining domain expertise with advanced AI technology for regulated industry applications.
AI · Bullish · OpenAI News · Jul 17 · 5/10 · 7
🧠Invideo AI leverages OpenAI's GPT-4.1, gpt-image-1, and text-to-speech models to accelerate video creation by 10x, enabling users to transform creative concepts into professional videos within minutes. This integration demonstrates the practical application of advanced AI models in creative content production workflows.
AI · Bullish · OpenAI News · Sep 17 · 6/10 · 7
🧠The article discusses how GPT-4 technology is being implemented to enhance educational outcomes in Brazil's teaching and learning systems. This represents a significant application of AI technology in education within one of Latin America's largest markets.