AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers have introduced the AI co-mathematician, an interactive workbench that leverages agentic AI to assist mathematicians in solving open-ended research problems. The system achieves state-of-the-art results on hard benchmarks, scoring 48% on FrontierMath Tier 4, and demonstrates practical value by helping researchers solve open problems and identify new research directions.
AI · Bullish · arXiv – CS AI · Apr 13 · 7/10
🧠Researchers introduce Q+, a structured reasoning toolkit that enhances AI research agents by making web search more deliberate and organized. Integrated into Eigent's browser agent, Q+ demonstrates consistent benchmark improvements of 0.6 to 3.8 percentage points across multiple deep-research tasks, suggesting meaningful progress in autonomous AI agent reliability.
🏢 Anthropic · 🧠 GPT-4 · 🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠Researchers have developed APRES, an AI-powered system that uses Large Language Models to automatically revise scientific papers based on evaluation rubrics that predict citation counts. The system improves citation prediction accuracy by 19.6% and produces paper revisions that human experts prefer 79% of the time over original versions.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers have released LLMServingSim 2.0, a unified simulator that models the complex interactions between heterogeneous hardware and disaggregated software in large language model serving infrastructures. The simulator achieves 0.97% average error compared to real deployments while maintaining 10-minute simulation times for complex configurations.
AI · Bullish · Google Research Blog · Sep 11 · 7/10 · 7
🧠Google Research introduces NucleoBench and AdaBeam, new tools for advancing nucleic acid design in biotechnology. These AI-powered tools aim to improve the precision and efficiency of genetic engineering and therapeutic applications.
AI · Bullish · OpenAI News · Jul 17 · 7/10 · 5
🧠OpenAI introduces a new ChatGPT agent that can think and act autonomously using various tools to complete complex tasks such as research, booking services, and creating presentations. This advancement represents a significant step toward more capable AI agents that can handle multi-step workflows with user guidance.
AI · Bullish · OpenAI News · Jul 28 · 7/10 · 6
🧠OpenAI has released Triton 1.0, an open-source Python-like programming language that allows researchers without CUDA expertise to write highly efficient GPU code for neural networks. The tool aims to democratize GPU programming by making it accessible to those without specialized hardware programming knowledge while maintaining performance comparable to expert-level code.
AI · Neutral · arXiv – CS AI · 3d ago · 5/10
🧠Researchers present Eliot, an interactive system for exploring evolving scientific literature trends across rapidly changing fields like Large Language Models and Automated Planning. The tool retrieves arXiv papers at query time, clusters them into thematic groups, and visualizes publication patterns over time, with evaluations showing 85% accuracy in meaningful cluster labeling across eight research domains.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠MetaboT is an open-source LLM-based framework that translates natural-language questions into SPARQL queries for metabolomics knowledge graphs, significantly lowering technical barriers for researchers without programming expertise. The multi-agent architecture addresses hallucination and schema-compliance issues through specialized agents for validation, entity resolution, and query refinement, validated on the Experimental Natural Products Knowledge Graph.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce TABX, a high-throughput multi-agent reinforcement learning simulator built on JAX that enables GPU-accelerated testing of cooperative AI algorithms. The framework prioritizes modularity and customization, allowing systematic investigation of emergent agent behaviors across varying task complexities with significantly reduced computational overhead.
AI · Bullish · Google Research Blog · May 19 · 6/10
🧠Empirical Research Assistance (ERA) represents a significant advancement in AI-assisted scientific research, transitioning from academic publication to practical computational discovery tools. The development demonstrates how machine learning can accelerate the research process across scientific disciplines, with implications for both the academic and technology sectors.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce Re²Math, a new benchmark for evaluating large language models' ability to retrieve relevant mathematical theorems and lemmas from academic literature during proof construction. The benchmark reveals significant gaps in current AI systems, with the best model achieving only 7.0% accuracy despite retrieving valid statements, indicating AI struggles to verify applicability to specific proof contexts.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers demonstrate how large language models like ChatGPT can automate laboratory instrument control, reducing programming barriers for scientists. The study shows LLMs can create custom scripts and operate as autonomous AI agents for lab equipment management.
🧠 ChatGPT
AI · Bullish · The Verge – AI · Mar 4 · 6/10 · 1
🧠Google's NotebookLM now generates fully animated 'cinematic' video overviews from user research and notes, upgrading from basic narrated slideshows. The feature uses multiple AI models including Gemini 3, Nano Banana Pro, and Veo 3 to create animated visuals and determine narrative style automatically.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers have introduced SciDER, an AI-powered system that automates the entire scientific research process from data analysis to hypothesis generation and code execution. The system uses specialized AI agents that can collaboratively process raw experimental data and outperforms existing general-purpose AI models in scientific discovery tasks.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers introduce MOSAIC, the first comprehensive benchmark to evaluate moral, social, and individual characteristics of Large Language Models beyond traditional Moral Foundation Theory. The benchmark includes over 600 curated questions and scenarios from nine validated questionnaires and four platform-based games, providing empirical evidence that current evaluation methods are insufficient for assessing AI ethics comprehensively.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of large language models (LLMs) for text generation assessment. The framework addresses scattered evaluation techniques by providing a unified solution with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics across three evaluation benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠WisPaper is a new AI-powered academic search system that combines semantic search capabilities with automated paper validation and organization tools. The system achieved 22.26% recall on TaxoBench and 93.70% validation accuracy, addressing key limitations in current academic search engines by integrating discovery, organization, and monitoring workflows.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠CryoNet.Refine introduces a deep learning framework that uses one-step diffusion models to rapidly refine molecular structures in cryo-electron microscopy. The AI system automates and accelerates the traditionally manual and computationally expensive process of fitting atomic models into experimental density maps.
AI · Bullish · OpenAI News · Jan 27 · 6/10 · 7
🧠Prism is a new free LaTeX-native workspace that integrates GPT-5.2 to help researchers write, collaborate, and conduct research in a unified platform. The tool aims to streamline academic and research workflows by combining document preparation with AI-powered reasoning capabilities.
AI · Bullish · OpenAI News · Dec 3 · 6/10 · 7
🧠OpenAI is acquiring Neptune to enhance its ability to monitor and understand AI model behavior. The acquisition aims to strengthen research tools for tracking experiments and monitoring training processes.
AI · Bullish · OpenAI News · Feb 2 · 6/10 · 5
🧠OpenAI has launched an AI research agent that can synthesize large amounts of online information and complete complex multi-step research tasks through advanced reasoning capabilities. The tool is currently available to Pro users, with rollout planned for Plus and Team subscribers.
AI · Neutral · OpenAI News · May 7 · 5/10 · 5
🧠OpenAI is introducing new technology to help researchers identify AI-generated content and is joining the Steering Committee of the Coalition for Content Provenance and Authenticity. This initiative aims to promote industry standards for content attribution and authenticity verification.
AI · Neutral · arXiv – CS AI · Apr 7 · 4/10
🧠Researchers have developed QualAnalyzer, an open-source Chrome extension that makes AI-assisted qualitative research more transparent by preserving detailed audit trails of LLM analysis processes. The tool processes data segments independently and maintains records of prompts, inputs, and outputs to enable systematic comparison between AI and human judgments.