329 articles tagged with #open-source. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of large language models (LLMs) for text generation assessment. The framework addresses scattered evaluation techniques by providing a unified solution with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics across three evaluation benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠 DeepXiv-SDK introduces a new agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns, currently deployed at arXiv scale with free API access.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 Researchers developed a dual-pipeline framework for bird image segmentation using foundation models including Grounding DINO 1.5, YOLOv11, and SAM 2.1. The supervised pipeline achieved state-of-the-art results with 0.912 IoU on the CUB-200-2011 dataset, while the zero-shot pipeline achieved 0.831 IoU using only text prompts.
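For readers unfamiliar with the metric quoted above, intersection over union (IoU) is the number of pixels two masks share divided by the number of pixels either mask covers. The sketch below is a minimal illustration, not code from the paper: the function name `mask_iou`, the nested-list mask format, and the toy masks are all assumptions made for this example.

```python
def mask_iou(pred, target):
    """IoU of two equally sized binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is in both masks
            if p or t:
                union += 1  # pixel is in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 shared pixels, 4 pixels covered in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 0.5
```

On this scale, the reported 0.912 and 0.831 mean that roughly 91% and 83% of the combined predicted and ground-truth area overlaps.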
AI × Crypto · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9
🤖 Researchers have developed the Agent Economic Sovereignty Protocol (AESP), a new framework that allows AI agents to conduct autonomous financial transactions at machine speed while maintaining human control and governance boundaries. The protocol uses five key mechanisms including policy engines, human oversight, dual-signed commitments, privacy preservation, and cryptographic substrates to ensure agents remain economically capable but never fully sovereign.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 9
🧠 Research evaluated five small open-source language models on clinical question answering, finding that high consistency doesn't guarantee accuracy: models can be reliably wrong. Llama 3.2 showed the best balance of accuracy and reliability, while roleplay prompts consistently reduced performance across all models.
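The distinction between consistency and accuracy in the clinical question-answering item can be made concrete with a small sketch. The definitions here are assumptions chosen for illustration (consistency as agreement with the most common answer across repeated runs, accuracy as agreement with a gold label); the summary does not say which metrics the study actually used, and the function names and sample data are hypothetical.

```python
from collections import Counter


def consistency(answers):
    """Share of repeated runs that agree with the most common answer (assumed definition)."""
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def accuracy(answers, gold):
    """Share of repeated runs that match the gold answer (assumed definition)."""
    return sum(answer == gold for answer in answers) / len(answers)


# Hypothetical model that picks option "B" on every run while the correct option is "A".
runs = ["B", "B", "B", "B", "B"]
gold = "A"
print(consistency(runs))     # 1.0
print(accuracy(runs, gold))  # 0.0
```

A model like this one is perfectly consistent and never correct, which is the "reliably wrong" case the summary describes.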
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠 Researchers introduce FastCode, a new framework for AI-assisted software engineering that improves code understanding and reasoning efficiency. The system uses structural scouting to navigate codebases without full-text ingestion, significantly reducing computational costs while maintaining accuracy across multiple benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠 Researchers developed an open-source modular benchmark for evaluating diffusion-based motion planners in real-world autonomous driving systems. The system integrates with the Autoware ROS 2 stack and achieves a 3.2x latency reduction through encoder caching while improving accuracy by 41% with second-order solving.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠 Researchers introduce TripleSumm, a novel AI architecture that adaptively fuses visual, text, and audio modalities for improved video summarization. The team also releases MoSu, the first large-scale benchmark dataset providing all three modalities for multimodal video summarization research.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠 MOSAIC is a new open-source platform that enables cross-paradigm comparison and evaluation of different AI agents including reinforcement learning, large language models, vision-language models, and human decision-makers within the same environment. The platform introduces three key technical contributions: an IPC-based worker protocol, operator abstraction for unified interfaces, and a deterministic evaluation framework for reproducible research.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠 Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating it with a 9.5-billion-token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, and the models and datasets are being open-sourced for broader use.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers developed a detection-gated AI pipeline combining YOLOv8 and U-Net for accurate glottal segmentation in medical videoendoscopy. The system achieved state-of-the-art performance with zero-shot transfer capabilities across different clinical datasets, enabling real-time extraction of vocal function biomarkers at 35 frames per second.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5
🧠 Researchers have developed Re4, a multi-agent AI framework that uses three specialized LLMs (Consultant, Reviewer, and Programmer) working collaboratively to solve scientific computing problems. The system employs a rewriting-resolution-review-revision process that significantly improves bug-free code generation and reduces non-physical solutions in mathematical and scientific reasoning tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers introduce AIssistant, an open-source framework that combines human expertise with AI agents to streamline scientific review and perspective paper creation in data science. The system uses 15 specialized LLM-driven agents across two workflows and demonstrates 65.7% time savings while maintaining research quality through strategic human oversight.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduce ScholarEval, a retrieval-augmented framework for evaluating AI-generated research ideas based on soundness and contribution metrics. The system outperformed OpenAI's o4-mini-deep-research baseline across multiple evaluation criteria in testing with 117 expert-annotated research ideas across four scientific disciplines.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduced Seek-CAD, a new system that uses the open-source DeepSeek-R1 language model to generate 3D CAD models locally without requiring expensive cloud-based AI services. The system incorporates visual feedback and self-refinement mechanisms to improve CAD model generation, potentially making AI-assisted design more accessible for industrial applications.
AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3
🧠 Researchers introduce Protap, a comprehensive benchmark comparing protein modeling approaches across realistic applications. The study finds that large-scale pretrained models often underperform supervised encoders on small datasets, while structural information and domain-specific biological knowledge can enhance specialized protein tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers have introduced PiKV, an open-source KV cache management framework designed to optimize memory and communication costs for Mixture of Experts (MoE) language models across multi-GPU and multi-node inference. The system uses expert-sharded storage, intelligent routing, adaptive scheduling, and compression to improve efficiency in large-scale AI model deployment.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduce SHINE, a training-free framework that enables FLUX and other diffusion models to perform high-quality image composition without retraining. The framework addresses complex lighting scenarios like shadows and reflections, achieving state-of-the-art performance on the new ComplexCompo benchmark.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers developed EditReward, a human-aligned reward model for instruction-guided image editing trained on over 200K preference pairs. The model demonstrates superior performance on established benchmarks and can effectively filter high-quality training data, addressing a key bottleneck in open-source image editing models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers conducted the first comprehensive analysis of open-source direct preference optimization (DPO) datasets used to align large language models, revealing significant quality variations. They created UltraMix, a curated dataset that's 30% smaller than existing options while delivering superior performance across benchmarks.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 20
🧠 Researchers have released HumanMCP, the first large-scale dataset designed to evaluate tool retrieval performance in Model Context Protocol (MCP) servers. The dataset addresses a critical gap by providing realistic, human-like queries paired with 2,800 tools across 308 MCP servers, improving upon existing benchmarks that lack authentic user interaction patterns.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠 Researchers have developed SleepLM, a family of AI foundation models that combine natural language processing with sleep analysis using polysomnography data. Trained on over 100K hours of sleep data from 10,000+ individuals, the system can interpret and describe sleep patterns in natural language, enabling new capabilities like language-guided sleep event detection and zero-shot generalization to novel sleep analysis tasks.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 16
🧠 Researchers propose a minimal baseline architecture for AI-based theorem proving that achieves competitive performance with state-of-the-art systems while using a significantly simpler design. The open-source implementation demonstrates that iterative proof refinement approaches are more sample-efficient and cost-effective than single-shot generation methods.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 11
🧠 Researchers from PKU-SEC-Lab have developed KEEP, a new memory management system that significantly improves the efficiency of AI-powered embodied planning by optimizing KV cache usage. The system achieves 2.68x speedup compared to text-based memory methods while maintaining accuracy, addressing a key bottleneck in memory-augmented Large Language Models for complex planning tasks.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 16
🧠 Researchers have developed MPU, a privacy-preserving framework that enables machine unlearning for large language models without requiring servers to share parameters or clients to share data. The framework uses perturbed model copies and harmonic denoising to achieve comparable performance to non-private methods, with most algorithms showing less than 1% performance degradation.