y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#open-source News & Analysis

329 articles tagged with #open-source. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

329 articles
AIBullisharXiv โ€“ CS AI ยท Mar 36/107
๐Ÿง 

Autorubric: A Unified Framework for Rubric-Based LLM Evaluation

Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of large language models (LLMs) for text generation assessment. The framework addresses scattered evaluation techniques by providing a unified solution with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics across three evaluation benchmarks.

AIBullisharXiv โ€“ CS AI ยท Mar 36/108
๐Ÿง 

DeepXiv-SDK: An Agentic Data Interface for Scientific Papers

DeepXiv-SDK introduces a new agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns, currently deployed at arXiv scale with free API access.

AIBullisharXiv โ€“ CS AI ยท Mar 36/107
๐Ÿง 

Zero-Shot and Supervised Bird Image Segmentation Using Foundation Models: A Dual-Pipeline Approach with Grounding DINO~1.5, YOLOv11, and SAM~2.1

Researchers developed a dual-pipeline framework for bird image segmentation using foundation models including Grounding DINO 1.5, YOLOv11, and SAM 2.1. The supervised pipeline achieved state-of-the-art results with 0.912 IoU on the CUB-200-2011 dataset, while the zero-shot pipeline achieved 0.831 IoU using only text prompts.

AI ร— CryptoBullisharXiv โ€“ CS AI ยท Mar 37/109
๐Ÿค–

AESP: A Human-Sovereign Economic Protocol for AI Agents with Privacy-Preserving Settlement

Researchers have developed the Agent Economic Sovereignty Protocol (AESP), a new framework that allows AI agents to conduct autonomous financial transactions at machine speed while maintaining human control and governance boundaries. The protocol uses five key mechanisms including policy engines, human oversight, dual-signed commitments, privacy preservation, and cryptographic substrates to ensure agents remain economically capable but never fully sovereign.

AIBearisharXiv โ€“ CS AI ยท Mar 36/109
๐Ÿง 

Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment

Research evaluated five small open-source language models on clinical question answering, finding that high consistency doesn't guarantee accuracy - models can be reliably wrong. Llama 3.2 showed the best balance of accuracy and reliability, while roleplay prompts consistently reduced performance across all models.

$NEAR
AIBullisharXiv โ€“ CS AI ยท Mar 37/108
๐Ÿง 

FastCode: Fast and Cost-Efficient Code Understanding and Reasoning

Researchers introduce FastCode, a new framework for AI-assisted software engineering that improves code understanding and reasoning efficiency. The system uses structural scouting to navigate codebases without full-text ingestion, significantly reducing computational costs while maintaining accuracy across multiple benchmarks.

AIBullisharXiv โ€“ CS AI ยท Mar 36/106
๐Ÿง 

TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization

Researchers introduce TripleSumm, a novel AI architecture that adaptively fuses visual, text, and audio modalities for improved video summarization. The team also releases MoSu, the first large-scale benchmark dataset providing all three modalities for multimodal video summarization research.

AIBullisharXiv โ€“ CS AI ยท Mar 37/106
๐Ÿง 

MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

MOSAIC is a new open-source platform that enables cross-paradigm comparison and evaluation of different AI agents including reinforcement learning, large language models, vision-language models, and human decision-makers within the same environment. The platform introduces three key technical contributions: an IPC-based worker protocol, operator abstraction for unified interfaces, and a deterministic evaluation framework for reproducible research.

AIBullisharXiv โ€“ CS AI ยท Mar 37/107
๐Ÿง 

Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain

Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating with a 9.5 billion token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, with models and datasets being open-sourced for broader use.

AIBullisharXiv โ€“ CS AI ยท Mar 36/103
๐Ÿง 

Detection-Gated Glottal Segmentation with Zero-Shot Cross-Dataset Transfer and Clinical Feature Extraction

Researchers developed a detection-gated AI pipeline combining YOLOv8 and U-Net for accurate glottal segmentation in medical videoendoscopy. The system achieved state-of-the-art performance with zero-shot transfer capabilities across different clinical datasets, enabling real-time extraction of vocal function biomarkers at 35 frames per second.

AIBullisharXiv โ€“ CS AI ยท Mar 36/105
๐Ÿง 

Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

Researchers have developed Re4, a multi-agent AI framework that uses three specialized LLMs (Consultant, Reviewer, and Programmer) working collaboratively to solve scientific computing problems. The system employs a rewriting-resolution-review-revision process that significantly improves bug-free code generation and reduces non-physical solutions in mathematical and scientific reasoning tasks.

$LINK
AIBullisharXiv โ€“ CS AI ยท Mar 36/104
๐Ÿง 

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

Researchers introduce AIssistant, an open-source framework that combines human expertise with AI agents to streamline scientific review and perspective paper creation in data science. The system uses 15 specialized LLM-driven agents across two workflows and demonstrates 65.7% time savings while maintaining research quality through strategic human oversight.

AIBullisharXiv โ€“ CS AI ยท Mar 36/103
๐Ÿง 

ScholarEval: Research Idea Evaluation Grounded in Literature

Researchers introduce ScholarEval, a retrieval-augmented framework for evaluating AI-generated research ideas based on soundness and contribution metrics. The system outperformed OpenAI's o1-mini-deep-research baseline across multiple evaluation criteria in testing with 117 expert-annotated research ideas across four scientific disciplines.

AIBullisharXiv โ€“ CS AI ยท Mar 36/103
๐Ÿง 

Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek

Researchers introduced Seek-CAD, a new system that uses the open-source DeepSeek-R1 language model to generate 3D CAD models locally without requiring expensive cloud-based AI services. The system incorporates visual feedback and self-refinement mechanisms to improve CAD model generation, potentially making AI-assisted design more accessible for industrial applications.

AINeutralarXiv โ€“ CS AI ยท Mar 35/103
๐Ÿง 

General Protein Pretraining or Domain-Specific Designs? Benchmarking Protein Modeling on Realistic Applications

Researchers introduce Protap, a comprehensive benchmark comparing protein modeling approaches across realistic applications. The study finds that large-scale pretrained models often underperform supervised encoders on small datasets, while structural information and domain-specific biological knowledge can enhance specialized protein tasks.

AIBullisharXiv โ€“ CS AI ยท Mar 36/103
๐Ÿง 

PiKV: KV Cache Management System for Mixture of Experts

Researchers have introduced PiKV, an open-source KV cache management framework designed to optimize memory and communication costs for Mixture of Experts (MoE) language models across multi-GPU and multi-node inference. The system uses expert-sharded storage, intelligent routing, adaptive scheduling, and compression to improve efficiency in large-scale AI model deployment.

AIBullisharXiv โ€“ CS AI ยท Mar 36/103
๐Ÿง 

Does FLUX Already Know How to Perform Physically Plausible Image Composition?

Researchers introduce SHINE, a training-free framework that enables FLUX and other diffusion models to perform high-quality image composition without retraining. The framework addresses complex lighting scenarios like shadows and reflections, achieving state-of-the-art performance on new benchmark ComplexCompo.

AIBullisharXiv โ€“ CS AI ยท Mar 36/104
๐Ÿง 

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Researchers developed EditReward, a human-aligned reward model for instruction-guided image editing trained on over 200K preference pairs. The model demonstrates superior performance on established benchmarks and can effectively filter high-quality training data, addressing a key bottleneck in open-source image editing models.

AIBullisharXiv โ€“ CS AI ยท Mar 36/104
๐Ÿง 

When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

Researchers conducted the first comprehensive analysis of open-source direct preference optimization (DPO) datasets used to align large language models, revealing significant quality variations. They created UltraMix, a curated dataset that's 30% smaller than existing options while delivering superior performance across benchmarks.

AINeutralarXiv โ€“ CS AI ยท Mar 27/1020
๐Ÿง 

HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance

Researchers have released HumanMCP, the first large-scale dataset designed to evaluate tool retrieval performance in Model Context Protocol (MCP) servers. The dataset addresses a critical gap by providing realistic, human-like queries paired with 2,800 tools across 308 MCP servers, improving upon existing benchmarks that lack authentic user interaction patterns.

AIBullisharXiv โ€“ CS AI ยท Mar 26/1014
๐Ÿง 

SleepLM: Natural-Language Intelligence for Human Sleep

Researchers have developed SleepLM, a family of AI foundation models that combine natural language processing with sleep analysis using polysomnography data. The system can interpret and describe sleep patterns in natural language, trained on over 100K hours of sleep data from 10,000+ individuals, enabling new capabilities like language-guided sleep event detection and zero-shot generalization to novel sleep analysis tasks.

AIBullisharXiv โ€“ CS AI ยท Mar 26/1016
๐Ÿง 

A Minimal Agent for Automated Theorem Proving

Researchers propose a minimal baseline architecture for AI-based theorem proving that achieves competitive performance with state-of-the-art systems while using significantly simpler design. The open-source implementation demonstrates that iterative proof refinement approaches are more sample-efficient and cost-effective than single-shot generation methods.

AIBullisharXiv โ€“ CS AI ยท Mar 27/1011
๐Ÿง 

KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning

Researchers from PKU-SEC-Lab have developed KEEP, a new memory management system that significantly improves the efficiency of AI-powered embodied planning by optimizing KV cache usage. The system achieves 2.68x speedup compared to text-based memory methods while maintaining accuracy, addressing a key bottleneck in memory-augmented Large Language Models for complex planning tasks.

AIBullisharXiv โ€“ CS AI ยท Mar 27/1016
๐Ÿง 

MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Researchers have developed MPU, a privacy-preserving framework that enables machine unlearning for large language models without requiring servers to share parameters or clients to share data. The framework uses perturbed model copies and harmonic denoising to achieve comparable performance to non-private methods, with most algorithms showing less than 1% performance degradation.