AI · Bullish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers propose a statistical framework for integrating data generated by large language models with real human data in conjoint analysis, addressing the bias gap between synthetic and authentic consumer responses. The approach delivers cost and data savings of 24.9% to 79.8% while maintaining statistical robustness, indicating that LLM data serves as a complement to, rather than a substitute for, human market research.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce MAC, a multi-agent framework that combines statistical causal discovery with large language models to identify relationships between variables more accurately than existing methods. By using autonomous agent debate and adversarial reasoning, MAC outperforms both traditional statistical and single-agent LLM approaches across multiple benchmark datasets.
🧠 Gemini
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce a novel active testing algorithm that reduces evaluation costs for large language models by intelligently sampling from evaluation pools using semantic entropy and approximate Neyman allocation. The method achieves up to 28% MSE reduction over uniform sampling while saving an average of 22.9% of evaluation budget across multiple benchmarks.
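For readers unfamiliar with Neyman allocation, the following is a minimal sketch of the textbook version, not the paper's approximate, semantic-entropy-driven variant. The function name `neyman_allocation`, the stratum sizes, and the per-stratum standard deviations are all illustrative assumptions. The idea is that a stratum receives evaluation budget in proportion to its size times its estimated variability, so noisier strata get sampled more heavily.

```python
from math import floor


def neyman_allocation(stratum_sizes, stratum_stds, budget):
    """Split `budget` samples across strata in proportion to size * std.

    Hypothetical helper for illustration only. It does not cap an
    allocation at the stratum size, which a real sampler would need to do.
    """
    weights = [n * s for n, s in zip(stratum_sizes, stratum_stds)]
    total = sum(weights)
    if total == 0:
        # No variability estimate available: fall back to proportional allocation.
        weights = list(stratum_sizes)
        total = sum(weights)

    exact = [budget * w / total for w in weights]
    alloc = [floor(x) for x in exact]

    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = budget - sum(alloc)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Two equally sized strata; the second is assumed to be three times as variable.
allocation = neyman_allocation([100, 100], [1.0, 3.0], budget=40)
print(allocation)
```

Under uniform sampling each stratum in this toy setup would receive 20 samples; weighting by variability shifts most of the budget to the noisier stratum, which is the mechanism behind the variance reduction the summary describes.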
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers present a rigorous statistical framework for measuring AI agent reliability through U-statistics and kernel-based metrics, moving beyond traditional pass@1 evaluation. The study finds that agents can possess the requisite knowledge yet fail catastrophically under minor task variations, and that trajectory-level consistency metrics are significantly more sensitive diagnostics for identifying failure modes in high-stakes deployments.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers demonstrate that K-means clustering, a statistical method widely used in psychological research, can produce apparently meaningful subgroups even when the data contain no genuine underlying categories. Tests on simulated data and the SMARVUS international psychometric dataset show that geometric partitioning around centroids may create the illusion of real psychological typologies rather than identify them.
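The effect is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not the study's simulation design: it draws 300 points from a single two-dimensional Gaussian, so there are no true subgroups, and runs a plain Lloyd's-algorithm K-means written with the standard library only. The helper names, seeds, and sample size are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise at k distinct data points
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(m[0] for m in members) / len(members)
                mean_y = sum(m[1] for m in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# One homogeneous population: a single standard Gaussian, no real categories.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]

_, _, inertia_1 = kmeans(points, k=1)
labels_3, centroids_3, inertia_3 = kmeans(points, k=3)
print("within-cluster sum of squares, k=1:", round(inertia_1, 1))
print("within-cluster sum of squares, k=3:", round(inertia_3, 1))
```

The within-cluster sum of squares falls when moving from one cluster to three even though the data were generated without any grouping, so a lower value alone is not evidence that the resulting "types" exist.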
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose Statistical Membership Inference (SMI), a new training-free auditing method that challenges the reliability of existing Membership Inference Attacks (MIAs) for verifying machine unlearning. The framework addresses a fundamental flaw in current auditing approaches by reformulating the problem as estimating non-member proportions in feature distributions, eliminating the need for computationally expensive shadow model training.
AI · Neutral · arXiv – CS AI · Mar 3 · 3/10 · 5
🧠Researchers developed a new framework for causal effect triangulation that combines multiple statistical models to improve causal inference from observational data. The method addresses model uncertainty by using data-driven measures of model validity without requiring commitment to a single specification.
AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6
🧠Researchers developed a framework for causal discovery in longitudinal data systems that addresses real-world workflow constraints by incorporating institutional protocols and timeline structures. The method was tested on a large Japanese health screening dataset with over 100,000 individuals, showing improved structural interpretability without requiring domain-specific specifications.