AI · Bearish · arXiv – CS AI · Jun 4 · 7/10
🧠Researchers demonstrate the first practical quantization-conditioned attack that reliably compromises large language models across advanced quantization methods including AWQ, GPTQ, and GGUF. The attack exploits how outlier weights cause rounding errors in modern quantization schemes, allowing adversaries to inject hidden malicious behaviors that activate only after quantization, posing significant security risks to the deployment pipeline.
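The rounding effect this attack exploits shows up even with plain round-to-nearest (RTN) quantization. Below is a minimal sketch. It assumes one absmax scale per group of weights, and the function names and weight values are hypothetical. This is not how AWQ, GPTQ, or GGUF actually quantize. A single outlier sets the scale, so every small weight in the group rounds to zero:

```python
# Minimal sketch: symmetric round-to-nearest (RTN) quantization with a single
# absmax scale per weight group. Hypothetical helper names; NOT AWQ/GPTQ/GGUF.

def quantize_dequantize(weights, bits=4):
    """Round each weight to a signed `bits`-bit integer grid, then map back."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return [0.0 for _ in weights]
    # Clamp to the representable integer range [-qmax - 1, qmax].
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def max_abs_error(weights, bits=4):
    """Largest per-weight rounding error introduced by quantization."""
    deq = quantize_dequantize(weights, bits)
    return max(abs(w - d) for w, d in zip(weights, deq))


normal = [0.12, -0.05, 0.08, -0.11, 0.03, 0.07, -0.09, 0.10]
with_outlier = normal[:-1] + [2.0]             # one outlier replaces the last weight

print(max_abs_error(normal))        # bounded by half a step of 0.12 / 7
print(max_abs_error(with_outlier))  # each small weight rounds to 0 at step 2.0 / 7
print(quantize_dequantize(with_outlier))
```

Production schemes reduce this effect. AWQ rescales salient channels, and GPTQ compensates rounding error in the remaining weights. Even so, the summary indicates that outlier-driven rounding remains exploitable.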
AI · Bearish · arXiv – CS AI · Jun 4 · 7/10
🧠Researchers traced how ESM2-8M, a protein language model, predicts that proteins begin with methionine—a near-universal biological rule. The analysis reveals the model doesn't recognize methionine through direct evidence detection, but rather retrieves it via a distributed computational circuit anchored at the sequence start token. Critically, the model fails on sequences where biology diverges from the statistical default, suggesting that model confidence may not reflect genuine biological understanding.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers introduce LUNA, a linguistically-aware watermarking technique for large language models that maintains output quality across multiple languages while enabling reliable detection without model provider access. The method achieves 99.59% detection accuracy with minimal perplexity degradation (0.045 mean shift), outperforming eight baseline approaches across six typologically diverse languages.
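This summary does not describe LUNA's linguistically aware scheme. For orientation, here is a minimal sketch of the generic green-list detection test that many LLM watermarks build on. A secret key pseudo-randomly marks a fraction γ of token pairs as "green", and watermarked generation favors green tokens. Detection then needs only the key and the text, not the model. `is_green`, `GAMMA = 0.25`, and the toy generator are assumptions and are not LUNA's method:

```python
import hashlib
import math

GAMMA = 0.25  # assumed fraction of (prev_token, token) pairs that count as "green"


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Keyed pseudo-random partition: hash (key, previous token, token) into [0, 1)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_z_score(tokens: list[str], key: str) -> float:
    """One-proportion z-test: do green tokens exceed the GAMMA rate expected by chance?"""
    scored = len(tokens) - 1                       # first token has no predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def toy_watermarked_text(length: int, key: str) -> list[str]:
    """Hypothetical 'generator' that always picks the first green candidate."""
    tokens = ["<s>"]
    candidates = [f"w{i}" for i in range(100)]
    for _ in range(length):
        tokens.append(next(c for c in candidates if is_green(tokens[-1], c, key)))
    return tokens


text = toy_watermarked_text(50, key="demo-key")
print(watermark_z_score(text, key="demo-key"))    # large: every scored pair is green
```

Unwatermarked text produces a z-score near 0, so a fixed threshold (for example, z > 4) separates the two cases without querying the model.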
AI · Bullish · arXiv – CS AI · May 29 · 7/10
🧠Researchers propose Cross-Model Entropy (CME), a label-free reward signal for reinforcement learning that uses a separate verifier model's likelihood assessment instead of human labels or self-referential signals. The method successfully extends RL post-training to open-ended instruction following across multiple model families, achieving win rates of 52.5-71.4% in head-to-head comparisons.
🧠 Llama
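The summary does not give the CME formula. Below is a minimal sketch that assumes the reward is the negative mean per-token cross-entropy of a policy's response under a separate verifier model, so responses the verifier finds likely score higher. `verifier_logprob` and the unigram table are hypothetical stand-ins for a real verifier LM:

```python
import math
from typing import Callable, Sequence

# Hypothetical verifier interface: log p_verifier(token | prompt, response prefix).
VerifierLogProb = Callable[[Sequence[str], Sequence[str], str], float]


def cross_model_entropy(prompt: Sequence[str], response: Sequence[str],
                        verifier_logprob: VerifierLogProb) -> float:
    """Mean per-token negative log-likelihood (nats) of `response` under the verifier."""
    if not response:
        raise ValueError("empty response")
    nll = [-verifier_logprob(prompt, response[:i], tok) for i, tok in enumerate(response)]
    return sum(nll) / len(nll)


def cme_reward(prompt, response, verifier_logprob) -> float:
    """Assumed reward shape: lower cross-model entropy gives a higher reward."""
    return -cross_model_entropy(prompt, response, verifier_logprob)


# Toy stand-in for a verifier LM: a context-free unigram table.
UNIGRAM = {"the": 0.5, "cat": 0.3, "sat": 0.2}


def toy_verifier(prompt, prefix, token):
    return math.log(UNIGRAM.get(token, 1e-6))


prompt = ["describe", "the", "cat"]
print(cme_reward(prompt, ["the", "cat", "sat"], toy_verifier))  # closer to 0
print(cme_reward(prompt, ["zzz", "qqq"], toy_verifier))         # strongly negative
```

Because the score comes from a different model than the policy, the policy cannot raise its reward simply by becoming more confident in its own outputs. The summary contrasts CME with self-referential signals for this reason.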
AI · Bearish · arXiv – CS AI · May 29 · 7/10
🧠Researchers introduce KBF, a black-box auditing protocol that detects fraudulent LLM API substitutions by analyzing model behavior at knowledge boundaries. Testing across 16 production endpoints revealed all economically relevant model swaps without false positives, and identified inconsistencies in 7 of 27 model cells across major AI platforms, particularly affecting Claude premium endpoints.
🧠 Claude
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠Researchers propose a novel lightweight architecture for verifiable aggregation in federated learning that uses backdoor injection as intrinsic proofs instead of expensive cryptographic methods. The approach achieves over 1000x speedup compared to traditional cryptographic baselines while maintaining high detection rates against malicious servers.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠Researchers developed COOL-MC, a tool that combines reinforcement learning with model checking to verify and explain AI policies for platelet inventory management in blood banks. The system achieved a 2.9% stockout probability while providing transparent decision-making explanations for safety-critical healthcare applications.
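COOL-MC's actual blood-bank model is not given here. Below is a minimal sketch of the kind of query a probabilistic model checker answers for a fixed (for example, learned) policy: "probability of a stockout within H days". It is computed by backward induction over a toy inventory Markov chain. The capacity, demand distribution, and reorder rule are all hypothetical:

```python
from functools import lru_cache

CAPACITY = 5
DEMAND = {0: 0.3, 1: 0.5, 2: 0.2}   # hypothetical daily demand distribution


def policy(stock: int) -> int:
    """Hypothetical reorder rule: order 2 units (arriving immediately) when empty."""
    return 2 if stock == 0 else 0


def stockout_probability(stock: int, horizon: int) -> float:
    """P(demand exceeds available stock on some day within `horizon` days).

    Bounded reachability on a discrete-time Markov chain, solved by backward
    induction; a probabilistic model checker answers the same kind of query.
    """
    @lru_cache(maxsize=None)
    def p(s: int, days_left: int) -> float:
        if days_left == 0:
            return 0.0
        available = min(CAPACITY, s + policy(s))
        total = 0.0
        for demand, prob in DEMAND.items():
            if demand > available:
                total += prob                         # stockout: absorbing event
            else:
                total += prob * p(available - demand, days_left - 1)
        return total

    return p(stock, horizon)


for h in (1, 7, 30):
    print(h, stockout_probability(3, h))
```

A reported figure such as a 2.9% stockout probability is the output of a query of this shape, evaluated on the paper's own model and policy.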
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Researchers have found that language model outputs lie on model-specific high-dimensional ellipses, a geometric signature that can identify the source model. Because the signature occurs naturally and resists forgery, it could enable cryptographic-style verification of AI model outputs.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Researchers propose a way to reduce the 'legibility tax' (the accuracy cost of training a model to produce easily checkable solutions) by decoupling solving from verification. They introduce a translator model that converts correct solutions into checkable forms, preserving accuracy while improving verifiability through decoupled prover-verifier games.
AI × Crypto · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 3
🤖Researchers introduce IMMACULATE, a framework that audits commercial large language model API services to detect fraud like model substitution and token overbilling without requiring access to internal systems. The system uses verifiable computation to audit a small fraction of requests, achieving strong detection guarantees with less than 1% throughput overhead.
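IMMACULATE's guarantees come from verifiable computation, which this summary does not detail. Basic sampling arithmetic explains why auditing only a small fraction of requests can still work. Below is a minimal sketch. It assumes that fraud and auditing are independent per request and that each audited request is checked perfectly. The function names are hypothetical:

```python
import math


def detection_probability(n_requests: int, audit_rate: float, fraud_rate: float) -> float:
    """P(at least one audited request is fraudulent).

    Assumes each request is independently fraudulent with prob. fraud_rate and
    independently audited with prob. audit_rate, and that an audited fraudulent
    request is always caught.
    """
    return 1.0 - (1.0 - audit_rate * fraud_rate) ** n_requests


def requests_needed(target: float, audit_rate: float, fraud_rate: float) -> int:
    """Smallest n with detection_probability(n, ...) >= target."""
    q = audit_rate * fraud_rate
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - q))


# Audit 1% of requests; the provider silently substitutes a model on 10% of them.
n = requests_needed(0.99, audit_rate=0.01, fraud_rate=0.10)
print(n, detection_probability(n, 0.01, 0.10))
```

With a 1% audit rate and a 10% substitution rate, about 4,600 requests suffice for a 99% detection probability under these assumptions. Persistent fraud is therefore caught quickly even when almost nothing is audited.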
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers introduce RLR³, an advanced reinforcement learning framework that extends reward verification from task-level to criterion-level evaluation, enabling multi-criteria supervision for vision-language tasks. The approach uses hybrid verification paths combining LLM extractors with deterministic verifiers or LLM judges, demonstrating a 4.7-point improvement over baseline models on 15 benchmarks.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers demonstrate that debate-based AI oversight works effectively only when specific conditions are met: the critic model must exceed the judge's classification ability, and the judge must verify claims rather than simply summarize testimony. A simpler single-critique approach recovers most benefits at lower computational cost.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose Statistical Membership Inference (SMI), a training-free method for auditing machine unlearning, arguing that existing Membership Inference Attacks (MIAs) are unreliable for this purpose. The framework addresses a fundamental flaw in current auditing approaches by reformulating the problem as estimating the proportion of non-members in a feature distribution, which removes the need for computationally expensive shadow-model training.
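The summary does not give SMI's estimator. Below is a minimal sketch of the reformulation it describes: estimating what fraction of a set behaves like non-members. The sketch assumes a single scalar feature per sample whose distribution differs between reference members and non-members. It uses a simple two-component method-of-moments estimate, which is a swapped-in technique and not necessarily SMI's. Under this framing, if unlearning succeeded, the forget set should score close to 1. All names and data are hypothetical:

```python
from statistics import fmean


def estimate_nonmember_proportion(query, members, nonmembers):
    """Method-of-moments estimate of pi in: query ~ (1 - pi) * members + pi * nonmembers.

    Each argument is a list of one scalar feature per sample (e.g. model loss).
    Only the feature means are used, and no shadow models are trained.
    """
    mu_m, mu_n = fmean(members), fmean(nonmembers)
    if mu_m == mu_n:
        raise ValueError("feature does not separate members from non-members")
    pi = (fmean(query) - mu_m) / (mu_n - mu_m)
    return min(1.0, max(0.0, pi))          # clip sampling noise into [0, 1]


members = [0.1, 0.2, 0.3] * 10             # toy: low loss on training data
nonmembers = [0.9, 1.0, 1.1] * 10          # toy: higher loss on unseen data
forget_set = [0.2] * 30 + [1.0] * 70       # toy forget set after unlearning

print(estimate_nonmember_proportion(forget_set, members, nonmembers))
```

Estimating a proportion over the whole set avoids the per-sample member/non-member decisions that MIAs must make.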
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce CLIP-Inspector, a backdoor detection method for prompt-tuned CLIP models that reconstructs hidden triggers using out-of-distribution images to identify if a model has been maliciously compromised. The technique achieves 94% detection accuracy and enables post-hoc model repair, addressing critical security vulnerabilities in outsourced machine learning services.
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠Research shows that smaller open-source AI models can match frontier models at verifying mathematical proofs when given specialized prompts, even though they are up to 25% less consistent than frontier models under general prompts. With LLM-guided prompt optimization, models such as Qwen3.5-35B reach performance comparable to Gemini 3.1 Pro, with accuracy gains of up to 9.1%.
🧠 Gemini
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠A systematic audit of 17 shadow APIs used in 187 academic papers reveals widespread deception, with performance divergence up to 47.21% and identity verification failures in 45.83% of tests. These third-party services claim to provide access to frontier LLMs like GPT-5 and Gemini-2.5 but deliver inconsistent outputs, undermining research validity and reproducibility.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers propose a new approach to world models that combines explicit simulators with learned models using the DEVS formalism. The method uses LLMs to generate discrete-event world models from natural language specifications, targeting environments with event-driven dynamics like queueing systems and multi-agent coordination.
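The summary does not show the generated models. For readers new to discrete-event dynamics, below is a minimal sketch of the event-scheduling core that a queueing model reduces to. It assumes Poisson arrivals and exponential service (an M/M/1 queue). This is a plain event-list simulation, not the full DEVS formalism, which structures models as atomic components with time-advance, transition, and output functions. All names are hypothetical:

```python
import heapq
import random
from collections import deque


def simulate_mm1(arrival_rate: float, service_rate: float, n_customers: int,
                 seed: int = 0) -> float:
    """Event-scheduling simulation of a single-server FIFO queue.

    Returns the mean time customers spend waiting before service starts.
    """
    rng = random.Random(seed)
    events = []                       # min-heap of (time, tie_breaker, kind)
    counter = 0

    def schedule(time: float, kind: str) -> None:
        nonlocal counter
        heapq.heappush(events, (time, counter, kind))
        counter += 1

    t = 0.0
    for _ in range(n_customers):      # pre-schedule all Poisson arrivals
        t += rng.expovariate(arrival_rate)
        schedule(t, "arrival")

    waiting = deque()                 # arrival times of queued customers
    busy = False
    waits = []
    while events:
        now, _, kind = heapq.heappop(events)
        if kind == "arrival":
            if busy:
                waiting.append(now)
            else:                     # idle server: start service immediately
                busy = True
                waits.append(0.0)
                schedule(now + rng.expovariate(service_rate), "departure")
        elif waiting:                 # departure with a queue: serve the next customer
            waits.append(now - waiting.popleft())
            schedule(now + rng.expovariate(service_rate), "departure")
        else:                         # departure with an empty queue: server goes idle
            busy = False
    return sum(waits) / len(waits)


print(simulate_mm1(arrival_rate=0.5, service_rate=1.0, n_customers=100_000))
```

With arrival rate λ = 0.5 and service rate μ = 1, M/M/1 theory gives a mean wait in queue of ρ/(μ − λ) = 0.5/0.5 = 1.0, where ρ = λ/μ. The printed estimate should therefore be close to 1.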