#financial-ai News & Analysis

27 articles tagged with #financial-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

🧠

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Researchers introduce PortBench, a comprehensive benchmark for evaluating large language models in portfolio management tasks. The study reveals that 90% of tested LLMs fail to outperform basic equal-weight allocation strategies, highlighting significant gaps between LLM performance on financial QA tasks and real-world portfolio decision-making.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

🧠

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Researchers introduce HARP, a methodology for measuring how harm propagates across multi-agent LLM systems when one component is compromised. Testing on a finance-oriented seven-agent system reveals that single-agent compromise creates the strongest amplification effects, while existing defenses struggle to balance security with utility costs.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

🧠

From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

A comprehensive survey reveals that machine learning systems deployed in regulated financial sectors—credit risk, fraud detection, and anti-money laundering—suffer from reproducibility failures caused by hardware-level nondeterminism in neural networks and generative AI. The research quantifies specific vulnerabilities across tabular models, graph networks, and LLM-based workflows, proposing evaluation frameworks to improve auditability in financial AI systems.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

🧠

TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning

TimeClaw is a new AI framework that improves how large language models analyze time-series data by learning from exploratory execution rather than just solving individual problems. The system uses a four-stage loop to compare, distill, and reuse successful reasoning patterns, showing consistent improvements over baseline models in finance and weather prediction tasks.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

🧠

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

Researchers introduce FinAgent-RAG, an advanced AI framework designed to answer complex financial questions by combining iterative retrieval, reasoning, and self-verification. The system achieves 76-78% accuracy on financial benchmarks while reducing computational costs by 41%, demonstrating practical viability for institutional financial analysis.

AI × Crypto · Neutral · CoinDesk · May 1 · 7/10

🤖

AI agent forms its own company, gets ready to trade crypto

An AI agent named Manfred has established its own company, with crypto wallet access and hiring credentials, and is positioning itself to begin cryptocurrency trading by the end of May. The development marks a milestone for autonomous AI systems operating within financial markets.

AI · Bearish · arXiv – CS AI · May 1 · 7/10

🧠

Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR

Researchers demonstrate that supervised financial NLP benchmarks used to evaluate LLMs contain hidden measurement risks, where rubric wording, metric selection, and aggregation methods materially alter model performance rankings. Testing on the Japanese Financial Implicit-Commitment Recognition dataset reveals 13-point agreement variance across rubric variants and shows that certain metrics produce unreliable signals, highlighting the need for standardized evaluation governance in financial AI model selection.

AI · Bearish · arXiv – CS AI · Mar 26 · 7/10

🧠

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Researchers introduced EnterpriseArena, the first benchmark testing whether AI agents can function as CFOs by allocating resources in complex enterprise environments over 132 months. Testing on eleven advanced LLMs revealed poor performance, with only 16% of runs surviving the full simulation period, highlighting significant capability gaps in long-term resource allocation under uncertainty.

AI · Bearish · arXiv – CS AI · Mar 16 · 7/10

🧠

AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

Research reveals that when tool data is corrupted, AI agents using tools for financial advice can recommend unsafe products while still scoring well on standard quality metrics. The study found that 65-93% of recommendations contained risk-inappropriate products across seven LLMs, yet standard evaluation metrics failed to detect these safety issues.

AI · Bullish · OpenAI News · Mar 6 · 7/10

🧠

How Balyasny Asset Management built an AI research engine for investing

Balyasny Asset Management built an AI research engine on GPT-5.4, combining rigorous model evaluation with agent workflows to strengthen its investment analysis. The system enables the hedge fund to process and analyze investment research at scale.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2

🧠

Reasoning on Time-Series for Financial Technical Analysis

Researchers introduce Verbal Technical Analysis (VTA), a framework that combines Large Language Models with time-series analysis to produce interpretable stock forecasts. The system converts stock price data into textual annotations and uses natural language reasoning to achieve state-of-the-art forecasting accuracy across U.S., Chinese, and European markets.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

🧠

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Researchers introduce CFMME, a Chinese financial multimodal evaluation benchmark containing 6,052 instances to assess Large Vision-Language Models' capabilities in financial contexts. Testing shows current state-of-the-art LVLMs achieve 66.11% accuracy on financial question-answering tasks, indicating significant room for improvement in applying these models to real-world financial applications.

AI × Crypto · Bullish · Crypto Briefing · 2d ago · 6/10

🤖

VanEck CEO reveals $750,000 annual spending on Claude AI tokens

VanEck's CEO disclosed that the firm spends $750,000 annually on Claude AI tokens, signaling substantial enterprise adoption of advanced AI services. The disclosure underscores how major financial institutions are rapidly integrating AI into operations while taking on new cost structures and dependency risks.

🧠 Claude

AI × Crypto · Neutral · arXiv – CS AI · May 4 · 6/10

🤖

ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Researchers introduce ATLAS, a multi-agent framework that uses large language models for autonomous trading by combining dynamic prompt optimization with real-time market feedback. The system addresses key challenges in deploying LLMs for finance: adapting to delayed, noisy market signals and converting model outputs into executable orders.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

🧠

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Researchers introduce FinChain, a new benchmark dataset designed to evaluate chain-of-thought reasoning in financial AI systems. The dataset addresses gaps in existing finance benchmarks by emphasizing verifiable intermediate reasoning steps rather than just final answers, and reveals that even leading LLMs struggle with multi-step symbolic financial reasoning.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

Hubble: An LLM-Driven Agentic Framework for Safe and Automated Alpha Factor Discovery

Researchers introduce Hubble, an LLM-driven framework that automates alpha factor discovery in quantitative finance by using large language models constrained by safety mechanisms to generate and refine predictive trading factors. The system achieved a composite score of 0.827 across 181 evaluated factors on U.S. equities, demonstrating that combining AI-driven generation with deterministic safety constraints enables interpretable and reproducible factor discovery.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks

Researchers introduced FinTrace, a benchmark dataset with 800 expert-annotated trajectories for evaluating how large language models perform financial tool-calling tasks. The study reveals that while frontier LLMs excel at selecting appropriate tools, they struggle significantly with information utilization and generating accurate final outputs, pointing to a critical reasoning gap that persists even after fine-tuning with preference optimization techniques.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

Researchers demonstrate that large language models can extract predictive features from financial news with valid intermediate signals (Information Coefficient >0.15), yet these features fail to improve reinforcement learning trading agents during macroeconomic shocks. The findings reveal a critical gap between feature-level validity and downstream policy robustness, suggesting that valid signals alone cannot guarantee trading performance under distribution shifts.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

🧠

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, achieving up to 19.52% accuracy improvements on fine-tuned models.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

LabelFusion: Fusing Large Language Models with Transformer Encoders for Robust Financial News Classification

Researchers developed LabelFusion, a hybrid AI architecture combining Large Language Models with transformer encoders for financial news classification. The system achieves a 96% F1 score on full datasets, but LLMs alone perform better in low-data scenarios, suggesting that the best strategy depends on how much training data is available.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

🧠

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization

Researchers propose Nurture-First Development (NFD), a new paradigm for building domain-expert AI agents through progressive growth via conversational interaction rather than traditional code-first or prompt-first approaches. The method uses a Knowledge Crystallization Cycle to convert operational dialogue into structured knowledge assets, demonstrated through a financial research agent case study.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

🧠

ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents

Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved 47% higher task completion rates and 93% lower regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

🧠

Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain

Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating it with a 9.5-billion-token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, and the models and datasets are being open-sourced for broader use.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12

🧠

FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data

Researchers have developed FinBloom 7B, a specialized large language model trained on 14 million financial news articles and SEC filings, designed to handle real-time financial queries. The model introduces a Financial Agent system that can access up-to-date market data and financial information to support decision-making and algorithmic trading applications.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/10 · 6

🧠

FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation

Researchers introduce FIRE, a comprehensive benchmark for evaluating Large Language Models' financial intelligence and reasoning capabilities. The benchmark includes theoretical financial knowledge tests from qualification exams and 3,000 practical financial scenario questions covering complex business domains.
