AIBearisharXiv – CS AI · 3d ago7/10
🧠Researchers introduce PortBench, a comprehensive benchmark for evaluating large language models in portfolio management tasks. The study reveals that 90% of tested LLMs fail to outperform basic equal-weight allocation strategies, highlighting significant gaps between LLM performance on financial QA tasks and real-world portfolio decision-making.
AIBearisharXiv – CS AI · 3d ago7/10
🧠Researchers introduce HARP, a methodology for measuring how harm propagates across multi-agent LLM systems when one component is compromised. Testing on a finance-oriented seven-agent system reveals that single-agent compromise creates the strongest amplification effects, while existing defenses struggle to balance security with utility costs.
AIBearisharXiv – CS AI · 3d ago7/10
🧠A comprehensive survey reveals that machine learning systems deployed in regulated financial sectors—credit risk, fraud detection, and anti-money laundering—suffer from reproducibility failures caused by hardware-level nondeterminism in neural networks and generative AI. The research quantifies specific vulnerabilities across tabular models, graph networks, and LLM-based workflows, proposing evaluation frameworks to improve auditability in financial AI systems.
AIBullisharXiv – CS AI · May 127/10
🧠TimeClaw is a new AI framework that improves how large language models analyze time-series data by learning from exploratory execution rather than just solving individual problems. The system uses a four-stage loop to compare, distill, and reuse successful reasoning patterns, showing consistent improvements over baseline models in finance and weather prediction tasks.
AIBullisharXiv – CS AI · May 97/10
🧠Researchers introduce FinAgent-RAG, an advanced AI framework designed to answer complex financial questions by combining iterative retrieval, reasoning, and self-verification. The system achieves 76-78% accuracy on financial benchmarks while reducing computational costs by 41%, demonstrating practical viability for institutional financial analysis.
AI × CryptoNeutralCoinDesk · May 17/10
🤖An AI agent named Manfred has established its own company with crypto wallet access and hiring credentials, positioning itself to begin cryptocurrency trading by end of May. This development represents a significant milestone in autonomous AI systems operating within financial markets.
AIBearisharXiv – CS AI · May 17/10
🧠Researchers demonstrate that supervised financial NLP benchmarks used to evaluate LLMs contain hidden measurement risks, where rubric wording, metric selection, and aggregation methods materially alter model performance rankings. Testing on the Japanese Financial Implicit-Commitment Recognition dataset reveals 13-point agreement variance across rubric variants and shows that certain metrics produce unreliable signals, highlighting the need for standardized evaluation governance in financial AI model selection.
AIBearisharXiv – CS AI · Mar 267/10
🧠Researchers introduced EnterpriseArena, the first benchmark testing whether AI agents can function as CFOs by allocating resources in complex enterprise environments over 132 months. Testing on eleven advanced LLMs revealed poor performance, with only 16% of runs surviving the full simulation period, highlighting significant capability gaps in long-term resource allocation under uncertainty.
AIBearisharXiv – CS AI · Mar 167/10
🧠Research reveals that AI agents using tools for financial advice can recommend unsafe products while maintaining good quality metrics when tool data is corrupted. The study found that 65-93% of recommendations contained risk-inappropriate products across seven LLMs, yet standard evaluation metrics failed to detect these safety issues.
AIBullishOpenAI News · Mar 67/10
🧠Balyasny Asset Management developed an AI research engine leveraging GPT-5.4 technology with rigorous model evaluation and agent workflows to transform their investment analysis capabilities. The system enables the hedge fund to process and analyze investment research at scale, representing a significant advancement in AI-powered financial analysis.
🧠 GPT-5
AIBullisharXiv – CS AI · Mar 37/102
🧠Researchers introduce Verbal Technical Analysis (VTA), a framework that combines Large Language Models with time-series analysis to produce interpretable stock forecasts. The system converts stock price data into textual annotations and uses natural language reasoning to achieve state-of-the-art forecasting accuracy across U.S., Chinese, and European markets.
AINeutralarXiv – CS AI · 2d ago6/10
🧠Researchers introduce CFMME, a Chinese financial multimodal evaluation benchmark containing 6,052 instances to assess Large Vision-Language Models' capabilities in financial contexts. Testing shows current state-of-the-art LVLMs achieve 66.11% accuracy on financial question-answering tasks, indicating significant room for improvement in applying these models to real-world financial applications.
AI × CryptoBullishCrypto Briefing · 2d ago6/10
🤖VanEck CEO disclosed the firm spends $750,000 annually on Claude AI tokens, signaling substantial enterprise adoption of advanced AI services. This revelation underscores how major financial institutions are rapidly integrating AI into operations while introducing new cost structures and dependency risks to institutional finance.
🧠 Claude
AI × CryptoNeutralarXiv – CS AI · May 46/10
🤖Researchers introduce ATLAS, a multi-agent framework that uses large language models for autonomous trading by combining dynamic prompt optimization with real-time market feedback. The system addresses key challenges in deploying LLMs for finance: adapting to delayed, noisy market signals and converting model outputs into executable orders.
AINeutralarXiv – CS AI · May 16/10
🧠Researchers introduce FinChain, a new benchmark dataset designed to evaluate chain-of-thought reasoning in financial AI systems. The dataset addresses gaps in existing finance benchmarks by emphasizing verifiable intermediate reasoning steps rather than just final answers, and reveals that even leading LLMs struggle with multi-step symbolic financial reasoning.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers introduce Hubble, an LLM-driven framework that automates alpha factor discovery in quantitative finance by using large language models constrained by safety mechanisms to generate and refine predictive trading factors. The system achieved a composite score of 0.827 across 181 evaluated factors on U.S. equities, demonstrating that combining AI-driven generation with deterministic safety constraints enables interpretable and reproducible factor discovery.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers introduced FinTrace, a benchmark dataset with 800 expert-annotated trajectories for evaluating how large language models perform financial tool-calling tasks. The study reveals that while frontier LLMs excel at selecting appropriate tools, they struggle significantly with information utilization and generating accurate final outputs, pointing to a critical reasoning gap that persists even after fine-tuning with preference optimization techniques.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers demonstrate that large language models can extract predictive features from financial news with valid intermediate signals (Information Coefficient >0.15), yet these features fail to improve reinforcement learning trading agents during macroeconomic shocks. The findings reveal a critical gap between feature-level validity and downstream policy robustness, suggesting that valid signals alone cannot guarantee trading performance under distribution shifts.
AIBullisharXiv – CS AI · Apr 106/10
🧠Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, achieving up to 19.52% accuracy improvements on fine-tuned models.
AIBullisharXiv – CS AI · Mar 176/10
🧠Researchers developed LabelFusion, a hybrid AI architecture combining Large Language Models with transformer encoders for financial news classification. The system achieves 96% F1 score on full datasets but LLMs alone perform better in low-data scenarios, suggesting different strategies based on available training data.
AINeutralarXiv – CS AI · Mar 126/10
🧠Researchers propose Nurture-First Development (NFD), a new paradigm for building domain-expert AI agents through progressive growth via conversational interaction rather than traditional code-first or prompt-first approaches. The method uses a Knowledge Crystallization Cycle to convert operational dialogue into structured knowledge assets, demonstrated through a financial research agent case study.
AIBullisharXiv – CS AI · Mar 37/107
🧠Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved 47% higher task completion rates and 93% lower regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.
AIBullisharXiv – CS AI · Mar 37/107
🧠Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating with a 9.5 billion token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, with models and datasets being open-sourced for broader use.
AIBullisharXiv – CS AI · Mar 27/1012
🧠Researchers have developed FinBloom 7B, a specialized large language model trained on 14 million financial news articles and SEC filings, designed to handle real-time financial queries. The model introduces a Financial Agent system that can access up-to-date market data and financial information to support decision-making and algorithmic trading applications.
AINeutralarXiv – CS AI · Feb 275/106
🧠Researchers introduce FIRE, a comprehensive benchmark for evaluating Large Language Models' financial intelligence and reasoning capabilities. The benchmark includes theoretical financial knowledge tests from qualification exams and 3,000 practical financial scenario questions covering complex business domains.