Latest AI Research

Stay ahead of the curve with our curated collection of the most impactful Artificial Intelligence research papers.

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effects of visual priming on VLMs' cooperative behavior using the Iterated Prisoner's Dilemma (IPD) as a test scenario.

Sat 11 Apr 2026

Authors: Kenneth J. K. Ong

A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics

Collective intelligence emerges across biological, physical, and artificial systems without central coordination, yet a unifying principle governing such behaviour remains elusive. The Free Energy Principle explains how individual agents adapt through variational inference, while game theory formalises strategic interactions.

Fri 10 Apr 2026

Authors: Djamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah, Hanane Azzag

MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and image, especially with conflicting signals, remains challenging. Existing methods often face difficulties with contextual grounding, cross-modal interpretation ambiguity, and single-pass reasoning fragility.

Thu 9 Apr 2026

Authors: Weihai Lu, Zhejun Zhao, Yanshu Li, Huan He

Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances

We introduce a framework called LAPITHS (Language model Analysis through Paradigm grounded Interpretations of Theses about Human likenesS) and use it to show that several major claims advanced for models such as CENTAUR, proposed as an artificial Unified Model of Cognition, are not theoretically or empirically justified. LAPITHS provides a principled reference point for counteracting the current behaviouristic tendency in AI research to interpret the human-level performance of transformer-based language models as evidence of human-like underlying computation and, by extension, as a sign of cognitive abilities.

Wed 8 Apr 2026

Authors: Matteo Da Pelo, Alessio Donvito, Claudio Frongia, Pietro Salis, Antonio Lieto

From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns.

Tue 7 Apr 2026

Authors: Alex Petrov, Alexander Gusak, Denis Mukha, Dima Korolev

Simulating clinical interventions with a generative multimodal model of human physiology

Understanding how human health changes over time, and why responses to interventions vary between individuals, remains a central challenge in medicine. Here we present HealthFormer, a decoder-only transformer that models the human physiological trajectory generatively, by training on data from the Human Phenotype Project, a multi-visit cohort of over 15,000 deeply phenotyped individuals.

Mon 6 Apr 2026

Authors: Guy Lutsker, Gal Sapir, Jordi Merino, Smadar Shilo, Anastasia Godneva, Eli Meirom, Shie Mannor, Hagai Rossman, Gal Chechik, Eran Segal

Graph World Models: Concepts, Taxonomy, and Future Directions

As one of the mainstream models of artificial intelligence, world models allow agents to learn the representation of the environment for efficient prediction and planning. However, classical world models based on flat tensors face several key problems, including noise sensitivity, error accumulation and weak reasoning.

Sun 5 Apr 2026

Authors: Jiawei Liu, Senqiao Yang, Mingjun Wang, Yu Wang, Bei Yu

In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

Agent orchestration frameworks -- LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, and others -- place an external orchestrator above the LLM, tracking state and injecting routing instructions at every turn. We present a controlled comparison showing that for procedural tasks, this architecture is dominated by a simpler alternative: putting the entire procedure in the system prompt and letting the model self-orchestrate.

Sat 4 Apr 2026

Authors: Simon Dennis, Michael Diamond, Rivaan Patil, Kevin Shabahang, Hao Guo

Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

Recent advances in agentic AI are shifting automation from discrete tools to proactive multi-agent systems that coordinate multi-specialized capabilities behind unified interfaces. However, today's agent systems typically rely on hard-coded agent architectures with fixed roles, coordination patterns, and interaction flows that limit end-user personalization and make adaptation to individual needs and contexts difficult.

Fri 3 Apr 2026

Authors: Giuseppe Arbore, Andrea Sillano, Luigi De Russis

Modeling Clinical Concern Trajectories in Language Model Agents

Large language model (LLM) agents deployed in clinical settings often exhibit abrupt, threshold-driven behavior, offering little visibility into accumulating risk prior to escalation. In real-world care, however, clinicians act on gradually rising concern rather than instantaneous triggers.

Thu 2 Apr 2026

Authors: Sukesh Subaharan, Venkatesan VS, Murugadasan P, Sivakumar D, Gautham N, Ganeshkumar M

KellyBench: A Benchmark for Long-Horizon Sequential Decision Making

Language models are saturating benchmarks for procedural tasks with narrow objectives. But they are increasingly being deployed in long-horizon, non-stationary environments with open-ended goals.

Wed 1 Apr 2026

Authors: Thomas Grady, Kip Parker, Iliyan Zarov, Henry Course, Chengxi Taylor, Ross Taylor

Rethinking Agentic Reinforcement Learning In Large Language Models

Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly complex, open-ended tasks has catalyzed a paradigm shift towards agentic paradigms within RL.

Tue 31 Mar 2026

Authors: Fangming Cui, Ruixiao Zhu, Cheng Fang, Sunan Li, Jiahong Li

A Grid-Aware Agent-Based Model for Analyzing Electric Vehicle Charging Systems

This paper presents a configurable, grid-aware Agent-Based Model (ABM) for the systematic analysis of electric vehicle (EV) charging systems under configurable infrastructure and operational conditions. The model integrates heterogeneous EV behavior, charging column constraints, and a shared Energy Sandbox that regulates aggregate power allocation, enabling the joint study of user-centric charging dynamics and facility-level power behavior.

Mon 30 Mar 2026

Authors: Khalil Al-Rahman Youssefi, Marija Gojkovic, Walter Stefanutti, Mika Auer, Melanie Schranz

ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read; they retrieve.

Sun 29 Mar 2026

Authors: Mohit Dubey, Open Gigantic

MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents

Multi-server MCP agents create an information-flow control problem: faithful tool composition can turn individually benign read/write permissions into cross-boundary credential propagation -- a structural side effect of workflow topology, not necessarily malicious model behavior. We present MCPHunt, to our knowledge the first controlled benchmark that isolates non-adversarial, verbatim credential propagation across multi-server MCP trust boundaries, with three methodological contributions: (1) canary-based taint tracking that reduces propagation detection to objective string matching; (2) an environment-controlled coverage design with risky, benign, and hard-negative conditions that validates pipeline soundness and controls for credential-format confounds; (3) CRS stratification that disentangles task-mandated propagation (faithful execution of verbatim-transfer instructions) from policy-violating propagation (credentials included despite the option to redact).

Sat 28 Mar 2026

Authors: Haonan Li, Tianjun Sun, Yongqing Wang, Qisheng Zhang

Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

The design of embedded safety-critical systems, such as those used in next-generation automotive and autonomous platforms, is increasingly challenged by escalating system complexity, hardware-software heterogeneity, and the integration of intelligent, data-driven components. Ensuring dependability in such systems requires a holistic approach that spans multiple abstraction layers and encompasses both design- and run-time assurance.

Fri 27 Mar 2026

Authors: Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha, Samarjit Chakraborty, Cecilia Carbonelli, Akash Kumar

Post-Optimization Adaptive Rank Allocation for LoRA

Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parameter redundancy.

Thu 26 Mar 2026

Authors: Vishnuprasadh Kumaravelu, Sunil Gupta, P. K. Srijith

WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

While GUI agents have shown impressive capabilities on common computer-use tasks such as those in OSWorld, current benchmarks mainly focus on isolated, single-application tasks. This overlooks a critical real-world requirement: coordinating across multiple applications to accomplish complex profession-specific workflows.

Wed 25 Mar 2026

Authors: Jinchao Li, Yunxin Li, Chenrui Zhao, Zhenran Xu, Baotian Hu, Min Zhang

Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions

The emergence of Large Language Models (LLMs) offers a transformative interface for Web3, yet existing benchmarks fail to capture the complexity of translating high-level user intents into functionally correct, state-dependent on-chain transactions. We present Intent2Tx, a high-fidelity benchmark featuring 29,921 single-step and 1,575 multi-step instances meticulously derived from 300 days of real-world Ethereum mainnet traces.

Tue 24 Mar 2026

Authors: Zhuoran Pan, Yue Li, Zhi Guan, Jianbin Hu, Zhong Chen

Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making

This article outlines a new framework for traffic light optimization based on a digital twin of the transport infrastructure, managed by agentic AI to enable real-time autonomous decisions. The framework relies on physical sensors and edge computing to measure real-time traffic information and simulate traffic flow in a constantly updated digital twin.

Mon 23 Mar 2026

Authors: Salman Jan, Toqeer Ali Syed, Shahid Kamal, Qamar Wali, Ali Akarma
