Paperium

Paperium AI Analysis & Review of Latest Scientific Research Articles

PEAR: Phase Entropy Aware Reward for Efficient Reasoning
1 min read
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
2 min read
Skill-Targeted Adaptive Training
1 min read
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting
1 min read
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
1 min read
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
1 min read
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
1 min read
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
1 min read
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
1 min read
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
2 min read
Don't Just Fine-tune the Agent, Tune the Environment
1 min read
DocReward: A Document Reward Model for Structuring and Stylizing
1 min read
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
1 min read
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
1 min read
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
1 min read
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
1 min read
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
1 min read
Demystifying Reinforcement Learning in Agentic Reasoning
1 min read
Making Mathematical Reasoning Adaptive
1 min read
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
1 min read
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
1 min read
Spotlight on Token Perception for Multimodal Reinforcement Learning
1 min read
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
1 min read
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
1 min read
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
2 min read
Diffusion Transformers with Representation Autoencoders
1 min read
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
1 min read
Instant4D: 4D Gaussian Splatting in Minutes
1 min read
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
2 min read
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
1 min read
LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology
1 min read
Formalizing Style in Personal Narratives
1 min read
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
1 min read
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
1 min read
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
2 min read
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
1 min read
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
1 min read
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
2 min read
Understanding DeepResearch via Reports
1 min read
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
1 min read
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
1 min read
Mitigating Overthinking through Reasoning Shaping
1 min read
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
1 min read
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
2 min read
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
1 min read
Parallel Test-Time Scaling for Latent Reasoning Models
1 min read
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
1 min read
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
2 min read
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
2 min read
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
1 min read
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
1 min read
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
1 min read
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
1 min read
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
1 min read
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
1 min read
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
1 min read
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
1 min read
KORMo: Korean Open Reasoning Model for Everyone
1 min read
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
1 min read
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
1 min read