Paperium

Paperium AI Analysis & Review of Latest Scientific Research Articles

Personal website: https://paperium.net
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
1 min read
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
2 min read
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
1 min read
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
1 min read
Accelerating Vision Transformers with Adaptive Patch Sizes
1 min read
SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection
1 min read
Machine Text Detectors are Membership Inference Attacks
1 min read
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
2 min read
What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics
2 min read
RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling
1 min read
See the Text: From Tokenization to Visual Reading
1 min read
When Do Transformers Learn Heuristics for Graph Connectivity?
2 min read
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection
1 min read
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
1 min read
Steering Autoregressive Music Generation with Recursive Feature Machines
1 min read
MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models
2 min read
From Charts to Code: A Hierarchical Benchmark for Multimodal Models
1 min read
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
1 min read
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools
1 min read
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
1 min read
OmniNWM: Omniscient Driving Navigation World Models
1 min read
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
1 min read
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
1 min read
Directional Reasoning Injection for Fine-Tuning MLLMs
1 min read
FinSight: Towards Real-World Financial Deep Research
1 min read
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
1 min read
olmOCR 2: Unit Test Rewards for Document OCR
1 min read
Unified Reinforcement and Imitation Learning for Vision-Language Models
1 min read
Attention Sinks in Diffusion Language Models
1 min read
Language Models are Injective and Hence Invertible
1 min read
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
1 min read
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
1 min read
Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning
2 min read
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
1 min read
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
1 min read
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
1 min read
DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
1 min read
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
1 min read
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
1 min read
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
1 min read
When Correct Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?
1 min read
Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration
1 min read
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold
2 min read
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulatio
1 min read
Predicting the Unpredictable: Reproducible BiLSTM Forecasting of Incident Counts in the Global Terrorism Database (GTD)
1 min read
Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis
1 min read
Planned Diffusion
1 min read
Expanding the Action Space of LLMs to Reason Beyond Language
1 min read
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
1 min read
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
1 min read
DeepSeek-OCR: Contexts Optical Compression
1 min read
Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
1 min read
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
1 min read
Efficient Long-context Language Model Training by Core Attention Disaggregation
1 min read
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning
1 min read
Extracting alignment data in open models
1 min read
AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading
1 min read
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
1 min read
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
2 min read
Video Reasoning without Training
1 min read