Forem

Data Science

Data Science allows us to extract meaning from and interpret data.

Posts

👋 Sign in for the ability to sort posts by relevant, latest, or top.
Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

1
Comments
4 min read
Easy Problems That LLMs Get Wrong

Easy Problems That LLMs Get Wrong

6
Comments
3 min read
Kotlin ML Pack: Technical Report

Kotlin ML Pack: Technical Report

Comments
4 min read
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

Comments
4 min read
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B

BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B

Comments
4 min read
Is Complexity an Illusion?

Is Complexity an Illusion?

Comments
2 min read
AnyLoss: Transforming Classification Metrics into Loss Functions

AnyLoss: Transforming Classification Metrics into Loss Functions

Comments
4 min read
The rising costs of training frontier AI models

The rising costs of training frontier AI models

Comments
5 min read
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations

Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations

Comments
4 min read
The Road Less Scheduled

The Road Less Scheduled

Comments
4 min read
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Comments
3 min read
Look Once to Hear: Target Speech Hearing with Noisy Examples

Look Once to Hear: Target Speech Hearing with Noisy Examples

Comments
5 min read
NPGA: Neural Parametric Gaussian Avatars

NPGA: Neural Parametric Gaussian Avatars

2
Comments
3 min read
MoEUT: Mixture-of-Experts Universal Transformers

MoEUT: Mixture-of-Experts Universal Transformers

Comments
3 min read
PlaceFormer: Transformer-based Visual Place Recognition using Multi-Scale Patch Selection and Fusion

PlaceFormer: Transformer-based Visual Place Recognition using Multi-Scale Patch Selection and Fusion

Comments
3 min read
Towards Lightweight Super-Resolution with Dual Regression Learning

Towards Lightweight Super-Resolution with Dual Regression Learning

Comments
3 min read
Neural Network Parameter Diffusion

Neural Network Parameter Diffusion

Comments
4 min read
Diffusion On Syntax Trees For Program Synthesis

Diffusion On Syntax Trees For Program Synthesis

2
Comments
4 min read
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Comments
4 min read
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Comments
4 min read
Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

2
Comments
4 min read
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Comments
4 min read
Large Language Models Can Self-Improve At Web Agent Tasks

Large Language Models Can Self-Improve At Web Agent Tasks

Comments
3 min read
There and Back Again: The AI Alignment Paradox

There and Back Again: The AI Alignment Paradox

Comments
4 min read
LLMs achieve adult human performance on higher-order theory of mind tasks

LLMs achieve adult human performance on higher-order theory of mind tasks

Comments
4 min read
loading...