Alex Chen – Building lean AI infrastructure. I run a small inference API service powered by vLLM on an RTX 3090. My 7B-class models (Llama 3, Qwen 2.5) deliver ~30 tokens/s throughput at a fraction of OpenAI's cost.