Alex Chen – Building lean AI infrastructure. I run a small inference API service powered by vLLM on an RTX 3090. My 7B-class models (Llama 3, Qwen 2.5) deliver ~30 tokens/s throughput at a fraction of OpenAI's cost.