Ayush kumar for NodeShift

Claude 4: Opus vs Sonnet, Benchmarks, and Dev Workflow with Claude Code

Today, Anthropic unveiled Claude Opus 4 and Claude Sonnet 4, two models aimed at software engineering, precise coding, and tool-assisted reasoning. Claude Opus 4 is positioned as the most capable model for developers, built for long, uninterrupted workflows. It scores 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, and it stays on task through hours-long, multi-step work. Claude Sonnet 4 is a well-balanced upgrade from Claude Sonnet 3.7: it scores 72.7% on SWE-bench Verified and offers sharper reasoning, better code navigation, and more accurate responses across coding scenarios.

These models aren’t just faster—they’re smarter, more focused, and more practical in real-world applications. Developers can now pair these tools with VS Code and JetBrains for seamless background execution, GitHub integrations, and native code suggestions. With parallel tool execution, precise instruction following, and long-term memory through file-based context, Claude 4 models introduce a powerful shift in how people build and reason through technical problems.

Resource

GitHub
Link: https://github.com/anthropics/claude-code

Claude 4 models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks. See appendix for more on methodology.

Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. See appendix for more on methodology.

How Claude 4 Sets New Standards in Performance Benchmarks

The results shared for Claude Opus 4 and Claude Sonnet 4 come with a published evaluation methodology. Both models were tested in two modes: immediate responses, and extended thinking runs that allowed up to 64,000 tokens of reasoning. For coding-specific benchmarks like SWE-bench Verified and Terminal-bench, the models worked without extended thinking, under tightly scoped single-attempt conditions with two core tools: a bash shell and a string-based file editor. Claude 4 scores use the full set of 500 SWE-bench Verified problems, while OpenAI's scores are reported on a smaller 477-problem subset, so the two sets of numbers are not strictly like for like.
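
To see why the size of the problem set matters when comparing scores, here is a small sketch. The solved count of 350 is a made-up number for illustration, not a reported result, and it assumes the extreme case where every excluded problem was one the model failed. Only the 500 and 477 totals come from the text above.

```shell
#!/usr/bin/env bash
# pass_rate SOLVED TOTAL -> percentage with one decimal place.
pass_rate() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * s / t }'
}

# Hypothetical: the same 350 solved tasks scored against each problem set.
pass_rate 350 500   # full problem set   -> 70.0
pass_rate 350 477   # smaller subset     -> 73.4
```

The same amount of solved work reads several points higher on the smaller denominator, which is why the subset size is worth stating next to any score.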

The extended thinking benchmarks include GPQA Diamond, TAU-bench, MMMLU, and AIME. TAU-bench scores were gathered with longer sequences and a higher step limit, giving the models more room to plan, reason with tool feedback, and refine their outputs. For the high-compute SWE-bench Verified results, multiple completions were sampled in parallel, patches that broke regression tests were filtered out, and the best remaining candidate was selected by an internal scorer, leading to peak scores of 79.4% for Opus 4 and 80.2% for Sonnet 4. These figures reflect extra test-time compute, so they should not be compared directly with single-attempt scores.
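
The high-compute procedure described above (sample several candidates, drop the ones that break regression tests, keep the best-scoring survivor) can be sketched as a tiny filter-and-rank pipeline. Everything in this example is hypothetical: the candidate names, the pass/fail flags, and the scores are invented, and how candidates were actually scored is not public.

```shell
#!/usr/bin/env bash
# Each line: <candidate> <regression_tests: pass|fail> <score>
# All values below are invented for illustration.
candidates='patch_a pass 0.61
patch_b fail 0.93
patch_c pass 0.78
patch_d pass 0.40'

# 1) keep only candidates whose regression tests pass,
# 2) sort by score (column 3), highest first,
# 3) print the name of the top candidate.
select_best() {
  awk '$2 == "pass"' | sort -k3,3nr | head -n 1 | awk '{ print $1 }'
}

best=$(printf '%s\n' "$candidates" | select_best)
echo "selected: $best"
```

Note that patch_b has the highest raw score but is discarded first because it breaks the regression tests, so the filter step changes which candidate wins.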

Claude Opus 4 — Built for Depth, Focus, and Endurance

Claude Opus 4 represents a major leap forward in building digital systems that can handle deep, uninterrupted thinking. Designed for complex, high-stakes work, it excels at tasks that demand multiple steps, structured logic, and long attention spans. Whether it’s a seven-hour engineering workflow, a legal audit across thousands of documents, or building systems that need to remember and evolve over time—Opus 4 stays locked in, delivering results with clarity, structure, and stamina. It’s not just fast; it’s deliberate, organized, and capable of picking up where it left off. With built-in memory capabilities and precision reasoning, Opus 4 unlocks new workflows where sustained effort matters.

Where Claude Opus 4 Shines:

  • Large-Scale Development Tasks: Refactor complex codebases, migrate architectures, or build out full-stack systems from scratch with reliable flow and structure.
  • Process Automation for Knowledge Work: Set up digital workflows to handle multi-step processes like legal research, compliance audits, or financial reporting reviews.
  • Research with Recall: Analyze scattered documents (whitepapers, case files, or filings) and bring structure to unstructured data over many sessions.
  • Persistent Digital Collaborators: Build tools that remember what happened last week, summarize what’s changed, and help teams stay aligned across long-term projects.
  • Crafting Long-Form Content with Precision: Write whitepapers, detailed documentation, or thoughtful strategy memos with coherence and fluency across several pages.

Claude Sonnet 4 — Fast, Reliable Thinking for Daily Ops and Scalable Workflows

Claude Sonnet 4 is built for high-speed, high-volume tasks—ideal for businesses that need clarity, consistency, and responsiveness at scale. It delivers strong reasoning and crisp output without sacrificing speed, making it a perfect fit for real-time interactions and workflow automation. Whether you’re building systems that need to respond instantly to users or engines that process large volumes of content, Sonnet 4 is tuned for performance under pressure. It’s efficient, scalable, and ready to plug into fast-moving operations—whether in customer service, dev teams, or enterprise strategy.

Where Claude Sonnet 4 Excels:

  • Real-Time Digital Support: Power chat-based customer experiences, onboarding flows, or internal tools that deliver quick, reliable answers every time.
  • Agile Development Help: Speed up code reviews, squash bugs, and wire up APIs with near-instant responses and accurate suggestions.
  • Rapid Insights & Analysis: Scan through dashboards, trends, or competitor reports and get distilled summaries that save hours of manual digging.
  • Mass Content Workflows: Create, format, and analyze everything from campaign assets to survey responses at scale, without sacrificing quality.

Step-by-Step Process to Install Anthropic Claude Code Locally

For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.

Follow the account setup process and provide the necessary details and information.

Step 2: Create a GPU Node (Virtual Machine)

GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.

Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to create your first Virtual Machine deployment.

Step 3: Select a Model, Region, and Storage

In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.

We will use 1 x RTX A6000 GPU for this tutorial. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.

Step 4: Select Authentication Method

There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.

Step 5: Choose an Image

Next, you will need to choose an image for your Virtual Machine. We will deploy Claude Code on an NVIDIA CUDA Virtual Machine image. CUDA is NVIDIA's proprietary, closed-source parallel computing platform. Claude Code itself is a Node.js command-line tool that talks to Anthropic's hosted models, so it does not depend on CUDA or the GPU; the CUDA image simply serves as the base system for this tutorial.

After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.

Step 6: Virtual Machine Successfully Deployed

You will get visual confirmation that your node is up and running.

Step 7: Connect to GPUs using SSH

NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.

Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.

Now open your terminal and paste the proxy SSH IP or direct SSH IP.

Step 8: Install Node.js

Run the following commands to install Node.js:

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Step 9: Confirm Installation

Run the following commands to confirm the installation:

node -v
npm -v

You should see versions like:

v20.12.2
10.x.x
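
Claude Code needs a reasonably recent Node.js (its documentation lists version 18 or newer at the time of writing), so it can be worth checking the major version in a script rather than by eye. This is a small sketch: the helper name node_major and the MIN_NODE_MAJOR variable are made up for this example, and the minimum version is an assumption to confirm against the current docs.

```shell
#!/usr/bin/env bash
# node_major VERSION -> major version number, e.g. "v20.12.2" -> "20".
node_major() {
  local v="${1#v}"      # strip the leading "v"
  echo "${v%%.*}"       # keep everything before the first dot
}

MIN_NODE_MAJOR=18       # assumed minimum; check the current docs

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node -v)")
  if [ "$major" -ge "$MIN_NODE_MAJOR" ]; then
    echo "Node.js $major is new enough"
  else
    echo "Node.js $major is older than $MIN_NODE_MAJOR; upgrade first"
  fi
else
  echo "node not found on PATH"
fi
```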

Step 10: Install Claude Code

Run the following command to install Claude Code:

npm install -g @anthropic-ai/claude-code
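
After the global install, a quick sanity check is to confirm that the tools involved are actually on your PATH. The sketch below only looks commands up and does not install or change anything; the helper name check_cli is invented for this example.

```shell
#!/usr/bin/env bash
# check_cli NAME: report whether NAME resolves on PATH.
# Returns 0 when found, 1 when missing.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

missing=0
for tool in node npm claude; do
  check_cli "$tool" || missing=1
done

if [ "$missing" -eq 1 ]; then
  echo "Something is missing. Try a new shell session, or check that npm's global bin directory is on PATH."
fi
```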

Step 11: Launch It in Terminal

Run the following command to launch Claude Code:
claude

Step 12: Connect to your GPU VM using Remote SSH

  • Open VS Code on your Mac.
  • Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
  • Select your configured host (claude-vm).
  • Once connected, you’ll see SSH: 116.127.115.18 in the bottom-left status bar.
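
The steps above assume a host alias named claude-vm already exists in your SSH config. A minimal sketch of adding one follows. The user name, port, and key path are assumptions to replace with the values from your own Connect dialog, and the IP is the example address shown in the status bar above.

```shell
#!/usr/bin/env bash
# Location of the SSH client config; override SSH_CONFIG to write elsewhere.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"
touch "$SSH_CONFIG"

# Append the alias only once. User, Port, and IdentityFile are assumed values.
if ! grep -q '^Host claude-vm$' "$SSH_CONFIG"; then
  cat >> "$SSH_CONFIG" <<'EOF'
Host claude-vm
    HostName 116.127.115.18
    User root
    Port 22
    IdentityFile ~/.ssh/id_ed25519
EOF
fi

# SSH refuses config files that other users can write to.
chmod 600 "$SSH_CONFIG"
```

With the alias in place, running ssh claude-vm in a terminal and picking claude-vm in Remote-SSH should both resolve to the same machine.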

Step 13: Claude Code Initial Launch in VS Code Terminal

Execute the following command to run Claude Code from the terminal in VS Code:
claude

  • This will launch the Claude Code interface.
  • You’ll be prompted to select your preferred terminal theme.
  • Pick 1. Dark mode (recommended for most devs).

Step 14: Claude Code Welcome Banner

  • Claude prints a large welcome message.
  • It confirms that you’ve launched Claude Code in your terminal.
  • This indicates you’re running on a fresh install or after /terminal-setup.

Step 15: Choose Login Method

Authenticate Claude Code usage

Claude Code supports two authentication methods:

  • Anthropic Console (API key billing)
  • Claude app login (for Max subscription users)

Choose the one that matches your access. If you have a Claude Max subscription, go with the second option.

Step 16: Login Successful

Authenticate and connect Claude Code with your account

You’ve logged in.
This screen confirms successful login to the Claude service.
Press Enter to continue setup.

Step 17: Claude Code IDE Integration + Startup Confirmation

Claude is now fully embedded in VS Code

This screen confirms:

  • The Claude Code VS Code extension is live (v1.0.2)
  • You can:
    • Press Cmd + Esc to launch Claude Code input bar
    • Apply file diffs right in the editor
    • Use Ctrl + Alt + K to insert file references

You’ve now completed terminal setup + IDE connection!

What You Can Do from Here

Use Claude as your pair programmer. Try:

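# Inside an interactive Claude Code session (slash command):
/init                            # Generates a CLAUDE.md file for the project

# From your regular shell (one-off, non-interactive prompts):
claude -p "Write a unit test for login.js"
claude -p "Summarize the purpose of this repo"
claude -p "Optimize this loop using Python best practices"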
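
Because claude -p prints its answer and exits, it can be wrapped in a small shell function for repeated use. The sketch below sends a file to the CLI on stdin along with a prompt. The function name ask_about_file and the CLAUDE_BIN variable are invented for this example, and it assumes the CLI reads piped input in print mode.

```shell
#!/usr/bin/env bash
# CLAUDE_BIN lets you point at a different binary (or a stub for testing).
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

# ask_about_file FILE PROMPT
# Sends FILE on stdin and PROMPT via -p (non-interactive print mode).
ask_about_file() {
  local file="$1" prompt="$2"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  "$CLAUDE_BIN" -p "$prompt" < "$file"
}

# Example (requires a logged-in Claude Code install):
#   ask_about_file login.js "Write a unit test for this file"
```

Keeping the binary name in a variable makes the wrapper easy to dry-run with a stand-in command before spending real API usage.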

Conclusion

You’re all set. Claude Code is live, running smoothly on your GPU VM, and ready to dive deep into your projects. Whether it’s writing, reviewing, or refactoring code—this setup helps you stay in flow and ship faster.

Just open your terminal or VS Code and run:
claude
