Ayush kumar for NodeShift

Posted on May 24

Claude 4: Opus vs Sonnet, Benchmarks, and Dev Workflow with Claude Code

#claude4 #ai #llm

Today, Anthropic unveiled Claude Opus 4 and Claude Sonnet 4, redefining what’s possible in software engineering, coding precision, and tool-based thinking. Claude Opus 4 stands out as the most advanced model for developers, consistently delivering top-tier results on long, uninterrupted workflows. With a commanding 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, it handles hours-long, multi-step challenges with a level of consistency that was previously out of reach. Claude Sonnet 4, meanwhile, is a well-balanced upgrade over Claude Sonnet 3.7: it scores a standout 72.7% on SWE-bench Verified and brings sharper reasoning, better code navigation, and more accurate responses across coding scenarios.

These models aren’t just faster; they’re smarter, more focused, and more practical in real-world applications. Through Claude Code, developers can now use them inside VS Code and JetBrains, with background task execution, GitHub integrations, and inline code suggestions. With parallel tool execution, precise instruction following, and long-term memory through file-based context, Claude 4 models introduce a powerful shift in how people build and reason through technical problems.

Resource

GitHub
Link: https://github.com/anthropics/claude-code

Figure: Claude 4 models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks.

Figure: Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. Methodology for both charts is summarized in the next section.

How Claude 4 Sets New Standards in Performance Benchmarks

The performance results shared for Claude Opus 4 and Claude Sonnet 4 come from a documented evaluation process designed to mirror real-world usage. Both models were tested on a blend of immediate-response tasks and extended thinking tasks, which allow deeper reasoning with a thinking budget of up to 64,000 tokens. For coding-specific benchmarks like SWE-bench Verified and Terminal-bench, the models worked without extended thinking, operating under tightly scoped single-attempt conditions with two core tools: a bash shell and a file editor that works through string replacements. Claude 4 results are reported on the full set of 500 SWE-bench Verified problems, while OpenAI’s scores reflect a smaller 477-problem subset.

For extended thinking benchmarks such as GPQA Diamond, TAU-bench, MMMLU, and AIME, performance improved markedly when the models were encouraged to reason step by step using tool feedback and parallel workflows. Notably, TAU-bench scores were gathered with longer sequences and additional step capacity, allowing the models to better plan, reason, and refine their outputs over more iterations. The high-compute SWE-bench Verified results used parallel test-time compute: multiple attempts were sampled, patches that broke the repository’s visible regression tests were discarded, and an internal scoring model selected the best remaining candidate. This produced peak scores of 79.4% for Opus 4 and 80.2% for Sonnet 4. These scores don’t just represent raw accuracy; they reflect a shift in how complex software and reasoning tasks are approached at scale.

Claude Opus 4 — Built for Depth, Focus, and Endurance

Claude Opus 4 represents a major leap forward in building digital systems that can handle deep, uninterrupted thinking. Designed for complex, high-stakes work, it excels at tasks that demand multiple steps, structured logic, and long attention spans. Whether it’s a seven-hour engineering workflow, a legal audit across thousands of documents, or building systems that need to remember and evolve over time—Opus 4 stays locked in, delivering results with clarity, structure, and stamina. It’s not just fast; it’s deliberate, organized, and capable of picking up where it left off. With built-in memory capabilities and precision reasoning, Opus 4 unlocks new workflows where sustained effort matters.

Where Claude Opus 4 Shines:

- Large-Scale Development Tasks: Refactor complex codebases, migrate architectures, or build out full-stack systems from scratch with reliable flow and structure.
- Process Automation for Knowledge Work: Set up digital workflows to handle multi-step processes like legal research, compliance audits, or financial reporting reviews.
- Research with Recall: Analyze scattered documents (think whitepapers, case files, or filings) and bring structure to unstructured data over many sessions.
- Persistent Digital Collaborators: Build tools that remember what happened last week, summarize what’s changed, and help teams stay aligned across long-term projects.
- Crafting Long-Form Content with Precision: Write whitepapers, detailed documentation, or thoughtful strategy memos with coherence and fluency across several pages.

Claude Sonnet 4 — Fast, Reliable Thinking for Daily Ops and Scalable Workflows

Claude Sonnet 4 is built for high-speed, high-volume tasks—ideal for businesses that need clarity, consistency, and responsiveness at scale. It delivers strong reasoning and crisp output without sacrificing speed, making it a perfect fit for real-time interactions and workflow automation. Whether you’re building systems that need to respond instantly to users or engines that process large volumes of content, Sonnet 4 is tuned for performance under pressure. It’s efficient, scalable, and ready to plug into fast-moving operations—whether in customer service, dev teams, or enterprise strategy.

Where Claude Sonnet 4 Excels:

- Real-Time Digital Support: Power chat-based customer experiences, onboarding flows, or internal tools that deliver quick, reliable answers every time.
- Agile Development Help: Speed up code reviews, squash bugs, and wire up APIs with near-instant responses and accurate suggestions.
- Rapid Insights & Analysis: Scan through dashboards, trends, or competitor reports and get distilled summaries that save hours of manual digging.
- Mass Content Workflows: Create, format, and analyze everything from campaign assets to survey responses, at scale and without sacrificing quality.

Step-by-Step Process to Install Anthropic Claude Code Locally

For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.

Follow the account setup process and provide the necessary details and information.

Step 2: Create a GPU Node (Virtual Machine)

GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.

Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to create and deploy your first Virtual Machine.

Step 3: Select a Model, Region, and Storage

In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.

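We will use 1 x RTX A6000 GPU for this tutorial. Keep in mind that Claude Code sends its requests to Anthropic’s hosted models and does not run inference on the local GPU, so a more affordable GPU, or even a CPU-only machine, is enough if Claude Code is all you need; a larger GPU only pays off if you also plan to run your own GPU workloads on the node.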

Step 4: Select Authentication Method

There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.

Step 5: Choose an Image

Next, you will need to choose an image for your Virtual Machine. We will deploy Claude Code on an NVIDIA CUDA image. CUDA is NVIDIA’s proprietary, closed-source parallel computing platform; Claude Code does not depend on it (it only needs Node.js), but the image leaves the GPU ready for any other workloads you run on the node.

After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.

Step 6: Virtual Machine Successfully Deployed

You will get visual confirmation that your node is up and running.

Step 7: Connect to GPUs using SSH

NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.

Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.

Now open your terminal and paste the proxy SSH or direct SSH connection command from that page to log in to the VM.

Step 8: Install Node.js

Run the following commands to install Node.js 20 from the NodeSource repository:

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Step 9: Confirm Installation

Run the following commands to confirm the installation:

node -v
npm -v

You should see versions like:

v20.12.2
10.x.x
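Claude Code requires Node.js 18 or newer, so rather than eyeballing the output it can help to check the major version in a script. A minimal bash sketch; `node_major` and `check_node_version` are hypothetical helpers written for this tutorial, not part of Node.js or Claude Code:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Node.js or Claude Code) for checking
# that the installed Node.js meets Claude Code's minimum major version (18).

# Extract the major version from a string like "v20.12.2" -> "20".
node_major() {
  local version="${1#v}"          # drop the leading "v"
  printf '%s\n' "${version%%.*}"  # keep everything before the first "."
}

# Return 0 if the version string's major number is >= the minimum (default 18).
check_node_version() {
  local version="$1" min_major="${2:-18}"
  local major
  major="$(node_major "$version")"
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= min_major ))
}

# Usage on the VM (commented out so the file can be sourced safely):
# check_node_version "$(node -v)" && echo "Node.js OK" || echo "Node.js 18+ required"
```

Parsing with parameter expansion keeps the check dependency-free, so it runs on a fresh VM before anything else is installed.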

Step 10: Install Claude Code

Run the following command to install Claude Code:

npm install -g @anthropic-ai/claude-code
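After a global npm install, the `claude` binary must be on your `PATH` before the next step works. The sketch below is a hypothetical pre-flight helper (`require_cmds` is our own name) that reports which of `node`, `npm`, and `claude` cannot be found. If only `claude` is missing after a successful install, npm’s global bin directory (the `bin` folder under the path printed by `npm prefix -g`) is probably not on your `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: print each command that is not found on
# PATH and return non-zero if any of them are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the VM (commented out so the file can be sourced safely):
# require_cmds node npm claude && echo "all tools found"
# If only "claude" is missing, check that "$(npm prefix -g)/bin" is on PATH.
```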

Step 11: Launch It in Terminal

Run the following command to launch Claude Code:
claude

Step 12: Connect to your GPU VM using Remote SSH

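Open VS Code on your Mac (with Microsoft’s Remote - SSH extension installed).
Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
Select your configured host (claude-vm), i.e. the host alias defined in your local ~/.ssh/config.
Once connected, you’ll see SSH: followed by your host (116.127.115.18 in this example) in the bottom-left status bar.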
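Remote-SSH picks up host aliases such as `claude-vm` from your local `~/.ssh/config`. A minimal sketch of such an entry, printed by a hypothetical bash helper (`ssh_host_entry`); the IP is the one used in this step, while the user `root`, port `22`, and key path `~/.ssh/id_ed25519` are assumptions, so substitute the values shown on your node’s Connect page:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an ~/.ssh/config stanza for VS Code Remote-SSH.
# Arguments: host alias, host name/IP, user, port, identity file.
ssh_host_entry() {
  local host_alias="$1" host="$2" user="$3" port="$4" key="$5"
  printf 'Host %s\n' "$host_alias"
  printf '    HostName %s\n' "$host"
  printf '    User %s\n' "$user"
  printf '    Port %s\n' "$port"
  printf '    IdentityFile %s\n' "$key"
}

# Usage on your Mac (user, port, and key path are assumptions):
# ssh_host_entry claude-vm 116.127.115.18 root 22 ~/.ssh/id_ed25519 >> ~/.ssh/config
# ssh claude-vm   # quick check from a plain terminal before connecting from VS Code
```

Testing the alias with plain `ssh` first separates SSH problems from VS Code problems.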

Step 13: Claude Code Initial Launch in VS Code Terminal

Open a terminal in VS Code (Terminal > New Terminal); because you are connected over Remote-SSH, it runs on the VM. Then execute the following command:
claude

This will launch the Claude Code interface.
You’ll be prompted to select your preferred terminal theme.
Pick option 1, Dark mode (recommended for most developers).

Step 14: Claude Code Welcome Banner

Claude prints a large welcome message.
It confirms that you’ve launched Claude Code in your terminal.
This indicates you’re running on a fresh install or after /terminal-setup.

Step 15: Choose Login Method

Authenticate Claude Code usage

Claude Code supports two authentication methods:

1. Anthropic Console (API key billing)
2. Claude app login (for Max subscription users)

Choose the one that matches your access. If you have a paid Claude Max subscription, choose option 2; otherwise, choose option 1 and pay for usage through the Anthropic Console.
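If you go with API key billing, Claude Code can also use a key supplied through the `ANTHROPIC_API_KEY` environment variable, which is convenient on a headless VM (on an interactive launch it may ask you to confirm using the detected key). The sketch below is a hypothetical helper (`check_anthropic_key`); its `sk-ant-` prefix test is a heuristic assumption based on the current key format and does not verify that the key is valid or active:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if ANTHROPIC_API_KEY is set and starts with
# "sk-ant-" (a format heuristic only; the key is not checked against the API).
check_anthropic_key() {
  local key="${ANTHROPIC_API_KEY:-}"
  if [[ -z "$key" ]]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
  if [[ "$key" != sk-ant-* ]]; then
    echo "ANTHROPIC_API_KEY does not look like an Anthropic key" >&2
    return 1
  fi
  return 0
}

# Usage on the VM (replace the placeholder with a key from the Anthropic Console;
# add the export line to ~/.bashrc to keep it across sessions):
# export ANTHROPIC_API_KEY="<your-api-key>"
# check_anthropic_key && claude
```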

Step 16: Login Successful

Authenticate and connect Claude Code with your account

You’ve logged in.
This screen confirms successful login to the Claude service.
Press Enter to continue setup.

Step 17: Claude Code IDE Integration + Startup Confirmation

Claude is now fully embedded in VS Code

This screen confirms:

The Claude Code VS Code extension is live (v1.0.2)
You can:
- Press Cmd + Esc (Ctrl + Esc on Windows/Linux) to open Claude Code from the editor
- Apply file diffs right in the editor
- Use Cmd + Option + K (Ctrl + Alt + K on Windows/Linux) to insert file references

You’ve now completed terminal setup and IDE connection!

What You Can Do from Here

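Use Claude as your pair programmer. Inside an interactive claude session, start with:

/init                            # Generates a CLAUDE.md file describing your project

From your shell, run from the project’s root directory, you can also send one-off prompts in print mode (-p):

claude -p "Write a unit test for login.js"
claude -p "Summarize the purpose of this repo"
claude -p "Optimize this loop using Python best practices"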
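A prompt such as "Optimize this loop" needs the code itself as context. In print mode Claude Code also reads piped standard input, so one option is to feed the file in directly. A minimal sketch with a hypothetical wrapper (`ask_about_file`); the `CLAUDE_BIN` override exists only so the function can be exercised with a stub instead of the real CLI, and `loop.py` in the usage line is a placeholder file name:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: send a file's contents to Claude Code in print mode.
# CLAUDE_BIN defaults to the real "claude" binary; overriding it lets you
# substitute a stub command when testing.
ask_about_file() {
  local file="$1" prompt="$2"
  if [[ ! -r "$file" ]]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  "${CLAUDE_BIN:-claude}" -p "$prompt" < "$file"
}

# Usage on the VM (loop.py is a placeholder):
# ask_about_file loop.py "Optimize this loop using Python best practices"
# Equivalent one-liner:
# cat loop.py | claude -p "Optimize this loop using Python best practices"
```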

Conclusion

You’re all set. Claude Code is live, running smoothly on your GPU VM, and ready to dive deep into your projects. Whether you’re writing, reviewing, or refactoring code, this setup helps you stay in flow and ship faster.

Just open your terminal or VS Code and run:
claude

Top comments (1)

Alejandro Villamarin • Jul 15

I wonder why would you want to install Claude Code in a VM? What do you get vs installing it in your laptop? Am I missing something? Is not like you're pointing Claude Code to a local model running in that VM...that would make more sense, you're just proxying here nope?