Forem: Sabita kumari

50 Underrated Claude Tips Nobody Talks About

Sabita kumari — Sun, 12 Apr 2026 21:50:33 +0000

Practical advice for Claude power users — Claude Code, Cowork, prompting hacks, hidden Anthropic resources, and more.

🧠 General Claude Tips

50. CLAUDE.MD Files
Create a markdown file in your project root with project rules, preferences, and context. Claude reads this automatically every session. Most users don't know this exists.

49. Settings → Privacy → Export Data
A simple, quick way to export your Claude data/projects/chats into other LLMs. Great for switching between ChatGPT and Claude.

48. Settings → Connectors
This is how you prompt your favorite apps. Google Calendar & Notion connectors are game-changers for daily workflows.

47. Project Memory
Inside Claude projects, you'll find a "memory" tab. Here you can edit/revise exactly what you want Claude to remember.

46. Expert UI Design
When you see a UI design you like, take a screenshot and prompt Claude to create a Skill that will replicate the exact design schemes.

45. Avoid Claude Rambling
In any chat, click "+" → "Style" and set Claude to "Concise."

44. Haiku 4.5
Claude's latest and fastest model — the best Claude model for daily tasks, especially great in the Chrome extension for quick replies.

43. Claude Chrome Extension
Most people don't know this exists. An easy way to have Claude live in your sidebar and take control of your browser for tasks.

42. PDF Creation
Opus/Sonnet are hands-down the best and most consistent LLMs for PDF generation — great for project context files, converting images/text to PDF, and more.

41. Claude App
Download the Claude Mac app. It adds Claude to your top menu bar for instant access at any time — and gives you access to Cowork if you're on a Pro or higher plan.

💻 Claude Code

40. Plan Mode
Shift + Tab to enter Plan mode. This is the mode you want for planning projects before writing a single line of code.

39. /doctor Command
/doctor diagnoses environment issues, missing dependencies, and configuration problems. Run this first when things break.

38. Claude Code Best Practices Guide
Anthropic's official guide: claude-code-best-practices

37. Learn / Commands
Highly recommended — slash commands are underused. Reference repo: wshobson/commands

36. Solve Step by Step
Claude is good at long tasks, but not perfect. Go step-by-step for every build.

35. Rules File
Create a .md file with "rules" you want your coding partner to follow every session.

34. /compact Command
Learn to run /compact manually to keep your context clean without burning tokens.

33. MCP Connections
Learn the ins and outs of MCP. If Claude can't do something natively, an MCP connection almost certainly can help.

32. Claude Skills
One of the most underrated features. Skills marketplace with 60,000+ options: skillsmp.com

31. Claude Code for Everyone
A course that teaches you Claude Code in Claude Code: ccforeveryone.com

🪄 Prompting Hacks

30. The Best Prompt Format
Claude responds very well to this structure:

[Role] + [Task] + [Context] + [Constraints] + [Question]

29. Notion Prompt Library
Set up a "Prompt Library" database in Notion via the connector. Store your most-used prompts and connect them to other LLMs too.

28. Extended Thinking
Enable Extended Thinking for all advanced workflows and complex tasks.

27. "Think First" Pattern
Add this to all your project instructions: "Think deeply about all requests before responding."

26. Role Assignment
Basic, but it works every time. "You are a [insert role]."

25. Anthropic Prompt Library
Official prompts that just work: Prompt Library

24. Prompting Best Practices for Claude 4
Anthropic's official guide to prompting the latest Claude models: Claude 4 Best Practices

23. JSON / XML / Markdown
All great formats for structured, effective prompting.

22. Interactive Prompt Maker
A public Claude artifact that helps you optimise your prompts: Try it here

21. The Hard Stop
An easy way to manage your Claude usage limits. Tell Claude: "Hard stop when ___."

🖥️ Claude Cowork

20. Desktop Cleanse
Use Claude Cowork to do a deep clean of your entire desktop — with full folder access, it can reorganize everything systematically.

19. Batch Rename
Rename files with consistent patterns like YYYY-MM-DD formatting across entire folders in seconds.

18. Plan First, Execute Second
Cowork burns through credits. Use standard chat with Sonnet to plan, then switch to Cowork to execute.

17. Isolate Access with Folders
Create a dedicated "Cowork" folder and grant Claude access only to that directory — not your entire Documents folder.

16. Clean Up Safely
If using Cowork to delete files, always end your prompt with: "Confirm with me before deleting."

15. Claude in Chrome
Enable this to let Claude handle browser-based work via Claude Desktop.

14. Multiple Parallel Threads
Cowork can run multiple tasks simultaneously — start a research project, a document formatting job, and a data analysis task all in parallel.

13. Always Back Up First
Claude Cowork can take destructive actions. Always back up critical data before running the agent.

12. Teach Skills by Recording
Cowork can learn through demonstration. Record your browser to teach Claude custom workflows.

11. Cowork Safety Guide
Read this before using Cowork seriously: Using Cowork Safely

✅ Best Practices & Final Tips

10. Follow Boris' (Claude Code Creator) Setup
Boris Cherny, the creator of Claude Code, has shared his personal Claude setup. Worth studying.

9. CLAUDE.MD Guide
An excellent guide on how to craft effective CLAUDE.MD files: humanlayer.dev

8. Claude Subreddits
Stay current with the community: r/Claude · r/ClaudeCode · r/Anthropic

7. Use Google AI Studio for Prototyping
Claude burns tokens fast while coding. If you're in prototype mode, use Google AI Studio first — switch to Claude once most of the code is written.

6. Clear Context After 3+ Failures
If Claude is confused after 3+ failed attempts, clear context and re-explain. Otherwise, you're just burning tokens.

5. Session Logs / Claude Journal
Keep a "Claude Journal" — review past sessions to see what prompts worked. This is how you improve over time.

4. Ralph Plugin
A tool that enables running Claude in an autonomous loop — great for long-running tasks.

3. Get on a Paid Plan
If there's one AI tool worth paying for right now, it's Claude. Get on a paid plan while pricing is still reasonable.

2. Voice → ChatGPT → Claude
Voice prompt through ChatGPT, then export the transcript to Claude. This method is exponentially better than shallow text prompting and avoids wasting tokens on vague inputs.

1. Model Stacking
Claude works best when stacked with other models. Use Grok to scrape X, then import to Claude. Use AI Studio to prototype, then refine in Claude. Combine the strengths of multiple tools to get your best outputs.

Which of these tips was new to you? Drop a comment below — I'd love to know which ones land! 🚀

I'm Sabita, an AI/ML engineer writing about LangGraph, RAG systems, and everything I learn while building with AI. Follow along if that's your thing.

Large Language Models, Explained Like You're a Curious Human

Sabita kumari — Fri, 10 Apr 2026 18:38:39 +0000

Everything you need to know about how ChatGPT-style AI actually works.

What Actually Is a Large Language Model?

Strip away the hype and an LLM is surprisingly simple in structure. It boils down to two files sitting on a hard drive:

A very large file of numbers — these are the "parameters" (or weights) of the neural network. Think of them as billions of tiny dials that have been carefully tuned.
A small file of code — this is the algorithm that reads those numbers and actually produces text. It can be as short as ~500 lines of C code.

That's it. Meta's Llama 2 70B model, for example, is a 140 GB parameter file plus a tiny run script. Together, they can run on a regular MacBook — no internet needed.

┌─────────────────────────────┐     ┌─────────────────────────┐
│     📦 Parameters File      │     │      ⚙️ Run Code         │
│                             │  +  │                         │
│    140 GB of numbers        │     │   ~500 lines of C       │
│  Billions of tiny "dials"   │     │  The "engine" that      │
│  encoding world knowledge   │     │  reads the dials        │
└─────────────────────────────┘     └─────────────────────────┘

🧠 Everyday Analogy: Imagine a piano with 70 billion keys. The parameters file is a sheet of music that tells you exactly how hard to press each key. The run code is the pianist who reads the sheet and plays. Together, they produce language.

How Is an LLM "Trained"?

Training is the expensive, one-time process of figuring out the right value for every single parameter. Here's the recipe for a model like Llama 2 70B:

Training Recipe:
📚 ~10 TB of internet text (books, articles, code, forums…)
🖥️ A cluster of ~6,000 GPUs running for ~12 days
💰 Roughly $2 million in compute costs

During training, the model is given a sentence with one word missing and asked: "What comes next?" It guesses, checks the real answer, and adjusts its billions of dials slightly. Repeat this trillions of times, and the model becomes remarkably good at predicting the next word — and in doing so, it absorbs enormous amounts of factual knowledge, grammar, reasoning patterns, and even a bit of common sense.

┌──────────────┐        ┌──────────────────┐        ┌─────────────────┐
│   🌐 Internet │ -----> │   🔥 Training     │ -----> │  🧠 Trained      │
│              │        │                  │        │     Model       │
│  ~10 TB text │        │  6,000 GPUs      │        │  140 GB params  │
│              │        │  12 days · $2M   │        │  Compressed     │
│              │        │  "Predict next   │        │  knowledge      │
│              │        │   word"          │        │                 │
└──────────────┘        └──────────────────┘        └─────────────────┘

        Lossy compression: 10 TB of knowledge → 140 GB of parameters

🧠 Everyday Analogy: Think of training like a student reading the entire internet and taking the world's longest fill-in-the-blank exam. By forcing itself to predict missing words, it absorbs facts, writing styles, logic, and languages. The process is a lossy compression — like squeezing a library into a zip file. Most of the knowledge is kept, but some details get lost or garbled.

The Three Stages of Building an AI Assistant

A freshly trained model isn't ready to be a helpful chatbot. It goes through up to three stages to become the assistant you know from products like ChatGPT or Claude.

Stage 1 — Pre-training

The model reads a massive chunk of the internet and learns to predict the next word. At this point it's like a very well-read parrot: it can generate text that sounds like the internet, but it doesn't know how to hold a conversation. Ask it a question and it might just generate more questions, or make up a fake Wikipedia article. This is what people call "hallucination" — the model is dreaming plausible-sounding text rather than answering you.

Stage 2 — Fine-tuning (Alignment)

Human labelers write thousands of ideal question-and-answer pairs. The model is then trained on this curated dataset, teaching it to behave like a helpful assistant: answer directly, refuse harmful requests, and follow instructions. Think of it as finishing school for the parrot — it learns manners and format.

Stage 3 — RLHF (Optional Polish)

Reinforcement Learning from Human Feedback. Humans are shown two or more model answers and asked "which is better?" These preferences are used to further nudge the model toward responses people actually prefer. It's like letting a restaurant taste-tester rank dishes so the chef improves over time.

┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│  1️⃣ Pre-training │ ----> │  2️⃣ Fine-tuning  │ ----> │  3️⃣ RLHF         │
│                 │       │                 │       │                 │
│ Reads the       │       │ Learns Q&A      │       │ Human           │
│ internet        │       │ format          │       │ preferences     │
│ → Base model    │       │ → Assistant     │       │ → Polished      │
│                 │       │   model         │       │   assistant     │
└─────────────────┘       └─────────────────┘       └─────────────────┘

              Each stage builds on the previous one

Scaling Laws: Bigger = Smarter (Predictably)

One of the most surprising discoveries in AI is that LLM performance follows predictable scaling laws. There are two main knobs you can turn:

N — the number of parameters (model size)
D — the amount of training data

Crank either one up, and the model's ability to predict the next word improves in a smooth, predictable curve. And because next-word prediction accuracy correlates with reasoning ability, the model gets better at all sorts of tasks — math, coding, history, common sense — almost "for free," without being specifically taught those skills.

🧠 Everyday Analogy: It's like a student who reads more books and has a bigger brain — they get better at everything, not just one subject. Double the books and brain size, and you can predict roughly how much smarter they'll get.

Tool Use & Multimodality: LLMs Learn to Use Tools

Modern LLMs aren't limited to text-in, text-out. They're gaining abilities that make them feel more like capable assistants:

🌐 Web browsing — searching for up-to-date information
🧮 Calculator / code interpreter — running Python to crunch numbers or make charts
👁️ Vision — understanding images, screenshots, diagrams
🎤 Audio — hearing speech and speaking back

This means the model can look at a photo of a hand-drawn wireframe, write the HTML code for it, search the web for a library it needs, and run the code to show you a working preview — all in one conversation.

The "LLM Operating System" Vision

Here's a powerful way to think about where this is all heading. Instead of viewing an LLM as a chatbot, think of it as the kernel (the core brain) of a new kind of operating system:

                        ┌──────────────────┐
                        │   🧠 Memory       │
                        │  Context window   │
                        └────────┬─────────┘
                                 │
  ┌──────────────────┐   ┌──────┴───────┐   ┌──────────────────┐
  │  📁 Local Files   │───│  LLM KERNEL  │───│   🔧 Tools       │
  │  Documents, data  │   │  Coordinates │   │  Browser, calc,  │
  └──────────────────┘   │  everything  │   │  code            │
                         │  like a CPU   │   └──────────────────┘
                         └──────┬───────┘
                                │
            ┌───────────────────┼───────────────────┐
            │                                       │
   ┌────────┴─────────┐                 ┌───────────┴────────┐
   │  🌐 Internet      │                 │  👁️🎤 Senses       │
   │  Search, APIs     │                 │  Vision, audio,    │
   └──────────────────┘                  │  speech            │
                                         └────────────────────┘

Just as Windows or macOS coordinates your screen, keyboard, files, and apps, the LLM OS coordinates memory, tools, files, and senses to solve whatever problem you throw at it. The conversation window is its RAM; the internet is its hard drive; the code interpreter is its app store.

The Dark Side: Security Challenges

This new paradigm is powerful — but it also opens up entirely new categories of attacks. Here are the four biggest threats researchers are racing to solve:

🎭 Jailbreak Attacks

What it is: Tricking the model into ignoring its safety rules. For example, asking it to roleplay as a character who "happens" to reveal dangerous information, or encoding a harmful question in Base64 so the filter doesn't catch it.

Real-world analogy: Convincing a security guard to let you in by wearing a costume.

🧬 Adversarial Attacks

What it is: Specially crafted "gibberish" text suffixes or invisible noise patterns in images that exploit mathematical weaknesses in the neural network, forcing it to produce harmful output.

Real-world analogy: A dog whistle — sounds like nothing to humans, but the model "hears" a command.

💉 Prompt Injection

What it is: Hiding secret instructions in web pages or documents (e.g., in white text on a white background) that the model reads and obeys when it browses or processes files. The model can't easily tell "user instructions" from "content instructions."

Real-world analogy: Slipping a forged memo into someone's inbox so they follow fake orders.

☠️ Data Poisoning

What it is: An attacker publishes carefully crafted text on the internet. When that text gets swept into the model's training data, it plants a hidden "backdoor" — a trigger phrase that makes the model misbehave in a specific way.

Real-world analogy: Contaminating ingredients at the factory so every product made later has a hidden flaw.

The bottom line: AI security is an active cat-and-mouse game. Researchers discover attacks, build defenses, and then attackers find new workarounds. These models are empirical artifacts — they work remarkably well, but we don't yet have mathematical proofs of their safety.

Key Takeaways

If you remember just five things from this post, let them be these:

An LLM is two files — a huge parameter file and a tiny run-code file.
Training = compression. The model squeezes the internet's knowledge into its weights by learning to predict the next word.
Three stages turn a raw model into a polished assistant: pre-training, fine-tuning, and RLHF.
Scaling laws mean that bigger models + more data = predictably better performance.
Security is unsolved. Jailbreaks, adversarial attacks, prompt injection, and data poisoning are active open problems.

We're at the beginning of something genuinely new — a technology that compresses human knowledge into a portable, runnable format and can coordinate tools, senses, and memory to solve problems. The potential is enormous, and so are the challenges. Understanding how it works is the first step to using it well and thinking clearly about where it's headed.

Thanks for reading! If you found this helpful, drop a ❤️ and follow for more AI explainers.

System Design From Scratch: The Components That Actually Run Production Systems

Sabita kumari — Thu, 09 Apr 2026 20:53:10 +0000

You open amazon.com. A product page loads in under a second. Behind that single page load, your request hit a DNS server, bounced through a CDN edge node, passed a rate limiter, got distributed by a load balancer, routed by an API gateway, processed by a microservice, checked a Redis cache, and maybe — maybe — touched an actual database.

That's system design. Not theory. Not whiteboard boxes. The actual machinery that keeps websites alive when millions of people use them at the same time.

Here's how each piece works, why it exists, and when you need it.

The Client-Server Relationship and DNS

Everything starts with two things: a client and a server.

The client is whatever device makes the request — your phone, laptop, a smart fridge, doesn't matter. The server is a machine that runs 24/7 with a public IP address, sitting in a data center somewhere, waiting for requests.

The problem is that IP addresses look like 10.5.8.2. Nobody remembers that. So we have DNS — the Domain Name System — which is basically a global phone book. You type amazon.com, your browser asks a DNS server "what's the IP for this?", and the DNS server responds with 10.5.8.2. Your browser then connects directly to that IP.

That lookup process is called DNS resolution. It happens before anything else, every single time.

Vertical vs. Horizontal Scaling

Your server has 2 CPUs and 4 GB of RAM. Traffic grows. The machine starts choking. What do you do?

Vertical scaling (scale up): Upgrade the machine. Add more RAM, more CPU cores, faster disks. The problem? You usually need to restart the machine to do this. That means downtime. For a hobby project, fine. For Amazon during Black Friday, absolutely not. There's also a hard ceiling — you can only make a single machine so powerful before physics says no.

Horizontal scaling (scale out): Add more machines. Instead of one beefy server, run three identical servers in parallel. If one goes down, the other two keep serving traffic. No restart needed. No ceiling — just add another machine.

This is why every serious production system uses horizontal scaling. You get zero-downtime deployments, redundancy if a server dies, and linear capacity growth.

But horizontal scaling creates a new problem: if you have three servers, how does the client know which one to talk to?

Load Balancers

A load balancer sits in front of your servers and distributes incoming traffic across them. The client never talks to the servers directly — it talks to the load balancer, and the load balancer decides which server handles each request.

The simplest distribution algorithm is Round Robin: request 1 goes to server A, request 2 to server B, request 3 to server C, then back to A. More sophisticated load balancers also run health checks — they periodically ping each server, and if one stops responding, they stop sending it traffic until it recovers.

In AWS, this is the Elastic Load Balancer (ELB). Most teams don't build their own. Managed load balancers handle SSL termination, sticky sessions, and connection draining — so your team can focus on the application.

API Gateways and Microservices

As your application grows, you stop running everything in one monolithic codebase. Authentication becomes its own service. Orders become their own service. Payments get their own service. This is microservice architecture — each business function runs independently, with its own database, its own deployment pipeline, and its own team.

The question becomes: how does the client know which service to call? It doesn't. That's what the API gateway handles.

An API gateway is a single entry point that routes requests based on the URL path. A request to /auth goes to the authentication service. A request to /orders goes to the order service. A request to /payments goes to the payment service. The client only knows about one URL — the gateway handles the rest.

It also acts as a reverse proxy, meaning the internal services are never exposed to the public internet. The gateway is the only thing with a public IP. Everything behind it is internal.

The load balancer, API gateway, and microservices flow:

Asynchronous Communication and Queues

Some tasks don't need to happen in real time. If a user places an order and the system needs to send a confirmation email, that email doesn't need to go out in the same millisecond. It can happen 2 seconds later. Or 10 seconds later. The user won't notice.

This is where asynchronous communication comes in. Instead of the main server sending the email itself (and blocking until it's done), it pushes a task into a queue — a first-in, first-out list of jobs waiting to be processed. Background workers pull tasks from the queue at their own pace.

AWS SQS is the most common managed queue. The pattern is simple: producer pushes a message, consumer pulls it, processes it, and acknowledges it. If the consumer crashes before acknowledging, the message goes back into the queue for another worker to pick up.

This matters when the task is heavy. Imagine sending a million promotional emails. If the main server tried to send them synchronously, it would be stuck for hours. With a queue and 10 background workers, each worker handles 100,000 emails in parallel. The main server moved on the instant it pushed the tasks.

Event-Driven and Fan-Out Architecture

Here's a common scenario: a payment succeeds, and you need to send an email confirmation, an SMS, and a WhatsApp message. Three actions from one event.

You could have the payment service call each notification system directly. But that creates tight coupling — if the SMS service is slow, it blocks the payment response. If someone adds a push notification later, you have to modify the payment service code.

The better approach is pub-sub (publish-subscribe). The payment service publishes a "payment succeeded" event to a topic (AWS SNS, for example). Three separate queues are subscribed to that topic — one for email, one for SMS, one for WhatsApp. Each queue has its own worker.

This is a fan-out architecture. One event fans out to multiple independent channels. The critical benefit: if the SMS worker crashes, it retries on its own. The email and WhatsApp workers don't know or care. No cascading failures. Each channel is fully independent.

The async processing and fan-out architecture:

Rate Limiting

Without rate limiting, a single bad actor (or a botnet) can flood your servers with millions of requests and take your system down. This is a DDoS attack, and it happens constantly.

Rate limiting caps the number of requests a user or IP can make within a time window. Two common algorithms:

Token bucket: Each user has a bucket that fills with tokens at a fixed rate (say, 5 per second). Each request costs one token. If the bucket is empty, the request is rejected. This allows short bursts — if a user hasn't made requests in a while, their bucket is full and they can fire several at once.

Leaky bucket: Requests enter a queue that drains at a fixed rate. Excess requests overflow and get dropped. This produces a perfectly steady output regardless of input burstiness.

Most production systems implement rate limiting at the load balancer or API gateway level, before requests even reach your services.

Database Scaling: Read Replicas

Your database is a single machine. Most web applications read far more than they write — a product page might get viewed 10,000 times for every one inventory update. So the database bottleneck is usually reads, not writes.

The fix is read replicas. You keep one primary node that handles all write operations. Every write gets replicated to one or more read replicas. Your application sends reads to replicas and writes to the primary. This spreads the load across multiple machines.

The tradeoff is replication lag — there's a small delay (usually milliseconds) between a write hitting the primary and propagating to replicas. For most applications, this is fine. For financial transactions where you need to read your own write immediately, you route that specific read to the primary.

Caching with Redis

Even with read replicas, database queries take time. A cache sits between your application and the database, storing the results of frequent queries in memory.

Redis is the standard. It's an in-memory key-value store. When your application needs data, it checks Redis first. Cache hit? Return the result instantly — no database query needed. Cache miss? Query the database, store the result in Redis for next time, and return it.

For a product page that gets 50,000 views per hour, this means 1 database query and 49,999 cache hits. The database barely notices.

The hard part of caching is invalidation — knowing when to throw away stale data. If a product's price changes, the cached version is wrong until it expires or gets manually evicted. Most teams use a TTL (time-to-live) of 30 seconds to a few minutes, depending on how stale the data can be.

CDNs and Global Optimization

Your servers are in Virginia. A user in Mumbai is 13,000 km away. Even at the speed of light, that round trip adds latency. For static content — images, CSS files, JavaScript bundles, product photos — there's no reason to fetch them from Virginia every time.

A CDN (Content Delivery Network) copies your static content to edge locations around the world. Amazon CloudFront has edge nodes in Mumbai, London, São Paulo, Tokyo, and dozens of other cities. When the Mumbai user requests a product photo, the CDN serves it from the Mumbai edge — no round trip to Virginia.

CDNs use anycast routing, which means the same IP address resolves to different physical servers depending on the user's location. The network automatically routes each user to the closest edge node.

If the content is already cached at the edge, it's returned immediately. If not, the edge fetches it from the origin server, caches it, and serves it. Future requests from that region hit the cache instead of the origin.

For a global e-commerce site, CDNs cut page load times from seconds to milliseconds for users far from the data center. They also reduce bandwidth costs on the origin server, because most requests never reach it.

Database scaling, caching, and CDN:

Putting It All Together

Here's the complete request flow when someone opens a product page:

DNS resolves amazon.com to an IP address
CDN serves static assets (images, CSS, JS) from the nearest edge
Rate limiter checks if the user has exceeded their request quota
Load balancer picks a healthy server and forwards the request
API gateway routes /products/123 to the product service
Product service checks Redis cache for the product data
Cache miss → query a read replica database
If a purchase happens → payment service publishes an event
Pub-sub fans out to email, SMS, WhatsApp queues
Background workers process each notification independently

Every component exists because a single server running everything stops working at scale. DNS gives you human-friendly addresses. Horizontal scaling gives you redundancy. Load balancers distribute traffic. API gateways route to services. Queues decouple heavy tasks. Caching reduces database load. CDNs cut latency. Rate limiting protects the system.

None of this is optional at scale. It's the reason that page loads in under a second.

50 Claude Code Best Practices Every AI Engineer Should Know

Sabita kumari — Wed, 08 Apr 2026 03:02:33 +0000

50 Claude Code tips to help you build with Claude that nobody talks about.

Over the past 24 hours, I read the new Claude Code best practices document so you don't have to.

New Best Practices for Claude Code

I've extracted all the best practices + added some of my own from personal experience to compile the ultimate list of Claude Code best practices.

This list also includes various Claude Code tools + learning resources.

Rapid-fire style - let's go.

Foundational Tips

50. Clear Task Framing - State exactly what you want Claude to do before anything else.

49. Front Load Instructions - Always put the most important instruction at the very top of the prompt.

48. Give Claude a way to verify its work - Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.

47. Prompt Structure Tip - To make the last few tips practical, I like this prompting structure:
[Role] + [Task] + [Context]

46. Chrome Extension Tip - UI changes can be verified using the Claude Chrome extension. It opens a browser, tests the UI, and iterates until the code works.

45. Explore first, then plan, then code - Research (this process can include other LLMs), then enter Plan Mode, then switch back to normal mode to execute code.

44. Provide specific context in your prompts - The more precise your instructions, the better. Claude can only infer context.

43. Assume Zero Context - Assume Claude knows nothing about your project. Tell it everything it needs to know.

42. Rich Context - Use @ to link files, data, and images.

41. Claude.MD Tip - Run /init to generate a starter CLAUDE.md file for your current project.

Projects & Skills Use

40. Project Instructions - Use project-level instructions to define long-term behavior instead of repeating prompts.

39. Project Memory - Edit the "Memory" tab to control exactly what Claude should retain or ignore over time (this works in projects as well).

38. Claude Skills - Use them to turn repeatable workflows into Skills instead of re-prompting.

37. Skill From Examples - Paste a great output and ask Claude to turn it into a reusable Skill. You can even upload screenshots and ask Claude to replicate it; turn it into a skill (an easy way to create elite skills).

36. Skill Versioning - Duplicate and version Skills as you refine workflows instead of editing live ones.

35. Project Hygiene - Regularly prune memory, files, and instructions to avoid drift.

34. Project Context Bleed - Separate projects for unrelated workstreams to prevent context bleed.

33. Claude Skills Repo - A library of 80,000+ Claude Skills: https://skillsmp.com/

32. Claude Skills Library - A cool website with plug-and-play Skills and more: https://mcpservers.org/claude-skills

31. Project Memory Location - Project memory can be stored in either ./CLAUDE.md or ./.claude/CLAUDE.md.

Underrated Mini Tips (most people don't know about these)

30. Model Stacking - Use other LLMs to plan your projects and generate advanced mega prompts before ever opening Claude Code — this strategy also saves tokens from Plan Mode.

29. Create custom subagents - Define specialized assistants in .claude/agents/ that Claude can delegate to for isolated tasks.

28. Output Scoring - Ask Claude to score its answer against your pre-defined success criteria.

27. Install Plug-ins - Run /plugin to browse the marketplace. Plugins add skills, tools, and integrations without any configuration.

26. Claude Code taught IN Claude Code - A course that teaches you Claude Code directly IN Claude Code: https://ccforeveryone.com/

25. Claude Interviews - For larger projects, have Claude interview you first. Start with a minimal prompt and ask Claude to interview you using the AskUserQuestion tool.

24. Correct Often - Course-correct Claude often. The moment it starts going off track, stop (ESC to stop Claude mid-action).

23. Clear - Run /clear to start a clean session.

22. Rewind - Double-tap ESC or /rewind to open checkpoint menu.

21. Run Multiple Sessions - There are two main ways to run parallel sessions:

Claude Desktop: Manage multiple local sessions visually. Each session gets its own isolated worktree.
Claude Web: Run on Anthropic's secure cloud infrastructure in isolated VMs.

Debugging, Error Handling, Common Failure Patterns

20. Step Isolation - Re-run only the broken step instead of regenerating everything.

19. Error Reproduction - Ask Claude to intentionally reproduce the failure to understand it.

18. Rollback Prompts - Revert to the last known good prompt and reapply changes one at a time.

17. Over-Specified CLAUDE.md - If your CLAUDE.md is too long, Claude ignores half of it because important rules get lost in the noise.

Fix: Ruthlessly prune. If Claude already does something correctly without the instruction, delete it or convert it to a hook.

16. Don't make this mistake - You start with one task, then ask Claude something unrelated, then go back to the first task. Context is full of irrelevant information.

Fix: /clear between unrelated tasks.

15. Over-Correcting - Claude does something wrong, you correct it, it's still wrong, you correct again. Context is polluted with failed approaches.

Fix: After two failed corrections, /clear and write a better initial prompt incorporating what you learned.

14. Step-by-Step Replay - Have Claude walk through how it generated the answer line by line.

13. The Infinite Exploration - You ask Claude to "investigate" something without scoping it. Claude reads hundreds of files, filling the context.

Fix: Scope investigations narrowly or use subagents so the exploration doesn't consume your main context.

12. Debugging Project - Create an AI project dedicated to debugging code (Grok 4 Heavy is good at debugging).

11. Context Window Management - Claude's context window fills up fast. As this happens, Claude may start forgetting earlier instructions. This page will help you eliminate that problem: https://code.claude.com/docs/en/costs#reduce-token-usage

Final Tips

10. Notion Database - Connect your Notion database to Claude to store your best & most commonly used prompts.

9. Learn Claude Code in Action - Anthropic's learning resources: https://www.anthropic.com/learn

8. Claude Courses - Courses from Coursera: https://www.anthropic.com/learn

7. Boris' Set Up - How the creator of Claude Code maximises Claude Code: Boris' Claude Code Setup Cheatsheet

6. Claude Code Best Practices (DOC) - Link to the latest doc: https://code.claude.com/docs/en/best-practices

5. Safe Autonomous Mode - Use claude --dangerously-skip-permissions to bypass all permission checks and let Claude work uninterrupted. This works well for workflows like fixing lint errors or generating boilerplate code.

4. Slow & Steady - Take your time. Especially if building a serious workflow. Plan. Plan. Plan. THEN, execute.

3. Claude Superpowers - A GitHub Repo of Claude Code superpowers: https://github.com/obra/superpowers

2. Hooks - Best for actions that must happen every time with zero exceptions.

1. How to Extend Claude Code - Anthropic's Guide: https://code.claude.com/docs/en/features-overview