Why APIs Fail in Production (And Work Perfectly in Development)

The API works perfectly on your laptop.

You push to production. Within an hour, the error logs are filling up with 500s. Users are complaining. You cannot reproduce any of it locally.

This is the most common pattern we see when we get called in to fix production APIs. The bugs are rarely in the code. They are in the assumptions the code made about the environment.

Here are the seven assumption-failures we see most often, and how to fix each one.

1. You Assumed One Request at a Time

In development, you hit the API yourself. One request. It responds. You hit it again. It responds again.

In production, the API gets hit by ten requests at the same time, then a hundred, then a thousand during a spike. Code that works one-at-a-time falls apart when concurrent.

The most common failure pattern is shared state. A variable defined at the module level that gets mutated during a request. A cache that gets written to without locking. A counter that gets incremented without atomicity. None of this breaks in dev because dev never has concurrent requests. All of it breaks in production.

Fix: assume every request runs in parallel with every other request. Never share mutable state at the module level. Use request-scoped state, atomic operations, or proper locking. Test concurrent behaviour explicitly before deploying.
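To make the shared-state failure concrete, here is a minimal sketch in TypeScript. It assumes a Node-style runtime, where requests interleave at every `await` even though the process has a single thread. The `Mutex` class, the balance helpers, and `simulate` are hypothetical names invented for this illustration, not part of any library.

```typescript
// A minimal promise-chain mutex: tasks run one at a time, in arrival order.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failed task does not block the next.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a slow store (database row, cache entry, remote API).
let storedBalance = 0;
async function readBalance(): Promise<number> {
  await sleep(1);
  return storedBalance;
}
async function writeBalance(value: number): Promise<void> {
  await sleep(1);
  storedBalance = value;
}

// Unsafe: other requests can run between the read and the write.
async function depositUnsafe(amount: number): Promise<void> {
  const current = await readBalance();
  await writeBalance(current + amount);
}

// Safe: the read-modify-write runs as one exclusive section.
const balanceLock = new Mutex();
async function depositSafe(amount: number): Promise<void> {
  await balanceLock.runExclusive(async () => {
    const current = await readBalance();
    await writeBalance(current + amount);
  });
}

// Fires `requests` deposits of 1 at once and returns the final balance.
async function simulate(
  deposit: (amount: number) => Promise<void>,
  requests: number
): Promise<number> {
  storedBalance = 0;
  await Promise.all(Array.from({ length: requests }, () => deposit(1)));
  return storedBalance;
}
```

Calling `simulate(depositUnsafe, 20)` loses updates because every request reads the balance before any request has written it. `simulate(depositSafe, 20)` ends at 20. An in-process lock like this only protects a single process. Once you run several instances, the equivalent guarantee has to come from the database or another shared store.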

2. You Assumed Database Queries Complete Instantly

In development, your database has ten rows. Every query is fast.

In production, the database has ten million rows. A query that ran in 5ms in dev now runs in 4 seconds. Your API request times out. The user sees a 500. The database connection is held the entire time, so the next request gets queued. The queue fills up. Everything cascades.

The failure is almost always missing indexes, but it can also be N+1 queries that did not matter at 10 rows and now make 200 round-trips per request.

Fix: never deploy a new query path without checking the execution plan against production-scale data. If you cannot test against production-scale data, at minimum seed your dev database with realistic volumes. Use query timeouts. Use connection pooling with sensible limits. Monitor slow queries from day one.
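The N+1 pattern is easier to spot when you count round-trips. The sketch below uses a hypothetical in-memory `FakeDb` in place of a real database client, so the types and method names are assumptions made for the example. The batched version corresponds to a single `WHERE id IN (...)` query.

```typescript
type Post = { id: number; authorId: number };
type Author = { id: number; name: string };

// Hypothetical in-memory stand-in for a database client that counts round-trips.
class FakeDb {
  roundTrips = 0;
  private authors: Map<number, Author>;

  constructor(authors: Map<number, Author>) {
    this.authors = authors;
  }

  // One round-trip per author.
  async findAuthorById(id: number): Promise<Author | undefined> {
    this.roundTrips++;
    return this.authors.get(id);
  }

  // One round-trip for the whole set, like SELECT ... WHERE id IN (...).
  async findAuthorsByIds(ids: number[]): Promise<Author[]> {
    this.roundTrips++;
    const found: Author[] = [];
    for (const id of ids) {
      const author = this.authors.get(id);
      if (author) found.push(author);
    }
    return found;
  }
}

// N+1: one query per post. Invisible at 10 rows, painful at 10 million.
async function namesNPlusOne(db: FakeDb, posts: Post[]): Promise<string[]> {
  const names: string[] = [];
  for (const post of posts) {
    const author = await db.findAuthorById(post.authorId);
    names.push(author ? author.name : "unknown");
  }
  return names;
}

// Batched: collect the ids, fetch once, join in memory.
async function namesBatched(db: FakeDb, posts: Post[]): Promise<string[]> {
  const uniqueIds = Array.from(new Set(posts.map((post) => post.authorId)));
  const authors = await db.findAuthorsByIds(uniqueIds);
  const byId = new Map<number, Author>();
  for (const author of authors) byId.set(author.id, author);
  return posts.map((post) => {
    const author = byId.get(post.authorId);
    return author ? author.name : "unknown";
  });
}
```

Both functions return the same names. The first makes one round-trip per post. The second makes one round-trip in total, however many posts there are.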

3. You Assumed External Services Always Respond

In development, you call Stripe. It responds in 200ms. You call your email service. It responds in 150ms. Everything is fast and available.

In production, sometimes Stripe takes 8 seconds to respond. Sometimes it returns a 502. Sometimes the connection just hangs and your code waits forever. Your API request inherits the slowness. The user sees a timeout.

Fix: every external service call gets a timeout. Every call needs error handling that does not crash the request. Decide what happens when the external service is unavailable. Does the request fail? Does it retry? Does it queue for later? The decision matters per call. The wrong decision is "we will figure it out if it happens". Production will figure it out for you, badly.
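One way to give every outbound call a deadline is a small wrapper around the call. This is a sketch, assuming a runtime with the standard `AbortController` and `setTimeout` globals. `withTimeout` and `TimeoutError` are hypothetical names made up for the example.

```typescript
class TimeoutError extends Error {}

// Runs `work` with an AbortSignal and rejects if it takes longer than `ms`.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // tell the underlying call to stop, if it listens
      reject(new TimeoutError(`timed out after ${ms}ms`));
    }, ms);
  });

  try {
    // Whichever settles first wins: the real work or the deadline.
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // never leave the timer running after the call ends
  }
}
```

With `fetch`, which accepts a `signal` option, a call might look like `withTimeout((signal) => fetch(url, { signal }), 3000)`, where `url` and the 3000ms budget are placeholders. The caller still has to decide what a `TimeoutError` means for that specific call: fail the request, retry, or queue the work for later.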

4. You Assumed Inputs Are Well-Formed

In development, you test the API with the Postman collection you wrote. Every request has the right shape. Every field has the expected type.

In production, real clients send malformed requests. Missing fields. Wrong types. Strings where numbers should be. Numbers larger than your code expects. Arrays with a million items. JSON nested twenty levels deep. Strings with control characters. Strings with SQL injection attempts. XML payloads carrying billion-laughs entity bombs.

Your code probably assumes the input is what the documentation says it should be. Production does not care about your documentation.

Fix: validate every input at the API boundary. Use a schema library (Zod, Joi, AJV) and reject anything that does not match. Set hard limits on string lengths, array sizes, and nesting depth. Never trust client input. Never assume client input is well-formed because it was during testing.
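In a real project a schema library would do this work. The dependency-free sketch below only shows the shape of the checks. The `CreateOrder` type, its fields, and every number in `LIMITS` are hypothetical values chosen for illustration.

```typescript
type Limits = { maxDepth: number; maxArrayLength: number; maxStringLength: number };

// Hypothetical limits; pick values that fit your own payloads.
const LIMITS: Limits = { maxDepth: 5, maxArrayLength: 100, maxStringLength: 1000 };

// Walks parsed JSON and returns a description of the first limit violation, or null.
function violatesLimits(value: unknown, limits: Limits, depth = 0): string | null {
  if (depth > limits.maxDepth) return "nesting too deep";
  if (typeof value === "string") {
    return value.length > limits.maxStringLength ? "string too long" : null;
  }
  if (Array.isArray(value)) {
    if (value.length > limits.maxArrayLength) return "array too large";
    for (const item of value) {
      const problem = violatesLimits(item, limits, depth + 1);
      if (problem) return problem;
    }
    return null;
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const problem = violatesLimits(record[key], limits, depth + 1);
      if (problem) return problem;
    }
  }
  return null;
}

type CreateOrder = { sku: string; quantity: number };
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Validates an untrusted, already-parsed request body at the API boundary.
function parseCreateOrder(body: unknown): Result<CreateOrder> {
  const problem = violatesLimits(body, LIMITS);
  if (problem) return { ok: false, error: problem };
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, error: "body must be an object" };
  }
  const { sku, quantity } = body as Record<string, unknown>;
  if (typeof sku !== "string" || sku.length === 0) {
    return { ok: false, error: "sku must be a non-empty string" };
  }
  if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity < 1 || quantity > 1000) {
    return { ok: false, error: "quantity must be an integer from 1 to 1000" };
  }
  // Return only the fields we validated, not the raw body.
  return { ok: true, value: { sku, quantity } };
}
```

A handler would answer with a 400 and the `error` text whenever `ok` is false, and pass only `value` to the rest of the code. Limit the raw body size before parsing as well, since these checks only run once the JSON is already in memory.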

5. You Assumed Your Process Can Run Forever

In development, you start the API and it runs until you press Ctrl-C.

In production, the process is going to be killed and restarted by your hosting platform. Often. Vercel, Railway, Fly, Render, and ECS all do this. The process gets killed for deployments. For scaling. For instance recycling. For memory pressure. For reasons you do not control.

If your code assumes the process keeps running, in-flight requests get cut off. Background jobs get interrupted halfway. WebSocket connections drop. Cached state gets lost.

Fix: write code that handles SIGTERM gracefully. Finish in-flight requests before exiting. Persist state to a database, not to memory. Use queues for jobs that must complete. Assume the process can die at any moment, and design so death is not catastrophic.
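The core of graceful shutdown is knowing how many requests are still in flight and refusing new ones once shutdown starts. Here is a minimal sketch of that bookkeeping. `ShutdownGate` is a hypothetical helper, and the wiring to `SIGTERM` appears only as a comment because it depends on your server setup.

```typescript
type DrainOutcome = "drained" | "timed_out";

class ShutdownGate {
  private inFlight = 0;
  private shuttingDown = false;
  private waiters: Array<() => void> = [];

  // Call at the start of each request. Returns false once shutdown has begun,
  // so the caller can answer 503 instead of starting new work.
  tryEnter(): boolean {
    if (this.shuttingDown) return false;
    this.inFlight++;
    return true;
  }

  // Call when a request finishes, whether it succeeded or failed.
  leave(): void {
    if (this.inFlight > 0) this.inFlight--;
    if (this.inFlight === 0) {
      for (const wake of this.waiters.splice(0)) wake();
    }
  }

  // Stops admitting work, then waits for in-flight requests or the grace period.
  async drain(graceMs: number): Promise<DrainOutcome> {
    this.shuttingDown = true;
    if (this.inFlight === 0) return "drained";

    let timer: ReturnType<typeof setTimeout> | undefined;
    const idle = new Promise<DrainOutcome>((resolve) => {
      this.waiters.push(() => resolve("drained"));
    });
    const deadline = new Promise<DrainOutcome>((resolve) => {
      timer = setTimeout(() => resolve("timed_out"), graceMs);
    });

    const outcome = await Promise.race([idle, deadline]);
    clearTimeout(timer);
    return outcome;
  }
}

// Wiring sketch for a Node HTTP service (not executed here; `server` and
// `gate` are assumed to exist in your own code):
//   process.on("SIGTERM", async () => {
//     server.close();            // stop accepting new connections
//     await gate.drain(10_000);  // give in-flight requests up to 10 seconds
//     process.exit(0);
//   });
```

The grace period should be shorter than the time your platform waits between sending SIGTERM and force-killing the process. Otherwise the drain never gets to finish.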

6. You Assumed Memory Resets Between Requests

In development, the API barely uses any memory. The garbage collector cleans up. No issues.

In production, requests accumulate references. Caches grow. Connection pools grow. Logs buffer. The process uses 50MB initially, 200MB after an hour, 800MB after a day. The hosting platform kills it for exceeding the memory limit. It restarts. Memory grows again. The cycle continues.

The leak is usually one of three things. References stored in module-level variables that never get cleared. Event emitter listeners attached without being removed. Closures that capture large objects unintentionally.

Fix: run the API under load locally and watch memory for an hour. Use heap snapshots to find what is accumulating. Set hard limits on cache sizes. Always remove event listeners when done. Profile memory before deploying changes that touch caching, event handling, or long-lived references.
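A hard cap on cache size can be as small as the sketch below. It relies on the fact that a JavaScript `Map` iterates in insertion order, so the first key is always the least recently used one. `BoundedCache` is a hypothetical name, and a maintained LRU library is a reasonable substitute.

```typescript
// A least-recently-used cache with a hard cap on the number of entries.
class BoundedCache<K, V> {
  private entries = new Map<K, V>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    if (maxEntries < 1) throw new RangeError("maxEntries must be at least 1");
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap of 2, setting `a` and `b`, reading `a`, then setting `c` evicts `b`, because `b` is the entry that was touched least recently. Memory used by the cache now has a ceiling instead of a growth curve.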

7. You Assumed Errors Stay Local

In development, when something throws, you see the error in your terminal. You fix it. Move on.

In production, errors thrown in one place affect places you did not expect. An unhandled promise rejection in a background job crashes the entire Node process (the default behaviour since Node 15). An error thrown in shared middleware fails every request that passes through it. A circular dependency makes one error trigger a cascade of others.

Fix: handle errors close to where they happen. Catch promise rejections. Never let async errors bubble to the global handler if you can help it. Use an error-tracking or observability service (Sentry, Datadog, Honeycomb) so every error is captured, even when all the user sees is a generic 500. Treat unhandled errors as build failures in CI, not as runtime issues to discover in production.
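Two small wrappers cover the most common ways an error escapes: an async request handler that rejects, and a fire-and-forget background job. This is a sketch with hypothetical names (`safeHandler`, `runInBackground`, `HttpResult`). The `report` callback stands in for whatever error-tracking client you use.

```typescript
type HttpResult = { status: number; body: unknown };
type Reporter = (error: unknown) => void;

// Wraps an async handler so a thrown error becomes a 500 response
// instead of an unhandled rejection.
function safeHandler<Req>(
  handler: (req: Req) => Promise<HttpResult>,
  report: Reporter
): (req: Req) => Promise<HttpResult> {
  return async (req: Req) => {
    try {
      return await handler(req);
    } catch (error) {
      try {
        report(error);
      } catch {
        // Reporting must never mask the original failure.
      }
      return { status: 500, body: { error: "internal_error" } };
    }
  };
}

// Starts background work without awaiting it, but never without a catch.
// Promise.resolve().then(job) also captures errors thrown synchronously by `job`.
function runInBackground(job: () => Promise<void>, report: Reporter): void {
  Promise.resolve()
    .then(job)
    .catch(report);
}
```

The response body is deliberately generic. The details go to `report`, not to the client. A failure inside `runInBackground` reaches the reporter and stops there, so one bad job cannot take the process down with it.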

The Pattern Behind All Seven Failures

Every one of these failures has the same shape.

Your code makes an assumption about the environment. The assumption is true in development. The assumption is not true in production. The code breaks.

The fix is never just changing the code. The fix is identifying the assumption, deciding whether it is acceptable in production, and either removing the assumption or making it true with infrastructure.

This is why "it works on my machine" is not the joke developers think it is. It is the real cause of most production failures. The machine running in dev is fundamentally different from the machine running in production, in ways most code does not handle gracefully.

How to Catch These Before They Reach Production

Four practices that catch most of these failures before users see them.

Load testing. Run your API under realistic load before deploying. Tools like k6, Artillery, or Vegeta can simulate concurrent requests. Find the breaking point.

Realistic dev data. Seed your local database with production-scale volumes. Use anonymised production data if you can. Find the slow queries before users do.

Chaos testing. Deliberately introduce failures. Kill external services. Throw timeout errors. Make the database slow. Watch what breaks.

Production monitoring from day one. Error rates, latency percentiles, memory usage, slow queries. Not just uptime. The early signals of all seven failures show up in monitoring before they become outages.
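Dedicated tools are the better choice for real load tests. The sketch below only shows what they measure: concurrent requests, an error count, and latency percentiles rather than averages. `runLoad`, `percentile`, and the `send` callback are hypothetical names. `send` would wrap a single request to the endpoint under test.

```typescript
// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sortedAscending: number[], q: number): number {
  const rank = Math.ceil((q / 100) * sortedAscending.length);
  return sortedAscending[Math.max(rank, 1) - 1];
}

type LoadReport = {
  succeeded: number;
  failed: number;
  p50Ms: number | null;
  p95Ms: number | null;
};

// Sends `total` requests, keeping up to `concurrency` of them in flight at once.
async function runLoad(
  send: () => Promise<void>,
  total: number,
  concurrency: number
): Promise<LoadReport> {
  const latencies: number[] = [];
  let failed = 0;
  let started = 0;

  // Each worker claims the next request until all of them have been started.
  async function worker(): Promise<void> {
    while (started < total) {
      started++;
      const begin = Date.now();
      try {
        await send();
        latencies.push(Date.now() - begin);
      } catch {
        failed++;
      }
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(concurrency, total); i++) workers.push(worker());
  await Promise.all(workers);

  latencies.sort((a, b) => a - b);
  const hasData = latencies.length > 0;
  return {
    succeeded: latencies.length,
    failed,
    p50Ms: hasData ? percentile(latencies, 50) : null,
    p95Ms: hasData ? percentile(latencies, 95) : null,
  };
}
```

The same four numbers are worth tracking in production monitoring. A p95 that climbs while the p50 stays flat is often the first visible sign of a slow query or a saturated connection pool.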

These four practices are the gap between APIs that work in production and APIs that work in development and fail in production. None of them is hard. All of them get skipped on most projects because they feel like overhead until they save you.

The Honest Reality

If you are reading this because your API is currently failing in production, the fix is rarely a code change. The fix is usually one of these assumption-failures, and the first thing to do is figure out which.

Look at the error logs. Look at the times. Look at what was happening when the errors clustered. The pattern will point to the assumption. Once you know the assumption, the code change is usually small.

If you are reading this before launching, the work is to remove the assumptions in advance. Validate inputs. Set timeouts. Profile memory. Test under load. Build the observability before you need it.

The APIs that survive production are not the ones with the cleverest code. They are the ones built by developers who knew which assumptions could break, and designed around them.

That is the real difference between development-grade code and production-grade code. Same language. Same framework. Different expectations of what the environment will throw at you.
