
Serverless Workers

What is a Serverless Worker?

A Serverless Worker is a Temporal Worker that runs on serverless compute instead of a long-lived process. There is no always-on infrastructure to provision or scale. Temporal invokes the Worker when Tasks arrive on a Task Queue, and the Worker shuts down when the work is done.

A Serverless Worker uses the same Temporal SDKs as a traditional long-lived Worker, and registers Workflows and Activities the same way. The difference is the lifecycle: instead of starting once and polling continuously, a Serverless Worker is invoked by Temporal on demand, starts, processes the available Tasks, and then shuts down.

Serverless Workers require Worker Versioning. Each Serverless Worker must be associated with a Worker Deployment Version that has a compute provider configured.

To deploy a Serverless Worker, see Deploy a Serverless Worker.

How Serverless invocation works

With long-lived Workers, you start the Worker process, which connects to Temporal and polls a Task Queue for work. Temporal does not need to know anything about the Worker's infrastructure.

With Serverless Workers, Temporal starts the Worker.

Serverless invocation flow

Temporal's Worker Controller Instance invokes a Serverless Worker when Tasks arrive on a Task Queue with a compute provider configured.

Temporal's internal Worker Controller Instance (WCI) decides when to start, scale, and stop compute invocations.

The invocation flow works as follows:

  1. A Task is submitted (for example, StartWorkflow or ScheduleActivity).
  2. The Matching Service attempts to route the Task directly to an available Worker (a sync match).
  3. If a Worker is available, the Task is routed to that Worker.
  4. If no Worker is available (sync match fails), the Matching Service pushes a signal to the WCI, and the WCI invokes the configured compute provider (for example, calling AWS Lambda's InvokeFunction API).
  5. The Serverless Worker starts, creates a Temporal Client, and begins polling the Task Queue.
  6. The Worker processes available Tasks until it exits (see Worker lifecycle).

The WCI also monitors the Task Queue backlog independently. If tasks arrive faster than Workers can process them, the WCI invokes additional Workers in parallel until the backlog drains or provider concurrency limits are reached.

Each invocation is independent. The Worker creates a fresh client connection on every invocation. There is no connection reuse or shared state across invocations.
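The routing decision in steps 2 through 4 can be sketched as a small simulation. This is an illustrative model of the Matching Service's behavior, not Temporal API surface; all names are made up for the sketch.

```python
from collections import deque

def route_task(task, available_workers, wci_invocations):
    """Model of Task routing: try a sync match against an idle Worker
    first; on failure, signal the WCI to invoke the compute provider.

    `available_workers` is a queue of idle Worker poll requests;
    `wci_invocations` records compute invocations the WCI would make.
    """
    if available_workers:
        # Sync match: hand the Task directly to an idle poller.
        worker = available_workers.popleft()
        return ("sync_match", worker, task)
    # Sync match failed: the Matching Service pushes a signal to the
    # WCI, which calls the provider (e.g. Lambda's InvokeFunction API).
    wci_invocations.append(task)
    return ("wci_invoke", None, task)

idle = deque(["worker-1"])
invocations = []
print(route_task("task-A", idle, invocations))  # matched to worker-1
print(route_task("task-B", idle, invocations))  # no idle Worker: WCI invoked
```

The key property the sketch captures is that invocation is demand-driven: the WCI only spends a compute invocation when a Task fails to find an idle Worker.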

Autoscaling

Temporal automatically scales Serverless Workers based on Task Queue signals. When Tasks arrive and no Worker is available, Temporal invokes new Workers. When the work is done, Workers exit and scale to zero.

The WCI uses two signals to decide when to invoke new Workers:

Sync match failure

When a Task is submitted, the Matching Service attempts to route it directly to an available Worker. If no Worker is available, the sync match fails, and the Matching Service pushes a signal to the WCI. The WCI then invokes a new Worker. This is the primary scaling path. Because the Matching Service pushes match failures to the WCI as they happen rather than the WCI polling on a timer, latency stays low and scaling is responsive.

Task Queue backlog

The WCI monitors Task Queue metadata to determine whether pending Tasks exist without enough Workers to process them. If there is work on the queue and not enough Workers, the WCI invokes additional Workers.
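The backlog signal can be sketched as a scale-up decision capped by the provider's concurrency limit. This is a simplified model of the behavior described above, not the WCI's actual algorithm.

```python
def workers_to_invoke(backlog, active_workers, provider_limit):
    """Decide how many additional Workers to invoke for a backlog,
    staying within the compute provider's concurrency limit."""
    if backlog <= active_workers:
        return 0  # enough Workers are already polling
    wanted = backlog - active_workers
    headroom = max(provider_limit - active_workers, 0)
    return min(wanted, headroom)

# A backlog of 10 Tasks with 2 active Workers and a provider limit of 5
# only allows 3 more invocations; the rest wait in the backlog.
print(workers_to_invoke(10, 2, 5))
```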

Worker lifecycle

A single Serverless Worker invocation has three phases: init, work, and shutdown.

Serverless Worker lifecycle

The shutdown deadline buffer controls when the Worker stops polling, and the Worker stop timeout controls how long the Worker waits for in-flight Tasks to finish before shutdown hooks run.

During the init phase, the Worker initializes and establishes a client connection to Temporal.

During the work phase, the Worker polls the Task Queue and processes Tasks.

During the shutdown phase, the Worker stops polling, waits for in-flight Tasks to finish, and runs any shutdown hooks (for example, OpenTelemetry telemetry flushes). Shutdown begins before the invocation deadline so the Worker can exit cleanly before the compute provider forcibly terminates the execution environment.
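The relationship between the invocation deadline and the shutdown deadline buffer is simple arithmetic, sketched here for illustration:

```python
def poll_stop_offset(invocation_deadline_s, shutdown_deadline_buffer_s):
    """Seconds into the invocation at which the Worker stops polling.
    The buffer is reserved at the end of the invocation for draining
    in-flight Tasks and running shutdown hooks."""
    return invocation_deadline_s - shutdown_deadline_buffer_s

# A 900-second (15-minute) invocation deadline with a 60-second buffer
# means the Worker stops polling 840 seconds into the invocation.
print(poll_stop_offset(900, 60))
```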

Tuning for long-running Activities

If your Worker handles long-running Activities, set these three values together:

  • Worker stop timeout > longest Activity runtime. Gives in-flight Activities enough time to finish after polling stops.
  • Shutdown deadline buffer > Worker stop timeout + shutdown hook time. Ensures the drain and any shutdown hooks complete before the compute provider terminates the environment.
  • Invocation deadline > longest Activity runtime + shutdown deadline buffer. Set on the compute provider to give each invocation enough total runtime.

For example, if your longest Activity runtime is 5 minutes, and your shutdown hooks take 3 seconds to run, set the Worker stop timeout to more than 5 minutes, and the shutdown deadline buffer to more than 303 seconds (5 minutes + 3 seconds). Set your invocation deadline to at least 10 minutes and 3 seconds (5 minutes + 303 seconds).

The Worker stop timeout controls how long the Worker waits for in-flight Tasks to finish after it stops polling. The shutdown deadline buffer controls how much time before the invocation deadline the Worker stops polling for Tasks.

Raising only the shutdown deadline buffer makes the Worker stop polling earlier, but does not give in-flight Tasks any more time to complete.

Raising only the Worker stop timeout does not make the Worker stop polling earlier, which means the compute provider might terminate the Worker before the full stop timeout completes. In-flight Activities then do not get the full stop timeout to finish, and the shutdown hooks may not run.
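The three tuning rules above can be checked together. This helper is illustrative, not part of any SDK; it encodes the inequalities from this section and verifies the worked example (5-minute Activities, 3-second hooks).

```python
def tuning_valid(longest_activity_s, hook_time_s,
                 stop_timeout_s, shutdown_buffer_s, invocation_deadline_s):
    """Check the three tuning rules:
    1. Worker stop timeout > longest Activity runtime
    2. shutdown deadline buffer > stop timeout + shutdown hook time
    3. invocation deadline > longest Activity runtime + buffer"""
    return (stop_timeout_s > longest_activity_s
            and shutdown_buffer_s > stop_timeout_s + hook_time_s
            and invocation_deadline_s > longest_activity_s + shutdown_buffer_s)

# Worked example: 300 s Activities, 3 s hooks, 310 s stop timeout,
# 320 s buffer, 630 s invocation deadline -> all three rules hold.
print(tuning_valid(300, 3, 310, 320, 630))
```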

Failure handling

Serverless Workers rely on Temporal's standard retry and timeout semantics to recover from failures. The following sections describe common failure scenarios and how they are handled.

Worker crash

If a Worker invocation crashes (out of memory, unhandled exception, etc.), the behavior follows standard Temporal retry semantics:

  • The Activity Timeout fires after the configured duration.
  • Temporal retries the Activity on a different Worker invocation.
  • No manual intervention is required.

Provider concurrency limit

If the compute provider's concurrency limit is reached (for example, AWS Lambda account concurrency):

  • Further invocations from the WCI fail.
  • Tasks remain in the Task Queue backlog. No data loss occurs.
  • Processing slows until concurrency frees up.

Resource exhaustion across Activity slots

By default, a single Worker invocation may run multiple Activity slots. A crash or resource exhaustion in one Activity (for example, out-of-memory from a memory-intensive operation) can affect other Activities running in the same invocation.

To isolate Activities from each other:

  • Split Workflow and Activity Workers into separate compute functions.
  • Set Activity slots to 1 per invocation.

With single-slot configuration, each Activity gets a dedicated execution environment.
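A single-slot Activity Worker might look like the following sketch, assuming the Temporal Python SDK (`temporalio`). The Task Queue name and the `process_payment` Activity are illustrative, and your serverless Worker package's entry point may differ.

```python
from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker

@activity.defn
async def process_payment(order_id: str) -> str:
    # Hypothetical Activity; stands in for your memory-intensive work.
    return f"charged {order_id}"

async def run_activity_worker(client: Client) -> None:
    # Activities-only Worker: no Workflows registered here, so Workflow
    # Tasks run in a separate compute function.
    worker = Worker(
        client,
        task_queue="payments-activities",
        activities=[process_payment],
        max_concurrent_activities=1,  # one Activity slot per invocation
    )
    await worker.run()
```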

Constraints

  • Activity duration: Must complete within the compute provider's invocation limit (minus the shutdown deadline buffer). For AWS Lambda, the maximum is 15 minutes.
  • Workflow duration: No limit. Workflows of any duration work, regardless of the invocation timeout. A Workflow runs across as many invocations as needed.
  • Worker code: Same Temporal SDK Worker code, using the serverless Worker package for your SDK.
  • Versioning: Worker Versioning is required. Each Workflow must declare AutoUpgrade or Pinned behavior.

Compute providers

A compute provider is the configuration that tells Temporal how to invoke a Serverless Worker. The compute provider is set on a Worker Deployment Version and specifies the provider type, the invocation target, and the credentials Temporal needs to trigger the invocation.

For example, an AWS Lambda compute provider includes the Lambda function ARN and the IAM role that Temporal assumes to invoke the function.

Compute providers are only needed for Serverless Workers. Traditional long-lived Workers do not require a compute provider because the Worker process manages its own lifecycle.

Supported providers

  • AWS Lambda: Temporal assumes an IAM role in your AWS account to invoke a Lambda function.