This guide covers how to handle errors that occur during workflow execution. It shows how to safely handle failures caused by external API calls, network issues, and temporary outages, and how to implement automatic retry patterns when necessary.

Basic Error Handling

Errors in workflows are handled with standard try-catch statements. Log the error and re-throw it if necessary to put the workflow in a failed state.
import { workflow } from "sonamu";

export const processPayment = workflow(
  { name: "process_payment" },
  async ({ input, step, logger }) => {
    try {
      await step.define({ name: "charge" }, async () => {
        await paymentGateway.charge(input.amount);
      }).run();

      return { success: true };
    } catch (error) {
      logger.error("Payment failed", { error, input });

      // Re-throw error (workflow fails)
      throw error;
    }
  }
);
Key Concepts:
  • Log errors with logger.error()
  • Fail the workflow with throw error
  • Failed workflows are automatically retried by the Worker
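
One detail worth checking when logging is how the `error` value is serialized: `JSON.stringify` on an `Error` yields `{}` because `message` and `stack` are non-enumerable. Whether the `logger` used above already accounts for this is not stated in this guide, so the following is only a minimal sketch. `serializeError` and `SerializedError` are hypothetical names, not part of Sonamu.

```typescript
// Hypothetical helper (not part of Sonamu): turn any caught value into a
// plain object that survives JSON serialization.
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
  code?: string;
}

function serializeError(value: unknown): SerializedError {
  if (value instanceof Error) {
    // Node.js system errors carry a string `code` such as "ECONNREFUSED".
    const code: unknown = (value as Error & { code?: unknown }).code;
    return {
      name: value.name,
      message: value.message,
      stack: value.stack,
      ...(typeof code === "string" ? { code } : {}),
    };
  }
  // Non-Error throwables (strings, numbers, plain objects) are stringified.
  return { name: "NonError", message: String(value) };
}
```

Under that assumption, a catch block could log `{ error: serializeError(error), input }` instead of the raw value.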

Step-Level Error Handling

Not all steps are required. Some steps can fail without stopping the workflow. Wrap these optional steps in try-catch to absorb errors.
export const syncData = workflow(
  { name: "sync_data" },
  async ({ input, step, logger }) => {
    // Step 1: Required task
    const data = await step.define({ name: "fetch_data" }, async () => {
      return await fetchFromAPI(input.url);
    }).run();

    // Step 2: Optional task (continue even if it fails)
    let cached = false;
    try {
      await step.define({ name: "cache_data" }, async () => {
        await cacheData(data);
      }).run();
      cached = true;
    } catch (error) {
      logger.warn("Cache failed, continuing...", { error });
    }

    // Step 3: Final save
    await step.define({ name: "save_data" }, async () => {
      await saveToDatabase(data);
    }).run();

    return { saved: true, cached };
  }
);
Use Cases:
  • Cache failure (data must still be saved)
  • Notification failure (order must still complete)
  • Logging failure (business logic continues)
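
When several optional steps appear in one workflow, the repeated try-catch can be folded into a small helper. The sketch below is plain TypeScript and makes no assumptions about the Sonamu API; `tryOptional` and `OptionalResult` are hypothetical names introduced for illustration.

```typescript
type OptionalResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

// Hypothetical helper: run a non-critical task and report the outcome
// instead of throwing, so the caller decides how to proceed.
async function tryOptional<T>(
  task: () => Promise<T>
): Promise<OptionalResult<T>> {
  try {
    return { ok: true, value: await task() };
  } catch (error) {
    return { ok: false, error };
  }
}
```

Inside a workflow, `task` would wrap the optional step's `.run()` call, and the caller would log a warning when `ok` is `false`.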

Retry Patterns

Manual Retry

External API or network requests can fail temporarily. In such cases, multiple retries can improve success rates.
export const fetchWithRetry = workflow(
  { name: "fetch_with_retry" },
  async ({ input, step, logger }) => {
    const maxRetries = 3;
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const data = await step.define(
          { name: `fetch_attempt_${attempt}` },
          async () => {
            return await unstableAPICall(input.url);
          }
        ).run();

        logger.info("Fetch succeeded", { attempt });
        return { data, attempts: attempt };
      } catch (error) {
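        // `error` is `unknown` in strict TypeScript; normalize it to an Error
        lastError = error instanceof Error ? error : new Error(String(error));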
        logger.warn("Fetch failed", { attempt, error });

        // Wait before the next attempt
        if (attempt < maxRetries) {
          // Use a unique name per attempt, as the step names above do
          await step.sleep(`retry_delay_${attempt}`, "5s");
        }
      }
    }

    // All attempts failed
    throw new Error(`Failed after ${maxRetries} attempts: ${lastError?.message}`);
  }
);
Retry Strategy:
  1. Maximum 3 attempts
  2. 5 second wait between attempts
  3. Throw error if all attempts fail
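
The same strategy can be written once as a reusable helper. This is a minimal sketch in plain TypeScript, not a Sonamu API: `retryFixed` and `RetryOptions` are hypothetical names, and the `sleep` function is injected so that a workflow could pass its own sleep while tests pass a no-op.

```typescript
interface RetryOptions {
  maxAttempts: number;
  delayMs: number;
  // Injected so callers can swap in their own sleep (or a no-op in tests).
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryFixed<T>(
  task: (attempt: number) => Promise<T>,
  { maxAttempts, delayMs, sleep = defaultSleep }: RetryOptions
): Promise<{ value: T; attempts: number }> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { value: await task(attempt), attempts: attempt };
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts) await sleep(delayMs);
    }
  }
  const reason =
    lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`Failed after ${maxAttempts} attempts: ${reason}`);
}
```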

Exponential Backoff

Exponential backoff doubles the delay after each failed attempt, which gives an overloaded server time to recover and improves the chance that a later retry succeeds.
export const fetchWithBackoff = workflow(
  { name: "fetch_with_backoff" },
  async ({ input, step, logger }) => {
    const maxRetries = 5;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await step.define(
          { name: `attempt_${attempt}` },
          async () => {
            return await externalAPI.fetch(input.url);
          }
        ).run();
      } catch (error) {
        logger.warn("Attempt failed", { attempt, error });

        if (attempt < maxRetries) {
          // Exponential backoff: 2s, 4s, 8s, 16s
          const delay = Math.pow(2, attempt);
          await step.sleep(`backoff_${attempt}`, `${delay}s`);
        } else {
          throw error;
        }
      }
    }
  }
);
Backoff Intervals:
  • 1st failure -> 2 second wait
  • 2nd failure -> 4 second wait
  • 3rd failure -> 8 second wait
  • 4th failure -> 16 second wait
Benefits:
  • Provides time to recover from temporary overload
  • Distributes server load
  • Improves retry success rate
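
The interval calculation can be isolated into a pure function, which makes it easy to test. The sketch below reproduces the 2s, 4s, 8s, 16s sequence and adds two refinements that are not in the example above and are shown here as assumptions: an upper cap on the delay, and random jitter so that many clients failing at once do not retry in lockstep. `backoffSeconds` and `backoffWithJitter` are hypothetical names.

```typescript
// Delay in seconds after the given number of failed attempts: 2, 4, 8, 16, ...
// The cap (an assumption, default 60s) keeps late attempts from waiting too long.
function backoffSeconds(attempt: number, capSeconds = 60): number {
  return Math.min(capSeconds, 2 ** attempt);
}

// "Full jitter" variant: a random delay between 0 and the capped backoff.
// `random` is injectable so the function stays deterministic in tests.
function backoffWithJitter(
  attempt: number,
  capSeconds = 60,
  random: () => number = Math.random
): number {
  return random() * backoffSeconds(attempt, capSeconds);
}
```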

Compensating Transactions

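A workflow that spans several services cannot rely on a single database rollback. When a later operation fails, each operation that already completed must be undone by an explicit reverse action, such as refunding a payment or releasing reserved inventory. This is called a Compensating Transaction.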
export const processOrder = workflow(
  { name: "process_order" },
  async ({ input, step, logger }) => {
    let paymentId: string | null = null;
    let inventoryReserved = false;

    try {
      // Step 1: Payment
      paymentId = await step.define({ name: "charge_payment" }, async () => {
        return await paymentService.charge(input.amount);
      }).run();

      // Step 2: Reserve inventory
      await step.define({ name: "reserve_inventory" }, async () => {
        await inventoryService.reserve(input.items);
      }).run();
      inventoryReserved = true;

      // Step 3: Create order
      const orderId = await step.define({ name: "create_order" }, async () => {
        return await orderService.create({
          paymentId,
          items: input.items,
        });
      }).run();

      return { orderId, success: true };
    } catch (error) {
      logger.error("Order processing failed, rolling back...", { error });

      // Compensate: Release inventory
      if (inventoryReserved) {
        await step.define({ name: "rollback_inventory" }, async () => {
          await inventoryService.release(input.items);
        }).run();
      }

      // Compensate: Refund payment
      if (paymentId) {
        await step.define({ name: "rollback_payment" }, async () => {
          await paymentService.refund(paymentId);
        }).run();
      }

      throw error;
    }
  }
);
Compensating Transaction Pattern:
  1. Track completion status of each operation
  2. Check completed operations when failure occurs
  3. Cancel operations in reverse order
  4. Re-throw the error
Practical Uses:
  • Payment refund
  • Inventory restoration
  • Reservation cancellation
  • File deletion
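
The tracking flags in the pattern above can also be replaced by a stack of undo actions that is unwound in reverse order. The following is a minimal sketch in plain TypeScript; `CompensationStack` is a hypothetical name, not a Sonamu API. Unlike the example above, it keeps running the remaining undo actions when one of them fails and reports which ones failed.

```typescript
type Compensation = { name: string; undo: () => Promise<void> };

class CompensationStack {
  private readonly actions: Compensation[] = [];

  // Register an undo action right after the forward operation succeeds.
  add(name: string, undo: () => Promise<void>): void {
    this.actions.push({ name, undo });
  }

  // Run undo actions in reverse order of registration. A failing undo does
  // not stop the remaining ones; the names of failed undos are returned.
  async rollback(): Promise<string[]> {
    const failed: string[] = [];
    for (const action of [...this.actions].reverse()) {
      try {
        await action.undo();
      } catch {
        failed.push(action.name);
      }
    }
    this.actions.length = 0;
    return failed;
  }
}
```

In a workflow, each `undo` would typically wrap its own step, and a non-empty result from `rollback()` would be logged for manual follow-up before the original error is re-thrown.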

Timeout Handling

If an external API doesn’t respond, the workflow could wait indefinitely. Set a timeout to fail after a certain duration.
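async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timeout")), timeoutMs);
  });
  try {
    // Whichever settles first wins; the underlying request is not cancelled.
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it does not linger after the race is decided.
    clearTimeout(timer);
  }
}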

export const fetchWithTimeout = workflow(
  { name: "fetch_with_timeout" },
  async ({ input, step, logger }) => {
    try {
      const data = await step.define({ name: "fetch" }, async () => {
        return await withTimeout(
          externalAPI.fetch(input.url),
          30000  // 30 seconds
        );
      }).run();

      return { data };
    } catch (error) {
      if (error instanceof Error && error.message === "Timeout") {
        logger.error("Request timed out");
      }
      throw error;
    }
  }
);
Timeout Strategies:
  • Short timeout (5-10 seconds): Fast APIs
  • Medium timeout (30-60 seconds): Standard APIs
  • Long timeout (5-10 minutes): File processing
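
Matching on `error.message === "Timeout"` is fragile, because any other error with the same message would be treated as a timeout. One alternative, sketched here under the assumption that a dedicated error class is acceptable in your codebase, is to reject with a `TimeoutError` and check it with `instanceof`. `TimeoutError` and `withTypedTimeout` are hypothetical names.

```typescript
class TimeoutError extends Error {
  readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    super(`Timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
    this.timeoutMs = timeoutMs;
    // Keeps `instanceof` working when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

async function withTypedTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(timeoutMs)), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Stop the timer once either side has settled.
    clearTimeout(timer);
  }
}
```

A catch block can then branch on `error instanceof TimeoutError` and read `error.timeoutMs` for the log entry.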

Error Type-Based Handling

Use different handling strategies based on error types. Network errors should be retried, but data validation errors should fail immediately.
export const processData = workflow(
  { name: "process_data" },
  async ({ input, step, logger }) => {
    try {
      await step.define({ name: "process" }, async () => {
        await dataService.process(input.data);
      }).run();
    } catch (error) {
      // `error` is `unknown` in strict TypeScript, so read `code` defensively
      const code = (error as { code?: string } | null)?.code;

      // Network error: wait, then re-throw so the Worker retries the workflow
      if (code === 'ECONNREFUSED') {
        logger.warn("Connection refused, retrying...");
        await step.sleep("retry_delay", "10s");
        throw error;  // Retry
      }

      // Validation error: Fail immediately
      if (code === 'VALIDATION_ERROR') {
        logger.error("Validation failed", { error });
        throw new Error("Invalid data, not retrying");
      }

      // Other errors
      throw error;
    }
  }
);
Error Classification:
Error Type        | Handling           | Examples
Transient errors  | Retry              | Network, timeout, 503
Permanent errors  | Fail immediately   | Validation, auth, 404
Partial failures  | Selective handling | Some data corrupted
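
The classification above can be captured in a single predicate so that every workflow applies the same rule. This is a minimal sketch: `NonRetryableError` and `isRetryable` are hypothetical names, the code and status lists are illustrative rather than exhaustive, and the `status` property is an assumption about how your HTTP client reports response codes.

```typescript
// Marker class for failures that should never be retried.
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetryableError";
    // Keeps `instanceof` working when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Illustrative lists only; adjust them to the services actually in use.
const TRANSIENT_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"]);
const TRANSIENT_STATUSES = new Set([429, 502, 503, 504]);

function isRetryable(error: unknown): boolean {
  if (error instanceof NonRetryableError) return false;
  if (typeof error !== "object" || error === null) return false;
  const { code, status } = error as { code?: unknown; status?: unknown };
  if (typeof code === "string" && TRANSIENT_CODES.has(code)) return true;
  if (typeof status === "number" && TRANSIENT_STATUSES.has(status)) return true;
  // Unknown errors are treated as permanent to avoid pointless retries.
  return false;
}
```

Treating unknown errors as permanent is a design choice; a system that prefers availability over fast failure might default to `true` instead.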

Practical Examples

1. Email Sending with Dead Letter Queue

If email sending fails 3 times, add it to a Dead Letter Queue for manual processing later.
export const sendEmail = workflow(
  { name: "send_email" },
  async ({ input, step, logger }) => {
    const maxRetries = 3;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        await step.define(
          { name: `send_attempt_${attempt}` },
          async () => {
            await emailService.send({
              to: input.email,
              subject: input.subject,
              body: input.body,
            });
          }
        ).run();

        logger.info("Email sent", { attempt });
        return { success: true, attempts: attempt };
      } catch (error) {
        logger.warn("Email send failed", { attempt, error });

        if (attempt < maxRetries) {
          // Use a unique name per attempt, as the step names above do
          await step.sleep(`retry_delay_${attempt}`, "30s");
        } else {
          // Final failure: Move to Dead Letter Queue
          await step.define({ name: "move_to_dlq" }, async () => {
            await deadLetterQueue.add({
              type: "email",
              data: input,
              error: error instanceof Error ? error.message : String(error),
            });
          }).run();

          throw error;
        }
      }
    }
  }
);
DLQ Pattern:
  • 3 retry failures
  • Add to DLQ (process later)
  • Notify administrator
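
The `deadLetterQueue` object in the example is not defined in this guide. As a rough illustration of what it might store, and of where the administrator notification could hook in, here is a hypothetical in-memory version. `InMemoryDeadLetterQueue`, `DeadLetterEntry`, and the `notify` callback are all assumptions made for this sketch; a real queue would persist entries so they survive restarts.

```typescript
interface DeadLetterEntry<T> {
  type: string;
  data: T;
  error: string;
  attempts: number;
  failedAt: string; // ISO 8601 timestamp
}

// Hypothetical in-memory queue, for illustration only.
class InMemoryDeadLetterQueue<T> {
  private readonly entries: DeadLetterEntry<T>[] = [];
  private readonly notify: (entry: DeadLetterEntry<T>) => void;

  constructor(notify: (entry: DeadLetterEntry<T>) => void) {
    this.notify = notify;
  }

  add(type: string, data: T, error: unknown, attempts: number): DeadLetterEntry<T> {
    const entry: DeadLetterEntry<T> = {
      type,
      data,
      error: error instanceof Error ? error.message : String(error),
      attempts,
      failedAt: new Date().toISOString(),
    };
    this.entries.push(entry);
    this.notify(entry); // e.g. alert an administrator
    return entry;
  }

  // Entries waiting for manual processing.
  pending(): readonly DeadLetterEntry<T>[] {
    return this.entries;
  }
}
```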

2. API Call with Circuit Breaker

When an external API keeps failing, protect the system by blocking requests with a Circuit Breaker.
class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private readonly threshold = 5;
  private readonly timeout = 60000;  // 1 minute

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // Circuit is open
    if (this.failures >= this.threshold) {
      const elapsed = Date.now() - this.lastFailureTime;
      if (elapsed < this.timeout) {
        throw new Error("Circuit breaker is open");
      }
      // Retry after timeout
      this.failures = 0;
    }

    try {
      const result = await fn();
      this.failures = 0;  // Reset on success
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = Date.now();
      throw error;
    }
  }
}

const circuitBreaker = new CircuitBreaker();

export const callAPI = workflow(
  { name: "call_api" },
  async ({ input, step, logger }) => {
    try {
      const data = await step.define({ name: "api_call" }, async () => {
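        return await circuitBreaker.call(async () => {
          const response = await fetch(input.url);
          // fetch() resolves on HTTP errors too; throw so they count as failures
          if (!response.ok) throw new Error(`HTTP ${response.status}`);
          return response.json();
        });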
      }).run();

      return { data };
    } catch (error) {
      if (error.message === "Circuit breaker is open") {
        logger.error("Circuit breaker open, API unavailable");
      }
      throw error;
    }
  }
);
Circuit Breaker States:
  • Closed: Normal operation
  • Open: Blocked after 5 failures (for 1 minute)
  • Half-Open: Retry after 1 minute
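
The class above resets its failure counter once the timeout elapses, so the Half-Open state is implicit. For comparison, here is a sketch that tracks the three states explicitly and reopens the circuit as soon as a trial call fails. `StatefulCircuitBreaker` is a hypothetical name, and the injectable `now` clock is an assumption added to make the behavior testable. Like the original, it keeps its state in memory, so each process has its own breaker.

```typescript
type CircuitState = "closed" | "open" | "half-open";

class StatefulCircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, cooldownMs = 60000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): CircuitState {
    // An open circuit becomes half-open once the cooldown has elapsed.
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const state = this.getState();
    if (state === "open") throw new Error("Circuit breaker is open");
    try {
      const result = await fn();
      // Any success closes the circuit and clears the failure count.
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial call in half-open state reopens immediately.
      if (state === "half-open" || this.failures >= this.threshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

This sketch allows several concurrent trial calls while half-open; limiting that to a single call would need an extra in-flight flag.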

3. File Upload with Partial Retry

When uploading multiple files, retry each file independently so that some failures don’t prevent others from succeeding.
export const uploadFiles = workflow(
  { name: "upload_files" },
  async ({ input, step, logger }) => {
    const results = [];

    for (let i = 0; i < input.files.length; i++) {
      const file = input.files[i];
      let uploaded = false;

      // Retry each file individually
      for (let attempt = 1; attempt <= 3; attempt++) {
        try {
          await step.define(
            { name: `upload_${i}_attempt_${attempt}` },
            async () => {
              await s3.upload(file.key, file.content);
            }
          ).run();

          uploaded = true;
          results.push({ file: file.key, success: true });
          break;
        } catch (error) {
          logger.warn("Upload failed", { file: file.key, attempt, error });

          if (attempt < 3) {
            await step.sleep(`retry_${i}_${attempt}`, "5s");
          }
        }
      }

      if (!uploaded) {
        results.push({ file: file.key, success: false });
      }
    }

    const failed = results.filter(r => !r.success);
    if (failed.length > 0) {
      logger.error("Some uploads failed", { failed });
    }

    return {
      total: input.files.length,
      succeeded: results.filter(r => r.success).length,
      failed: failed.length,
    };
  }
);
Partial Retry Strategy:
  • Each file has independent steps
  • 3 retries per file
  • Workflow succeeds even if some fail
  • Returns total, succeeded, and failed counts (failed files are logged)
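
The workflow above returns counts only. If callers also need to know which files failed, for example to retry them later, the summary can carry the file keys as well. A minimal sketch, assuming the same `{ file, success }` result shape used in the example; `summarizeUploads` and `UploadResult` are hypothetical names.

```typescript
interface UploadResult {
  file: string;
  success: boolean;
}

// Build the workflow's return value, including the keys of failed files.
function summarizeUploads(results: UploadResult[]) {
  const failedFiles = results.filter((r) => !r.success).map((r) => r.file);
  return {
    total: results.length,
    succeeded: results.length - failedFiles.length,
    failed: failedFiles.length,
    failedFiles,
  };
}
```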

Important Notes

Error Handling Best Practices:
  1. Error Logging: Always log errors to enable problem tracking.
    catch (error) {
      logger.error("Operation failed", { error, context });
    }
    
  2. Retry Limits: Prevent infinite retries by setting a maximum count.
    const maxRetries = 3;  // Clear limit
    
  3. Compensating Transactions: Clean up completed operations on failure.
    catch (error) {
      await rollbackCompletedSteps();
      throw error;
    }
    
  4. Timeout Settings: Prevent infinite waits.
    await withTimeout(promise, 30000);
    
  5. Dead Letter Queue: Store final failures separately for manual processing.
    await deadLetterQueue.add(failedItem);
    
  6. Error Type Distinction: Differentiate between retryable and non-retryable errors.
    if (isRetryable(error)) {
      throw error;  // Retry
    } else {
      throw new NonRetryableError();  // Fail immediately
    }
    
