A few months ago, I built a music-sharing platform where users could import their playlists from one platform and make them public so others could import them as well.

Functionally, everything worked. But there was one big problem: I didn’t design for scalability and reliability from the start.

This was unusual for me, because scalability is usually on my mind. The real reason was simple: at the time, I didn’t know how to solve the problem properly.

A few weeks later, I learned about workflow engines, and everything clicked.

What Is a Workflow Engine?

A workflow engine is a system that orchestrates long-running, multi-step processes by persisting their state and executing tasks via workers.

This allows the process to:

  • survive crashes
  • retry on failure
  • resume from where it stopped
  • avoid losing progress

You can build one from scratch, or use existing tools like Temporal, which is what I used.

The Problem with My Original Design

Bad System Design

In my original system, when a user imported a playlist:

  • The backend would call external music APIs (Spotify, etc.)
  • If the API quota was exceeded, the import stopped
  • The system had no memory of where it stopped
  • When the quota reset, the user had to restart manually
  • Previously imported songs were imported again

This caused:

  • duplicate tracks
  • wasted API quota
  • bad user experience

In short: no fault tolerance, no progress tracking, and no recovery.

How a Workflow Engine Fixed This

Good System Design

I introduced Temporal as a workflow engine to manage playlist imports.

Each playlist import became a workflow:

  • Each track import became a step
  • Progress was persisted after every step
  • Failures were automatically retried
  • The workflow could pause and resume safely

Here’s what the core workflow looks like:

func PlaylistSyncWorkflow(ctx workflow.Context, input PlaylistSyncInput) (*PlaylistSyncResult, error) {
    logger := workflow.GetLogger(ctx)
    logger.Info("Starting PlaylistSyncWorkflow", "playlistID", input.PlaylistID)

    ao := workflow.ActivityOptions{
        StartToCloseTimeout: 10 * time.Minute,
        RetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
            MaximumInterval:    time.Minute,
            MaximumAttempts:    5,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, ao)

    // Fetch user and playlist data
    var user db.User
    err := workflow.ExecuteActivity(ctx, FetchUserActivity, input.UserID).Get(ctx, &user)
    if err != nil {
        return nil, fmt.Errorf("failed to fetch user: %w", err)
    }

    var tracks []db.Track
    err = workflow.ExecuteActivity(ctx, FetchPlaylistTracksActivity, input.PlaylistID).Get(ctx, &tracks)
    if err != nil {
        return nil, fmt.Errorf("failed to fetch tracks: %w", err)
    }

    result := &PlaylistSyncResult{
        TracksProcessed: 0,
        TracksFailed:    0,
    }

    // Process each track as a separate activity
    for i, track := range tracks {
        logger.Info("Processing track", "index", i+1, "total", len(tracks), "title", track.Title)

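        err = workflow.ExecuteActivity(ctx, AddTrackToSpotifyActivity, user, input.PlaylistID, track.SpotifyID).Get(ctx, nil)
        if err != nil {
            // The activity failed even after its retries; record it and continue with the next track.
            logger.Error("Failed to add track", "title", track.Title, "error", err)
            result.TracksFailed++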
        } else {
            result.TracksProcessed++
        }
    }

    logger.Info("PlaylistSyncWorkflow completed", "processed", result.TracksProcessed, "failed", result.TracksFailed)
    return result, nil
}

This gave me three major wins:

1. Fault Tolerance

If the server crashes, deploys break, or workers restart, the workflow does not lose state.

Temporal replays the workflow’s recorded event history to rebuild its state, then continues from the first step that has not completed. No manual restarts. No broken imports.
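
Replay can be pictured with a toy model: the engine records the result of each completed step in a history, and when the workflow function runs again after a crash, recorded steps return their stored result instead of executing. History, Rewind, and Do are hypothetical names for illustration. Temporal’s real event history is far richer than this.

```go
package main

// History records the results of completed steps in execution order.
type History struct {
	results []string // persisted results of completed steps
	pos     int      // replay cursor for the current run
}

// Rewind resets the replay cursor, as if a fresh worker picked up the workflow.
func (h *History) Rewind() { h.pos = 0 }

// Do returns the recorded result if this step already completed in an
// earlier run; otherwise it executes fn and records the result.
func (h *History) Do(fn func() (string, error)) (string, error) {
	if h.pos < len(h.results) {
		res := h.results[h.pos]
		h.pos++
		return res, nil
	}
	res, err := fn()
	if err != nil {
		return "", err
	}
	h.results = append(h.results, res)
	h.pos++
	return res, nil
}
```

This is also why workflow code has to be deterministic: on replay, the same steps must be requested in the same order for the recorded results to line up.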

2. Progress Tracking

The system always knows:

  • which tracks were imported
  • which track is next
  • where the process stopped

So if syncing is interrupted, it resumes exactly from the last successful step.
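
Resuming safely also depends on each step being safe to run twice, since a retried activity may repeat a call whose response was lost. A minimal sketch, assuming the playlist’s current track IDs can be fetched before adding anything; missingTracks is a hypothetical helper, not part of any Spotify client library.

```go
package main

// missingTracks returns the wanted track IDs that are not yet in the
// playlist, preserving order and dropping duplicates within wanted.
func missingTracks(existing, wanted []string) []string {
	seen := make(map[string]bool, len(existing))
	for _, id := range existing {
		seen[id] = true
	}
	var missing []string
	for _, id := range wanted {
		if !seen[id] {
			seen[id] = true
			missing = append(missing, id)
		}
	}
	return missing
}
```

Adding only the IDs this returns makes a repeated step a no-op instead of a source of duplicate tracks.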

3. Rate Limiting & Retries

When an API quota is hit, the workflow handles it gracefully. Here’s an example from the activity layer:

func AddTrackToSpotifyActivity(ctx context.Context, user db.User, playlistID, trackID string) error {
    client, err := services.GetSpotifyClient(ctx, user)
    if err != nil {
        return fmt.Errorf("failed to get Spotify client: %w", err)
    }

    _, err = client.AddTracksToPlaylist(ctx, spotify.ID(playlistID), spotify.ID(trackID))
    if err != nil {
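        // Matching on the error text is a crude way to detect an HTTP 429 (rate limited) response.
        if strings.Contains(err.Error(), "429") {
            // Wait inside the activity before returning, so the next retry is less likely to hit the limit again.
            time.Sleep(30 * time.Second)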
            return fmt.Errorf("rate limited, will retry: %w", err)
        }
        return fmt.Errorf("failed to add track: %w", err)
    }
    return nil
}

When rate limited:

  • The activity waits 30 seconds and returns an error
  • Temporal retries it automatically according to the retry policy
  • The import continues once the API accepts requests again

No duplicated imports. No wasted quota.

Why This Matters Architecturally

Playlist syncing is:

  • long-running
  • dependent on external APIs
  • failure-prone
  • stateful
  • side-effect heavy

This makes it a perfect use case for a workflow engine.

Without one, you end up writing fragile, ad-hoc logic with:

  • cron jobs
  • background queues
  • manual retries
  • inconsistent state

With a workflow engine, these concerns become infrastructure problems, not application problems.

General Use Cases for Workflow Engines

Workflow engines are ideal whenever a process:

  • has multiple steps
  • can fail
  • takes time
  • must not lose state

Common real-world examples:

  • KYC verification / onboarding
  • Payments & billing pipelines
  • AI agent task orchestration
  • CI/CD pipelines
  • E-commerce order fulfillment
  • Data pipelines and ETL jobs

Final Thought

Learning about workflow engines completely changed how I think about system design.

Instead of asking:

“How do I make this work?”

I now ask:

“How do I make this survive failure?”

And that shift is the difference between a system that works in demos and one that works in production.

Check out the EchoBridge repo here (please star it!)
