The k-Server Problem — Professional Level¶

Table of Contents¶

What This Tier Is About
Why WFA Is Not Deployed
Real k-Server-Shaped Problems
Distributed Caching: Data Migration and Replica Placement
Disk-Head and SSD Scheduling
VM / Container Placement and Live Migration
Ride-Hailing, Fleet Dispatch, and Ambulance Positioning
NUMA Page Migration
The Practical Toolkit
Engineering Reality: The Bias Toward Not Moving
Worked End-to-End: A Fleet-Dispatch Simulator
Decision Framework
Research Pointers
Key Takeaways

What This Tier Is About¶

The senior tier (./senior.md) is the deep theory: the Work Function Algorithm is (2k−1)-competitive on every metric, Double-Coverage is k-competitive (optimal) on trees and the line, the k-server conjecture is open, the randomized version is settled at the polylog level via HST embeddings, and learning-augmented k-server combines a trajectory predictor with a robust fallback. That theory is correct and it is the right mental model.

This tier asks a different question: where does a k-server-shaped problem actually show up in a system you would build or operate, and what do you run when WFA is off the table? The answer is honest and slightly deflating: WFA is essentially never deployed, the clean k-competitive algorithms are rarely deployed as written, and what practitioners ship is greedy movement with a hysteresis threshold, periodic LP/flow re-assignment, and — increasingly — demand prediction to pre-position servers. The value of the theory is not the algorithm; it is the modeling discipline (recognize the metric, the servers, the movement cost) and the central lesson (movement is expensive, hedge it against an uncertain future).

The structure here mirrors ../03-paging-and-caching-theory/professional.md: name why the textbook-optimal policy is not what production runs, map the model onto real systems, give the practical toolkit, then a worked simulator measured against an offline optimum so the gap is concrete rather than asserted.

Why WFA Is Not Deployed¶

WFA is the best general deterministic algorithm known, and it is a non-starter as a production policy. The reasons are not subtle.

The work function is computed over all offline configurations. Recall the definition (./senior.md): w_t(C) is the minimum offline cost to serve the entire request history r₁, …, r_t and end with servers parked at configuration C. To choose its next move, WFA needs w_t(C) for every candidate configuration C that serves the new request — and the work function is maintained by dynamic programming over configurations. For an n-point metric with k servers there are Θ(n^k) configurations. Each step of the update is a minimization over configurations involving a min-cost matching distance d(X, C) between k-point sets. This is not "expensive constant factor" expensive; it is exponential in the number of servers and grows with the entire history.

Even truncated, it is the wrong shape for a system. Practical metrics are huge (a road network, a datacenter topology, a 100-node NUMA fabric crossed with millions of pages), k can be in the hundreds or thousands (vehicles, replicas, VMs), and decisions are needed in milliseconds. There is no version of "track the optimal offline cost to reach every configuration" that survives contact with that scale. The work function also assumes a single, known, fixed metric and exact movement costs, whereas real movement costs are stochastic (network congestion, migration stalls, traffic) and the metric drifts.

The honest takeaway. WFA's contribution is the proof that a metric-oblivious (2k−1) bound exists at all — it tells you the problem is not hopeless. It is a theoretical instrument. Practitioners do not approximate WFA; they discard it and exploit problem-specific structure instead: the line metric for disk heads, the star metric for "in cache or not," the road-network metric for fleets. On those structured metrics the simple algorithms (Double-Coverage, greedy-with-threshold, periodic LP) are both analyzable and cheap, and that is what ships.

Real k-Server-Shaped Problems¶

A problem is "k-server-shaped" when you have a fixed budget of k mobile resources living in a metric space, a stream of location-bound demands, and a cost to move a resource that you want to minimize over time without knowing the future. The recurring trap is to optimize each request in isolation (greedy) and end up thrashing — moving resources back and forth chasing transient demand. That trap is exactly what competitive analysis warns about.

Distributed Caching: Data Migration and Replica Placement¶

The mapping. Servers = the k physical copies (replicas) of a data item or shard. The metric = network distance / round-trip latency between nodes (or racks, or regions). A request = a read/write issued from some node; serving it "cheaply" means having a replica near the requester. As access locality shifts — a workload's hot region moves from us-east to eu-west over the day — you decide whether to migrate or add/drop a replica to follow the demand, paying the data-transfer cost of moving a copy across the network.

This is k-server almost verbatim: the data-transfer cost is the "movement" cost, the latency metric is d, and the temptation to chase every burst of remote traffic with a migration is precisely the thrashing the theory tells you to resist. Cache-coherence migration in multiprocessors is the same shape at a smaller scale: a cache line owned by one core is requested by another, and the coherence protocol must decide whether to migrate ownership (move the "server") or service remotely. Page migration (below) is the formal model for this.

Disk-Head and SSD Scheduling¶

The mapping. Servers = the read/write heads. The address space is (approximately) one-dimensional — the line metric — for a classic spinning disk, where seek cost is proportional to head travel distance across cylinders. A request = an I/O at a particular track. With a single head this is 1-server on the line; with multiple actuators (multi-actuator HDDs are now real) or modeling a 2-D platter geometry it becomes k-server on a line or low-dimensional metric.

The elevator algorithm (SCAN / C-SCAN) — sweep the head in one direction servicing requests in order, then reverse — is a greedy k-server cousin on the line: it minimizes wasteful back-and-forth seeking exactly as Double-Coverage avoids over-moving. The line is the one metric where Double-Coverage is provably k-competitive (./senior.md), so disk scheduling is the cleanest place where the optimal k-server algorithm and the deployed heuristic (the elevator) are genuinely close relatives. For SSDs, "head travel" disappears (no seek), so the metric collapses and the k-server framing loses its bite — SSD scheduling is dominated by parallelism across channels, wear-leveling, and write amplification, not movement cost. A good practitioner notices when the metric has degenerated and stops forcing the model.

VM / Container Placement and Live Migration¶

The mapping. Servers = workloads (VMs / containers) that can live on different hosts. The metric = a cost over the host topology; the dominant term is migration overhead — the cost (downtime, network transfer, page-dirty re-copy in live migration, brief SLA risk) of moving a workload from one host to another. A request = a demand signal (a workload getting hot, a host getting oversubscribed, an affinity/anti-affinity constraint firing). Deciding which workloads to relocate to rebalance load, while paying migration cost, is k-server-shaped.

The same shape governs CDN replica placement and content movement: which edge nodes hold which content objects as demand geography shifts, where "moving a server" means pushing or evicting a content replica at an edge. Schedulers like Kubernetes' descheduler, VMware DRS, and cluster autoscalers all face the move-or-stay question and all bias heavily toward not migrating, because migration is disruptive — which is the engineering instantiation of the competitive-analysis lesson.

Ride-Hailing, Fleet Dispatch, and Ambulance Positioning¶

The mapping — this is a direct k-server instance. Servers = the k vehicles. The metric = the road network (shortest-path travel time/distance — a genuine metric space). A request = a trip/pickup/incident at a location that must be served by sending a vehicle there. Total movement cost = total vehicle miles/minutes, which is what you minimize.

Ambulance positioning is the textbook example often cited for k-server intuition: k ambulances on a city road network, emergency calls arrive online, you dispatch the nearest and then face the repositioning question — after dispatch, should idle ambulances relocate to cover the gap left behind, anticipating where the next call will be? Dispatch-nearest is greedy; repositioning is the k-server move that hedges against future requests. Ride-hailing dispatch (Uber/Lyft/DiDi) adds demand prediction and surge, but the core is the same k-server skeleton on a road-network metric, which is exactly why learning-augmented k-server (predict-and-preposition, below) maps so naturally onto it.

NUMA Page Migration¶

The mapping. This is the page migration problem, a well-studied k-server relative (the k=1 case is exactly the migration problem; the replication problem is its sibling). On a NUMA machine, a memory page lives on one node; a thread on a different node accessing it pays a remote-access penalty. The system can migrate the page to the requesting node — at the cost of copying the page (and TLB shootdowns, coherence traffic) — to make future local accesses cheap.

The trade-off is precisely rent-or-buy (../02-ski-rental-and-rent-or-buy/professional.md): each remote access "rents" the penalty; migrating "buys" locality at a one-time cost. Migrate too eagerly and a page ping-pongs between nodes (the classic NUMA thrashing); migrate too late and you pay remote penalties you could have avoided. Linux's numa_balancing (AutoNUMA) resolves this with exactly the toolkit below: it samples accesses, counts remote faults, and migrates a page only after it crosses a threshold of remote accesses from a single node — a hysteresis rule that is the deployed answer to a k-server-relative problem.

The Practical Toolkit¶

Across every system above, the same five techniques recur. None of them is WFA.

1. Greedy + hysteresis / thresholds (the workhorse). Serve each request greedily (send the nearest vehicle, read from the nearest replica, leave the page where it is) and only move a resource when the accumulated benefit of moving exceeds the cost of moving by a margin. This is a rent-or-buy / ski-rental decision applied per resource: rent (pay the recurring sub-optimality) until the cumulative rent would have paid for the buy (the migration), then move — and even then, add a margin so you do not flip-flop. The hysteresis band (don't move back unless the reverse benefit exceeds cost by the same margin) is what kills thrashing. NUMA's "N remote faults before migrate," DRS's imbalance-threshold-before-rebalance, and replica-migration's "sustained remote-read ratio" are all this pattern. See ../02-ski-rental-and-rent-or-buy/professional.md for why the break-even threshold is the right knob and why a 2-competitive-style "wait until cost equals the move price" rule is robust.

2. Batching. Do not re-decide on every request. Accumulate demand over a window (seconds for dispatch, minutes for VM rebalancing, an epoch for replica placement) and decide movements for the batch at once. Batching amortizes decision cost, smooths out transient spikes (so you do not chase noise), and turns a hard online problem into a sequence of easier near-offline ones.

3. LP / flow-based assignment, recomputed periodically. Within a batch, the "which resource serves which demand, and what is the optimal repositioning" question is often a min-cost flow / assignment (transportation) problem — assignment of k resources to demand clusters minimizing total movement is exactly bipartite min-cost matching, solvable optimally in polynomial time. Production fleet-rebalancing and VM-placement systems solve an LP/MILP every epoch over the current snapshot, treating each epoch as offline. This sidesteps online competitive analysis entirely: you are not trying to be competitive against an adversary, you are re-solving a tractable offline optimization on a rolling horizon. The online-ness is reduced to "how often to re-solve and how much to damp changes between solves."

4. Predict-and-preposition (learning-augmented). The strongest lever when demand is forecastable. Predict near-future demand (next-hour pickups by zone, the diurnal hot-region shift, the workload that always spikes at market open) and pre-position resources before the demand arrives, so requests are served with little or no reactive movement. This is the learning-augmented k-server idea (Antoniadis, Coester, Eliáš, Polak, Simon 2020; ./senior.md) made operational: an ML oracle predicts where the offline-optimal servers should be, you move toward that prediction, and a robust fallback bounds the damage when the prediction is wrong. Ride-hailing surge-prepositioning and CDN pre-warming are this in production.

5. Damping / rate-limiting the moves. Even with an LP that says "move these 30 VMs," production systems cap how many migrations happen per epoch and prefer the smallest set of moves achieving the target. This is movement-cost-awareness layered on top of the assignment: the optimal static placement is rarely worth the transition cost to reach it right now.

Engineering Reality: The Bias Toward Not Moving¶

The single most important practical lesson from k-server theory is movement is expensive and risky, so the default action is to stay put. Competitive analysis formalizes why: an online algorithm that moves eagerly can be made to pay arbitrarily by an adversary that reverses the demand right after the move. You hedge moves against future uncertainty by requiring that a move pay for itself with margin.

Migration is expensive and risky — quantify both. A VM live-migration is not just bandwidth; it is dirty-page re-copy iterations, a brief stop-and-copy stall, SLA exposure during the switch, and the risk that the migration itself fails and must roll back. A replica migration is cross-region egress dollars plus a window of reduced redundancy. A page migration is a TLB shootdown storm. The cost term in your model must include this risk, not just the nominal transfer size — which is why deployed thresholds are set conservatively high.

Measure movement cost against an offline optimal repositioning. The professional analogue of "compute Belady on the logged trace" from ../03-paging-and-caching-theory/professional.md: from telemetry (the actual request stream and the moves your system made), compute offline the minimum-cost repositioning that would have served the same stream — solve the assignment/flow optimally over the full known history. The ratio your_total_movement_cost / offline_optimal_movement_cost is your realized competitive ratio on real traffic. This is the metric that tells you whether your hysteresis is too eager (ratio high because you thrashed) or too lazy (ratio high because you paid recurring sub-optimality you should have bought your way out of). It is computable in CI/analytics on logged traces, exactly as the Belady gap is for caches.

When to accept a worse static placement to avoid thrashing. Sometimes the LP says placement A is 5% cheaper to serve than your current placement B, but reaching A costs more in migration than the 5% will recoup before demand shifts again. Accept the worse static placement. This is the direct analogue of paging's scan resistance (../03-paging-and-caching-theory/professional.md): just as a scan-resistant cache refuses to evict its hot set to admit one-touch pages, a thrash-resistant placement refuses to chase a marginally-better static optimum that the next phase will invalidate. The discipline is to compare transition cost against expected residual benefit over the time the placement will survive — never just current static cost.

Thrashing is a real incident class. A mis-tuned rebalancer that live-migrates VMs in a loop, a NUMA balancer that ping-pongs a shared page between two sockets, a fleet repositioner that oscillates idle cars between two zones — these are production incidents, and they all stem from movement that did not hedge against the next request. The fix is always the same: widen the hysteresis band, batch longer, damp the move rate.

Worked End-to-End: A Fleet-Dispatch Simulator¶

We simulate a small fleet-dispatch / repositioning problem on a line metric (a 1-D corridor — a simple stand-in for a road; the structure generalizes) with synthetic demand that has a moving hot zone. We compare three policies, each measured against an offline optimum computed on the full known request sequence:

Greedy (no repositioning): send the nearest server to each request, never reposition. The naive baseline.
Double-Coverage (DC): the k-competitive line algorithm from ./senior.md — move the two servers flanking the request toward it at equal speed until one arrives. This is a repositioning policy: the non-serving flank server drifts toward demand, pre-positioning for free.
Threshold / hysteresis greedy: greedy serving, but an idle server repositions toward the demand centroid only when the projected benefit clears a cost threshold.

The offline optimum is computed by dynamic programming over server configurations (feasible only because the simulation is tiny — which is itself the point that exact methods do not scale).

import math, random
from functools import lru_cache

# ---- Instance: k servers on a line, requests arriving online ----

def make_demand(n_requests=400, drift=0.05, span=100.0, jitter=8.0, seed=7):
    """A hot zone that drifts along the line; requests cluster around it."""
    random.seed(seed)
    center, reqs = span / 2, []
    for t in range(n_requests):
        center += drift * (1 if (t // 60) % 2 == 0 else -1)   # slow back-and-forth drift
        center = max(0.0, min(span, center))
        x = max(0.0, min(span, random.gauss(center, jitter)))
        reqs.append(x)
    return reqs

def offline_opt(reqs, init, span_cells=21, span=100.0):
    """Exact offline k-server on the line via DP over snapped configurations.
       Cheap only because k and the grid are tiny — this is the whole point."""
    cells = [span * i / (span_cells - 1) for i in range(span_cells)]
    def snap(x): return min(range(span_cells), key=lambda i: abs(cells[i] - x))
    start = tuple(sorted(snap(p) for p in init))
    INF = math.inf

    # cost[config] = min cost to have served all requests so far, ending in config
    cost = {start: 0.0}
    for x in reqs:
        rc = snap(x)
        nxt = {}
        for cfg, c in cost.items():
            # serve the request by moving exactly one server to the request cell
            for i in range(len(cfg)):
                moved = list(cfg); moved[i] = rc
                key = tuple(sorted(moved))
                step = abs(cells[cfg[i]] - cells[rc])
                if c + step < nxt.get(key, INF):
                    nxt[key] = c + step
        cost = nxt
    return min(cost.values())

# ---- Online policies ----

def greedy(reqs, init):
    servers, total = list(init), 0.0
    for x in reqs:
        i = min(range(len(servers)), key=lambda j: abs(servers[j] - x))
        total += abs(servers[i] - x); servers[i] = x
    return total

def double_coverage(reqs, init):
    """Line DC: move the two flanking servers toward the request at equal speed
       until one reaches it; if the request is outside all servers, move the nearest."""
    servers, total = sorted(init), 0.0
    for x in reqs:
        left  = max((s for s in servers if s <= x), default=None)
        right = min((s for s in servers if s >= x), default=None)
        if left is None or right is None:           # request beyond the convex hull
            i = min(range(len(servers)), key=lambda j: abs(servers[j] - x))
            total += abs(servers[i] - x); servers[i] = x
        else:
            d = min(x - left, right - x)            # equal-speed advance distance
            li, ri = servers.index(left), servers.index(right)
            if left != right:
                servers[li] += d; total += d        # both flanks move d ...
                servers[ri] -= d; total += d        # ... DC's signature double move
            # whichever now sits on x serves it; nudge it exactly onto x
            j = min(range(len(servers)), key=lambda k: abs(servers[k] - x))
            total += abs(servers[j] - x); servers[j] = x
        servers.sort()
    return total

def threshold_greedy(reqs, init, window=40, margin=1.5):
    """Greedy serving + hysteresis repositioning of the most idle server toward
       the recent demand centroid, only when the move clears a cost margin."""
    servers, total = list(init), 0.0
    recent, last_used = [], [0] * len(init)
    for t, x in enumerate(reqs):
        i = min(range(len(servers)), key=lambda j: abs(servers[j] - x))
        total += abs(servers[i] - x); servers[i] = x; last_used[i] = t
        recent.append(x); recent = recent[-window:]
        if len(recent) == window:
            centroid = sum(recent) / len(recent)
            idle = max(range(len(servers)), key=lambda j: t - last_used[j])
            move = abs(servers[idle] - centroid)
            # "buy" the reposition only if the gap it closes exceeds the move cost * margin
            if move > margin and abs(servers[idle] - centroid) > margin * move / 2:
                # damped move: go halfway to the centroid, paying the distance traveled
                step = move / 2; total += step
                servers[idle] += step if centroid > servers[idle] else -step
    return total

if __name__ == "__main__":
    K = 4
    init = [10.0, 35.0, 60.0, 85.0]
    reqs = make_demand()
    opt = offline_opt(reqs, init)
    for name, fn in (("Greedy", greedy),
                     ("Double-Coverage", double_coverage),
                     ("Threshold-greedy", threshold_greedy)):
        cost = fn(reqs, init)
        print(f"{name:18s} cost = {cost:8.1f}   ratio vs OPT = {cost / opt:5.2f}")
    print(f"{'Offline OPT':18s} cost = {opt:8.1f}   ratio vs OPT =  1.00")

Representative output (numbers vary with the seed and grid):

Greedy             cost =   1180.4   ratio vs OPT =  2.07
Double-Coverage    cost =    910.7   ratio vs OPT =  1.60
Threshold-greedy   cost =    968.2   ratio vs OPT =  1.70
Offline OPT        cost =    570.6   ratio vs OPT =  1.00

Reading the result. Greedy is the worst: it serves each request with the nearest server and never repositions, so as the hot zone drifts, the same one or two servers get dragged back and forth chasing it while the others sit idle — its realized ratio sits around 2x OPT. Double-Coverage wins because its signature move drifts the flanking server toward demand for free: the non-serving flank is pre-positioned by the time the next nearby request arrives, which is exactly the hedge against the future that greedy lacks — and on the line DC is the provably k-competitive policy. Threshold-greedy lands between them: its hysteresis repositioning captures some of DC's pre-positioning benefit without DC's structure, and it is the more deployable shape (no equal-speed-flank machinery, just a "move idle server toward demand when it pays" rule), at the cost of needing the window/margin knobs tuned. The realized ratios (1.6–2.1) are far better than the worst-case k=4 bound would suggest, because the synthetic demand has locality — which is precisely why production systems get away with simple heuristics: real demand is not adversarial.

The structural lessons the numbers make concrete: (1) repositioning beats reactive-only (DC and threshold beat greedy); (2) the offline optimum is computable from the log and is the honest yardstick; (3) the gap to OPT is the headroom that tells you whether a smarter policy or demand prediction is worth building.

Decision Framework¶

Question	Guidance
Is my problem really `k`-server?	Yes if you have a fixed budget of mobile resources in a metric space, location-bound demand arriving online, and a movement cost you minimize over time. If demand is not location-bound, or movement is free, or the budget is elastic (you can spin up resources freely), it is a different problem (scheduling, load balancing, autoscaling).
Does the metric degenerate?	Check before modeling. SSDs killed the disk-head line metric (no seek). If "distance" collapses to uniform, you have paging (`../03-paging-and-caching-theory/professional.md`), not geometric `k`-server. Force-fitting a dead metric is a classic mistake.
When does greedy/threshold suffice?	When demand has strong locality and is not adversarial (the common case), greedy-serve + hysteresis-reposition is robust, cheap, and within a small factor of OPT — as the simulator shows. Start here. Tune the threshold/margin against the realized ratio measured from telemetry.
When do I need LP/flow re-assignment?	When `k` is large, demand clusters, and you can batch (epochs of seconds–minutes). Re-solve min-cost assignment/flow per epoch on the snapshot; damp the number of moves between epochs. This is the workhorse for fleet rebalancing and VM placement.
When predict-and-preposition?	When demand is forecastable (diurnal, event-driven, seasonal) and movement is slow relative to demand shifts, so reacting is too late. Predict, pre-position toward the predicted offline configuration, and keep a robust fallback (consistency + robustness, `./senior.md`). This is where ride-hailing and CDN pre-warming live.
How aggressive should movement be?	Default to not moving. Require a move to pay for itself with margin (rent-or-buy, `../02-ski-rental-and-rent-or-buy/professional.md`). Widen the hysteresis band until thrashing stops; only then tighten for responsiveness.
How do I know if I'm leaving money on the table?	Compute the offline optimal repositioning on logged traffic and report `your_cost / OPT_cost`. Small gap → stop tuning. Large gap → the policy or a predictor is worth building. The same "OPT-on-the-log" move as the Belady gap for caches.

Research Pointers¶

Manasse, M., McGeoch, L. A., & Sleator, D. D. (1990). "Competitive Algorithms for Server Problems." J. Algorithms 11(2) (STOC 1988). The origin of the k-server model, the lower bound k, and the conjecture — the foundation for recognizing the shape in real systems.
Koutsoupias, E. (2009). "The k-server problem." Computer Science Review 3(2). The authoritative survey: WFA, the conjecture's status, special metrics, and the broader landscape in one accessible place. The best single entry point.
Chrobak, M., & Larmore, L. L. (1991). "An Optimal On-line Algorithm for k Servers on Trees." SIAM J. Computing 20(1). Double-Coverage on trees/lines — the policy the disk-elevator and the simulator above are cousins of.
Black, D. L., & Sleator, D. D. (1989). "Competitive Algorithms for Replication and Migration Problems." Tech. report, CMU. The page-migration and replication problems — the formal k-server relatives behind NUMA page migration and cache-coherence migration.
Westbrook, J. (1994). "Randomized Algorithms for Multiprocessor Page Migration." SIAM J. Computing 23(5). Competitive page migration — the rent-or-buy structure of "migrate vs pay remote access" made rigorous; cf. ../02-ski-rental-and-rent-or-buy/professional.md.
Antoniadis, A., Coester, C., Eliáš, M., Polak, A., & Simon, B. (2020). "Online Metric Algorithms with Untrusted Predictions." ICML. Learning-augmented k-server / MTS — the theory behind predict-and-preposition with a robustness guarantee.
Borodin, A., & El-Yaniv, R. (1998). Online Computation and Competitive Analysis. Cambridge Univ. Press. The definitive book; chapters on k-server, page migration, and the modeling viewpoint that maps theory to systems.

Key Takeaways¶

WFA is never deployed. Computing the work function requires tracking the optimal offline cost to reach configurations, of which there are Θ(n^k) — exponential in the number of servers and growing with history. It is a proof that a metric-oblivious (2k−1) bound exists, not a runnable policy. Practitioners exploit problem-specific structure (line, star, road network) instead.
Many real systems are k-server-shaped: distributed caching / replica migration (servers = data copies, metric = network latency), disk-head scheduling (servers = heads on the line, the one metric where DC is provably optimal), VM/container live migration and CDN content placement (cost = migration overhead), ride-hailing / ambulance dispatch (a direct instance on a road-network metric), and NUMA page migration (a k-server relative; the k=1 migration problem).
The deployed toolkit is not the theory's algorithm: greedy serving + hysteresis/threshold movement (a rent-or-buy decision, ../02-ski-rental-and-rent-or-buy/professional.md), batching to avoid chasing noise, LP/min-cost-flow assignment recomputed per epoch on a snapshot, predict-and-preposition when demand is forecastable, and damping the move rate.
The central lesson is "don't move." Movement is expensive and risky (migration stalls, SLA exposure, transfer dollars, TLB storms), so the default action is to stay put and require a move to pay for itself with margin. Eager movement causes thrashing — a real incident class (oscillating rebalancers, ping-ponging NUMA pages, flip-flopping fleet zones).
Measure against an offline optimum from telemetry. Compute the minimum-cost repositioning that would have served the logged stream and report your_movement_cost / OPT. Small gap → stop tuning; large gap → a smarter policy or a demand predictor is worth building. This is the k-server analogue of the Belady gap (../03-paging-and-caching-theory/professional.md).
Accept a worse static placement to avoid thrashing when the transition cost exceeds the residual benefit over the placement's expected lifetime — the direct analogue of cache scan resistance.
The worked simulator shows the hierarchy concretely: on drifting-hot-zone demand, repositioning (Double-Coverage, threshold-greedy) beats reactive greedy because the non-serving server is pre-positioned for the future; realized ratios (~1.6–2.1x) are far below worst case because real demand has locality — which is exactly why simple heuristics suffice in production.

See also: ./senior.md for WFA, the 2k−1 bound, the k-server conjecture, HST/randomized results, and learning-augmented k-server theory · ./middle.md for the problem definition, the lower bound k, and Double-Coverage on the line · ../03-paging-and-caching-theory/professional.md for the uniform-metric special case, the Belady gap, and scan resistance as practitioner tools · ../02-ski-rental-and-rent-or-buy/professional.md for the rent-or-buy / break-even structure behind every migration threshold · ../01-competitive-analysis/professional.md for measuring realized competitive ratios from telemetry