# Geometric Amortization of Enumeration Algorithms

November 24, 2022

# Enumeration Complexity

## Notation

Let $A(x,y)$ be a binary predicate, identified with the set of pairs $(x,y)$ it contains.

Problem $EnumA$: output $A(x,\_):=\{y \mid A(x,y)\}$ on input $x$.

$y \in A(x,\_)$ will be called a solution of (the problem) $A$ on input $x$.

## Main Assumption

In this talk, $A$ is an $NP$-predicate, that is:

• $y \in A(x,\_)$ can be tested in polynomial time.
• $y$ is of size polynomial in the size of $x$.

Many natural problems have this property:

• Enumerate the answer set of a database query $Q$ on database $\mathbb{D}$
• Enumerate the transversals of a hypergraph $\mathcal{H}$

## Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

## Total Time

Total time needed to output every solution.

There can be exponentially many.

$EnumA$ is in $OutputP$ if it can be solved in time polynomial in:

• $|x|$
• and $|A(x,\_)|$

## Delay

Total time is not always satisfactory:

• Process solutions in a stream.
• Only peek at the solution set.

Delay: the longest time one has to wait between the output of two solutions.

## Holy $DelayP$ Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

Why do we care?

1. Guarantees a reasonable waiting time before the next solution.
2. Gives a $t \times delay$ upper bound on the time needed to output $t$ solutions.

(Unpopular?) Opinion: Reason 1 is rarely useful…

## Linear Incremental Time

$IncP_1$

Algorithms such that for every $t$, after $t\cdot d(n)$ steps, the algorithm has output at least $t$ solutions.

We say that $d(n)$ is the incremental delay.

# Promoting $IncP_1$

## $DelayP$ vs $IncP_1$

Clearly $DelayP \subseteq IncP_1$: after $delay \times t$, at least $t$ solutions output.

For the other way around, consider an algorithm that outputs $2^n$ solutions in its first $2^n$ steps, then spends $2^n$ further steps before the last solution: the delay is $2^n$, but the incremental delay is $2$.

• if $t \leq 2^n$, $t$ solutions are output in time $t \leq 2t$
• if $t = 2^n+1$, the last solution is output at time $2^{n+1} \leq 2t$
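This separation can be made concrete. Below is a toy Python sketch (my own illustration, not from the talk) of an enumerator whose worst-case delay is exponential while its incremental delay is $2$:

```python
def bursty_enumerator(n):
    """Yields (step, solution) pairs: 2**n solutions in the first 2**n
    steps, then a dry stretch of 2**n steps before the last solution.
    Worst-case delay: about 2**n.  Incremental delay: 2."""
    step = 0
    for i in range(2 ** n):
        step += 1              # one step of work per solution
        yield step, i
    step += 2 ** n             # exponential wait before the last solution
    yield step, 2 ** n

# At every output, step <= 2 * (number of solutions so far): IncP_1 with d = 2.
outputs = list(bursty_enumerator(4))
assert all(step <= 2 * (k + 1) for k, (step, _) in enumerate(outputs))
```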

## (Naive) Regularization

Given an $IncP_1$-enumerator $A$ with incremental delay $d$, one can regularize the delay using a queue to delay the output of solutions every $d$ steps:

```
step = 0
q = Queue()
while A is not finished:
    move(A)            # simulate one step of A
    step += 1
    if A outputs x:
        q.add(x)
    if step == d:
        output(q.pull())
        step = 0
output(q)  # output what remains in q
```
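For concreteness, here is a runnable Python version of this regularization (a sketch: I model the enumerator as a generator that yields, per simulation step, either a solution or `None`):

```python
from collections import deque

def regularize(run, d):
    """Naive regularization of an IncP_1 enumerator with incremental
    delay d: buffer solutions in a queue, release one every d steps.
    `run` yields, for each simulation step, a solution or None."""
    q = deque()
    step = 0
    for x in run:              # move(A), one step at a time
        step += 1
        if x is not None:
            q.append(x)        # A output a solution: buffer it
        if step == d:
            # With incremental delay <= d, the queue is never empty here.
            yield q.popleft()
            step = 0
    while q:                   # output what remains in the queue
        yield q.popleft()

# Example: a solution at every step, regularized with d = 2.
assert list(regularize(iter(range(4)), 2)) == [0, 1, 2, 3]
```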

## $IncP_1 = DelayP$

When the simulation of $A$ reaches step $2^n$, the regularization has:

• pushed $2^n$ solutions into the queue
• pulled $2^{n}/2$ of them
• $2^n/2$ remain in the queue…

## Natural notion in data structure

Suppose enumeration algorithm $A$ works as follows:

• $A$ maintains a dynamic array $T$
• $A$ performs insert/deletion on $T$ between two outputs

Delay of $A$: worst case complexity of insert/delete

$\rightarrow$ $O(|T|)$ if a resize occurs…

but incremental delay of $A$: amortized complexity of insert/delete

$\rightarrow$ to analyse the time needed to output $k$ solutions, one can consider each operation on $T$ to be $O(1)$.
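The classic doubling array makes this concrete. The sketch below (standard textbook material, not specific to the talk) counts the cost of each insert in cell writes, exposing both the $O(|T|)$ worst case and the $O(1)$ amortized cost:

```python
class DynArray:
    """Dynamic array with capacity doubling; costs counted in cell writes."""
    def __init__(self):
        self.cap, self.size = 1, 0
        self.data = [None]
        self.max_cost = 0      # worst single-insert cost seen
        self.total_cost = 0    # total cost over all inserts

    def insert(self, x):
        cost = 1                        # writing the new element
        if self.size == self.cap:       # full: resize and copy everything
            self.cap *= 2
            new = [None] * self.cap
            new[:self.size] = self.data[:self.size]
            self.data = new
            cost += self.size           # one write per copied element
        self.data[self.size] = x
        self.size += 1
        self.max_cost = max(self.max_cost, cost)
        self.total_cost += cost

a = DynArray()
for i in range(1024):
    a.insert(i)
assert a.max_cost == 513          # Theta(|T|) when a resize occurs...
assert a.total_cost <= 3 * 1024   # ...but amortized O(1) per insert
```

A worst-case delay analysis is charged the 513-write insert; the incremental-delay analysis sees roughly 2 writes per operation.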

## Wrap up on $IncP_1$

• Incremental delay better than (worst case) delay.
• Better analysis tools using amortized complexity of data structure.

If you do not agree, $IncP_1 = DelayP$ anyway…

… but regularizing the delay is expensive in space.

# Geometric Amortization

## Main contribution

$IncP_1(POLYSPACE) = DelayP(POLYSPACE)$

Regularization using only polynomial space.

## Detailed Statement

For every $IncP_1$-enumerator $A(x)$ with $N$ solutions:

• incremental delay $d$,
• space $s$,
• total time $T$

there exists a $DelayP$-enumerator with

• delay $O(d \log(N))$
• space $O(s \log(N))$
• total time $O(T)$

## Geometric Amortization

• Maintain $l+1=\lceil \log(N) \rceil$ simulations $A_0, \dots, A_l$ of $A$
• $A_i$ outputs solutions for steps in $[2^id;2^{i+1}d[$.
• $A_{i+1}$ “moves faster” than $A_i$.

## Faster how?

• $A_i$ moves by at most $2d$ steps.
• If $A_i$ finds a solution in its zone, it outputs it and we proceed back with $A_l$.
• Otherwise, proceed with $A_{i-1}$.
• Stops when $A_0$ is out of its zone.
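To make the scheduling concrete, here is a self-contained Python sketch of geometric amortization (my own reconstruction of the loop above; letting $A_0$ cover the whole prefix $[0, 2d)$ and freezing a process once it leaves its zone are implementation choices):

```python
import math

def geometric_amortization(make_run, d, N):
    """Regularize an IncP_1 enumerator with incremental delay d and at
    most N solutions.  make_run() returns a fresh simulation of the
    enumerator: a generator yielding, per step, a solution or None."""
    l = max(1, math.ceil(math.log2(N)))
    runs = [make_run() for _ in range(l + 1)]   # A_0, ..., A_l
    pos = [0] * (l + 1)                         # steps simulated so far
    done = [False] * (l + 1)

    def zone(i):   # A_i outputs solutions found at steps in [lo, hi)
        return (0 if i == 0 else (2 ** i) * d), (2 ** (i + 1)) * d

    while True:
        emitted = False
        for i in range(l, -1, -1):              # from A_l down to A_0
            lo, hi = zone(i)
            budget = 2 * d                      # A_i moves by at most 2d steps
            while budget > 0 and not done[i] and pos[i] < hi:
                try:
                    x = next(runs[i])
                except StopIteration:
                    done[i] = True
                    break
                pos[i] += 1
                budget -= 1
                if x is not None and lo <= pos[i] - 1 < hi:
                    yield x                     # solution in A_i's zone
                    emitted = True
                    break
            if emitted:
                break                            # proceed back with A_l
        if not emitted:
            return                               # A_0 is out of its zone

# Example: solutions 0..7, one per step (d = 1).  The output order
# changes, but every solution is produced exactly once.
out = list(geometric_amortization(lambda: iter(range(8)), 1, 8))
assert sorted(out) == list(range(8))
```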

## Key Lemmas

1. If $A_0, \dots, A_i$ have output $k$ solutions, $A_{i+1}$ has moved by at least $2dk$ steps.
2. There are at least $2^i$ solutions in $[0; 2^id]$.

When $A_0$ is finished, $A_i$ has moved by at least $2^{i+1}d$ steps: it has explored all its zone.

## Delay

Between two outputs:

• each process $A_i$ moves by at most $2d$ steps,
• at most $l = \log(N)$ processes,
• step counters are incremented / compared (to check whether $A_i$ is in zone $[2^i \cdot d; 2^{i+1}\cdot d[$)

Delay: time needed to simulate $l \times 2d$ steps and to work with counters.

## Simulating RAMs and Counters

• Simulating RAM: With bounds on incremental delay, space and number of solutions of $A$: $O(1)$ to simulate one step.

• Counters: of size at most $\log(dN)$

Gives a $O(d\cdot\log(dN)^2)$ delay (polynomial).

• Gray Code based counters + pointer to its most significant bit: delay of $O(d \cdot \log(N))$.

## Improvements

Previous algorithm assumes knowledge of (at least an upper bound on):

1. $N$, the number of solutions of $A(x)$
2. $s$, the space used by $A(x)$
3. $d$, the incremental delay of $A(x)$

To remove the assumption on $N$:

• Start with a bounded number of processes $A_0, \dots, A_k$
• When $A_k$ enters its zone, copy it into $A_{k+1}$
• This approach preserves total time.

To remove the assumption on $s$:

• Encode the memory of each process with a dedicated dynamic data structure
• Tradeoff: $O(1)$ simulation with space $O(s \cdot n^\varepsilon)$ vs $O(\log \log n)$ simulation with space $O(s)$.

For $d$, the assumption cannot be removed exactly, but it can be approximated:

## Unknown incremental delay I

Goal:

• Construct $REG$, taking as input a black box oracle access to an enumerator $A$; the incremental delay $d_A$ is unknown.
• $REG(A)$ outputs the same solutions as $A$ with delay $\alpha \cdot d_A$.
• Can $\alpha$ be independent on $A$ (constant)?

It is provably impossible:

• Use an adversarial oracle $A$ that outputs $\#A-1$ solutions first and withholds the last solution until $REG(A)$ has output $\#A-1$ solutions.

## Unknown incremental delay II

For every $\varepsilon$, one can construct $REG$ such that $REG(A)$ has delay $d_A^{1+\varepsilon}$.

Use naive regularization, but pull a solution from the queue whenever no solution has been output for $C_\varepsilon \cdot \tilde{d}$ steps,

where

• $\tilde{d} = \frac{steps}{Nsol+1}$ locally approximates the incremental delay ($Nsol$ solutions seen after $steps$ steps)
• $C_\varepsilon = {(1-2^\varepsilon)}^{-1}$

Open Problem: make this tradeoff work with geometric amortization.

# Generalizations and Applications

## Incremental time

The approach generalizes to collapse the following classes:

• Usual $k$-incremental time: the delay between solution $i$ and $i+1$ is $O(poly(n) i^k)$.
• Relaxed $(k+1)$-incremental time: for every $i$, at least $i$ solutions have been output after time $poly(n)i^{k+1}$.

Change the budget given to each process to $2S^k(k+1)d$, where $S$ is the number of solutions output so far.

## Trading average delay with worst case delay

Let $E(x,\_)$ be a self-reducible problem:

• $E(x, \_) = \bigcup_{i} E(x_i, \_)$ with $|x_i| < |x|$
• Branch and bound enumeration algorithm $A$ for $E$
• Polynomial delay
• Better average delay when branches have a lot of solutions

The average delay bound is valid in every branch:

• it is an incremental delay,
• it can be traded for a worst case delay using geometric amortization.

## Example: enumerating DNF models

Problem:

• Input: a DNF $D = \bigvee_{i \leq m}\ S_i$ on variables $x_1,\dots,x_n$
• Output: every satisfying assignment of $D$ over $x_1,\dots,x_n$.

Branch and bound algorithm: extension problem efficiently solvable.

• If $D[x_1=0]$ has a solution, recursively enumerate $D[x_1=0]$.
• If $D[x_1=1]$ has a solution, recursively enumerate $D[x_1=1]$.
• Delay linear in $|D|$ (if implemented correctly).
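As a sketch (my own code, representing each term $S_i$ as a partial assignment), the branch-and-bound enumerator with its linear-time extension check looks like this:

```python
def enum_dnf_models(terms, n):
    """Branch-and-bound enumeration of the models of a DNF.
    terms: list of dicts {var_index: bool}, one dict per term S_i;
    n: number of variables.  Yields each model as a tuple of n booleans."""
    def extensible(partial):
        # Extension check: does some term agree with every assigned variable?
        # Unassigned variables are compatible with anything.
        return any(all(partial.get(v, b) == b for v, b in t.items())
                   for t in terms)

    def rec(i, partial):
        if i == n:                       # all variables assigned: a model
            yield tuple(partial[j] for j in range(n))
            return
        for b in (False, True):          # branch on x_i = 0, then x_i = 1
            partial[i] = b
            if extensible(partial):      # recurse only into satisfiable branches
                yield from rec(i + 1, partial)
            del partial[i]

    if extensible({}):
        yield from rec(0, {})

# D = (x0 AND NOT x1) OR (x1 AND x2)
terms = [{0: True, 1: False}, {1: True, 2: True}]
models = list(enum_dnf_models(terms, 3))
assert len(models) == 4   # each model appears exactly once
```

Because only satisfiable branches are explored, each leaf of the recursion tree is a model, and the work between two outputs is a polynomial number of extension checks.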

## DNF enumeration in strong polynomial delay

Open question: can we solve it with a delay that is polynomial only in the size of the solutions, i.e., in $n$?

Two conjectures:

• Strong DNF Enumeration conjecture: impossible with a delay with a $o(m)$ dependency. Refuted using Geometric Amortization
• DNF Enumeration conjecture: impossible with a delay $poly(n)$.

## DNF enumeration: $O(n^2 m^\mu)$ delay

Branch and bound algorithm has:

1. average delay $O(n^2 \cdot m^\mu)$ with $\mu = 1-\log_3(2)<1$,
2. hence incremental delay $O(n^2 \cdot m^\mu)$,
3. Geometric amortization gives worst case delay $O(poly(n) \cdot m^\mu)$

## Conclusion

• New method to make delay regular without exponential space
• Message: incremental delay may be more natural than worst case delay.

Open problem: can we regularize without changing output order?