Building Relational Circuits

Florent Capelli

CRIL, Université d’Artois

ICDT 2026

24 March 2026

Representing Relations

Relations

Relation RDXR \subseteq D^X: set of tuples, over domain DD and variables XX.

xx yy zz
1 2 1
2 0 1
0 0 1
1 2 0

Table representation
Explicit representation

xx yy
1 2
2 0
0 0

\bowtie

yy zz
2 1
0 1
2 0

Result of a (join) query
Implicit Representation

Using Relations

A relation RR represents data we want to use.

Enumerate

Stream every tuple to perform an action.

Agregate

Compute statistics on RR, compute the size #R\#R, ratios #R[x=Alice]/#R\#R[x=Alice]/\#R, aggregate with semiring operation etc.

Glimpse

(Uniform) sampling, enumerate representative tuples etc.

Access

Quickly output the kthk^{th} tuple (assuming an order on RR), find median tuple etc.

Everything is “easy” if RR is given as a table, but is it necessary to materialize it?

A bit of (combined) complexity

Complexity varies depending on the representation of RDXR \subseteq D^X.

Task Table Join Query
Output one tuple in RR O(1)O(1) NP-complete
Output “largest” tuple in RR Linear O(#R)O(\#R) NP-complete
Compute #R\#R O(1)O(1) #P-complete

We are looking for succinctness/tractability tradeoffs!

Yannakakis Algorithm

A well behaved join

Q=S(x,y,a)R(x,y,z)T(x,z,b)Q = S(x,y,a) \bowtie R(x,y,z) \bowtie T(x,z,b)

R x y z
0 0 0
0 1 0
0 0 1
1 1 1
S x y a
0 0 1
0 0 2
0 1 1
T x z b
0 0 1
0 0 3
0 1 2
1 1 1
Q[xyz=000]Q[xyz=000] =S[xy=00]×T[xz=00]=S[xy=00] \times T[xz=00]
={a=1,a=2}×{b=1,b=3}=\{a=1,a=2\} \times \{b=1,b=3\}
Q[xyz=010]Q[xyz=010] =S[xy=01]×T[xz=00]=S[xy=01] \times T[xz=00]
={a=1}×{b=1,b=3}=\{a=1\} \times \{b=1, b=3\}
Q[xyz=001]Q[xyz=001] =S[xy=00]×T[xz=01]=S[xy=00] \times T[xz=01]
={a=1,a=2}×{b=2}=\{a=1, a=2\} \times \{b=2\}
Q[xyz=111]Q[xyz=111] =S[xy=11]×T[xz=11]=S[xy=11] \times T[xz=11]
=×{b=1}=\emptyset \times \{b=1\}

Acyclic Join Query

A Join Query is acyclic iff it has a join tree.

  • Each variable is connected.
  • Well behaved at each node.
  • Bottom-up algorithm.

Bottom up construction

Q=R(SQ1)(TQ2)Q = R \bowtie (S \bowtie Q_1) \bowtie (T \bowtie Q_2)

At most ×\times per tuple in RR.

At most one tree per tuple in SS.

At most one tree per tuple in TT.

R x y z
0 0 0
0 1 0
0 0 1
1 1 1
S x y a
0 0 1
0 0 2
0 1 1
T x z b
0 0 1
0 0 3
0 1 2
1 1 1

Q1Q_1

Q2Q_2

Acyclic Join Query Complexity

Given an acyclic join query Q=(R1Rm)Q = (R_1 \bowtie \dots \bowtie R_m) and a join tree for QQ:

  • Bottom up construction gives a linear size representation of rel(Q)rel(Q)
  • Representation uses Cartesian products, decision nodes and factorization.

Factorized Databases Zoo

{×,}\{\times, \cup\}-circuits

Introduced by Dan Olteanu and Jakub Závodný under the name d-representation.

✅ Find a tuple

✅ Find a tuple τ\simeq \tau

✅ Find “optimal” tuple

✅ Enumerate (delay O(|C|)O(|C|)).

❌ Counting

❌ Many Aggregation tasks.

Smoothness

  • Ambiguous value, two possible semantics:
    • “Don’t care”: (x=0×yD)(xD×y=1)(x=0 \times y \in D) \cup (x \in D \times y=1)
    • Zero suppressed semantics: (x=0×y=DEF)(x=DEF×y=1)(x=0 \times y=DEF) \cup (x=DEF \times y=1).
  • Smoothness: every \cup is on same attributes relations.
    • purely syntactic
    • ensured in the algorithms presented today
    • can always be ensured by syntactic transformation

{,×}\{\uplus,\times\}-circuit

Unions have disjoint tuples (sometimes called deterministic, or unambiguous unions).

✅ Find a tuple

✅ Enumerate: delay depth of CC.

We can do better, check [Amarilli, Bourhis, Jachiet, Mengel].

✅ Counting

✅ Counting with semiring weights

❌ coNP-hard to check.

{×,𝖽𝖾𝖼}\{\times,\mathsf{dec}\}-circuits

Knowledge Compilation

Multi-domain generalization of well-studied Boolean circuits.

Gates Boolean Domain Enum Count
{,}\{\bowtie, \cup\} NNF
{,×}\{\cup, \times\} DNNF
{,×}\{\uplus, \times\} d-DNNF
{𝖽𝖾𝖼,×}\{\mathsf{dec}, \times\} dec-DNNF
{𝖽𝖾𝖼}\{\mathsf{dec}\} FBDD

Removing constants

For convenience, we may allow constants:

  • \bot for empty relation.
  • \top for {}\{\langle \rangle\}.
  • \bot is annoying for enumeration purposes: remove it!




Solving Tasks on Circuits

Design algorithm on the circuit directly:

  • Size of the relation: O(|C|)O(|C|)
    • #rel(g1××gk)=i=1k#rel(gi)\#rel(g_1 \times \dots \times g_k) = \prod_{i=1}^k \#rel(g_i)
    • #rel(g1gk)=i=1k#rel(gi)\#rel(g_1 \uplus \dots \uplus g_k) = \sum_{i=1}^k \#rel(g_i)
    • works for any semi-ring
  • Enumeration: O(#vars)O(\#vars) delay
    • first tuple of gg
    • next tuple of gg

Consequence for Acyclic Join Queries: counting in linear time (data complexity), enumeration in constant time after linear preprocessing.

Lower Bounds

Circuits useful to design algorithms, but also for lower bounds.

The answers of a JQ QQ can be enumerated with linear preprocessing and constant delay for every DB

iff QQ is acyclic assuming the Boolean Matrix Multiplication Conjecture.

[Bagan, Durand, Grandjean] Conditional lower bound, for any algorithm

vs

If QQ is not acyclic, there exists ε>0\varepsilon > 0 s. t. for every NN, there is a database DND_N of size NN, such that any circuit computing the answers of QQ over DND_N is of size at least N1+εN^{1+\varepsilon}.

Unconditional lower bound, but targeting a large class of algorithms.

Bedtime stories

Top-down Compilation

Recalling Bottom-up Compilation

Q=R(SQ1)(TQ2)Q = R \bowtie (S \bowtie Q_1) \bowtie (T \bowtie Q_2)

R x y z
0 0 0
0 1 0
0 0 1
1 1 1
S x y a
0 0 1
0 0 2
0 1 1
T x z b
0 0 1
0 0 3
0 1 2
1 1 1

Q1Q_1

Q2Q_2

Start fresh

Set x,y,zx,y,z to the first tuple in RR

On the remaining variables, QQ has two connected components. Proceed with S(0,0,a)Q1S(0,0,a) \wedge Q_1 before T(0,0,b)Q2T(0,0,b) \wedge Q_2

Pick value 11 for aa and continue with Q1Q_1.

Backtrack and pick 22 for aa, continue with Q1Q_1.

Do the same with T(0,0,b)Q2T(0,0,b) \wedge Q_2.

Backtrack and set z=1z=1.

Recursive call on S(0,0,a)Q1S(0,0,a) \bowtie Q_1. AGAIN??! Better reuse!

Construct T(0,1,b)Q2T(0,1,b) \bowtie Q_2.

Draw the rest of the owl!

Breaking news

Use caching to “reverse” the dynamic programming :

Wait, do we even need the join tree? No! Only an order

Enters Exhaustive DPLL

Exhaustive DPLL complexity

Fix an order π=x1,,xn\pi = x_1,\dots,x_n on var(Q)var(Q):

EDPLL(Q,π)EDPLL(Q, \pi) builds a circuit of size at most O(Nι(Q,π))O(N^{\iota(Q,\pi)}) for some ι(Q,π)\iota(Q,\pi) \in \mathbb{Q}.

In data complexity, but polynomial in combined complexity.

Similar result using Yannakakis:

Projecting variables

Existentially project variables: Y.Q\exists Y. Q

If YY at the end of the order:

  • ACQ: free-connexity!
  • YY tested at the bottom
  • Can be projected!
  • Example: project aa, project yy

Exhaustive DPLL in Knowledge Compilation

Join Queries with Negated Atoms

Signed Join Queries

Q=i=1kPi(𝐳𝐢)Q = \bigwedge_{i=1}^k P_i(\mathbf{z_i}) i=1l¬Ni(𝐳𝐢)\bigwedge_{i=1}^l \lnot N_i(\mathbf{z_i})

Negation interpreted over a given domain DD:

NN
x1x_1 x2x_2 x3x_3
0 1 0
¬N\lnot N on {0,1}\{0,1\}
x1x_1 x2x_2 x3x_3
0 0 0
0 0 1
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
  • ¬N(x1,,xk)\neg N(x_1,\dots,x_k) encoded with |D|k#N|D|^{k}-\#N tuples.
  • Relation with SAT: ¬N\neg N is x1¬x2x3x_1 \vee \neg x_2 \vee x_3

Positive Encoding not Optimal

Q(x1,,xn)=¬N(x1,,xn)Q(x_1,\dots,x_n) = \neg N(x_1,\dots,x_n), domain {0,1}\{0,1\}.

Positive encoding: preprocessing O(2n)O(2^n)

N
xx yy zz
0 0 0

Hardness of subqueries

Q1=R(x1,x2,x3)S(x1,x2)T(x2,x3)U(x3,x1)Q_1 = R(x_1, x_2, x_3) \land S(x_1, x_2) \land T(x_2, x_3) \land U(x_3, x_1)

linear sized circuit

Q2=S(x1,x2)T(x2,x3)U(x3,x1)Q_2 = S(x_1, x_2) \land T(x_2, x_3) \land U(x_3, x_1)

non-linear sized circuit (|dom|3/2|dom|^{3/2})

Subqueries may be harder to solve than the query itself!

Subqueries and negative atoms

Q1=Q_1' = ¬R(1,2,3)\lnot R(1, 2, 3) S(1,2)T(2,3)U(3,1)\land S(1, 2) \land T(2, 3) \land U(3, 1)

Equivalent to Q2Q_2 if R=R=\emptyset

Q2=S(1,2)T(2,3)U(3,1)Q_2 = S(1, 2) \land T(2, 3) \land U(3, 1)

non-linear circuit

Circuit construction for Q=PNQ = P \wedge N implies circuit construction for Q=PNQ = P \wedge N' for every NNN' \subseteq N!

Generalizing Exhaustive DPLL

Exhaustive DPLL works out of the box in this case, but can be improved:

A Star in a Negative Potatoe

S1S_1 hh x1x_1
0 0
1 0
1 1
S2S_2 hh x2x_2
0 0
1 0
1 1
S3S_3 hh x3x_3
0 0
1 0
1 1
KK hh x1x_1 x2x_2 x3x_3
1 1 1 1

S1S2S3¬KS_1 \bowtie S_2 \bowtie S_3 \bowtie \neg K

Exhaustive DPLL Guarantees

QQ SJQ. Fix an order π=x1,,xn\pi = x_1,\dots,x_n on var(Q)var(Q):

EDPLL(Q,π)EDPLL(Q, \pi) builds a circuit of size at most O(Nκ(Q,π))O(N^{\kappa(Q,\pi)}) for some κ(Q,π)\kappa(Q,\pi) \in \mathbb{Q}.

In data complexity but polynomial wrt combined complexity.

Conclusion

Circuit propaganda
Take home message

Other uses of Relational Circuits

See survey Tractable Circuits in Database Theory, Amarilli and C. shameless plug

Thank you! Enjoy ICDT26!