# Direct Access for Conjunctive Queries with Negations

Université d’Artois - CRIL

Probabilistic Circuits and Logic Workshop - Simons Institute

October 18, 2023

# Direct Access for Join Queries

## Context

Join Queries: $Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i})$

where $\vec{z_i}$ tuple over $X = \{x_1,\dots,x_n\}$

Example: $Q(city, country, name, id) = People(id, name, city) \wedge Capital(city, country)$

People Id Name City
1 Alice Paris
2 Bob Lens
3 Chiara Roma
4 Djibril Berlin
5 Émile Paris
6 Francesca Roma
Capital City Country
Berlin Germany
Paris France
Roma Italy
Q City Country Id Name
Paris France 1 Alice
Roma Italy 3 Chiara
Berlin Germany 4 Djibril
Roma Italy 6 Francesca
Q City Country Id Name
Berlin Germany 4 Djibril
Paris France 1 Alice
Roma Italy 3 Chiara
Roma Italy 6 Francesca

Reorder $Q(\mathbb{D})$ in lexicographical order.

## Direct Access

Given $Q(\vec{x})$ and $\mathbb{D}$, simulate array access to $Q(\mathbb{D})$ where $Q(\mathbb{D})[k]$ is the $k^{th}$ element of $Q(\mathbb{D})$ by lexicographical order.

Q City Country Id Name
Berlin Germany 4 Djibril
Paris France 1 Alice
Roma Italy 3 Chiara
Roma Italy 6 Francesca

$Q(\mathbb{D})$ is $(Roma, Italy, 6, Francesca)$.

Preprocessing expensive: possibly $|\mathbb{D}|^{g(|Q|)}$

• Compute $Q(\mathbb{D})$
• Sort the table

Answering DA queries Cheap: $polylog(|\mathbb{D}|)$

• Output $Q(\mathbb{D})[k]$ on input $k$.

Can we have a better preprocessing?

## Tractable Join Queries and Direct Access

There are classes of queries for which one can decide $Q(\mathbb{D}) \neq \emptyset$ in polynomial time both in $|\mathbb{D}|$ and $|Q|$.

• Result generalizes to computing $\#Q(\mathbb{D})$
• And to direct access
• HoF: Bagan, Durand, Olive, Grandjean, Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, Bringmann, Mengel, Schweikardt, Zeevi, Berkholz.

## Acyclic queries

Central class of join queries because of their tractability.

$R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)$

## Yannakakis Algorithm Filter every tuple in a relation that cannot be extended to a solution below

## Twisting Yannakakis for Counting

Total of $4$ solutions.

One can use these annotations top-down to solve direct access queries on compatible orders

• for example for $(x,y,z,t,u,v)$.
• $Q(\mathbb{D})$ must set $x=2,y=1,z=0$, then proceed down in the tree.

## Direct Access on Acyclic Queries

Given a join query $Q(x_1,\dots,x_n)$, if $x_1,\dots,x_n$ is a compatible order then we can answer direct access queries with precomputation time $O(|\mathbb{D}|poly(|Q|))$ and access time $O(poly(|Q|)polylog(|\mathbb{D}|))$.

Compatible order?

## Acyclicity and elimination order

An $\alpha$-leaf in a hypergraph $H=(V,E)$ is $x \in V$ such that there is $e \in E$, $N(x) \subseteq e$.

A hypergraph $H$ is $\alpha$-acyclic iff one can obtain $\emptyset$ by successively removing $\alpha$-leaves in $H$: induces an order on $V$ called an $\alpha$-elimination order

[Brault Baron, 2014], simplification of [Graham-Yu-Özsoyoglu, 1979], also known as “without disruptive trio” in [Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

## Direct Access on Acyclic Queries

Given a join query $Q(x_1,\dots,x_n)$, if $x_n,\dots,x_1$ is a $\alpha$-elimination order then we can answer direct access queries with precomputation time $O(|\mathbb{D}|poly(|Q|))$ and access time $O(poly(|Q|)polylog(|\mathbb{D}|))$.

[Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Algorithm schema:

1. Construct a join tree “compatible” with the $\alpha$-elimination order
2. Load data in the join tree, anontate tuples with number of extension.
3. Use top-down induction on the annotated join tree to answer DA query $Q(\mathbb{D})[k]$.

# A circuit approach to Direct Access

## Yannakakis and circuits

Factorised DB POV: the trace of Yannakakis Algorithm is roughly a decision-DNNF of size linear.

## Ordered decision circuit

$x_1$ $x_2$ $x_3$
0 0 0
0 0 1
0 1 0
0 1 1
1 2 0
1 2 1
1 0 2
1 1 2
2 2 2

## Exhaustive DPLL, Join Queries and Circuit: Yannakakis Upside-down

Let $Q$ be a JQ and $x_n, \dots, x_1$ an $\alpha$-elimination order of $Q$.

• $Q(\mathbb{D}) = \biguplus_{d \in D} Q[x_1 = d](\mathbb{D})$
• If $Q = Q_1 \wedge Q_2$ and $var(Q_1) \cap var(Q_2) = \emptyset$ then $Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})$
• Implement this rules recursively + cache already computed queries $Q'$

It produces an ordered circuit computing $Q(\mathbb{D})$ of size $poly(Q) O(|\mathbb{D}|)$

## Solving Direct Access on Circuits: Precomputation

Precomputation: Compute for each decision gate $v$ on $x$ and value $d$, how many solutions of $v$ we can build by setting $x$ to $d' \leq d$.

## Solving Direct Access on Circuits: Access

Find the $7^{th}$ solution.

Access: construct the $k^{th}$ solution one variable at a time.

## Solving Direct Access on Circuits: Access

Find the $13^{th}$ solution.

Access: construct the $k^{th}$ solution one variable at a time.

## Direct Access for JQs from Circuits

Given a join query $Q(x_1,\dots,x_n)$, if $x_n,\dots,x_1$ is a $\alpha$-elimination order then we can answer direct access queries with precomputation time $O(|\mathbb{D}|poly(|Q|))$ and access time $O(poly(|Q|)polylog(|\mathbb{D}|))$.

Algorithm schema:

1. Construct a “compatible” join tree “compatible”
2. Load data in the join tree, anontate tuples.
3. Top-down induction to answer DA query $Q(\mathbb{D})[k]$.
1. Construct an ordered circuit $C$ for $Q(\mathbb{D})$.
2. Anotate the gates with number of solutions.
3. Use top-down induction on the circuit to answer DA query $Q(\mathbb{D})[k]$ in time $n \cdot polylog(|\mathbb{D}|)$

## Differences between both approaches

Why bother? It looks like the same algorithm on a slightly different data structure.

• It is not: join tree has an extra underlying “tree” structure, the circuit obtained from this has more structure.

• There exist relations with small ordered circuit but big tree-structured circuits [LICS17].

• Can we solve more direct access problem with our technique?

## Negative Join queries

Negative Join Queries (NJQ): $Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k \neg R_i(\vec{z_i})$

• Big difference:
• positively encoding $\neg R(\vec{z})$ on domain $D$ need $(D^{|\vec{z}|}-\#R)$ tuples.
• if $Q$ is tractable for every $\mathbb{D}$ then every $Q' \subseteq Q$ is tractable too because we can set $R_i = \emptyset$ in the database, which does not happen in the positive case.

## $\beta$-acyclic queries

Good candidates for tractability of Negative Join Queries: every $Q' \subseteq Q$ is acyclic.

This is a property known as $\beta$-acyclicity.

Characterization: $H$ is $\beta$-acyclic iff there exists a $\beta$-elimination order $x_n, \dots, x_1$, ie iteratively removing $\beta$-leaves, ie, $x$ such that edges containing $x$ are ordered by $\subseteq$:

Alternatively: a $\beta$-elimination order is an order that is an $\alpha$-elimination order for every subquery.

## Compiling $\beta$-acyclic NJQ

Let $Q$ be an NJQ and $x_n, \dots, x_1$ a $\beta$-elimination order for $Q$. Exhaustive DPLL on $Q$, $\mathbb{D}$ and with order $x_1 \dots x_n$ returns an ordered circuit of size $O(poly(|Q|)poly(|\mathbb{D}|))$.

Generalization of [LICS17].

Corollary: Direct Access for $\beta$-acyclic NJQ with $O(poly(|\mathbb{D}|))$ preprocessing and access time $O(polylog(|\mathbb{D}|)poly(|Q|))$ for lexicographical orders based on (reversed) $\beta$-elimination orders.

Side observation: we know that “join tree” based techniques fail for $\beta$-acyclic NJQ!

## Going further

Technique generalizes to:

• signed join queries (mixing positive and negative atoms)
• conjunctive (with $\exists$-quantifiers) signed queries:
• project $\exists$ directly in the circuits
• as long as one projects a suffix $x_k \dots x_n$
• Non-acyclic signed conjunctive queries:
• arbitrary elimination order can be associated with a notion of “width”
• match (fractional) hypertree width for the positive case
• corresponds to the width of the “worst” subhypergraph of a given order in the negative case

## Conclusion

• Circuit based tractability of queries is powerful and modular
• Deports the heavy lifting to the compilation part, treating the circuit is usually easier than working on annotated join trees.
• Future work:
• generalize FAQ/AJAR style queries to JNQ by working directly on the circuit
• Better understanding on the new width measures