Florent Capelli, Oliver Irwin
CRIL, UniversitΓ© dβArtois
June 13, 2024
Join Query : $Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\mathbf{x_i})$
where $\mathbf{x_i}$ is a tuple over $X = \{x_1,\dots,x_n\}$
Example: $Q(city, country, name, id) = People(id, name, city) \wedge Capitals(city, country)$
id | name | city |
---|---|---|
1 | Alice | Paris |
2 | Bob | Lens |
3 | Chiara | Rome |
4 | Djibril | Berlin |
5 | Γmile | Dortmund |
6 | Francesca | Rome |
city | country |
---|---|
Berlin | Germany |
Paris | France |
Rome | Italy |
city | country | name | id |
---|---|---|---|
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Berlin | Germany | Djibril | 4 |
Rome | Italy | Francesca | 6 |
Quickly access $Q(\mathbb{D})[k]$, the $k^{th}$ element of $Q(\mathbb{D})$.
city | country | name | id |
---|---|---|---|
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Berlin | Germany | Djibril | 4 |
Rome | Italy | Francesca | 6 |
$Q(\mathbb{D})[2]$? $(Rome, Italy, Chiara, 3)$.
Naive algorithm: materialize $Q(\mathbb{D})$ in an array, sort it. Access.
city | country | name | id |
---|---|---|---|
$\dots$ | $\dots$ | $\dots$ | $\dots$ |
Berlin | Germany | Djibril | 4 |
$\dots$ | $\dots$ | $\dots$ | $\dots$ |
Paris | France | Alice | 1 |
$\dots$ | $\dots$ | $\dots$ | $\dots$ |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
$\dots$ | $\dots$ | $\dots$ | $\dots$ |
$Q(\mathbb{D})[1432] = ??$
Precomputation : $O(\#Q(\mathbb{D}))$ at least (may be worse), very costly
Access : $O(1)$, nearly free
Variable order $(city, country, name, id)$:
city | country | name | id |
---|---|---|---|
Berlin | Germany | Djibril | 4 |
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
In this talk: only lexicographical orders.
Direct Access generalizes many tasks that have been previously studied:
Naive Direct Access:
Can we have better preprocessing and reasonable access time?
βIdealβ complexity:
Computing $\#Q(\mathbb{D})$ given $Q$ and $\mathbb{D}$ is $\#P$-hard.
No Direct Access algorithm with good guarantees for every $Q$ and $\mathbb{D}$.
Data complexity assumption: for a fixed $Q$, what is the best preprocessing $f(|\mathbb{D}|)$ for an access time $O(polylog |\mathbb{D}|)$?
In this work, all presented complexity in data complexity will also be polynomial for combined complexity.
$Q(a,b,c) = A(a,b) \wedge B(b,c).$
Direct Access for lexicographical order induced by $(a,b,c)$?
a | b |
---|---|
0 | 0 |
1 | 1 |
2 | 1 |
b | c |
---|---|
0 | 0 |
0 | 1 |
0 | 2 |
1 | 1 |
1 | 2 |
Precomputation :
Access $Q[5]$:
$Q(a,c,b) = A(a,b) \wedge B(b,c).$
Direct Access for lexicographical order induced by $(a,c,b)$?
Reduces to multiplying two $\{0,1\}$-matrices $M,N$ over $\mathbb{N}$:
Given a query $Q$ and order $\pi$ on its variables, we can compute $\iota(Q,\pi)$ such that:
So, if we understand everything for Direct Access and lexicographical orders, what is our contribution?
$Q = \bigwedge_{i=1}^k P_i(\mathbf{z_i})$ $\bigwedge_{i=1}^l \lnot N_i(\mathbf{z_i})$
Negation interpreted over a given domain $D$:
$x_1$ | $x_2$ | $x_3$ |
---|---|---|
0 | 1 | 0 |
$x_1$ | $x_2$ | $x_3$ |
---|---|---|
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 1 |
1 | 0 | 0 |
1 | 0 | 1 |
1 | 1 | 0 |
1 | 1 | 1 |
$Q(x_1,\dots,x_n) = \neg N(x_1,\dots,x_n)$, domain $\{0,1\}$.
Positive encoding: preprocessing $O(2^n)$
$x_1$ | $x_2$ | $x_3$ |
---|---|---|
0 | 1 | 0 |
1 | 0 | 1 |
Linear preprocessing!
$Q_1 = R(1, 2, 3) \land S(1, 2) \land T(2, 3) \land U(3, 1)$
linear preprocessing
$Q_2 = S(1, 2) \land T(2, 3) \land U(3, 1)$
non-linear preprocessing
Subqueries may be harder to solve than the query itself!
$Q_1' =$ $\lnot R(1, 2, 3)$ $\land S(1, 2) \land T(2, 3) \land U(3, 1)$
Equivalent to $Q_2$ if $R=\emptyset$
$Q_2 = S(1, 2) \land T(2, 3) \land U(3, 1)$
non-linear preprocessing (triangle)
DA for $Q = P \wedge N$ implies DA for $Q = P \wedge N'$ for every $N' \subseteq N$!
Good candidate for $Q = Q^+ \wedge Q^-$:
Signed-HyperOrder Width $show(Q,\pi) = \max_{Q' \subseteq Q^-} \iota(Q^+ \wedge Q', \pi)$
For $Q$ a (positive) JQ signed JQ, and $\pi$ a variable ordering, we can solve DA with
Our contribution : new island of tractability for Signed JQ!
Signed HyperOrder Width (and incidentally, our result) generalizes:
Basically, everything that is known to be tractable on SCQ/NCQ.
$x_1$ | $x_2$ | $x_3$ |
---|---|---|
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 0 | 2 |
1 | 1 | 1 |
1 | 1 | 2 |
1 | 2 | 0 |
1 | 2 | 1 |
2 | 0 | 1 |
2 | 0 | 2 |
2 | 2 | 1 |
2 | 2 | 2 |
Factorized representation of relation $R \subseteq D^X$:
Ordered: decision gates below $x_i$ only mention $x_j$ with $j>i$.
For $C$ on domain $D$, variables $x_1,\dots,x_n$, DA possible :
Idea : for each gate $v$ over $x_i$ and for each domain value $d$
compute the size of the relation where $x_i$ is set to a value $d'\leqslant d$
Compute the 7^{th} solution $\to$ 111
Compute the 13^{th} solution $\to$ 221
SCQ $Q(x_1,\dots,x_n)$, $\pi = (x_1,\dots,x_n)$.
Preprocessing: $\tilde{O}(|\mathbb{D}|^{1+show(Q,\pi)})$
Direct Access : $O(\log |\mathbb{D}|)$
$Q$, $n$ considered constant here!
Compilation based on a variation of DPLL :