Florent Capelli, Oliver Irwin
CRIL, Université d’Artois
Séminaire KRDB
23 Janvier 2025
Join Query :
where is a tuple over
Example:
id | name | city |
---|---|---|
1 | Alice | Paris |
2 | Bob | Lens |
3 | Chiara | Rome |
4 | Djibril | Berlin |
5 | Émile | Dortmund |
6 | Francesca | Rome |
city | country |
---|---|
Berlin | Germany |
Paris | France |
Rome | Italy |
city | country | name | id |
---|---|---|---|
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Berlin | Germany | Djibril | 4 |
Rome | Italy | Francesca | 6 |
Quickly access , the element of .
city | country | name | id |
---|---|---|---|
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Berlin | Germany | Djibril | 4 |
Rome | Italy | Francesca | 6 |
? .
Naive algorithm: materialize in an array, sort it. Access.
city | country | name | id |
---|---|---|---|
Berlin | Germany | Djibril | 4 |
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
Precomputation : at least (may be worse), very costly
Access : , nearly free
Variable order :
city | country | name | id |
---|---|---|---|
Berlin | Germany | Djibril | 4 |
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
In this talk: only lexicographical orders.
Direct Access generalizes many tasks that have been previously studied:
Naive Direct Access:
Can we have better preprocessing and reasonable access time?
“Ideal” complexity:
Computing given and is -hard.
No Direct Access algorithm with good guarantees for every and .
Data complexity assumption: for a fixed , what is the best preprocessing for an access time ?
In this work, all presented complexity in data complexity will also be polynomial for combined complexity.
Direct Access for lexicographical order induced by ?
a | b |
---|---|
0 | 0 |
1 | 1 |
2 | 1 |
b | c |
---|---|
0 | 0 |
0 | 1 |
0 | 2 |
1 | 1 |
1 | 2 |
Precomputation :
Access :
Direct Access for lexicographical order induced by ?
Reduces to multiplying two -matrices over :
Given a query and order on its variables, we can compute such that:
So, if we understand everything for Direct Access and lexicographical orders, what is our contribution?
Negation interpreted over a given domain :
0 | 1 | 0 |
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 1 |
1 | 0 | 0 |
1 | 0 | 1 |
1 | 1 | 0 |
1 | 1 | 1 |
, domain .
Positive encoding: preprocessing
0 | 1 | 0 |
1 | 0 | 1 |
Linear preprocessing!
linear preprocessing
non-linear preprocessing
Subqueries may be harder to solve than the query itself!
Equivalent to if
non-linear preprocessing (triangle)
DA for implies DA for for every !
Good candidate for :
Signed-HyperOrder Width
For a (positive) JQ signed JQ, and a variable ordering, we can solve DA with
Our contribution : new island of tractability for Signed JQ!
Signed Fractional HyperOrder Width (and incidentally, our result) generalizes:
show
can be defined
(better combined complexity)Basically, everything that is known to be tractable on SCQ/NCQ.
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 0 | 2 |
1 | 1 | 1 |
1 | 1 | 2 |
1 | 2 | 0 |
1 | 2 | 1 |
2 | 0 | 1 |
2 | 0 | 2 |
2 | 2 | 1 |
2 | 2 | 2 |
Factorized representation of relation :
Ordered: decision gates below only mention with .
For on domain , variables , DA possible :
Idea : for each gate over and for each domain value
compute the size of the relation where is set to a value
Compute the 7th solution 111
Compute the 13th solution 221
SCQ , .
Preprocessing:
Direct Access :
, considered constant here!
Compilation based on a variation of DPLL :