Université d’Artois - CRIL
Back to School Conference
October 06, 2023
Lens is roughly the same distance from Paris Gare du Nord than Saclay in the SNCF topology.
Knowledge is a central notion for AI:
Rich fields of research:
Data + Knowledge = Knowledge Base
Propositional Knowledge Bases:
Reasoning tasks of various nature:
This talk is about a specific topic tagged as AI but which has almost nothing to do with ChatGPT.
While ChatGPT represents knowledge and reasons on it in a way, it is merely an illusion:
Knowledge base for a finite set of propositions .
Looking for tradeoffs
A Propositional Knowledge Base is a subset of .
This is a Boolean function:
How can we represent Boolean functions?
where is a literal or for some variable .
CNF formulas are extremely simple yet can encode many interesting problems.
Cook, Levin, 1971: The problem SAT of deciding whether a CNF formula is satisfiable is NP-complete.
Valiant 1979: The problem #SAT of counting the satisfying assignment of a CNF formula is #P-complete.
Not very interesting for reasoning tasks.
Research has focused on factorized representation.
Data structure based on decision nodes to represent “ is even”.
Path for , and is accepting.
Previous data structure are Ordered Binary Decision Diagrams.
Let’s draw an OBDD that detects whether a matrix with has a row full of .
How many -matrices have a row full of ones?
This idea can be generalized to any OBDDs:
Let be a function computed by an OBDD having edges. We can compute with arithmetic operations.
Generalises to many tasks:
Good candidate for representing Boolean functions!
Orders of variables matters a lot:
Every OBDD computing has size .
Same as OBDD but variables may be tested in different order on different path as long as they are tested at most once on every path.
Advantages: more succinct
CNF : are natural, powerful but not tractable
Knowledge compilation: amortize the compilation (offline) phase during the query (online) phase
Many choices are possible: OBDD, FBDD, and many many others. Depends on what we want to do.
|CO||Consistency check||Is D satisfiable?|
|VA||Validity check||Is D a tautology?|
|CE||Clause entailment||does D[τ] is sat?|
|SE||Sentential entailment||does D1 ⇒ D2?|
|CT||Model counting||how many solutions has D?|
|ME||Model enumeration||Enumerate the solutions of D.|
Exhaustive DPLL with Caching based on Shannon Expansion:
This scheme is parameterized by:
For many tasks, such as model counting, it is interesting to detect syntactic decomposable part of the formula, that is:
CUDDfor manipulating Decision Diagrams.
D4 is a top-down compiler as shown earlier:
Is it useful to have -gates in practice?
Yes, exponential gain in circuit size on some instances:
There is a family of Boolean functions such that any FBDD computing has size at least but can be computed by a -sized dec-DNNF.
Data structure used in KC can be used in other areas of computer science to leverage existing results.
Data stored as relations (tables):
SELECT * FROM People JOIN Capital ON People.City=Capital.City
SELECT COUNT(*) FROM People WHERE City NOT IN (SELECT City FROM Capital)
SQL is a full fledge language, hard to study.
Large class of queries are expressed by a smaller class: conjunctive queries.
Conjunctive queries are queries of the form: where
Database : list of relations filled with values in domain
Defines a new table where if each part of on variables are in .
CQ correspond to doing JOIN queries in SQL.
Bad new: given a conjunctive query and a database , it is NP-complete to decide whether !
And yet databases systems solve this kind of queries all the time!
Central class of conjunctive queries because of their tractability.
Total of solutions.
The trace of Yannakakis Algorithm on acyclic CQ is a decision-DNNF (non Boolean domain) of size linear in the data.
Datastructures known as “Factorized Databases”.
For every acyclic query and database , one can build a decision-DNNF computing of size .
Knowledge compilation style approach. One can efficently:
Unify existing results and push the hardness in the compilation part.
This compilation results can be used to recover many other results:
where is a polynomial.
Observation: may be assumed to be multilinear since over
Boolean function and :
Allows to encode optimization problems on Boolean functions.
For define: where
and on as:
Try using Algebraic Model Counting for BPO:
Example: solve such that :