Florent Capelli
UniversitΓ© dβArtois - CRIL
Back to School Conference
October 06, 2023
Specialized in AI, from GOFAI to ML.
http://www.cril.univ-artois.fr/
Interns, PhD students welcome!
Reach out at capelli@cril.fr.
Lens is roughly the same distance from Paris Gare du Nord than Saclay in the SNCF topology.
Knowledge is a central notion for AI:
Rich fields of research:
Data + Knowledge = Knowledge Base
Propositional Knowledge Bases:
Reasoning tasks of various nature:
This talk is about a specific topic tagged as AI but which has almost nothing to do with ChatGPT.
While ChatGPT represents knowledge and reasons on it in a way, it is merely an illusion:
Knowledge base for a finite set of propositions .
Implicit representation
Explicit representation
Looking for tradeoffs
60s
Wrap up:
A Propositional Knowledge Base is a subset of .
This is a Boolean function:
How can we represent Boolean functions?
where is a literal or for some variable .
Examples:
CNF formulas are extremely simple yet can encode many interesting problems.
Cook, Levin, 1971: The problem SAT of deciding whether a CNF formula is satisfiable is NP-complete.
Valiant 1979: The problem #SAT of counting the satisfying assignment of a CNF formula is #P-complete.
Not very interesting for reasoning tasks.
Research has focused on factorized representation.
Data structure based on decision nodes to represent β is evenβ.
Path for , and is accepting.
Previous data structure are Ordered Binary Decision Diagrams.
Letβs draw an OBDD that detects whether a matrix with has a row full of .
How many -matrices have a row full of ones?
: matrices
: matrices
matrices
Total:
This idea can be generalized to any OBDDs:
Let be a function computed by an OBDD having edges. We can compute with arithmetic operations.
Generalises to many tasks:
Good candidate for representing Boolean functions!
Orders of variables matters a lot:
Every OBDD computing has size .
Same as OBDD but variables may be tested in different order on different path as long as they are tested at most once on every path.
Advantages: more succinct
Drawbacks:
60s
Wrap up:
CNF : are natural, powerful but not tractable
Knowledge compilation: amortize the compilation (offline) phase during the query (online) phase
Many choices are possible: OBDD, FBDD, and many many others. Depends on what we want to do.
Notation | Query | Explanation |
---|---|---|
CO | Consistency check | Is D satisfiable? |
VA | Validity check | Is D a tautology? |
CE | Clause entailment | does D[Ο] is sat? |
SE | Sentential entailment | does D1βββD2? |
CT | Model counting | how many solutions has D? |
ME | Model enumeration | Enumerate the solutions of D. |
Β | CO | VA | CE | SE | CT | ME |
---|---|---|---|---|---|---|
DNNF | β | Γ | β | Γ | Γ | β |
d-DNNF | β | β | β | Γ | β | β |
dec-DNNF | β | β | β | Γ | β | β |
FBDD | β | β | β | Γ | β | β |
OBDD | β | β | β | β | β | β |
Exhaustive DPLL with Caching based on Shannon Expansion:
This scheme is parameterized by:
For many tasks, such as model counting, it is interesting to detect syntactic decomposable part of the formula, that is:
and
Cachet
SharpSAT
DSharp
D4
SDD
c2d
CUDD
for manipulating Decision Diagrams.ADDMC
D4
is a top-down compiler as shown earlier:
Is it useful to have -gates in practice?
Yes, exponential gain in circuit size on some instances:
There is a family of Boolean functions such that any FBDD computing has size at least but can be computed by a -sized dec-DNNF.
60s
Wrap up:
Data structure used in KC can be used in other areas of computer science to leverage existing results.
Data stored as relations (tables):
People | Id | Name | City |
---|---|---|---|
1 | Alice | Paris | 2 | Bob | Lens |
3 | Carole | Lille | |
4 | Djibril | Berlin |
Capital | City | Country |
---|---|---|
Berlin | Germany | |
Paris | France | |
Roma | Italy |
SQL is a full fledge language, hard to study.
Large class of queries are expressed by a smaller class: conjunctive queries.
People | Id | Name | City |
---|---|---|---|
1 | Alice | Paris | 2 | Bob | Lens |
3 | Carole | Lille | |
4 | Djibril | Berlin |
Capital | City | Country |
---|---|---|
Berlin | Germany | |
Paris | France | |
Roma | Italy |
Conjunctive queries are queries of the form: where
Database : list of relations filled with values in domain
Defines a new table where if each part of on variables are in .
CQ correspond to doing JOIN queries in SQL.
Bad new: given a conjunctive query and a database , it is NP-complete to decide whether !
And yet databases systems solve this kind of queries all the time!
Central class of conjunctive queries because of their tractability.
Total of solutions.
The trace of Yannakakis Algorithm on acyclic CQ is a decision-DNNF (non Boolean domain) of size linear in the data.
Datastructures known as βFactorized Databasesβ.
For every acyclic query and database , one can build a decision-DNNF computing of size .
Knowledge compilation style approach. One can efficently:
Unify existing results and push the hardness in the compilation part.
This compilation results can be used to recover many other results:
BPO problem:
where is a polynomial.
Observation: may be assumed to be multilinear since over
where
is maximal.
Semi ring:
Boolean function and :
Allows to encode optimization problems on Boolean functions.
For define: where
encodes !
and on as:
Example:
where .
Try using Algebraic Model Counting for BPO:
Example: solve such that :
How?