kenics.net

Technical notes on perl, python, php, sql, cgi, c/c++, q/kdb+, unix/shell, revision control tools, data structures & algorithms, and their applications into web services and other various forms of software engineering.

knowledge based AI

Georgia Tech course. knowledge based artificial intelligence (KBAI) systems. cs7637 notes. 
 
########################## 
####  intro lecture   #### 
########################## 
 
- fundamental conundrums 
(1) intelligent agents have limited resources. 
(2) computation is local, but problems have global constraints. 
(3) logic is deductive, but many problems are not. 
(4) the world is dynamic, but knowledge is limited. 
(5) problem solving, reasoning, learning are complex. but explanation/justification are more complex. 
 
- characteristics of AI "problems" 
(1) knowledge often arrives incrementally. not at once. 
(2) problems often have recurring patterns (optimal substructure, overlapping subproblems) 
(3) problems have multiple levels of granularity. 
(4) many problems are computationally intractable. 
(5) the world is dynamic but knowledge of the world is static. 
(6) the world is open-ended. but the knowledge of the world is limited. 
 
- characteristics of AI "agents" 
(1) agents have limited computing power. (processor, mem, etc) 
(2) agents have limited sensors. 
(3) agents have limited attention. (cannot focus on everything at once) 
(4) computational logic is fundamentally deductive. 
(5) agents have limited knowledge. 
 
- what is KBAI 
- watson AI example 
(1) reasoning 
(2) learning 
(3) memory 
==> "deliberation" 
 
input -> cognitive systems(=meta-cognition + deliberation + reaction) -> output 
 
- four schools of AI 
(1) thinking  <-> (2) acting 
(3) optimally <-> (4) like humans 
 
e.g. 
machine learning = (1)(3) 
semantic web = (1)(4) 
airplane autopilot = (2)(3) 
improv dance robots = (2)(4) 
 
 
- cognitive: human-like intelligence 
- systems: multiple interacting components (e.g. learning, reasoning, memory) 
===> cognitive systems: systems that exhibit human-like intelligence thru learning, reasoning, memory, etc. 
 
input -> cognitive systems(=meta-cognition + deliberation + reaction) -> output 
 
e.g. reaction = a car driver sees a red light, then hits the brakes. 
e.g. deliberation = watson 
e.g. metacognition = a driver turning left gets honked at and decides to change lanes. i.e. his reaction caused input to another cognitive-systems loop. 
 
 
--- topics in AI 
(video) https://www.youtube.com/watch?v=oLiQQGJYqp4 
 
(1) fundamentals 
- knowledge representations (= semantic networks & production systems) 
- reasoning strategy(= generate and test, means-end analysis, problem reduction) 
(2) planning 
- logic as knowledge representation, which enables planning 
(3) common sense reasoning 
- knowledge representation (frames, scripts) 
- reasoning strategy (understanding, common sense reasoning) 
(4) learning 
- by recording cases 
- incremental concept learning 
- classification 
- version spaces 
(5) visuospatial reasoning 
- visual info 
- constraint propagation 
(6) design & creativity 
- configuration, diagnosis, design, creativity 
(7) meta-cognition 
- thinking about thinking, reasoning about reasoning 
- learning by correcting mistakes, meta-reasoning, ethics in AI 
(8) analogical reasoning 
- learning by recording cases 
- case based reasoning 
- explanation based learning 
 
 
------------------------------------------------- 
 
###################### 
###  course intro  ### 
###################### 
 
 
methods (learning, reasoning, memory, meta-cognition, etc) 
tasks (e.g. understanding, planning, classification, design, etc) 
 
agents use methods to address tasks. 
 
relationship between AI and human cognition 
 
after the course, you will be able to 
(1) design and implement a KBAI agent 
(2) use the strategies and agents to address complex practical problems 
(3) use the design and results of the agents to reflect on human cognition 
 
class style 
(1) learning by example 
(2) learning by doing 
(3) project-based learning 
(4) personalization 
(5) learning by reflection 
 
computational psychometrics: science of measuring human intelligence, aptitude, knowledge, by AI agents. 
 
project: raven's progressive matrices 
 
2-by-1 matrix (a->b, c->d) 
2-by-2 matrix (a->b, c->d, a->c, b->d) 
3-by-3 matrix (horizontal: a->b->c, d->e->f, g->h->i) 
              (vertical: a->d->g, b->e->h, c->f->i) 
              (diagonal: a->e->i, b->f->g, c->d->h) 
 
 
7 principles of the course 
- KBAI agents represent and organize knowledge into structures to guide and support reasoning. 
- learning in KBAI agents is often incremental. 
- reasoning in KBAI agents can be both top-down and bottom-up. 
- KBAI agents match methods to tasks. 
- KBAI agents use heuristics to find solutions that are good enough, not necessarily optimal. 
- agents make use of recurring patterns in problems when solving. 
- KBAI agents enable reasoning, learning, memory to support/constrain each other. 
 
 
----------------------------------------------------- 
 
## Aside; what kinds of knowledge in KBAI? 
(1) world knowledge (aka domain knowledge)    - knowledge about the world (like guards and prisoners) 
(2) strategy knowledge (aka control knowledge)- knowledge about the strategy (like Means-ends-analysis) 
 
 
------------------------------------------------------- 
 
Assignments VS Projects 
 
Assignments are meant for conceptual brainstorm/design (you can pose an open question, or something specific). they are meant to help with the projects. think of them as how you would start a conversation to approach the problem. 
Projects are for the actual implementation/design of the solution you built, how well it worked, what it could've done better, etc. they are not restricted to the ideas from the assignments. 
 
 
################################ 
####   semantic networks    #### 
################################ 
 
represent & reason 
 
semantic networks as "knowledge representation" 
(1) language/expression - vocabulary 
(2) content 
 
e.g. 
f=ma 
(1) algebraic equation 
(2) content is each of force = mass * acceleration 
 
 

#  structure of SN 

 
lexicon  : nodes/obj 
structure: directional links 
semantics: application specific labels( like transformation type e.g. deleted, above, inside, etc) 
 

#  GOOD representations 

- make relationships explicit 
- exclude extraneous details 
- expose natural constraints 
- bring objects and relations together 
==> transparent, concise, complete, fast, computable 
 
bad e.g. 
nutrition info on the side of a beverage container. ==> NOT good, because it does not make explicit the connections between linked variables like fat and calories, etc. 
 
NOTE: good representation helps problem solving 
      but no representation method is perfect, each representation has its own affordance and constraints 
 

# "Guards & Prisoners" probelm 

(aka cannibals & missionaries, jealous husbands, bro and sis.) 
 
- 3 guards and 3 prisoners must cross river 
- at the start, they are all on one coast of the river 
- boat can take one or two people at a time 
 
- prisoners may never outnumber guards on either coast, though prisoners may be alone on either coast. 
 
===> lets build a semantic network 
 
node: state of the locations of prisoners/guards 
 
initial state: everyone on the left. 
possible 2nd state: five possible moves 
(eliminate illegal moves, then only two moves remain) 
now think about the next legal moves. note the boat has to come back to the source coast. 
==> notice we reach only one possible move as the 3rd state. 
there are 3 legal moves from the 3rd state. 
(but two of them are "unproductive", meaning they go to states already visited in previous stages) 
 
===> notice we need at least 11 moves 
 
semantic network = not always all-or-nothing, sometimes it's a partial match. 
                 = sometimes multiple solutions 
 
NOTE: each transformation type may have a diff "weight", so we can rank the relevance/match-ness of the available solutions. (see the sketch after the list below) 
      e.g. when you get a shape that matches both "rotation" and "reflection", you need to know which one weighs more. 
e.g. 
Unchanged       = 5 points 
Reflected       = 4 points 
Rotated         = 3 points 
Scaled          = 2 points 
Deleted         = 1 point 
Shape changed   = 0 points 
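 
e.g. a minimal sketch in python (my own illustration, not from the lecture; the weight names/values just mirror the list above): when a shape matches several transformation types, rank them by weight and keep the best one per object. 
 
WEIGHTS = {
    "unchanged": 5,
    "reflected": 4,
    "rotated": 3,
    "scaled": 2,
    "deleted": 1,
    "shape_changed": 0,
}

def best_transformation(candidates):
    """Pick the highest-weighted transformation among those that fit."""
    return max(candidates, key=lambda t: WEIGHTS[t])

# a shape that could be read as either a rotation or a reflection:
print(best_transformation(["rotated", "reflected"]))   # -> 'reflected'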
 
 
 
analogical reasoning, case based reasoning are useful/relevant for Raven's progressive matrices. (will revisit later) 
 
 
Assignment (1) semantic nets 
discuss how we can use the semantic network scheme to represent Raven's progressive matrices problems. 
write your own representation scheme, and how it enables problem solving for 2-1 2-2 3-3 matrices. 
(recall a good analogy = (1) complete, (2) captures information at the right abstraction level) 
 
 
semantic networks & human cognition 
-- knowledge representation 
-- spreading activation networks (inference, associative memories) 
 
 
# NOTE : propositional representation = representation with words as opposed to visual 
       : proj1 already gives a good propositional representation of the problems, but it is flawed in that it relies heavily on arbitrary object labels, while many positional attributes like left-of depend on those labels. so you will probably need to work around this somehow in your implementation. 
 
 
################################ 
####    Generate & Test     #### 
################################ 
 
think of this as part of the "fundamentals" of KBAI (as below) 
 
                       semantic networks 
                         /           \ 
            generate and test       production systems 
               /         \ 
means-ends analysis   problem reduction 
 
-------------------------------------------------------------- 
 
KBAI consists of 
(1) knowledge representation method (e.g.) semantic networks 
(2) problem solving method (e.g.) generate and test 
(3) architecture (e.g.) production systems 
 
====> obviously we need a combination of (1) & (2) to solve problems 
 
 
generate and test examples: guards and prisoners 
- generate all possible states, and test them. (test and prune illegal, unproductive ones) 
 
- smart/dumb generator/tester ==> efficiency 
who should own the "illegal/unproductive" check? the generator or the tester? 
===> imagine millions of possible states, then obviously the generator should be smart. 
===> balance the responsibility between generator and tester. (critically important, and determines the efficiency; a sketch follows below) 
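 
a minimal generate-and-test sketch in python for the guards & prisoners puzzle (my own illustration; the state encoding and function names are assumptions). the generator proposes every boat load of 1 or 2 people; the tester prunes illegal and already-visited states. 
 
from collections import deque

def legal(g, p):
    """Prisoners may never outnumber guards on either bank (3 of each in total)."""
    if not (0 <= g <= 3 and 0 <= p <= 3):
        return False
    left_ok = (g == 0) or (g >= p)                  # left bank
    right_ok = (3 - g == 0) or (3 - g >= 3 - p)     # right bank
    return left_ok and right_ok

def generate(state):
    """Generator: every boat load of 1 or 2 people from the boat's side."""
    g, p, boat_on_left = state
    sign = -1 if boat_on_left else 1
    for dg, dp in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:
        yield (g + sign * dg, p + sign * dp, not boat_on_left)

def solve(start=(3, 3, True), goal=(0, 0, False)):
    """Tester + BFS: prune illegal/visited states, return the fewest crossings."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, moves = frontier.popleft()
        if state == goal:
            return moves
        for nxt in generate(state):
            if legal(nxt[0], nxt[1]) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, moves + 1))

print(solve())   # 11, matching the "at least 11 moves" note above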
 
 
another example of G&T:  genetic algo. 
 
 

# G&T can be harder in an unconstrained domain 

guards/prisoners example is simple. but for raven's progressive matrices, if you generate at the granularity of "take the shape ABC to the outside of the shape XYZ and increase by 50%" or 51% or 52% or... then you need to generate an enormous number of candidates. 
====> the right level of "abstraction" for knowledge representation is crucial. 
 
e.g. 2-by-1 problem 
A -> B 
C -> (generate an answer for D, then test it against all choices 1-6) 
   or(put each of 1-6 to D, and generate transformation semantic NW C->D, then test it against A->B ) 
 
====> see how we have 2 diff ways. 
 
 
Assignment 2: 
how would you use the generate and test method to design an agent that could answer Raven's progressive matrices? 
e.g. 
- talk about how G&T can be harder in an unconstrained domain 
- talk about how we have 2 diff ways of G&T as above. 
- how would you implement it? which one would you choose? why? (where do you balance intelligence?) 
- think about 3-by-3 case (with more types of figures, and transformation, can be hard to determine what to generate. problem space can explode quickly) 
- how to infer the mapping between diff figures in the problem. 
- how do you know which shape in one frame maps to a diff shape in another frame? 
- and how do you use that info to generate possible answers? 
 
 

# G&T and human cognition 

- no complete knowledge available 
- no infinite computing resource available 
- reasoning not guaranteed to be correct 
 
 
 
 
####################################################### 
####   Means-Ends analysis and Problem Reduction   #### 
####################################################### 
 
remember the "fundamentals" of KBAI (as below) 
 
                       semantic networks 
                         /           \ 
            generate and test       production systems 
               /         \ 
means-ends analysis   problem reduction 
 
--------------------------------------------------------- 
 
like G&T, MEA and PR are methods for problem solving(aka reasoning strategy). 
 

# state spaces 

 
===> a pool of possible states, and find the path from the initial state to the goal state. 
===> "path finding" in AI 
 
 
e.g.  reach the goal state # move a block at a time, and cannot move if anything is on top 
 
C                  A 
A   B    ====>     B 
                   C 
------          ------- table 
 
# sequence of operation 
 
Move(C, Table) 
Move(B, C) 
Move(A, B) 
 
 
 
ideally each step should find the next state that reduces the distance/cost toward the goal state. 
 
reduction = ends 
application of the operation = means 
 
how do we define distance evaluator function? 
 
one way is to measure 
 
initial state      goal state 
------------------------------- 
A on table           A on B       # diff +1 
B on table           B on C       # diff +1 
C on A               C on Table   # diff +1 
                                    ===>  distance = 3 
 
and generate all possible immediate states, and measure each state's distance, and pick the smallest distance state. 
 
(obviously we can make it more elaborate that it measures the distance between each block and its goal position, etc) 
(but it needs to be logically justified/constructed to be effective; otherwise, we can easily get into inefficient moves, infinite loops, or moves away from the goal state.) 
 
=====> we discuss that in "Planning" lecture 
 
 
## 
## in summary, Means-Ends analysis is; 
## 
- compare the current state and goal state 
- find the diff(aka distance/cost) between "new" state and goal state 
- pick an operator that minimizes diff between them 
 
====> so M-E analysis is useful when we know enough info to be able to define distance evaluation function correctly. 
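 
a minimal sketch in python of the distance evaluation and operator choice above, for the block-moving example (my own illustration; the state encoding is an assumption): 
 
GOAL = {"A": "B", "B": "C", "C": "Table"}

def distance(state, goal=GOAL):
    """Count the blocks not yet on their goal support (the '+1 per diff' idea)."""
    return sum(1 for b in goal if state[b] != goal[b])

def clear(state, x):
    """A block (or the table) is clear if nothing is sitting on it."""
    return x == "Table" or all(support != x for support in state.values())

def moves(state):
    """All legal single moves: put a clear block onto the table or another clear block."""
    for b in state:
        if not clear(state, b):
            continue
        for dest in list(state) + ["Table"]:
            if dest not in (b, state[b]) and clear(state, dest):
                new = dict(state)
                new[b] = dest
                yield (b, dest), new

initial = {"A": "Table", "B": "Table", "C": "A"}   # C sits on A; A and B on the table
for (block, dest), nxt in moves(initial):
    print("Move(%s, %s) -> distance %d" % (block, dest, distance(nxt)))
# means-ends analysis picks a minimum-distance move such as Move(C, Table);
# as noted above, a purely greedy choice still needs care to avoid loops.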
 
 
Assignment:  how would you use M-E analysis to solve Raven's progressive matrices. 
- how do you define a goal state? (find a correct state, or the transformation of a frame to another frame, and trace back to the original state) 
- then how do you measure distance ?  what are the individual operators that move to shorter distance state? 
- what are the strength/limitation of using M-E analysis in this context? 
- is it even suited ? or any other method that can be more effective? 
 
 
## NOTE :   MEA vs A* 
(quote from Prof. Goel) 
At one level there is some similarity between means-ends analysis and A*, but the two methods are quite different. Both MEA and A* provide different answers to the control question in search: what operator should the agent apply next? MEA says find the differences between the current state and the goal state, and pick an operator that helps reduce a difference. A* says for each possible successor state, calculate the distance from the start state to the successor state g and estimate the distance from the successor state to the goal state h, and then select the successor state that minimizes f = g + h. So the similarity between MEA and A* lies in that both use heuristic knowledge to make local decisions. The difference is that they use different heuristics. In general, A* requires more knowledge that may not always be available. On the other hand when the knowledge required by A* is available, it can provide some performance guarantees. 
 
MEA is a general purpose method. It says simply that given the goal state, the current state, and a set of possible actions in the world, the agent may select an action that helps it reduce a difference between the current state and the goal state. A* is a specialized method that tells the agent how to estimate the distance between the successor states and the goal state. Because A* requires a function to "accurately" estimate the distance to the goal state, it provides more guarantees of performance but is less widely applicable (not all problems have good evaluation functions). Because MEA requires relatively little knowledge, it is applicable more broadly but gives fewer guarantees. We will see this trade-off between generality and performance  repeatedly in the class. 
 
 
 
 
## 
##  Problem Reduction 
## 
 
divide and conquer. 
not necessarily "overlapping subproblems and optimal substructure" but at least subproblems. 
 
now, for block move puzzle example, 
 
C                  A 
A   B    ====>     B 
                   C 
------          ------- table 
 
==> you can think of this as three subproblems(subgoals) 
(1) how to get A on B 
(2) how to get B on C 
(3) how to get C on Table 
 
 
====> still, we need some guidance in deciding WHICH subgoal to attack first, but that is for "Planning" lecture. 
====> also, some operations that are needed only indirectly may not be deducible from the Problem Reduction method alone (we will revisit those later). and how do we eliminate wrong moves? (PR by itself may not offer that intelligence) 
 
 
Now lets think about Raven's Progressive Matrices with MEA and PR. 
 
from the initial state to the goal state, you can determine a sequence of transformations. (e.g. delete shape 1, move shape 2, expand shape 3, etc) 
 
now apply that sequence of transform operations to C, to generate possible Ds 
 
A->B 
C-> ?  // generate possible Ds 
 
subgoals 
- find the set of transformation between A-B 
- apply the set to C   (generate) 
- compare all Ds   (test), and take the best one. 
 
 
====>  see how it's a combination of PR, then G&T, and MEA 
====>  obviously complex problems often require not just one but a combination of multiple methods 
====>  also notice how one single knowledge representation of Semantic Networks supports all PR, G&T, MEA 
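 
e.g. a minimal sketch in python of that PR + G&T combination for a 2-by-1 problem (my own illustration; the figures, attributes, and candidates are made up): 
 
A = {"shape": "square", "size": "large", "fill": False}
B = {"shape": "square", "size": "small", "fill": False}
C = {"shape": "circle", "size": "large", "fill": False}
candidates = {
    1: {"shape": "circle", "size": "small", "fill": False},
    2: {"shape": "circle", "size": "large", "fill": True},
    3: {"shape": "square", "size": "small", "fill": False},
}

def diff(src, dst):
    """Subgoal 1: the A->B transformation as a set of attribute changes."""
    return {k: dst[k] for k in src if src[k] != dst[k]}

def apply_diff(fig, changes):
    """Subgoal 2: apply the same changes to C to generate a candidate D."""
    out = dict(fig)
    out.update(changes)
    return out

def best_answer():
    """Subgoal 3 (test): pick the candidate closest to the generated D."""
    generated = apply_diff(C, diff(A, B))
    score = lambda cand: sum(1 for k in generated if generated[k] == cand[k])
    return max(candidates, key=lambda i: score(candidates[i]))

print(best_answer())   # -> 1 (a small unfilled circle)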
 
the link between Semantic Networks (knowledge representation) and PR, G&T, MEA (problem-solving methods) is "weak" 
because these methods make little use of knowledge. but later we will see "knowledge intensive" methods, where the method demands the knowledge. 
 
 
 
Assignment: how would you apply PR to Raven's progressive matrices? 
- first think about how we would do it, instead of how we would make an agent do it 
- when solving the matrices, what are the smaller problems we are breaking down into ? 
- how are we solving each subproblem? 
- how are we combining the results as a whole to get the solution to the whole problem? 
- now once you know how a human does it, how can you make an agent do the same kind of reasoning process? 
- how does an agent recognize when to split a problem into smaller problems? 
- how does an agent solve each subproblem? 
- how does an agent combine them into a whole? 
--- during the process, discuss what it is that makes it easier for the agent to solve subproblems instead of solving the entire problem as a whole. how does it help you solve these problems better? 
 
(now you can check out "Logic" and "Planning" lectures, that build on MEA PR) 
 
 

# MEA PR and human cognition 

 
G&T MEA PR are called weak methods. because they make little use of knowledge. 
we will later learn methods that intensively require knowledge. obviously, that is more powerful only when a lot of knowledge is available, but when not, then we need to rely on weak methods. 
 
 
 
 
##################################### 
#####    Production Systems     ##### 
##################################### 
 
 
remember the "fundamentals" of KBAI (as below) 
 
                       semantic networks 
                         /           \ 
            generate and test       production systems 
               /         \ 
means-ends analysis   problem reduction 
 
------------------------------------------------------- 
 
Production systems relate to "learning" 
 
- PS = a cognitive architecture in which knowledge is represented as "rules" 
- a particular mechanism of learning called "chunking" 
 
 
e.g. baseball strategy 
-- how can AI reason on complex situations like human does? 
 

# Cognitive agent: a function for cognitive architectures 

 
CA = a function that maps percept history -> action 
 
denoted as:    f:P* -> A       // * means history 
 
(assumptions about CA) 
- goal oriented (takes action toward that goal) 
- CA is in an info-rich, complex, dynamic environment 
- CA uses knowledge from the environment 
- knowledge represented at the right abstraction level captures important details in the form of "symbols" (more importantly, removes noise info/details) 
- flexible (env changes, action changes) 
- CA "learns" from experience 
 
 
intuitively:   Architecture + Content(knowledge) = Behavior 
 
===> now imagine the architecture is fixed; then behavior is determined by the content, which can change to yield diff behavior. 
 
e.g. computer architecture (fixed) + code/program (can change) = diff behavior 
 
"what is a good architecture?"  we will examine later. 
 
 
 
recall    (Cognitive system) 
 
            metacognition 
              |||||| 
input ==>   deliberation   ==>  output 
              |||||| 
             reaction 
 
 
 
deliberation = (reasoning + learning + memory) 
 
a specific architecture example for deliberation is SOAR 
 
SOAR: 
- long term memory (= procedural + semantic + episodic) 
- working memory 
 
episodic = events (e.g. what was for dinner yesterday?) 
semantic = generalized concept for things 
procedural = how to do things (how to move A to B) 
 
 
e.g.  baseball 
think of the pitcher as Production System 
percepts = the situation of the game the pitcher is facing (2 outs, 2nd/3rd bases loaded, batter's average is .256, inning count, score of the game, etc) 
action = walk the batter, or try to take him out. 
 
percepts = internal(general baseball rules) and external(the score, etc) 
 
if we walk the batter, then we face the next batter 
if we pitch to the batter, then which pitches? 
===> leads to many "states" tree 
 
 
lets apply to SOAR model 
 
contents of the "working" memory 
------------------------------------------ 
inning: 7th 
outs: 2 
average: .256 
.. 
.. 
goal: escape inning without conceding points 
----------------------------------------- 
 
====> procedural knowledge = contains rules (aka Production Rules) 
e.g. 
(r1) if goal is escape, and perceive 2 outs, and X & Y bases loaded, etc, then do ABC 
(r2) if goal is pitch, and batter is lefty, then throw a curve inside 
(r3) if goal is walk, then suggest intentional walk operator(=action) 
.. 
.. if then rules 
.. 
(rx) if only one operator has been selected, then send the operator to motor system, and put pitch THROWN to the state of the working memory. 
 
operator e.g. 
- walk the batter 
- throw a curve ball 
- throw a fast ball 
- throw XYZ balls 
- system cannot decide 
 
 
====> now we mapped the working mem to the production Rules, and decided on an action. 
i.e.   f:P* -> A 
 
====> once the resulting action takes place, now the content of the working mem changes (loaded base : all three) 
 
summary: based on the content of the working mem, some rules get activated and even update the content of the working mem, then new rules get activated. (NOTICE: constant interaction between the working mem and long term mem) 
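 
a minimal sketch in python of production rules firing against working memory (my own illustration; the rules and attributes are made up for the baseball example): 
 
working_memory = {
    "goal": "escape-inning",
    "outs": 2,
    "bases_loaded": ["2nd", "3rd"],
    "batter_avg": 0.256,
}

# each production rule: (condition on working memory, operator it suggests)
rules = [
    (lambda wm: wm["goal"] == "escape-inning" and wm["outs"] == 2
                and "2nd" in wm["bases_loaded"],
     "intentional-walk"),
    (lambda wm: wm["goal"] == "escape-inning" and wm["batter_avg"] < 0.200,
     "throw-fast-ball"),
]

def suggested_operators(wm):
    """Fire every rule whose condition matches the working memory."""
    return [op for cond, op in rules if cond(wm)]

print(suggested_operators(working_memory))   # ['intentional-walk'] -> send to the motor system
# with different rules, zero or multiple operators could be suggested (an impasse)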
 
depending on the production rules, system may not be able to decide on a single operator. 
(e.g. decided on multiple operators, or no operator reached) 
===> called "impasse" - a deadlock situation where no progress is possible. 
 
===> then SOAR will try to "learn" a rule that breaks the impasse, by invoking "Episodic" knowledge 
 
lets say one episodic knowledge piece tells "previously for the same batter, we threw a fast ball, and he hit a homerun" 
===> then we construct a new production rule "if two operators suggested for throw fast/curve ball, and the batter is A-Rod, then dismiss throw-fast-ball operator." 
 
===> this process of "learning" rules to break impasse, done by SOAR, is called "chunking" technique 
 
"chunking" is triggered when an impasse occurs. SOAR searches eposidic knowledge, to find an event that can be turned into production rules to break the impasse. SOAR compares the percepts of the working mem with the percepts of past event mem in the episodic mem. (in this case looks for the same batter, and checks its pitch type(fast or curve) and checks its result), then decide the suggested operator. if result was bad, then ABC, result was good, then XYZ, etc. 
 
=====> a good example of deliberation ( = memory + learning + reasoning ) 
 
 
 
more details on this example from this paper 
 
Lehman, J. F., Laird, J. E., & Rosenbloom, P. S. (1996). 
A gentle introduction to Soar, an architecture for human cognition. Invitation to Cognitive Science, 4, 212-249. 
 
 
 
=====> Production Systems (particularly its Chunking example) gave a good example of "learning" 
 
as seen, learning happens as part of deliberation. think about what/when/how/why the agent learns ==> determined by the result of some reasoning. 
i.e. reasoning first, then learning backward to resolve the impasse. 
 
 
 
 
------------------------------------------- 
 
## theories of KBAI 
- abstraction levels 
 
high level        <->           mid level        <->    low level 
 
task/knowledge           algo/symbol/methods         HW/implementation 
(playing baseball,etc)  (semantics NW, MEA, etc)     (transistor, brain, etc) 
 
===> low level is an architecture for mid level 
===> mid level is an architecture for high level 
 
===> high level gives "content" for mid level 
===> mid level gives "content" for low level 
 
e.g. 
 
how do you explain a cell phone to someone who never used it? 
- high level ? 
- mid level ? 
- low level ? 
 
now change "cell phone" to "AI agent" 
 
constraints (+affordances) flow in both directions between levels. 
 
so, once you have the content of the knowledge, then you can decide what methods to use in the mid level. (semantic NW? etc) 
 
in KBAI, we focus on high & mid levels. 
 
-------------------------------------------- 
 
 
Assignment: how would you use Production system to solve Raven's progressive matrices? 
 
think of this in two layers: 
 
(1) PS can take in any problems, and analyze its working mem, and production rules/episodic mem, (and chunking) to decide actions. 
(2) or a PS that is specific to an individual problem. the PS starts with no rules, but induces some rules that govern the transformation between some figures (A->B), and transfers/applies them to another figure to decide an answer (D of C->D) 
====> chunking/learning is the key factor. how does it take place? how does it write production rules based on a new problem? what is the benefit of doing it this way? what do we get out of making production rules based on specific individual problems? 
===> (Ken Sugimoto: i think it's beneficial because when all rules are based on individual problems, the agent captures only the rules that directly apply to the specific problem, rather than generalized rules that may suggest some not-so-precise action decisions) 
 
 
Cognitive Architectures, Production Systems example thru SOAR, action selection using the baseball percepts example. 
chunking technique. 
 
 

# PS and human cognition 

 
PS is a model of human cognition. (but rather in a simple closed/contained way.) 
working mem = human short term mem. 
 
 
 
 
 
########################## 
######    Frames     #####      (amazingly similar to OOP) 
########################## 
 
frame - a knowledge representation scheme 
 
Common sense reasoning = (1) frames                      # knowledge representation 
                       + (2) understanding               # reasoning strategy 
                       + (3) common sense reasoning      # reasoning strategy 
                       + (4) scripts                     # knowledge representation 
 
 
- function of frames 
- properties of frames 
- relationship between frames and prev topics 
- frames for advanced sense-making (story understanding) 
 
 
## 
##   common sense reasoning 
## 
 
input : Ashok ate a frog. 
 
Q: is the frog dead or alive ? 
Q: where is the frog ?  on table? in Lake ? inside Ashok's stomach? 
Q: is Ashok happy or sad ? 
 
 

# how did we understand the sentence well enough to infer its meaning? 

 
frames can be associated with any words of the sentence (noun, objects, verbs, etc) 
 
lets focus on "ate" 
 
"ate" 
 - subject: Ashok 
 - object:  a frog 
 - time: 
 - utensils 
 - location: 
 - object-alive : false 
 - object-is : inside-subject 
 - subject-mood : happy 
 
====>   lets call it a frame for "ate" 
====>   we call them slots : fillers  (= structure of frames) 
====>   some of the fillers come from the sentence, others are there by default. 
====>   you can add more slots as needed. 
====>   NOTICE: this is a key-value map data structure supported in many programming languages. (hash in perl, dictionary in python, hashmap in java, etc) 
 
So, a frame is a knowledge structure. 
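 
e.g. a minimal sketch in python of the "ate" frame as that key-value structure, with default fillers the sentence can override (my own illustration): 
 
def make_ate_frame(**fillers):
    frame = {
        "subject": None,
        "object": None,
        "time": None,
        "utensils": None,
        "location": None,
        # default fillers that come with the stereotype of "ate"
        "object-alive": False,
        "object-is": "inside-subject",
        "subject-mood": "happy",
    }
    frame.update(fillers)      # fillers extracted from the sentence
    return frame

print(make_ate_frame(subject="Ashok", object="a frog"))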
 
knowledge representations; 
-- atom     :  production rules 
-- molecule :  frames (structure, many things within, expand, etc) 
 
 
another sentence : David ate a pizza at Sbarrow. 
 
"ate" 
 - subject: David 
 - object:  a pizza 
 - time: 
 - utensils 
 - location: Sbarrow 
 - object-alive : false 
 - object-is : inside-subject 
 - subject-mood : happy 
 
====> notice slots are the same as before because it's the frame for "ate" 
====> only fillers that are the same are the default ones. 
 
 
====> we may expand the above, to let other items in the sentence have their own frames. 
 
"ate" 
 - subject: ------------------------ "Person" 
 - object:  a pizza                   - name: David 
 - time:                              - surname : Joyner 
 - utensils 
 - location: ----------------------- "Restaurant" 
 - object-alive : false               - name : Sbarrow 
 - object-is : inside-subject         - location : Atlanta 
 - subject-mood : happy 
 
 
=======>  so far, we only looked at sentence-level understanding, but we can also look at the discourse level (ie a series of sentences.) 
=======>  a good example is maybe there should be a slot "favorite food" in "Person" frame 
=======>   if "ate" object equals to the fav food, then subject-mood should be upgraded to "very" happy 
 
### 
### properties of Frames (much like class in OOP) 
### 
- represent "stereotypes" (often culture specific) 
- provide default values (may get overridden based on other info, which we call exception handling) much like constructors in OOP 
- exhibit inheritance (Animal - Ant, Human, etc) much like OOP class/instance. obviously sub class can have more slots. 
 
 
# NOTE:  sentence can be constructed from frames  and vice versa. 
         in other words, frames can be an intermediate representation for both sentence comprehension and construction 
 
 
## 
##  Frames vs Semantic Networks 
## 
- closely related. representationally equivalent (i.e. another knowledge rep scheme). 
 
in SN, recall we used figure/object/attributes 
in frames, we can have a frame for each object, e.g. 
 
"Object" 
 - name : x 
 - shape : triangle 
 - size : large 
 - fill : false 
 - inside : 
 - above : 
 
"Object" 
 - name : y 
 - shape : circle 
 - size : large 
 - fill : false 
 - inside : x     (maybe a pointer to x frame) 
 - above : 
 
"Object" 
 - name : z 
 - shape : circle 
 - size : small 
 - fill : true 
 - inside : 
 - above : x      (maybe a pointer to x frame) 
 
 
## 
##  Frames vs Production Systems 
## 
 
recall    (Cognitive system) 
 
            metacognition 
              |||||| 
input ==>   deliberation   ==>  output 
              |||||| 
             reaction 
 
where deliberation = long term mem(=reasoning + learning + memory) + working mem 
 
===> working mem is frames !!  (so we were already using frames) 
 
lets say working mem here is "ate" frame 
 - subject: 
 - object: 
 - time: 
 - utensils 
 - location: 
 - object-alive : false 
 - object-is : inside-subject 
 - subject-mood : happy 
 
====> it tells us what to look for in an input sentence (instead of input sentence telling us what things are available) 
====> gives us both bottom-up and top-down approaches. very powerful 
 

#  limitation : how to interpret slots-fillers (subject/object/location etc) correctly? 

- hard. we will discuss further in "Understanding" and "Common Sense Reasoning"  (and "scripts") 
- but when combined with other methods, frames can be used for story understanding. 
 
 
## 
## Assignment prompt: how would you use frames to design an agent that could be useful in representing/solving RPM? 
## 
 
- how do you use Frames to represent RPM? 
-- at a basic level, what are the slots/fillers associated with RPM? (I think we can build them by collecting all attributes within a problem first) 
-- where are the frames coming from? - does the agent receive them? or generate them based on some reasoning? 
-- once the agent has frames, what are the reasoning methods to use to solve the prob? 
-- frames to represent problems/figures/objects? how does having frames at diff abstraction levels help the agent solve the prob more easily? 
-- what is it that frames enable an agent to do which cannot be done otherwise? 
 
 

# frames and human cognition 

(1) a knowledge structure. frames = molecule, production rule = atom 
(2) frames enable a theory of constructing cognitive processing which is both bottom up and top down. 
e.g. we see/hear a lot of things, but based on memory/knowledge in the form of frames we make sense of some of it. 
(3) frames capture stereotypes. (sometimes can lead to incorrect inference, but still cognitively efficient because instead of analyzing the world/situation/event every time, we can have default values. that's the property of frames, which help us/agents generate a certain number of expectations very rapidly. that's cognitively efficient.) 
 
 
 
 
############################### 
#####    Understanding    ##### 
############################### 
 
recall: 
 
Common sense reasoning = (1) frames                      # knowledge representation 
                       + (2) understanding               # reasoning strategy 
                       + (3) common sense reasoning      # reasoning strategy 
                       + (4) scripts                     # knowledge representation 
 
 
## 
## understanding "events" and "stories" 
## 
- thematic role systems (= most structured/formal type of Frame representation) 
- resolving ambiguity with thematic roles 
- constraints (e.g. grammar guides us interpret the world, eliminate ambiguity.) 
 
Sentence 1:  A serious earthquake killed 25 people in Pittsburgh. 
Sentence 2:  The president killed 25 proposals for earthquake prediction. 
 
===> the word "kill" have different meaning in each sentence. 
===> we can have "kill" frames for each. But frame itself as knowledge representation method cannot disambiguate between. 
===> clearly human can easily distinguish. because we have lots of "background" knowledge. 
 
 
e.g.  Ashok made pancakes for David with a griddle. 
 
analysis in diff layers: 
- lexical  : categorize each word in the sentence. Noun, verb, object, etc. 
- syntax   : structure of subject, verb phrase, etc 
- semantics: Ashok = agent, made = action, pancakes = object of action, David = beneficiary, griddle = instrument 
 
Thematic Role (of various words in the sentence)  ==  meaning of meaning. 
- agent: Ashok 
- verb: make 
- beneficiary: David 
- thematic object : pancakes 
- instrument : griddle 
 
====>  now based on this, we can answer questions, draw the right inference. then we say we "understood" this sentence. 
====>  e.g. Who ate the pancakes?   -->  well, it must be David, who is the beneficiary. 
====>  so the above frame representation of the meaning of the sentence allows you to draw the right inference. 
 
## 
## how do we extract meaning of words from a sentence into thematic role frames? 
## 
- first, "thematic role systems(frames)" is a rigorous type of frame. 
- e.g. "throw" will generally require "agent" "object" "benetifitiary/destination" 
 
e.g.  David went to the meeting with Ashok by car. 
 
Thematic Role: 
- agent : David 
- verb : go 
- coagent : Ashok 
- destination : meeting 
- conveyance : car 
 
====>  but how did we know, for example, "car" = "conveyance" 
====>  we employ "constraints" such as prepositional constraints like below. 
 
Prepositional Constraints (as background knowledge): 
Preposition   |  Thematic Roles 
--------------------------------------------- 
     by       |  agent, conveyance, location 
     for      |  beneficiary, duration 
     from     |  source 
     to       |  destination 
     with     |  coagent, instrument 
 
====> notice we only map to "thematic roles"   not syntax/lexical level. 
====> as you notice, this is still ambiguous. "by" can mean different things. 
 
now we look at ontology (in AI, ontology means a conceptualization of the world, the categories in terms of which the agent describes the world) 
the categories then become the vocabulary for knowledge representation. 
 
lets suppose an agent has the following ontology of the world (as background knowledge): 
 
           |-- Agents - People - Ashok 
           |                   - David 
Things  ---|-- Objects - Conveyances - Car 
           |                         - Train 
           |-- Locations - Park 
                         - Statue 
 
===> now thanks to this ontology, ambiguity can be resolved. 
===> notice how we initially did bottom-up approach (lexicon analysis) but now we just did top-down approach using constraints/ontology. 
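 
a minimal sketch in python of combining the prepositional constraints with the ontology to resolve a thematic role (my own illustration; the tables are simplified from the above): 
 
PREP_ROLES = {
    "by":   ["agent", "conveyance", "location"],
    "for":  ["beneficiary", "duration"],
    "from": ["source"],
    "to":   ["destination"],
    "with": ["coagent", "instrument"],
}

ONTOLOGY = {                     # word -> category in the agent's ontology of the world
    "Ashok": "agent", "David": "agent",
    "car": "conveyance", "train": "conveyance",
    "park": "location", "statue": "location",
    "griddle": "instrument",
}

def resolve(preposition, word):
    """Intersect what the preposition allows with what the ontology says the word is."""
    allowed = set(PREP_ROLES[preposition])
    category = ONTOLOGY.get(word)
    if category in allowed:
        return category                      # ambiguity resolved
    if category == "agent" and "coagent" in allowed:
        return "coagent"                     # 'with Ashok' -> coagent, not instrument
    return sorted(allowed)                   # still ambiguous

print(resolve("by", "car"))       # 'conveyance'
print(resolve("with", "Ashok"))   # 'coagent'
print(resolve("with", "griddle")) # 'instrument'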
 
## 
## ambiguity 
## 
- we already looked at prepositions. but ambiguity can exist in other categories. e.g. verbs. "take" 
- notice "pun" humor which happens when a word in a sentence can have two possible valid thematic roles. 
 
(video) https://www.youtube.com/watch?v=RTWFkMaxEK0 
 
- see how the verb "take" has many diff meanings. we can assign a specific frame for each. some "take" frames need certain thematic roles such as "target" "source" "time duration" "destination" "quantity" "coagent" "location" "medicine" etc 
- by analyzing the sentence using prepositional constraints first, we can extract some thematic roles, which helps eliminate choices for "take." 
- notice some thematic role frames are easier/harder to disambiguate than others. 
- notice this applies not just to English language, but other languages also. (does this mean, we can leverage this thematic roles frames with constraints to do AI translation??) 
 
 
## 
##  revisiting "kill" example 
## 
Sentence 1:  A serious earthquake killed 25 people in Pittsburgh. 
Sentence 2:  The president killed 25 proposals for earthquake prediction. 
 
thematic role: kill_1 
agent: earthquake 
victim: people 
 
thematic role: kill_2 
agent : president 
object: proposal 
 
====> again, needs a good ontology base/constraints. 
====> as sentences become complex, we will need more and more constraints. 
 
 

#  assignment:  how would you use Understanding to design an agent that could answer RPM ? 

- recall the Reflection VS Rotation discussion. we can prepare some thematic role frames that expect certain items, which will help us eliminate less-weighted transformations. each transformation will have a diff degree of fit to the frame's expected values, which will help us determine the most accurate answer. 
- analyze each step of the Understanding process. how do we understand what's happening in the problem? (by analyzing the SN propositional representation attributes) how do you fit that into thematic role frames? (these are my transformation rules) how does that solve the problem? 
 

# understanding and cognitive connection 

- understanding : not just natural language. but many other types of data. 
- various input data types : acoustic, visual, numerical, verbal. 
(1) exploit constraints about the world (physical/scientific, grammatical, social, etc) 
(2) structured knowledge representations which organize knowledge. (we can leverage them from our memory, as per the third point) 
(3) low level bottom up approach helps us activate the structured knowledge representation from our memory. once activated, the structured knowledge representation can generate expectations, enable the top down approaches. Very very powerful. 
 
 
 
 
 
######################################## 
#####   Common Sense Reasoning     ##### 
######################################## 
 
recall: 
 
Common sense reasoning = (1) frames                      # knowledge representation 
                       + (2) understanding               # reasoning strategy 
                       + (3) common sense reasoning      # reasoning strategy 
                       + (4) scripts                     # knowledge representation 
 
- primitive actions 
- actions and sub-actions (hierarchy) 
- state changes 
 
recall the verb "take" in diff sentences. we looked at (1)"the structure of the sentence" and (2)"background knowledge" to disambiguate. 
 
now, an 'inverse' example where diff verb words can have the same meaning. 
 
e.g. 
Ashok ate a frog. 
Ashok devoured a frog. 
Ashok consumed a frog. 
Ashok ingested a frog. 
Ashok partook of a frog. 
Ashok swallowed a frog. 
Ashok dined on a frog. 
Ashok inhaled a frog. 
 
====> context is the key. but that is hard for AI. we will revisit context later. 
====> not practical to have frames for each. 
====> use 14 "primitive actions" as our ontology of actions 
 
 
## 
##  14 Primitive Actions (as our ontology of actions) 
## 
- move-body-part 
- expel             # like exhale 
- propel            # kick a ball (launching the moving, which results in move-object action) 
- see 
- smell 
- move-possession   # he took money from her. (which kind of includes move-object, but we pick the closest match) 
- move-object 
- think-about 
- ingest            # eat 
- speak 
- hear 
- feel 
- move-concept      # Ashok gave an idea to Susan to go see a movie 
- conclude 
 
 
====> not guaranteed comprehensive. you may see an argument on other type of ontology. 
====> now look at the "Ashok ate a frog" sentence variations. they all translate into the "ingest" case. 
====> imagine each of the 14 primitive actions has a frame. (agent, object, benefitiary, etc) 
 
e.g.   John pushed the cart. 
 
===> first we approach from lexical raw data analysis (bottom-up) to capture a verb 
===> once found, then use memory knowledge(top-down) e.g any animate entity preceding the verb should be the agent 
===> and any inanimate entity following the verb should be the object. and so on. 
 
Action Frame:          ##  = structural knowledge representation = frame = molecule 
- primitive : propel   ## we chose "propel" but "move-object" is almost equally as good 
- agent : John 
- object: cart 
 
====>  molecule as opposed to "atom" = production rules(from production systems) or predicates(from logic lecture) 
 
====>  so the process of picking the right primitive action frame for a verb is kind of trial and error. 
====>  you may try one frame, and then try others and realize there is one that fits better. 
====>  or may find two frames that both match perfect. (= pun humor) 
====>  we still fully use lexical/syntactic/semantics/constraints processing. 
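 
a minimal sketch in python of mapping a verb to a primitive action and filling the action frame from the sentence structure (my own illustration; the verb table and the very naive parsing are assumptions): 
 
VERB_TO_PRIMITIVE = {
    "ate": "ingest", "devoured": "ingest", "swallowed": "ingest",
    "pushed": "propel", "kicked": "propel",
    "took": "move-possession", "said": "speak",
}

ANIMATE = {"John", "Ashok", "David", "Susan"}

def action_frame(sentence):
    """'<agent> <verb> the <object>' -> an action frame (very naive parsing)."""
    words = sentence.rstrip(".").split()
    agent, verb = words[0], words[1]
    obj = words[-1]
    return {
        "primitive": VERB_TO_PRIMITIVE.get(verb, "do"),   # unknown verb -> implied action
        "agent": agent if agent in ANIMATE else None,     # animate entity before the verb
        "object": obj,                                    # entity after the verb
    }

print(action_frame("John pushed the cart."))
# {'primitive': 'propel', 'agent': 'John', 'object': 'cart'}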
 
 
## 
## implied action  (good feature of Common Sense Reasoning) 
## 
 
e.g.  John fertilized the field. 
 
===> if "structural knowledge" cannot map to any primitive actions. ==> must be implied action 
===> use "background knowledge" and re-write it as   "John put fertilizer on the field." 
===> then easily we can fill the frame as below. 
 
Action Frame: 
- primitive : move-object 
- agent : John 
- object : fertilizer 
- destination : field 
 
 
e.g.   Bill shot Bob 
 
===>  fails to map to any primitive actions. must be implied action. 
===>  lets rewrite it as "Bill propelled a bullet into Bob." 
 
Action Frame: 
- primitive : propel 
- agent : Bill 
- object : bullet 
- destination : Bob 
 
===> again, we assume we had rules for this primitive that an animate entity that precedes the verb is the agent, and an inanimate entity immediately following the verb is the object, and prepositional constraints that tell us "into" specifies the destination. 
 
 
## 
##  Actions and Sub-actions 
## 
 
e.g.   Ashok put the wedge on the block. 
 
===>  can an AI agent answer "did Ashok move his hands?" 
===>  we will need the AI agent to have some sort of "background knowledge" about sub-actions aside from input to be able to make that common sense inference. 
 
Action Frame: 
- primitive : move-object 
- agent : Ashok 
- object : wedge 
- destination : block 
- sub-action : ---------------- Action Frame: 
                                   - primitive  : move-body-part 
                                   - object     : fingers 
                                   - destination: closed 
- sub-action : ---------------- Action Frame: 
                                   - primitive : move-body-part 
                                   - object    : hand 
 
 
====> by looking at only this case, it's as if move-body-part is the primitive action underlying move-object, so move-object shouldn't be a primitive action. 
====> but there are other cases where that is not true. 
 
====> all in all, actions and subactions give us a hierarchical view. 
 
## 
##  State Changes 
## 
 
enables us to predict certain effect/cause of a certain event. 
 
 
e.g.  Ashok enjoyed eating a frog. 
 
Action Frame: 
- primitive : ingest 
- agent : Ashok 
- object : frog 
- result : ---------------------  State-Change Frame 
                                          - object : Ashok's mood 
                                          - destination : happy 
 
 
====>  now how did we know primitive : ingest from "eating",  instead of   primitive : feel from "enjoyed" 
====>  a valid observation, we could have another frame for primitive : feel and link to this Action Frame 
====>  or, depending on what rules you put into those frames, the agent may choose one over the other. that is how it works. 
 
 
## 
##   Implied Action  &  State Changes 
## 
 
e.g.  Susan comforted Jane. 
 
===> we don't know what exactly Susan did. but lets say the agent's background knowledge (or stereotypes) tells us that A comforting B makes B happy. 
===> so we can make at least below partial inference. 
 
Action Frame: 
- primitive : do 
- agent : Susan 
- result : --------------------  State-Change Frame 
                                          - object : Jane's mood 
                                          - destination : happy 
 
 
## 
##   Actions and Resultant Actions 
## 
 
lets look at a sentence with two verbs. 
 
e.g.   Maria told Ben to throw the ball. 
 
Action Frame: 
- primitive : speak 
- agent : Maria 
- result : ----------------------  Action Frame: 
                                      - primitive : propel 
                                      - agent : Ben 
                                      - object : ball 
 
 
 
=======>  for this and the earlier "enjoyed eating" example, we can disambiguate more confidently once we start considering the paragraph level. 
=======>  so far, we've only been looking at the single-sentence level. but preceding/succeeding sentences should be considered when understanding a sentence. i.e. context. 
 
 
more examples (video)   https://www.youtube.com/watch?v=dRjtNy1I-Do 
 
 
## 
##   Assignment : how would you use Commonsense Reasoning to design an agent that could answer RPM? 
## 
 
two connections: 
(1) primitive actions = primitive transformations 
-- different problems could be composed of finite sets of possible transformations 
-- what are those transformations? 
-- what are the actions involved in those transformations? 
 
(2) connect primitive actions to higher level transformations. what primitive actions result in those higher transformations? 
-- how does composing higher level transformations help solve RPM problems that you couldn't solve otherwise? 
 
 
## 
##  common sense reasoning   and   human cognition 
## 
- a strong connection, but not yet fully understood. 
- context helps us make plans on actions 
- "theory of minds" theory that includes common sense reasoning. it's how we make inferences about the world. 
 
 
 
 
############################## 
#####     Scripts       ###### 
############################## 
 
recall: 
 
Common sense reasoning = (1) frames                      # knowledge representation 
                       + (2) understanding               # reasoning strategy 
                       + (3) common sense reasoning      # reasoning strategy 
                       + (4) scripts                     # knowledge representation 
 
 
so far, we looked only at single sentences. but with "scripts" we can analyze discourses, stories, scenes. 
 
composing simple frames into longer stories gives us "scripts", which help us make sense of complex, repeated events with ease, and generate expectations. 
 
- defining scripts 
- form VS content 
- generating expectations 
- hierarchies of scripts 
 
 
## 
##  a sample conversation 
## 
 
Ali asked, "Do you think we will have many customers in the next half hour?" 
Sarah replied, "Go ahead and grab your lunch." 
 
====> makes sense?  well yes, if an agent is equipped with background knowledge to make inferences to connect these two sentences. 
====> common sense reasoning. 
 
## 
##   story understanding for AI agents 
## 
 
e.g.  Bob walks into a restaurant, orders a hamburger, and waits for an hour; then a burnt hamburger arrives. he doesn't finish it, and leaves. 
 
===> now did he leave a big tip?  no. but how do you know? 
===> "scripts" is the answer. 
 
 
## 
##   Definition of Scripts 
## 
- a causally coherent set of events. (that give a prototype of what to expect in certain situations) 
(causally) = one event sets off another event 
(coherent) = the causal connections between events make sense (in the context of the world around us) 
(events)   = actions or scenes in the world. (mostly observable) 
 
 
## 
##   6 Parts of a script 
## 
there are 6 parts.  lets think of a restaurant script. 
 
(1) entry  : conditions necessary to execute the script. e.g. a hungry customer with money. 
(2) result : conditions that will be true after the script is executed. e.g. restaurant owner has more money, customer less money and full, etc. 
(3) props  : objects involved in the execution of the script.  e.g. tables, menu, food items, etc. 
(4) roles  : agents involved in the execution of the script. e.g. customer, owner, waitress, etc 
(5) track  : subclasses of the particular script. e.g. coffee house, fast food shop, a 5 star restaurant. 
(6) scenes : sequence of events that occurs during the execution of the script. e.g.  ordering food, paying the bill. 
 
====> a big molecule of knowledge. 
 
 

#  restaurant script example 

 
script: restaurant 
track : formal dining 
props : tables, menu, check, money, F = food, P = place 
roles : S = customer, O = owner, W = waiter, M = cashier, C = cook 
entry : S is hungry, S has money 
result: S has less money, O has more money, S is full, S is happy 
scenes: ---| 
           |- Scene1 : entering 
           |   Action Frame:                     # customer moves himself into the place 
                  primitive: move-object 
                  agent    : S 
                  object   : S 
                  dest     : P 
           | 
           |-> Action Frame:                     # customer sees a table 
                  primitive: see 
                  agent    : S 
                  object   : table 
           | 
           |-> Action Frame:                     # customer decides 
                  primitive: conclude 
                  agent    : S 
                  result   : ------------   Action Frame:               # decided to move himself into a table 
                                               primitive: move-object 
                                               agent    : S 
                                               object   : S 
                                               dest     : table 
                                                      | 
                                                      | 
                                            Action Frame:               # customer moves himself to sit 
                                               primitive: move-body-part 
                                               agent    : S 
                                               part     : body 
                                               dest     : sitting 
                                                      | 
                                                      | 
                                            Action Frame:               # waitress sees the customer 
                                               primitive: see 
                                               agent    : W 
                                               object   : S 
                                                      | 
                                                      | 
                                            Action Frame:               # waitress moves herself to the customer 
                                               primitive: move-object 
                                               agent    : W 
                                               object   : W 
                                               dest     : S 
                                                      | 
                                                      | 
                                            Action Frame:               # waitress brings the menu to the customer 
                                               primitive: move-object 
                                               agent    : W 
                                               object   : menu 
                                               dest     : S 
 
 
=====> point here is script functions as the knowledge representation of the stereotypical situations about going to a restaurant 
 
=====> recall Frame is similar to OOP.  we can instantiate this script with S = Tom, P = Denny's, W = Jane 
=====> imagine Tom = AI agent. then he can pick the restaurant script when going into a restaurant. also, once he sits, he knows what to expect next, i.e. waitress comes to him with a menu. (generate expectations about the future before it happens) 
 
====> a violation is when the expected things don't happen, or unexpected things happen. (the theory of "surprise" used in movies, novels.) 
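 
a minimal sketch in python of a script as a frame whose scene sequence lets the agent generate expectations (my own illustration; the scene names are simplified from the restaurant script above): 
 
restaurant_script = {
    "entry":  ["customer is hungry", "customer has money"],
    "result": ["customer has less money", "owner has more money", "customer is full"],
    "props":  ["tables", "menu", "check", "money", "food"],
    "roles":  ["customer", "owner", "waiter", "cashier", "cook"],
    "track":  "formal dining",
    "scenes": ["entering", "being seated", "ordering", "eating", "paying", "leaving"],
}

def expect_next(script, observed_scene):
    """Generate the expectation for what should happen after the observed scene."""
    scenes = script["scenes"]
    i = scenes.index(observed_scene)
    return scenes[i + 1] if i + 1 < len(scenes) else None

print(expect_next(restaurant_script, "being seated"))   # 'ordering'
# an event outside the expected sequence is a violation (the "surprise" above)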
 
 

#  hierarchy/OOP structure of scripts/tracks,   Form VS Content 

 
track for "restraurant" - lets say coffee house, fast food, casual dining, formal dining. 
 
- suppose an AI agent is hungry and has some money. based on long-term memory, out of many scripts it might pick the restaurant script. 
- while executing, once it enters the restaurant, it observes and realizes that the conditions for the "fast food" track are met, then invokes that track. 
 
so, abstract general script of restaurant = FORM (class) 
an instantiation specifies the CONTENT 
 
 

# script's  relation to other knowledge representation methods 

 
===> the above discussion exactly relates to "Classification" 
===> recall this is a bit like "Planning", where an agent looks at the initial state and goal state, and navigates to the operator it decides on for the next step. but in a script, all tracks are already pre-defined/compiled in memory, so it's much faster. 
===> but once the agent in "Planning" decides the operator between the init state and goal state, if that particular case is already known, and repeating, then it can simply become "script" which we can use to other similar situations without re-Planning everything from scratch. in other words, script can be seen as a plan that's been executed once. 
===> this also relates to Incremental Concept Learning. 
 
(see more discussion by David)  https://www.youtube.com/watch?v=aJPY_z0qt3M 
 
 

#  Script and other methods 

 
good discussion by David :   https://www.youtube.com/watch?v=mjdK60hdr-w 
 
 

#   Assignment : how would you use scripts to design an agent that could answer RPM ? 

how would scripts help inform the design of an agent? 
scripts are ways of making sense of complex events in the world = RPM. 
you might have diff scripts or a script with diff tracks for diff RPMs. 
what would be your entry conditions/scenes? 
how does the agent know which script to use? is it all pre-defined or does the agent learn from prior problems? 
==> if it's all pre-defined, then is it intelligent? but as discussed before, we only have a dozen RPM problems. maybe not a good enough number of samples for the agent to learn a consistent pattern. 
 
 

#  scripts and human cognition 

- strong connection 
- a theory of the "brain" as a prediction machine: we do very quick bottom-up processing followed by mostly top-down processing with general expectations about the world, and we act on them. (when an expectation is violated/fails, it is a surprise.) 
- scripts are mental models, in a way. 
 
 
 
 
############################# 
#####       Logic       ##### 
############################# 
 
a formal language that allows us to make assertions about the world in a precise way. 
(another knowledge representation scheme) 
 
Logic(basis) -> Planning 
 

# formal logic-based AI agent 

(1) knowledge base (e.g. sentence)  ## must be complete for (2) to work provably. 
(2) inference engine (apply "rules of inference" to the knowledge base) 
 

# property of "inference" 

(1) soundness: only valid conclusions can be proven. 
(2) completeness: all valid conclusions can be proven. 
 

# property of "knowledge" 

- it's a broad topic. for this course, we will limit the scope to; 
- conceptual/experiential/heuristic knowledge. 
 
given conceptual knowledge: like semantic nets of bricks and blocks for a wall, how can we translate it to the language of logic? 
 

# how can we construct "knowledge base"? 

in SN, it was "objects" and "relations btw objects" 
in Logic, we have; 
- Predicate: a function that maps object arguments to true or false 
 
e.g. 
Feathers(bluebird)   # returns true.  Feathers() is predicate, bluebird is object arg 
 
if Feathers(animal) => then Bird(animal)   ## this sentence written in language of logic captures the relationship btw two predicates. 
 
"=>" means "implication"    ## we may use "=" or "==" 
Feathers(animal) implies Bird(animal) i.e. if animal has feathers, then the animal is a bird. 
 
e.g. 
if an animal lays eggs and it flies, then it is a bird.    ## translate this into the language of logic. 
 
Lays-eggs(animal)  /\  Flies(animal) => then Bird(animal) 
 
"/\"  means "and"       ##  we may use "&" or "&&"    aka conjunction 
"\/"  means "or"        ##  we may use "|" or "||"    aka disjunction 
"¬" means "not"         ## we may use "!" or "~"      aka negation 
 
e.g. 
if an animal flies and is not a bird, then it is a bat. 
Flies(animal) /\ ¬Bird(animal) => then Bat(animal) 
 
 
## 
##  Truth Table 
## 
- in the previous logic examples, sometimes there are many predicates combined. how do we know if a sentence resolves to true or false? 
- we use truth table. (simple) 
 
e.g. 
    A   |   B   |   A&&B 
--------------------------- 
  true  | true  |  true 
  true  | false |  false 
  false | true  |  false 
  false | false |  false 
 
 
A && B  ==  B && A                   ##  called commutative property 
(A && B) || (A && C) == A && (B||C)  ##  called distributive property 
A || (B||C)  == (A||B) || C          ##  called associative property 
 
DeMorgan's Law 
!(A && B) == !A || !B 
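 
(illustration) these equivalences can be brute-force checked over all truth assignments; a minimal Python sketch, purely illustrative: 
 
from itertools import product
 
for a, b in product([True, False], repeat=2):
    assert (a and b) == (b and a)                        # commutative
    assert (not (a and b)) == ((not a) or (not b))       # DeMorgan
for a, b, c in product([True, False], repeat=3):
    assert ((a and b) or (a and c)) == (a and (b or c))  # distributive
    assert (a or (b or c)) == ((a or b) or c)            # associative
print("all equivalences hold")
 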
 
 
========>  recall "knowledge base" discussion. 
========>  knowledge base may be coming from many places (default values, perception, etc) 
========>  when we try to apply the rules of inference to the sentences of the knowledge base, it is useful to re-write the sentences of knowledge base in a way that enables us to easily apply the rules of inference. (examples to follow below) 
 
 

#  truth of implications 

 
    A   |   B   |   A => B 
--------------------------- 
  true  | true  |  true         ## if feather, then birds 
  true  | false |  false        ## if wings, then human  (lots of animals with wings, but no human has wings) 
  false | true  |  true         ## if flight, then birds (penguin cannot fly, but it is bird) 
  false | false |  true         ## if flight, then bird  (lets say you got a cat, it's false for both A & B, but implication can still be true) 
 
===> the third column is actually talking about whether or not the implication "can" hold true. 
===> if flight is false, we cannot make determination whether or not an animal is a bird. 
 
## 
##  implication elimination 
## 
 
given       :   A => B 
re-write as :  !A || B    ## obvious from the above table, but this re-write is crucial in logic proof. 
 
 
## 
##  Rules of Inference : instantiate general rules to prove specific claims 
## 
 
(1) Modus Ponens 
 
Sentence1: P=>Q 
Sentence2: P 
(therefore) 
Sentence3: Q 
 
e.g. 
Feathers => Bird    ## could be bootstrap default embedded knowledge 
Feathers            ## could be externally given percept knowledge 
(therefore) Bird    ## logically inferred knowledge (sound and complete) 
 
e.g. 
If it is raining, I will meet you at the theater. 
It is raining. 
Therefore, I will meet you at the theater. 
 
 
(2) Modus Tollens      ## aka contrapositive, proof by contradiction, indirect proof 
 
Sentence1: P=>Q 
Sentence2: !Q 
(therefore) 
Sentence3: !P 
 
e.g. 
If the watch-dog detects an intruder, the watch-dog will bark. 
The watch-dog did not bark 
Therefore, no intruder was detected by the watch-dog. 
 
 
(ref) 
http://en.wikipedia.org/wiki/Modus_ponens 
http://en.wikipedia.org/wiki/Modus_tollens 
 
 
 
e.g.  Prove Harry is a bird. 
 
S1: Feathers(animal) => Bird(animal) 
S2: Feathers(Harry) 
S3: Bird(Harry) 
 
 
e.g.  Prove Buzz does not have feathers 
 
S1: Feathers(animal) => Bird(animal) 
S2: !Bird(Buzz) 
S3: !Feathers(Buzz) 
 
 
==========> this is all powerful and useful, but for long sentences, the computation can be complex and unrealistic. 
==========> especially for AI agents with limited resources from which we expect near-real-time performance. 
 

# propositional logic : 0th order logic (no variable) 

 
For one animal: 
Lays-eggs(animal) && Flies(animal) => Bird(animal) 
 
### we can generalize this using variables and quantifiers. 
### then we call it "predicate logic" aka predicate calculus and first-order logic 
 
e.g. 
 
For "all" animals: 
\-/ x [Lays-eggs(x) && Flies(x) => Bird(x)]     ##  \-/ is called "universal quantifier" 
                                                ##  \-/ x  means "for all x" 
                                                ##  variable helps reduce/generalize sentences 
For "at least one" animal: 
E x [Lays-eggs(x) && Flies(x) => Bird(x)]       ##  E(inverted) is called "existential quantifier" 
                                                ##  E x  means "at least one x" 
 
 
(see notations in video) 
https://www.youtube.com/watch?v=GeWyQmDAvmA 
 
 

# simple proof 

- writing sentences in propositional logic 
- rules of inference 
- proving theorems 
- finding truth values of sentences 
==> all computationally inefficient. 
 
--> alternatively, in AI, we have "Resolution Theorem Proving" 
 
(see video for an example scenarios) 
https://www.youtube.com/watch?v=mp6ju2OOHsg 
 
Prove S3 based on S1,S2 
 
S1: !can-move  =>  !liftable 
S2: !can-move 
therefore 
S3: !liftable 
 
===> instead of using Modus Ponens, lets try resolution theorem proving 

# Resolution Theorem Proving  (similar to proof by contradiction/refutation) 

step 1: convert every sentence into a conjunctive normal form. 
= conjunctive normal form means one of: (1) a literal (possibly negated), (2) a disjunction of literals, (3) a conjunction of disjunctions of literals 
= (must eliminate implication) 
 
 
S1: !can-move  =>  !liftable 
can be re-written as 
S1: can-move || !liftable    ## recall implication elimination 
 
S2: !can-move     ## this needs no re-writing 
 
we assume S3: liftable   ## take the negation of what we want to prove 
 
e.g. 
S1: can-move || !liftable 
S2: !can-move 
S3: liftable 
 
===> S3 (liftable) and the !liftable literal in S1 contradict, so let's remove(resolve) them. 
===> we are left with only can-move and !can-move, which cannot both hold true. 
===> thus S3 must have been !liftable in the first place. 
 
 
e.g. 
Prove !liftable 
 
(video) 
https://www.youtube.com/watch?v=5cLJPqvlK0g 
 
S1:  !can-move && battery-full => !liftable 
(re-write as) 
S1:  !(!can-move && battery-full) || !liftable     ## now, the rule says we can only have conjunc of disjunc 
                                                   ## not disjunc of conjunc 
(re-write again as) 
S1:  can-move || !battery-full || !liftable        ## just used DeMorgan's law 
                                                   ## now this is valid for resolution theorem proving 
 
S1:  can-move || !battery-full || !liftable 
S2:  !can-move 
S3:  battery-full 
S4:  liftable                      ## negation of what we want to prove 
 
===> lets remove the conflict. 
 
S1:  can-move || !battery-full (removed) 
S2:  !can-move 
S3:  battery-full 
S4:  (removed) 
 
===> now, lets look at either one, say battery-full, remove its conflict. 
 
S1:  can-move (removed) (removed) 
S2:  !can-move 
S3:  (removed) 
S4:  (removed) 
 
===> contradiction. therefore, S4 must be !liftable. 
 
===> now notice, how "focus of attention" is simple. we started from the negation of what we wanted to prove (= S4), then looked for its negation, and resolve, and so on. since we converted all sentences to disjunctions of literals, we can take on any of them. computationally efficient. 
===> aka "Horn Clauses" = disjunctions that contain at most one positive literal. (= S1 in this case) 
(ref) 
http://en.wikipedia.org/wiki/Horn_clause 
 
 
======> we have mechanized logic. 
- instead of coming up with a large truth table 
- instead of coming up with complex chains of Modus Ponens/Tollens 
- we have found Resolution Theorem Proving, an efficient way of proving sentences and the truth values. 
 
to recap, 
(1) take all sentences in the knowledge base, convert them into a conjunctive normal form. 
(2) take what you want to prove, and its negation, put that as a new sentence. 
(3) find a literal in other sentences on which you can resolve. keep doing this until you reach the null (empty) clause, which is a contradiction. 
(if no contradiction is reached, it means what you set out to prove cannot be proven.) 
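 
(illustration) a minimal Python sketch of propositional resolution, with clauses as sets of string literals and "!" for negation. the encoding and function names are my own (hypothetical), but it runs the battery/liftable example above. 
 
def negate(lit):
    return lit[1:] if lit.startswith("!") else "!" + lit
 
def resolve(c1, c2):
    """return all resolvents of two clauses (drop a complementary literal pair, union the rest)."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)})) for lit in c1 if negate(lit) in c2]
 
def refutes(kb_clauses, goal_literal):
    """add the negation of the goal; keep resolving until the empty clause (contradiction) appears."""
    clauses = set(kb_clauses) | {frozenset([negate(goal_literal)])}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 != c2:
                    for r in resolve(c1, c2):
                        if not r:          # empty clause = contradiction = proven
                            return True
                        new.add(r)
        if new <= clauses:                 # nothing new can be derived: not provable
            return False
        clauses |= new
 
kb = [frozenset(["can-move", "!battery-full", "!liftable"]),   # S1 in conjunctive normal form
      frozenset(["!can-move"]),                                # S2
      frozenset(["battery-full"])]                             # S3
print(refutes(kb, "!liftable"))    # True: !liftable is proven by refutation
 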
 
 
## 
##  Assignment: how would you represent RPM using formal logic ? 
## 
 
just like with production systems, we can look at this at two different levels. 
 
(1) use formal logic to represent the overall algorithm your agent uses to solve RPM problems 
(2) agent uses formal logic to understand the new problems that just arrived 
==> it can then use formal rules to develop transformations that occur within the problem and transfer the transformations into the new answer. 
==> alternatively, use formal logic to allow agent to prove why a particular answer is correct. 
==> and IF the answer is incorrect, the agent can go back and repair its reasoning to do better next time. 
 
formal logic notation 
properties of truth values, like De Morgan's Law. 
Rule of inference 
proof by refutation 
 
## 
## formal logic and human cognition 
## 
 
- precise and formal way of reasoning 
- formal notation for agents to express info 
- we use Modus Ponens every day 
 
so far, we've been looking at deductive reasoning (cause -> effect). 
abduction: effect -> cause (like medical doctor diagnosing) 
induction: given a relation between cause and effect for a sample, generalize the relationship. 
 
 
 
########################## 
####    Planning     ##### 
########################## 
 
recall intelligent agent = function that maps "percept" history -> "action" 
 
denoted as:    f:P* -> A       // * means history 
 
Planning is a method for action selection. 
 
- states, goals, operators (we will use notation from Logic lecture) 
- conflicts in planning 
- "partial-order" planning to avoid conflicts 
- "hierarchical task networks" = a representation method for hierarchical plans 
 
# recall block-move games. how does an AI agent select next action to take? 
# a simpler example : a robot that has to paint both the ladder and the ceiling. 
===> how does the robot know it needs to paint the ceiling first? because the painted ladder becomes unusable for a while until it dries. 
 
in propositional logic 
the goal state :  Painted(ceiling) && Painted(ladder) 
the init state :  On(robot, floor) && Dry(ladder) && Dry(ceiling) 
 
an operator in propositional logic: 
 climb-ladder: 
   Precondition:                       ## by convention, all literals in the precondition are "positive" 
     On(robot, floor) && Dry(ladder)   ## these must be true for the operator to be applied 
   Postcondition:                      ## postcondition can contain negative literals 
     On(robot, ladder)                 ## these will become true after the operation 
 
 paint-ceiling: 
   Precond: 
     On(robot, ladder) 
   Postcond: 
     Painted(ceiling) && !Dry(ceiling) 
 
 paint-ladder: 
   Precond: 
     On(robot, floor) 
   Postcond: 
     Painted(ladder) && !Dry(ladder) 
 
 descend-ladder: 
   Precond: 
     On(robot, ladder) && Dry(ladder) 
   Postcond: 
     On(robot, floor) 
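 
(illustration) a minimal Python sketch of these operators in a STRIPS-like style: preconditions as a set of positive literals, negative postconditions modeled as a delete list. the encoding is hypothetical, not the lecture's notation. 
 
operators = {
    "climb-ladder":   {"pre": {"On(robot,floor)", "Dry(ladder)"},
                       "add": {"On(robot,ladder)"},  "del": {"On(robot,floor)"}},
    "descend-ladder": {"pre": {"On(robot,ladder)", "Dry(ladder)"},
                       "add": {"On(robot,floor)"},   "del": {"On(robot,ladder)"}},
    "paint-ceiling":  {"pre": {"On(robot,ladder)"},
                       "add": {"Painted(ceiling)"},  "del": {"Dry(ceiling)"}},
    "paint-ladder":   {"pre": {"On(robot,floor)"},
                       "add": {"Painted(ladder)"},   "del": {"Dry(ladder)"}},
}
 
def applicable(state, op):
    return operators[op]["pre"] <= state                 # all preconditions hold in the state
 
def apply_op(state, op):
    o = operators[op]
    return (state - o["del"]) | o["add"]                 # unaffected literals carry over
 
state = {"On(robot,floor)", "Dry(ladder)", "Dry(ceiling)"}
for op in ["climb-ladder", "paint-ceiling", "descend-ladder", "paint-ladder"]:
    assert applicable(state, op), op
    state = apply_op(state, op)
print(state)      # both Painted(ceiling) and Painted(ladder) are now in the state
 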
 
 
## 
##  Planning v.s. MEA 
## 
- strong v.s. weak (in terms of knowledge-based-ness) 
 
- imagine a car navigation AI. 
-- diff kinds of knowledge 
-- (1)world knowledge (states, operators, etc) 
-- (2)tacit/control knowledge (goals provide us with control knowledge) 
 
recall MEA was a heuristic prob solv method that compares next possible moves with the goal state, and takes the move that reduces the most distance between the current state and the goal state. (operator selection, using goal as control knowledge) 
 
Planning provides more systematic methods for selecting between diff operators(=actions) 
 
a plan example; 
 
State 1;   on(robot, floor) && dry(ladder) && dry(ceiling) 
   | 
   | climb-ladder operation 
   | 
State 2;   on(robot, ladder) && dry(ladder) && dry(ceiling)    ## we just carry-over (aka propagate) unaffected property from prev state 
   |                                                           ## climb-ladder was silent about the dry/painted properties 
   . 
   . 
State N;   on(robot, floor) && Painted(ceiling) && Painted(ladder) 
 
 
===> pre/post conditions let us restrict which operators can be applied to bind which states. 
===> notice how we got a precise way to specify states, operators, and the connections between them. 
 

#  how Planning systematically avoid conflict? 

- imagine the above paint problem example. 
- MEA will look at the init state, and may pick paint-ladder (because it immediately achieves one of the goals), then get stuck, because the ladder is painted/wet and cannot be climbed... 
- how Planning systematically avoid conflict? 
 
--> MEA or simple Planning (such as Linear Planning) does not take into account the ordering of goals (how a seemingly good move can actually be a catastrophic move with respect to the final goal state), and so can get stuck. 
(another good example is when you buy a piece of furniture, have it assembled at the store and delivered, and then realize it does not fit through the house door, so you need to disassemble it, take the pieces in, and assemble it again.) 
 
## 
##  "Partial Order Planning: POP" (aka non-linear planning)  for when there are multiple goals. 
## 
 
goal state (as control knowledge to select between operators) 
 
in case of ladder/ceiling problem; 
 
(1) divide it into two independent problem of paint-ladder and paint-ceiling. 
 
which operator has the goal state as postcondition? pick that operator A. which operator leads to the precondition of the operator A? 
==> you can find your way backward thru these subgoals. 
==> relates to "problem reduction" where we didn't know how an agent decides in which order to tackle which subproblems, or how an agent identifies those subgoals. but with "Partial Order Planning" we can. 
 
(2) detect conflicts as follows; 
 
for each precondition in current plan: 
if precondition for an operator in the current plan is clobbered by a state in another plan: promote current plan's goal above other plan's goal. 
 
Plan1 : paint-ladder 
Plan2 : paint-ceiling 
 
===> notice the precondition of climb-ladder (Dry(ladder)) is clobbered by a state in Plan1 (paint-ladder makes the ladder wet). thus promote Plan2's goal over Plan1's goal. 
 
now we will need to find an operator to connect Plan2's goal state to the init state of the Plan1. 
 
===> that is how POP algo works in practice. notice how not just the world knowledge, but also tacit/control knowledge helped the agent select between diff operators. goals are control knowledge in this case. 
===> you can think of this as multiple agents, each doing (1) generating plans for each subgoal (2) detecting conflicts (3) resolving conflicts. 
===> in a way, partial order planning emerges as a result of the interaction of three agents, each responsible for its own subtask. 
 
(see video also) 
https://www.youtube.com/watch?v=nAImor8mgwA 
 
===> writing this in code is hard. (must precisely define every pre/postcondition, state, operator, everything) 
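 
(illustration) still, the core conflict-detection/promotion rule can be sketched compactly. a minimal Python version for the ladder/ceiling plans; the per-plan encoding (preconds/deletes) is hypothetical. 
 
plans = {
    "paint-ladder-plan":  {"steps": ["paint-ladder"],
                           "preconds": {"On(robot,floor)"},
                           "deletes":  {"Dry(ladder)"}},
    "paint-ceiling-plan": {"steps": ["climb-ladder", "paint-ceiling", "descend-ladder"],
                           "preconds": {"On(robot,floor)", "Dry(ladder)"},
                           "deletes":  {"Dry(ceiling)"}},
}
 
def clobbers(p, q):
    """does plan p delete a literal that plan q still needs as a precondition?"""
    return bool(plans[p]["deletes"] & plans[q]["preconds"])
 
order = list(plans)
for p in plans:
    for q in plans:
        if p != q and clobbers(p, q) and order.index(p) < order.index(q):
            order.remove(q)                       # promote q: its goal must be achieved first
            order.insert(order.index(p), q)
print(order)      # ['paint-ceiling-plan', 'paint-ladder-plan']
 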
 
(see video for move-block puzzle example) 
https://www.youtube.com/watch?v=iOojthZgKbE 
 
init state          goal state 
  D                     A 
  B           ===>      B 
  A                     C 
  C                     D 
----------          ---------- 
 
write in propositional logic: 
 
init state 
clear(d) && on(d,b) && on(b,a) && on(a,c) && on(c,table) 
 
goal state 
clear(a) && on(a,b) && on(b,c) && on(c,d) && on(d,table) 
 
operator move x onto y 
Move(x,y) 
  precond: 
    clear(x) && clear(y) 
  postcond: 
    clear(x) && on(x,y) 
 
operator move x onto table 
Move(x,table) 
  precond: 
    clear(x) 
  postcond: 
    clear(x) && on(x,table) 
 

# what are sub goals? 

 
(1) on(a,b) 
(2) on(b,c) 
(3) on(c,d) 
(4) on(d,table) 
 
===> solve independently 
 
Plan1: move(d,table) move(b,d) move(a,b) 
Plan2: move(d,table) move(b,table) move(a,table) move(b,c) 
Plan3: move(d,table) move(b,table) move(a,table) move(c,d) 
Plan4: move(d,table) 
 
===> now detect/resolve the order of the 4 plans. 
(each goal clobbers next goal in its final step) 
 
Plan4 
Plan3 
Plan2 
Plan1 
 
thus, the final Plan becomes:  move(d,table) move(b,table) move(a,table) move(c,d) move(b,c) move(a,b) 
 
===> notice how MEA/PR couldn't sort out this ordering of goals, but partial order planning can. 
===> POP makes humans' implicit reasoning steps explicit 
 
 
## 
## hierarchical planning and HTN(=hierarchical task network) representation 
## 
- recall the final plan of our move-block puzzle. 
 
 move(d,table)   --\ 
 move(b,table)   --->  unstack(x,y,z) 
 move(a,table)   --/ 
 move(c,d)       --\ 
 move(b,c)       --->  stack(x,y,z) 
 move(a,b)       --/ 
 
you can define (un)stack(x,y,z) precond, postcond, and methods(sequence of move cmds) 
 
===> enables an agent to deal with complex problems in multiple levels of abstraction. 
===> in order for an agent to be able to reason at multiple levels of abstraction, we need knowledge at multiple levels of abstraction. 
===> notice how we grouped move operations into a macro operation (e.g. un/stack). 
 
 
## 
##  Assignment: how would you use Planning to address RPM ? 
## 
 
(1) what are our states?  init state, goal state, etc 
(2) how do you propositionally represent them (topic from SN, Logic) 
(3) what are the operators that allow transition between init and goal states? 
(4) how would you select those operators? (goal state as control knowledge) 
(5) what conflicts are possible in RPM problem? 
(6) how do we detect/avoid them? 
 
---> think of this in two diff levels. 
(a) an agent having a plan for how to address any new problem that arrives. 
(b) an agent discerning the underlying plan behind a new problem. 
 
planning in propositional logic 
goal conflicts 
partial order planning for conflict avoidance 
HTN 
 
later lectures "Configuration" and "Diagnosis" will heavily leverage these "Planning" concepts. 
 
## 
## connection to cognition 
## 
- action selection is central to cognition (obvious. e.g. what to eat for dinner today?) 
- multiple goals. multiple agents can interact with each other. (there can be conflicts) 
- cognitive agents detect/avoid conflicts. planning is central to cognition. 
 
 
 
 
################################################### 
######      Learning by Recording Cases      ###### 
################################################### 
 
 
"learning by recording cases" is considered a core process in cognition. 
 
[Learning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Incremental Concept Learning 
                  || 
                  /\ 
                 /  \ 
   Classification    Version Spaces 
 
 
also, 
[Analogical Reasoning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Case based Reasoning 
                  || 
        Explanation based Learning 
                  || 
        Analogical Reasoning 
 
 
 

# preview 

- learning by recording cases in general 
- nearest neighbor method 
- k nearest neighbor (KNN) 
- cases in the real world 
 
 
## 
## learning by recording cases in general 
## 
 
a case = an encapsulation of a past experience. 
 
e.g.   tying shoe laces, programming, medical diagnosis 
 
===> humans can do these easily even when it's a new pair of shoes, a new problem, a new patient. 
===> because we have records of so many past cases, and can pick the most similar cases (if not the exact match) and apply the same solution. 
 
 
## 
## how to measure/represent "most similar" ?  -  Nearest Neighbor method 
## 
- knowledge representation is crucial. 
- using color block example problem (video) https://www.youtube.com/watch?v=tCr_zXkP0Lo 
-- where we have differently sized/colored rectangles. 
-- when given the size of an unknown rectangle, how do we know which known color rectangle is the most similar? 
 
===> one way is to have a two dimensional grid. X-axis = width, Y-axis = height 
===> simply compare the Euclidean distance. d = sqrt{ (Yp - Yc)^2 + (Xp - Xc)^2 } 
where c = cases, p = new problem 
 
====> obviously, real-world problems are not that easy to represent. 
 
e.g. car navigation system that can only use past recorded traveled cases. 
(video)  https://www.youtube.com/watch?v=xZ6CbqxNf98 
(video)  https://www.youtube.com/watch?v=gq5Kl-STi_g 
==> two two-dimensional grids, one for src and one for dest. take the route minimizing euc_dist(srcC,srcQ) + euc_dist(destC,destQ). 
==> dont confuse dimensions and neighbors. it's two dimensional, but neighbor k = 1. 
 
## 
##  k nearest neighbor (KNN) method 
## 
- generalize the two-dimensional Euclidean distance to an arbitrary number of dimensions, and retrieve the k nearest stored cases. 
 
(video)  https://www.youtube.com/watch?v=Wrgt4UnsIEA 
 
d = sqrt{ sum(i=1,n){ (Ci - Pi)^2 } }     ## sum over the n feature dimensions 
 
where C = an existing case, P = the new problem; we then pick the k cases with the smallest d 
 
====> called "k nearest neighbor(KNN) method" 
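 
(illustration) a minimal Python KNN sketch, assuming cases are numeric feature vectors paired with a stored solution; names are hypothetical. 
 
import math
 
def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
 
def knn(cases, problem, k=3):
    """cases: list of (feature_vector, solution). return the k most similar recorded cases."""
    return sorted(cases, key=lambda c: distance(c[0], problem))[:k]
 
cases = [((2.0, 3.0), "blue"), ((9.0, 1.0), "red"), ((2.5, 3.5), "blue")]
print(knn(cases, (2.2, 3.1), k=2))     # the two nearest recorded rectangles
 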
 
limitations with KNN: 
(1) the number of dimensions needed for some problems may be too high. 
(2) even if the new prob is very close to the existing case, it does not mean we can directly apply the same solution from past cases to the new case. 
 
===> we will need both 
- alternative methods for retrieving cases from memory 
- adapting past cases to fit the requirements of the new problem 
===> this is exactly the topic of "Case based Reasoning" lecture. 
 
 
## 
##  Assignment : how would you use Learning by Recording Cases to design an agent that can solve RPM? 
## 
 
- think of what constitutes a case in RPM. each figure? each transformation? each problem? 
- how to evaluate the "similarity" ?  if figure = case, then how do you measure the similarity of two figures(transformations/problems)? 
 
===> again, strategy to overcome the limitations is addressed by "Case based Reasoning" lecture 
 
## 
##  cognitive connection 
## 
- learning by recording/storing cases in memory itself is how human cognition works. 
- we humans, as cognitive agents, are situated in the world. our interactions with the world have certain patterns of regularity. 
- recall our cognitive architecture. (1:reasoning, 2:memory, 3:learning) 
--> Learning by Recording Cases tells us how we use (2)(3) a lot, not just (1) 
 
 
 
 
######################################### 
######      Case based reasoning     ##### 
######################################### 
 
recall 
 
[Analogical Reasoning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Case based Reasoning 
                  || 
        Explanation based Learning 
                  || 
        Analogical Reasoning 
 
 

#  case based reasoning 

- agent addresses new problems by tweaking solutions to similar previously encountered problems. 
(so it really builds on "learning by recording cases", which could only deal with identical cases) 
(so "case based reasoning" is more flexible) 
 
phases: 
(1) retrieval    : retrieve a case from memory similar/identical to the current new problem (like KNN) 
(2) adaptation   : adapt the solution to that past case to fit the current problem (this is what "learning by recording cases" lacked) 
(3) evaluation   : evaluate how well the adapted solutions address the current prob. (e.g. just walk that new route in navigation system) 
(4) storage      : store the new prob and solution as a case 
 
====> notice how CBR combines Memory(retrieval), Reasoning(adaptation & eval) and Learning(storage). 
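 
(illustration) a minimal sketch of the four phases wired together in Python; retrieve/adapt/evaluate are hypothetical placeholders for whichever concrete methods are used (e.g. KNN for retrieval). 
 
def case_based_reasoning(problem, case_library, retrieve, adapt, evaluate):
    case = retrieve(case_library, problem)              # (1) retrieval
    candidate = adapt(case, problem)                    # (2) adaptation
    if evaluate(candidate, problem):                    # (3) evaluation: execute or simulate
        case_library.append((problem, candidate))       # (4) storage: the new case is learnt
        return candidate
    return None     # a fuller version would loop back to (1) or (2) on failure (see "advanced CBR" below)
 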
 
 
## 
##   assumptions for CBR 
## 
- patterns(of regularity/repetition) exist in the world. 
- similar problems have similar solutions  (does not always hold. obviously there are exceptions) 
 
## 
##  Adaptation in CBR 
## 
- recall a fundamental conundrum in KBAI: problems are complex, the agent has only limited computational resources, but it has to perform in near real time. part of the answer is that "memory" supplies an answer. not the exact answer, but close enough that we just need to adapt/tweak it a little bit. 
 
e.g.  when you write a new program that does file I/O, you probably won't write from scratch, just re-use previously written code, and modify only the filename, path, etc. 
 
===> cliche: all design is redesign. 
===> pretty much how evolution works. small changes from prev generation. 
 
Question: how do we make adaptation? 
 
===> there are several ways. we will look at three common ways. 
 
(1) model based 
(2) recursive reasoning based 
(3) rule based 
 
 

# (1) model based adaptation 

- suppose you want to go from Home to Restaurant for the first time. you retrieve a past case Home-Office where Office is close to Restaurant. You just use the "model" of the map you have, to search a path Office-Restaurant. 
- i.e. you had a past case solution, which you adapted using the model of the world. (in this case was a map model) 
 
e.g.  suppose you've done file IO in python many times. and now doing in java for the first time. 
but if you know java API as your model, then you can pretty much translate your past python case solution to java code. 
 
 

# (2) recursive reasoning based adaptation 

- suppose you wanna go from Home to Restaurant, but don't know how. 
- but you know how to go from Home to Office, from Office to Restaurant. 
- so you retrieve a case for Home-Office, and a case for Office-Restaurant. 
- once you pull the case for Home-Office, which works as a partial solution, you can define the subgoal Office-Restaurant, for which you pull the second case. 
- another similar example is when you write file I/O code for the first time, but have previously written input-only code and output-only code; then you can divide the problem into subproblems and recursively solve each. 
 
 
- compound analogy (a specific type of recursive reasoning adaptation) 
(ref) 
Vattam, S. S., Helms, M. E., & Goel, A. K. (2008). Compound analogical design: interaction between problem decomposition and analogical transfer in biologically inspired design. In Design Computing and Cognition'08 (pp. 377-396). Springer Netherlands. 
 

# (3) rule based adaptation 

- uses heuristics expressed as rules. aka rule of thumb. 
e.g. 
when you move to a new city, and want to find a big mall. your heuristic knowledge/rule says find the downtown, which is where big buildings are. so look for big buildings. (doesn't always work, but often works, at least in USA) 
 
e.g. how to find a way from Restaurant to Home? 
retrieve the case for Home-Restaurant; the heuristic/rule says just flip all the turns to get the opposite-direction route. 
--> again, a heuristic/rule in this context is something we expect to work often, though not always. 
 
e.g.  in case of File input code, lets say you retrieved a prev case where you did your file input byte-by-byte. 
but suppose your heuristic/rule says, for the performance needs of the current problem, do an array-by-array read instead. 
 
 
## 
##  Evaluation in CBR 
## 
- given solution candidates from Adaptation phase, how do we assess them? 
- well, simply execute or simulate, depending on the cost of each. 
 
e.g. you want to go Restaurant-Home, and a candidate is take the past case of Home-Restaurant, and reverse all turns. 
actually drive the path; you may find that some road is one-way, and then you know the candidate solution failed. 
 
e.g. when you write a new program, just execute it, to see if it works. if it works, then proceed to the storage phase, but if it fails, then go back to adaptation or even retrieval phase. 
 
 
## 
##  Storage in CBR 
## 
- encapsulate the problem/solution as a case. 
- accumulating these cases as memory, is in fact "learning". 
 
two ways for storage: 
(1) indexing 
(2) discrimination tree 
 

# (1) storage by indexing   (aka tabular storage) 

- index = tag. to identify each case, which allows for effective and efficient retrieval. 
 
e.g.  map of Office-Restaurant routing. 
we might store past cases by origin/destination coordinates. e.g. Ox, Oy, Dx, Dy. 
also we might want to add "fast/slow" because some path is a long distance. 
 
e.g.  file I/O programs 
we might index/tag them by "language (python/java, etc)" "fast/slow" "input/output" etc 
 
===> gotta be careful about the data structure; "effective" and "efficient" can be in a tradeoff relation. 
 
 

# (2) storage by discrimination tree 

- a data structure where leaf nodes are cases. 
- all intermediate/root nodes are questions that discriminate between cases, which provides an indexical structure over the cases. 
 
e.g. 
          is origin north of coordinate 5n ? 
                        /\ 
                   yes /  \  no 
                      /    \ 
             east of 5e?    east of 5e? 
                 /\             /\ 
             yes/  \ no    yes /  \ no 
               /    \         /    \ 
cases:        A      B       C      D 
 
===> imagine a new case "E" pops up; we place it into the tree as below. 
 
e.g. 
          is origin north of coordinate 5n ? 
                        /\ 
                   yes /  \  no 
                      /    \ 
             east of 5e?    east of 5e? 
                 /\             /\ 
             yes/  \ no    yes /  \ no 
               /    \         /    \ 
cases:        A      B       C      D,E          ## added "E" here 
 
====> but we need to discriminate between cases. so we add another question and modify the tree. i.e. the organization of case memory changes. 
 
e.g. 
          is origin north of coordinate 5n ? 
                        /\ 
                   yes /  \  no 
                      /    \ 
             east of 5e?    east of 5e? 
                 /\             /\ 
             yes/  \ no    yes /  \ no 
               /    \         /    \ 
cases:        A      B       C    east of 3e? 
                                  y /\ n 
                                   E  D 
 
 
====> done. notice how the discrimination tree is more efficient (less space compared to preparing a matrix for all coordinates) and effective ( O(logN) search ) 
(this is actually "Incremental Learning" example. with the addition of each case, some new knowledge structure is learnt.) 
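 
(illustration) a minimal Python sketch of a binary discrimination tree matching the route example above; the question predicates and case fields are hypothetical. 
 
class Leaf:
    def __init__(self, cases=None):
        self.cases = cases or []                 # leaves hold the actual cases
 
class Question:
    def __init__(self, test, yes, no):
        self.test, self.yes, self.no = test, yes, no     # test: case -> bool
 
def find_leaf(node, case):
    while isinstance(node, Question):
        node = node.yes if node.test(case) else node.no
    return node
 
def store(root, case):
    find_leaf(root, case).cases.append(case)     # incremental learning: case memory grows
 
root = Question(lambda c: c["origin_n"] > 5,                                       # north of 5n?
                Question(lambda c: c["origin_e"] > 5, Leaf(["A"]), Leaf(["B"])),   # east of 5e?
                Question(lambda c: c["origin_e"] > 5, Leaf(["C"]), Leaf(["D"])))
 
new_case = {"name": "E", "origin_n": 3, "origin_e": 2}
store(root, new_case)                            # E lands in the same leaf as D
print(find_leaf(root, new_case).cases)           # splitting this leaf with an "east of 3e?" question mirrors the refinement above
 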
 
 
 
 
e.g.  discrimination tree for file IO program 
 
              what language? 
             /      |      \                    ## notice the discrimination tree doesn't have to be binary. 
        python     java     perl 
          / 
     fast/slow? 
     big/small problem? 
         etc. 
 
 
 
## 
##  retrieval (revisited) 
## 
 
for storage by indexing (table way), we can use KNN method. 
(video) 
https://www.youtube.com/watch?v=5rVnKBqNfNk 
 
for storage by discrimination tree, just apply the new guy thru the tree questions, and the case you reach is the case you need to adapt. 
(video) 
https://www.youtube.com/watch?v=K532zk4JIWA 
 
 
## 
##  advanced CBR 
## 
 
we just looked at the linear phases. 
 
(1) retrieval 
(2) adaptation 
(3) evaluation 
(4) storage 
 
but, in reality, these steps are intertwined. 
e.g. 
- (1) retrieved case matched exactly, and can skip (2)adaptation, and go directly to (3)eval 
- adaptation failed, go back to retrieval 
- eval failed, then go back to adaptation or retrieval 
- also, do we store only when eval succeeds?  no, sometimes it is meaningful for learning to store eval-failed case. 
- in fact, we don't need to store every success case. memory becomes large, retrieval becomes inefficient (aka the "utility problem"): we want to store only cases that help us cover a large span of problems. even when a case succeeds, only store it if it's noteworthy/interesting. 
- should we forget cases? or abstract individual cases somehow ? (we will revisit this later in this course) 
 
 
 
## 
##  Assignment: how would you use CBR to design an agent that can solve RPM? 
## 
- discuss how it's different from using "learning by recording cases" alone. 
- what is your adaptation phase? how are you adapting past solutions to the new problem? 
- what is evaluation in this context? how are you evaluating your answer? 
- do you record cases as the agent encounters problems?  OR will you record some cases in advance before the agent sees its first problem? 
 
 
how can CBR co-exist with G&T? 
 
-------  TA David's advice: 
try to use them in conjunction with one another. It may be the case that for the simple problems, your Generate & Test method is certainly adequate, but it starts to be limited for the later problems. Your agent might, for example, apply G&T on a problem, then check to see if it got the answer right. If not, it might store that problem as a case in its library. Then, on future problems, it might judge the similarity of the new problem to the case library, and if it finds a close match, it might try to apply that case's reasoning instead of its typical G&T approach. You'll be guaranteed to face at least two similar cases for every Basic problem (first the Test problem, then the Unordered problem), so that approach might pay significant dividends. 
 
That approach starts to be very similar to what we saw in our Production Systems lesson -- your agent might have a set of rules (which govern its G&T approach) that it uses most of the time, but occasionally it will encounter a problem for which those rules don't work. In that case, it returns to its memory, "chunks" the reasoning that would have solved a problem in the past, and applies it to the new problem. 
 
That's one way in which a CBR agent might augment, instead of replace, a G&T agent -- I'm sure it's just one of many y'all could think of. 
-------- 
 
using Learning may degrade performance of an Agent. 
-------- TA David's advice: 
really interesting -- that reminds me of an observation made in developmental psych. At a young age, children were observed speaking correct grammar sentences like "I am", "You are", and "He is". However, later those children started to commit errors like "I are" and "He are". Initially, the kids were just repeating what they had heard before -- they had no notion that 'am', 'are', and 'is' were versions of the verb 'be'. Once they started to learn the rules, however, they started to make errors because they were trying to apply the rules, not just repeat the known clauses. So, the mistakes were actually indicative of and caused by learning. Even though the results were flawed, the reasoning that went into developing those results was improved. 
 
It sounds like something similar might be happening here. The agent learns some new information and is thus better overall, but that causes a bit of confusion on simpler problems as it overcomplicates them. It might then need to learn not only how to solve new problems, but also when to apply newly-learned rules instead of just the ones it uses by default on simple problems. That gets toward strategy selection, part of metacognition. 
-------- 
 
 
 
## 
##  CBR and cognitive connection 
## 
- CBR as part of "analogical reasoning" which is a core part of human cognition. 
- "analogical reasoning" depends on the "similarity" spectrum. at one end, the new prob is identical to one of the past probs. at the other end, the new prob is totally dissimilar. in the middle it's similar (which we can adapt/eval). in fact it's this middle of the spectrum that is common in human cognition. 
- recall cognitive architecture: (1) reasoning (2) memory (3) learning 
- recall how "learning by recording cases" shifted our focus to (2)(3). but CBR now unifies them closely. 
 
(1) reasoning  for adaptation,eval 
(2) memory for retrieval 
(3) learning for storing experiences 
 
 
 
 
################################################ 
######    Incremental Concept Learning     ##### 
################################################ 
 
recall 
[Learning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Incremental Concept Learning 
                  || 
                  /\ 
                 /  \ 
   Classification    Version Spaces (also, scripts) 
 
 

# purpose of ICL 

in LRC and CBR, we stored cases/examples in memory. here in ICL, we study how we can extract an abstract concept out of those cases/examples. 
 

# phases 

(1) variabilization 
(2) specialization 
(3) generalization 
 
===> (2)(3) may involve some heuristics 
 
 

#   concept example 

 
(video) https://www.youtube.com/watch?v=9I4qta2zeqU 
 
====> it's all about how to interpret the background knowledge. 
(1) learning is often incremental. 
(2) each example is often labelled. i.e. there is a teacher figure/entity that tells this is a circle or not. (called supervised learning in machine learning) 
(3) examples may come in random orders. 
(4) unlike CBR, in ICL, we dont store past cases/examples in raw form, we abstract into a concept. 
(5) the number of examples is small. 
(6) when abstracting/extracting a concept, balancing "specialization" and "generalization" is hard. 
 
 
## 
##  incremental concept learning 
## 
 
basic algo as below: 
 
                   (given a new example) 
               Is this an example of the concept? 
                             /\ 
                        yes /  \ no 
                           /    \ 
    does it fit the current      does it fit the current 
   definition of the concept?     definition of the concept? 
              /\                             /\ 
          yes/  \no                      yes/  \no 
            /    \                         /    \ 
   do nothing    generalize       specialize     do nothing 
 
 
 
===> negative examples lead to specialization 
===> positive examples lead to generalization 
 
e.g.  a child who has seen only black cats, so his concept of a cat is always black. when he sees a white cat, he needs to generalize his concept of a cat. 
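 
(illustration) the loop above can be sketched in a few lines of Python; fits/generalize/specialize stand for whichever concrete heuristics are used (e.g. the drop-link and require-link heuristics discussed next), so they are hypothetical placeholders here. 
 
def icl_step(concept, example, is_positive, fits, generalize, specialize):
    """one incremental-concept-learning step for a single labelled example."""
    if is_positive and not fits(concept, example):
        return generalize(concept, example)       # positive example outside the concept -> generalize
    if not is_positive and fits(concept, example):
        return specialize(concept, example)       # negative example inside the concept -> specialize
    return concept                                # already consistent: do nothing
 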
 
 
## 
##  variabilization 
## 
(video) 
https://www.youtube.com/watch?v=ijN249s6bvY 
 
imagine the first (positive) example of an "arch"  with four bricks. 
 
[brick A] 
[brick B] 
||     || 
||     || 
C      D 
||     || 
||     || 
 
 
===> suppose the agent builds a semantic net; each specific brick [A,B,C,D] becomes just "a brick", i.e. variabilized. 
===> constants -> variables 
 
 
      brick 
        /\ 
        || (supports) 
      brick 
      /\  /\ 
      ||  || (support) 
  brick -> brick 
     (left-of) 
 
 
## 
##  generalization by "drop-link" heuristic  (aka ignore feature) 
## 
 
in continuation from the above arch example, suppose another positive-labeled example. 
 
 
[brick A] 
||     || 
||     || 
B      C 
||     || 
||     || 
 
 
===> now the agent can generalize the model/concept by dropping a link, specifically the top brick. 
===> called "drop-link" heuristic 
 
      brick 
      /\  /\ 
      ||  || (support) 
  brick -> brick 
     (left-of) 
 
 
 
## 
##  specialization by "require-link" heuristic   (aka require feature) 
## 
 
in continuation from the above, suppose a negative-labeled example as below. 
 
||     || 
||     || 
A      B 
||     || 
||     ||    [brick C] 
 
 
====> based on this, we can reinforce the current SN. 
 
      brick 
      /\  /\ 
      ||  || (MUST support)    ##  we call this "require-link" heuristic 
  brick -> brick 
     (left-of) 
 
 
 
## 
##  specialization by "forbid-link" heuristic  (aka exclude feature) 
## 
 
in continuation from the above, suppose another negative-labeled example as below. 
 
[brick A] 
  |||| 
  |||| 
   BC 
  |||| 
  |||| 
 
 
 
====> based on this, we can reinforce the current SN. 
 
      brick 
      /\  /\ 
      ||  || (MUST support) 
  brick -> brick 
     (left-of) 
       <-> 
     (!touch)    ## negated touch between the base blocks. aka "forbid-link" heuristic 
 
 
## 
##  generalization by "enlarge-set" heuristic    (aka abstract feature) 
## 
 
in continuation from above, suppose another positive-labeled example as below. 
 
 
[wedge A]    #  changed to wedge 
||     || 
||     || 
B      C     #  still brick 
||     || 
||     || 
 
 
====> based on this, we can reinforce the current SN. 
 
 
   brick|wedge            # here it got more relaxed. called "enlarge-set" heuristic 
      /\  /\ 
      ||  || (MUST support) 
  brick -> brick 
     (left-of) 
       <-> 
     (!touch) 
 
 
 
## 
##  generalization by "climb-tree" heuristic   (aka generalization with background knowledge) 
## 
 
in continuation from above, suppose we are given a background knowledge as below. 
 
 
block == [brick|wedge|cylinder] 
 
 
====> based on this, we can reinforce the current SN. 
 
 
      block            # here it got more relaxed. called "climb-tree" heuristic 
      /\  /\ 
      ||  || (MUST support) 
  brick -> brick              # notice how we did NOT generalize here yet. because columns may have to be strictly bricks. 
     (left-of) 
       <-> 
     (!touch) 
 
 
===>  after dozens of iterations of generalization & specialization, we will refine the concept. 
 
 
## 
## alternative visualization 
## 
(video) 
https://www.youtube.com/watch?v=HjfnbzGEAio 
 
 
## 
##  recap of heuristics for generalization/specialization 
## 
(1) require-link    # spe 
(2) drop-link       # gen 
(3) forbid-link     # spe 
(4) enlarge-set     # gen 
(5) climb-tree      # gen 
(6) close-interval  :  expand the range of values that count as a positive example of the concept.   # gen 
 
(6) e.g.  a child who has seen only small dogs in his life encounters a big dog; then he expands the range of "size" for the dog concept. 
 
 
====> so it's a combination of incremental examples and background info. 
====> notice this is a bit diff from standard machine learning algorithms, where a large number of examples are given to begin with, so it can do cool statistical analysis, pattern recognition, etc. 
====> here, ICL can be used with only a small number of examples. 
 
 
===> we will leverage ICL at later lessons "classification" "version spaces" "explanation based learning" "learning by correcting mistakes" 
 
 
## 
##   Assignment : how would you use ICL to design an agent that could answer RPM? 
## 
- what are the concepts you are learning in this case? 
- what are you incrementing over? (transformation or problems?) 
--> this might depend on the scope : concept learning within a problem or across problems? 
- how are you generalizing/specializing your concept over time? 
- how do you use the concepts to solve new problems? how do you instantiate/leverage a concept? 
 
 
## 
##  ICL and cognitive connection 
## 
- closely related to human cognition. 
- recall two learning perspectives. 
(1) agent is given a large number of examples at once to begin with. 
(2) agent is given one example at a time, and learns incrementally. 
 
===> (2) is much closer to human reality. 
 
 
 
 
 
################################### 
######    Classification     ###### 
################################### 
 
- such a fundamental topic in AI  because it allows us to select actions, which is considered a core characteristic of intelligence. 
 
recall 
 
[Learning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Incremental Concept Learning 
                  || 
                  /\ 
                 /  \ 
   Classification    Version Spaces 
 
## 
## Classification is about mapping sets of percepts in the world into equivalence classes, so the agent can decide actions. 
## 
- equivalence classes and hierarchies 
- kinds of concepts 
- bottom-up and top-down processes 
 
 
## 
## classification (really builds on ICL) 
## 
 
recall    (Cognitive system) architecture 
 
               metacognition 
                  |||||| 
input     ==>   deliberation   ==>  output 
(percepts)        ||||||            (action) 
                 reaction 
 
where, deliberation = (reasoning + learning + memory) 
 
e.g.  lets say you have a concept of Bird from ICL. then you see another example, this time without any teacher labelling it positive/negative, would you classify(label) it as a bird or not? 
 
e.g.  percepts: 
- has wings? 
- has featehrs? 
- has talons(claws)? 
- has a beak? 
- flies? 
etc 
 
lets say there are N percepts, then (assuming each percept is binary) we have 2^N combinations of percepts 
 
e.g.  action: 
 
lets say there are M actions, then we have 2^M combinations of output. 
 
===> classification lets the agent map a certain combination of percepts into a certain set of actions. 
 

# challenge = as the number of percepts/actions increases, the mapping computation cost increases enormously. 
#             how do we select actions in near real time?  (solution) "Equivalence Classes" 
 
## 
##  Equivalence Classes(concepts) 
## 
instead of computing one giant mapping table of 2^N * 2^M, we first reduce percepts into fewer concepts as below. 
 
 
2^N percepts --> k concepts --> 2^M actions 
 
 
e.g. a patient may have millions of symptoms, but a doctor may classify them into fewer diseases. then he can give the right medicine for each disease(concept). 
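 
(illustration) a minimal Python sketch of percepts -> concept -> action; the rules and action table are made up, the point is only that a few equivalence classes stand between the 2^N percept combinations and the actions. 
 
def classify(percepts):
    """map a combination of percepts into one of a few concepts (equivalence classes)."""
    if percepts.get("feathers") and percepts.get("lays_eggs"):
        return "bird"
    if percepts.get("fur"):
        return "mammal"
    return "unknown"
 
actions = {"bird": "watch it fly", "mammal": "pet it", "unknown": "observe more"}
 
print(actions[classify({"feathers": True, "lays_eggs": True, "wings": True})])
 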
 
 
## 
##  concept hierarchy 
## 
 
        vertebrate 
       /    |     \ 
  reptile  bird   mammal 
          /  |  \ 
  bluebird eagle penguin 
 
 
===> it's a top-down approach.  establish-then-refine loop. 
===> one benefit of this approach is it helps us figure out which variables to focus on. e.g. when classifying a vertebrate into reptile/bird/mammal, we dont care about a variable "size" but when classifying a bird into bluebird/eagle/penguin, "size" matters. 
 
===> each concept/class is a piece of knowledge. and their relations/organization is another useful piece of knowledge for reasoning. 
 
e.g.  how would you characterize the "bird" class ? 
(video) 
https://www.youtube.com/watch?v=dncw9Zi9YD0 
 
- has wings?              yes 
- has feathers?           maybe  (because it can go either way, depending on the bird) 
- has talons(claws)?      maybe 
- has a beak?             maybe 
- lay eggs?               yes 
- has a horn?             no 
- flies?                  maybe 
etc 
 
====> classes for specific birds are "sub-classes" and some features belong to the super class. just like OOP. 
 
 
## 
##  kinds of concepts 
## 
 
look at the below spectrum of formality. 
the more formal, the more logical explicit conditions we can define. 
 
 
                 axiomatic concept      prototype concept      exemplar concept 
(more formal) <-----------------------------------------------------------------> (less formal) 
              e.g. (math formula)                              (beauty, freedom) 
 
 
## 
##  Axiomatic concepts: 
## 
- concepts defined by a formal set of necessary and sufficient conditions. 
 
e.g. a circle. a triangle. (geometric shape) 
 
circle: all points in a plane that are equidistant from a single point. 
 
 
## 
##  Prototype concepts: 
## 
- base concepts defined by a typical example with overridable properties. 
 
e.g.  a chair 
 
we all have our notion of a typical chair. lets represent concepts(content=properties) in a frame form. 
(recall frames are used to (1)represent stereotypes, (2)set default values, (3)define inheritance. so it's closely related here) 
 
Chair: 
 - # of legs   : 4 
 - has-back    : true 
 - material    : wood|metal 
 - has-arms    : false 
 - is-cushioned: false 
 
===> only typical properties which can be overridden in case of certain instances. 
===> but the prototype concept helps communicate the chair concept. 
 
Stool: 
 - # of legs   : 4 
 - has-back    : false        ## this property got overridden 
 - material    : wood|metal 
 - has-arms    : false 
 - is-cushioned: false 
 
===> a Stool is a kind of chair, so it inherits all the properties, but overrides one property in this case. 
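 
(illustration) a minimal Python sketch of a prototype concept as a frame whose slots are inherited and selectively overridden; slot names follow the Chair/Stool frames above, the helper function is hypothetical. 
 
chair_prototype = {"legs": 4, "has_back": True, "material": "wood|metal",
                   "has_arms": False, "is_cushioned": False}
 
def specialize_frame(prototype, **overrides):
    frame = dict(prototype)      # inherit all default slots
    frame.update(overrides)      # override only what differs
    return frame
 
stool = specialize_frame(chair_prototype, has_back=False)
print(stool)     # same as the chair prototype except has_back
 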
 
 
## 
##  Exemplar concepts: 
## 
- concepts defined by implicit abstractions of instances, or exemplars(examples), of the concept. 
 
e.g.  beauty, freedom 
 
- hard to define, communicate, teach an AI agent.  sometimes culture/individual specific. 
 
aside: even less formal concept is called "qualia" which is studied by philosophers. 
       qualia refers to the raw sensations we may get from our sensors. e.g. bitterness from fruits, chocolate, we feel in our mouth. 
       but it is hard to communicate that to others. 
 
 
 
## 
##  diff methods for diff kinds of conditions/concepts 
## 
- recall a periodic table of elements in chemistry. each element is simple, but combinations of them can be complex. 
- think of a periodic table of intelligence. our elements are diff representation methods, and reasoning methods. 
 
--> so depending on the kinds of concepts/conditions, suitable representation/reasoning methods are different. 
e.g. for less formal abstract concepts, CBR may be more suitable, because we cannot really abstract something like "inspirational" into a prototype/axiomatic concept. 
--> for more formal concepts like a triangle, CBR may be less suitable. 
 
 
## 
##  bottom-up classification 
## 
 
recall our top-down classification approach of "establish-refine", to produce a concept hierarchy. 
 
        vertebrate 
       /    |     \ 
  reptile  bird   mammal 
          /  |  \ 
  bluebird eagle penguin 
 
 
===> suited for one kind of organization of concepts. 
===> suited for one set of situation where we know something is already a vertebrate, so we can refine. 
 
 
===> now lets look at the bottom-up approach, where you know values for leaf nodes, and try to predict the value for the root node. 
===> bottom up processing is called "identify-abstract" 
 
                       future DowJones industrial average 
                      /               |                 \ 
                    GDP            inflation           employment 
                   / | \            /       \          /       \ 
            overtime  | manufacture jobless commod   job        actual 
             hours   | index       claims  index    vacancy    work hours 
                  consumer 
                  sentiment 
                  index 
 
 
## 
##  Assignment: how would you use classification to design an agent that can solve RPM? 
## 
- how to develop a classification scheme (concept hierarchy)? pre-program it? or let the agent learn it thru ICL? 
- what does that classification scheme look like? 
- what percepts are used? to produce the classification scheme? how? 
- once created, how does the classification scheme help solve the problem in a way we couldn't before? 
 
 
## 
## classification and human cognition 
## 
- such a fundamental topic in AI. because it allows us to select actions, which is considered a core characteristic of intelligence. 
- everywhere in our lives. like the doctor example. or you walk on the street, see a car, and identify its type, a Toyota, etc. 
 
 
 
 
############################################### 
#####     Explanation-based Learning      ##### 
############################################### 
 
recall 
 
[Analogical Reasoning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Case based Reasoning 
                  || 
        Explanation based Learning 
                  || 
        Analogical Reasoning 
 
 
EBL: agent does NOT learn new concepts. it just learns "connections" between existing concepts. 
     EBL is used to transfer knowledge from old situations to new situations. 
 
- concept space 
- abstraction 
- analogical reasoning 
 
 
## 
##  example  (which we will use for the rest of the EBL lecture) 
## 
 
A Cup 
---------------- 
a cup is an object that is stable and enables drinking. 
 
 
A given object 
---------------- 
this object is light and made of porcelain. it has a concavity and a handle. the bottom is flat. 
 
 
===> how does an AI agent prove the object is a cup? 
 
 
## 
## concept space & prior knowledge 
## 
 
concept space: space of information that enables us to draw inference/connections about existing concepts. 
 
- how does an AI agent build an explanation? 
- lets say an explanation for "the object is a cup" given some facts/properties as below. 
- what prior knowledge does an AI agent require? 
 
e.g. 
 
object --has--> bottom 
                bottom --is--> flat 
object --made_of--> Porcelain 
object --has--> concavity 
object --is--> light 
object --has--> handle 
 
 
==> prior knowledge of concepts (such as a brick, a glass, a briefcase, a bowl) 
 
A Brick 
--------------- 
A brick is stable because its bottom is flat. A brick is heavy. 
 
 
====> lets render the characteristics of a brick in a visual representation 
 
     Brick --is--> stable       ## functional(factual) feature 
             || 
             || 
         (causality)            ## "because" causal relation is captured here 
             || 
        ============== 
        ||          || 
Brick -has->bottom  ||          ## factual feature 
            bottom-is->flat 
 
     Brick --is--> heavy        ## notice this is just an independent piece 
 
 
(video) more example 
https://www.youtube.com/watch?v=u6OU9q-3lpk 
 
 
## 
##  abstraction 
## 
 
we will abstract (transferrable) causal explanations from prior knowledge. (then we transfer those explanations next) 
 
e.g.   knowledge representation of the characteristics of a bowl instance. 
 
    Bowl --carries--> liquids 
             || 
         (causality) 
             || 
    Bowl --has---> concavity 
 
    Bowl --has---> cherry soup 
 
 
======> an AI agent might extract knowledge as below. 
 
 
    Object --carries--> liquids 
             || 
         (causality) 
             || 
    Object --has---> concavity 
 
 
====> so (1) it only extracted the causally related features. 
         (2) replaced "Bowl" with "Object" 
 
====> thus the "explanation" is a causal explanation, thru which the AI agent will try to connect the object to the cup concept. i.e. "the object is a cup" is our goal explanation. 
 
 
## 
##  transfer 
## 
 
- how does an AI agent build the explanation for the object is a cup? 
 
(video) 
https://www.youtube.com/watch?v=yRDp2Gd0Shw#t=40 
 
 
- assume we have abstracted causal explanations for each of Brick, Bowl, Glass, Briefcase 
- by combining each causal explanation set, we can construct the below big causal explanation tree. 
 
 
                                                   object is cup 
                                                          || 
                               =========================================================== 
                               ||                                                       || 
                     object enables drinking                                     object is stable 
                               ||                                                       || 
             ==========================================               object has bottom && bottom is flat 
             ||                                      || 
    Object carries liquids                  Object is liftable 
             ||                                      || 
    Object has concavity             object is light && object has handle 
 
 
 
===> an AI agent starts from the top. it looks at two features of a cup. and searches causal explanation sets to satisfy each feature, recursively traverses down the tree. 
===> notice how the above process involves the element of  "problem reduction"  and  "planning (open precondition)" 
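 
(illustration) a minimal Python sketch of building that explanation by backward chaining over abstracted causal rules; the string encoding of the rules/facts is hypothetical, but it reproduces the tree above. 
 
rules = {
    "object is cup":           ["object enables drinking", "object is stable"],
    "object is stable":        ["object has bottom", "bottom is flat"],
    "object enables drinking": ["object carries liquids", "object is liftable"],
    "object carries liquids":  ["object has concavity"],
    "object is liftable":      ["object is light", "object has handle"],
}
 
facts = {"object has bottom", "bottom is flat", "object has concavity",
         "object is light", "object has handle"}          # features of the given object
 
def prove(goal):
    """a goal holds if it is an observed fact, or if all subgoals of some causal rule for it hold."""
    if goal in facts:
        return True
    return goal in rules and all(prove(sub) for sub in rules[goal])
 
print(prove("object is cup"))    # True: the causal explanation is reconstructed recursively
 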
 
 
(another exercise example) https://www.youtube.com/watch?v=G8Etdbit5oM 
                           https://www.youtube.com/watch?v=oR3yxesPe6g 
 
===> so, what knowledge does an AI agent need? not so much about the amount of knowledge, but the right knowledge that lets the AI agent build the explanation tree to prove the object is something. 
 
===> in the above example, when it starts from the top, it asks what do i need to know to show "object enables drinking" and "object is stable" ? and then it goes searching for that knowledge. 
===> also, as shown in the last video, depending on the background knowledge available to the agent, the AI agent will opportunistically build the right (diff) kind of causal proofs. 
 
 
## 
##  EBL in real world 
## 
- you use a mug instead of a paperweight 
- you use a bowl instead of a coffee cup 
 
===> people encounter novel (new/diff) situations all the time, and need to exhibit EBL type of creativity. (yes, "creativity" is related to EBL) 
 
how? 
--- as above, we dont need new concepts, we just find new connections between existing concepts to build explanations. (this is why EBL is sometimes called "speed up" learning) 
 
 
 
## 
##  Assignment : how would you use EBL to design an agent that can solve RPM? 
## 
- what are you explaining?  answer?  or transformations? 
- what new connections are you learning? 
- is the learning performed within the problem? or across problems where new transformations/problems can be connected? 
- e.g. you encountered two problems, one for rotation, one for reflection. now the new prob contains both. how do you use those earlier problems to explain the answer to the new problem? 
 
 
Learnt about explanation-based learning, specifically concept space/prior knowledge, and abstraction & transfer. 
 
 
## 
##  EBL and human cognition 
## 
- common cognitive tasks seen in everyday life. like using a coffee mug as paperweight. 
- humans cannot explain everything. only consciously accessible stuff. 
- some explanations can lead to mistakes, interfere with reasoning. 
- causal explanations are important in human reasoning(trust) 
- leads to "analogical reasoning" "diagnosis" "correcting mistakes" 
 
 
 
 
########################################### 
#####      Analogical Reasoning       ##### 
########################################### 
 
[Analogical Reasoning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Case based Reasoning 
                  || 
        Explanation based Learning 
                  || 
        Analogical Reasoning 
 
 
- AR : understanding/addressing new problems in terms of familiar problems, by transferring knowledge of relationships (from known problems across domains) 
 
- the idea of "transfer" was discussed in EBL, but we discuss a more generalized "transfer" in AR. 
 
# preview 
- similarity of CBR 
- AR process (retrieval, mapping, transfer) 
- design by analogy 
 
 

#  similarity 

given a sentence "a woman is climbing a ladder." 
how similar are the following sentences? 
 
- a woman climbing a set of stairs. 
- an ant walking up the wall. 
- a woman climbing the corporate ladder. 
- a water bottle sitting on a desk. 
- a plane taking off into the sky. 
 
 
===> how did you rank? based on what criteria? 
===> a few possible dimensions. (relationships, objects, features of objects, values of the features of objects participating in relationships.) 
 

# Cases revisited 

in LRC, we talked about KNN method for retrieval. 
in CBR, we talked about indexing and discrimination tree for storage. 
 
==> both can be used for similarity measurement. 
==> but we assumed both old & new problems are in the same domain. 
==> here in AR, we consider old & new problems in different domains. e.g. "a woman climbing up a ladder." and "an ant walking up a wall." 
 
==> we need cross-domain analogies. how? 
 
 
## 
##  cross-domain analogy 
## 
(video) https://www.youtube.com/watch?v=P80QbQz405A 
 
e.g. 
imagine a patient with a tumor and a physician with a laser gun. we need to remove the tumor to save the patient. 
but the laser is so strong that, before reaching the tumor, it also kills the healthy cells between the skin and the tumor, thus harming the patient. 
what should we do? 
 
imagine a diff example where a rebel army needs to attack a castle reached by four roads, all planted with mines. 
if the entire army proceeds down one road, the mines go off and the whole army is destroyed. but if they divide into four sub groups, and each proceeds down a diff road, then no single group is heavy enough to set off the mines, all sub groups of the army reach the castle at the same time, and successfully take over the castle. 
 
===> now revisit the patient/physician story. 
===> perhaps you can now come up with the answer: the physician should prepare a few laser guns, each with a weaker beam. each laser comes from a diff angle, all intersecting at the tumor location, so only the tumor cells receive a critical dose. cool! it's resolved. 
 
 
===> now reconsider the woman and the ant example. objects are diff (woman-ant, ladder-wall) but the relationship is similar (climbing up) 
===> so, the cross-domain analogy is where objects/features can be diff, but similarity is based on the relationship of obj/features. 
===> it is the relationship that gets transferred from the source case to the target problem. 
 
 
## 
##  spectrum of similarity 
## 
 
                LRC           CBR           AR 
              =======   ================  ======= 
Relations     similar   similar  similar  similar 
Objects       similar   similar  similar  dis-sim 
Features      similar   similar  dis-sim  dis-sim 
Values        similar   dis-sim  dis-sim  dis-sim 
 
more similar <----------------------------------------------->  less similar (diff) 
(identical) 
 
 
====> in a way, LRC & CBR are part of AR. but AR = cross-domain, LRC&CBR = within-domain 
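 
(a minimal python sketch of scoring similarity along the above spectrum -- my own illustration. the weights and the case encodings are made-up assumptions; the only point is that deeper, relational matches get heavier weight.) 
 
-------------------------------------------------------------------- 
# weights are made up; deeper similarity (relations) gets a heavier weight
WEIGHTS = {'relations': 4, 'objects': 3, 'features': 2, 'values': 1}

def overlap(a, b):
    """fraction of the target's items also present in the source (0..1)."""
    return len(set(a) & set(b)) / max(len(set(b)), 1)

def similarity(source, target):
    return sum(WEIGHTS[dim] * overlap(source[dim], target[dim]) for dim in WEIGHTS)

ladder = {'relations': {'climbing-up'}, 'objects': {'woman', 'ladder'},
          'features': {'animate-agent'}, 'values': {'slow'}}
ant    = {'relations': {'climbing-up'}, 'objects': {'ant', 'wall'},
          'features': {'animate-agent'}, 'values': {'slow'}}
bottle = {'relations': {'sitting-on'},  'objects': {'bottle', 'desk'},
          'features': {'inanimate'},    'values': {'still'}}

print(similarity(ant, ladder), similarity(bottle, ladder))
# the ant case scores higher: relations (and features) match even though the objects differ
-------------------------------------------------------------------- 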
 
 
## 
##  process of AR 
## 
 
(1) retrieval 
(2) mapping        ## because it's cross-domain, we need to know the correspondence of what in src maps to what in target. laser beam = rebel army. 
(3) transfer       ## only after mapping can we try to transfer the relationship from src to target for the mapped objects. 
(4) evaluation 
(5) storage 
 
===> note: some argue (2) is part of (1) 
 
===> very similar to CBR process 
(1) retrieval 
(2) adaptation     ## because it's within the same domain, obj/relations are the same, we only needed to adapt. 
(3) evaluation 
(4) storage 
 
 
## 
## analogical retrieval 
## 
- similarity detection aka retrieval was easy with within-domain problems. we used KNN. 
- but in cross-domain problems, how do we do it? what criteria? 
 
(video) https://www.youtube.com/watch?v=Cmsn8oVpSD4#t=44 
 
 
[Superficial Similarity]                   VS     [Deep Similarity] 
- features(size, fill, etc)                       - relationships between objects ("above" "inside" "left-of") = binary relationship 
- obj counts (number of circles, etc)             - relationships between relationships between objects (comparing the transformation of A-B versus C-N) 
- objects (circle, square, triangle)                = higher order/tertiary relationship 
== unary relationship 
 
 
====> so, in recap, as the relationship goes from unary,binary,tertiary, we say the similarity becomes "deeper". 
 
 
## 
##  types of similarity 
## 
 
- semantic   : conceptual similarity between the src case and the target problem. 
- pragmatic  : similarity of external factors such as goals. (killing the tumor and seizing the castle) 
- structural : similarity between "representational(relational)" structures. (example) https://www.youtube.com/watch?v=iTzu3JhLIxg 
 
====> we can assign weights to each of them. 
 
 
(example quiz of analogical retrieval) https://www.youtube.com/watch?v=NrvrnpnDhzk#t=16 
                                       https://www.youtube.com/watch?v=H4tPp84Avsc 
 
===> notice "retrieval" = similarity detection/measurement. 
 
 
## 
##  analogical mapping 
## 
 
look again at the killing-tumor and seizing-castle problem. 
how did we map the laser beam and the rebel army? 
between the src case and the target problem, there are N*M mapping patterns (if you use simple Generate & Test) = inefficient. 
but, in analogical mapping, we do a more efficient mapping based on "relationship" like goals, obstacle, resource, etc. 
(or like "inside" "left-of" "unchange" etc in RPM) 
in fact, we try to use a deeper relationship as much as possible. 
 
(analogical mapping example quiz)  https://www.youtube.com/watch?v=0-RDAQWXHK4 
 
===> quiz shows it's crucially important to have the right representation of obj/relationships for analogical mapping to happen. 
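 
(a minimal python sketch of relationship-based mapping -- my own illustration. the role labels (goal/resource/obstacle) are my own assumptions; the lecture only says mapping is driven by relationships rather than plain N*M generate & test.) 
 
-------------------------------------------------------------------- 
# role labels are illustrative assumptions, not part of the lecture
def map_by_roles(source_roles, target_roles):
    """pair up source and target objects that fill the same relational role."""
    mapping = {}
    for role, src_obj in source_roles.items():
        if role in target_roles:
            mapping[src_obj] = target_roles[role]
    return mapping

castle_case = {'goal': 'castle', 'resource': 'rebel army', 'obstacle': 'mines on roads'}
tumor_prob  = {'goal': 'tumor',  'resource': 'laser beam', 'obstacle': 'healthy cells in the path'}

print(map_by_roles(castle_case, tumor_prob))
# {'castle': 'tumor', 'rebel army': 'laser beam', 'mines on roads': 'healthy cells in the path'}
-------------------------------------------------------------------- 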
 
 
## 
##  analogical transfer 
## 
 
(video) https://www.youtube.com/watch?v=i1Z41epBsDk 
 
again, look at tumor-castle example. 
 
given the tumor problem, 
- analogical retrieval picked the castle case. 
- analogical mapping established the correspondence. (tumor = castle, laser beams = rebel army, healthy cells between skin and tumor = mines) 
- analogical transfer transfers the solution in the castle case: 
--- decompose the army = decompose the laser guns 
--- send each sub army from diff roads = shoot each laser from diff direction 
 
 
=====> now think in terms of RPM. retrieval is not hard. in fact, mostly it's already given. we look at A-B, and C-N 
=====> mapping is hard. it can be "shape" based, or higher order, like "inside/left-of/above" 
=====> transfer is just applying the transformation from A-B to C, to get the candidate D(s) 
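 
(a minimal python sketch of transfer in an RPM-like setting -- my own illustration. the figure encoding (a dict of object -> attribute dict) is a simplifying assumption.) 
 
-------------------------------------------------------------------- 
# simplified figure encoding (assumption): figure = {object: {attribute: value}}
def detect_transform(a, b):
    """record, per object, which attributes changed from A to B."""
    return {obj: {k: b[obj][k] for k in b[obj] if a[obj].get(k) != b[obj][k]}
            for obj in a if obj in b}

def apply_transform(c, transform):
    """apply the A->B attribute changes to C to predict D."""
    d = {obj: dict(attrs) for obj, attrs in c.items()}
    for obj, changes in transform.items():
        if obj in d:
            d[obj].update(changes)
    return d

A = {'z': {'shape': 'square', 'fill': 'no'}}
B = {'z': {'shape': 'square', 'fill': 'yes'}}
C = {'z': {'shape': 'circle', 'fill': 'no'}}
print(apply_transform(C, detect_transform(A, B)))
# {'z': {'shape': 'circle', 'fill': 'yes'}}
-------------------------------------------------------------------- 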
 
 
(analogical transfer quiz)  https://www.youtube.com/watch?v=iTzu3JhLIxg 
 
==> good example of transfer done based on structural similarity. 
 
 
## 
##  evaluation and storage in AR 
## 
 
- pretty much the same as CBR 
- storage can add to our case pool for later reuse 
 
 
## 
##  cross-domain AR in action (real world example) 
## 
 
(video) 
https://www.youtube.com/watch?v=aj1etGiPoGU 
 
e.g.  the shape of the nose of the shinkansen (bullet train) was inspired by the shape of the beak of the kingfisher bird. 
 
the bullet train goes thru two media (tunnel and open air), and the kingfisher dives from air into water. 
both are shaped to handle the sudden change of medium (the shock/pressure wave) well. 
 
===> biologically inspired design (aka biomimetics) 
===> relates to the worlds of design/creativity/innovation 
 
(another design-by-AR example) 
- a robot that can walk on water. 
- for analogical retrieval, the right representation of both src/target case/problem is mandatory. 
--- e.g. a structure-behavior-function model: init/goal states, behavior = a series of states and transitions between states, function = achieved by behaviors. then we retrieve a basilisk lizard (which can walk on water). 
- for analogical mapping/transfer, suppose the target problem is a robot that can already walk on ground. so we just need to align the structural/behavioral/functional models (= map specific features of the structure/behavior/function model from src to target) 
structure -> behavior -> functional (deeper) 
 
# note: retrieval-mapping-transfer-eval-storage is just the simple linear case. sometimes retrieval/mapping/transfer loops back non-linearly, etc. 
# example : "compound analogy"      (video) https://www.youtube.com/watch?v=PWVsWf-HQ3w 
- wanna build a robot that can swim underwater in a stealthy way 
- retrieves a jellyfish, which can swim stealthily at low velocity, but when it swims fast (high velocity), it is no longer stealthy 
--> ok we partially got the solution for our design. the only remaining sub-problem is to swim stealthily at high velocity. 
--> do retrieval again, which yielded a squid. sub problem solved. 
 
===> notice design of the robot came from multiple source cases.  == called "compound analogy" 
(notice the retrieval/mapping/transfer process was non-linear.) 
(a good paper for this is "Vattam, S. S., Helms, M. E., & Goel, A. K. (2008). Compound analogical design: interaction between problem decomposition and analogical transfer in biologically inspired design. In Design Computing and Cognition'08 (pp. 377-396). Springer Netherlands.") 
(http://extras.springer.com/2008/978-1-4020-8727-1/Digital%20pdf/Vattam.pdf) 
 
 
- for eval, just like CBR, it can do simulation/physical-actual test, and may send it back to prev steps(retrieval/mapping/transfer) 
 
 
## 
##  open issues in AR 
## 
 
(1) common vocabulary (often a challenge in cross-domain problems) 
(2) abstraction and transformation (all examples we looked at have problems "fixed" but often agents have to abstract and transform the problem before it goes into retrieval step) 
(3.1) compound analogy : where you retrieve several cases for the problem (e.g. when designing a car, you pull diff cases, one case for engine, one case for chassis, etc) 
(3.2) compositional analogy: where you use layers of abstraction. e.g. comparing two business organizations. we may compare at people layer, then process layer, and another layer, etc.  mapping at one layer supports transfer to the next level.   this is hard for AI agents. 
(4) visuo-spatial analogy: so far when we considered transfer, we looked at causal relations. but in visuo-spatial analogy that causality is implicit. we look at this again in a later lecture. 
(5) conceptual combination: recall how we used solar system concept to enhance the atom concept. 
 
 
## 
##  Assignment: how would you use AR to design an agent that could answer RPM? 
## 
 
- a tough question. because the agent we design only operates on one domain: RPM. so where does it get knowledge necessary to do cross-domain AR? 
- maybe instead of the agent, it is you/human doing the AR for the agent. 
- what other cases/biology inspires you on the design of your RPM agent? 
- can you take knowledge from other areas and put it into the agent so the agent can do AR? 
 
 
similarity, analogical retrieval/mapping/transfer/evaluation/storage, and design by analogy. 
 
## 
##  AR and human cognition 
## 
- a core process of cognition 
- we use metaphors. 
- the RPM test is considered one of the best intelligence tests, and it is entirely based on analogy. 
 
 
 
######################################## 
######       Version Spaces       ######     (related to ICL, will be revisited by Learning by Correcting Mistakes) 
######################################## 
 
[Learning] topic tree: 
 
        Learning by Recording Cases 
                  || 
        Incremental Concept Learning 
                  || 
                  /\ 
                 /  \ 
   Classification    Version Spaces 
 
 
VS - an algo that converges onto a model regardless of the ordering of examples, and even without background knowledge. 
 
the issue of generalization in learning = the issue of ICL 
 
both deal with a small set of examples, one at a time. 
 
ICL uses "background knowledge" which may not be available in generalization in learning. 
in ICL, we controlled the order in which examples arrived, but that is not possible in generalization in learning. 
 
 
# preview 
- definition of VS 
- VS as an abstract technique 
- algo for VS 
- identification trees (aka decision trees = an alternative method of organizing examples), similar to discrimination trees in CBR. 
 

#  ICL revisited 

 
recall in ICL; 
- negative examples led to specialization 
- positive examples led to generalization 
 
===> the order of examples was important. usually a positive example came first, and it was assumed that there is always a teacher who gives you the label of pos/neg for each example, hopefully with a good mix of pos/neg examples for spe/gen to take place. 
(why ordering matters?) 
===> we assumed each new example differs by "one" feature from the current/latest concept so that we can shape up/update the concept. (but this is naive. so in a way it's a weakness of ICL) 
 
===> also recall "background knowledge" of (block includes brick,wedge) which helped agent to generalize brick|wedge into block. 
 
Challenge: 
(1) ordering of examples 
(2) background knowledge availability 
 
====> if ordering is messed up, and background knowledge is unavailable, it leads to; 
- over-generalization : may give incorrect answer 
- over-specialization : answer is correct but not useful 
 
===> VS overcomes this (under certain conditions) to arrive at the right generalization. 
 
 
## 
##  abstract VS 
## 
 
(video) https://www.youtube.com/watch?v=O8t85EEXDW0#t=25 
 
 
- we will maintain 2 models : specific and general 
 
e.g. lets say we try to build a concept of dogs. 
 
the first example (positive) is four-legged, furry, black, named Bud. 
 
the most specific model (matches one thing) : four-legged, furry, black, named Bud 
the most general model (matches everything) : all animals 
 
positive example -> generalize the specific model 
negative example -> specialize the general model 
 
====>  prune the cases that no longer match the current data 
====>  eventually these two models will converge (i.e. we reached the right generalization of the concept !!!) 
 
 
## 
##  visualizing VS   :   diff btw ICL and VS 
## 
 
(good visualization video) https://www.youtube.com/watch?v=ma2D6Xhc0_s 
 
 
ICL                    [current concept]                   ## moves left/right based on neg/pos examples 
VS  [spe-model] --->                     <---  [gen-model] ## gonna converge per examples 
    <---------------------------------------------------> 
(specific)                                           (general) 
 
===> shows how ICL is dependent on the ordering of examples. 
===> shows how VS is somewhat immune to this ordering. 
 
in this context, model == concept 
 
model/concept = the representation of the world, such that there is a one-to-one correspondence between the representation and what is represented. 
(i.e. from an-arch-made-of-block concept, we can build an arch, and build a representation of that particular arch) 
 
NOTE: again, we still need both pos/neg examples. 
 
## 
##  VS example - food allergy 
## 
 
lets say you go to five restaurants under different conditions, and build an allergy model(concept). 
 
Example 1 (positive) 
 
Visit1: 
- restaurant: Sam's 
- meal : breakfast 
- day : Friday 
- cost: cheap 
 
====> lets build the most spe/gen models. 
 
[spe]                          [gen] 
- restaurant: Sam's            - restaurant: any 
- meal : breakfast             - meal: any 
- day : Friday                 - day : any 
- cost: cheap                  - cost: any 
 
 
 
now, Example 2 (negative) 
 
Visit2: 
- restaurant: Kim's 
- meal : lunch 
- day : Friday 
- cost: expensive 
 
====> because it's negative, now we specialize our most [gen] case to exclude this example. 
====> four possible specializations, depending on which attribute you specialize. 
 
[gen]                [1]     [2]        [3]     [4] 
- restaurant: any -> Sam's   any        any     any 
- meal: any       -> any     breakfast  any     any 
- day : any       -> any     any        Friday  any 
- cost: any       -> any     any        any     cheap 
 
===> which one should we prune?   well, specializing to day:Friday ([3]) does not exclude this negative example (Visit2 was also on a Friday). so we shall prune [3]. 
===> we keep [1][2][4] 
 
 
Visit3(pos): 
- restaurant: Sam's 
- meal : lunch 
- day : Sat 
- cost: cheap 
 
====> now, lets generalize the [spe] model, to still include this Visit3 example. 
 
[spe] 
- restaurant: Sam's 
- meal : any              ## changed to "any" 
- day : any               ## changed to "any" 
- cost: cheap 
 
===> technically, we should've generalized to meal: breakfast-or-lunch and day: Fri-or-Sat, but we streamline this to "any" for simplicity's sake. 
===> now look at [1][2][4], do we need to prune any ? 
===> all of [1][2][4] should be able to "include" the Visit3 (pos) example, or else should be pruned. 
===> pruned [2], we still keep [1][4] 
 
 
Visit4(neg): 
- restaurant: Bob's 
- meal : breakfast 
- day : Sun 
- cost: cheap 
 
===> since it's a neg example, we specialize our [1][4] to exclude this example. 
===> well, [1] already excludes Bob's. so no action. 
===> [4] includes this neg example. gotta specialize [4] so that it excludes Visit4 AND does not violate our latest [spe] 
===> which yields a revised [4] 
 
Sam's 
any 
any 
cheap 
 
===> but this is subsumed by [1]. in VS, we prune a branch if it is subsumed by another branch that covers a broader set of examples. 
===> in other words, we prune [4], and keep only [1]. 
 
 
Visit5(neg): 
- restaurant: Sam's 
- meal : breakfast 
- day : Sun 
- cost: expensive 
 
===> it's a neg example, lets specialize [1] to exclude Visit5, in such a way that [gen] still stays consistent with [spe] 
===> now recall the latest [spe] & [gen] 
 
[spe] 
- restaurant: Sam's 
- meal : any 
- day : any 
- cost: cheap 
 
[gen]  # aka [1] 
- restaurant: Sam's 
- meal : any 
- day : any 
- cost: any 
 
====> the only place we can specialize in [gen] is cost:cheap 
====> now we see [gen] excludes Visit5, also converged to [spe].  VS is done ! 
 
so we know the subject person gets allergy when eating cheap food at Sam's. 
 
NOTE: again, appreciate (1) ordering of examples didnt matter (2) we didn't use any background knowledge ==> diff from ICL 
    : we needed only 5 examples. not 1000 examples. but still we require both pos/neg examples. 
    : unlike ICL, examples didnt need to be different from prev example conveniently only by one feature. so VS is much more flexible. 
    : if there are too many examples or too many attributes, the computation cost grows huge. 
    : it's possible to not converge, but it is still useful to know that. 
 
 
## 
##  VS algo 
## 
- already discussed above, lets summarize. 
 
-------------------------------------------------------------------- 
VS_algo(example) 
  if example = pos: generalize all specific models to include it 
                  : prune away general models that cannot include it 
  if example = neg: specialize all general models to exclude it 
                  : prune away specific models that cannot exclude it 
  prune away any models subsumed by other models 
 
-------------------------------------------------------------------- 
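 
(a minimal python sketch of the above algo run on the restaurant/allergy example -- my own illustration. simplified, as an assumption, to one specific model plus a list of general models, with 'any' as the wildcard.) 
 
-------------------------------------------------------------------- 
# simplification (assumption): a single specific model and a list of general models
ATTRS = ['restaurant', 'meal', 'day', 'cost']

def matches(model, example):
    return all(model[a] in ('any', example[a]) for a in ATTRS)

def subsumes(g1, g2):
    """g1 subsumes g2 if everything g2 matches is also matched by g1."""
    return all(g1[a] == 'any' or g1[a] == g2[a] for a in ATTRS)

def process(example, positive, specific, generals):
    if positive:
        # generalize the specific model just enough to include the example
        for a in ATTRS:
            if specific[a] != example[a]:
                specific[a] = 'any'
        # prune general models that cannot include it
        generals = [g for g in generals if matches(g, example)]
    else:
        # specialize each general model (one attribute at a time) to exclude the
        # example, keeping only specializations consistent with the specific model
        new_generals = []
        for g in generals:
            if not matches(g, example):
                new_generals.append(g)          # already excludes it
                continue
            for a in ATTRS:
                if g[a] == 'any' and specific[a] not in ('any', example[a]):
                    s = dict(g); s[a] = specific[a]
                    new_generals.append(s)
        generals = new_generals
    # drop duplicates and any model subsumed by another (broader) model
    unique = [g for i, g in enumerate(generals) if g not in generals[:i]]
    generals = [g for g in unique if not any(o != g and subsumes(o, g) for o in unique)]
    return specific, generals

visits = [
    ({'restaurant': "Sam's", 'meal': 'breakfast', 'day': 'Fri', 'cost': 'cheap'}, True),
    ({'restaurant': "Kim's", 'meal': 'lunch',     'day': 'Fri', 'cost': 'expensive'}, False),
    ({'restaurant': "Sam's", 'meal': 'lunch',     'day': 'Sat', 'cost': 'cheap'}, True),
    ({'restaurant': "Bob's", 'meal': 'breakfast', 'day': 'Sun', 'cost': 'cheap'}, False),
    ({'restaurant': "Sam's", 'meal': 'breakfast', 'day': 'Sun', 'cost': 'expensive'}, False),
]
specific = dict(visits[0][0])                    # most specific model = first positive example
generals = [{a: 'any' for a in ATTRS}]           # most general model
for example, positive in visits[1:]:
    specific, generals = process(example, positive, specific, generals)
print(specific)    # {'restaurant': "Sam's", 'meal': 'any', 'day': 'any', 'cost': 'cheap'}
print(generals)    # same model -> converged
-------------------------------------------------------------------- 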
 
 
## 
##  more VS example 
## 
- just like the above allergy food model. 
(video) 
https://www.youtube.com/watch?v=Ewu96i_2NFE     ##  green = pos 
https://www.youtube.com/watch?v=3F3Q_jjA7Q4     ##  green = pos 
https://www.youtube.com/watch?v=zZuiVHHoXJQ 
https://www.youtube.com/watch?v=COr2vN5ylQk     ##  red = neg 
https://www.youtube.com/watch?v=3GRdF6mZAHI     ##  2 is ok.  Breakfast violates [spe] 
https://www.youtube.com/watch?v=wuGN84BF98E 
 
 
## 
##   Identification Trees (aka  Decision Tree learning) 
## 
- recall the discrimination tree in CBR. that tree grew incrementally. 
---> its structure depends on the order of examples. in the worst case, we just get a list. the structure is not guaranteed to be optimal at all. 
---> at the cost/tradeoff of having to know ALL examples at the beginning, we can build an optimal tree called a decision tree (identification tree). 
 
Visit count   Restaurant    Meal         Day   Cost      Allergy? 
-------------------------------------------------------------- 
Visit1        Sam's         breakfast    Fri   cheap     yes 
Visit2        Kim's         lunch        Fri   expensive no 
Visit3        Sam's         lunch        Sat   cheap     yes 
Visit4        Bob's         breakfast    Sun   cheap     no 
Visit5        Sam's         breakfast    Sun   expensive no 
 
 
      Restaurant 
     /    |     \ 
  Kim    Bob    Sam 
   /      |      \ 
 Visit2 Visit4   cost 
  neg    neg     /  \ 
             cheap  exp 
              /      \ 
         Visit1,3   Visit5 
           pos       neg 
 
=====> so by carefully picking attributes, we managed to build an optimal decision tree. 
=====> for Visit6, you can easily pick the most similar past visit from this tree. 
(as you can imagine, this soon becomes harder once we have many examples/attributes) 
 
==> how do we pick the attribute?  well pick ones that discriminate neg/pos most efficiently. 
 
(example)  https://www.youtube.com/watch?v=K1a0sRsr8HM 
 
so, in recap, it's a tradeoff of tree optimality and knowing all examples at the beginning rather than incrementally. 
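 
(a minimal python sketch of one way to score attributes -- my own illustration. the scoring rule is a simple proxy of my own (count of examples left in mixed pos/neg branches, fewer is better); the lecture only says to pick the attribute that discriminates most efficiently.) 
 
-------------------------------------------------------------------- 
# scoring rule is a simple proxy of my own, not the lecture's measure
from collections import defaultdict

VISITS = [  # (restaurant, meal, day, cost, allergy?)
    ("Sam's", 'breakfast', 'Fri', 'cheap',     True),
    ("Kim's", 'lunch',     'Fri', 'expensive', False),
    ("Sam's", 'lunch',     'Sat', 'cheap',     True),
    ("Bob's", 'breakfast', 'Sun', 'cheap',     False),
    ("Sam's", 'breakfast', 'Sun', 'expensive', False),
]
ATTRS = ['restaurant', 'meal', 'day', 'cost']

def mixed_count(attr_index):
    branches = defaultdict(list)
    for row in VISITS:
        branches[row[attr_index]].append(row[-1])   # group labels by attribute value
    return sum(len(lbls) for lbls in branches.values() if len(set(lbls)) > 1)

print({a: mixed_count(i) for i, a in enumerate(ATTRS)})
# {'restaurant': 3, 'meal': 5, 'day': 2, 'cost': 3}
# by this proxy 'day' looks best; the tree above splits on 'restaurant' first,
# which also separates the data once 'cost' is used below it.
-------------------------------------------------------------------- 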
 
 
 
## 
##  Assignment: how would you use VS to design an agent that can solve RPM? 
## 
- what concept are you learning?  transformations? or types of problems? 
- what are the increments/examples?  individual problems/figures/transformations? 
- what are you converging onto?  converging into an answer within a problem?  or converging into a particular algo for a problem? 
 
 
## 
##  VS and human cognition 
## 
- human specialize/generalize concepts. (convergence to the right level of abstraction) 
- aka cognitive flexibility 
- relates to Learning by Correcting Mistakes. (for later lecture) 
 
 
 
 
############################################ 
#####      Constraints Propagation     ##### 
############################################ 
 
Constraint Propagation 
         || 
Visuospatial Reasoning 
 
- a mechanism of inference. agents assign values to variables to satisfy certain conditions called "constraints" 
- well relates to other topics "planning" "understanding" "NLP" 
 
# preview 
- definition 
- image/visual processing 
- natural language processing(NLP) 
- advanced problems 
 
 
## 
##  3D objects drawn in a 2D plane 
## 
 
(video) https://www.youtube.com/watch?v=kxBZ9nWLn2s 
 
a cube : a trihedral object = three planes meeting at one point(aka junction/intersection) 
polyhedral objects : multiple planes meeting at a point, like a pyramid, a soccer ball 
 
===> relates to constraints processing 
 
## 
##   grammatically correct and semantically non-sensical sentences 
## 
 
as long as a sentence follows one of the patterns below, it is grammatically correct (even if semantically non-sensical); 
(1) SV 
(2) SVC 
(3) SVO 
(4) SVOC 
(5) SVOO 
 
e.g. 
colorless green ideas sleep furiously. 
wall decor notifies business cards of nonsensical whims. 
 
====> this is constraint processing. the constraints are the above mentioned grammatical structures (lexical categories: subjects, verbs, objects, etc) of the english language. 
 
## 
##   constraints propagation definition 
## 
 
definition: a method of inference that assigns values to variables characterizing a problem in such a way that some conditions (called "constraints") are satisfied. 
 
e.g. 
   ---------- 
  /         /| 
 /         / |       # do we see this as a 2D image of some lines? 
/         /  |       # or do we see this as a 3D cube where three planes join at a point(junctions) 
----------   |       # even for 3D choice, is this a box or a building?  (note CP does not always disambiguate completely) 
|        |   |       # multiple interpretations can simultaneously satisfy all the constraints. (box/building, or puns in a sentence) 
|        |   /       # it is also possible that any assignment-pattern of values-variables does not satisfy the constraints ( tricky case ) 
|        |  / 
|        | / 
----------/ 
 
e.g. 
colorless green ideas sleep furiously. 
 
===> variables: subject,verb,adverb,predicates. 
===> values: words in the sentence 
===> constraints: grammar rules in the english language 
 
thus we must assign values to variables that satisfy the constraints (that is what the definition says) to be able to assess "this sentence is grammatically correct." 
 
now, recall the "Understanding" lecture, where we discussed diff prepositions (from, to, with, etc) that give "constraints" to the words that follow them. e.g. the word that comes after "from" must be some kind of "source" 
===> there, grammatical constraints were used in service of semantic analysis of the sentence, but here in CP, we just use those constraints to figure out if a sentence is grammatically correct or not. 
 
 
## 
##  decompose a task of 3D shape recognition into sub-tasks. 
## 
 
(1) detect edges & lines  (pixel to lines) 
(2) grouping lines into surfaces with orientations. (called "line labeling") 
(3) grouping surfaces into an object 
 
then your visual system recognizes a cube.  (a related/deeper discussion is in the "computer vision" course) 
 
NOTE: before we get into the algorithm choice for addressing the whole task, we should understand how the task can be decomposed into sub-tasks. 
e.g. in the "understanding" lecture, we saw how surface level cues acted as probes into memory, then a frame was retrieved. then frames generated expectations. lexical/grammatical analysis led to the identification of objects and predicates that satisfy the expectations. then the fillers were put in. 
 
thus, this notion of problem-reduction is important before we discuss what algo we use for the task(s). 
 
 
## 
##   constraints example 
## 
 
e.g. 
   L--(blade)-W 
  /          /| 
 b          f | 
/          /  b 
W-(fold)--Y   |     # fold is a line that connects two surfaces. 
|         |   L     # blade is a line that we cannot infer if it connects two surfaces. 
b         f   / 
|         |  b 
|         | / 
L-(blade)-W/ 
 
===> lets enumerate what edges(junctions/intersections) are present. 
 
(1) L-edge : 2 lines intersect. blade-blade 
(2) Y-edge : 3 lines intersect. fold-fold-fold 
(3) W-edge : 3 lines intersect. blade-blade-fold 
(4) T-edge : 2 lines intersect. blade-fold 
 
===> in effect, we defined spatial grammar rules for trihedral objects. 
===> look at the above cube, now a KBAI agent can recognize a group of surfaces, then a cube. 
===> notice that it does not matter from which edge we start the CP. 
 
note: each edge type may have multiple patterns of constraints, but here we simplified and assumed each edge type has one constraint pattern. 
      or we may add more detail to each edge constraint, e.g. about angles. 
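 
(a minimal python sketch of propagating these edge constraints -- my own illustration. the two-junction "figure" and the ordering of lines within a junction are toy assumptions; it just shows how pruning one line's labels propagates to the others.) 
 
-------------------------------------------------------------------- 
# toy version of the simplified single-pattern junction ontology above
from itertools import product

JUNCTION_CONSTRAINTS = {      # allowed label tuples per junction type
    'L': {('blade', 'blade')},
    'Y': {('fold', 'fold', 'fold')},
    'W': {('blade', 'blade', 'fold')},
    'T': {('blade', 'fold')},
}

def propagate(lines, junctions):
    """lines: {line: set of candidate labels}; junctions: [(type, (line, line, ...))]."""
    changed = True
    while changed:
        changed = False
        for jtype, jlines in junctions:
            allowed = JUNCTION_CONSTRAINTS[jtype]
            # keep only label combinations consistent with the current candidates
            combos = [c for c in product(*[lines[l] for l in jlines]) if c in allowed]
            for i, l in enumerate(jlines):
                survivors = {c[i] for c in combos}
                if survivors < lines[l]:        # candidates got pruned -> propagate again
                    lines[l] = survivors
                    changed = True
    return lines

lines = {l: {'blade', 'fold'} for l in ['a', 'b', 'c']}
junctions = [('W', ('a', 'b', 'c')),   # forces a,b to blades and c to a fold
             ('T', ('a', 'c'))]        # consistent: blade-fold
print(propagate(lines, junctions))
# {'a': {'blade'}, 'b': {'blade'}, 'c': {'fold'}}
-------------------------------------------------------------------- 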
 
 
## 
## more complex visual image example 
## 
 
e.g.    is this a 3D object? 
    /| 
   / | 
  /  | 
 /|  | 
/ |  |------ 
| | /      / 
| |/      / 
| /------/ 
|/      / 
-------/ 
 
 
===>  no (if we only use the ontology of edges from prev problem) 
===>  yes, it is a folded piece of paper, like a book. the line in the middle is just a line. 
      but to be able to arrive at this recognition, we need more elaborate edge constraints. 
 
(1) Y-edge  : fold-fold-fold 
            : blade-blade-blade 
(2) W-edge  : fold-blade-blade 
            : blade-fold-fold 
(3) L-edge  : blade-blade 
            : blade-fold 
            : fold-fold 
            : fold-blade 
(4) T-edge  : blade-blade-fold 
 
===> we may need more constraint patterns to handle more complex visual images. 
===> but now that adds more ambiguity: given an L-edge, how do we determine which one of the 4 possible patterns it is? 
===> to resolve some of the ambiguity, we can come up with additional "convention" that says e.g. "any lines adjacent to the background must be blade." which will immediately label some lines to be only blade, not fold. then we move on to diagnosing other edges, the rest of them also get determined. hence CP. 
===> "abduction" (we will revisit in "diagnosis" lecture) 
 
(wikipedia) 
Abductive reasoning (also called abduction, abductive inference or retroduction) is a form of logical inference that goes from an observation to a hypothesis that accounts for the observation, ideally seeking to find the simplest and most likely explanation. In abductive reasoning, unlike in deductive reasoning, the premises do not guarantee the conclusion. One can understand abductive reasoning as "inference to the best explanation" 
 
 
## 
##  CP in NLP 
## 
 
e.g. 
colorless green ideas sleep furiously. 
 
- constraints: 
sentence = Noun phrase + Verb phrase 
Noun phrase = [adjectives] + Noun or Pronoun    #  [] means optional 
Verb phrase = Verb + [Adverb] 
 
note: here we only covered (1)SV type sentence. 
 
Noun phrase = colorless green ideas 
Adjectives  = colorless green 
Noun        = ideas 
 
Verb phrase = sleep furiously 
Verb        = sleep 
Adverb      = furiously 
 
====>  once we have done grammatical analysis, then we can apply semantic analysis also. 
====>  just like in the visual example (line labeling, then grouping lines into surfaces/orientations), the agent can then do object recognition. 
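 
(a minimal python sketch of checking the SV-type constraints on the sentence -- my own illustration; the tiny lexicon is an assumption.) 
 
-------------------------------------------------------------------- 
# tiny made-up lexicon, just enough for the example sentence
LEXICON = {
    'colorless': 'ADJ', 'green': 'ADJ', 'ideas': 'NOUN',
    'sleep': 'VERB', 'furiously': 'ADV',
}

def is_sv_sentence(words):
    """check the pattern: [ADJ]* NOUN VERB [ADV]?  (Noun phrase + Verb phrase, SV type)."""
    tags = [LEXICON.get(w) for w in words]
    if None in tags:
        return False                        # unknown word -> constraint cannot be checked
    i = 0
    while i < len(tags) and tags[i] == 'ADJ':
        i += 1                              # optional adjectives
    if i >= len(tags) or tags[i] != 'NOUN':
        return False                        # the noun phrase needs a noun
    i += 1
    if i >= len(tags) or tags[i] != 'VERB':
        return False                        # the verb phrase needs a verb
    i += 1
    if i < len(tags) and tags[i] == 'ADV':
        i += 1                              # optional adverb
    return i == len(tags)

print(is_sv_sentence('colorless green ideas sleep furiously'.split()))   # True
print(is_sv_sentence('ideas colorless sleep'.split()))                   # False
-------------------------------------------------------------------- 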
 
 
## 
##  advanced topics 
## 
 
- we only looked at the simple examples of "visual(image)" and "verbal(NLP)" but there are more complex examples. 
- also, there are "auditory" and "tactile" examples. e.g. reading braille(tenji) is an instance of CP in "tactile" sense. 
- will explore more in "visuo-spatial reasoning" lecture 
- "configuration" is an instance of CP in the context of "design" 
 
 
## 
##  assignment: how would you use CP to design an agent that could solve RPM? 
## 
 
- strongly relates to Project 4, where you have to do visual processing of RPM problems (instead of propositional representations). 
- what are constraints for the project? 
- how do you propagate them into the image to understand it? 
- after propagation, how do you use the inferences? (will you abstract out propositional representation? or still use visual reasoning to transfer the results directly?) 
 
 
## 
##  CP and cognitive connection 
## 
- relates to human cognition, it leverages our knowledge of the world. 
- a general purpose method, like MEA 
- constraints can be any kind: symbolic, numeric, etc. 
(numerical constraints example): Excel spreadsheet: one cell references other cells for calculation. you change one, then the constraint propagates to the rest of the connected cells. 
- CP concept was already seen in "planning" "understanding" "scripts" then we will look deeper in "configuration" 
 
 
 
 
################################# 
#####    Configuration      ##### 
################################# 
 
- a routine design task to arrange values/variables according to some constraints. 
- [assumption] : all variables/components/requirements for the task are already known. 
 
- recall the "Design & Creativity" tree 
 
          [Configuration] 
                || 
            [Diagnosis] 
                || 
             [Design] 
                || 
           [Creativity] 
 
## preview 
- design & configuration 
- plan refinement method 
- relation to past topics (classification, CBR, planning) 
 
 

# design (wide-range, open-ended) 

- input : needs/goals/functions 
- output: specifications of the structure of some artifact(entity/object) that satisfies those needs/goals/functions 
 
note: an artifact can be intangible, i.e. a process/policy/program 
 
e.g. design a robot that can walk on water. 
     design an economic policy to contain inflation. 
 
in problem solving, the problem is fixed and the solution evolves. but in design, both the problem & the solution evolve. 
 
design is relevant to AI in that we want to design an AI agent that can solve problems, or even design other AI agents. 
 
 
## 
##  configuration definition 
## 
- a common form of design. 
- a problem solving activity that assigns values to variables to satisfy constraints. 
(= a routine design task to arrange values/variables according to some constraints.) 
- [assumption] : all variables/components/requirements for the task are already known. 
 
e.g. 
 
you have a room of certain width/length/height. and you want to decide a layout of furniture. 
- a TV of size X 
- a couch of size Y 
- TV must be in front of the couch 
- so on 
 
you configure parameters of your digital camera based on the target, distance, darkness, focus resolution, etc. 
 
 

# design/configuration example 

(video) https://www.youtube.com/watch?v=iZpcbHtOnLo 
 
=> arrangement of all the components that "fit" the requirements. (spatial layout) 
=> heuristics can be used to order/restrict variables. 
 
 
## 
## configuration process 
## 
 
(video) https://www.youtube.com/watch?v=G8M39axPT3Q 
 
                                                  abstraction hierarchy 
                                                         | 
specification space <----> configuration space (= solution refinement <---> arrangement model) 
                                                         | 
                                                  component hierarchy 
 
## 
##  configuration example 
## 
 
design/configure a chair. 
 
- first, we need representation (then reason) 
- lets use frames to represent components( which we assume we already have for configuration processing) 
 
e.g. 
 
Chair 
- mass: 
- cost: 
- legs: ------ Chair Legs 
                  - count: 
                  - size: 
                  - material: 
                  - cost: 
- arms: ------ Chair Arms 
                  - size: 
                  - material: 
                  - cost: 
- back: ------ Chair Back 
                  - size: 
                  - material: 
                  - cost: 
 
 
====> there may be constraints on the material, or a global variable (cost of Chair) will set the upper bound on the sum of the costs of the child frames. 
 

#  range of values 

 
each variable in each frame may have constraints such as its value range. 
 
e.g. 
 
Chair Legs 
 - count:  {3,4,5} 
 - size:   {5-50g} 
 - material: {plastic, wood, metal} 
 - cost: 
 
Material table: 
 material | cost per gram 
======================== 
 plastic  | $0.01 
 wood     | $0.05 
 metal    | $0.10 
 
====> once we pick the count/size/material values, then the cost value will be calculated. 
====> like so, once all child frames are filled, then the "mass/cost" of the parent frame can be calculated. 
 
lets say we got an order [aka "specification"]: a chair that has 4 legs, weighs over 200g, costs at most $15. 
we can begin filling some variables. 
 
Chair 
- mass: > 200g 
- cost: <= $15 
- legs: ------ Chair Legs 
                  - count: 4 
                  - size: 
                  - material: 
                  - cost: <=$5   # suppose the agent first decides a cost budget for each component before deciding the other variables 
- arms: ------ Chair Arms 
                  - size: 
                  - material: 
                  - cost: <=$5 
- back: ------ Chair Back 
                  - size: 
                  - material: 
                  - cost: <=$5 
 
=====> expand further 
 
Chair 
- mass: > 200g 
- cost: $15 
- legs: ------ Chair Legs 
                  - count: 4 
                  - size: 25g 
                  - material: wood 
                  - cost: $5 
- arms: ------ Chair Arms 
                  - size: 50g 
                  - material: metal 
                  - cost: $5 
- back: ------ Chair Back 
                  - size: 50g 
                  - material: metal 
                  - cost: $5 
 
 
===> really up to the agent to have its priority of which variables to decide first. 
===> now suppose the end result configuration we got violates the specification. two choices. 
===> (1) try iterating other variable/value choices. 
     (2) change the specification itself. 
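 
(a minimal python sketch of filling the chair variables under the above specification -- my own illustration using plain generate & test over the value ranges; a real configuration agent would use heuristics to order the choices, as noted above.) 
 
-------------------------------------------------------------------- 
# value ranges/weights follow the notes; the plain generate & test search is
# just one possible method (an assumption, not the lecture's prescribed method)
from itertools import product

COST_PER_GRAM = {'plastic': 0.01, 'wood': 0.05, 'metal': 0.10}
LEG_COUNTS = [3, 4, 5]
SIZES = [25, 50]                     # grams per component (small subset of the 5-50g range)
MATERIALS = list(COST_PER_GRAM)

def configure(min_mass=200, max_cost=15.0, legs_required=4):
    for count, lsize, lmat, asize, amat, bsize, bmat in product(
            LEG_COUNTS, SIZES, MATERIALS, SIZES, MATERIALS, SIZES, MATERIALS):
        if count != legs_required:
            continue
        mass = count * lsize + asize + bsize
        cost = (count * lsize * COST_PER_GRAM[lmat]
                + asize * COST_PER_GRAM[amat]
                + bsize * COST_PER_GRAM[bmat])
        if mass > min_mass and cost <= max_cost:
            return {'legs': {'count': count, 'size': lsize, 'material': lmat},
                    'arms': {'size': asize, 'material': amat},
                    'back': {'size': bsize, 'material': bmat},
                    'mass': mass, 'cost': round(cost, 2)}
    return None                      # no configuration satisfies the specification

print(configure())
-------------------------------------------------------------------- 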
 
more example: (video) https://www.youtube.com/watch?v=YsY0IvDHuIw 
- knowledge representation (e.g. frames) that decomposes the problem in its expression so the whole problem can be addressed effectively. 
- heuristics 
- constraints 
 
 
## 
##  configuration  VS  classification 
## 
 
(1) hierarchical (either top-down or bottom-up) 
classification: establish-refine cycle of top-down hierarchy 
configuration : assign values to variables, refine/expand, then repeat the cycle 
 
(2) configuration leverages classification's notion of prototype concepts. 
 
e.g.  a prototypical concept of a chair and its sub components 
 
Chair 
- mass: 
- cost: 
- legs: ------ Chair Legs 
                  - count: 
                  - size: 
                  - material: 
                  - cost: 
- arms: ------ Chair Arms 
                  - size: 
                  - material: 
                  - cost: 
- back: ------ Chair Back 
                  - size: 
                  - material: 
                  - cost: 
 
 
note the diff: 
- classification : given certain details(percepts) of the world, and decide(classify) what they are. 
- configuration  : given something to create, we decide on individual variable assignment. 
 
 
## 
##  configuration  VS  CBR 
## 
 
- both applied to routine design problems of the kind that we've often encountered in the past. 
 
configuration: given a prototypical concept, then arrange/assign values to variables to satisfy the constraints of a particular case. 
CBR          : given the design of a specific case(like a chair), look at its variables/values, then tweak them to satisfy the constraints of the current problem. 
 
[assumptions] 
- CBR: seen many cases so it can find a similar case and tweak variables/values. 
- cfg: (supposedly we've configured enough cases) that we know what the plan abstraction hierarchy (chair frames) and variables are. 
 
 
===> so cfg=task, CBR=method.  AI agents need to pick the right method for a given task. CBR is just one of the methods (others include Planning, etc) 
 
 
## 
##  configuration  VS  planning 
## 
 
planning will give you the skeleton of the specifications of the components (variables) but without values. configuration can then work on the value assignment (thru the cfg process we discussed above, i.e. abstraction hierarchy, refining/expanding = reasoning). 
 
also, think of "plan" as "script" learnt thru a method like ICL. 
 
before deciding a learning method, decide what it is that needs to be learnt. what kind of knowledge? 
 
 
recall    (Cognitive architecture) 
 
            metacognition 
              |||||| 
input ==>   deliberation   ==>  output 
              |||||| 
             reaction 
 
deliberation = reasoning(refine/expand) + learning(ICL) + memory(prototype concepts) 
 
 
## 
##  Assignment: how would you use Configuration to design an agent that can solve RPM? 
## 
 
- configuration as a type of constraint propagation. then how can you leverage the idea of variables/values to design an agent? 
- what are variables/values? 
- planning can be used in RPM solving. then if cfg leverages old plans, then how does your agent remember those old plans and reconfigure them for new problems?  pre-planted knowledge or LRC/CBR/ICL approach? 
 
 
 
## 
##  cfg and human cognition 
## 
- design is a common cognitive activity 
- cfg is a routine design activity (cooking, we need to arrange variables for constraints) 
- separate "task" and "method" 
- cfg = task,  method = G&T, CBR, Planning 
 
 
 
############################ 
#####    Diagnosis     ##### 
############################ 
 
- identification of the fault(s) responsible for a malfunctioning system (e.g. a car, the economy, an organism, a computer program). 
 
- recall the "Design & Creativity" tree 
 
          [Configuration] (closely related to Classification) 
                || 
            [Diagnosis] 
                || 
             [Design] 
                || 
           [Creativity] 
 
 
## preview 
- diagnosis definition 
- data and hypothesis spaces 
- mapping data to hypothesis (= diagnosis) 
- two views of diagnosis (classification and abduction) 
 
 
## 
##  medical diagnosis example 
## 
 
- given a set of symptoms(input data), and a set of diseases with known symptoms, doctors need to diagnose(classify). 
- the choice must account for all the input ([1] principle of coverage) 
- multiple answers exist. a single disease (hypothesis) might explain all of the data, while a more complex combination of diseases (hypotheses) might be the reality. (like the Dr. House TV show.) 
-- we tend to prefer a single hypothesis over a multi-hypothesis combination = [2] principle of parsimony : we want a simpler hypothesis 
- [3] hypotheses may interact with each other. (when two diseases are combined, their symptoms get mixed. e.g. one raising and one lowering body temperature can cancel out to a normal temperature, while two painful diseases can compound into severe pain. as you can see, you may get a whole new set of symptoms, or a set of symptoms that fits another single hypothesis or a different combination of hypotheses.) 
- [4] we want to select hypotheses that can "explain" the input data. (but what do we do if there is no good hypothesis, or if multiple hypotheses explain the data equally well?) 
 
 
## 
##   diagnosis  definition 
## 
 
- diagnosis : identification of the fault(s) responsible for a malfunctioning system 
              (e.g. determine what is wrong with this device?) 
 
- assume there is a discrepancy between "expected behavior" and "observed behavior" 
- and the cause = fault responsible for the malfunction. 
 
e.g.  insert a key and ignite the engine, but the car doesnt start 
(1) CBR: last time it happened, it was the battery, and so on 
(2) rule-based reasoning(PS) : if the car doesnt start, check battery. if battery is fine then check the oil, and so on. 
(3) model-based reasoning: reason over a model of how the car's components work (structure/behavior) to trace the fault. 
 
===> as above, a few reasoning methods are applicable. (model-based reasoning, we will revisit later) 
 
 
## 
##  Data space  &  Hypothesis Space 
## 
 
Data space : a set of data (input) ranging from specific to abstract. e.g. Ashok has a 39 degree temperature. e.g. Ashok is running a fever. 
 
Hypothesis space: a set of hypotheses each explaining a part (or a whole) of the data space. e.g. a diabetes. e.g. patient has a flu. 
 
===> obviously mapping can be a complex reasoning task. with the size of DS being N and the size of HS being M, there are N * M possible data-to-hypothesis mappings, 
===> and 2^M possible combinations of hypotheses. 
 
===> better not deal with all the raw data in their raw form, but abstract them. e.g. Ashok has a 39degree fever -> Ashok has a high fever. 
===> then refine hypotheses. 
===> abstract DS, refine HS 
 
recall "classification" we had (1) bottom-up and (2) top-down 
- bottom-up: we had raw data, and abstracted them to a higher node 
- top-down: we had a high level class which we established/refined into concrete branches. 
 
===> notice the similarity. 
 
(4) heuristic classification: bottom-up classification in DS, mapping into HS, then top-down classification of HS 
 
 
## 
##  problems with diagnosis as classification 
## 
(1) multi hypo match a single data choice 
(2) one hypo matches multiple data choices 
(3) multi hypo match multi data choices 
 
===> mapping is bi-directional. say d4 matches an expectation of h6, and h6 also expects other data points; then the agent may go collect that info. 
 
(4) mutually exclusive hypotheses. (e.g. d1,d2,d3 mapped to h1, d4,d5,d6 to h2. but h1/2 are mutually exclusive.) 
(5) cancellation of interacting data points. (e.g. d3 and d5 cancel each other out) 
 
====> as above, classification-only diagnosis can be complex and limited in its strength. 
====> use "abduction" based diagnosis 
 
 
## 
##   deduction - induction - abduction 
## 
- fundamental forms of inference. 
 
cause ----(rule)----> effect 
 
e.g. 
 
cause: flu 
effect: high fever 
rule: if flu, high fever 
 
cause: joe enters 
effect: bob leaves 
rule: bob hates joe 
 
 
Deduction: given the rule & the cause, deduce the effect.  (truth preserving. if rule:true && cause:true then definitively effect:true) 
Induction: given a cause and an effect, induce a rule. 
Abduction: given a rule and an effect, abduce a cause. 
 
====> diagnosis as abduction 
====> induction & abduction are NOT truth preserving. i.e. we cannot guarantee the correctness of our conclusion. 
      (e.g. rule: flu causes fever, effect: Ashok has a fever. this does not guarantee cause: Ashok has the flu, because it could be some other reason.) 
      (e.g. bob leaves whenever joe arrives; it could just be that they work opposite shifts.) 
 
 
===> notice how science employs all 3 forms of reasoning. 
 
 
## 
##  diagnosis as abduction to choose hypotheses 
## 
 
criteria: 
(1) explanatory coverage: hypotheses must cover as much of the data as possible. 
(2) principle of parsimony: the smallest number of hypotheses ought to be used. 
(3) some hypotheses are more likely than others. (just a heuristic, or a confidence estimate derived from some other info) 
 
===> balancing out (1)(2)(3) 
===> notice these criteria are not just for diagnosis, but for choosing explanations (hypotheses) in AI in general. 
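 
(a minimal python sketch of balancing the three criteria -- my own illustration: a greedy set-cover that prefers hypotheses covering more data, uses as few hypotheses as possible, and breaks ties by likelihood. the data and hypotheses are made up.) 
 
-------------------------------------------------------------------- 
# made-up hypotheses/data; the greedy strategy is just one possible method
def choose_hypotheses(data, hypotheses, likelihood):
    """data: set of observations; hypotheses: {name: set of observations it explains}."""
    chosen, uncovered = [], set(data)
    while uncovered:
        # pick the hypothesis covering the most uncovered data; ties -> higher likelihood
        best = max(hypotheses,
                   key=lambda h: (len(hypotheses[h] & uncovered), likelihood[h]))
        if not hypotheses[best] & uncovered:
            break                                   # nothing explains the rest
        chosen.append(best)
        uncovered -= hypotheses[best]
    return chosen, uncovered

data = {'fever', 'cough', 'rash'}
hypotheses = {'flu': {'fever', 'cough'}, 'measles': {'fever', 'rash'}, 'cold': {'cough'}}
likelihood = {'flu': 0.6, 'measles': 0.1, 'cold': 0.8}
print(choose_hypotheses(data, hypotheses, likelihood))
# (['flu', 'measles'], set())  -> two hypotheses cover all the data
-------------------------------------------------------------------- 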
 
 
example diagnosis problem. 
(video) https://www.youtube.com/watch?v=asWcZv98Ahc 
 
====> notice how an AI agent alternates between diff reasoning methods (CBR, PS, model-based, etc) 
====> how does it decide which method to use?  (will be discussed in "meta-reasoning" lecture) 
 
(input data=percepts)   (equidistance classes)     (output=action) 
      DS ------(mapping)-------> HS -------------------> TS(treatment space) 
(abstraction)                 (refine) 
 
====> see the similarity to "classification" 
 
 
## 
##   Assignment: how would you use Diagnosis to design an agent that could solve RPM? 
## 
the agent can investigate when it made an incorrect answer. 
(1) what data will the agent use to investigate its incorrect answer? 
(2) what hypotheses might the agent have for incorrect answers? 
(3) how will it select a hypo that best explains that data? 
(4) once the hypo is picked, how will the agent use that to repair its reasoning, to avoid making the same incorrect answers? 
 
 
 
## 
##  Diagnosis and human cognition 
## 
- a common cognitive task. whenever there is a discrepancy between the expected and observed behaviors of a system (device, code, etc), diagnosis begins. 
- several reasoning methods to address the task. 
 
 
 
 
##################################################### 
######     Learning by Correcting Mistakes     ###### 
##################################################### 
 
- how does an agent learn from a mistake? how does it detect a mistake? how does it then change its reasoning method? 
 
 
       [meta-cognition] (meta-reasoning) tree 
 
       Learning by Correcting Mistakes 
                   || 
             Meta-Reasoning 
                   || 
              Ethics in AI 
 
 
## preview 
- explanation-based learning revisited 
- isolating mistakes (similar to diagnosis) 
- explaining mistakes 
- correcting mistakes 
 
## 
##  LCM example:  identifying a cup 
## 
 
a cup definition: an object that is stable and enables drinking 
 
an object description: light and made of porcelain, has a decoration, a concavity, and a handle (liftable). the bottom is flat. 
 
===> the agent might pick that object as a cup. 
 
but then, the agent may fail to pick a cup that does not have a handle. (identifying a true example as negative = a false negative) 
or the agent may pick a bucket even though it is not a cup. (a false positive) 
 
===> both of these are mistakes.  (we assume the agent is given some feedback of "positive" or "negative") 
 
 
## 
##  questions for learning from mistakes 
## 
(1) how can the agent isolate the error in its model?  (e.g. model of a cup)  similar to diagnosis 
-----> called "credit (blame) assignment" : look for the rules or criteria it used (= the gaps/faults in the knowledge/reasoning/architecture) that led to the error. 
(2) how can the agent explain the problem that led to the error? 
(3) how can the agent repair the model to prevent the error from recurring? 
 
===>  note this is a combination of ICL + explanation based learning 
===>  "credit assignment" is the most important learning feature in AI. (as per Marvin Minsky) 
      because the agent must have some self-diagnosis/repair ability instead of human always fixing/updating its knowledge/reasoning, etc 
 
 
## 
##   error detection: 
## 
- errors can occur in any of (1) knowledge (2) reasoning (3) architecture 
 
==> here we focus on (1) knowledge, especially "classification knowledge" 
 
as the agent goes thru examples, there are both "positive" and "negative" examples (of a cup), which the agent may (correctly or incorrectly) identify as positive (i.e. as a cup). 
each example has features. (e.g. has a fixed handle, or movable handle, or concavity, red color, some logo, etc) 
 
==> categorize into three sets: (1) features ONLY in positive examples (2) features ONLY in negative examples (3) features seen in both examples. 
==> we will focus on (2). we call them "false-success suspicious" features. a false success is an example which the agent identified as a cup but which was not a cup. some of those features are responsible for the agent incorrectly accepting negative examples as positive. 
==> the obvious next question is: which of those false-success suspicious features should we treat as the culprit? 
==> one approach is ICL style, where the agent examines them one by one, tests each, and sees what feedback it gets. 
==> the other approach is to process more and more examples, refining the (1)(2)(3) sets and eliminating more of the suspicious features. 
 
==> now, the same discussion applies in reverse to (1), the "true-success suspicious" features, some of which must be responsible for the agent rejecting positive examples (as in incorrectly identifying a real cup as not a cup). 
==> we will look at this more rigorously below. 
 
 
## 
##   error detection algorithm: 
## 
 
(p.385 in MIT Winston book) 
http://courses.csail.mit.edu/6.034f/ai3/rest.pdf 
 
lets write what we discussed above in a more formal logic: 
 
"false-success" : an example where an agent identified as 
 
To find suspicious false-success relations: 
   intersect all false successes ( = /\F )    ## collect features present in "every" false-success == "and" type of accumulation 
   union all true successes ( = \/T )         ## collect features present in "any" true-success == "or" type of accumulation 
   remove all assertions in the union from the intersection ( = /\F - \/T )     ## we get features present in every false success but in no true success 
 
 
(similarly, for true-success case, just reverse) 
 
To find suspicious true-success relations: 
   intersect all true successes ( = /\T ) 
   union all false successes ( = \/F ) 
   remove assertions in the union from the intersection ( = /\T - \/F ) 
 
===> obviously, this needs to be based on enough examples for meaningful suspicious relations to be extracted. 
===> in other words, if there are many suspicious relations, we may need many more examples to precisely determine the exact responsible ones. 
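 
(a minimal python sketch of the suspicious false-success computation above -- my own illustration with made-up feature sets; each example is just a set of feature assertions.) 
 
-------------------------------------------------------------------- 
# made-up feature sets, just to show the set algebra /\F - \/T
def suspicious_false_success(false_successes, true_successes):
    """features present in every false success but in no true success."""
    common_in_false = set.intersection(*false_successes)    # /\F
    seen_in_true = set.union(*true_successes)                # \/T
    return common_in_false - seen_in_true                    # /\F - \/T

true_successes = [
    {'has concavity', 'is light', 'has handle', 'handle is fixed'},
    {'has concavity', 'is light', 'has handle', 'handle is fixed', 'has decoration'},
]
false_successes = [    # accepted as cups, but actually not cups (e.g. a pail)
    {'has concavity', 'is light', 'has handle', 'handle is movable'},
]
print(suspicious_false_success(false_successes, true_successes))
# {'handle is movable'}  -> the culprit feature to specialize on
-------------------------------------------------------------------- 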
 
 
## 
##   "explanation" and LCM 
## 
 
as a result of LCM, with a rule-based approach, rules will be added/deleted.  (this is much like ICL) 
 
e.g. 
 
IF:                      # this IF clause will evolve(or explode) 
  object has bottom 
  bottom is flat 
  object has concavity 
  object is lightweight 
  object has a handle 
THEN: 
  object is a cup 
 
 
===> suppose a new rule is added "handle is fixed"  but why is that important? 
===> requires an explanation. (any AI can do classification, but "explanation" is only for KBAI. ) 
 
 
recall our Explanation tree from EBL 
 
                                                   object is cup 
                                                          || 
                               =========================================================== 
                               ||                                                       || 
                     object enables drinking                                     object is stable 
                               ||                                                       || 
             ==========================================               object has bottom && bottom is flat 
             ||                                      || 
    Object carries liquids                  Object is liftable 
             ||                                      || 
    Object has concavity             object is light && object has handle 
 
 
===> now lets say, after the error detection and the isolation of the false-success suspicious relation, we determined that we need to add the "handle is fixed" relation into this explanation tree. 
===> Where should we add? 
===> above "Object is liftable" because object is liftable even with movable handles. 
===> below "object enables drinking" because without a fixed handle, the object does not enable drinking. 
 
maybe like this; 
 
 
                     object enables drinking 
                               || 
             =========================================== 
             ||                ||                     || 
 Object carries liquids    handle is fixed       Object is liftable 
             ||                                       || 
 Object has concavity                    object is light && object has handle 
 
 
=====> or the agent might reorganize the tree more drastically if it comes up with a richer explanation all together. 
=====> as such, "explanation" is as important as "classification" 
 
(video) https://www.youtube.com/watch?v=OTs0mRux4LM 
 
 
## 
##   connection to ICL 
## 
 
in ICL, we studied how the agent "learns" but not how the learnt knowledge is "used." 
a KBAI agent looks at its reasoning, decides how knowledge is going to be used, and then decides what knowledge is to be learnt. that sets a target for learning. 
 
it's a feedback loop. 
 
input --->  cognitive system (reasoning+learning+memory) --->  output 
 /\                                                              | 
 |                                                               \/ 
 ----------------------------------------------------------------- 
 
 
===> we only looked at mistakes in "knowledge classification" but what about mistakes in "reasoning" "architecture" ? 
===> what about "gaps" instead of "mistakes" where we dont know about something instead of making mistakes? 
===> relates to meta-cognition. to be revisited in later lecture. 
 
 
===>  unlike ICL and explanation-based learning, here we let the agent get feedback and correct its own mistakes. 
 
## 
##   Assignment:  how would you use LCM to design an agent that could solve RPM? 
## 
- to know it made a mistake, we can feed the agent the answer once it decides its answer. 
- how does your agent isolate the root cause of the mistake? 
- what is isolated exactly?  relative weight-table values? or the way a transformation is applied from one figure to another? (see the sketch after this list) 
- how will it explain the mistake?  how will the explanation be used to repair the mistake? 
- will the agent correct its mistake by itself?  OR will you, as the author of the agent, use the output to correct the mistake in the agent's reasoning? (this is a good point, but if you do that, then who has the intelligence? you? or your agent?) 
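 
one possible (purely hypothetical) python sketch of such self-repair: the agent scores candidate answers with a weighted sum over made-up transformation features, and after being told the right answer it isolates and demotes the features that misled it: 
 
weights = {"pixel_overlap": 1.0, "shape_count": 1.0, "rotation": 1.0}   # hypothetical features 
 
def score(features):                      # weighted vote over a candidate's feature values 
    return sum(weights[f] * v for f, v in features.items()) 
 
def repair(wrong_feats, right_feats, step=0.1): 
    # isolate the features that made the wrong answer look better, and demote them 
    for f in weights: 
        if wrong_feats[f] > right_feats[f]: 
            weights[f] = max(0.0, weights[f] - step) 
 
# feature values of the answer the agent chose vs. the answer it was told is correct 
chosen  = {"pixel_overlap": 0.9, "shape_count": 0.9, "rotation": 0.6} 
correct = {"pixel_overlap": 0.5, "shape_count": 0.8, "rotation": 0.9} 
 
# repeat repair until the correct answer would outscore the agent's wrong choice 
while score(chosen) >= score(correct): 
    repair(chosen, correct)               # the agent repairs itself, not the author 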
 
 
## 
##   LCM and human cognition 
## 
- fundamental to human cognition. this is how we learn. 
- humans are active learning agents. 
- humans use knowledge/reasoning to generate expectations; when those are violated, they generate explanations and correct their knowledge/reasoning. 
- correcting knowledge/reasoning is "thinking about thinking", i.e. meta-reasoning, which is the topic of the next lecture. 
 
 
 
 
####################################### 
######       Meta-Reasoning      ###### 
####################################### 
 
- thinking about thinking. 
- knowledge about knowledge. 
 
 
recall the [meta-cognition] (meta-reasoning) tree 
 
       Learning by Correcting Mistakes 
                   || 
             Meta-Reasoning 
                   || 
              Ethics in AI 
 
 
 
# preview 
- mistakes in knowledge, reasoning, and learning. (we only focused on knowledge in LCM) 
- "gaps" = knowledge/reasoning missing 
- strategy selection and integration  (we talked about many methods in this course, but how does an agent decide/combine which ones?) 
- meta-meta-reasoning ? 
- goal-based autonomy 
 
 
## 
##   mistakes in reasoning and learning 
## 
 
cognitive architecture = reasoning + learning + memory(knowledge) 
 
recall G&T/MEA/Planning on the "move blocks"/Tower of Hanoi examples. 
====> this was meta-cognition over "reasoning" strategy. 
 
recall that in the previous lecture on LCM, we talked about how the agent can modify the explanation tree (we added "handle is fixed" as a subtree under "object enables drinking") 
====> this was meta-cognition over "learning" 
 
 
## 
##   "gaps" in knowledge 
## 
 
- mistakes: errors/incorrectness in knowledge/reasoning/learning 
- gaps    : knowledge/reasoning/learning itself is missing 
 
==> if an agent detects a "gap", it spawns a "goal" to fill the gap; it may search its memory or inquire in the external world (e.g. ping a teacher for info). 
==> another aspect of meta-cognition 
 
e.g.  "move blocks"/Tower of Hanoi 
if MEA gets stuck, the agent sets a goal to break out of the stuck situation; maybe the agent then uses PR and applies MEA to achieve each sub-problem. (a minimal sketch of this fallback follows below) 
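 
a minimal python sketch of that fallback (mea() and reduce() are hypothetical stand-ins passed in by the caller, not course-provided functions): 
 
def solve(problem, mea, reduce): 
    plan = mea(problem) 
    if plan is not None:                 # no gap: MEA reached the goal directly 
        return plan 
    # gap detected -> spawn a goal: decompose via PR and retry MEA per sub-problem 
    plans = [] 
    for sub in reduce(problem): 
        sub_plan = mea(sub)              # assumed to return a list of steps, or None 
        if sub_plan is None: 
            raise RuntimeError(f"still stuck on {sub}: search memory / ask the teacher") 
        plans.extend(sub_plan) 
    return plans 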
 
 
## 
##   meta-cognition and cognition 
## 
- there is only a thin line. 
 
 
recall    (Cognitive system) 
 
            metacognition 
              |||||| 
input ==>   deliberation   ==>  output 
              |||||| 
             reaction 
 
deliberation = (reasoning + learning + memory) 
 
 
but as we've seen so far, metacognition is sometimes part of deliberation. the line is blurry. 
 
 
## 
##  strategy selection 
## 
- so far we've covered many reasoning methods (G&T, MEA/PR, Logic/Planning, Constraint Propagation, Analogical Reasoning, CBR, etc). 
- meta-cognition picks which method to use. (hence, reasoning about reasoning) 
 
(criteria) 
(1) exactly what knowledge is available? (cases? then use CBR. constraints? then use Constraint Propagation, and so on. see the selector sketch after the learning criteria below.) 
(2) computational efficiency (if it's a closed-domain problem, G&T may be suitable. if the current problem is close to past problems, CBR. if there is a single definitive goal state, MEA could be efficient, and so on.) 
(3) quality of solution (sometimes computational efficiency is not what matters but the quality of the solution is. some methods approximate; others produce rigorously correct solutions, like Logic.) 
 
===> similar discussion applies for selecting learning methods (EBL, LRC, ICL, VS, Classification, etc) 
(criteria) 
- applicability (depending on the nature of the problem) 
- the way problems come (incremental ? then go use ICL. if all at once? then maybe use identification/decision tree from VS) 
- computational efficiency 
- quality of solution 
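 
a minimal python sketch of such a selector (a crude dispatch over assumed knowledge/problem descriptors, not the course's architecture): 
 
def select_method(knowledge, problem): 
    # inspect what knowledge is available and what the problem looks like (criteria above) 
    if "cases" in knowledge and problem.get("similar_to_past"): 
        return "CBR" 
    if "constraints" in knowledge: 
        return "Constraint Propagation" 
    if problem.get("single_goal_state"): 
        return "MEA" 
    if problem.get("closed_domain"): 
        return "Generate & Test" 
    return "Problem Reduction"           # default: break the problem down first 
 
print(select_method({"cases"}, {"similar_to_past": True}))   # -> CBR 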
 
 
## 
##   strategy integration 
## 
- it's about adapting a chosen strategy to the current problem. we already demonstrated a lot of this in the RPM solver agent. 
- also, the KBAI agent may shift between different methods. e.g. in the "move blocks"/Tower of Hanoi example, the agent may shift between MEA and PR. 
 
 
## 
##  process of meta-cognition 
## 
- what is the process of meta-cognition?  well, it can be any of the underlying reasoning methods; it is just the same reasoning/learning applied one abstraction layer up. 
 
 
## 
##  meta-meta-cognition 
## 
- not possible (at least in our understanding), because the content of meta-cognition is nothing but all the methods it reasons/learns over. 
- i.e. MEA, Planning, CBR, etc. 
- so it's only self-referential loop. 
 
    -   ## we may call this meta-meta-cognition but it does not go any higher. 
   | | 
meta-cognition 
   || 
deliberation 
 
 
## 
##   goal-based autonomy 
## 
- a good example of meta-meta-cognition 
- meta-cognition provides flexibility and robustness 
 
e.g. 
- a robot designed to assemble cameras is given the task of disassembling a camera 
- it reasons over what reasoning methods to use, then achieves the goal. 
- but then it gets the feedback "that took too long; next time do it faster. now here is one more camera to disassemble." 
- then the robot must reason over the reasoning it used last time to select the reasoning method for disassembly. 
- this is meta-meta-reasoning. 
 
 
## 
##  Meta-cognition and earlier topics 
## 
- many topics already use meta-cognition. 
e.g. LCM reflects on the agent's own knowledge and corrects it. EBL does too. 
e.g. Planning thinks about its own plans and resolves conflicts. 
e.g. in PS, chunking thinks about the agent's own rules (via memory) to resolve an impasse and set up a new rule. 
e.g. VS reasons about its own specific/general models and how to converge them. 
e.g. diagnosis abstracts the data space, maps it to the hypothesis space, refines in the treatment space, then feeds back into the data space, which is essentially diagnosis on diagnosis. 
 
 
## 
##  meta-cognition in this KBAI course 
## 
- meta-reasoning has been a motivating pedagogical goal for this course. 
- in each lecture, we started with an example (move blocks, baseball, defining/identifying a cup, etc.), first looked at how humans do it, then thought about how we let the agent do the same reasoning. that is already meta-cognition: thinking about how to let the agent think. 
 
 
## 
##   Assignment: how would you use Meta-Cognition to design an agent that could solve RPM? 
## 
- discuss how you design your agent to meta-reason. what method(s) does your agent choose?  how does it select strategy given a new prob? 
- how does meta-reasoning improve the performance ? 
- any meta-meta-reasoning? and answer the above points. 
 
 
## 
##  Meta-cognition and human cognition 
## 
- a critical process of human cognition 
- some research suggests that developing meta-cognition at an early age leads to success 
- connected to creativity. (the agent thinks about its existing knowledge/reasoning, then takes up a new goal, abandons a goal, or suspends a goal; that is creativity.) 
 
 
 
########################################## 
#####    Advanced Topics in KBAI     ##### 
########################################## 
 
### 
###   Visuospatial Reasoning 
### 
 
Visuo   = visual = what 
Spatial = space  = where 
 
recall the [Visuospatial Reasoning] tree 
 
          Constraint Propagation 
                    || 
          Visuospatial Reasoning 
 
we did CP+VR to do line-labeling in 2D images. 
 
 

## 
##  definition of visuospatial knowledge 
## 

 
visuospatial knowledge:  knowledge wherein causality is, at most, "implicit". 
 
e.g. a glass cup is lying down, and you see a puddle of water around it. then we humans can easily infer that the water spilled out of the glass. 
===> an example of implicit causality that we infer from visuospatial knowledge. 
 
 
## 
##  two views of reasoning 
## 
 
(1) propositional representation 
given a triangle shape facing left and another facing right, just like we did in the RPM project, we may extract a propositional (text) representation of those figures thru image processing, image segmentation, or CP. 
 
===> separated from perceptual modality. 
 
(2) analogical representation (NOT analogical reasoning) 
it is a structural correspondence between representation and the external figure. 
 
===> modal representations, related to the perceptual modality(= visual image in this case). 
 
 
===> human cognition (mental imagery) appears to use analogical representations (the perceptual modality of visual images). 
===> humans are good at using both (1) and (2) 
===> but computers are only good at (1) 
 
e.g. you hear a musical melody and are reminded of another melody 
===> did you extract a propositional representation of the melody, tone, rhythm, etc. and compare it to the prop-rep of the other melody? 
===> likely no, you somehow directly mapped between them, using perceptual modality of the melody. 
===> an open issue in KBAI, and cognitive science. 
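 
a toy python sketch of the contrast between (1) and (2) (made-up data): the same figure held as a propositional description vs. as an analogical bitmap whose layout corresponds to the figure itself: 
 
propositional = [("shape", "triangle"), ("facing", "left"), ("size", "large")] 
 
analogical = [          # 1s trace the triangle; the layout *is* the figure 
    [0, 0, 1], 
    [0, 1, 1], 
    [1, 1, 1], 
] 
 
# propositional comparison: symbol matching, independent of the visual modality 
def prop_match(a, b): 
    return len(set(a) & set(b)) / len(set(a) | set(b)) 
 
# analogical comparison: direct structural overlap of the representations themselves 
def pixel_overlap(a, b): 
    cells = [(x, y) for y, row in enumerate(a) for x, _ in enumerate(row)] 
    return sum(a[y][x] == b[y][x] for x, y in cells) / len(cells) 
 
same_figure = [[0, 0, 1], [0, 1, 1], [1, 1, 1]] 
print(pixel_overlap(analogical, same_figure))   # 1.0 -> structural identity 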
 
 
## 
##  symbol grounding problem 
## 
 
 
                       |   visuospatial               |     Verbal 
-------------------------------------------------------------------------------------- 
"content"              |   appearance                 |   Arbitrary 
(knowledge/reasoning)  | (what & where)               | (Driven by Inferential Needs) 
-------------------------------------------------------------------------------------- 
"form/encoding"        |   analogical                 |  propositional 
(representation)       | (structural correspondence)  | (no correspondence) 
-------------------------------------------------------------------------------------- 
 
====> so there are 2×2=4 combinations of content&form, and so far in this course, we only looked at verbal & propositional combo. 
====> in fact, the left column is not yet well understood, for KBAI or for human cognition 
 
 
## 
##  visuospatial reasoning example 
## 
 
Davies, J., & Goel, A. K. (2001, August). Visual analogy in problem solving. In IJCAI (pp. 377-384). 
 
===> "dunker radiation problem"  : recall the rebel army overthrowing a bad king and a tumor laser gun story. 
===> traditional AI solves this via propositional representation using abstraction over goal state/resources and causal pattern(= decompose resource and bring them to the target from diff angles/direction/path, etc). 
 
===> "Galatia" agent transfers the solution procedure without any causal pattern extraction but thru only visuospatial structural mapping. 
e.g.  left-road to the castle = left-body to the tumor 
      army = laser beam 
 
## 
##  visuospatial reasoning on RPM 
## 
 
Kunda, M., McGreggor, K., & Goel, A. (2010, August). Taking a look (literally!) at the Raven's intelligence test: Two visual solution strategies. In Proc. 32nd Annual Meeting of the Cognitive Science Society, Portland. 
 
===> until this paper, all RPM solvers used "propositional rep", but now some use visuospatial reasoning (analogical representation only). (a toy pixel-based sketch follows after the links below) 
===> the agent, called "ASTI", solved 50/60 
===> later, another kind of analogical representation called "fractal representation" was proposed by Keith McGreggor. 
 
(link) 
- Reasoning on the Raven's Advanced Progressive Matrices Test with Iconic Visual Representations 
http://mindmodeling.org/cogsci2012/papers/0321/paper0321.pdf 
- Confident Reasoning on Raven's Progressive Matrices Tests (fractal representation) 
http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8518/8445 
- Addressing the Raven's Progressive Matrices Test of General Intelligence 
http://www.cc.gatech.edu/~bmcgregg/AddressingRavens.pdf 
- Taking a Look (Literally!) at the Raven's Intelligence Test: Two Visual Solution Strategies 
http://csjarchive.cogsci.rpi.edu/proceedings/2010/papers/0431/paper0431.pdf 
- Fractal Analogies: Preliminary Results from the Raven's Test of Intelligence 
http://iccc11.cua.uam.mx/proceedings/the_cybernetic/mcgreggor_iccc11.pdf 
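 
a toy python/numpy sketch in the spirit of these visual approaches (NOT the ASTI or fractal algorithms from the papers above): for a 2x2 problem A:B :: C:?, predict the missing image by applying the pixel-level change A->B to C, then pick the closest candidate: 
 
import numpy as np 
 
def predict(A, B, C): 
    # figures are binary arrays; the "transformation" is just the pixel-wise change 
    delta = B.astype(int) - A.astype(int) 
    return np.clip(C.astype(int) + delta, 0, 1) 
 
def best_candidate(A, B, C, candidates): 
    target = predict(A, B, C) 
    scores = [np.mean(target == cand) for cand in candidates]   # fraction of matching pixels 
    return int(np.argmax(scores))                                # index of the best answer 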
 
 
## 
##   Systems thinking 
## 
- AI agents reason about the external world (which consists of systems of many kinds; each system has heterogeneous interacting components). 
- interactions between components lead to processes of many kinds, at different levels of abstraction. 
 
e.g.  an ecosystem 
- physical layer 
- chemical layer 
- biological layer 
(some of which are invisible but they all interact with each other) 
 
e.g.  business 
- manufacturing 
- marketing 
- delivery 
etc 
(each unit composed of organization-team-individual) 
 
Systems Thinking: reasoning about systems with numerous components and processes at multiple, potentially invisible, levels of abstraction. 
(focus being on the complex behavior of a system, and derive invisible/less obvious processes that affect visible processes) 
 
 
## 
##  Systems Thinking and earlier topics 
## 
 
frames, scripts, diagnosis (definitely interactions between components, processes) 
 
e.g. the population of bees dropped significantly. the suspected cause is certain chemicals used on plants. we can't really see the process of those chemicals affecting bees (unless we do lab experiments), so that is an invisible process which we can infer. 
 
 
## 
##  models of structure-behavior-function (SBF model) 
## 
 
light bulb 
 
# function model : create light (external force on switch, light state: off->on) 
# structure model: switch, bulb, battery, etc 
# behavior model : a series of states and transitions (each state containing the states of electricity and light, each transition containing functions/structure) 
 
====> illustrates some invisible processes, enables systems thinking. 
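 
a minimal python sketch of one possible encoding of the light-bulb SBF model (the field names are my own assumption, not a prescribed schema): 
 
sbf_lightbulb = { 
    "structure": ["switch", "battery", "wire", "bulb"],      # components 
    "function": {                                            # what the device is for 
        "name": "create light", 
        "stimulus": "external force on switch", 
        "result": {"light": ("off", "on")}, 
    }, 
    "behavior": [   # state transitions; each names the structure/function enabling it 
        {"from": {"electricity": "off",   "light": "off"}, 
         "to":   {"electricity": "flows", "light": "off"}, 
         "by":   "switch closes the circuit"}, 
        {"from": {"electricity": "flows", "light": "off"}, 
         "to":   {"electricity": "flows", "light": "on"}, 
         "by":   "bulb filament converts current to light"}, 
    ], 
} 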
 
 
## 
##    Design thinking 
## 
 
Design thinking: reasoning about ill-defined, unconstrained, open problems in the world. 
===> notice in DT, as solutions evolve, problems evolve also. 
 
(configuration was a kind of a routine design task, but all components/parameters are already known) 
 
e.g. 
your agent knows a circuit design that uses a 1.5 volt battery to get 10 lumens of brightness, but when asked to get 20 lumens, it has to understand the correlation between voltage and brightness and realize it needs 3.0 volts. what if a 3.0 volt battery is not available? then maybe a rule or external input tells it that connecting two 1.5 volt batteries in series gives 3.0 volts, and so on. (see the sketch below) 
 
IDOL: a design AI agent that can transfer the analogical representation of a model to adapt it to a new problem. 
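 
a minimal python sketch of the battery example (the linear voltage-to-brightness relation is an assumption extrapolated from the known 1.5 V / 10 lumen design, not real device physics): 
 
import math 
 
KNOWN_VOLTS, KNOWN_LUMENS = 1.5, 10        # the design the agent already knows 
 
def required_voltage(target_lumens): 
    # assumption: brightness scales linearly with voltage 
    return KNOWN_VOLTS * target_lumens / KNOWN_LUMENS 
 
def batteries_in_series(target_voltage, available=1.5): 
    # design knowledge (possibly supplied by a teacher): series voltages add up 
    return math.ceil(target_voltage / available) 
 
print(required_voltage(20))                # -> 3.0 volts 
print(batteries_in_series(3.0))            # -> 2 batteries of 1.5 V in series 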
 
 
## 
##   Creativity 
## 
- defining creativity is hard. just like defining intelligence. 
- but we humans are all creative. (an assumption we grant.) 
 
what is creativity? 
- novel (= new) 
- valuable 
- unexpected (= non-obvious) 
 
processes of creativity (example) 
- analogical reasoning 
- emergence 
(draw three lines, then a triangle emerges) 
- re-representation 
(the original rep is not effective, so we re-represent. e.g. the solar system: "planet revolves around the sun" -> the atom model: "electron revolves around the nucleus") 
- serendipity 
(you cannot come up with a solution to a problem, but later after solving other problems, you connect the solution to that unsolved problem. ) 
- conceptual combination 
 
 
=========> still "definition" is hard. because what is "novel"? AI agents' outputs are all (by design/definition) expected patterns of algorithms. or should we treat something as novel when different combinations of algo yield unseen patterns? 
there are different angles for looking at this. individual v.s. social. personal v.s. historical. etc. 
 
 
 
## 
##  AI and ethics 
## 
- robots may replace human jobs. people may lose jobs, which may further the divide between the rich and the poor. is that ethical? 
- robot soldiers. do we embed morality, creativity?  what does that tell us about our own morality? 
- robots become indistinguishable from humans? civil rights for robots? 
- these are still in the world of sci-fi books, but they come closer to reality every year. 
 
 
 
############################## 
#####     wrap-up       ###### 
############################## 
 
# principles of KBAI revisited 
# KBAI research topics 
 
 
recall    (Cognitive architecture) 
 
                      metacognition 
                         |||||| 
world --> input ==>   deliberation   ==>  output  --> world (which consists of physical/social elements and other cognitive systems) 
                         |||||| 
                        reaction 
 
deliberation = reasoning + learning + memory  (see & think) 
reaction = see & act (directly maps input/percepts to output/action) 
 
## 
##  (recall) 7 principles of the course 
## 
- KBAI agents represent and organize knowledge into structures to guide and support reasoning. 
(SN, Frames, explanation tree) 
 
- learning in KBAI agents is often incremental. 
(a difference from other general AI / machine learning; in fact, human experience is incremental) 
 
- reasoning in KBAI agents can be both top-down and bottom-up. 
(scripts, constraint propagations, etc) 
 
- KBAI agents match methods to tasks. 
(methods = G&T, MEA/PR, PS, CBR, Planning, AR) (tasks = config, diagnosis, design, meta-reasoning, creativity, classification, systems thinking) 
 
- KBAI agents use heuristics to find solutions that are good enough, not necessarily optimal. 
(another difference in focus from other general AI. this is just like humans. can be a powerful principle) 
 
- agents make use of recurring patterns in problems when solving. 
(e.g. LRC, CBR, AR cross-domain, configuration) 
 
- KBAI agents enable reasoning, learning, memory to support/constrain each other. 
(knowledge glues reasoning/learning/memory together, e.g. PS chunking, logic, explanation-based reasoning/learning, LCM, constraint propagation) 
memory stores knowledge 
reasoning uses knowledge 
learning adds knowledge 
 
 
## 
##   research topics 
## 
 
CALO: cognitive assistant that learns and organizes knowledge (a precursor to Siri) 
 
Cyc and OMCS: knowledge bases of every day common sense knowledge (omcs = open mind common sense) 
 
Wolfram Alpha: a search engine that leverages KBAI principles. 
 
VITA: a computational model of visual thinking in autism, based on RPM. (only using visuospatial reasoning) 
 
Dramatis: a computational model of suspense and drama in stories. 
 
DANE: support for design based on analogies to natural systems. 
 
 
 
 
