Harry's Tracker - An Attempt to Create a "bottom up" Learning Program -
Harry Baya - Sept 10, 2006, revised Sept 19
In a previous Blog entry ("New" Consciousnesses) I wrote some thoughts engendered by reading the book "Emergence". I put in this link to an excerpt from the book. That excerpt has given me an idea for a program that I think catches some of the "bottom up" intelligence discussed in the book.
This entry describes that program and my plans to write it. If you find this too technical or boring, just skip it.
I have studied the description of the Tracker program and I think I understand how it worked in a general sense. However, I cannot actually imagine how I would design a program to do what they describe. Apparently they start with 16,000 independent programs and then cull out some, and interbreed the others, to create the next generation. Then they repeat the process generation after generation. Whether they use half of each generation, or reduce to a handful each time, I do not know.
How they “mate” the programs so that the next generation has some of the characteristics of the two parents I don’t know either, though I can imagine modules that either were or were not included in the next generation and, perhaps, a probability factor to determine how often each module was used. I wonder if the part selected from each parent is random or automatic.
Even if I had enough information, or insight, to create a working version of something like the Tracker program, I certainly don’t have access to the computer horsepower it requires to run.
However, reading this did cause me to think up a fairly simple “learning” system that seems to catch the essence of Tracker concept. I am aware that my simple approximation may miss the breakthrough essence of the Tracker program, but, if that is so, I hope someone with a better understanding of the concepts involved will explain to me why that is so. Even if I missed the “big picture”, the approach I see intrigues me enough to want to try it.
The purpose of this essay is to outline the simple “learning” program I am going to attempt to write and to discuss the “learning” aspect of it. It is this ability to “learn”, to “evolve” a solution that so intrigues me.
The problem space in which the program will operate will, initially, be like a checkerboard, 12 rows by 12 columns. On the board will be a “path”. A path will consist of a set of marked squares that start in one square and end in another and conforming to the rules below.
Think of the path as starting on one end and going through the middle square to the other “end”. We will call one “end” the “ start_square” and the other the “ finish_square”. The path goes from the “ start_square” to the “ finish_square” in a series of “connected” squares.
Assume that the matrix below shows labeled squares.
Assume that the matrix below shows labeled squares.
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
Assume that 18 was a start_square. If so then the next square in the path could be 12 ,13,14,19,24,23, 22, or 17. Any other square would not meet the rules above.
Now assume that the second square in the path was 13. The square after that could be 12,7,8,9, or 14. Squares 17 and 19 would not be permitted because they would cause the start_square (18) to be connected to two squares, which violates rule 1. Assuming that Squares 13 is a “middle” square, then Square 18 would not be permitted because then 13 would violate rule (5) above.
Here is a path that follows these rules.

My idea is to create a program that will “evolve” to follow such a path reasonably efficiently. Though it may seem that the approach I am suggesting is really just a convoluted way of writing the obvious path following algorithm (i.e. search all “connected squares”, other than the previous square, to find which is the next square. ), I think the approach is more general than that and I hope to show why.
My approach is that we start with a program that begins with the location of the “ start_square ” and will have 144, or more, equally possible actions. Each action will “sniff” a square to see if it is the next square on the path. There are 144 squares in our 12 x 12 board, so each of the 144 actions would sniff one of those squares. However, the squares would be identified in relation to the current square (e.g. 1 up, 2 left ; 5 up, 4 right) etc.
Initially each “action/sniff” is equally likely and has a “ success_score” of, say 0. As each action (defined by sniffing a square at some particular remove from the current square) is taken an associated counter is increased by one. This “ action_counter” counts the number of times each action is tried.
If the “sniff” identifies a square that meets the criteria for “ next square”, this is considered a success and the success_score for that action is increased by 1. Each time an action/sniff occurs, the associated “ action_counter” is increased by one, whether or not the sniff was a success.
After a “next” square is identified in the path, it then becomes the “current” square and the program attempts to find the next square in the path after the “current” square. However, this time the actions are “ranked” based on the “ success_ratio” of each action. The “ success_ratio” of an action is the number of successes for that action divided by the number of times that action has been tried. I.e. the “ success_score” divided by the “ action_counter”.
The action with the highest ratio is tried first, followed by the next highest, and so forth. If one or more actions have the same rank, the next action is chosen at random from those actions. If a random choice fails, then a random choice is made among the remaining actions at that rank. When all the actions at one level are tried unsuccessfully, the action with the next highest success_ratio is chosen, and so forth.
This means that any action that has ever been successful will be tried before any action that has never been successful.
The idea is that we would let this program run from the start_square to the finish_square. The “_score” variables would be adjusted with each action chosen.
If we were to keep the “_scores” and run the program again from the first square, it would try all the actions that had ever been a success, somewhat weighted for the number of times each was a success. It seems very likely that it would find it’s way through the path, by finding the next connected square at each turn, much quicker than initially. In fact, since it has a record of every successful move it should go through the path fairly quickly.
At this point I would say that learning has occurred and that the preference scores reflect a path seeking algorithm.
Assuming we set this up right we would hope that the preference scores would, in effect, say something like “never bother to sniff a square that is not “connected” to the current square.
OK, we knew this was part of the answer from the start .. but we did not tell it that. Rather, we defined the criteria of success and let the process come up with its own weightings to achieve that goal.
Further, if the path we gave it happen to emphasize a particular shape, such a lot of right “curls” and no left “curls”, we would expect the weightings to reflect that. It seems to me that this process would arrive at a path seeking algorithm that was uniquely tuned to the path used in it’s evolution. We could say that we let the program “train” itself to find a particular kind of path.
Now expand this concept and assume that we have two different individuals making up paths for our program to work with and that we allow one program to run repeatedly through all the paths made by one individual and the second program to work repeatedly through all the paths from the other individual. If those two individuals had any particular personal quirks in their path design (e.g. let’s say one of them did a lot of straight lines in their path while the other did not), we would expect the weightings to be tuned to that individual. We would expect that using the preferences evolved with the paths made by one person would take longer to find the paths made by another person. The “learning” would be customized to the individual path maker.
From my perspective some sort of “learning” has occurred.
If I can get this far I would consider how much this approach would have to vary in order to deal with other situations. For example, could it deal with branches and dead-end paths where the “end” was not the “ Finish_square” being sought at the end of the path.
Further, I can imagine moving this approach into 3 dimensional space where “connected” meant anything with a side or corner “connected” to another cube. We could then give the programs a set of paths that were along any plane, even a tilted flat plane, and expect the process to find a set of preferences tuned to that plane.
We could even move into n-dimensional space, though defining “connected” could become challenging.
Though I’m clearly reaching here I can also imagine using other criteria for determining what constituted success and letting a solution “evolve”. For example, this approach could be used for developing a good strategy for playing the game “Battleship”.
I hope to write the program for evolving a set of preferences to follow a simple path on a 12 x 12 square board.
In my thoughts so far I cannot come up with an approach that would deal appropriately with branching and dead-ends. I can certainly see how to write a program that would deal with them, but I don’t see how to write a program that would “evolve” a solution to them in the same sense the program described above would “evolve” an approach to finding a connected paths that does not have branches and dead ends.
Comments? Suggestions? Please leave them on this blog or e-mail them to me at Harry[zzz]@baya.net (but leave out the [zzz])
**********************************************
Part 2 - September 23, 2006
I've been thinking about this a bit but seem to be somewhat stuck. I have not been able to think of anything much to do with it beyond the above. I did think through how to get the program to deal with dead-ends and branching, but it's very inefficient and is even more trial and error oriented than the original approach.
If we assume that the original approach of trying every possible square until the right one is found continues, with the assumption that the more successful moves will be tried before the less successful ones, I can just barely convince myself that the system is showing some kind of learning when it deals with a single path with no branching or dead-ends.
I thought through one way of adding the capacity of dealing with branching, but it's not very satisfying. I know how to write the program to find the path and deal with branching and dead-ends, but it would not be learning, it would just be using what I tell it to do. The approach I describe above has something like learning in it, though at times I see it as just a clumsy way of writing the program to deal with a single path.
My approach for dealing with branching and dead-ends does not seem to me to being much, if any, learning. Here's how it would work. Imagine that the path finding program has wandered down a connected sequence of squares and comes to an "end" square with no additional connected squares. In my original outline using a single path this would be the "finish_square" of the sequence. Now imagine that path can split every now and then and that a branch can lead to a dead-end that is not the "finish_square". We want the program to keep trying different branches till it finds the right one, i.e. the one that leads to the square we have identified as the "real" end.
So now the program get's to a square and tries every possible move/sniff, with the most successful ones first, and it does not find another connected square. It would try 23x23 possible moves. The only connected square is the one it just came from. This is an example of what I am calling a branch with a dead-end. Assuming that the full path does have a "good" end this means that somewhere back down the line the path branched and this is the wrong branch. So how do we set this up so that the program "learns" to go back down the path and look for other branches to try? Here's my first cut at an approach:
When the program reports that it has tried every possible move/sniff and has not found another connected square that has never been passed over, the criteria for choosing the next square would be changed. The previous criteria was to find a connected square that it had never passed over. There would now be a hierarchy of criteria and it would work like this:
step 1: Look for a connected square that has never been passed over. Use every possible move/sniff until one is found or none is found. If one is found, go there and repeat this approach. Because it would rank the choices, this could become somewhat efficient.
step 2: If none is found, look for a connected square that has been passed over only once. Use every possible move/sniff until one is found or none is found. If one is found, go there and repeat this process starting with step.
step 3: if none is found, look for a connected square that has been passed over only twice,
Etc.
This approach would eventually find the square marked as the "true" end. It might be the last branch it tries, but it will find it. However, consider how inefficient and ( the word that comes to mind is "brutal") brutal this approach is. Let's say it goes out to the end of branch and start's backtracking over the squares it used to get there. As it gets back to each prior square in the branch it will go through a thorough search for any connected square (i.e. any square connected to the one it's on) that has never been passed over. If there is one it will find it as efficiently as it does with a simple path. However, if there is no "branch" from that square, it will go through all 23x23 possible sniffs before starting to look for a connected square that has been passed over only once. If it finds none of those in its 23x23 sniffs, it will look for a connected square that has only been passed over twice, and so forth.
In a complex path with lots of branching this would be outrageously inefficient. It would sometimes take multiple cycles of 23x23 sniffs just to move on from one square to the prior square. It might have to backtrack over some branches a number of time and as the number of times it goes over them goes up it will start doing more and more complete sets of 23X23 sniffs. It would be good at finding branches, but it would horrible at dealing with squares where there was no branch.
To tell it, in code, how to backtrack seems to me to be cheating in the context of getting it to learn. There must be some way that it could learn to deal with backtracking (branching and dead-ends) efficiently, or at least significantly more efficiently than this approach. It's an interesting mental puzzle. One direction is the 16,000 programs and the mating of the more successful ones... but that is still outside my ability to design, and even further out of my ability to implement.
I will continue to seek some way for the program to "learn" to deal with branching, dead-ends, and efficient back-tracking.