Tom Lieber's Microblog

Jul 18, 2020

I used to underestimate the importance of context in machine perception, but then I read (and re-read and re-read) Fluid Concepts and Creative Analogies. It’s not just that chihuahuas look like muffins up close, or that photos sometimes have smaller photos inside them. The crux of transfer learning and generalization is that what you perceive is dependent on the task you’re solving. Douglas Hofstadter’s book is littered with analogy puzzles to drive that point home.

Consider his lab’s alphabet-based microdomain, wherein no worldly knowledge applies except the existence of letters, their predecessor/successor relationships, grouping, and counting. Here’s one example problem:

abc → abd; ijk → ?

It has an obvious solution: ijl. The rule seems to be “replace the final letter by its successor.”

abc → abd; opq → opr
abc → abd; ghi → ghj
abc → abd; tuv → tuw
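The mechanical rule is simple enough to write down. Here's a minimal sketch (the function names are mine, not from the book), assuming the plain "bump the last letter" reading:

```python
def successor(letter: str) -> str:
    """Next letter in the alphabet, e.g. 'c' -> 'd'.
    (Ignores the edge case that 'z' has no successor.)"""
    return chr(ord(letter) + 1)

def mechanical_rule(s: str) -> str:
    """Apply abc -> abd literally: replace the final letter
    by its successor."""
    return s[:-1] + successor(s[-1])

print(mechanical_rule("ijk"))  # ijl
print(mechanical_rule("opq"))  # opr
print(mechanical_rule("tuv"))  # tuw
```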

You could do this all day! And things might be fine until you mechanically answer:

abc → abd; yxw → yxx(?)

Wait… this isn’t the same problem, is it? It’s like someone accidentally recorded a video upside down, or transposed the middle letters of a word. Those are problems in other domains where a naïve model produces odd results, yet more satisfying solutions occur to a human almost immediately, as they do for this analogy problem:

abc → abd; yxw → zxw (replace the “highest” letter by its successor?)
abc → abd; yxw → yxv (take the final letter one step further?)

What’s happened is, seeing a descending sequence of letters made the fact that “abc” is an ascending sequence (+1, +1) far more important than it was before. The task changes what we perceive in the data.
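That (+1, +1) structure is easy to make concrete. A hypothetical helper (mine, not the book's) that computes the offsets between adjacent letters shows what "yxw" makes salient about "abc":

```python
def deltas(s: str) -> list[int]:
    """Offsets between adjacent letters, e.g. 'abc' -> [1, 1]."""
    return [ord(b) - ord(a) for a, b in zip(s, s[1:])]

print(deltas("abc"))  # [1, 1]   -- ascending
print(deltas("ijk"))  # [1, 1]   -- same structure, so the easy rule fit
print(deltas("yxw"))  # [-1, -1] -- descending: the mirror of "abc"
```

Nothing about "opq" or "tuv" forced you to notice the deltas at all; "yxw" does.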

Typical machine learning responses might be to add more examples like it to the dataset, or to make the model bigger. But consider this:

abc → abd; mrrjjj → ?

This is the letter-analogy equivalent of taking a photo of a mirror. To a self-driving car, it’s like a pedestrian carrying a stop sign down the sidewalk. To a language model, it’s an essay written in Igpay Atinlay. Again, as a human, the mechanical solution that seemed sufficient for the easier problems doesn’t feel right here, and neither do any of the refinements that worked earlier… yet with a bit of thought you can find one that does. And again, finding a satisfying solution requires changing the way you see the original data.

(I’m not going to provide a solution to that last problem because it’s one of my favorites. It’s discussed at length in the book and probably online, if you care to cheat. And if you’re happy with “mrrjjk” as an answer, then I have a bulletproof image classifier to sell you…)

The “classical AI” ideas coming from Hofstadter’s lab are old and, as far as I can tell, have never been successfully applied to problems as open-ended as those that we use gigantic neural networks to “solve” today in computer vision, language, etc. I spend so much time with them, though, because the way we’re going, the alternative is to spend the rest of my life locating areas of poor model accuracy, adding corresponding points to my datasets, and making models bigger and bigger and bigger… and this is more promising, potentially more rewarding, and just more fun!