John Salvatier

The “I Already Get It” Slide

Followup to: Words as Mental Paintbrush Handles, Guessing The Teacher’s Password

Jessica Taylor recently wrote a description of Paul Christiano’s and MIRI’s differing driving intuitions for thinking about the AI alignment problem. Jacob Steinhardt observes that the “do cognitive reductions” intuition seems to be at the heart of MIRI’s thought and the “search for solutions and fundamental obstructions” intuition at the heart of Paul’s thought.

As I read his comment, I noticed myself make an error I’ve made before: thinking I get the intuitions by mere virtue of not thinking they’re crazy.

I call this The “I Already Get It” Slide, and I suspect this error happens to people all the time but passes unnoticed.

This is unfortunate because the error prevents you from actually absorbing others’ intuitions, and absorbing others’ intuitions is important for doing anything hard.

Jessica describes Search For Solutions And Fundamental Obstructions like this:

Almost all technical problems are either tractable to solve or are intractable/impossible for a good reason. […]

If the previous intuition is true, we should Search For Solutions And Fundamental Obstructions. If there is either a solution or a fundamental obstruction to a problem, then an obvious way to make progress on the problem is to alternate between generating obvious solutions and finding good reasons why a class of solutions (or all solutions) won’t work. In the case of AI alignment, we should try getting a very good solution (e.g. one that allows the aligned AI to be competitive with unprincipled AI systems such as ones based on deep learning by exploiting the same techniques) until we have a fundamental obstruction to this. Such a fundamental obstruction would tell us which relaxations to the “full problem” we should consider, and be useful for convincing others that coordination is required to ensure that aligned AI can prevail even if it is not competitive with unaligned AI.

As I thought about Paul’s Search For Solutions And Fundamental Obstructions intuition, a justification easily came to mind — a non-verbal feeling that it looked like other well-accepted problem solving strategies.

This justification was easy, familiar and wrong.

There is no way that “it looks like other accepted strategies” is actually the reason Paul thinks finding fundamental obstructions is central.

And yet it was very easy for me to mentally slide from getting the conclusion and not immediately thinking it’s crazy, into thinking I also got the intuitive argument that generated it.

If I had to guess at Paul’s actual intuitive reasons, I would guess something like this:

“In Computer Science Theory, whenever there have been these kinds of hard and confusing problems and people have tried to solve them, they’ve always turned out either to be possible or to have some very revealing fundamental problem. For example, here are 4 clear examples. Furthermore, this makes intuitive sense because X. This is also the case in these 3 other fields. And AI alignment looks a lot like these fields because it has Y and Z in common.”

But I also bet that not only will Paul have a more detailed argument, but also he will use a different ontology in a way that makes the argument meaningfully different. The argument is not yet compelling to me.

Now, perhaps his arguments sound weak or just boring to you. How could a useful intuition be consistent with weak-sounding arguments?

To answer, put yourself in Paul’s shoes and ask yourself: what could explain arguments that sound weak or boring?

Maybe you have a strong but difficult-to-articulate intuition, perhaps a mental picture of how different parts of the research process move against each other.

Or maybe you can articulate your intuition, but when you do people quickly offer counterarguments that are — sigh — totally off topic. They nod along as if understanding, but then go right back to what they were doing before.

You can probably imagine your conclusion being wrong, but not your insight being irrelevant.

If Paul is at least as sensible as you are and his arguments sound weak or boring, you probably haven’t grokked his real internal reasons. Your intuitive mental picture of how the parts of the research process move is shaped differently from his. Maybe you’re even using different pieces.

If so, then it is not surprising that you come to different conclusions. You don’t even have the machinery to come to his conclusion.

Maybe instead you think that getting his intuitive reasons from him doesn’t matter. After all, now that I know what Search For Solutions And Fundamental Obstructions means, I can just check that it should be a central strategy myself. But without an intuitive model of why it should be a central strategy, to check I would probably have to do computer science theory for at least a few months.

Without my own intuitive model pulled from Paul’s intuitive model, there’s little to distinguish Search For Solutions And Fundamental Obstructions from a near-infinite variety of nearby strategies like “search for solutions and obstructions on complexity problems” or “search directly for fundamental obstructions”. Intuitive models let us cut down our uncertainty in great swaths by concentrating our probability on simple hypotheses.

With my own intuitive model, checking often just requires seeing a few well-chosen examples, or even just thinking back on past problems.

All this is to say that Paul almost certainly has a valuable intuitive reason for his position. If I don’t catch my slide from understanding the conclusion to thinking I understand the argument, I’ll never notice that there’s something more to absorb.

There’s a world of difference between understanding what Search For Solutions And Fundamental Obstructions means, and understanding the intuition that generates it. A difference, in other words, between understanding the conclusion and understanding the argument for it.

If you mistake the conclusion for the argument, you will never get the argument.

This reasoning doesn’t just apply to Paul and his intuitions; it applies to anyone you think is about as reasonable as you. If they avoid errors about as well as you do, then it would be silly to think that their intuitions don’t point to real insight about the world.

This also applies nicely to MIRI’s intuition that doing Cognitive Reductions is the main thing that can push AI alignment research ahead. Jessica describes Do Cognitive Reductions like this:

Cognitive Reductions are great. When we feel confused about something, there is often a way out of this confusion, by figuring out which algorithm would have generated that confusion. Often, this works even when the original problem seemed “messy” or “subjective”; something that looks messy can have simple principles behind it that haven’t been discovered yet.

Again, it is tempting to gloss over the gap between my view, that cognitive reductions are useful but not central, and MIRI’s view that they are central, since we do already agree to some extent.

But consider: if I were in their position, what kind of intuitions would actually lead me to think that Cognitive Reduction is so central? It couldn’t be just a stronger version of the belief that I already have. That would just make me think it’s somewhat more useful, rather than something to base my whole strategy around. Only a new argument could make sense of that.

If I go argue with MIRI without noticing that there’s an argument I’m missing, we’ll just go around in circles.

I suspect that The “I Already Get It” Slide happens all the time and passes unnoticed. That people mistake a person’s conclusions for their intuitive reasons and don’t end up absorbing their real arguments, even when they have insight. That would explain why people’s opinions converge so slowly.