Making sense of language is an NP-complete problem. That might match your intuition, but you can actually prove it by reducing the well-known NP-complete problem of graph k-coloring to the anaphora agreement problem in linguistics.
This is not my proof; I’m simply restating and visualizing a 1991 proof by Eric Sven Ristad that I came across in my research. It doesn’t seem to be readily available on the web other than as a scanned PDF on Princeton University’s website. I’ll discuss it below, albeit with less rigor for the sake of simplicity.
Background
A subtask of language comprehension is the anaphora problem. Because it is difficult to define what the “correct” antecedent is (since that relies on the intentions of the producer), the proof uses anaphoras whose resolution introduces no new information.
In the anaphora agreement model, the himself in:

\text{Chris liked himself.}

tells us that the agreement feature gender for Chris is masculine, since himself is masculine.
An example of an impossible interpretation is:

\text{The student prepared her breakfast after doing his homework.}

Here, the student would have to be both masculine and feminine for the agreement feature gender. There are many other agreement features beyond gender, like number, animacy, social class, etc. The number of agreement features varies by language and doesn’t necessarily have an upper limit in linguistic theory.
Two anaphoric elements are nondistinct if their feature sets do not disagree on any common feature. Let’s use \equiv as short-hand notation for nondistinct and \not\equiv as short-hand notation for distinct. Then, for the examples above, to visualize this:
\text{Chris}\space\{\textsf{gender:masculine}\} \equiv \text{himself}\space\{\textsf{gender:masculine}\}
For a possible interpretation to be valid, anaphoric elements that share the same antecedent must be nondistinct from the antecedent and from each other.
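To make the nondistinctness relation concrete, here is a small Python sketch. The dict-based representation and the function name are my own, not from Ristad’s paper:

```python
def nondistinct(a: dict, b: dict) -> bool:
    """True if the two feature sets do not disagree on any common feature."""
    return all(b[feat] == val for feat, val in a.items() if feat in b)

chris = {"gender": "masculine"}
himself = {"gender": "masculine"}
her = {"gender": "feminine"}

print(nondistinct(chris, himself))  # True: they agree on gender
print(nondistinct(chris, her))      # False: they disagree on gender
print(nondistinct({}, her))         # True: no common feature to disagree on
```

Note that nondistinctness is not transitive: two elements can each be nondistinct from an underspecified third element while still being distinct from each other.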
With this in mind, the formalized anaphora problem in the agreement model is:
- Let the \text{link}(\alpha, \beta) relation mean that the anaphoric element \alpha has an antecedent \beta.
- Let \text{link}^+(\alpha, \beta) be the positive transitive closure of the directed \text{link} relation.
- Now, the problem is to find a set of \text{link}s L such that:
- Every anaphoric element is linked to some antecedent in L.
- After taking the transitive closure L^+, for all \text{link}^+(\alpha_1, \beta) and \text{link}^+(\alpha_2, \beta) in L^+, \alpha_1 \equiv \alpha_2 \equiv \beta must hold. If it does, then the set of \text{link}s L is a valid interpretation.
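As a rough sketch, the validity check above can be written in Python. I’m representing a set of links as a dict from each anaphoric element to its chosen antecedent, and assuming link chains are acyclic; all names here are my own:

```python
from itertools import combinations

def nondistinct(a, b):
    return all(b[f] == v for f, v in a.items() if f in b)

def valid_interpretation(links, features):
    """links: anaphoric element -> its antecedent (one link per element).
    features: every element -> its agreement feature set."""
    def root(x):
        # Follow the link chain to the ultimate antecedent (assumes no cycles).
        while x in links:
            x = links[x]
        return x

    # Group every element (anaphors and their ultimate antecedents) by root.
    groups = {}
    for alpha in set(links) | {root(a) for a in links}:
        groups.setdefault(root(alpha), set()).add(alpha)

    # Everything transitively linked to the same antecedent
    # must be pairwise nondistinct.
    return all(nondistinct(features[x], features[y])
               for group in groups.values()
               for x, y in combinations(group, 2))

links = {"himself": "Chris"}
features = {"Chris": {"gender": "masculine"},
            "himself": {"gender": "masculine"}}
print(valid_interpretation(links, features))  # True
```

Running it on the impossible example from earlier (her and his both linked to the underspecified student) returns False, since her and his disagree on gender.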
Proof
We will reduce the well known NP-complete problem of graph k-coloring to the anaphora agreement problem.
Given a graph G = (V, E) with k colors and |V| = n, we must construct a linguistic expression such that each valid interpretation corresponds 1:1 with a k-coloring of the graph G.
We can easily construct such an expression. Let’s get some basics defined:
- Let each vertex v_i in G correspond to a pronoun p_i in the expression.
- Let there also be n binary agreement features, where each v_i corresponds to a binary feature \varphi_i. For example, gender is a binary agreement feature with two values (masculine and feminine).
- Let there be k antecedents \beta_1, \beta_2, \ldots, \beta_k in the expression. These antecedents should not have any of their agreement feature values specified by default. For example, just by seeing the word student, you can’t tell whether it is masculine or feminine. But with actress, you can.
- Arrange the antecedents and pronouns into an expression such that each \beta_j could be a possible antecedent to each p_i.
Now, for each v_i in V:
- Set \varphi_i = 0 for p_i. For each edge (v_i, v_j) in E, set \varphi_i = 1 for p_j.
At the end, select a pronoun for each p_i that satisfies its binary feature values \varphi_1, \varphi_2, \ldots, \varphi_n. For example, if \varphi_1 is gender, \varphi_1 = 0 for p_1, and \varphi_1 = 1 for p_2, then you could select p_1 = \text{his} and p_2 = \text{her}.
That’s it, the expression is now complete.
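The feature-assignment step of the construction can be sketched in Python too (same caveats: hypothetical names, features stored as dicts):

```python
def build_features(n, edges):
    """Build the binary feature values phi_1..phi_n for pronouns p_1..p_n.
    edges: undirected edges (i, j) with 1-indexed vertices."""
    # phi_i = 0 for p_i.
    phi = {i: {i: 0} for i in range(1, n + 1)}
    # For each edge (v_i, v_j): phi_i = 1 for p_j, and symmetrically phi_j = 1 for p_i.
    for i, j in edges:
        phi[j][i] = 1
        phi[i][j] = 1
    return phi

# Triangle graph: every pair of pronouns disagrees on some feature.
phi = build_features(3, [(1, 2), (2, 3), (1, 3)])
print(phi[1])  # {1: 0, 2: 1, 3: 1}
```

With this assignment, two pronouns are distinct exactly when their vertices share an edge: p_i carries \varphi_i = 0 while every neighbor of v_i carries \varphi_i = 1.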
With this construction, each valid interpretation corresponds 1:1 with a k-coloring of the graph G because:
- Each pronoun p_i corresponds to a vertex v_i in the graph. Since each pronoun refers to one of k antecedents, it can also be colored with one of k colors.
- Two vertices sharing an edge (v_i, v_j) will never have the same color: sharing a color would mean the corresponding pronouns share the same antecedent, which in a valid interpretation means they are nondistinct. But our construction explicitly gives the two pronouns of any edge different values for the \varphi_i feature (0 and 1), which makes them distinct.
So we have shown that if you could form a valid interpretation of an arbitrary linguistic expression (language comprehension), you could also solve the graph k-coloring problem.
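To see the correspondence numerically, a brute-force sketch (again with invented names) can count the valid interpretations of the constructed expression with k featureless antecedents and compare against the number of proper k-colorings:

```python
from itertools import product

def nondistinct(a, b):
    return all(b[f] == v for f, v in a.items() if f in b)

def count_valid_interpretations(n, edges, k):
    """Count ways to link pronouns p_1..p_n to k featureless antecedents
    such that pronouns sharing an antecedent are pairwise nondistinct."""
    phi = {i: {i: 0} for i in range(1, n + 1)}
    for i, j in edges:
        phi[j][i], phi[i][j] = 1, 1
    count = 0
    for assign in product(range(k), repeat=n):  # antecedent choice per pronoun
        count += all(nondistinct(phi[i + 1], phi[j + 1])
                     for i in range(n) for j in range(i + 1, n)
                     if assign[i] == assign[j])
    return count

def count_proper_colorings(n, edges, k):
    return sum(all(c[i - 1] != c[j - 1] for i, j in edges)
               for c in product(range(k), repeat=n))

triangle = [(1, 2), (2, 3), (1, 3)]
print(count_valid_interpretations(3, triangle, 3))  # 6
print(count_proper_colorings(3, triangle, 3))       # 6
```

The two counts match on every graph, which is exactly the 1:1 correspondence the proof relies on. (The brute force itself takes exponential time, of course; that is the point of the reduction.)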
Realism
It’s important to note that Ristad explicitly says the proof is not concerned with “what linguistic expressions can and cannot be easily understood”.
The proof does not limit itself to the set of linguistic expressions that humans would normally use day-to-day. Most humans do not formulate sentences with hundreds of anaphoras: we generally strive to make our statements easily understood by others, and such sentences would likely be hard for us to even form.
Does the proof matter, then, if we limit ourselves to the subset of linguistic expressions that the average human would encounter and comprehend? Maybe those are easy enough to be solved within a fixed amount of computation.
Maybe most humans usually use only simple anaphoras and referring expressions, few of them, antecedents that are local to the anaphora, and a limited number of agreement attributes. That makes it a lot easier. NP-complete problems only become “difficult” when scaled up. Otherwise, you can absolutely brute-force the subset of “small” problems given a sufficient fixed amount of computation time. For example, given a minute of computation time in Python on an average desktop computer, you could solve any instance of the NP-complete Traveling Salesman problem provided it fell into the subset of problems where the parameter n_\text{cities} \leq 3 by using the minute to brute-force all possible solutions.
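For instance, a brute-force TSP solver is only a few lines of Python, and for n_\text{cities} \leq 3 it finishes instantly (the distance matrix here is made up):

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Try every possible tour and return the cheapest cycle length."""
    cities = tuple(range(len(dist)))
    return min(sum(dist[a][b] for a, b in zip(tour, tour[1:] + (tour[0],)))
               for tour in permutations(cities))

dist = [[0, 2, 9],
        [2, 0, 6],
        [9, 6, 0]]
print(tsp_brute_force(dist))  # 17: the only 3-city cycle, 2 + 6 + 9
```

Of course, this approach collapses as n grows, since the number of distinct tours grows factorially.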
But, if we extend the definition of the \text{link} relation to exophoras like the word president in: \text{The president has two daughters named Malia and Sasha.} then resolving the reference requires knowledge from outside the expression itself, which only makes the problem harder.
Ultimately, it still says something about how quickly the complexity of comprehending language grows as more and more references are involved.
What does it mean for NLP?
If solving the anaphora problem is critical to natural language understanding (NLU), then it would follow that NLU, and more broadly NLP, are NP-complete.
Both RNNs and Transformers used in models like BERT and GPT-3 have been shown to be Turing-complete, at least in theory. That is, for any Turing Machine, formalized as M = (Q, \Gamma, b, \Sigma, \delta, q_{init}, F), there is an RNN or Transformer that can simulate it. So this doesn’t preclude them from being able to comprehend language, given sufficient computation time.
For that reason, the question of whether BERT or GPT-3 can comprehend language seems more rooted in whether they have the common-sense knowledge embedded to resolve an exophora like the example above, and whether they perform the “right computation” when resolving linguistic devices like anaphoras, such that they can handle novel and diverse sentences never encountered before, as humans can. The answer to both questions seems to be no for BERT and GPT-3.