Making sense of language is an NP-complete problem, which seems like something you might intuitively guess, but you can prove this by reducing the well known NP-complete problem of graph $k$-coloring to the anaphora agreement problem in linguistics.
This is not my proof, I’m simply restating and visualizing a proof from 1991 by Eric Sven Ristad I came across in my research. It doesn’t seem like it’s readily available on the web other than in the form of a scanned PDF on Princeton University’s website. I’ll discuss it below, albeit with less rigor to simplify.
Background
A subtask of language comprehension is the anaphora problem. Because it is difficult to define, what the “correct” antecedent is (since that relies on the intentions of the producer), the proof utilizes anaphoras where the solution would introduce no new information.
In the anaphora agreement model, the himself in:
$$ \text{Chris liked himself.} $$
must be Chris and the agreement feature gender
for Chris is masculine
since himself is masculine
.
An example of an impossible interpretation is:
$$ \text{The student prepared her breakfast after doing his homework.} $$
Here, following transitivity, the student cannot be both masculine
and feminine
for the agreement feature gender
.
There are many other types of agreement features beyond gender
like number
, animacy
, social class
, etc. The number of agreement features that exist vary in each language and doesn’t necessarily have an upper limit in linguistic theory.
Two anaphoric elements are nondistinct if their feature sets do not disagree on an any common feature. Let’s use $\equiv$ as short-hand notation for nondistinct and $\not\equiv$ as short-hand notation for distinct. Then, for the examples above, to visualize this:
$$ \text{Chris}\space\{\textsf{gender:masculine}\} \equiv \text{himself}\space\{\textsf{gender:masculine}\}$$ $$ \text{student}\space\{\textsf{gender:feminine}\} \equiv \text{her}\space\{\textsf{gender:feminine}\}$$ $$ \text{student}\space\{\textsf{gender:feminine}\} \not\equiv \text{his}\space\{\textsf{gender:masculine}\}$$ $$ \text{her}\space\{\textsf{gender:feminine}\} \not\equiv \text{his}\space\{\textsf{gender:masculine}\}$$
For a possible interpretation to be valid, anaphoric elements that share the same antecedent must be nondistinct from the antecedent and from each other.
With this in mind, the formalized anaphora problem in the agreement model is:
- Let the $\text{link}(\alpha, \beta)$ relation mean that the anaphoric element $\alpha$ has an antecedent $\beta$.
- Let $\text{link}^+(\alpha, \beta)$ be the positive transistive closure of the directed $\text{link}$ relation.
- Now, the problem is to find a set of $\text{link}$s $L$ such that:
- Every anaphoric element is linked to some antecedent in $L$.
- After taking the transitive closure $L^+$, for all $\text{link}^+(\alpha_1, \beta)$ and $\text{link}^+(\alpha_2, \beta)$ in $L^+$, $\alpha_1 \equiv \alpha_2 \equiv \beta$ must hold. If it does, then the set of $\text{link}$s $L$ are a valid interpretation.
Proof
We will reduce the well known NP-complete problem of graph $k$-coloring to the anaphora agreement problem.
Given a graph $G = (V, E)$ with $k$ colors and $|V| = n$, we must construct a linguistic expression such that each valid interpretation corresponds 1:1 with a $k$-coloring of the graph $G$.
We can easily construct such an expression. Let’s get some basics defined:
- Let each vertex $v_i$ in $G$ correspond to a pronoun $p_i$ in the expression.
- Let there also be $n$ binary agreement features where each $v_i$ corresponds to a binary feature $\varphi_i$. For example,
gender
is a binary agreement feature with two values (masculine
andfeminine
). - Let there be $\beta_1, \beta_2, \ldots, \beta_k$ antecedents in the expression. These antecedents should not have any of their agreement features values be specified by default. For example, by just seeing the word student, you can’t tell if it is
masculine
orfeminine
. But with actress, you can. - Arrange the antecedents and pronouns into an expression such that each $\beta_j$ could be a possible antecedent to each $p_i$.
Now, for each $v_i$ in $V$:
- Set $\varphi_i = 0$ for $p_i$. For each edge $(v_i, v_j)$ in $E$, set $\varphi_i = 1$ for $p_j$.
At the end, select a pronoun for each $p_i$ that satisfies its binary feature values $\varphi_0, \varphi_1, \ldots, \varphi_n$. For example, if $\varphi_0$ is gender
and $\varphi_0 = 0$ for $p_0$ and $\varphi_0 = 1$ for $p_1$, then you could select $p_0 = \text{his}$ and $p_1 = \text{her}$.
That’s it, the expression is now complete.
With this construction, each valid interpretation corresponds 1:1 with a $k$-coloring of the graph $G$ because:
- Each pronoun $p_i$ corresponds to a vertex $v_i$ in the graph. Since each pronoun refers to one of $k$ antecedents, it can also be colored with one of $k$ colors.
- Two vertices sharing an edge $(v_i, v_j)$ will never have the same color because that would mean they share the same antecedent, which in turn means they are nondistinct in a valid interpretation. But, in our construction we explicity made it so that the two pronouns corresponding to any given edge have different values for the $\varphi_i$ feature ($0$ and $1$) which would make them distinct.
So we have shown that if you could form a valid interpretation of any arbitrary linguistic expression (language comprehension) you could also solve the graph $k$-coloring problem.
Realism
It’s important to note that Ristad explicitly says the proof is not concerned with “what linguistic expressions can and cannot be easily understood”.
The proof is not limiting itself to only looking at the set of linguistic expresions that humans would normally use day-to-day. Most humans do not formulate sentences with hundreds of anaphoras because we generally strive to make our statements easily understood by others and since it would be hard on ourselves likely to even form those.
Does it mean the proof doesn’t matter then if limiting to the subset of linguistic expressions that the average human would encounter and comprehend? Maybe those are easy enough to be solved within a fixed amount of computation.
Maybe most humans usually use only simple anaphoras and referring expressions, few of them, antecedents that are local to the anaphora, and a limited number of agreement attributes. That makes it a lot easier. NP-complete problems only become “difficult” when scaled up. Otherwise, you can absolutely brute-force the subset of “small” problems given a sufficient fixed amount of computation time. For example, given a minute of computation time in Python on an average desktop computer, you could solve any instance of the NP-complete Traveling Salesman problem provided it fell into the subset of problems where the parameter $n_\text{cities} \leq 3$ by using the minute to brute-force all possible solutions.
But, if we extend the definition of the $\text{link}$ relation to exophoras like the word president in: $$\text{The president has two daughters named Malia and Sasha.}$$ suddenly there is now a reference to a larger pool of assumed prior common sense knowledge that president here refers to President Obama. I’m now just speculating here, but it seems like that extension brings normal, “everyday” sentences to a higher level of comprehension complexity again. Though, even with this, as humans, we still probably only care about understanding a small subset of all valid linguistic expressions that are capped at a level of comprehension complexity we find “reasonable”.
Ultimately, it still says something about how quickly the complexity of comprehending language grows as more and more references are involved.
What does it mean for NLP?
If solving the anaphora problem is critical to natural language understanding (NLU), then it would follow that NLU and more broady, NLP are NP-complete.
Both RNNs and Transformers used in models like BERT and GPT-3 have been shown to be Turing-complete, at least in theory. That is for any Turing Machine, formalized as $M = (Q, \Gamma, b, \Sigma, \delta, q_{init}, F)$, there is a RNN or Transformer that can simulate the Turing Machine. So this doesn’t preclude them from being able to comprehend language, given sufficient computation time.
For that reason, the question around if BERT or GPT-3 can comprehend language seems to be more rooted in whether they have the common sense knowledge embedded to resolve an exophora like the example above and whether they are performing the “right computation” when resolving linguistic devices like anaphoras such that they can solve novel and diverse sentences never encountered before like humans can. The answer to those questions seems to be no for both BERT and GPT-3.