17

Google recently updated their translation tool so that it can now translate between language pairs that it hadn't seen before, something they're calling "zero-shot translation." See here for the full paper and here for a summary.

For example, they can train a neural network to translate from Japanese to English and from English to Korean. They then ask it to perform Japanese-Korean translations, and it performs "reasonably" well, even though it was never trained to translate that particular language pair.

What stood out to me is the following conclusion from the article:

5.1 Evidence for an Interlingua:

Several trained networks indeed show strong visual evidence of a shared representation. For example, Figure 2 below was produced from a many-to-many model trained on English↔Japanese and English↔Korean. To visualize the model in action we began with a small corpus of 74 triples of semantically identical cross-language phrases. That is, each triple contained phrases in English, Japanese, and Korean with the same underlying meaning.[...] Inspection of these clusters shows that each strand represents a single sentence, and clusters of strands generally represent a set of translations of the same underlying sentence, but with different source and target languages.

In other words, the network maps semantically equivalent sentences from different languages into a shared geometric structure, which corresponds to a meta-language, or as the authors say, an interlingua. Some of the popular articles I've read about this go so far as to say that Google's neural network "invented its own language", but I feel that they're just being sensationalist.

My question: Does this evidence for a meta-language or a shared representation underlying all languages support theories like Jerry Fodor's Language of Thought Hypothesis (i.e. Mentalese) or Chomsky's claim of there being a universal grammar?

Alexander S King
  • Sure it could be used as support; however, Chomsky's universal grammar has already been demonstrated to be bunk. See Sampson's "[There Is No Language Instinct](https://periodicos.ufsc.br/index.php/desterro/article/download/8076/7459)" as well as [criticisms, particularly re: the Pirahã](https://en.wikipedia.org/wiki/Universal_grammar#Criticisms) – MmmHmm Jan 10 '17 at 23:02
  • Doesn't it support just the opposite? The "interlingua" is an interpretation by human researchers of the neuro-net's global states; the net itself, on the other hand, is not based on primitives, nor does it combine them compositionally, as LOT and UG would have it. Not only does Google's net only *emulate* an interlingua, it *developed* this emulation, which goes against all "wired language" speculations. That unification of different languages optimizes translation is no more surprising than the existence of [Esperanto](https://en.wikipedia.org/wiki/Esperanto), but it does not support Esperanto in the brain. – Conifold Jan 11 '17 at 00:52
  • How does it compare to two different people translating J->E and then E->K? – JAB Jan 11 '17 at 01:22
  • There will never be a true language of thought until you can completely discard the word *representation*. As long as you are just *representing* something, you are always falling short of what needs to be represented. It's just another code which requires humans to decode it, because there is no foreseeable way to free machines from the code. – Jan 11 '17 at 02:35
  • @Conifold one of the assumptions of pattern recognition (neural nets, SVMs, etc.) is that there exists a pattern to be discovered. If the PR algorithm can't find a natural pattern and "forces" one on the existing data, this leads to overfitting, where the algorithm works well on the training set but fails miserably when trying to generalize to new patterns. In this case that means Google's NNet would be able to translate existing language pairs very well but not be able to generalize to new language pairs; the fact that it was able to partition the space so efficiently is remarkable. – Alexander S King Jan 11 '17 at 04:30
  • Given that Japanese and Korean have [major documented similarities](https://en.wikipedia.org/wiki/Classification_of_the_Japonic_languages#Japanese-Korean_hypothesis), this example sounds more like a case of a mentalist improving their odds of a good-looking result than anything else. This question may be a good fit for Skeptics.SE in that regard. – bright-star Jan 11 '17 at 05:27
  • It is well known to linguists that major languages, which Google mostly deals with, share many commonalities (they become scarce beyond that group, which is what sunk the original UG, and likely reflect typical uses rather than mentalese). Pulling them into an interpolating language was done before, e.g. in Esperanto. What is remarkable to me is that the interlingua was made by a neuro-net rather than a human, and what it confirms to me is the "plasticity of neuro-nets": their capacity to emulate a variety of structures without pre-wired composing of primitives, including non-linguistic thinking in the brain. – Conifold Jan 13 '17 at 19:36
  • I don't really read Spanish or French, but I find that if I look at sentences from these languages they look remarkably familiar; it doesn't surprise me that Google's language tools are able to find a great deal of commonality. – Mozibur Ullah Jan 14 '17 at 12:18
  • @bright-star as someone who has studied Japanese and Korean (and also speaks Chinese), I would say that, because of the similarity of grammar between the two languages, and the fact that I've performed translation between languages I'm not highly proficient in by using Google Translate to do this exact same thing, there's not enough evidence to suggest that it is doing anything sophisticated. Given that more complex sentences fail the reverse-translation test quite badly (even if you provide enough context), I would like to see more evidence before making this claim. – Michael Lai Mar 02 '21 at 04:07
  • @MmmHmm Scientific American (2016): [Is Chomsky's Theory of Language Wrong? Pinker Weighs in on Debate](https://blogs.scientificamerican.com/cross-check/is-chomskys-theory-of-language-wrong-pinker-weighs-in-on-debate/) – Chris Degnen Jun 26 '22 at 22:31
  • I don't think such a conclusion is warranted. – Agent Smith May 13 '23 at 02:58

3 Answers

1

The thesis behind mentalese is algorithmic and representational, and it is postulated on the basis of language theory. Google Translate is a neural-net-based AI; neural nets do not use representations, and Google Translate does not actually use language. Hence Google Translate cannot have either verified or discovered a representational mentalese or a universal language.

Dcleve
1

Is this an interlingua?

From the paper:

We propose a simple, elegant solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language.

From Wikipedia:

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

The word sequence modeling was at first typically done using a recurrent neural network (RNN). A bidirectional recurrent neural network, known as an encoder, is used by the neural network to encode a source sentence for a second RNN, known as a decoder, that is used to predict words in the target language. Recurrent neural networks face difficulties in encoding long inputs into a single vector. This can be compensated by an attention mechanism which allows the decoder to focus on different parts of the input while generating each word of the output.

So, maybe.

The system described in this paper translates sentences to vectors (a fixed-length list of numbers; this is a standard technique), and then translates those vectors back into sentences. It uses the same model to translate between multiple different languages, by kinda treating all languages as if they're one language with a more complicated grammar. Previous systems used a separate model per pair of languages.
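To make that concrete, here's a minimal sketch (mine, not Google's code) of the input convention the paper describes: one shared model, with an artificial token prepended to each source sentence naming the target language. The `<2xx>` token format follows the paper; the sentences and the helper function are just illustrative.

```python
def make_training_pair(source_sentence: str, target_sentence: str,
                       target_lang: str) -> tuple[str, str]:
    """Prepend the artificial target-language token, e.g. '<2ko>'."""
    return (f"<2{target_lang}> {source_sentence}", target_sentence)

# Training data covers only two directions per language pair:
pairs = [
    make_training_pair("I love apples.", "私はりんごが好きです。", "ja"),   # en -> ja
    make_training_pair("私はりんごが好きです。", "I love apples.", "en"),   # ja -> en
    make_training_pair("I love apples.", "나는 사과를 좋아한다.", "ko"),    # en -> ko
]

# Zero-shot request: same model, a token it has seen, a source language it
# has seen, but a combination (ja -> ko) it was never trained on.
zero_shot_input = "<2ko> 私はりんごが好きです。"
print(zero_shot_input)
```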

The representation for Apfel in one model can be mapped onto the representation for pomme in another, because both refer to apples! Apples are generally described as red or green, no matter what language you're using, so the structure around them is the same. (Especially so if you're using a corpus consisting of the same document translated into loads of languages – but I'd expect this even if you weren't.)
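As a toy illustration of what "the same structure" means here (the vectors below are made-up numbers, not taken from any model): if *Apfel* and *pomme* land near *apple* in the shared space, a standard measure like cosine similarity will be high for them and low for an unrelated concept.

```python
import math

# Hypothetical 3-D embeddings; real models use hundreds of dimensions.
embeddings = {
    "apple (en)":  [0.90, 0.10, 0.30],
    "Apfel (de)":  [0.88, 0.12, 0.31],  # same concept, nearby vector
    "pomme (fr)":  [0.91, 0.09, 0.28],  # same concept, nearby vector
    "Banane (de)": [0.10, 0.85, 0.40],  # different concept, far away
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, ~0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

for word in ("Apfel (de)", "pomme (fr)", "Banane (de)"):
    print(word, round(cosine(embeddings["apple (en)"], embeddings[word]), 3))
```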

So there's an extent to which this internal representation is an interlingua. However, it's probably more accurate to describe it as a correlation. See section 5.2 of the paper (emphasis mine):

For example, Figure 3a shows a t-SNE projection of attention vectors from a model that was trained on Portuguese→English (blue) and English→Spanish (yellow) and performing zero-shot translation from Portuguese→Spanish (red). This projection shows 153 semantically identical triples translated as described above, yielding 459 total translations. The large red region on the left primarily contains zero-shot Portuguese→Spanish translations. In other words, for a significant number of sentences, the zero-shot translation has a different embedding than the two trained translation directions. On the other hand, some zero-shot translation vectors do seem to fall near the embeddings found in other languages, as on the large region on the right.

It is natural to ask whether the large cluster of “separated” zero-shot translations has any significance. A definitive answer requires further investigation, but in this case zero-shot translations in the separated area do tend to have lower BLEU scores.
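For anyone curious what that visualization step actually involves, here's a rough sketch of a projection like Figure 3a using scikit-learn, with random vectors standing in for the model's attention vectors (the dimensionality and the cluster offsets are my guesses, purely illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-ins for attention vectors from three translation directions,
# 153 semantically identical triples as in the paper's experiment.
pt_en = rng.normal(0.0, 1.0, size=(153, 1024))  # trained: Portuguese -> English
en_es = rng.normal(0.2, 1.0, size=(153, 1024))  # trained: English -> Spanish
pt_es = rng.normal(0.4, 1.0, size=(153, 1024))  # zero-shot: Portuguese -> Spanish

vectors = np.vstack([pt_en, en_es, pt_es])

# t-SNE squashes the high-dimensional vectors down to 2-D while trying to
# keep nearby points nearby; clusters in the plot suggest shared structure.
projected = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(vectors)
print(projected.shape)  # (459, 2) -- one 2-D point per translation
```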

Additionally, this "interlingua" (a vector) probably has no grammar. Charitably, you could describe it as an impressionist artwork of parts of a sentence, or a numerical representation of complex concepts, or a protocol, but I don't think it's really a language. It's better described by statistics than linguistics.1 If it's a language, it's an alien one.

Does this support the Language of Thought Hypothesis?

From Wikipedia:

The language of thought hypothesis (LOTH), sometimes known as thought ordered mental expression (TOME), is a view in linguistics, philosophy of mind and cognitive science, forwarded by American philosopher Jerry Fodor. It describes the nature of thought as possessing "language-like" or compositional structure (sometimes known as mentalese). On this view, simple concepts combine in systematic ways (akin to the rules of grammar in language) to build thoughts. In its most basic form, the theory states that thought, like language, has syntax.

Since this translation tool does not exhibit behaviour obviously like thought, I can't see how it supports this hypothesis at all. The LOTH is about human thought being language-like, not about human language being universal in some way. (It's not even supposing that "mentalese" is universal.)

Does this support there being a universal grammar?

From Wikipedia:

Universal grammar (UG), in modern linguistics, is the theory of the genetic component of the language faculty, usually credited to Noam Chomsky. The basic postulate of UG is that there are innate constraints on what the grammar of a possible human language could be. When linguistic stimuli are received in the course of language acquisition, children then adopt specific syntactic rules that conform to UG.

UG is a claim about human psychology; this machine translation technology is not limited to natural language, and would exhibit the same behaviour in languages that are outside a hypothetical UG (so long as they have enough locality to be comprehensible). The internal vector representation is more about the meaning than the grammar. I don't think this says anything about UG.

Except to the extent we can model human language processing as behaving like this machine translation model. But by the time we know enough about human psychology to know if our language processing works this way, UG will already be settled.


1: In fact, it's the kind of format that we convert written language into, in order to apply statistics to it. You can do statistics on written language directly, but it's usually quite limited unless you're very clever. (Not to say that Markov chains are in any way the limit of what you can do if you're very clever with statistical analysis of language.)
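For example, here's the kind of "statistics on written language directly" I mean: a word-level Markov chain (toy corpus and all names mine). It only ever models which word follows which; there's no vector representation of meaning anywhere in it.

```python
import random
from collections import defaultdict

text = "the cat sat on the mat and the cat slept on the mat".split()

# Record, for each word, every word observed immediately after it.
transitions = defaultdict(list)
for current_word, next_word in zip(text, text[1:]):
    transitions[current_word].append(next_word)

# Generate by repeatedly sampling a successor of the current word.
random.seed(1)
word = "the"
output = [word]
for _ in range(8):
    word = random.choice(transitions[word])
    output.append(word)
print(" ".join(output))
```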

wizzwizz4
0

The evidence for a shared representation underlying different languages seen in Google's zero-shot translation system is interesting, and it could potentially be relevant to theories like Jerry Fodor's Language of Thought Hypothesis and Chomsky's idea of a universal grammar. However, this is an early and preliminary result, and more research would be needed to determine its full significance. Both theories are also complex and multifaceted, and evidence from Google's translation system alone is unlikely to be sufficient to fully support or refute either of them.