According to Microsoft, Natural Language Understanding (NLU) is one of the longest-running goals in AI (artificial intelligence), and SuperGLUE is currently among the most challenging benchmarks for evaluating NLU models.
The SuperGLUE benchmark consists of a wide range of NLU tasks, including question answering, natural language inference, co-reference resolution, word sense disambiguation, and others. The Redmond tech giant took the causal reasoning task as an example.
Given the premise “the child became immune to the disease” and the question “what’s the cause for this?”, the model is asked to choose an answer from two plausible candidates: 1) “he avoided exposure to the disease” and 2) “he received the vaccine for the disease.”
While it is easy for a human to choose the right answer, it is challenging for an AI model. To get the right answer, the model needs to understand the causal relationship between the premise and those plausible options.
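The two-candidate setup described above can be sketched in a few lines of Python. This is only an illustration of the task format, not Microsoft's code: the names `CausalExample`, `pick_answer`, and `toy_score` are hypothetical, and the scoring function is a hand-written placeholder standing in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CausalExample:
    """One causal reasoning item: a premise, a question type, two candidates."""
    premise: str
    question: str  # "cause" or "effect"
    choices: Tuple[str, str]


def pick_answer(example: CausalExample,
                score: Callable[[str, str, str], float]) -> int:
    """Return the index (0 or 1) of the candidate with the higher score."""
    scores = [score(example.premise, example.question, choice)
              for choice in example.choices]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_score(premise: str, question: str, choice: str) -> float:
    # Placeholder only: counts a hand-picked cue word instead of running a model.
    cues = {"vaccine"}
    return float(sum(word.strip(".,") in cues for word in choice.lower().split()))


example = CausalExample(
    premise="The child became immune to the disease.",
    question="cause",
    choices=("He avoided exposure to the disease.",
             "He received the vaccine for the disease."),
)

print(pick_answer(example, toy_score))  # prints 1 (the vaccine candidate)
```

A real system would replace `toy_score` with a model that rates how plausible each candidate is as the cause of the premise. The hard part is that score, not the selection step.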
To perform better on such benchmarks, Microsoft scaled up the DeBERTa (Decoding-enhanced BERT with Disentangled Attention) model, training a larger version that consists of 48 Transformer layers with 1.5 billion parameters.
DeBERTa is a Transformer-based neural language model pretrained on large amounts of raw text using self-supervised learning. Like other pretrained language models (PLMs), DeBERTa is intended to learn universal language representations that can be adapted to various downstream NLU tasks.
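Self-supervised learning means the training signal comes from the raw text itself rather than from human labels. One common objective for BERT-style models is masked-token prediction, sketched below. The article does not say which objective was used here, so treat this as an assumption for illustration; `mask_tokens`, the `[MASK]` symbol, and the 15 percent masking rate are illustrative choices, not details from Microsoft.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # illustrative placeholder symbol


def mask_tokens(tokens: List[str], mask_prob: float,
                rng: random.Random) -> Tuple[List[str], List[Optional[str]]]:
    """Hide a random subset of tokens; the hidden originals become the labels."""
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)    # the model sees the mask symbol
            labels.append(token)   # and is trained to recover the original
        else:
            inputs.append(token)   # unmasked tokens pass through unchanged
            labels.append(None)    # and contribute no training target
    return inputs, labels


tokens = "the child became immune to the disease".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.15, rng=random.Random(0))
print(inputs)
print(labels)
```

No annotation is needed at any step: every label is a word that was already in the text.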
The single DeBERTa model now scores 89.9 on SuperGLUE, edging past the human baseline of 89.8 in terms of macro-average score for the first time, while the ensemble model with 3.2 billion parameters scores 90.3, outperforming the human baseline by a decent margin (90.3 versus 89.8). The model also sits at the top of the GLUE benchmark rankings with a macro-average score of 90.8.
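A macro-average is taken here to mean the unweighted mean of per-task scores, so every task counts equally regardless of how many examples it contains. The sketch below shows the arithmetic. The per-task numbers are hypothetical placeholders, not actual leaderboard results; only the 89.8, 89.9, and 90.3 figures come from the article.

```python
from statistics import mean
from typing import Dict


def macro_average(task_scores: Dict[str, float]) -> float:
    """Unweighted mean over tasks: each task contributes equally."""
    return mean(task_scores.values())


# Hypothetical per-task scores, for illustration only.
hypothetical_scores = {"task_a": 90.0, "task_b": 80.0, "task_c": 85.0}
print(macro_average(hypothetical_scores))  # prints 85.0

# Margins over the human baseline, using the figures quoted in the article.
human_baseline = 89.8
single_margin = round(89.9 - human_baseline, 1)
ensemble_margin = round(90.3 - human_baseline, 1)
print(single_margin, ensemble_margin)  # prints 0.1 0.5
```

Because the averaging is unweighted, a large gain on one small task moves the overall score as much as the same gain on a large task.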
This is not the first time that an AI model has outperformed human baselines. Google’s 11-billion-parameter “T5 + Meena” model surpassed the human baseline with a score of 90.2 on January 5, and Microsoft’s DeBERTa model overtook it on January 6.
Microsoft is integrating DeBERTa into the next version of the Microsoft Turing natural language representation model (Turing NLRv4). The model will be trained at large scale to support products like Bing, Office, Dynamics, and Azure Cognitive Services. It will power a wide range of scenarios involving human-machine and human-human interactions via natural language, such as chatbots, recommendation, question answering, search, personal assistance, customer support automation, and content generation, to benefit hundreds of millions of users through the Microsoft AI at Scale initiative.
According to Microsoft, the 1.5-billion-parameter DeBERTa is much more energy efficient to train and maintain than Google’s T5 model, which consists of 11 billion parameters, and it is easier to compress and deploy to applications in various settings.
“DeBERTa surpassing human performance on SuperGLUE marks an important milestone toward general AI. Despite its promising results on SuperGLUE, the model is by no means reaching the human-level intelligence of NLU. Humans are extremely good at leveraging the knowledge learned from different tasks to solve a new task with no or little task-specific demonstration. This is referred to as compositional generalization, the ability to generalize to novel compositions (new tasks) of familiar constituents (subtasks or basic problem-solving skills). Moving forward, it is worth exploring how to make DeBERTa incorporate compositional structures in a more explicit manner, which could allow combining neural and symbolic computation of natural language similar to what humans do,” the company added.
Microsoft has released the 1.5-billion-parameter DeBERTa model and the source code to the public.