A Taxonomy of Natural Language Processing | by Tim Schopf | Sep, 2023

As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the rapid developments in NLP, obtaining an overview of the domain and maintaining it is difficult. This blog post aims to provide a structured overview of different fields of study in NLP and analyzes recent trends in this domain.

Fields of study are academic disciplines and concepts that usually consist of (but are not limited to) tasks or techniques.

In this article, we investigate the following questions:

  • What are the different fields of study investigated in NLP?
  • What are the characteristics and developments over time of the research literature in NLP?
  • What are the current trends and directions of future work in NLP?

Although most fields of study in NLP are well-known and defined, there currently exists no commonly used taxonomy or categorization scheme that attempts to collect and structure these fields of study in a consistent and understandable format. Therefore, getting an overview of the entire field of NLP research is difficult. While there are lists of NLP topics in conferences and textbooks, they tend to vary considerably and are often either too broad or too specialized. For this reason, we developed a taxonomy encompassing a wide range of different fields of study in NLP. Although this taxonomy may not include all possible NLP concepts, it covers a wide range of the most popular fields of study, whereby missing fields of study may be considered as subtopics of the included fields of study. While developing the taxonomy, we found that certain lower-level fields of study had to be assigned to multiple higher-level fields of study rather than just one. Therefore, some fields of study are listed multiple times in the NLP taxonomy, but assigned to different higher-level fields of study. The final taxonomy was developed empirically in an iterative process together with domain experts.

The taxonomy serves as an overarching classification scheme in which NLP publications can be classified according to at least one of the included fields of study, even if they do not directly address one of the fields of study, but only subtopics thereof. To analyze recent developments in NLP, we trained a weakly supervised model to classify ACL Anthology papers according to the NLP taxonomy.
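To make the assignment rule concrete, here is a minimal sketch of how a paper labeled only with a subtopic could still be counted toward its higher-level fields of study. The `PARENTS` mapping and the `expand_labels` helper are hypothetical and cover just a handful of fields named in this post. The parent assignments, including the multi-parent entry, are assumptions for illustration and not the actual taxonomy or classification model.

```python
# Hypothetical excerpt of a taxonomy: each lower-level field maps to one or
# more higher-level fields. The parent assignments are illustrative assumptions.
PARENTS = {
    "Named Entity Recognition": ["Information Extraction & Text Mining"],
    "Summarization": ["Information Extraction & Text Mining", "Text Generation"],
    "Question Answering": ["Natural Language Interfaces"],
}


def expand_labels(labels):
    """Return the given fields plus every higher-level field reachable from them."""
    expanded = set()
    stack = list(labels)
    while stack:
        field = stack.pop()
        if field in expanded:
            continue  # already visited via another parent path
        expanded.add(field)
        stack.extend(PARENTS.get(field, []))
    return expanded


# A paper tagged only with a subtopic is also counted for its parent fields.
paper_labels = expand_labels({"Summarization"})
print(sorted(paper_labels))
```

Because a field may have several parents, the expansion walks every parent path and uses a set to avoid counting a field twice.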

You can read more details about the development process of the classification model and the NLP taxonomy in our paper.

The following section provides short explanations of the fields of study concepts included in the NLP taxonomy above.

Multimodality

“Multimodality refers to the capability of a system or method to process input of different types or modalities” (Garg et al., 2022). We distinguish between systems that can process text in natural language along with visual data, speech & audio, programming languages, or structured data such as tables or graphs.

Natural Language Interfaces

“Natural language interfaces can process data based on natural language queries” (Voigt et al., 2021), usually implemented as question answering or dialogue & conversational systems.

Semantic Text Processing

This high-level field of study includes all types of concepts that attempt to derive meaning from natural language and enable machines to interpret textual data semantically. One of the most powerful fields of study in this regard are “language models that attempt to learn the joint probability function of sequences of words” (Bengio et al., 2000). “Recent advances in language model training have enabled these models to successfully perform various downstream NLP tasks” (Soni et al., 2022). In representation learning, “semantic text representations are usually learned in the form of embeddings” (Fu et al., 2022), which “can be used to compare the semantic similarity of texts in semantic search settings” (Reimers and Gurevych, 2019). Additionally, “knowledge representations, e.g., in the form of knowledge graphs, can be incorporated to improve various NLP tasks” (Schneider et al., 2022).
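As a rough illustration of comparing embeddings in a semantic search setting, the sketch below ranks texts by cosine similarity to a query. The three-dimensional vectors are hand-written toy values, not the output of any real embedding model, and `cosine_similarity` is a helper defined here for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values); a real system would obtain these from a model.
embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.2, 0.1],
    "Best pizza toppings": [0.0, 0.1, 0.9],
}

query = "How do I reset my password?"
candidates = [text for text in embeddings if text != query]

# Rank the remaining texts by similarity to the query, most similar first.
ranked = sorted(
    candidates,
    key=lambda text: cosine_similarity(embeddings[query], embeddings[text]),
    reverse=True,
)
print(ranked[0])
```

With these toy values, the account-recovery text ranks above the unrelated one even though it shares no words with the query, which is the property that makes embeddings useful for semantic search.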

Sentiment Analysis

“Sentiment analysis attempts to identify and extract subjective information from texts” (Wankhade et al., 2022). Usually, studies focus on extracting opinions, emotions, or polarity from texts. More recently, aspect-based sentiment analysis emerged as a way to provide more detailed information than general sentiment analysis, as “it aims to predict the sentiment polarities of given aspects or entities in text” (Xue and Li, 2018).

Syntactic Text Processing

This high-level field of study aims at “analyzing the grammatical syntax and vocabulary of texts” (Bessmertny et al., 2016). Representative tasks in this context are syntactic parsing of word dependencies in sentences, tagging of words to their respective part-of-speech, segmentation of texts into coherent sections, or correction of erroneous texts with respect to grammar and spelling.

Linguistics & Cognitive NLP

“Linguistics & Cognitive NLP deals with natural language based on the assumptions that our linguistic abilities are firmly rooted in our cognitive abilities, that meaning is essentially conceptualization, and that grammar is shaped by usage” (Dabrowska and Divjak, 2015). Many different linguistic theories are present that generally argue that “language acquisition is governed by universal grammatical rules that are common to all typically developing humans” (Wise and Sevcik, 2017). “Psycholinguistics attempts to model how a human brain acquires and produces language, processes it, comprehends it, and provides feedback” (Balamurugan, 2018). “Cognitive modeling is concerned with modeling and simulating human cognitive processes in various forms, particularly in a computational or mathematical form” (Sun, 2020).

Responsible & Trustworthy NLP

“Responsible & trustworthy NLP is concerned with implementing methods that focus on fairness, explainability, accountability, and ethical aspects at its core” (Barredo Arrieta et al., 2020). Green & sustainable NLP is mainly focused on efficient approaches for text processing, while low-resource NLP aims to perform NLP tasks when data is scarce. Additionally, robustness in NLP attempts to develop models that are insensitive to biases, resistant to data perturbations, and reliable for out-of-distribution predictions.

Reasoning

Reasoning enables machines to draw logical conclusions and derive new knowledge based on the information available to them, using techniques such as deduction and induction. “Argument mining automatically identifies and extracts the structure of inference and reasoning expressed as arguments presented in natural language texts” (Lawrence and Reed, 2019). “Textual inference, usually modeled as entailment problem, automatically determines whether a natural-language hypothesis can be inferred from a given premise” (MacCartney and Manning, 2007). “Commonsense reasoning bridges premises and hypotheses using world knowledge that is not explicitly provided in the text” (Ponti et al., 2020), while “numerical reasoning performs arithmetic operations” (Al-Negheimish et al., 2021). “Machine reading comprehension aims to teach machines to determine the correct answers to questions based on a given passage” (Zhang et al., 2021).

Multilinguality

Multilinguality tackles all types of NLP tasks that involve more than one natural language and is conventionally studied in machine translation. Additionally, “code-switching freely interchanges multiple languages within a single sentence or between sentences” (Diwan et al., 2021), while cross-lingual transfer techniques use data and models available for one language to solve NLP tasks in another language.

Information Retrieval

“Information retrieval is concerned with finding texts that satisfy an information need from within large collections” (Manning et al., 2008). Typically, this involves retrieving documents or passages.
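One classical way to retrieve documents for a query is to score each document by weighted term overlap. The sketch below uses a simple TF-IDF-style score over a tiny, made-up collection. The documents, the smoothing in `idf`, and the helper names are assumptions chosen for illustration, not a method described in this post.

```python
import math
from collections import Counter

# Tiny made-up collection standing in for a large document store.
documents = [
    "neural machine translation with attention",
    "retrieving passages for open domain question answering",
    "sentiment analysis of product reviews",
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


doc_tokens = [tokenize(doc) for doc in documents]
# Number of documents in which each term occurs at least once.
doc_freq = Counter(term for tokens in doc_tokens for term in set(tokens))


def idf(term):
    """Smoothed inverse document frequency; rarer terms get larger weights."""
    return math.log((1 + len(documents)) / (1 + doc_freq[term])) + 1


def score(query, tokens):
    """Sum of term frequency times IDF over the query terms."""
    counts = Counter(tokens)
    return sum(counts[term] * idf(term) for term in tokenize(query))


def retrieve(query, k=1):
    """Return the k highest-scoring documents for the query."""
    order = sorted(
        range(len(documents)),
        key=lambda i: score(query, doc_tokens[i]),
        reverse=True,
    )
    return [documents[i] for i in order[:k]]


print(retrieve("question answering passages"))
```

Unlike the embedding-based comparison used in semantic search, this score only rewards exact term matches, so a query must share vocabulary with the document it should find.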

Information Extraction & Text Mining

This field of study focuses on extracting structured knowledge from unstructured text and “enables the analysis and identification of patterns or correlations in data” (Hassani et al., 2020). “Text classification automatically categorizes texts into predefined classes” (Schopf et al., 2021), while “topic modeling aims to discover latent topics in document collections” (Grootendorst, 2022), often using text clustering techniques that organize semantically similar texts into the same clusters. “Summarization produces summaries of texts that include the key points of the input in less space and keep repetition to a minimum” (El-Kassas et al., 2021). Additionally, the information extraction & text mining field of study also includes “named entity recognition, which deals with the identification and categorization of named entities” (Leitner et al., 2020), “coreference resolution, which aims to identify all references to the same entity in discourse” (Yin et al., 2021), “term extraction, which aims to extract relevant terms such as keywords or keyphrases” (Rigouts Terryn et al., 2020), relation extraction that aims to extract relations between entities, and “open information extraction that facilitates the domain-independent discovery of relational tuples” (Yates et al., 2007).

Text Generation

The objective of text generation approaches is to generate texts that are both comprehensible to humans and indistinguishable from text authored by humans. Accordingly, the input usually consists of text, such as in “paraphrasing that renders the text input in a different surface form while preserving the semantics” (Niu et al., 2021), “question generation that aims to generate a fluent and relevant question given a passage and a target answer” (Song et al., 2018), or “dialogue-response generation which aims to generate natural-looking text relevant to the prompt” (Zhang et al., 2020). In many cases, however, the text is generated as a result of input from other modalities, such as in the case of “data-to-text generation that generates text based on structured data such as tables or graphs” (Kale and Rastogi, 2020), captioning of images or videos, or “speech recognition that transcribes a speech waveform into text” (Baevski et al., 2022).

The number of papers per year in the ACL Anthology from 1952 to 2022. Image by author

Considering the literature on NLP, we start our analysis with the number of studies as an indicator of research interest. The distribution of publications over the 70-year observation period is shown in the Figure above. While the first publications appeared in 1952, the number of annual publications grew slowly until 2000. Between 2000 and 2017, the number of publications roughly quadrupled, whereas in the subsequent five years, it has doubled again. We therefore observe a near-exponential growth in the number of NLP studies, indicating increasing attention from the research community.
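To put these growth figures on a common scale, the sketch below converts each multiplicative change into an implied average annual growth rate. It assumes constant compounding within each period, which is a simplification, and it uses only the rounded factors stated above (roughly 4x over 17 years, roughly 2x over 5 years) rather than the actual publication counts.

```python
def annual_growth_rate(factor, years):
    """Average yearly growth rate that compounds to `factor` over `years`."""
    return factor ** (1 / years) - 1


# Rounded factors taken from the text; not the underlying publication counts.
rate_2000_2017 = annual_growth_rate(4, 17)  # roughly quadrupled in 17 years
rate_2017_2022 = annual_growth_rate(2, 5)   # roughly doubled in 5 years

print(f"2000-2017: {rate_2000_2017:.1%} per year")
print(f"2017-2022: {rate_2017_2022:.1%} per year")
```

Under these assumptions the implied rate rises from about 8.5% per year in the first period to about 14.9% per year in the second, so the growth itself has been speeding up.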
