What is Natural Language Processing?

Claudia García

March 24, 2021

March 26, 2021

Natural Language Processing (NLP) is an increasingly common term. Although the field dates back to the 1950s, it has advanced rapidly in recent years, driven in large part by the volume of data now available. To understand what NLP does, it helps to first be clear about what language is.

What is language?

A language can be defined as a set of sentences, usually infinite, formed by combining words. These combinations must be syntactically and semantically correct.
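As a minimal sketch of this definition, the snippet below builds a toy "language" from a single rule (subject + verb + object). The vocabulary and the rule are hypothetical and chosen only for illustration; real languages have far richer rules, and a single recursive rule is enough to make the set of sentences infinite.

```python
from itertools import product

# Hypothetical toy vocabulary, grouped by grammatical role.
SUBJECTS = ["the dog", "the cat"]
VERBS = ["sees", "chases"]
OBJECTS = ["the ball", "the bird"]

def generate_sentences():
    """Return every sentence allowed by the rule: subject + verb + object."""
    return [" ".join(parts) for parts in product(SUBJECTS, VERBS, OBJECTS)]

sentences = generate_sentences()
print(len(sentences))  # 8 sentences: 2 subjects * 2 verbs * 2 objects
print(sentences[0])    # the dog sees the ball
```

Word combinations that break the rule, such as "sees the dog the ball", are not part of this toy language.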

Language is also the faculty through which people express thoughts and communicate with one another. It is carried by spoken signals (voice) and/or written signals (text).

At this point we can distinguish between two types of languages: Natural Languages (English, German, Spanish, etc.) and Formal Languages (mathematical, logical, etc.).

1. Natural Language

Natural Language (NL) is the medium we use on a daily basis to communicate with other people. It has been developed and organized from human experience and can be used to analyze highly complex situations and to reason about them in a subtle way. 

Natural Language owes its great expressive power to the richness of its semantic component, which also makes it valuable as a tool for reasoning. Its syntax, on the other hand, can be modeled, at least in part, by the other type of language, Formal Language.

Characteristics of Natural Language:
  • It develops through progressive enrichment, before any attempt at theory formation.
  • Its expressive character is important, owing largely to the richness of the semantic component.
  • It is difficult or impossible to formalize completely.
  • It is systematic: it is governed by a set of interrelated systems.
  • It is conventional and arbitrary: it obeys rules such as assigning a particular word to a particular thing or concept.
  • It is redundant: the information in a sentence is signaled in more than one way.

2. Formal Language

Formal Language is the kind of language that people have developed to express situations specific to each area of scientific knowledge. The words and sentences of a Formal Language are perfectly defined (a word maintains the same meaning regardless of its context or use).

This type of language is free of any semantic component outside of its operators and relations. It can be used to model a theory in mechanics, physics, mathematics, electrical engineering, or any other field, with the advantage that all ambiguity is removed.

Characteristics of Formal Language:
  • It is developed from a pre-established theory.
  • It has a minimal semantic component.
  • Its semantic component can be extended according to the theory to be formalized.
  • Its syntax produces unambiguous sentences.
  • Numbers play a very important role.
  • Its formalization is complete, which makes it suitable for computational implementation.

Within Formal Language we find the Programming Language, defined as a set of elements organized by grammatical rules that allow us to write a computer program. This type of language has two important elements:

  • Syntax, which is the proper order of the lexical components: the set of rules that define which combinations of symbols are considered correctly structured elements or expressions.
  • Semantics, which ensures that each sequence used has a correct meaning.
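
The difference between the two elements can be sketched in Python itself. The helper names below (is_syntactically_valid and is_meaningful) are hypothetical and introduced only for this illustration; they rely on the standard ast module and the built-in eval. The meaning check is deliberately narrow and only treats a type error as a lack of meaning.

```python
import ast

def is_syntactically_valid(source: str) -> bool:
    """Return True if `source` follows Python's grammar rules."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def is_meaningful(source: str) -> bool:
    """Return True if the expression parses and evaluates without a type error."""
    try:
        eval(source)  # acceptable here only because the inputs are fixed literals
        return True
    except (SyntaxError, TypeError):
        return False

print(is_syntactically_valid("1 + "))     # False: the expression is incomplete
print(is_syntactically_valid("1 + 'a'"))  # True: the symbols are correctly ordered
print(is_meaningful("1 + 'a'"))           # False: a number cannot be added to text
print(is_meaningful("1 + 2"))             # True
```

The expression 1 + 'a' is the interesting case: it is well formed according to the syntax, yet it has no valid meaning.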

So what is Natural Language Processing (NLP)?

By bringing together Programming Language and Natural Language, we obtain Natural Language Processing. It is the field of Artificial Intelligence that gives machines the ability to interpret, understand and derive meaning from human language. NLP use cases mainly build on AI and machine learning, and include:

  • Automatic translation
  • Information retrieval
  • Information extraction and summarization
  • Intelligent tutoring
  • Cooperative problem solving
  • Voice recognition
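
To give a feel for one of these use cases, here is a deliberately simple sketch of information retrieval: documents are ranked by how many words they share with a query. The function names and sample texts are hypothetical, and word overlap is only a stand-in for the statistical and machine learning models used in practice.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def rank(query: str, documents: list[str]) -> list[str]:
    """Order documents by how many query words they contain, most first."""
    query_words = tokenize(query)
    return sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )

# Hypothetical sample documents.
docs = [
    "Formal languages remove ambiguity.",
    "Natural language is used to communicate with people.",
    "Machine translation converts text between natural languages.",
]

best = rank("machine translation of natural language", docs)[0]
print(best)  # Machine translation converts text between natural languages.
```

Note that the sketch treats "language" and "languages" as unrelated words, which is exactly the kind of gap that more capable NLP techniques are meant to close.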

At Erudit AI we rely on studies that show how AI can be used beneficially in the field of mental health through NLP. In our specific case, we use semantic analysis to interpret, understand and manipulate human language, so that the process can be automated and our neural networks can take charge of detecting mental health problems in employees.
