It's no coincidence that we can now communicate with computers using human language - they were trained that way - and in this article, we're going to find out how. We'll begin by looking at a definition and the history behind natural language processing before moving on to the different types and techniques. Finally, we will look at the social impact natural language processing has had.
Definition of Natural Language Processing
Natural language processing (NLP) is a branch of artificial intelligence (AI) concerned with programming computer software to 'learn' human languages. The goal of NLP is to create software that understands language as well as we do.
Natural language processing has roots in linguistics, computer science, and machine learning and has been around for more than 50 years (almost as long as the modern-day computer!).
Today, we can see the results of NLP in things such as Apple's Siri, Google's suggested search results, and language learning apps like Duolingo.
Fig 1. We can talk to 'Alexa' because of natural language processing
History of Natural Language Processing
The beginnings of NLP as we know it today arose in the 1940s after the Second World War. The global nature of the war highlighted the importance of understanding multiple different languages, and technicians hoped to create a 'computer' that could translate languages for them.
The creation of such a computer proved to be difficult, and linguists such as Noam Chomsky identified issues regarding syntax. For example, Chomsky found that some sentences appeared to be grammatically correct, but their content was nonsense; his well-known example is "Colorless green ideas sleep furiously." He argued that for computers to understand human language, they would need to understand syntactic structures.
Syntactic structures - In 1957, Noam Chomsky released his highly influential book Syntactic Structures, in which he argued that syntax should be treated separately from semantics and that there must be a formal and standardized approach to analyzing syntax.
By the 1990s, NLP had come a long way. It now focused more on statistics than linguistics, on 'learning' rather than translating, and made greater use of machine learning algorithms. Using machine learning meant that NLP developed the ability to recognize similar chunks of speech and no longer needed to rely on exact matches of predefined expressions. For example, software using NLP would understand both "What's the weather like?" and "How's the weather?".
In 2011, Apple released Siri, the first successful and publicly available virtual assistant built on NLP.
How Does Natural Language Processing Work?
You're probably wondering by now how NLP works - this is where linguistics knowledge will come in handy.
NLP uses AI to take in real-world human language and perform processing tasks in order to turn the language into code the computer will understand. There are two parts to this process:
- Pre-processing - The language is categorized into data an algorithm can work with.
- Algorithm development - Once the language has been turned into data, an algorithm must be developed to process and use it.
Let's look at some of the most common pre-processing techniques now. These techniques are rooted in linguistics and linguistic analysis. We won't be looking at algorithm development today, as this is less related to linguistics.
Natural Language Processing Techniques
There are two main pre-processing types: syntactic and semantic analysis. Before we dive into these techniques, let's look at some definitions for these two terms.
Syntax - The arrangement and order of words within a sentence. In English, the most basic syntax structure is subject-verb-object (SVO).
Semantics - The branch of linguistics that looks at the meaning, logic, and relationship of and between words.
Syntactic Analysis
Syntactic analysis involves looking at a sentence as a whole to understand its meaning rather than analyzing individual words. There are several syntactic analysis techniques NLP utilizes.
Parsing
Parsing involves breaking a sentence down into each of its constituents. A constituent is a unit of language that serves a function in a sentence; constituents can be individual words, phrases, or clauses. For example, the sentence "The cat plays the grand piano." comprises two main constituents, the noun phrase (the cat) and the verb phrase (plays the grand piano). The verb phrase can then be further divided into two more constituents, the verb (plays) and the noun phrase (the grand piano).
Conducting a parsing analysis involves representing each sentence's constituents in a parse tree, like so:
Fig 2. Example of a parse tree
Parse trees can show us the relationship between words in a sentence and how they work together to form constituents. For example, we can see that "the grand piano" is a constituent, but "plays the" isn't. This information can be turned into data for an NLP algorithm.
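To see how a parse tree can become data, here is a minimal sketch in Python. It assumes the tree for "The cat plays the grand piano" has been written out by hand as nested tuples; the labels (S, NP, VP, Det, Adj, N, V) and the function names are choices made for this illustration, not part of any particular NLP library.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Leaves have the form (part_of_speech, word). The tree is hand-written
# for this example; a real system would produce it with a parser.
TREE = (
    "S",
    ("NP", ("Det", "the"), ("N", "cat")),
    ("VP",
        ("V", "plays"),
        ("NP", ("Det", "the"), ("Adj", "grand"), ("N", "piano"))),
)


def words(node):
    """Return the words covered by a node, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # leaf: (part_of_speech, word)
    result = []
    for child in children:
        result.extend(words(child))
    return result


def constituents(node):
    """Return every constituent in the tree as (label, phrase) pairs."""
    label, *children = node
    found = [(label, " ".join(words(node)))]
    for child in children:
        if not isinstance(child, str):
            found.extend(constituents(child))
    return found


def is_constituent(phrase, tree=TREE):
    """True if the phrase is covered by exactly one node of the tree."""
    return phrase in {text for _, text in constituents(tree)}


print(is_constituent("the grand piano"))
print(is_constituent("plays the"))
```

Every node in the tree covers one constituent, so "the grand piano" is found while "plays the" is not, because no single node covers exactly those two words.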
Stemming
Stemming is a morphological process that involves reducing conjugated words back to their root word.
Conjugation (adj. conjugated) - Inflecting a verb to show different grammatical meanings, such as tense, aspect, and person. Inflecting verbs typically involves adding suffixes to the end of the verb or changing the word's spelling.
Root word - Walk (verb)
Conjugations - walking, walked, walks
Taking each word back to its original form can help NLP algorithms recognize that although the words may be spelled differently, they have the same essential meaning. It also means that only the root words need to be stored in a database, rather than every possible conjugation of every word.
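The idea can be sketched with a deliberately naive suffix-stripping function. The suffix list and the minimum stem length below are assumptions made for this example; real stemmers, such as the Porter stemmer, apply many more rules and handle cases this sketch gets wrong (for instance, it would turn "running" into "runn").

```python
# Suffixes to strip, checked in this order (an assumed, very small list).
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common verb suffix to approximate the root word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# All forms collapse to a single stored entry.
stems = {stem(w) for w in ["walk", "walking", "walked", "walks"]}
print(stems)
```

Because every form maps to the same stem, only "walk" needs to be stored and looked up.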
Text Segmentation
Text segmentation is the process of separating language into meaningful units, such as morphemes (e.g., un-, luck, -y), words, sentences, paragraphs, and intent (i.e., what is the purpose of the language? does it ask a question, provide a statement, or give an order?).
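As a rough illustration, the sketch below segments text into sentences and words using regular expressions and makes a crude guess at intent from the final punctuation mark. The function names and the punctuation rule are assumptions for this example; morpheme segmentation is left out because it needs a dictionary of prefixes and suffixes.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Words are runs of letters, digits, or apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def guess_intent(sentence):
    """Very rough heuristic based only on the final punctuation mark."""
    if sentence.endswith("?"):
        return "question"
    if sentence.endswith("!"):
        return "order"  # assumption: treats every exclamation as an order
    return "statement"


text = "How's the weather? It is sunny. Close the door!"
for sentence in split_sentences(text):
    print(guess_intent(sentence), split_words(sentence))
```

Real systems cannot rely on punctuation alone (abbreviations such as "e.g." would break the sentence splitter), so this should be read as a starting point only.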
Semantic Analysis
Sometimes sentences can follow all the syntactic rules but make no semantic sense. This is why it's important to also conduct semantic analyses. These help the algorithms understand the tone, purpose, and intended meaning of language.
Sentiment Analysis
Sentiment analysis is an NLP technique that aims to understand whether the language is positive, negative, or neutral. It can also determine the tone of language, such as angry or urgent, as well as the intent of the language (i.e., to get a response, to make a complaint, etc.). In its simplest form, sentiment analysis works by matching the words in a text against preexisting lists of vocabulary associated with each sentiment.
Adjectives like disappointed, wrong, incorrect, and upset would be picked up in the pre-processing stage and would let the algorithm know that the piece of language (e.g., a review) was negative.
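A word-list approach like this can be sketched in a few lines. The negative list uses the adjectives mentioned above; the positive list and the simple counting rule are assumptions added for illustration.

```python
import re

NEGATIVE = {"disappointed", "wrong", "incorrect", "upset"}
POSITIVE = {"great", "happy", "excellent", "love"}  # assumed for this example


def sentiment(text):
    """Label text by counting words found in the two lists."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I am disappointed, the order was wrong."))
```

A plain word count cannot handle negation ("not disappointed") or sarcasm, which is one reason modern tools lean on machine learning rather than lists alone.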
Disambiguation
Word disambiguation is the process of trying to remove lexical ambiguities. A lexical ambiguity occurs when it is unclear which meaning of a word is intended.
"I'll meet you at the bank."
The word bank has more than one meaning, so there is an ambiguity as to which meaning is intended here. By looking at the wider context, it might be possible to remove that ambiguity.
"I need to deposit some money, so I'll meet you at the bank."
Now we can see that the word bank is referring to a financial establishment and not a river bank or the verb to bank.
Removing lexical ambiguities helps to ensure the correct semantic meaning is being understood.
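One simple way to use the wider context is to count how many 'cue' words for each meaning appear in the sentence. The sketch below does this for the word bank; the cue lists and the function name are invented for this example, and the approach is only a simplified relative of the overlap-based methods used in practice.

```python
import re

# Hypothetical cue words for two senses of "bank".
SENSES = {
    "bank": {
        "financial establishment": {"money", "deposit", "account", "loan", "cash"},
        "river bank": {"river", "water", "fishing", "boat", "shore"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z']+", sentence.lower()))
    overlaps = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    best = max(overlaps, key=overlaps.get)
    # No cue words at all means the sentence stays ambiguous.
    return best if overlaps[best] > 0 else None


print(disambiguate("bank", "I'll meet you at the bank."))
print(disambiguate("bank", "I need to deposit some money, so I'll meet you at the bank."))
```

The first sentence contains no cue words, so the function returns None; in the second, "deposit" and "money" point to the financial meaning.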
Natural Language Processing Examples
Now that we have a good idea of what NLP is and how it works, let's look at some real-world examples of how NLP affects our day-to-day lives.
Email filters
If you open up your email and look at the menu, you'll likely find different folders such as "spam" or "social." Emails you've received have been automatically 'filtered' to these folders based on the vocabulary they contain. This is a type of text classification, which works in a similar way to sentiment analysis.
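A vocabulary-based filter can be sketched as follows. The folder names come from the example above, while the keyword lists and the "inbox" fallback are assumptions for illustration; real email providers use far more sophisticated, learned models.

```python
import re

# Hypothetical keyword lists for each folder.
FOLDER_KEYWORDS = {
    "spam": {"winner", "prize", "free", "urgent", "lottery"},
    "social": {"friend", "follow", "liked", "commented", "invitation"},
}


def choose_folder(email_text):
    """Send an email to the folder whose keywords it matches most."""
    tokens = set(re.findall(r"[a-z']+", email_text.lower()))
    scores = {
        folder: len(keywords & tokens)
        for folder, keywords in FOLDER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword matches, leave the email in the main inbox.
    return best if scores[best] > 0 else "inbox"


print(choose_folder("You are a WINNER! Claim your free prize now"))
```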
Predictive text
One of the earliest everyday uses of NLP was in predictive text. Today, predictive text uses NLP techniques and 'deep learning' to correct the spelling of a word, guess which word you will use next, and make suggestions to improve your writing.
Activity: Try sending a message using only predictive text. It's possible to create a whole message only using the suggested words proposed by predictive text. Thanks to NLP, these words will be unique and tailored to you and can create some very funny (and revealing) messages!
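To get a feel for how next-word suggestions can be tailored to you, here is a minimal sketch that counts which word most often follows another in a handful of past messages. The messages and function names are made up for this example, and modern keyboards use deep learning rather than simple counts like these.

```python
from collections import Counter, defaultdict

# Made-up message history standing in for a user's past texts.
MESSAGES = [
    "see you later",
    "see you soon",
    "see you later today",
    "thank you very much",
]


def train(corpus):
    """Count, for each word, the words that have followed it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def suggest(follows, word, k=3):
    """Return up to k words seen most often after the given word."""
    counts = follows.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


MODEL = train(MESSAGES)
print(suggest(MODEL, "you", k=1))
```

Because the counts come from your own messages, two people typing the same word can receive different suggestions.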
Language apps
Natural language processing has made huge improvements to language translation apps. It can help ensure that the translation makes syntactic and grammatical sense in the new language rather than simply directly translating individual words.
Fig 3. Language translation as we know it today wouldn't be possible without NLP
The Social Impact of Natural Language Processing
In 2016, the researchers Hovy & Spruit released a paper discussing the social and ethical implications of NLP. In it, they highlight how up until recently, it hasn't been deemed necessary to discuss the ethical considerations of NLP; this was mainly because conducting NLP doesn't involve human participants. However, researchers are becoming increasingly aware of the social impact the products of NLP can have on people and society as a whole.
Here are some of the main issues they identified:
Exclusion - NLP may learn from dominant cultures, making it easier to use and more appropriate for people from those cultures.
Overgeneralization - NLP may lead to software making widespread assumptions about things like our gender, age, religion, and sexual orientation.
Bias - Most NLP tools focus on English and can therefore produce richer data for English speakers than for others. [1]
Natural Language Processing - Key takeaways
- Natural language processing (NLP) is a branch of artificial intelligence (AI) that assists in programming computer software to 'learn' human languages.
- Natural language processing has roots in linguistics, computer science, and machine learning.
- NLP uses AI to take in real-world human language and perform processing tasks to turn the language into code the computer will understand. There are two parts to this process: pre-processing and algorithm development.
- Pre-processing involves categorizing language into data an algorithm can work with. Common pre-processing techniques include syntactic analysis (e.g., parsing, stemming, and text segmentation), and semantic analysis (e.g., sentiment analysis and disambiguation).
- We can see examples of NLP in predictive text, email filters, language learning apps, virtual assistants (e.g., Siri), and more.
References
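1. D. Hovy & S. L. Spruit. The social impact of natural language processing. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016.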
How do we ensure our content is accurate and trustworthy?
At StudySmarter, we have created a learning platform that serves millions of students. Meet the people who work hard to deliver fact-based content and make sure it is verified.
Content Creation Process:
Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.
Content Quality Monitored by:
Gabriel Freitas is an AI Engineer with solid experience in software development, machine learning algorithms, and generative AI, including applications of large language models (LLMs). A graduate in Electrical Engineering from the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specializing in machine learning topics. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI, and LLM applications.