An n-gram is a contiguous sequence of N words. It can be a unigram (one word), a bigram (two words), a trigram (three words), and so on. N-grams are widely used in speech recognition and predictive text input, where they help estimate the word most likely to follow a given sequence. Search engines also use the n-gram technique to suggest the next word as a query is typed into the search bar.
Let us consider a sentence:
“This is going to be an amazing experience.”
The unigrams for the above sentence are:
- "This"
- "is"
- "going"
- "to"
- "be"
- "an"
- "amazing"
- "experience"
The bigrams for the above sentence are:
- "This is"
- "is going"
- "going to"
- "to be"
- "be an"
- "an amazing"
- "amazing experience"
The trigrams for the above sentence are:
- "This is going"
- "is going to "
- "going to be"
- "to be an"
- "be an amazing"
- "an amazing experience"
Instead of a plain bag of words, we can build a bag of n-grams (for example, a bag of bigrams). A bag of bigrams or trigrams is more powerful than a bag of words because it preserves some local word order and therefore captures more of the context.
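
As a rough illustration (the two example documents and the `Counter`-based representation are assumptions for this sketch, not part of the original text), a bag of bigrams could be built like this:

```python
from collections import Counter

def bag_of_ngrams(text, n):
    """Count each n-gram in a whitespace-tokenized text (a 'bag of n-grams')."""
    tokens = text.split()
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)

docs = [
    "This is going to be an amazing experience",
    "This is going to be fun",
]

# A bag of bigrams keeps pairs such as "amazing experience" together,
# so documents only share counts for the word pairs they actually have in common.
for doc in docs:
    print(bag_of_ngrams(doc, 2))
```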