Topic Modeling: Working of LSA (Latent Semantic Analysis) in simple terms ~ VIJAY YADAV

Tuesday, December 6, 2022

Topic Modeling: Working of LSA (Latent Semantic Analysis) in simple terms

By VIJAY YADAV, December 06, 2022

LSA (Latent Semantic Analysis) is another technique used for topic modeling. The main idea behind topic modeling is that the meaning of a document is driven by some latent (hidden) variables, i.e., topics, so we use topic modeling techniques to uncover those topics and make sense of the given documents. LSA is mostly suitable for large sets of documents. It converts the documents into a document-term matrix before deriving topics from them.
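To make the document-term matrix concrete, here is a minimal sketch that builds one with plain bag-of-words counts. The function name `build_document_term_matrix` and the three toy documents are assumptions made up for illustration, and the whitespace tokenizer is deliberately naive.

```python
from collections import Counter


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is the sorted set of all distinct words in the corpus.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary word.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical toy corpus: n = 3 documents.
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise as markets rally",
]
vocabulary, dtm = build_document_term_matrix(docs)

for doc, row in zip(docs, dtm):
    print(row, "<-", doc)
```

Each row is one document and each column is one word, so the matrix has n rows and m columns. In practice the raw counts are often replaced by TF-IDF weights.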

Working of LSA:

The given text is converted into a document-term matrix using either bag of words or Term Frequency-Inverse Document Frequency (TF-IDF).
Then, the document-term matrix is decomposed using truncated Singular Value Decomposition (SVD). It is at this stage that the topics within the documents are identified. Mathematically, it can be given as,

$$A \approx U S V^{T}$$

Though it may look difficult to understand at first glance, in simple terms the decomposition factors a high-dimensional matrix A into three smaller matrices, i.e., U, S, and V, where,

A = n*m document-term matrix (n = no. of documents and m = no. of words)

U = n*r document-topic matrix (n = no. of documents and r = no. of topics)

S = r*r diagonal matrix of singular values, which indicate the strength of each topic (r = no. of topics)

V = m*r word-topic matrix (m = no. of words and r = no. of topics)
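The shapes above can be checked with a small NumPy sketch. The toy matrix `A` and the choice of r = 2 are assumptions for illustration; the truncation is done by hand by keeping only the r largest singular values from `np.linalg.svd`.

```python
import numpy as np

# Hypothetical toy document-term matrix: n = 4 documents, m = 5 words.
A = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 1, 1],
], dtype=float)

r = 2  # number of topics to keep (an assumption for this example)

# Thin SVD: A = U_full @ diag(s_full) @ Vt_full, singular values sorted descending.
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)

# Truncate to the r largest singular values.
U = U_full[:, :r]         # n x r document-topic matrix
S = np.diag(s_full[:r])   # r x r diagonal matrix of singular values
V = Vt_full[:r, :].T      # m x r word-topic matrix

# Rank-r approximation of the original matrix.
A_approx = U @ S @ V.T

print(U.shape, S.shape, V.shape)  # (4, 2) (2, 2) (5, 2)
print("approximation error:", np.linalg.norm(A - A_approx))
```

The product of the three truncated matrices only approximates A. The error comes from the singular values that were dropped, which is the price paid for describing the documents with r topics instead of m words.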

Finally, the rows of U tell us how strongly each document is associated with each topic, so we can now classify which documents belong to which topics.

Figure: Schematic diagram of the LSA algorithm (topic modeling using Latent Semantic Analysis)
