SAS® Text Analytics and Text Mining in Action: Experiences From a ‘Self-Trial’ With SAS® Contextual Analysis
But let’s start from the beginning.
Functions and capabilities of SAS® Contextual Analysis
If you take a look at the product description of SAS Contextual
Analysis, you learn that you can use it to analyze large collections of
text documents, identify sentiments, and create robust models to
categorize and extract content. This allows you to automatically
identify topics in your document collections and define categories and
rules in natural language to assign documents to these categories.
To better
understand the processes and the outcome of text analytics with SAS
Contextual Analysis, I used a document collection that is close to my
heart and that I know in great detail: the 59 chapters of my two SAS
Press books, Data Preparation for Analytics Using SAS and Data Quality for Analytics Using SAS.
Sure, the small number of 59 documents is not really a “big data problem,” and the SAS In-Memory Analytics engine
can also deal with millions of documents. However, I was interested to
see whether SAS Contextual Analysis can identify topics in my book
chapters and which book chapters should be combined into the same
cluster. No a priori knowledge from me as the author was used
for the categorization.
Text analytics processing with SAS® Contextual Analysis
[Figure: Illustration of underlying topics in the documents]
From a data mining point of view, we are dealing here with a typical
unsupervised analysis: only the data are presented to the analytic tool,
and no additional information about segment assignment is available. SAS
Contextual Analysis imports the data, one file per chapter, from a
folder on my hard disk and runs through the entire process of text
analytics:
- Document parsing and assigning the words to different parts of speech (noun, verb, etc.).
- Synonym detection and the application of stop lists to remove redundant words like “the,” “and,” “of,” “with,” “we,” etc.
- The weighting of the terms and the identification of those terms that are important to define groups of documents.
- Automatic detection of underlying topics in the documents.
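To make the parsing, stop-list, and term-weighting steps above more concrete, here is a minimal sketch in plain Python. It is not how SAS Contextual Analysis works internally; it only illustrates the general idea with a simple TF-IDF weighting. The stop list, the function names, and the three toy documents are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "and", "of", "with", "we", "a", "in", "for", "to", "is", "are"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents (hypothetical, not taken from the book chapters).
docs = [
    "We impute the missing values in the data",
    "Missing values and the quality of the data",
    "The structure of the analytic data mart",
]
weights = tf_idf(docs)
for doc_weights in weights:
    # Show the three highest-weighted terms of each document.
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:3])
```

In this toy collection the word "data" occurs in every document, so its weight is zero, while a term such as "impute" that occurs in only one document receives the highest weight there. Terms of that kind are the ones that help to tell groups of documents apart.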
It works! Eight clearly separated document clusters as a result
For better illustration, I have used weights of the automatically
detected topics for each of the 59 documents to cluster them with SAS®
Enterprise Miner™. Eight clusters were automatically detected, which are
presented in the table below.
For better visualization, the chapters of the “Data Quality Book” are shown in green and the chapters of the “Data Preparation Book” are shown in yellow.
You can easily see how the chapters are grouped into clusters based on their content. Some clusters only contain chapters from one book:
- Cluster 1 contains those chapters from the Data Quality Book that deal with the topic of missing values.
- Cluster 7 contains the simulation studies that are described in chapters 15-23 of the Data Quality Book.
Some clusters contain chapters from both books:
- Cluster 8 contains chapters from the Data Preparation Book that deal with analytics data mart structures, together with Appendix E of the Data Quality Book, which summarizes the content of these chapters. This is an impressive example of documents being grouped purely on the basis of their content: chapters whose content is “close” or “similar” are indeed detected as such.
The differing numbers of documents per cluster also show that no fixed
clustering scheme is used here, but that the document content defines
how the groups are set up and how they are populated.
- Cluster 4 only contains a single chapter. This chapter is an introduction to a collection of case studies and obviously does not compare with other chapters in the books.
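The idea of clustering documents by their topic weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the analysis described here used SAS Enterprise Miner, which detected the number of clusters automatically, whereas the sketch below swaps in a plain k-means with a fixed number of clusters. The chapter names and topic weights are invented for the example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical topic weights: one row per chapter, one column per topic.
topic_weights = {
    "missing_values_a": [0.9, 0.1, 0.0],
    "data_mart_a": [0.1, 0.8, 0.1],
    "simulation_a": [0.0, 0.1, 0.9],
    "missing_values_b": [0.8, 0.2, 0.1],
    "data_mart_b": [0.0, 0.9, 0.2],
    "simulation_b": [0.1, 0.0, 0.8],
}
names = list(topic_weights)
labels = kmeans([topic_weights[n] for n in names], k=3)

# Collect the chapter names per cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

Chapters with similar topic weights end up in the same cluster regardless of which book they come from, which is the effect seen in the results above. Unlike this sketch, the real analysis did not fix the number of clusters in advance.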
Moving on to new business cases
These results convinced me even more that SAS Contextual Analysis
allows you to gain insight into your document collections. You learn
what your customers think and write about your company or organization.
You see the topics that are contained in your documents and how you can
automatically group them without having to read every single document.
Source :
http://blogs.sas.com/content/text-mining/2016/02/19/sas-text-analytics-and-text-mining-in-action-experiences-from-a-self-trial-with-sas-contextual-analysis/