SAS® Text Analytics and Text Mining in Action: Experiences From a ‘Self-Trial’ With SAS® Contextual Analysis
But let’s start from the beginning.
Functions and capabilities of SAS® Contextual Analysis
If you take a look at the product description of SAS Contextual 
Analysis, you learn that you can use it to analyze large collections of 
text documents, identify sentiments, and create robust models to 
categorize and extract content. This allows you to automatically 
identify topics in your document collections and define categories and 
rules in natural language to assign documents to these categories.
Sure, 59 documents is not really a “big data problem,” and the SAS In-Memory Analytics engine can also deal with millions of documents. However, I was interested to see whether SAS Contextual Analysis could identify topics in my book chapters and determine which chapters should be combined into the same cluster, without using any a priori knowledge from me as the author for the categorization.
Text analytics processing with SAS® Contextual Analysis
[Figure: Illustration of underlying topics in the documents]
From a data mining point of view, we are dealing here with a typical unsupervised analysis: only the data are presented to the analytic tool, and no additional information about segment assignment is available. SAS Contextual Analysis imports the data, one file per chapter, from a folder on my hard disk and runs through the entire process of text analytics:
- Document parsing and assigning the words to parts of speech (noun, verb, etc.).
- Synonym detection and the application of stop lists to remove redundant words like “the,” “and,” “of,” “with,” “we,” etc.
- The weighting of the terms and the identification of those terms that are important to define groups of documents.
- Automatic detection of underlying topics in the documents.
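The parsing, stop-list, and weighting steps above can be sketched in a few lines of Python. This is only a rough illustration, not how SAS Contextual Analysis works internally: the stop list, the function names `tokenize` and `tf_idf`, and the sample texts are made up for this example, and plain TF-IDF is used as a stand-in for whatever term weighting the product applies.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; real products ship much larger, language-specific lists.
STOP_WORDS = {"the", "and", "of", "with", "we", "a", "in", "to", "is", "for"}

def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

if __name__ == "__main__":
    # Made-up miniature "chapters", for illustration only.
    chapters = [
        "Missing values and the imputation of missing values",
        "Simulation studies of data quality",
        "The structure of the analytics data mart",
    ]
    for chapter, w in zip(chapters, tf_idf(chapters)):
        top = sorted(w, key=w.get, reverse=True)[:3]
        print(chapter, "->", top)
```

Terms that occur in every document receive a weight of zero, so the terms that remain are the ones that help to tell groups of documents apart.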
It works! Eight clearly separated document clusters as a result
For better illustration, I used the weights of the automatically detected topics for each of the 59 documents to cluster the documents with SAS® Enterprise Miner™. Eight clusters were detected automatically; they are presented in the table below.
For better visualization, the chapters of the “Data Quality Book” are shown in green and the chapters of the “Data Preparation Book” are shown in yellow.
You can easily see how the chapters are grouped into clusters based on content. Some clusters contain chapters from only one book:
- Cluster 1 contains those chapters from the Data Quality Book that deal with the topic of missing values.
- Cluster 7 contains the simulation studies that are described in chapters 15-23 of the Data Quality Book.
SAS Text Analytics automatically detected 8 clusters in the 2 books
Some clusters contain chapters from both books:
- Cluster 8 contains chapters from the Data Preparation Book that deal with analytics data mart structures, together with Appendix E of the Data Quality Book, which is a summary of the content of these chapters. This is an impressive example of documents being grouped based only on their content: chapter content that is considered to be “close” or “similar” is truly detected as such.
The varying number of documents per cluster also shows that no fixed clustering scheme is used here; instead, the document content defines how the groups are set up and how they are populated.
- Cluster 4 contains only a single chapter. This chapter is an introduction to a collection of case studies and is evidently not comparable to the other chapters in the books.
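The idea that document content, rather than a fixed scheme, determines the number and size of the clusters can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a simple threshold-based "leader" clustering on cosine similarity, which is not the algorithm SAS® Enterprise Miner™ uses, and the function names, the threshold of 0.8, and the topic weights are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def leader_cluster(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose leader is similar enough.

    A vector that matches no existing leader starts a new cluster, so the
    number and size of the clusters depend on the data, not on a preset count.
    """
    leaders = []  # one representative vector per cluster
    labels = []   # cluster index for each input vector
    for vec in vectors:
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(idx)
                break
        else:
            # No leader was similar enough: this vector opens a new cluster.
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels

# Made-up topic weights for five documents over three topics.
topic_weights = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.0, 1.0, 0.0],
]
print(leader_cluster(topic_weights))  # prints [0, 0, 1, 1, 2]
```

The last document resembles none of the others and therefore ends up alone in its own cluster, much like the single chapter in Cluster 4.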
Moving on to new business cases
These results convinced me even more that SAS Contextual Analysis 
allows you to gain insight into your document collections. You learn 
what your customers think and write about your company or organization. 
You see the topics that are contained in your documents and how you can 
automatically group them without having to read every single document.
Epoch Research Institute Links:
https://www.linkedin.com/company/epoch-research-institute-india-pvt-ltd-
Email us: info@epoch.co.in
Source :
http://blogs.sas.com/content/text-mining/2016/02/19/sas-text-analytics-and-text-mining-in-action-experiences-from-a-self-trial-with-sas-contextual-analysis/