Text Miner: The Flagship of SAS Analytics
SAS Text Miner uses the term “import node” for the portal through which collected textual data is routed into a “working directory,” if you will, that stores all the natural language content on which the software will perform its analytics. This working directory is called the data set for a particular analysis project.
The import and handling process that creates the data set can be used on any in-house repository of textual data. However, SAS also uses retrieval algorithms to crawl web pages and collect textual content from any location on the Web, particularly social media outlets and news feeds. SAS also notes that its text mining tools can be used in any of 27 languages and 14 dialects.
Multiple Main Functions
Text Miner divides its functionality into four processes that match the most commonly used analytic subroutines. These processes can be used in any combination to provide specific analysis results. Previously defined synonym data sets can be imported, reduced by filtering, and then held for further analysis.
There are text topic views, which allow the creation and inspection of user-specified topics. Text Miner uses a concept link diagram that provides a visual relationship between terms. There are process-flow diagrams that allow spontaneous modification of the text mining process and that can be saved and shared collaboratively. And there is a report function that allows analytical results to be published and disseminated in HTML format.
Text nodes are useful text mining tools that can easily interface with other SAS Text Miner nodes. Text nodes are scalable to allow the customization of algorithms or the declaration of new user-created rules for purposes of predictive modeling and reporting.
The text parsing node allows for the elimination of post-analysis informational segments that are of little or no value. The parsing node also performs automated spelling correction and automatic stemming that will identify root words. Such stemming will also provide compound word splitting into discrete sub-terms. Other features allow user-defined common multi-word phrases, such as “cause and effect” or “second generation language” to be isolated and/or eliminated.
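The parsing steps described above can be approximated in a few lines of Python. This is only a rough sketch of the general idea, not how SAS Text Miner implements it: the stop-word list, the phrase list, the suffix list, and the `naive_stem` and `parse` helpers are all hypothetical names chosen for illustration, and the suffix stripping is far cruder than real stemming.

```python
import re

# Hypothetical settings for illustration; not SAS Text Miner defaults.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}
PHRASES = ["cause and effect", "second generation language"]
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for real stemming."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def parse(text):
    """Lowercase, join known phrases, drop stop words, stem the rest."""
    text = text.lower()
    # Join each user-defined phrase into a single token so it stays together.
    for phrase in PHRASES:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    # Phrase tokens (containing "_") are kept whole; other tokens are stemmed.
    return [t if "_" in t else naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = parse("The study of cause and effect is ongoing in parsed documents")
```

Here the multi-word phrase survives as one token while low-value words such as “the” and “of” are dropped, which mirrors the isolate-or-eliminate choice the parsing node offers.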
The parsing node also performs automatic part-of-speech tagging that is influenced by the context of the particular sentence being analyzed. “Competitive intelligence” is done using noun group extraction and the identification of phrase-level concepts.
And finally, the parsing node performs singular value decomposition (SVD) of documents into multi-dimensional storage areas where any two documents or segments within that area can be compared as to similarity (near identical topic/structure) or diversity (non-identical entities).
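To make the SVD comparison concrete, here is a minimal sketch using NumPy. It assumes a tiny made-up document-by-term count matrix and uses cosine similarity as the comparison measure; the source does not say which measure SAS Text Miner uses, and the `svd_project` and `cosine` helpers are hypothetical names.

```python
import numpy as np

# Rows are documents, columns are term counts (made-up toy data).
# Assumed vocabulary: ["text", "mining", "stock", "market"]
counts = np.array([
    [2.0, 1.0, 0.0, 0.0],   # doc 0: about text mining
    [1.0, 2.0, 0.0, 0.0],   # doc 1: also about text mining
    [0.0, 0.0, 2.0, 1.0],   # doc 2: about stock markets
])


def svd_project(matrix, k):
    """Project each document into a k-dimensional space via truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular directions and scale by their weights.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity; near 1.0 means similar, near 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs = svd_project(counts, k=2)
sim_01 = cosine(docs[0], docs[1])   # two documents on the same topic
sim_02 = cosine(docs[0], docs[2])   # documents on different topics
```

In this toy case the two text mining documents land on the same axis of the reduced space and score as highly similar, while the stock market document sits on a separate axis and scores close to zero.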
A post-parsing catalog function summarizes documents and vocabularies with metrics such as frequency counts. The node automates spell checking by mapping misspelled words to the correctly spelled terms they were meant to be. Filter functions also include distinguishing unimportant terms, acronyms, and abbreviations for elimination or quarantine. It also contains a taxonomy browser that displays automatically generated topics and allows the creation of user-defined topics.
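The frequency counts and misspelling mapping can be sketched with Python's standard library. This is an illustrative assumption about how such a mapping might work, not the SAS algorithm: it treats a rare term as a misspelling when it closely resembles a frequent term, and the `min_count` and `cutoff` thresholds and both function names are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def term_frequencies(documents):
    """Count how often each term appears across all tokenized documents."""
    return Counter(term for doc in documents for term in doc)


def map_misspellings(freqs, min_count=2, cutoff=0.8):
    """Map rare terms to a similar frequent term, treating them as misspellings."""
    frequent = [t for t, c in freqs.items() if c >= min_count]
    mapping = {}
    for term, count in freqs.items():
        if count < min_count:
            # Find the closest frequent term above the similarity cutoff, if any.
            match = get_close_matches(term, frequent, n=1, cutoff=cutoff)
            if match:
                mapping[term] = match[0]
    return mapping


documents = [["mining", "text"], ["mining", "text"], ["minning", "data"]]
freqs = term_frequencies(documents)
mapping = map_misspellings(freqs)
```

With this toy input, “minning” is mapped to “mining”, while “data” is left alone because it resembles no frequent term closely enough.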
In short, SAS Text Miner provides a rich suite of linguistic and analytical modeling tools for discovering and extracting knowledge from a range of text collections. The intent of these text mining tools is to enhance the power and sophistication of analytics while making them simpler, faster, more intuitive, and more repeatable.