clustering Archives

OnlineHelp Settings Map Tab

June 6, 2016/0 Comments/in Cognitive Tools, KnowledgeBase /by LvT

How to fine-tune the KnowledgeMap cognitive AI clustering algorithm

You can specify how the cluster labels should be generated and which labels the algorithm should exclude. Go to the Settings -> Map section.


Strong Cluster Label [Enabled/Disabled]:

This attribute may be useful when certain words appear in most of the input documents (e.g., a company name from a header or footer) and dominate the cluster labels. In such cases, enabling strong cluster labels may improve the clusters.

Another useful application of this attribute is generating only very specific clusters, i.e., clusters containing small numbers of documents. This can be achieved by enabling strong cluster labels.

Stopwords:

A stopword is a word that has little meaning by itself. For example, “the,” “a,” “then,” and “towards” are stopwords for all English documents. A stopword can never appear by itself as a cluster label, although it might be used within a label, depending on the stoplabel settings.

Stoplabels:

If a KnowledgeMap label includes one of the stop labels, the label will not appear on the map of clusters produced by Noggle KnowledgeMap.
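The two rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Noggle's implementation: the function names, the whole-word phrase matching, and the sample labels are all assumptions.

```python
def contains_phrase(words, phrase_words):
    """True if phrase_words occurs as a contiguous run inside words."""
    n = len(phrase_words)
    return n > 0 and any(
        words[i:i + n] == phrase_words for i in range(len(words) - n + 1)
    )


def filter_labels(candidates, stopwords, stoplabels):
    """Return the candidate cluster labels that survive both filters.

    Hypothetical helper: the matching rules are assumptions based on the
    description of stopwords and stop labels in this article.
    """
    stopword_set = {w.lower() for w in stopwords}
    stop_phrases = [s.lower().split() for s in stoplabels]
    kept = []
    for label in candidates:
        words = label.lower().split()
        # A stopword can never appear by itself as a cluster label.
        if len(words) == 1 and words[0] in stopword_set:
            continue
        # A label that includes a stop label is not shown on the map.
        if any(contains_phrase(words, phrase) for phrase in stop_phrases):
            continue
        kept.append(label)
    return kept
```

Note that a label such as “Towards Security” would survive in this sketch, because the stopword is used within a longer label rather than by itself.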

Cognitive Search Engine: How To Overcome The Knowledge Disconnect

May 30, 2016/0 Comments/in Cognitive Tools, KnowledgeBase /by LvT

How To Overcome The Big Knowledge Disconnect With Cognitive AI: Cognitive Search Engine

Our cognitive search engine with cognitive document retrieval features knocks down barriers between you and your documents. Use our natural and contextual search features that augment users’ experiences via the power of machine-based AI. Plug them in and stop searching – start knowing.

Document Recommendations

The document recommendation engine can detect all related documents for a given document. If a document is selected from the search results, the engine pulls up all related or similar documents from available libraries regardless of the filename or file type. Our recommendation intelligence is based on full-text/content-similarity deep-search algorithms. It can even pull up new versions of existing documents that have been edited by your colleagues and saved in completely different locations. You can’t locate these documents with simple search queries on your own. For example, imagine that you find an old PowerPoint document and you want to see the latest version of the document and its Excel calculation sheet. They might be anywhere on the network, but our recommendation engine detects them instantly.
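To give a rough idea of what content-based similarity means, here is a minimal Python sketch that ranks documents by the cosine similarity of their word counts. Noggle's actual algorithms are proprietary and not described here, so the technique, the function names, and the sample paths are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase word occurrences in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(selected_text, library, top_n=3):
    """Rank library documents (path -> text) by similarity to the selected text.

    File names and file types play no role; only the content is compared.
    """
    query = term_vector(selected_text)
    scored = [
        (cosine_similarity(query, term_vector(text)), path)
        for path, text in library.items()
    ]
    scored.sort(reverse=True)
    # Drop documents that share no terms with the selected document.
    return [path for score, path in scored[:top_n] if score > 0]
```

Because only the text is compared, a newer version saved under a different name in a different folder would still rank highly in this sketch.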

Cross-Library Search

The managed library-sharing feature enables organizations to make their documents retrievable by approved people through distributed-search functions. With this feature, users can easily and quickly retrieve useful, relevant documents stored elsewhere on the network or on local computers. The cross-library search saves time and helps avoid the high cost of reinventing the wheel when a document exists somewhere else but cannot be located locally. The embedded “request document” function makes knowledge sharing as simple and secure as sending emails. Cross-library searches speed up the retrieval process and make document retrieval a collaborative activity via our cognitive search engine.

Topic Detection and KnowledgeMap Clustering

One search aid that helps information workers retrieve relevant content from large content libraries is clustered cross-document relationship information. This cognitive search service returns visually enriched content topics for all documents in the current search results. It helps to overcome information overload by organizing collections of documents into clearly labeled, hierarchical, thematic clusters in real time, fully automatically, and without external knowledge bases. Instead of a linear list of search results, the KnowledgeMap, a cognitive, non-supervised search-result visualization tool, presents essential information about the structure of topics within the search results. The clustering algorithm scans internal relationships and linguistic patterns among all the documents found. In doing so, it unearths new groups or cross-document relationships that might guide you to new, interesting topic areas that enhance the initial search request. The amount of time users spend trying to make sense of long lists of search results is shortened dramatically. With clearly labeled folders, users can navigate straight to the documents they need and easily skip irrelevant ones.

Topic Exploration Service

With the KnowledgeMap topic clustering engine, query refinement is just a mouse click away. Topic clusters generated by our cognitive KnowledgeMap can help users refine their initial queries and drill down to a specific subject. This cognitive feature allows users to automatically rephrase search queries to pull relevant documents out of the selected topic clusters.

Intelligent Duplicate Filtering

As documents get copied, shared, and reorganized over time, more copies of the same file become available in different folders. These duplicates generate “noise” in your search results and make the results list inconvenient to read and browse. Our duplicate filter is intelligent enough to keep only the version to which you have direct access. For example, a file might be available three times in your libraries: in the library on your local computer, in a network library (e.g., an archive), and in a shared library from your colleague. Our intelligent duplicate filter shows you the file to which you have direct access and filters out the duplicate network file and the duplicate file from your colleague. You can always be sure you’re finding the smartest way to access the document without noisy search result listings.
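The example above can be sketched as follows. This is a simplified illustration, not the Noggle filter: it assumes duplicates are byte-identical (detected with a SHA-256 hash), and the library kinds and their preference order (local, then network, then shared) are assumptions drawn from the example.

```python
import hashlib

# Assumed preference order: lower rank means more direct access.
ACCESS_RANK = {"local": 0, "network": 1, "shared": 2}


def filter_duplicates(hits):
    """Keep one path per distinct content, preferring the most direct access.

    hits is a list of (path, library_kind, content_bytes) tuples.
    """
    best = {}
    for path, kind, content in hits:
        digest = hashlib.sha256(content).hexdigest()
        current = best.get(digest)
        # Replace the stored copy only if this one is more directly accessible.
        if current is None or ACCESS_RANK[kind] < ACCESS_RANK[current[1]]:
            best[digest] = (path, kind)
    return [path for path, kind in best.values()]
```

A hash only catches exact copies; recognizing edited or reformatted versions would need a content-similarity comparison instead.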

Recent Work Linking

This feature extends the recommendation engine to drill down into your recent work. Our recent-work-linking algorithm scans your recently used files (e.g., Word documents and presentations you have recently worked on) and scans all available libraries for similar and related documents. Noggle presents a recommendation list for relevant files in your libraries that are related to your current work. In the blink of an eye, this cognitive search engine feature presents all documents from your libraries that might help you during your work activities.

Intelligent Open Engine

Noggle is not built on absolute storage paths. The proprietary Noggle document fingerprint holds all the content/full-text-based information needed to retrieve a document regardless of file-naming conventions and storage locations. You can move files during their lifecycle, and the “intelligent open” engine for the document fingerprint will always try to locate the document and open it. This cognitive feature attempts to locate files via different mechanisms. First, the absolute file path is tested. Then, a similarity search is performed to locate a duplicate or similar version in your libraries. Finally, if the file cannot be located in your environment, a document request is sent to the file owner. You can always get to the document no matter whether you have it, it has been moved, or it is part of a shared library. With just one click, the cognitive intelligent open feature guides you to the physical document.
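The three-step lookup described above could be sketched like this. The fingerprint format is proprietary, so the dictionary with "path" and "owner" keys, the function name, and the two callbacks are hypothetical stand-ins used only to show the order of the fallbacks.

```python
import os


def intelligent_open(fingerprint, find_similar, request_from_owner):
    """Resolve a fingerprint to a document, trying each mechanism in order.

    fingerprint: dict with assumed keys "path" and "owner".
    find_similar: callable returning a matching path or None (assumed).
    request_from_owner: callable that sends a document request (assumed).
    """
    # Step 1: test the last known absolute file path.
    if os.path.exists(fingerprint["path"]):
        return ("opened", fingerprint["path"])
    # Step 2: search the libraries for a duplicate or similar version.
    match = find_similar(fingerprint)
    if match is not None:
        return ("opened_similar", match)
    # Step 3: nothing found locally, so ask the file owner for the document.
    request_from_owner(fingerprint["owner"], fingerprint)
    return ("requested", fingerprint["owner"])
```

Passing the similarity search and the request mechanism in as callables keeps the sketch independent of how those services are actually implemented.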

Image Text Recognition

The text recognition engine reads text from image files. Optical character recognition (OCR) detects text in an image and extracts the recognized words into an indexed character stream. This feature analyzes images to detect embedded text, generate text, and enable searching. This allows you to scan or take photos of important printed documents and save them in an indexed folder (e.g., simple TIF scans from printed “paper” documents). If these scanned or photographed files are included in a special library, our text recognition makes them retrievable via simple text searches.

Encyclopedia Document Trails

This service allows users to generate topic-specific document trails just by dragging and dropping a document from a library or search result into a Nogglepedia topic. Once you drag and drop a relevant document out of your Noggle library and into a specific Nogglepedia, a proprietary document “fingerprint” is generated. This service isn’t based on moving or sharing the document itself; the document fingerprint holds all the relevant information in an enriched, compressed format. These digital document encyclopedias can be privately shared in the managed Noggle network to empower swarm intelligence, such as research groups collecting fingerprints from private or corporate documents. These fingerprints bundle the available knowledge on special subjects. From each fingerprint in a Nogglepedia, all our cognitive search engine and retrieval services, such as recommendations, can be executed with just a mouse click.

Drop-In Document Linking

This service allows users to drag and drop any available document, such as an email attachment or a local file, into the Noggle application to retrieve related documents. This file might not be part of any indexed library, but Noggle instantly scans the document and performs a concept-based full-text search within your document libraries. Therefore, you can drop any file into the Noggle client application, and this cognitive search engine and document retrieval service will perform full-text concept matching.

Additional License Information:
You need a professional license for the following services:
Shared Cross-Library Search, Intelligent Open/Document Request, Collaborative Encyclopedia Document Trails


What is the NoggleMap or KnowledgeMap?

November 14, 2015/0 Comments/in Cognitive Tools, KnowledgeBase /by LvT

The Noggle “KnowledgeMap,” a search result visualization tool, provides users with essential information about the structure of topics that appear within the search results. The Noggle clustering algorithm scans internal relations and linguistic patterns among all the documents according to how similar they are to the initial search request. This tool can unearth new groups or cross-document relationships, which might guide users to new, interesting areas that build upon their initial search request. Clustering is one of many methods that can be used to make searching collections of documents easier.

We have often heard users demand such clustered cross-document relationship information, likely because they become frustrated with the constantly growing document volume and fragmented data storage solutions they encounter in the cloud and other big data services.

Please review the following video tutorial:

http://www.youtube.com/watch?v=vZdNdJZrpn4

A detailed knowledge base article on our NoggleMap search feature can be found here:

Document Clustering with KnowledgeMaps

November 14, 2015/0 Comments/in Cognitive Tools, KnowledgeBase /by LvT


Cognitive-guided, non-supervised document clustering

NoggleMap Search Document Clustering

One of the most common problems people used to encounter when searching for information is that they could not find documents specifically related to what they were looking for. Nowadays, this task is quite successfully handled by standard search applications.

Thanks to these sorts of search engines, pulling up results has become easy. However, when it comes to explaining the search results or displaying specific details on what sort of results have been returned, users’ options are much more limited. Usually, a search application displays a ranked list of documents and a snippet of their contents. These ranked lists are helpful for document retrieval but fall far short of knowledge management. Standard search algorithms often provide no information about the internal relationships among the documents in the search results.

Search Document Clustering

“Search result clustering” is defined as an automatic, non-supervised grouping of similar documents in a search hits list returned from a search engine. Clustering is one of many methods that can be used to make searching collections of documents easier.

So, the Noggle “KnowledgeMap,” a search result visualization tool, provides users with essential information about the structure of topics that appear within the search results. Furthermore, the Noggle clustering algorithm scans internal relations and linguistic patterns among all the documents according to how similar they are to the initial search request. This tool can unearth new groups or cross-document relationships, which might guide users to new, interesting areas that build upon their initial search request.

We have often heard users demand such clustered cross-document relationship information, likely because they become frustrated with the constantly growing document volume and fragmented data storage solutions they encounter in the cloud and other big data services.

Problem with ranked search lists

To illustrate the problems with conventionally ranked search result lists, let’s imagine a user wants to find information about “security.” Therefore, he or she starts with the simple search term “security.”

First, the user selects peer libraries that might be relevant. In this example, the user has libraries from three different peers. In addition, the user selects six of his or her own libraries for the search request.

Figure 1: Search results for search term “security” on nine libraries from four different owners

Figure 1 shows that the search included 27,616 documents and returned 1,500 top-ranked documents related to “security.” Obviously, this is a very general query that leads to a large number of hits. With a library for “Information Technology” among the sources, the majority will be about information security, system security, or security policies.

A determined user patient enough to sort through results ranking 100 or lower should be able to find some hits on topics like “access control” or “service continuity.” However, one problem with ranked lists is that sometimes users need to wade through irrelevant documents to get to the ones they want.

Grouping results into semantic clusters via document clustering

But what about an interface that groups search results into separate semantic topics, like network security, data security, access control, service continuity, and so on? And what if these groups were derived automatically from the documents’ own internal content, not by biased methods where someone defines what might be important?

By generating groups like this, the user will immediately get an overview of what the results contain and should be able to pick out relevant documents with much less effort.

The following figure shows how the NoggleMap feature automatically detects cross-document relations based on linguistic patterns. The left part of the screen shows the clusters and the number of documents related to that cluster. The right panel shows a visual representation of that information.

Figure 2: Clustered search results for “security” via the Noggle KnowledgeMap document clustering service

All 1,500 documents are linked to one or more of these clusters. This way, users don’t need to browse through a ranked list from the top down—they can narrow down the major cluster they are looking for and go from there.

In order to be helpful, search result clustering must organize similar results into one group. This is the primary requirement for all document clustering algorithms. But in search result clustering, the cluster labels are also extremely important. Each label must accurately and concisely describe the cluster’s contents so that users can decide if the information is relevant.
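As a toy illustration of labeled, overlapping clusters, the following Python sketch groups documents by two-word phrases they share and uses each phrase as the cluster label. This is a deliberately naive stand-in and not the Noggle clustering algorithm; the function name, the bigram heuristic, and the sample documents are assumptions.

```python
import re
from collections import defaultdict


def cluster_by_shared_phrases(docs, min_docs=2):
    """Group documents (doc id -> text) under two-word phrases they share.

    Each phrase found in at least min_docs documents becomes a cluster label.
    A document can belong to more than one cluster.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z]+", text.lower())
        # Record every adjacent word pair as a candidate label.
        for first, second in zip(words, words[1:]):
            phrase_docs[f"{first} {second}"].add(doc_id)
    return {
        phrase: sorted(ids)
        for phrase, ids in phrase_docs.items()
        if len(ids) >= min_docs
    }
```

For three short documents about “access control policy,” “access control for network security,” and “network security audit,” this sketch yields the clusters “access control” and “network security,” with the second document linked to both.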

Start with generic search terms first

Since users are often unaware of all their choices in a search, they do not always know the exact phrase they should search for. Thus, starting with a more generic search makes sense. Let the artificial intelligence of the Noggle search engine detect knowledge clusters based on the cross-document linguistic patterns. The visual guide then allows the user to quickly focus on the results of interest by visually selecting the relevant clusters.

This kind of interface for search results is implemented by applying a variety of document clustering techniques to the results returned. This is something that we call the Noggle “KnowledgeMap” and “ClusterSearch” technique.

The user can now select the cluster “Access Control,” browse the relevant documents from the initial request on “security,” and later focus in on the associated documents.

Figure 3: Document list in the security cluster “Access Control” from the overall search results

This makes document retrieval over different libraries and document search spaces much more efficient. By using “generic” search terms first, Noggle builds clusters for users, who can then narrow down their area of interest and check relevant documents there. Using Noggle this way is not just about searching for documents. Ultimately, it is a full, non-supervised knowledge management approach to retrieving the knowledge that matters, without the need to know exact phrases or exactly which documents they appear in.

Video Example

The following live presentation showcases the document clustering for the included TED Talk digital library. All maps are built by the Noggle client based on the standard application (2 min.):

http://www.youtube.com/watch?v=YMHxWGLddjE

The NoggleMap feature combines the latest technologies based on text parsing, Microsoft Azure, Apache Lucene, the Carrot2 project, Noggle pre- and post-processing algorithms, and the Noggle network. Patent pending.
