What is a noggle library?
The Noggle library functions are based on Lucene, an open source, highly scalable text search-engine library available from the Apache Software Foundation. Web sites like Wikipedia and LinkedIn have been powered by Lucene.
Noggle brings the best available search and indexing technology right to your desktop in the Noggle App.
With Lucene in the back end, Noggle achieves fast search responses because it searches an index, the “noggle library”, rather than the text itself. This is the equivalent of finding the pages of a book related to a keyword by looking in the index at the back of the book, as opposed to reading the words on every page.
The Noggle library tools focus mainly on text indexing and searching. The library core is the element on which the different search capabilities are built. Because it is based on Lucene, the noggle library core has many features. It:
- Has powerful, accurate, and efficient search algorithms.
- Calculates a score for each document that matches a given query and returns the most relevant documents ranked by the scores.
- Supports many powerful query types, such as PhraseQuery, WildcardQuery, RangeQuery, FuzzyQuery, BooleanQuery, and more.
- Supports parsing of human-entered rich query expressions.
- Allows users to extend the search behavior with custom sorting and boosting.
- Uses a file-based locking mechanism to prevent concurrent index modifications.
- Allows searching and indexing simultaneously.
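The scoring and ranking feature in the list above can be illustrated with a small sketch. It uses a simplified TF-IDF weighting, which is an assumption made for illustration and not the formula that Lucene or Noggle actually applies; the function name `score_documents` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def score_documents(docs, query):
    """Rank documents by a simple TF-IDF sum over the query terms.

    Illustrative only: this is not Lucene's or Noggle's scoring formula.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                # Rare terms weigh more than terms found in many documents.
                idf = math.log(1 + n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scores[doc_id] = score

    # Most relevant documents first; non-matching documents are left out.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample data.
docs = {
    "a": "lucene search search engine",
    "b": "desktop search app",
    "c": "cooking recipes",
}
ranked = score_documents(docs, "search")
print([doc_id for doc_id, _ in ranked])
```

Documents that mention a query term more often, relative to their length, rank higher, and documents without any match are not returned at all.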
The Noggle library core lets you index any data available in textual format. Noggle therefore uses pre-processing and parsing techniques to extract plain text from source formats such as Word, PowerPoint, Excel, and PDF files. It can be used with almost any data source, as long as textual information can be extracted from it. Before building the library by indexing the data, Noggle's first step is to make the data available in simple text format. For this it uses custom parsers and data converters, mainly based on the Microsoft IFilter technology.
Indexing is a process of converting text data into a format that facilitates rapid searching. A simple analogy is an index you would find at the end of a book: That index points you to the location of topics that appear in the book.
Noggle stores the input data in a data structure called an inverted index, which is kept on the file system or in memory as a set of index files. Most Web search engines use an inverted index. It lets users perform fast keyword look-ups and find the documents that match a given query. Before the text data is added to the index, it is processed by a custom noggle analyzer.
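To make the idea of an inverted index concrete, here is a minimal in-memory sketch. It is not how Noggle or Lucene lay out their index files; the function names `build_inverted_index` and `search`, the whitespace tokenization, and the sample documents are all assumptions chosen for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample data.
docs = {
    1: "noggle builds a library",
    2: "the library is an inverted index",
    3: "search the index not the text",
}
index = build_inverted_index(docs)
print(sorted(search(index, "library index")))
```

A look-up only touches the entries for the query terms, so its cost depends on how many documents contain those terms rather than on the total amount of text indexed.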
The analyzer converts the text data into the fundamental unit of searching, which is called a term. During analysis, the text data goes through multiple operations: extracting the words, removing common words, ignoring punctuation, reducing words to root form, changing words to lowercase, etc. Analysis happens just before indexing and query parsing. It converts text data into tokens, and these tokens are added as terms to the Noggle library index.
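The analysis steps described above can be sketched as a small pipeline. This is only an approximation under stated assumptions: the stop-word list is a tiny hypothetical sample, and the `stem` helper is a crude suffix stripper standing in for a real stemmer, not the custom noggle analyzer.

```python
import re

# Hypothetical stop-word list; a real analyzer would use a larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are"}

def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Turn raw text into a list of index terms."""
    # Lowercase, then keep only runs of letters and digits (drops punctuation).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove common words and reduce the remaining ones to a root form.
    return [stem(token) for token in tokens if token not in STOP_WORDS]

print(analyze("Searching the indexed documents!"))
```

Running the same analysis on both the indexed text and the query is what lets a search for "documents" match text that contains "document".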
The result is a high-performance library that can be shared with your peers and that answers search requests over the full content in milliseconds. The indexing and library building process not only provides fast search results, it also attaches relevance ranking scores to them.
Once you decide to share a noggle library with one of your peers, the library is encrypted and obfuscated as it leaves your client for the noggle network. Only the named peer is able to decrypt the library, so your library is always secure in the noggle network.