Tsert::Search©® is a stand-alone, customizable search engine developed in C++. It indexes the contents of Web sites, local disks, and file systems, and allows searching for information using keywords. The technology behind Tsert::Search is used in Breeze::OS©® and Tsert::Ferret©®.
Tsert::Search improves the accuracy of content search by using several components and features:
Most, if not all, companies in the Web search engine business talk of word clustering as a viable approach to content search. A semantic network is a better technology for the task.
Word clustering relies on a statistical analysis of words in relation to other words in a particular cluster. Sometimes a hit may be returned where the keyword was found in relation to another word that does not actually match the query; see Google and word clustering.
With a semantic network, such hits would never be returned, because the associations are not based on statistics but solely on the semantic relationships between words. Additionally, a semantic network can re-balance itself by increasing or decreasing the weights assigned to the links between words, depending on whether the user was satisfied with a particular search query.
The links or associations in question are not the original semantic ones but those constituting the collection of links found in the network. A search query can therefore cause a related term, even one with a low weight, to increase its relevance to that of a synonym, while the weight of its original semantic link or association stays the same.
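As a rough illustration of this re-balancing, the sketch below keeps the original semantic weight of each link fixed while a separate dynamic weight is nudged up or down by user feedback. The class and member names are hypothetical and the step size is arbitrary; this is a sketch of the idea, not Tsert::Search's actual implementation.

    // Hypothetical sketch: a semantic link whose dynamic weight is
    // re-balanced by user feedback while its original semantic weight
    // never changes. Names are illustrative, not Tsert::Search's API.
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    struct SemanticLink {
        std::string from;       // source term
        std::string to;         // related term (e.g. a synonym)
        double semanticWeight;  // fixed weight from the original semantic network
        double dynamicWeight;   // adjustable weight actually used for ranking
    };

    class SemanticNet {
    public:
        void addLink(const std::string& from, const std::string& to, double w) {
            // The dynamic weight starts out equal to the semantic weight.
            links_.push_back({from, to, w, w});
        }

        // Increase or decrease the dynamic weight of a link used to answer a
        // query, depending on whether the user was satisfied with the results.
        void feedback(const std::string& from, const std::string& to, bool satisfied) {
            const double step = 0.1;
            for (auto& l : links_) {
                if (l.from == from && l.to == to) {
                    l.dynamicWeight += satisfied ? step : -step;
                    l.dynamicWeight = std::clamp(l.dynamicWeight, 0.0, 1.0);
                    // l.semanticWeight is deliberately left untouched.
                }
            }
        }

        const std::vector<SemanticLink>& links() const { return links_; }

    private:
        std::vector<SemanticLink> links_;
    };

    int main() {
        SemanticNet net;
        net.addLink("car", "automobile", 0.9);  // strong synonym
        net.addLink("car", "vehicle", 0.4);     // weaker, more general relation

        // Users keep accepting results reached through the "vehicle" link ...
        net.feedback("car", "vehicle", true);
        net.feedback("car", "vehicle", true);

        for (const auto& l : net.links())
            std::cout << l.from << " -> " << l.to
                      << "  semantic=" << l.semanticWeight
                      << "  dynamic=" << l.dynamicWeight << '\n';
    }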
The Tsert NLP Engine, which is also used for translation, performs the content analysis. See our preliminary Results.
Content analysis is done on unstructured text, i.e. tag information is not taken into account. It also relies on semantic networks rather than on clustering information, unlike systems such as IBM's WebFountain.
The relevance of our search results is increased because content information is taken into account when doing keyword searches.
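The following sketch shows, under assumed names and a deliberately simple scoring rule, how literal keyword hits might be combined with subject terms produced by content analysis, so that a page that is actually about a keyword outranks one that merely mentions it. It is an illustration of the idea, not Tsert::Search's ranking code.

    // Hypothetical illustration: rank documents by combining literal keyword
    // hits with subject terms extracted by content analysis. All names and
    // the scoring rule are assumptions, not Tsert::Search's implementation.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Document {
        std::string url;
        std::set<std::string> terms;    // words occurring in the tag-stripped text
        std::set<std::string> subject;  // subject terms produced by content analysis
    };

    double score(const Document& doc, const std::vector<std::string>& query) {
        double s = 0.0;
        for (const auto& q : query) {
            if (doc.terms.count(q))   s += 1.0;  // literal keyword occurrence
            if (doc.subject.count(q)) s += 2.0;  // the page is actually about the keyword
        }
        return s;
    }

    int main() {
        Document a{"a.html", {"jaguar", "engine", "speed"}, {"car"}};
        Document b{"b.html", {"jaguar", "habitat", "prey"}, {"animal", "jaguar"}};

        std::vector<std::string> query{"jaguar"};
        std::cout << "a.html: " << score(a, query) << '\n';  // keyword only
        std::cout << "b.html: " << score(b, query) << '\n';  // keyword and subject
    }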
There are other approaches to using semantic networks for content search. The one that seems capable of satisfying search relevance requirements is the latent semantic indexing approach used by NITLE.
We believe our approach surpasses theirs. It relies on natural language understanding, the way a person understands language, rather than on mathematical constructs such as graphs, networks, or clustering. It offers the possibility of looking for information according to actual content: the terms specified by the user are assumed to refer to the subject of the indexed pages, i.e. what the pages seem to be about to a human reader.
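A minimal sketch of that idea follows, with invented attribute names and values: query constraints are matched against what the analysis says a text is about, rather than against the literal words it contains. This is an assumption-laden illustration, not the Tsert NLP Engine's output format.

    // Hypothetical sketch: matching a query against attributes inferred by
    // natural language understanding (here, a sentiment label and a topic),
    // rather than against the literal words of the text.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct Email {
        std::string sender;
        std::string body;
        std::map<std::string, std::string> understood;  // output of the assumed NLU step
    };

    // Return senders whose emails match every attribute constraint of the query.
    std::vector<std::string> findSenders(
            const std::vector<Email>& emails,
            const std::map<std::string, std::string>& query) {
        std::vector<std::string> hits;
        for (const auto& e : emails) {
            bool match = true;
            for (const auto& [attr, value] : query) {
                auto it = e.understood.find(attr);
                if (it == e.understood.end() || it->second != value) {
                    match = false;
                    break;
                }
            }
            if (match) hits.push_back(e.sender);
        }
        return hits;
    }

    int main() {
        std::vector<Email> emails = {
            {"alice@example.com", "This is the third time my order arrived broken!",
             {{"sentiment", "anger"}, {"topic", "order"}}},
            {"bob@example.com", "Thanks, the replacement works perfectly.",
             {{"sentiment", "satisfaction"}, {"topic", "order"}}},
        };

        // Query: angry customers writing about an order. Note that the word
        // "angry" never appears literally in the matching email.
        for (const auto& s : findSenders(emails, {{"sentiment", "anger"}, {"topic", "order"}}))
            std::cout << s << '\n';
    }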
Our system will make possible searches such as the following (for example, for companies interested in identifying angry customers from their emails):