Tsert::Search©® is a stand-alone, customizable search engine developed in C++. It indexes the contents of Web sites, local disks, and file systems, and allows searching for information using keywords. The technology behind Tsert::Search is used in Breeze::OS©® and Tsert::Ferret©®.
Tsert::Search improves the accuracy of content search by using several components and features:
Most, if not all, companies in the Web search engine business talk of word clustering as a viable approach to content search. A semantic network is a better technology for the task.
Word clustering relies on a statistical analysis of words in relation to other words in a particular cluster. Sometimes a hit may be returned where the keyword was found in relation to another word that does not actually match the query; see Google and word clustering.
With a semantic network, such hits would never be returned, because the associations are not based on statistics but solely on the semantic relationships between words. Additionally, a semantic network can re-balance itself by increasing or decreasing the weights assigned to the links between words, depending on whether the user was satisfied with a particular search query.
The links or associations in question are not the original semantic ones but those constituting the collection of links found in the network. A search query can therefore cause a related term, even one with a low weight, to increase its relevance to that of a synonym, while the weight of its original semantic link or association stays the same.
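As a rough illustration of this re-balancing, the sketch below keeps the original semantic weight of each link fixed while a separate dynamic weight is nudged up or down by user feedback. The class and member names are hypothetical and the step size is arbitrary; this is a sketch of the idea, not Tsert::Search's actual implementation.

    // Hypothetical sketch: a semantic link whose dynamic weight is
    // re-balanced by user feedback while its original semantic weight
    // never changes. Names are illustrative, not Tsert::Search's API.
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    struct SemanticLink {
        std::string from;       // source term
        std::string to;         // related term (e.g. a synonym)
        double semanticWeight;  // fixed weight from the original semantic network
        double dynamicWeight;   // adjustable weight actually used for ranking
    };

    class SemanticNet {
    public:
        void addLink(const std::string& from, const std::string& to, double w) {
            // The dynamic weight starts out equal to the semantic weight.
            links_.push_back({from, to, w, w});
        }

        // Increase or decrease the dynamic weight of a link used to answer a
        // query, depending on whether the user was satisfied with the results.
        void feedback(const std::string& from, const std::string& to, bool satisfied) {
            const double step = 0.1;
            for (auto& l : links_) {
                if (l.from == from && l.to == to) {
                    l.dynamicWeight += satisfied ? step : -step;
                    l.dynamicWeight = std::clamp(l.dynamicWeight, 0.0, 1.0);
                    // l.semanticWeight is deliberately left untouched.
                }
            }
        }

        const std::vector<SemanticLink>& links() const { return links_; }

    private:
        std::vector<SemanticLink> links_;
    };

    int main() {
        SemanticNet net;
        net.addLink("car", "automobile", 0.9);  // strong synonym
        net.addLink("car", "vehicle", 0.4);     // weaker, more general relation

        // Users keep accepting results reached through the "vehicle" link ...
        net.feedback("car", "vehicle", true);
        net.feedback("car", "vehicle", true);

        for (const auto& l : net.links())
            std::cout << l.from << " -> " << l.to
                      << "  semantic=" << l.semanticWeight
                      << "  dynamic=" << l.dynamicWeight << '\n';
    }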
The Tsert NLP Engine, which is also used for translation, performs the content analysis. See our preliminary Results.
Content analysis is done on unstructured text, i.e. tag information is not taken into account. It also relies on semantic networks rather than on clustering information, unlike systems such as IBM's WebFountain.
The relevance of our search results is increased because content information is taken into account when doing keyword searches.
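The following sketch shows, under assumed names and a deliberately simple scoring rule, how literal keyword hits might be combined with subject terms produced by content analysis, so that a page that is actually about a keyword outranks one that merely mentions it. It is an illustration of the idea, not Tsert::Search's ranking code.

    // Hypothetical illustration: rank documents by combining literal keyword
    // hits with subject terms extracted by content analysis. All names and
    // the scoring rule are assumptions, not Tsert::Search's implementation.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Document {
        std::string url;
        std::set<std::string> terms;    // words occurring in the tag-stripped text
        std::set<std::string> subject;  // subject terms produced by content analysis
    };

    double score(const Document& doc, const std::vector<std::string>& query) {
        double s = 0.0;
        for (const auto& q : query) {
            if (doc.terms.count(q))   s += 1.0;  // literal keyword occurrence
            if (doc.subject.count(q)) s += 2.0;  // the page is actually about the keyword
        }
        return s;
    }

    int main() {
        Document a{"a.html", {"jaguar", "engine", "speed"}, {"car"}};
        Document b{"b.html", {"jaguar", "habitat", "prey"}, {"animal", "jaguar"}};

        std::vector<std::string> query{"jaguar"};
        std::cout << "a.html: " << score(a, query) << '\n';  // keyword only
        std::cout << "b.html: " << score(b, query) << '\n';  // keyword and subject
    }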
There are other approaches to using semantic networks for content search. The one that seems capable of satisfying search relevance requirements is the latent semantic indexing approach used by NITLE.
We believe our approach surpasses theirs. It relies on natural language understanding, the way a person understands language, rather than on mathematical constructs such as graphs, networks, or clustering. It offers the possibility of looking for information according to actual content: the terms specified by the user are assumed to refer to the subject of the indexed pages, i.e. what the pages seem to be about to a human reader.
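A minimal sketch of that idea follows, with invented attribute names and values: query constraints are matched against what the analysis says a text is about, rather than against the literal words it contains. This is an assumption-laden illustration, not the Tsert NLP Engine's output format.

    // Hypothetical sketch: matching a query against attributes inferred by
    // natural language understanding (here, a sentiment label and a topic),
    // rather than against the literal words of the text.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct Email {
        std::string sender;
        std::string body;
        std::map<std::string, std::string> understood;  // output of the assumed NLU step
    };

    // Return senders whose emails match every attribute constraint of the query.
    std::vector<std::string> findSenders(
            const std::vector<Email>& emails,
            const std::map<std::string, std::string>& query) {
        std::vector<std::string> hits;
        for (const auto& e : emails) {
            bool match = true;
            for (const auto& [attr, value] : query) {
                auto it = e.understood.find(attr);
                if (it == e.understood.end() || it->second != value) {
                    match = false;
                    break;
                }
            }
            if (match) hits.push_back(e.sender);
        }
        return hits;
    }

    int main() {
        std::vector<Email> emails = {
            {"alice@example.com", "This is the third time my order arrived broken!",
             {{"sentiment", "anger"}, {"topic", "order"}}},
            {"bob@example.com", "Thanks, the replacement works perfectly.",
             {{"sentiment", "satisfaction"}, {"topic", "order"}}},
        };

        // Query: angry customers writing about an order. Note that the word
        // "angry" never appears literally in the matching email.
        for (const auto& s : findSenders(emails, {{"sentiment", "anger"}, {"topic", "order"}}))
            std::cout << s << '\n';
    }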
Our system will make possible searches such as the following (for example, for companies interested in identifying angry customers from their emails):