estcmd gather -cl -fx.1,.2,.3,gz,gz,gz [email protected] -fz -sd -cm casket.
estcmd gather -cl ,.xls,.ppt [email protected] -fz -ic UTF-8 -sd -cm casket.
That would require selecting those files ("estcmd gather" would normally ignore them when reading a directory), feeding that list to the indexer, and delegating their.
[…] October 12: "estdb": the location where the estraier db is kept; "dir": the directory containing the documents to be searched; "gatherarg": the arguments used when running estcmd gather (if absent.
|Published (Last):||19 October 2005|
This guide describes in detail how to use the applications of Hyper Estraier. If you have not yet read the introduction document, please read it first. Hyper Estraier is a full-text search system that uses an index database, so before searching you need to prepare an index in which the target documents are registered.
estcmd is used to administer the index from the command line, and estseek.cgi is used to search the index for documents with a web browser. How to use them is described in this guide. Hyper Estraier supports various search methods, such as combining several search phrases and searching by document attributes. Moreover, the presentation can be customized through the configuration of estseek.
How to do so is described in this guide. Documents handled by Hyper Estraier can carry not only the body text but also attributes such as the title and the modification date.
Attributes are used for various purposes, such as attribute search and detecting changes for differential updates.
Every attribute has a name. Names can be chosen arbitrarily, but some names are reserved for system attributes. Names of system attributes begin with "@"; examples include "@uri", "@title", "@mdate", and "@size". Attributes other than system attributes are called user-defined attributes. They can be defined in a document draft, which is described later. There are two data types for attributes: string and number. Data of the string type are arbitrary strings.
String operations include full matching, forward matching, backward matching, and partial matching. Data of the number type are numbers or date information. A string treated as the number type is converted into a number according to its format before calculation. The data type is not fixed at registration; it is determined at search time. The length of an attribute value is not limited. Attributes and the body text of a document should be expressed in UTF-8 encoding. If another encoding is used, it should be converted into UTF-8. Note that estcmd detects the encoding automatically if it is not specified explicitly.
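Because the data type is decided at search time rather than at registration, the same stored value can be compared either as a string or as a number. The following is a minimal Python sketch of that idea, not Hyper Estraier's implementation; the function names are hypothetical, date formats are not handled, and treating non-numeric values as non-matching is an assumption.

```python
def matches_string_forward(value: str, prefix: str) -> bool:
    """Forward matching: the stored string begins with the given prefix."""
    return value.startswith(prefix)


def matches_number_greater(value: str, threshold: float) -> bool:
    """Numeric comparison: the stored string is converted to a number first."""
    try:
        return float(value) > threshold
    except ValueError:
        # Assumption: a value that is not a number simply does not match.
        return False


stored = "1024"  # attribute values are stored as plain strings
print(matches_string_forward(stored, "10"))  # the value treated as a string
print(matches_number_greater(stored, 999))   # the same value treated as a number
```

The stored value never changes; only the operation chosen at search time decides how it is interpreted.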
However, if a document defines its own URI, that URI takes precedence. The encoding of the value of each attribute is normalized to UTF-8. This section describes how the four document formats are processed.
A plain-text document is composed of strings with no structure. By default, files whose names end with “.
Document draft is an original format of Hyper Estraier. Using document draft as an intermediate format makes it possible to handle various formats in an integrated way. Although the format of document draft is similar to RFC, the details differ.
The delimiter for headers is not ":" […] The following is example data for handling a MIDI document.
The keyword vector is in TSV format: keywords and their scores alternate. Hidden text is the same as normal text except that it is not displayed in the snippet of the result.
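A document draft with a keyword vector and hidden text could be assembled roughly as in the Python sketch below. Several details are assumptions recalled from the document draft format rather than taken from the text above, and should be checked against the format specification: headers written as "name=value", a blank line between headers and body, a "%VECTOR" header line carrying the tab-separated vector, and hidden text marked by a leading tab. The URI and title are made-up sample values.

```python
from typing import Mapping, Optional, Sequence


def format_vector(scores: Mapping[str, int]) -> str:
    """Serialize keywords and scores alternately, separated by tabs (TSV)."""
    fields = []
    for word, score in scores.items():
        fields += [word, str(score)]
    return "\t".join(fields)


def parse_vector(tsv: str) -> dict:
    """Read alternating keyword/score fields back into a dictionary."""
    fields = tsv.split("\t")
    return {fields[i]: int(fields[i + 1]) for i in range(0, len(fields) - 1, 2)}


def build_draft(attrs: Mapping[str, str],
                body: Sequence[str],
                hidden: Sequence[str] = (),
                vector: Optional[Mapping[str, int]] = None) -> str:
    """Assemble a document draft (layout details are assumptions, see above)."""
    header = [f"{name}={value}" for name, value in attrs.items()]
    if vector:
        header.append("%VECTOR\t" + format_vector(vector))  # assumed control line
    hidden_lines = ["\t" + line for line in hidden]          # assumed hidden-text marker
    return "\n".join(header + [""] + list(body) + hidden_lines) + "\n"


draft = build_draft(
    {"@uri": "http://example.com/song.mid", "@title": "Example Song"},  # sample values
    body=["Visible description of the MIDI file."],
    hidden=["searchable but not shown in snippets"],
    vector={"song": 5, "midi": 3},
)
print(draft)
```

Keeping the vector as a mapping until the last moment avoids off-by-one mistakes when pairing keywords with scores.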
Hidden text is useful when searching with some attributes. Two kinds of search conditions are supported: one for full-text search and the other for attribute search.
If both are specified at the same time, documents matching both are searched for. Moreover, several forms, including the usual form and the simplified form, are supported for the full-text search condition. The purpose of a full-text search expression is to search for documents including the specified words. For example, to search for documents including the word "computer", specify "computer" as the search phrase as it is.
You can specify two or more words. For example, if you specify "United Nations", documents including "united" followed by "nations" are searched for. In the simplified form, specify the following: […] Intersection is supported by the "AND" operator. For example, if you specify "internet AND security", documents including both "internet" and "security" are searched for.
Difference is supported by the "ANDNOT" operator. For example, if you specify "hacker ANDNOT cracker", documents including "hacker" but not including "cracker" are searched for.
Union is supported by the "OR" operator.
For example, if you specify "proxy OR firewall", documents including one or both of "proxy" and "firewall" are searched for. Search words are case insensitive. However, operators are case sensitive.
If you want to search for documents including "AND", specify "and" instead. Wild cards are also supported. They can be used for forward match search and backward match search. For example, "[BW] euro" matches words which begin with "euro".
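The semantics of the phrase, "AND", "ANDNOT", "OR", and "[BW]" examples above can be illustrated with set operations over a toy corpus. This Python sketch only models the described behaviour on whole words; it is not how Hyper Estraier matches text internally, and the corpus and function names are invented for the illustration.

```python
import re

DOCS = {  # hypothetical sample corpus
    1: "The United Nations discussed internet security.",
    2: "A hacker is not always a cracker.",
    3: "The hacker configured a proxy.",
    4: "European firewall vendors met in the united states of many nations.",
}


def words(text):
    """Lower-cased word list, so that matching is case insensitive."""
    return re.findall(r"\w+", text.lower())


def has_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    doc, target = words(text), words(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))


def search(phrase):
    return {key for key, text in DOCS.items() if has_phrase(text, phrase)}


def begins_with(prefix):
    """Forward match: some word in the document begins with `prefix`."""
    return {key for key, text in DOCS.items()
            if any(w.startswith(prefix.lower()) for w in words(text))}


print(search("United Nations"))              # phrase: consecutive words only
print(search("internet") & search("security"))  # internet AND security
print(search("hacker") - search("cracker"))     # hacker ANDNOT cracker
print(search("proxy") | search("firewall"))     # proxy OR firewall
print(begins_with("euro"))                      # [BW] euro
```

Document 4 contains both "united" and "nations" but not next to each other, so the phrase search does not return it.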
Moreover, regular expressions are also supported. The purpose of an attribute search expression is to search for documents whose attributes match the specified expression. An expression of attribute search is composed of an attribute name, an operator, and a value. They are separated by space characters. Several operators for attribute search are supported. If an operator is preceded by "!", […] If an operator is preceded by "I", the case of the value is ignored.
If no operator is specified, all documents with the attribute match regardless of the value. Two or more attribute names can be listed, separated by "", to mean logical addition.
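Splitting an attribute search expression into its name, operator, and value might look like the Python sketch below. The operator names used in the example ("STRINC", "NUMGE") and the reading of "!" as negation are assumptions not confirmed by the text above; the class and function names are hypothetical, and lists of several attribute names are not handled.

```python
from typing import NamedTuple, Optional


class AttrCondition(NamedTuple):
    name: str
    operator: Optional[str]
    value: Optional[str]
    negated: bool      # operator was preceded by "!" (assumed to mean negation)
    ignore_case: bool  # operator was preceded by "I"


def parse_attr_expr(expr: str) -> AttrCondition:
    """Split 'name operator value' on spaces; the value may itself contain spaces."""
    parts = expr.split(None, 2)
    if not parts:
        raise ValueError("empty attribute expression")
    name = parts[0]
    if len(parts) == 1:
        # No operator: every document that has the attribute matches.
        return AttrCondition(name, None, None, False, False)
    op = parts[1]
    negated = op.startswith("!")
    if negated:
        op = op[1:]
    ignore_case = op.startswith("I")
    if ignore_case:
        op = op[1:]
    value = parts[2] if len(parts) > 2 else ""
    return AttrCondition(name, op, value, negated, ignore_case)


print(parse_attr_expr("@title ISTRINC hyper estraier"))
print(parse_attr_expr("@size !NUMGE 1000"))
print(parse_attr_expr("@author"))
```

Stripping the "I" prefix this way is only safe if no operator name itself begins with "I", which is a further assumption of the sketch.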
You can specify the order of the result by an expression. An ordering expression is composed of an attribute name and an operator.
For example, if you specify "size NUMA", the documents in the result are sorted in ascending order of size. Several operators for ordering are supported. By default, the result is ordered by descending score. The score is calculated from the number of occurrences of the specified words in each document. Clipping similar documents is useful when overly similar documents would otherwise occupy the page. Each "[detail]" link in the result shows detailed information. Each "[similar]" link in the result searches for similar documents.
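The two orderings mentioned above, ascending by a numeric attribute with "NUMA" and the default descending by score, can be sketched in Python as follows. The record layout and the function name are hypothetical, and only the "NUMA" operator is handled because it is the only one named in the text.

```python
def order_results(docs, expr=None):
    """Order result records; `expr` is an ordering expression such as 'size NUMA'."""
    if expr is None:
        # Default: descending by score.
        return sorted(docs, key=lambda d: d["score"], reverse=True)
    name, op = expr.split()
    if op != "NUMA":
        raise ValueError(f"operator not covered by this sketch: {op}")
    # NUMA: ascending, with the attribute value interpreted as a number.
    return sorted(docs, key=lambda d: float(d[name]))


results = [  # hypothetical search results
    {"id": 1, "size": "300", "score": 9},
    {"id": 2, "size": "20", "score": 5},
    {"id": 3, "size": "1000", "score": 1},
]
print([d["id"] for d in order_results(results)])
print([d["id"] for d in order_results(results, "size NUMA")])
```

Converting with float matters here: compared as strings, "1000" would sort before "20".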
Each "[include]" link in the result includes the clipped documents. The search phrase has other kinds of forms: rough form, union form, and intersection form. Although rough form is similar to simplified form, negative conditions are specified by tokens preceded by "-". Union form specifies only union conditions by tokens. Intersection form specifies only intersection conditions by tokens.
These forms support neither wild cards nor other special operators. This section describes the specification of estcmd. The name of a sub command is specified by the first argument. Other arguments are parsed according to each sub command. The argument db specifies the path of an index. All sub commands return 0 if the operation succeeds and 1 otherwise. The data type of attribute indexes specified by the -attr option of the create sub command should be "seq" for sequential type, "str" for string type, or "num" for number type.
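From a script, the calling convention described above (sub command first, then options, then the index path, with exit status 0 on success) could be wrapped as in this Python sketch. The option set "-cl -sd -cm" is copied from the gather commands quoted at the top of this page without asserting what each flag does; the helper names are hypothetical, and the wrapper assumes estcmd is installed and on the PATH.

```python
import subprocess
from typing import List, Sequence


def gather_argv(db: str, target: str,
                options: Sequence[str] = ("-cl", "-sd", "-cm")) -> List[str]:
    """Build the argument list: sub command first, then options, index, target."""
    return ["estcmd", "gather", *options, db, target]


def run_estcmd(argv: Sequence[str]) -> bool:
    """Run estcmd and report success; sub commands return 0 on success, else 1."""
    return subprocess.run(list(argv)).returncode == 0


argv = gather_argv("casket", ".")
print(" ".join(argv))
# run_estcmd(argv) would execute the command; it is not called here because
# it requires an installed estcmd.
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.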
Each pseudo index specified by the -pidx option of the search sub command and so on is a directory containing document draft files. If you search a main index together with pseudo indexes, a meta search over the main index and the pseudo indexes is performed.
The language name specified by the -il option should be one of "en" (English), "ja" (Japanese), "zh" (Chinese), or "ko" (Korean). The outer command specified by the -fx option of gather receives the path of the target file as the first argument and the path for output as the second argument.
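An outer command for the -fx option only has to honour that two-argument contract: read the file named by the first argument and write the converted result to the path given by the second. The Python sketch below shows the shape of such a filter; the conversion itself (dropping blank lines and trailing whitespace) and the choice of plain text as the output format are placeholders chosen for the illustration, not anything Hyper Estraier prescribes.

```python
import os
import sys
import tempfile


def convert(src_path: str, dst_path: str) -> None:
    """Placeholder conversion: keep non-blank lines, strip trailing whitespace."""
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        lines = [line.rstrip() for line in src.read().splitlines()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(line for line in lines if line) + "\n")


def main(argv) -> int:
    """argv[1] is the target file, argv[2] is the output path."""
    if len(argv) != 3:
        return 1
    convert(argv[1], argv[2])
    return 0


# Demonstration with temporary files instead of a real gather run.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "input.txt")
output = os.path.join(workdir, "output.txt")
with open(source, "w", encoding="utf-8") as handle:
    handle.write("first line   \n\nsecond line\n")
print(main(["filter", source, output]))
with open(output, encoding="utf-8") as handle:
    print(handle.read())
```

Installed as a standalone script, the file would end with a call such as sys.exit(main(sys.argv)) so that the exit status reaches the caller.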
Note that similarity search is very slow by default. To improve its performance, running "estcmd extkeys" beforehand is strongly recommended.