Search Service

Table of Contents

1. Intro
2. How to enable the Search Service?
3. Configuration
4. How to perform a search?
5. Polling for events

1. Intro

The Search Service can be used to search on properties of resources (such as the filename) or on extracted meta data.

Each resource is indexed according to its resource type. Indexed data is stored either as meta data or as content. Meta data is stored per field, and each meta data field can have multiple values. Content is stored per key. You can search in specific meta data fields, in specific content (i.e. for a given key), in all content (i.e. over all keys), or in all meta data values and all content (use full-text for this).

2. How to enable the Search Service?

When the config option /config/server/resources/features/search has the value true in the Resource Server, the Search web service will start and it will index the Resource Stores.

When the config option /config/web-ui/resources/features/search has the value true in the Front End Server, the Search menu item will appear next to the Browse menu item.

3. Configuration

The configuration of the Cassandra cluster is very similar to that of Distributed Stores. All configuration options are under /config/server/resources/search/.

monitor: Optional option. A boolean (true or false) indicating if the store should be monitored for its health status. Default value true.
monitor-interval: Optional option. Interval in milliseconds at which the monitor will update its health. Default value 1,000 (1 second).
monitor-max-interval: Optional option. The maximum time in milliseconds between successive health checks. The check will run at least once in this interval, even if no other task reported an error. Default value 300,000 (5 minutes).
contactpoints: Optional option. A comma-separated list of IP addresses of some of the nodes in the cluster, used for initial discovery. The other nodes of the cluster are then found by contacting these nodes. Example 127.0.0.1,10.2.31.7,10.2.30.119,10.2.32.174. Default value localhost. In a multi-node Cassandra cluster, the contact point 127.0.0.1 must always be explicitly included. If the cluster-name option is specified, this option is ignored.
replicationfactor: Optional option. How many times the data must be replicated in the cluster. Cassandra will try to replicate on different nodes. Default value 1.
cluster-name: Optional option. The name of the Scriptura managed Cassandra cluster. This overrides the contactpoints option. Example cassandra-scriptura-production. If you do not specify this option, Scriptura will fall back to the contactpoints option to determine where it can find the Cassandra servers.
username: Optional option. The username that has to be used to connect to the Cassandra cluster.
password: Optional option. The password for the configured user.
keyspace: Optional option. The keyspace to use in the Cassandra database. Only use valid characters for keyspace names. The keyspace is created if it does not exist yet. Default value search.
index-plain-text: Optional option. When true, every line in a text file is added as a value for the meta data key content. Currently, the following Resource Types are indexed as text files: Cascading stylesheet file (*.css), Comma-separated values (*.csv), Git ignore file (.gitignore), HTML file (*.htm, *.html), INI file (*.ini), Java source file (*.java), JavaScript source file (*.js), JSON file (*.json), Markdown file (*.md), SQL File (*.sql), Text Document (*.txt), and TypeScript file (*.ts). Default value false.
index-xml: Optional option. When true, attribute values and text content are indexed for XML files. Attribute values are indexed under meta data key attr-<uri>-<local-name>. Similarly, text content is indexed under element-<uri>-<local-name>. E.g. for <n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node> the value foo is indexed under meta data key attr-http://mynamespace-data and the value bar is indexed under meta data key element-http://mynamespace-node. Currently, the following Resource Types are indexed as XML files: Tracking Concept Definition (*.tcd), Scriptura Document Flow (*.sdf), Data Source Template (main.dst), Scriptura Event Flow (*.sef), Scalable Vector Graphics (*.svg), XML File (*.xml), and XSLT File (*.xslt). Default value false.
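To make the key naming of index-xml concrete, the following sketch derives the attr-<uri>-<local-name> and element-<uri>-<local-name> keys for a small XML document. It is an illustration only and not the indexer used by the Search Service. The function names are hypothetical, and the handling of names without a namespace (empty uri), the trimming of whitespace, and the skipping of text that follows a child element are assumptions.

```python
import xml.etree.ElementTree as ET


def split_qname(qname):
    """Split ElementTree's '{uri}local' notation into (uri, local-name)."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return uri, local
    return "", qname  # assumption: no namespace gives an empty uri


def xml_index_entries(xml_text):
    """Return (meta data key, value) pairs for attributes and element text."""
    entries = []
    for elem in ET.fromstring(xml_text).iter():
        for name, value in elem.attrib.items():
            a_uri, a_local = split_qname(name)
            entries.append((f"attr-{a_uri}-{a_local}", value))
        text = (elem.text or "").strip()  # assumption: blank text is skipped
        if text:
            uri, local = split_qname(elem.tag)
            entries.append((f"element-{uri}-{local}", text))
    return entries


sample = '<n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node>'
for key, value in xml_index_entries(sample):
    print(key, "=", value)
```

For the sample element from the option description, this yields the keys attr-http://mynamespace-data (value foo) and element-http://mynamespace-node (value bar).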

4. How to perform a search?

The filter can be provided in either XML or in JSON format.

Do a POST request to the end-point /search/v1/query on the Resource Server.

4.1. Sample requests

Sample XML POST request body.

<filter start-index="0" max-results="2">
	<store-name>default-store</store-name>
	<query>
		<and>
			<path-prefix>/default-workspace/</path-prefix>
			<path-filter>sample</path-filter>
			<type>text/plain</type>
			<full-text>branch</full-text>
			<meta field="filename" match-type="exact" case-type="sensitive">Sample file.txt</meta>
			<content key="content-en">Line</content>
		</and>
	</query>
	<ordering>
		<order type="standard" field="path" ascending="false"/>
		<order type="meta" field="content"/>
	</ordering>
	<fetch-meta-data-fields>
		<field>content</field>
		<field>filename</field>
	</fetch-meta-data-fields>
</filter>

Sample JSON POST request body.

{
	"storeName": "default-store",
	"query": { "and": [
			{"pathPrefix": "/default-workspace"},
			{"pathFilter": "sample"},
			{"type": "text/plain"},
			{"fullText": "branch"},
			{"meta": { "field": "filename", "matchType" :"exact", "caseType": "sensitive", "value": "Sample file.txt"}},
			{"content": { "key": "content-en", "value": "Line"}}
	]},
	"ordering": [
		{"type": "standard", "field": "path", "ascending": "false"},
		{"type": "meta", "field": "content"}
	],
	"fetchMetaDataFields": ["content", "filename"],
	"startIndex": 0,
	"maxResults": 2
}
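As a minimal sketch, a JSON filter like the one above could be posted with Python's standard library as follows. The base URL, the Content-Type and Accept headers, and the absence of authentication are assumptions that are not stated in this document; adjust them to your installation.

```python
import json
import urllib.request

# Hypothetical address of the Resource Server; replace with your own.
BASE_URL = "http://localhost:8080"


def build_search_request(query_filter, base_url=BASE_URL):
    """Build (but do not send) a POST request for /search/v1/query."""
    body = json.dumps(query_filter).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/search/v1/query",
        data=body,
        method="POST",
        # Assumption: JSON is selected through these headers.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def search(query_filter, base_url=BASE_URL):
    """Send the filter and return the decoded JSON response."""
    request = build_search_request(query_filter, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


example_filter = {
    "storeName": "default-store",
    "query": {"and": [
        {"pathPrefix": "/default-workspace/"},
        {"fullText": "branch"},
    ]},
    "startIndex": 0,
    "maxResults": 2,
}
```

Calling search(example_filter) would then return a dictionary with the resources and next fields of the JSON response.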

4.2. Request description

All properties are optional. All conditions are cumulative.

Table 1. Properties
XML	JSON field	Remarks
`store-name` element	`storeName`	Limit to resources in this store.
`query` element	`query`	Specify which results should be included. See table query below.
`ordering` element	`ordering`	For XML, include a list of `order` elements. For JSON, include an array of objects. See table order below.
`fetch-meta-data-fields` element	`fetchMetaDataFields`	For XML, include a list of `field` elements with text-content. For JSON, include an array of strings.
`start-index` attribute	`startIndex`	Zero-based. Skip the first results.
`max-results` attribute	`maxResults`	Return at most this number of entries. An extra element `next` will be generated after the last resource to indicate whether more results are available.

Table 2. query
XML	JSON field	Remarks
`path-prefix` element	`pathPrefix`	Limit to resources whose path starts with the given path (case-insensitive).
`path-filter` element	`pathFilter`	Limit to resources that contain the given text in their path (case-insensitive).
`type` element	`type`	Limit to resources whose type exactly matches the given type.
`full-text` element	`fullText`	Limit to resources which contain the given text in at least one meta data value or in content (case-insensitive).
`meta` element	`meta`	Filter on a meta data field. See table meta below.
`content` element	`content`	Filter on content. See table content below.
`and` element	`and`	Requires that all the child-elements are true. Contains `query` elements as children.
`or` element	`or`	Requires that at least one child-element is true. Contains `query` elements as children.
`not` element	`not`	Requires that its child-element is false. Contains a `query` element as child.
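The `and`, `or` and `not` conditions can be nested. The sketch below builds such a nested JSON query with a few small helper functions. The helpers are hypothetical. Only the array form of `and` appears in the sample request, so the JSON shapes used here for `or` (an array) and `not` (a single query object) are assumptions derived from the table.

```python
def and_(*queries):
    """All child queries must be true."""
    return {"and": list(queries)}


def or_(*queries):
    """At least one child query must be true (assumed array form)."""
    return {"or": list(queries)}


def not_(query):
    """The child query must be false (assumed single-object form)."""
    return {"not": query}


def path_prefix(path):
    return {"pathPrefix": path}


def full_text(text):
    return {"fullText": text}


# Resources under /default-workspace/ that mention "branch" or "trunk",
# but not "draft".
query = and_(
    path_prefix("/default-workspace/"),
    or_(full_text("branch"), full_text("trunk")),
    not_(full_text("draft")),
)
```

The resulting dictionary can be used as the value of the query field in the JSON request body.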

Table 3. meta
XML	JSON field	Remarks
`field` attribute	`field`	The name of the meta data field.
`match-type` attribute	`matchType`	One of `wildcard` or `exact`. In case of `wildcard`, include `*` as wildcard.
`case-type` attribute	`caseType`	One of `sensitive` or `insensitive`.
<text-content>	`value`	The value that should be matched. For match-type `wildcard` include stars (*) as wildcards. For XML, include the value as text-content. For JSON, include a value-field.

Table 4. content
XML	JSON field	Remarks
`key` attribute	`key`	The key of the content. Content can be categorized (e.g. per language). Optional. Omit for searching in all content.
<text-content>	`value`	The value that should be matched. For XML, include the value as text-content. For JSON, include a value-field.

Table 5. order
XML	JSON field	Remarks
`type` attribute	`type`	One of `standard` or `meta`.
`field` attribute	`field`	In case of type `standard`, one of `last-modified`, `path` or `size`. In case of type `meta`, a name of a meta data field.
`ascending` attribute	`ascending`	One of `true` or `false`.

4.3. Sample responses

Sample XML response body

<?xml version="1.0" encoding="UTF-8"?>
<resources>
    <resource store="default-store" path="/default-workspace/Sample file.txt" tag="6cb0a975-e7e5-4b05-8720-2acd9e8d33c1" last-modified="2017-08-18T19:44:49.000Z" size="206" type="text/plain">
        <metadata field="filename">
            <value>Sample file.txt</value>
        </metadata>
    </resource>
    <next/>
</resources>

When the response contains a next element, there were more results than the ones returned. Fetch the next batch of results by changing the start-index and max-results attributes, for example by increasing start-index by the number of results already received.

Sample JSON response body

{
    "resources": [
        {
            "store": "default-store",
            "path": "/default-workspace/Sample file.txt",
            "tag": "6cb0a975-e7e5-4b05-8720-2acd9e8d33c1",
            "last-modified": "2017-08-18T19:44:49.000Z",
            "size": "206",
            "type": "text/plain",
            "metadata": [
                {
                    "field": "filename",
                    "values": [
                        "Sample file.txt"
                    ]
                }
            ]
        }
    ],
    "next": false
}

When the next field has value true, there were more results than the ones returned. Fetch the next batch of results by changing the startIndex and maxResults fields, for example by increasing startIndex by the number of results already received.
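Fetching all results batch by batch could look like the sketch below. It is a minimal example under assumptions: post stands for any caller-supplied function that sends a filter dictionary to /search/v1/query and returns the decoded JSON response, and the function name and default page size are arbitrary.

```python
def iter_results(post, query_filter, page_size=50):
    """Yield every resource matching query_filter, one batch at a time."""
    start = 0
    while True:
        # Copy the filter and set the paging fields for this batch.
        page = dict(query_filter, startIndex=start, maxResults=page_size)
        response = post(page)
        resources = response.get("resources", [])
        yield from resources
        # Stop when the server reports no further results; an empty batch
        # also ends the loop so that it cannot run forever.
        if not response.get("next") or not resources:
            break
        start += len(resources)
```

Because the function is a generator, the caller can stop iterating early and no further requests are sent.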

5. Polling for events

When a Resource Store event is processed by the Search Service, it is emitted by this endpoint.

Do a GET request to the end-point /search/v1/poll on the Resource Server.

It behaves in the same way as the poll end-point of Resource Stores.

Scriptura Engage