Search Service

1. Intro

The Search Service can be used to search on properties of resources (such as the filename) or on extracted meta data.

Each resource is indexed according to its resource type. Data can be stored in meta data or in content. Meta data is stored per field, and each meta data field can have multiple values. Content is stored per key. You can search in specific meta data fields, in specific content (i.e. for a given key), in all content (i.e. over all keys), or in all meta data values and all content (use full-text for this).
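
As a rough mental model only, an indexed resource can be pictured as multi-valued meta data fields plus content stored per key. The class and method names below are hypothetical and not part of the product, and the substring matching is an assumption about how full-text search behaves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexedResource:
    """Hypothetical in-memory picture of one indexed resource."""
    path: str
    # Meta data is stored per field; each field can hold multiple values.
    meta: Dict[str, List[str]] = field(default_factory=dict)
    # Content is stored per key (for example one key per language).
    content: Dict[str, str] = field(default_factory=dict)

    def matches_full_text(self, text: str) -> bool:
        """Case-insensitive lookup over all meta data values and all content."""
        needle = text.lower()
        in_meta = any(needle in v.lower() for vs in self.meta.values() for v in vs)
        in_content = any(needle in c.lower() for c in self.content.values())
        return in_meta or in_content


doc = IndexedResource(
    path="/default-workspace/Sample file.txt",
    meta={"filename": ["Sample file.txt"]},
    content={"content-en": "Line one of the sample"},
)
print(doc.matches_full_text("SAMPLE"))
```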

2. How to enable the Search Service?

When the config option /config/server/resources/features/search has the value true in the Resource Server, the Search web service will start and it will index the Resource Stores.

When the config option /config/web-ui/resources/features/search has the value true in the Front End Server, the Search menu item will appear next to the Browse menu item.

3. Configuration

The configuration of the Cassandra cluster is very similar to that of Distributed Stores. All configuration options are under /config/server/resources/search/.

  • monitor: Optional option. A boolean (true or false) indicating if the store should be monitored for its health status. Default value true.

  • monitor-interval: Optional option. Interval in milliseconds at which the monitor will update its health. Default value 1,000 (1 second).

  • monitor-max-interval: Optional option. The maximum time in milliseconds between successive health checks. The check will run at least once in this interval, even if no other task reported an error. Default value 300,000 (5 minutes).

  • contactpoints: Optional option. A comma-separated list of IP addresses of some of the nodes from the cluster for initial discovery. After this, other nodes of the cluster are found by contacting these nodes. Example 127.0.0.1,10.2.31.7,10.2.30.119,10.2.32.174. Default value localhost. In a multi-node Cassandra cluster, the contact point 127.0.0.1 must always be explicitly included. If the cluster-name option is specified, this option is ignored.

  • replicationfactor: Optional option. How many times the data must be replicated in the cluster. Cassandra will try to replicate on different nodes. Default value 1.

  • cluster-name: Optional option. The name of the Scriptura managed Cassandra cluster. This overrides the contactpoints option. Example cassandra-scriptura-production. If you do not specify this option, Scriptura will fall back to the contactpoints option to determine where it can find the Cassandra servers.

  • username: Optional option. The username that has to be used to connect to the Cassandra cluster.

  • password: Optional option. The password for the configured user.

  • keyspace: Optional option. The keyspace to use in the Cassandra database. Only use valid characters for keyspace names. The keyspace is created if it does not exist yet. Default value search.

  • index-plain-text: Optional option. When true, every line in a text file is added as a value for meta data key content. Currently, the following Resource Types are indexed as text files: Cascading stylesheet file (*.css), Comma-separated values (*.csv), Git ignore file (.gitignore), HTML file (*.htm, *.html), INI file (*.ini), Java source file (*.java), JavaScript source file (*.js), JSON file (*.json), Markdown file (*.md), SQL File (*.sql), Text Document (*.txt), and TypeScript file (*.ts). Default value false.

  • index-xml: Optional option. When true, attribute values and text content are indexed for XML files. Attribute values are indexed under meta data key attr-<uri>-<local-name>. Similarly, text content is indexed under element-<uri>-<local-name>. E.g. for <n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node> the value foo is indexed under meta data key attr-http://mynamespace-data and the value bar is indexed under meta data key element-http://mynamespace-node. Currently, the following Resource Types are indexed as XML files: Tracking Concept Definition (*.tcd), Scriptura Document Flow (*.sdf), Data Source Template (main.dst), Scriptura Event Flow (*.sef), Scalable Vector Graphics (*.svg), XML File (*.xml), and XSLT File (*.xslt). Default value false.
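
The naming scheme for the index-xml option can be illustrated with a small sketch. This is not the product's indexer: the function names are hypothetical, and the handling of names without a namespace and of surrounding whitespace are assumptions. It only shows how the documented keys follow from the namespace URI and local name.

```python
import xml.etree.ElementTree as ET


def split_qname(qname):
    """Split ElementTree's '{uri}local' notation into (uri, local-name)."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return uri, local
    return "", qname  # assumption: names without a namespace get an empty uri


def xml_meta_keys(xml_text):
    """Return (meta data key, value) pairs following the documented scheme."""
    pairs = []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_qname(elem.tag)
        for name, value in elem.attrib.items():
            attr_uri, attr_local = split_qname(name)
            pairs.append((f"attr-{attr_uri}-{attr_local}", value))
        text = (elem.text or "").strip()
        if text:
            pairs.append((f"element-{uri}-{local}", text))
    return pairs


sample = '<n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node>'
for key, value in xml_meta_keys(sample):
    print(key, "=", value)
```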

4. Searching

To search, send a POST request to the end-point /search/v1/query on the Resource Server. The filter in the request body can be provided in either XML or JSON format.

4.1. Sample requests

Sample XML POST request body.

<filter start-index="0" max-results="2">
	<store-name>default-store</store-name>
	<query>
		<and>
			<path-prefix>/default-workspace/</path-prefix>
			<path-filter>sample</path-filter>
			<type>text/plain</type>
			<full-text>branch</full-text>
			<meta field="filename" match-type="exact" case-type="sensitive">Sample file.txt</meta>
			<content key="content-en">Line</content>
		</and>
	</query>
	<ordering>
		<order type="standard" field="path" ascending="false"/>
		<order type="meta" field="content"/>
	</ordering>
	<fetch-meta-data-fields>
		<field>content</field>
		<field>filename</field>
	</fetch-meta-data-fields>
</filter>

Sample JSON POST request body.

{
	"storeName": "default-store",
	"query": { "and": [
			{"pathPrefix": "/default-workspace"},
			{"pathFilter": "sample"},
			{"type": "text/plain"},
			{"fullText": "branch"},
			{"meta": { "field": "filename", "matchType" :"exact", "caseType": "sensitive", "value": "Sample file.txt"}},
			{"content": { "key": "content-en", "value": "Line"}}
	]},
	"ordering": [
		{"type": "standard", "field": "path", "ascending": "false"},
		{"type": "meta", "field": "content"}
	],
	"fetchMetaDataFields": ["content", "filename"],
	"startIndex": 0,
	"maxResults": 2
}
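
A minimal sketch of preparing such a JSON request from Python, using only the standard library. The base URL is a placeholder, and the Content-Type and Accept headers are assumptions about how the server selects the format; authentication, if your server requires it, is not shown.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your Resource Server.
BASE_URL = "http://localhost:8080"


def build_search_request(filter_body, base_url=BASE_URL):
    """Prepare (but do not send) a POST request to the query end-point."""
    return urllib.request.Request(
        url=base_url + "/search/v1/query",
        data=json.dumps(filter_body).encode("utf-8"),
        method="POST",
        # Assumption: the format is chosen through these headers.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


filter_body = {
    "storeName": "default-store",
    "query": {"and": [
        {"pathPrefix": "/default-workspace"},
        {"fullText": "branch"},
    ]},
    "fetchMetaDataFields": ["filename"],
    "startIndex": 0,
    "maxResults": 2,
}
request = build_search_request(filter_body)
print(request.get_method(), request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```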

4.2. Request description

All properties are optional. All conditions are cumulative.

Table 1. Properties

XML | JSON field | Remarks
store-name element | storeName | Limit to resources in this store.
query element | query | Specify which results should be included. See table query below.
ordering element | ordering | For XML, include a list of order elements. For JSON, include an array of objects. See table order below.
fetch-meta-data-fields element | fetchMetaDataFields | For XML, include a list of field elements with text-content. For JSON, include an array of strings.
start-index attribute | startIndex | Zero-based. Skip the first results.
max-results attribute | maxResults | Return at most this number of entries. An extra next element is added after the last resource to indicate whether more results are available.

Table 2. query

XML | JSON field | Remarks
path-prefix element | pathPrefix | Limit to resources whose path starts with the given path (case-insensitive).
path-filter element | pathFilter | Limit to resources that contain the given text in their path (case-insensitive).
type element | type | Limit to resources whose type exactly matches the given type.
full-text element | fullText | Limit to resources which contain the given text in at least one meta data value or in content (case-insensitive).
meta element | meta | Filter on a meta data field. See table meta below.
content element | content | Filter on content. See table content below.
and element | and | Requires that all the child-elements are true. Contains query elements as children.
or element | or | Requires that at least one child-element is true. Contains query elements as children.
not element | not | Requires that its child-element is false. Contains a query element as child.
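
The logical operators can be nested. The helper functions below are a hypothetical convenience for building a JSON query. The array form of and is taken from the sample request; that or uses the same array form and that not wraps a single object are assumptions.

```python
import json


def and_(*conditions):
    """All child conditions must be true."""
    return {"and": list(conditions)}


def or_(*conditions):
    """At least one child condition must be true (array form is assumed)."""
    return {"or": list(conditions)}


def not_(condition):
    """The single child condition must be false (object form is assumed)."""
    return {"not": condition}


# Text files under /default-workspace/ that mention "branch",
# excluding those whose path contains "sample".
query = and_(
    {"pathPrefix": "/default-workspace/"},
    or_({"type": "text/plain"}, {"type": "text/markdown"}),
    {"fullText": "branch"},
    not_({"pathFilter": "sample"}),
)
print(json.dumps({"query": query}, indent=2))
```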

Table 3. meta

XML | JSON field | Remarks
field attribute | field | The name of the meta data field.
match-type attribute | matchType | One of wildcard or exact. In case of wildcard, include * as wildcard.
case-type attribute | caseType | One of sensitive or insensitive.
<text-content> | value | The value that should be matched. For match-type wildcard include stars (*) as wildcards. For XML, include the value as text-content. For JSON, include a value-field.
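
To make the combination of match-type and case-type concrete, here is a local illustration of the matching rules. It is not the server's implementation: it assumes that * stands for any run of characters (including none) and that the pattern must match the whole value.

```python
import re


def meta_value_matches(pattern, value, match_type, case_type):
    """Illustrative check of one meta data value against a filter value."""
    flags = re.DOTALL
    if case_type == "insensitive":
        flags |= re.IGNORECASE
    if match_type == "wildcard":
        # Each * becomes "any run of characters"; everything else is literal.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    else:
        # Exact matching treats every character, including *, literally.
        regex = re.escape(pattern)
    return re.fullmatch(regex, value, flags) is not None


print(meta_value_matches("Sample*.txt", "Sample file.txt", "wildcard", "sensitive"))
print(meta_value_matches("sample file.txt", "Sample file.txt", "exact", "sensitive"))
print(meta_value_matches("sample file.txt", "Sample file.txt", "exact", "insensitive"))
```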

Table 4. content

XML | JSON field | Remarks
key attribute | key | The key of the content. Content can be categorized (e.g. per language). Optional. Omit for searching in all content.
<text-content> | value | The value that should be matched. For XML, include the value as text-content. For JSON, include a value-field.

Table 5. order

XML | JSON field | Remarks
type attribute | type | One of standard or meta.
field attribute | field | In case of type standard, one of last-modified, path or size. In case of type meta, a name of a meta data field.
ascending attribute | ascending | One of true or false.

4.3. Sample responses

Sample XML response body
<?xml version="1.0" encoding="UTF-8"?>
<resources>
    <resource store="default-store" path="/default-workspace/Sample file.txt" tag="6cb0a975-e7e5-4b05-8720-2acd9e8d33c1" last-modified="2017-08-18T19:44:49.000Z" size="206" type="text/plain">
        <metadata field="filename">
            <value>Sample file.txt</value>
        </metadata>
    </resource>
    <next/>
</resources>

When the response contains a next element, more results are available than were returned. Fetch the next batch of results by increasing the start-index attribute (and adjusting max-results if needed).
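
A minimal sketch of reading such an XML response with Python's standard library. The function name is hypothetical, and the sample string mirrors the response above without the XML declaration.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(xml_text):
    """Return (list of resources, whether a next element is present)."""
    root = ET.fromstring(xml_text)
    resources = []
    for res in root.findall("resource"):
        entry = dict(res.attrib)  # store, path, tag, last-modified, size, type
        entry["metadata"] = {
            md.get("field"): [v.text or "" for v in md.findall("value")]
            for md in res.findall("metadata")
        }
        resources.append(entry)
    has_more = root.find("next") is not None
    return resources, has_more


sample_response = """
<resources>
    <resource store="default-store" path="/default-workspace/Sample file.txt"
              last-modified="2017-08-18T19:44:49.000Z" size="206" type="text/plain">
        <metadata field="filename">
            <value>Sample file.txt</value>
        </metadata>
    </resource>
    <next/>
</resources>
"""
resources, has_more = parse_xml_response(sample_response)
print(resources[0]["path"], resources[0]["metadata"], has_more)
```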

Sample JSON response body
{
    "resources": [
        {
            "store": "default-store",
            "path": "/default-workspace/Sample file.txt",
            "tag": "6cb0a975-e7e5-4b05-8720-2acd9e8d33c1",
            "last-modified": "2017-08-18T19:44:49.000Z",
            "size": "206",
            "type": "text/plain",
            "metadata": [
                {
                    "field": "filename",
                    "values": [
                        "Sample file.txt"
                    ]
                }
            ]
        }
    ],
    "next": false
}

When the next field has value true, more results are available than were returned. Fetch the next batch of results by increasing the startIndex field (and adjusting maxResults if needed).
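
Paging through all results can be sketched as a loop over startIndex. Here fetch_page is a hypothetical callable that posts a JSON filter to /search/v1/query and returns the decoded JSON response; fake_fetch stands in for it so that the loop can be run without a server.

```python
def iter_all_results(fetch_page, filter_body, page_size=50):
    """Yield every resource by requesting successive pages until next is false."""
    start = 0
    while True:
        page = fetch_page(dict(filter_body, startIndex=start, maxResults=page_size))
        yield from page["resources"]
        if not page.get("next"):
            break
        start += page_size


def fake_fetch(body):
    """Stand-in for a real request: serves five made-up resources in pages."""
    data = [f"/default-workspace/file-{i}.txt" for i in range(5)]
    start, size = body["startIndex"], body["maxResults"]
    chunk = data[start:start + size]
    return {
        "resources": [{"path": p} for p in chunk],
        "next": start + size < len(data),
    }


paths = [r["path"] for r in iter_all_results(fake_fetch, {"storeName": "default-store"}, page_size=2)]
print(paths)
```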

5. Polling for events

When a Resource Store event is processed by the Search Service, it is emitted by this end-point.

Do a GET request to the end-point /search/v1/poll on the Resource Server.

It behaves in the same way as the poll end-point of Resource Stores.
