Search Service
1. Intro
The Search Service can be used to search on properties of resources (such as the filename) or on extracted meta data.
Each resource is indexed according to its resource type.
Indexed data is stored either as meta data or as content.
Meta data is stored per field.
Each meta data field can have multiple values.
Content is stored per key.
You can search in specific meta data fields,
in specific content (i.e. for a given key),
in all content (i.e. over all keys),
or in all meta data values and all content (use full-text for this).
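To make this data model concrete, the Python sketch below represents one indexed resource as meta data fields (each holding a list of values) plus content entries (one text per key), and implements the four search scopes listed above. All class and function names are hypothetical, and every match is simplified to a case-insensitive substring test; it illustrates the scopes only and is not how the Search Service itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedResource:
    """Hypothetical model of one indexed resource."""
    path: str
    # Meta data: one entry per field; each field can hold multiple values.
    meta: dict[str, list[str]] = field(default_factory=dict)
    # Content: one text per key (for example, one key per language).
    content: dict[str, str] = field(default_factory=dict)


def _contains(haystack: str, needle: str) -> bool:
    # Simplification for this sketch: case-insensitive substring match.
    return needle.lower() in haystack.lower()


def match_meta(res: IndexedResource, field_name: str, text: str) -> bool:
    """Search in one specific meta data field."""
    return any(_contains(v, text) for v in res.meta.get(field_name, []))


def match_content_key(res: IndexedResource, key: str, text: str) -> bool:
    """Search in the content stored under one given key."""
    return key in res.content and _contains(res.content[key], text)


def match_all_content(res: IndexedResource, text: str) -> bool:
    """Search over the content of all keys."""
    return any(_contains(c, text) for c in res.content.values())


def match_full_text(res: IndexedResource, text: str) -> bool:
    """Search in all meta data values and all content (the full-text scope)."""
    in_meta = any(_contains(v, text) for values in res.meta.values() for v in values)
    return in_meta or match_all_content(res, text)
```

The key difference between the scopes is visible in the last two functions: a filename only matches under full-text, because it lives in meta data rather than in content.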
2. How to enable the Search Service?
When the configuration option /config/server/resources/features/search is set to true on the Resource Server, the Search web service starts and indexes the Resource Stores.
When the configuration option /config/web-ui/resources/features/search is set to true on the Front End Server, a Search menu item appears next to the Browse menu item.
3. Configuration
The Search Service keeps its index in a Cassandra cluster, which is configured in much the same way as for Distributed Stores.
All configuration options are under /config/server/resources/search/.
- monitor: Optional. A boolean (true or false) indicating whether the store should be monitored for its health status. Default value: true.
- monitor-interval: Optional. Interval in milliseconds at which the monitor updates its health status. Default value: 1,000 (1 second).
- monitor-max-interval: Optional. The maximum time in milliseconds between successive health checks. The check runs at least once in this interval, even if no other task reported an error. Default value: 300,000 (5 minutes).
- contactpoints: Optional. A comma-separated list of IP addresses of some of the nodes in the cluster, used for initial discovery. The other nodes of the cluster are then found by contacting these nodes. Example: 127.0.0.1,10.2.31.7,10.2.30.119,10.2.32.174. Default value: localhost. In a multi-node Cassandra cluster, the contact point 127.0.0.1 must always be included explicitly. This option is ignored if the cluster-name option is specified.
- replicationfactor: Optional. How many times the data must be replicated in the cluster. Cassandra tries to place the replicas on different nodes. Default value: 1.
- cluster-name: Optional. The name of the Scriptura managed Cassandra cluster. This overrides the contactpoints option. Example: cassandra-scriptura-production. If this option is not specified, Scriptura falls back to the contactpoints option to determine where to find the Cassandra servers.
- username: Optional. The username used to connect to the Cassandra cluster.
- password: Optional. The password of the configured user.
- keyspace: Optional. The keyspace to use in the Cassandra database. Use only characters that are valid in keyspace names. The keyspace is created if it does not exist yet. Default value: search.
- index-plain-text: Optional. When true, every line of a text file is added as a value of the meta data key content. Currently, the following Resource Types are indexed as text files: Cascading stylesheet file (*.css), Comma-separated values (*.csv), Git ignore file (.gitignore), HTML file (*.htm, *.html), INI file (*.ini), Java source file (*.java), JavaScript source file (*.js), JSON file (*.json), Markdown file (*.md), SQL File (*.sql), Text Document (*.txt), and TypeScript file (*.ts). Default value: false.
- index-xml: Optional. When true, attribute values and text content of XML files are indexed. Attribute values are indexed under the meta data key `attr-<uri>-<local-name>`; text content is indexed under `element-<uri>-<local-name>`. For example, for `<n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node>`, the value foo is indexed under the meta data key `attr-http://mynamespace-data` and the value bar under `element-http://mynamespace-node`. Currently, the following Resource Types are indexed as XML files: Tracking Concept Definition (*.tcd), Scriptura Document Flow (*.sdf), Data Source Template (main.dst), Scriptura Event Flow (*.sef), Scalable Vector Graphics (*.svg), XML File (*.xml), and XSLT File (*.xslt). Default value: false.
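The key naming used by index-xml and index-plain-text can be reproduced with Python's standard xml.etree.ElementTree module, which is handy for predicting which meta data key to query. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the text directly inside each element is taken (tail text after child elements is ignored), and an attribute without a namespace is assumed to get an empty uri (for example `attr--data`), which the Search Service may handle differently.

```python
import xml.etree.ElementTree as ET


def _split(tag: str) -> tuple[str, str]:
    """Split an ElementTree '{uri}local' name into (uri, local-name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # assumption: no namespace gives an empty uri


def xml_index_entries(xml_text: str) -> dict[str, list[str]]:
    """Collect values under attr-<uri>-<local-name> and element-<uri>-<local-name>."""
    entries: dict[str, list[str]] = {}
    for elem in ET.fromstring(xml_text).iter():  # iter() includes the root element
        for name, value in elem.attrib.items():
            uri, local = _split(name)
            entries.setdefault(f"attr-{uri}-{local}", []).append(value)
        text = (elem.text or "").strip()
        if text:
            uri, local = _split(elem.tag)
            entries.setdefault(f"element-{uri}-{local}", []).append(text)
    return entries


def plain_text_index_entries(text: str) -> dict[str, list[str]]:
    """index-plain-text: every line becomes a value of the meta data key 'content'."""
    return {"content": text.splitlines()}
```

Applied to the example from the index-xml description, xml_index_entries returns foo under attr-http://mynamespace-data and bar under element-http://mynamespace-node.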
4. How to perform a search?
To perform a search, send a POST request to the end-point /search/v1/query on the Resource Server.
The filter in the request body can be provided in either XML or JSON format.
4.1. Sample requests
Sample XML POST request body.
<filter start-index="0" max-results="2">
<store-name>default-store</store-name>
<query>
<and>
<path-prefix>/default-workspace/</path-prefix>
<path-filter>sample</path-filter>
<type>text/plain</type>
<full-text>branch</full-text>
<meta field="filename" match-type="exact" case-type="sensitive">Sample file.txt</meta>
<content key="content-en">Line</content>
</and>
</query>
<ordering>
<order type="standard" field="path" ascending="false"/>
<order type="meta" field="content"/>
</ordering>
<fetch-meta-data-fields>
<field>content</field>
<field>filename</field>
</fetch-meta-data-fields>
</filter>
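When the XML body is generated from code, building it with an XML library instead of concatenating strings takes care of escaping special characters in values. The sketch below rebuilds the sample filter above with Python's xml.etree.ElementTree; the function name is hypothetical and all element names, attributes and values are copied from the sample.

```python
import xml.etree.ElementTree as ET


def build_xml_filter() -> str:
    """Rebuild the sample XML filter (values copied from the sample request)."""
    root = ET.Element("filter", {"start-index": "0", "max-results": "2"})
    ET.SubElement(root, "store-name").text = "default-store"
    and_ = ET.SubElement(ET.SubElement(root, "query"), "and")
    ET.SubElement(and_, "path-prefix").text = "/default-workspace/"
    ET.SubElement(and_, "path-filter").text = "sample"
    ET.SubElement(and_, "type").text = "text/plain"
    ET.SubElement(and_, "full-text").text = "branch"
    meta = ET.SubElement(and_, "meta", {"field": "filename", "match-type": "exact",
                                         "case-type": "sensitive"})
    meta.text = "Sample file.txt"
    ET.SubElement(and_, "content", {"key": "content-en"}).text = "Line"
    ordering = ET.SubElement(root, "ordering")
    ET.SubElement(ordering, "order", {"type": "standard", "field": "path", "ascending": "false"})
    ET.SubElement(ordering, "order", {"type": "meta", "field": "content"})
    fields = ET.SubElement(root, "fetch-meta-data-fields")
    for name in ("content", "filename"):
        ET.SubElement(fields, "field").text = name
    # Serialize to a str (no XML declaration) ready to be sent as the POST body.
    return ET.tostring(root, encoding="unicode")
```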
Sample JSON POST request body.
{
"storeName": "default-store",
"query": { "and": [
{"pathPrefix": "/default-workspace"},
{"pathFilter": "sample"},
{"type": "text/plain"},
{"fullText": "branch"},
{"meta": { "field": "filename", "matchType" :"exact", "caseType": "sensitive", "value": "Sample file.txt"}},
{"content": { "key": "content-en", "value": "Line"}}
]},
"ordering": [
{"type": "standard", "field": "path", "ascending": "false"},
{"type": "meta", "field": "content"}
],
"fetchMetaDataFields": ["content", "filename"],
"startIndex": 0,
"maxResults": 2
}
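A JSON body like the one above could be sent with Python's standard urllib.request module, as in the sketch below. The base URL and port are hypothetical placeholders, and the Content-Type and Accept headers are assumptions about how the service selects JSON rather than XML; authentication, if your Resource Server requires it, is not covered.

```python
import json
import urllib.request

# Hypothetical Resource Server address; replace with your own.
BASE_URL = "http://localhost:8080"


def build_query_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST to /search/v1/query with a JSON filter as the body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search/v1/query",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: these headers select JSON for the request and the response.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(body: dict, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_query_request(body, base_url)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the request easy to inspect or test without a running server.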
4.2. Request description
All properties are optional. All conditions are cumulative.
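Table filter
| XML | JSON field | Remarks |
|---|---|---|
| `<store-name>` | `storeName` | Limit to resources in this store. |
| `<query>` | `query` | Specify which results should be included. See table query below. |
| `<ordering>` | `ordering` | For XML, include a list of `<order>` elements. For JSON, include an array. See table ordering below. |
| `<fetch-meta-data-fields>` | `fetchMetaDataFields` | The meta data fields to return for each resource. For XML, include a list of `<field>` elements. For JSON, include an array of field names. |
| `start-index` attribute of `<filter>` | `startIndex` | Zero-based. Skip the first results. |
| `max-results` attribute of `<filter>` | `maxResults` | Only return this amount of entries. An extra element `<next/>` (XML) or the field `"next": true` (JSON) in the response indicates that more results are available. |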
Table query
| XML | JSON field | Remarks |
|---|---|---|
| `<path-prefix>` | `pathPrefix` | Limit to resources whose path starts with the given path (case-insensitive). |
| `<path-filter>` | `pathFilter` | Limit to resources that contain the given text in their path (case-insensitive). |
| `<type>` | `type` | Limit to resources whose type exactly matches the given type. |
| `<full-text>` | `fullText` | Limit to resources that contain the given text in at least one meta data value or in content (case-insensitive). |
| `<meta>` | `meta` | Filter on a meta data field. See table meta below. |
| `<content>` | `content` | Filter on content. See table content below. |
| `<and>` | `and` | Requires that all child conditions are true. Contains the conditions to combine: child elements in XML, an array in JSON. |
| `<or>` | `or` | Requires that at least one child condition is true. Contains the conditions to combine. |
| `<not>` | `not` | Requires that its child condition is false. Contains a single condition. |
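Because and, or and not nest, a query is naturally built from small helper functions. The Python sketch below composes JSON query conditions; all function names are hypothetical. The and form follows the JSON sample (an array of conditions), while the or form (an array, by analogy) and the not form (a single condition object) are assumptions derived from the table, not confirmed by a sample.

```python
from typing import Any

Condition = dict[str, Any]


def and_(*conditions: Condition) -> Condition:
    """All child conditions must be true; JSON 'and' holds an array."""
    return {"and": list(conditions)}


def or_(*conditions: Condition) -> Condition:
    """At least one child condition must be true (array form assumed, like 'and')."""
    return {"or": list(conditions)}


def not_(condition: Condition) -> Condition:
    """The single child condition must be false (object form assumed)."""
    return {"not": condition}


def path_prefix(prefix: str) -> Condition:
    return {"pathPrefix": prefix}


def resource_type(mime: str) -> Condition:
    return {"type": mime}


def full_text(text: str) -> Condition:
    return {"fullText": text}
```

For example, and_(path_prefix("/default-workspace"), not_(full_text("draft"))) produces a query object that can be placed in the query field of the filter.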
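Table meta
| XML | JSON field | Remarks |
|---|---|---|
| `field` attribute | `field` | The name of the meta data field. |
| `match-type` attribute | `matchType` | One of the supported match types, e.g. `exact`. |
| `case-type` attribute | `caseType` | One of the supported case types, e.g. `sensitive`. |
| text content | `value` | The value that should be matched; how it is matched depends on the match type. For XML, include the value as text content. For JSON, include a `value` field. |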
Table content
| XML | JSON field | Remarks |
|---|---|---|
| `key` attribute | `key` | The key of the content. Content can be categorized (e.g. per language). Optional. Omit to search in all content. |
| text content | `value` | The value that should be matched. For XML, include the value as text content. For JSON, include a `value` field. |
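The meta and content tables describe the same condition in two syntaxes: XML attributes plus text content, or JSON fields plus a value field. The Python sketch below produces both forms side by side so the mapping is explicit; the function names are hypothetical, and match_type and case_type are required arguments because the service defaults are not documented here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def meta_condition(field: str, value: str, match_type: str,
                   case_type: str) -> tuple[dict, ET.Element]:
    """Return the same meta condition in JSON form and as an XML element."""
    as_json = {"meta": {"field": field, "matchType": match_type,
                        "caseType": case_type, "value": value}}
    elem = ET.Element("meta", {"field": field, "match-type": match_type,
                               "case-type": case_type})
    elem.text = value  # XML carries the value as text content
    return as_json, elem


def content_condition(value: str, key: Optional[str] = None) -> tuple[dict, ET.Element]:
    """Return a content condition in both forms; omit key to search all content."""
    body: dict = {"value": value}
    attrs: dict = {}
    if key is not None:
        body["key"] = key
        attrs["key"] = key
    elem = ET.Element("content", attrs)
    elem.text = value
    return {"content": body}, elem
```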
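Table ordering
| XML | JSON field | Remarks |
|---|---|---|
| `type` attribute of `<order>` | `type` | The kind of ordering, e.g. `standard` or `meta`. |
| `field` attribute of `<order>` | `field` | For type `standard`, a resource property such as `path`. For type `meta`, the name of a meta data field, such as `content`. |
| `ascending` attribute of `<order>` | `ascending` | One of `true` or `false`. |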
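The following Python sketch builds ordering entries and converts them to the equivalent XML `<ordering>` element. The function names are hypothetical. The JSON sample passes ascending as the string "false", so the helper mirrors that; whether a JSON boolean is also accepted is not stated in this document.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def order_entry(type_: str, field: str, ascending: Optional[bool] = None) -> dict:
    """One JSON ordering entry; ascending is left out when not given."""
    entry = {"type": type_, "field": field}
    if ascending is not None:
        # Mirrors the JSON sample, which passes the flag as a string.
        entry["ascending"] = "true" if ascending else "false"
    return entry


def ordering_to_xml(entries: list[dict]) -> ET.Element:
    """Convert JSON ordering entries into an XML <ordering> element."""
    ordering = ET.Element("ordering")
    for entry in entries:
        # Each JSON field becomes an attribute of one <order> element.
        ET.SubElement(ordering, "order", {k: str(v) for k, v in entry.items()})
    return ordering
```

Entries are applied in list order, so the first entry is the primary sort key, as in the samples where path comes before content.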
4.3. Sample responses
<?xml version="1.0" encoding="UTF-8"?>
<resources>
<resource store="default-store" path="/default-workspace/Sample file.txt" tag="6cb0a975-e7e5-4b05-8720-2acd9e8d33c1" last-modified="2017-08-18T19:44:49.000Z" size="206" type="text/plain">
<metadata field="filename">
<value>Sample file.txt</value>
</metadata>
</resource>
<next/>
</resources>
When the response contains a next element, there are more results than the ones returned.
Fetch the next batch of results by changing the start-index and max-results attributes.
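An XML response like the sample above can be read with Python's xml.etree.ElementTree. The sketch below (function name hypothetical) collects each resource's attributes and meta data values, and reports whether a next element is present, which is the signal to request another batch.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(xml_text: str) -> tuple[list[dict], bool]:
    """Return (resources, has_more) from an XML search response."""
    root = ET.fromstring(xml_text)
    resources = []
    for res in root.findall("resource"):
        item = dict(res.attrib)  # store, path, tag, last-modified, size, type
        item["metadata"] = {
            md.get("field"): [v.text or "" for v in md.findall("value")]
            for md in res.findall("metadata")
        }
        resources.append(item)
    # A <next/> element signals that more results are available.
    has_more = root.find("next") is not None
    return resources, has_more
```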
{
"resources": [
{
"store": "default-store",
"path": "/default-workspace/Sample file.txt",
"tag": "6cb0a975-e7e5-4b05-8720-2acd9e8d33c1",
"last-modified": "2017-08-18T19:44:49.000Z",
"size": "206",
"type": "text/plain",
"metadata": [
{
"field": "filename",
"values": [
"Sample file.txt"
]
}
]
}
],
"next": false
}
When the next field has the value true, there are more results than the ones returned.
Fetch the next batch of results by changing the startIndex and maxResults fields.
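The paging rule above fits in a short loop. In the Python sketch below, fetch is a hypothetical caller-supplied function that sends one JSON filter and returns the decoded response; the loop advances startIndex by the number of resources actually received, and stops when next is false or a batch comes back empty, so it cannot loop forever.

```python
from typing import Callable, Iterator

# Hypothetical: sends one JSON filter and returns the decoded JSON response.
Fetch = Callable[[dict], dict]


def iterate_results(fetch: Fetch, body: dict, page_size: int = 100) -> Iterator[dict]:
    """Yield all matching resources, requesting page_size results per call."""
    start = body.get("startIndex", 0)
    while True:
        page = dict(body, startIndex=start, maxResults=page_size)
        response = fetch(page)
        resources = response.get("resources", [])
        yield from resources
        # "next": true means there are more results beyond this batch.
        if not response.get("next") or not resources:
            break
        start += len(resources)
```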
5. Polling for events
When a Resource Store event has been processed by the Search Service, it is emitted by the poll end-point.
Do a GET request to the end-point /search/v1/poll on the Resource Server.
It behaves in the same way as the poll end-point of the Resource Stores.
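A single poll could be issued with Python's urllib.request as sketched below. The base URL and timeout are hypothetical placeholders, and the response body is returned undecoded because its format follows the Resource Stores poll end-point, which is not described in this section.

```python
import urllib.request

# Hypothetical Resource Server address; replace with your own.
BASE_URL = "http://localhost:8080"


def build_poll_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a GET request to the Search Service poll end-point."""
    return urllib.request.Request(base_url.rstrip("/") + "/search/v1/poll", method="GET")


def poll_once(base_url: str = BASE_URL, timeout: float = 60.0) -> bytes:
    """Perform one poll and return the raw response body."""
    with urllib.request.urlopen(build_poll_request(base_url), timeout=timeout) as resp:
        return resp.read()
```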