Search Service
1. Intro
The Search Service searches on properties of resources (such as the filename) or on extracted meta data.
Each resource is indexed according to its resource type.
Data can be stored in meta data or in content.
Meta data is stored per field.
Each meta data field can have multiple values.
Content is stored per key.
You can search in specific meta data fields, in specific content (i.e. for a given key), in all content (i.e. over all keys), or in all meta data values and all content (use full-text for this).
2. How to enable the Search Service?
When the config option /config/server/resources/features/search has the value true in the Resource Server, the Search web service starts and indexes the Resource Stores.
When the config option /config/web-ui/resources/features/search has the value true in the Front End Server, the Search menu item appears next to the Browse menu item.
3. Configuration
The configuration of the Cassandra cluster is very similar to that of Distributed Stores.
All configuration options are under /config/server/resources/search/.
- monitor: Optional. A boolean (true or false) indicating whether the store should be monitored for its health status. Default value: true.
- monitor-interval: Optional. Interval in milliseconds at which the monitor updates its health. Default value: 1,000 (1 second).
- monitor-max-interval: Optional. The maximum time in milliseconds between successive health checks. The check runs at least once in this interval, even if no other task reported an error. Default value: 300,000 (5 minutes).
- contactpoints: Optional. A comma-separated list of IP addresses of some of the nodes in the cluster, used for initial discovery. The other nodes of the cluster are then found by contacting these nodes. Example: 127.0.0.1,10.2.31.7,10.2.30.119,10.2.32.174. Default value: localhost. In a multi-node Cassandra cluster, the contact point 127.0.0.1 must always be explicitly included. If the cluster-name option is specified, this option is ignored.
- replicationfactor: Optional. How many times the data must be replicated in the cluster. Cassandra will try to replicate on different nodes. Default value: 1.
- cluster-name: Optional. The name of the Scriptura managed Cassandra cluster. This overrides the contactpoints option. Example: cassandra-scriptura-production. If you do not specify this option, Scriptura falls back to the contactpoints option to determine where it can find the Cassandra servers.
- username: Optional. The username used to connect to the Cassandra cluster.
- password: Optional. The password for the configured user.
- keyspace: Optional. The keyspace to use in the Cassandra database. Only use valid characters for keyspace names. The keyspace is created if it does not exist yet. Default value: search.
- index-plain-text: Optional. When true, every line in a text file is added as a value for the meta data key content. Currently, the following Resource Types are indexed as text files: Cascading stylesheet file (*.css), Comma-separated values (*.csv), Git ignore file (.gitignore), HTML file (*.htm, *.html), INI file (*.ini), Java source file (*.java), JavaScript source file (*.js), JSON file (*.json), Markdown file (*.md), SQL File (*.sql), Text Document (*.txt), and TypeScript file (*.ts). Default value: false.
- index-xml: Optional. When true, attribute values and text content are indexed for XML files. Attribute values are indexed under the meta data key attr-<uri>-<local-name>. Similarly, text content is indexed under element-<uri>-<localname>. E.g. for <n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node> the value foo is indexed under the meta data key attr-http://mynamespace-data and the value bar is indexed under the meta data key element-http://mynamespace-node. Currently, the following Resource Types are indexed as XML files: Tracking Concept Definition (*.tcd), Scriptura Document Flow (*.sdf), Data Source Template (main.dst), Scriptura Event Flow (*.sef), Scalable Vector Graphics (*.svg), XML File (*.xml), and XSLT File (*.xslt). Default value: false.
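The naming rule of index-xml can be illustrated with a short Python sketch. This is not the service's implementation; it only derives the meta data keys described above, using the standard library's xml.etree.ElementTree. The function names are hypothetical, and the handling of names without a namespace (an empty uri part) and of text that follows a child element (ignored here) is an assumption.

```python
import xml.etree.ElementTree as ET


def split_qname(qname):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return uri, local
    return "", qname  # assumption: no namespace gives an empty uri part


def xml_meta_data(xml_text):
    """Return {meta data key: [values]} following the index-xml naming rule."""
    index = {}
    for element in ET.fromstring(xml_text).iter():
        uri, local = split_qname(element.tag)
        text = (element.text or "").strip()
        if text:
            index.setdefault(f"element-{uri}-{local}", []).append(text)
        # Namespace declarations (xmlns:...) are not part of element.attrib.
        for name, value in element.attrib.items():
            attr_uri, attr_local = split_qname(name)
            index.setdefault(f"attr-{attr_uri}-{attr_local}", []).append(value)
    return index


sample = '<n1:node xmlns:n1="http://mynamespace" n1:data="foo">bar</n1:node>'
keys = xml_meta_data(sample)
print(keys)
```

For the sample element from the option description, the sketch yields the keys element-http://mynamespace-node and attr-http://mynamespace-data.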
4. How to perform a search?
The filter can be provided in either XML or JSON format.
Do a POST request to the end-point /search/v1/query on the Resource Server.
4.1. Sample requests
Sample XML POST request body.
<filter start-index="0" max-results="2">
  <store-name>default-store</store-name>
  <query>
    <and>
      <path-prefix>/default-workspace/</path-prefix>
      <path-filter>sample</path-filter>
      <type>text/plain</type>
      <full-text>branch</full-text>
      <meta field="filename" match-type="exact" case-type="sensitive">Sample file.txt</meta>
      <content key="content-en">Line</content>
    </and>
  </query>
  <ordering>
    <order type="standard" field="path" ascending="false"/>
    <order type="meta" field="content"/>
  </ordering>
  <fetch-meta-data-fields>
    <field>content</field>
    <field>filename</field>
  </fetch-meta-data-fields>
</filter>
Sample JSON POST request body.
{
  "storeName": "default-store",
  "query": { "and": [
    {"pathPrefix": "/default-workspace"},
    {"pathFilter": "sample"},
    {"type": "text/plain"},
    {"fullText": "branch"},
    {"meta": { "field": "filename", "matchType": "exact", "caseType": "sensitive", "value": "Sample file.txt"}},
    {"content": { "key": "content-en", "value": "Line"}}
  ]},
  "ordering": [
    {"type": "standard", "field": "path", "ascending": "false"},
    {"type": "meta", "field": "content"}
  ],
  "fetchMetaDataFields": ["content", "filename"],
  "startIndex": 0,
  "maxResults": 2
}
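As an illustration, a JSON filter like the sample above could be built and sent from Python with only the standard library. This is a minimal sketch: build_filter and post_query are hypothetical helper names, and the Content-Type and Accept headers are an assumption about how the Resource Server tells JSON from XML. The base URL of the Resource Server is left to the caller.

```python
import json
import urllib.request


def build_filter(store_name, conditions, start_index=0, max_results=10):
    """Build a JSON filter whose conditions are combined with 'and'."""
    return {
        "storeName": store_name,
        "query": {"and": list(conditions)},
        "startIndex": start_index,
        "maxResults": max_results,
    }


def post_query(base_url, filter_body):
    """POST the filter to /search/v1/query and return the decoded JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/search/v1/query",
        data=json.dumps(filter_body).encode("utf-8"),
        # Assumption: the server selects the format from these headers.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_filter(
    "default-store",
    [{"pathPrefix": "/default-workspace"}, {"fullText": "branch"}],
    max_results=2,
)
print(json.dumps(body, indent=2))
```

Calling post_query with the address of your Resource Server and this body would perform the actual request; the sketch only prints the body, so it runs without a server.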
4.2. Request description
All properties are optional. All conditions are cumulative.
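Table filter:

XML | JSON field | Remarks |
---|---|---|
store-name | storeName | Limit to resources in this store. |
query | query | Specify which results should be included. See table query below. |
ordering | ordering | For XML, include a list of order elements. |
fetch-meta-data-fields | fetchMetaDataFields | For XML, include a list of field elements. |
start-index | startIndex | Zero-based. Skip the first results. |
max-results | maxResults | Only return this number of entries. An extra element next in the response indicates that there were more results than the ones returned. |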

Table query:

XML | JSON field | Remarks |
---|---|---|
path-prefix | pathPrefix | Limit to resources whose path starts with the given path (case-insensitive). |
path-filter | pathFilter | Limit to resources that contain the given text in their path (case-insensitive). |
type | type | Limit to resources whose type exactly matches the given type. |
full-text | fullText | Limit to resources which contain the given text in at least one meta data value or in content (case-insensitive). |
meta | meta | Filter on a meta data field. See table meta below. |
content | content | Filter on content. See table content below. |
and | and | Requires that all the child-elements are true. Contains a list of conditions from this table. |
or | or | Requires that at least one child-element is true. Contains a list of conditions from this table. |
not | not | Requires that its child-element is false. Contains a single condition from this table. |
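
Table meta:

XML | JSON field | Remarks |
---|---|---|
field | field | The name of the meta data field. |
match-type | matchType | One of the supported match types, such as exact. |
case-type | caseType | One of the supported case types, such as sensitive. |
<text-content> | value | The value that should be matched. For match-type … |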

Table content:

XML | JSON field | Remarks |
---|---|---|
key | key | The key of the content. Content can be categorized (e.g. per language). Optional. Omit for searching in all content. |
<text-content> | value | The value that should be matched. For XML, include the value as text-content. For JSON, include a value-field. |
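
Table order:

XML | JSON field | Remarks |
---|---|---|
type | type | One of the supported order types, such as standard or meta. |
field | field | The field to order by. In case of type standard, a resource property such as path; in case of type meta, a meta data field such as content. |
ascending | ascending | One of true or false. |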
4.3. Sample responses
<?xml version="1.0" encoding="UTF-8"?>
<resources>
  <resource store="default-store" path="/default-workspace/Sample file.txt" tag="6cb0a975-e7e5-4b05-8720-2acd9e8d33c1" last-modified="2017-08-18T19:44:49.000Z" size="206" type="text/plain">
    <metadata field="filename">
      <value>Sample file.txt</value>
    </metadata>
  </resource>
  <next/>
</resources>
When the response contains a next element, there were more results than the ones returned.
Fetch the next batch of results by changing the start-index and max-results attributes.
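Reading such an XML response could look like the following Python sketch, which uses the standard library's xml.etree.ElementTree. The function name parse_response is hypothetical, and the sketch assumes the response has exactly the shape of the sample above.

```python
import xml.etree.ElementTree as ET


def parse_response(xml_bytes):
    """Return (resources, has_next) for an XML search response."""
    root = ET.fromstring(xml_bytes)
    resources = []
    for resource in root.findall("resource"):
        entry = dict(resource.attrib)  # store, path, tag, size, type, ...
        entry["metadata"] = {
            meta.get("field"): [value.text for value in meta.findall("value")]
            for meta in resource.findall("metadata")
        }
        resources.append(entry)
    # The presence of a <next/> element signals that more results exist.
    return resources, root.find("next") is not None


sample_response = b"""<?xml version="1.0" encoding="UTF-8"?>
<resources>
  <resource store="default-store" path="/default-workspace/Sample file.txt" size="206" type="text/plain">
    <metadata field="filename">
      <value>Sample file.txt</value>
    </metadata>
  </resource>
  <next/>
</resources>"""

resources, has_next = parse_response(sample_response)
print(resources[0]["path"], has_next)
```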
{
  "resources": [
    {
      "store": "default-store",
      "path": "/default-workspace/Sample file.txt",
      "tag": "6cb0a975-e7e5-4b05-8720-2acd9e8d33c1",
      "last-modified": "2017-08-18T19:44:49.000Z",
      "size": "206",
      "type": "text/plain",
      "metadata": [
        {
          "field": "filename",
          "values": [
            "Sample file.txt"
          ]
        }
      ]
    }
  ],
  "next": false
}
When the next field has the value true, there were more results than the ones returned.
Fetch the next batch of results by changing the startIndex and maxResults fields.
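Paging through all results with startIndex, maxResults and the next field can be sketched in Python as follows. The names iter_results and run_query are hypothetical: run_query stands for any callable that sends a JSON filter to /search/v1/query and returns the decoded JSON response. A stand-in that serves results from a local list is used here so the sketch runs without a server.

```python
def iter_results(run_query, filter_body, page_size=50):
    """Yield every resource by advancing startIndex while 'next' is true."""
    start = 0
    while True:
        page = dict(filter_body, startIndex=start, maxResults=page_size)
        response = run_query(page)
        resources = response.get("resources", [])
        yield from resources
        # Stop when the server reports no further results (or sends none).
        if not response.get("next") or not resources:
            break
        start += len(resources)


# Stand-in for the real request, for illustration only.
ALL_ITEMS = [{"path": f"/default-workspace/file-{i}.txt"} for i in range(5)]


def fake_query(body):
    start, count = body["startIndex"], body["maxResults"]
    return {
        "resources": ALL_ITEMS[start:start + count],
        "next": start + count < len(ALL_ITEMS),
    }


found = list(iter_results(fake_query, {"storeName": "default-store"}, page_size=2))
print(len(found))
```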
5. Polling for events
When a Resource Store event is processed by the Search Service, it is emitted by this end-point.
Do a GET request to the end-point /search/v1/poll on the Resource Server.
It behaves in the same way as the poll end-point of Resource Stores.