Output API: Document flow guide

1. Introduction

To support Deliver & Archive with more complex pre-processing and batch processing, and to unburden our customers from maintaining the systems necessary to run document flows, the Runtime Output API now supports deploying and invoking document flows. This platform is colloquially named Document Flow as a Service, or DFaaS.

This document assumes familiarity with Document Flow, Projects and the Xribe Output API. We will cover the definition of document flow packages, how to set up an efficient development environment, and the requirements document flows must satisfy to be deployable on the DFaaS platform.

2. Development Setup

In this section we’ll describe a typical setup you can use to develop, test and deploy your document flows.

2.1. Projects and Environments

The DFaaS platform uses resources stored in the cloud. All resources are tracked and moved between environments through resource promotion.

As we’ll see, document flows can reference templates in other projects. Document flows often have a different development cycle and different maintainers than the templates they process. While it is possible to mix template and document flow resources in a single project, for these reasons it is usually better to create a separate project dedicated to document flows.

We recommend at least two environments for a document flow project: a development environment that can be synced to your Design & Compose Self-Hosted Edition for experimenting and developing, and a production environment that handles actual requests. A test environment for validation by QA or business users before promoting changes to production is commonly added to the setup as well.

2.2. Configuration

At the moment, the DFaaS functionality is only meant to be used in projects managed by Unifiedpost. For this reason, the DFaaS-specific functionality is hidden from the Design & Compose Self-Hosted Edition designer.

In order to enable this functionality, add the following fragment to your local configuration:

<config>
    <documentflow-service>
        <show-palette type="value">true</show-palette>
    </documentflow-service>
</config>

2.3. Local Development

You will probably want to test as much as possible locally, on your own computer, to avoid build and deploy times and iterate faster. Input is simulated by placing input data in a folder; similarly, deliverables are written to an output folder.

The location of these input and output folders can be defined in the local configuration file as follows:

<config>
    <documentflow-service>
        <input-folder type="value">C:\Temp\input</input-folder>
        <output-folder type="value">C:\Temp\output</output-folder>
    </documentflow-service>
</config>
It might be tempting to point these to folders inside your workspace (to reuse the input and output folders created there by samples), but remember: your workspace is probably being synced, and you probably don’t want to upload all the input samples and resulting outputs to the cloud.

The JSON you need to place in the input folder to trigger the flow has the following structure:

{
  "projectID": "dcb398ba-1b8c-4afd-85fd-a5905e5f42c5",
  "environmentID": "b4e0d9b7-16c4-4da3-b9b3-4ae3b52f5c8f",
  "id": "8eb03f8c-3c3e-4f09-ad6e-24f22a9cec66",
  "input": [
    {
      "id" : "initial-input",
      "documentflowParameters": {
        "flowID": "preview1_new.sdf"
      },
      "data": {
        "foo" : "bar"
      }
    }
  ]
}
The flowID is the name of the flow as deployed by the preview functionality. The other identifiers are used to populate data sources and to create output folders; their exact values are less important.

2.4. Deploying

The DFaaS platform will automatically build and deploy all document flow packages it finds in DFaaS-enabled environments. Environments are enabled using the Xribe Deploy API.

The first environment of a project will not deploy automatically in response to file changes. This is because there will be many small changes during development and you don’t want to trigger a build for every update. Instead, you can trigger an explicit deploy when you have something you want to test.
Note that the first environment will still deploy automatically in response to platform updates, so you can’t rely on your package never being redeployed while you’re changing things.

2.5. Invoking

Invoking a document flow is done by POSTing a message with input type documentflow to the Xribe Output API. An example of a message is:

{
  "projectID":"dcb398ba-1b8c-4afd-85fd-a5905e5f42c5",
  "environmentID":"b4e0d9b7-16c4-4da3-b9b3-4ae3b52f5c8f",
  "input":{
    "type":"documentflow",
    "documentflowParameters":{
      "packageID":"showcase",
      "flowID":"echo"
    },
    "data":{
      "foo":"bar"
    }
  }
}

2.6. Promotion

When a document flow is ready for the next step, the resources can be promoted to the next environment, which automatically triggers the build of a new package. Once the build has finished, existing instances are terminated and replaced by instances containing the newly promoted version.

This means every environment will contain the exact same files. Chances are, however, that you want to contact different systems and use different credentials in each environment. You can introduce this variability by assigning properties to an environment when enabling it and using these properties in the package definition as environment variables (see the Environment Variables section below).

3. Package

A document flow package is an extended Design & Compose Self-Hosted Edition project. This means that you start by adding flows and data sources to the project as usual. Additionally, you add DFaaS configuration to make sure the necessary runtime resources are available.

3.1. Document Flows

The main part of the project descriptor is the list of document flows.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <document-flow>
        <display-name>ima-teststub</display-name>
        <document-flow-name>ima-teststub</document-flow-name>
        <document-flow-uri>flows/data-enrichment/activation-code/test/ima-teststub.sdf</document-flow-uri>
    </document-flow>
</project>

The document-flow-name is what will be used to route your message to the correct document flow when using the Xribe Output API.
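
For example, a message invoking the flow defined above would use the document-flow-name as the flowID in its documentflowParameters, just like the message shown in the Invoking section. The packageID, projectID and environmentID below are placeholders for your own package name, project and environment:

{
  "projectID":"<your-project-id>",
  "environmentID":"<your-environment-id>",
  "input":{
    "type":"documentflow",
    "documentflowParameters":{
      "packageID":"<your-package-name>",
      "flowID":"ima-teststub"
    },
    "data":{
      "foo":"bar"
    }
  }
}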

3.2. Resources

Resources like templates and images can be referenced relatively if they are included in the project, but if you followed the advice to keep templates in a separate project, you will reference them using the idr:// protocol. Note that this means you will need to provide your document flow with credentials: a client-id/client-secret pair linked to a user authorized to access the resources. Instead of hardcoding the client-id and client-secret, add them as properties when enabling the environment and pass them to the document flow as environment variables. That way, you can use the same document flow in different environments with different users.
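
A minimal sketch of what this could look like, following the same pattern as the Environment Variables section below. The flow name is made up, and the clientid and clientsecret property names are assumed to have been set as properties when enabling the environment:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <document-flow>
        <display-name>render-invoice</display-name>
        <document-flow-name>render-invoice</document-flow-name>
        <document-flow-uri>flows/render-invoice.sdf</document-flow-uri>
        <document-flow-data-sources>
            <data-source>
                <!-- illustrative only: {{clientid}} and {{clientsecret}} resolve to environment properties -->
                <data-source-name>Properties</data-source-name>
                <data-source-content>
                    <hashmap xmlns="http://uris.inventivedesigners.com/scriptura/documentflow/design">
                        <entry>
                            <key><string>clientid</string></key>
                            <value><string>{{clientid}}</string></value>
                        </entry>
                        <entry>
                            <key><string>clientsecret</string></key>
                            <value><string>{{clientsecret}}</string></value>
                        </entry>
                    </hashmap>
                </data-source-content>
            </data-source>
        </document-flow-data-sources>
    </document-flow>
</project>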

3.3. Custom Steps

If your document flow uses custom steps, either from the marketplace or developed by yourself, you can make them available to the DFaaS platform by placing a zipped update site in your project and referencing it in the project descriptor like so:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <extensions>
            <extension>custom-steps/deploymentpackage.zip</extension>
        </extensions>
    </cloud-deploy-package>
</project>

3.4. Custom Fonts

If the templates used in the document flows require custom fonts, these fonts need to be installed before they can be used. Provide the necessary fonts by including them in your project and referencing them as follows in the project descriptor:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <fonts>
            <font>"ttf/Arial.ttf"</font>
            <font>"ttf/ArialB.ttf"</font>
        </fonts>
    </cloud-deploy-package>
</project>

3.5. Custom Color Profiles

If the templates used in the document flows require custom color profiles, these profiles need to be made available to the processing server before they can be used. Provide the necessary profiles by including them in your project and referencing them as follows in the project descriptor:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <icc-profiles>
            <icc-profile>icc/sRGB_v4_ICC_preference.icc</icc-profile>
        </icc-profiles>
    </cloud-deploy-package>
</project>

3.6. Log Configuration

The log configuration, which determines how much is logged and by which components, can be configured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <loggers>
            <logger name="default"  level="INFO"/>
            <logger name="com.id.javapackage"  level="WARN"/>
            <logger name="com.id.mypackage.MyVeryBuggyClass" level="DEBUG"/>
            <logger name="com.id.mypackage.MyClass2" level="WARN"/>
            <logger name="com.id.mypackage.MyVeryVerboseOne" level="ERROR"/>
        </loggers>
    </cloud-deploy-package>
</project>

For access to the logs, contact support or the Xribe team.

3.7. Custom configuration

If specific configuration entries are needed, such as AFP font configuration or certain performance settings, they can be passed to the server running the package as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <configuration>
           <!-- single value -->
           <value key="/config/templateprocessor/afp/duplex-flip">true</value>
           <!-- value list (just like single values) -->
           <value key="/config/templateprocessor/afp/fonts/item1/name">Bookshelf Symbol,Regular</value>
           <value key="/config/templateprocessor/afp/fonts/item1/afpfontname">BD02BSSRN</value>
           <value key="/config/templateprocessor/afp/fonts/item2/name">Bookshelf Symbol,Bold</value>
           <value key="/config/templateprocessor/afp/fonts/item2/afpfontname">BD02BSSBN</value>
           <!-- fragment value -->
           <fragment key="/config/templateprocessor/afp/fonts">
                <item3 xmlns=""><name>Arial,Regular</name><afpfontname>A01ARIAL</afpfontname></item3>
           </fragment>
        </configuration>
    </cloud-deploy-package>
</project>

3.8. Instances

When a message is submitted through the Xribe Output API, an instance of the package is spun up to handle it. When additional messages are submitted and start queueing up, additional instances of the package are started to process the increased load. The maximum number of instances that are started can be influenced by the max attribute of package-instances. A reason for limiting the maximum number of instances might be to limit the load on external systems that the document flows interact with.

Starting an instance can take a few minutes. To avoid this cold start, the min attribute can be used to make sure there are always one or more instances available to handle requests.

Note that setting min to a value greater than 0 implies a significant constant cost; normally, the goal is to reduce costs to 0 when nothing is happening.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <package-instances min="1" max="10"/>
    </cloud-deploy-package>
</project>

A package instance can limit the number of messages it picks up simultaneously by specifying a messages-per-instance limit.

Another factor that can limit the number of messages being processed simultaneously is the number of threads available to document flow instances. In addition, when a flow has a max-instance-limit specified, a message can be picked up by the package but will have to wait until a flow instance can be started before it is processed.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <messages-per-instance limit="5"/>
    </cloud-deploy-package>
</project>

3.9. Memory, CPU, Disk Requirements

Not all document flows require the same amount of CPU, memory or disk. A document flow that generates a single PDF document, for example, needs little to no local disk space, whereas a document flow processing large batches likely needs local disk space to temporarily store and sort files.

CPU and memory sizes are linked and limited to a few supported combinations. A CPU value of 1024 corresponds to a single core.

CPU     Supported memory values (MB)
256     512, 1024, 2048
512     1024, 2048, 3072, 4096
1024    2048, 3072, 4096, 5120, 6144, 7168, 8192
2048    Between 4096 and 16384 in increments of 1024
4096    Between 8192 and 30720 in increments of 1024

Disk requirements can optionally be specified as well. The size is expressed in GB, with legal values between 20 and 200.

Be aware that an increase in CPU, memory and/or disk comes with increased costs, so be as conservative as possible.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <task-size>
            <cpu>1024</cpu>
            <memory>2048</memory>
            <disk>20</disk>
        </task-size>
    </cloud-deploy-package>
</project>

3.10. Environment Variables

In order to introduce variability between environments, environment variables (properties) set when enabling the environment can be used in the project descriptor. This makes it possible for a document flow to use different users, credentials, endpoints, etc. in different environments. To use environment variables, use the mustache syntax like so: {{variablename}}

One way to make environment variables available in a document flow is to pass them in a document flow property data source; another is to use them to select a specific file from which to load more configuration options:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <document-flow>
        <display-name>run-batch</display-name>
        <document-flow-name>run-batch</document-flow-name>
        <document-flow-uri>flows/run-batch.sdf</document-flow-uri>
        <copies>1</copies>
        <document-flow-properties>
            <property key="max-instances-limit">3</property>
        </document-flow-properties>
        <document-flow-data-sources>
            <data-source>
                <data-source-name>Properties</data-source-name>
                <data-source-content>
                    <hashmap xmlns="http://uris.inventivedesigners.com/scriptura/documentflow/design">
                        <entry>
                            <key>
                                <string>clientid</string>
                            </key>
                            <value>
                                <string>{{clientid}}</string>
                            </value>
                        </entry>
                        <entry>
                            <key>
                                <string>clientsecret</string>
                            </key>
                            <value>
                                <string>{{clientsecret}}</string>
                            </value>
                        </entry>
                    </hashmap>
                </data-source-content>
            </data-source>
            <data-source>
                <data-source-name>XML</data-source-name>
                <data-source-content>
                    <uri xmlns="http://uris.inventivedesigners.com/scriptura/documentflow/design">{{environment}}.xml</uri>
                </data-source-content>
            </data-source>
        </document-flow-data-sources>
    </document-flow>
</project>

Variables can be used anywhere in the package descriptor. This means they can also be used, for example, to change the scaling per environment.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://uris.inventivedesigners.com/scriptura/project" version="1">
    <cloud-deploy-package>
        <package-instances min="0" max="{{maxinstances}}"/>
    </cloud-deploy-package>
</project>

4. Document Flow Design

DFaaS-specific steps have been added to integrate with the DFaaS platform and to replace functionality that isn’t available on it.

4.1. Initial Step

The only way to trigger a document flow instance on the DFaaS platform is through the Message Monitor initial step. This step accepts the message used to call the Xribe Output API and outputs the value of the data property as a flow object of type Stream.

This step provisions a property data source with the following properties:

  • message-id: The message-id assigned by the Xribe Output API to the message that is responsible for triggering this document flow instance.

  • deliverable-id: The deliverable-id of the deliverable associated with the message that triggered this document flow instance.

  • project-id: The id of the project that contains the package.

  • environment-id: The id of the environment that contains this package.

  • input-id: The id of the input element of the message that triggered this flow instance. If there is no id specified on the input property, a random id is generated.

  • data-storage-location: A base path available to the package that can be used to store resources. This is often used to pass large files between flows; store a file under the data-storage-location and pass the path as part of the data to another flow using the Call Flow Step.

4.2. Set Status Step

The Set Status step is used to update the status of the message that triggered this document flow instance. The status can be changed either to Generated, which indicates the successful processing of the message, or Error, which signals an error during processing. Details can be provided in a free-form message.

Should multiple Set Status steps be triggered, the Error status will trump the Generated status. Put another way: once a message is marked as Errored, it will never transition to the Generated state.

When the flow instance ends, a callback will be triggered if the original message specified this in its webhookParameters.

This step has the following properties:

  • Status: Generated or Error.

  • Status message: A free-form message that will be logged as part of the history of the deliverable.

If you don’t use the Set Status step, the flow will still work, but the status of the message will remain Accepted and no callback will be triggered. For proper integration with the Xribe Output API, every message should eventually pass through a Set Status step to update its status and possibly trigger a callback.

4.3. Error Handling

When a flow instance aborts, the message will be retried by offering it to a new document flow instance. If the same message leads to repeated aborts (10 times), it is sent to a Dead Letter Queue.

When a failure in a document flow is a legitimate possibility, error handling should be implemented in the flow: place the appropriate steps in a container, set the skip on error property and add a Set Status step (with the Error status) to the error flow. That way the message will not be retried multiple times, the message and deliverable will transition to the error state, and the callback (if present) will be notified of the error.

4.4. Create Deliverable Output Step

This step must be used if a document flow wishes to communicate a result to the caller in the same way as the Xribe Output API does, that is, with pre-signed S3 links in output properties as part of the deliverable.

The file that should be made available as output is expected as an incoming flow object with data type Stream.

The step has the following properties:

  • Output Filename: The filename of the output result.

  • Output ID: The id of the output element of the deliverable. This might be useful to differentiate between the resulting files.

  • Primary: true/false. Primary output is reserved for the main output of this process. Other, intermediate outputs can be returned as well to give insight into the process, but they aren’t essential and should not cause problems when omitted.

This step generates a property data source with the following properties:

  • deliverable-id: The deliverable-id of the deliverable associated with the message that triggered this document flow instance.

  • output-id: The output-id associated with the created output.

  • output-uri: The URL (not pre-signed) that is assigned to the stored stream.

4.5. Call Flow Step

Invoking another document flow in the same package is possible using the Call Flow step.

The step requires the following properties:

  • Flow ID: The id of the flow to be called.

  • Input ID: The id of the input message. This will be mapped to the Input ID returned by the Initial Step of the called flow.

  • Properties: These properties will be offered to the called flow as its data, serialized as a JSON object.

Note that the amount of data that can be passed as properties is limited to a few hundred kB. Larger amounts of data are best passed as files stored under the data-storage-location, with a reference passed as a property.
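
As an illustration, instead of embedding a large batch file in the properties, a flow could write the file to the data-storage-location and pass only a reference. The called flow would then receive data along these lines (the property names and the path below are purely illustrative):

{
  "batch-file": "<data-storage-location>/batches/batch-2024-06-01.xml",
  "batch-size": "15000"
}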

4.6. Temporary File Storage

Files can be stored temporarily in the data-storage-location returned by the Initial Step. Files will be automatically deleted after 3 months.

The data-storage-location is a URI that can be used like a local path with folders in steps such as Folder Delivery, Read Folder, Read Resource, Copy Resource or Delete Resource from the I/O category.

4.7. Temporary Data Storage

The lack of a tracking database or relational database on the DFaaS platform is mitigated by the Store Data, Read Data and Delete Data steps, which provide access to a non-relational database. The data structure available is a map of maps, and steps exist to add, read and delete entries from it.
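
Conceptually, the stored data can be pictured as follows, with each variable mapping to its own set of properties (the variable and property names below are purely illustrative):

{
  "batch-2024-06-01": {
    "status": "processing",
    "documents-generated": "1500"
  },
  "batch-2024-06-02": {
    "status": "queued"
  }
}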

4.7.1. Store Data Step

This step allows you to assign a map to a variable.

The step requires the following properties:

  • Variable: The name of the variable to assign a set of properties to.

  • Properties: The properties to assign to the variable.

4.7.2. Read Data Step

This step returns the value of a variable as a property data source.

  • Variable: The variable to read.

  • Properties: The properties to read from the map associated with the variable.

This step generates a property data source with the requested properties.

4.7.3. Delete Data Step

This step allows you to delete a variable and its associated map.

  • Variable: The variable to delete.
