The article provides a general overview of the Workflow Engine architecture and explains the common principles of how the designed Workflows are executed.
Once a Workflow is designed in the Workflow Studio, it is stored in the Workflow Repository, in the Production database, and is ready for execution. The Workflow Engine is a special module of the Workspace Management application that manages the execution of workflows. It handles tasks such as starting, resuming, and terminating Workflows, as well as monitoring and persisting the Workflow Instance state.
The Workflow Engine catches the Workflow Instance events and persists them in the Monitoring database (by default, the name of the monitoring database is "M42Monitoring").
For analysis of how the Workflow instance is being executed (or has been executed), the System provides the Visual Tracking action available in the Workflow Studio. This module uses data from the Monitoring database and allows understanding and analyzing how each workflow activity has been executed, which input arguments it has received, and what the Activity result is, as a value of output arguments.
The System keeps all Workflow Instance events in the database until the workflow instance is completed, and for some time afterward.
For more details, see Configure Workflow Instance Monitoring.
A workflow is a long-running process that in many cases requires human interaction, so the time between the start and finish of a Workflow Instance can be years. In such a timeframe the Application is restarted many times. For that reason, the Workflow Engine saves the Workflow Instance state to storage whenever it changes.
The System uses the Instance Store database for storing the serialized Workflow Instance (by default, the database name is "M42InstanceStore").
The instances are stored in the database only until the moment they are completed.
The chapter is relevant only for the Workflow Worker Engine.
Workflow Instances are long-running processes that can take years to finish, which also means a Workflow Instance can be started on one Product Version, resumed on a second one, and finished on a third one. To mitigate the risk of Workflow Instance failures in such a hostile environment, the Workflow Engine has a range of mechanisms that guarantee correct handling of the Workflow commands (start, resume, terminate, etc.) or, in case of failure, provide an approach for recovery.
Any request to the Workflow Engine is placed in the Queue (table [dbo].[QueueTask]) and stays there until the first available Matrix42 Worker polls it and executes it. This approach guarantees that each request will, sooner or later, be processed.
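The queue-and-poll idea described above can be illustrated with a small in-memory sketch. It is not the product's implementation: the `QueueTask` fields, the `Worker` class, and the `drain` helper are hypothetical names chosen for illustration, and a `deque` stands in for the [dbo].[QueueTask] table.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueueTask:
    """Hypothetical stand-in for a row in [dbo].[QueueTask]."""
    task_id: int
    command: str       # e.g. "start", "resume", "terminate"
    workflow_id: str


@dataclass
class Worker:
    """Hypothetical worker that polls the shared queue."""
    name: str
    processed: list = field(default_factory=list)

    def poll(self, queue: deque) -> bool:
        # Take the oldest pending task, if any, and "execute" it.
        if not queue:
            return False
        task = queue.popleft()
        self.processed.append((task.task_id, task.command))
        return True


def drain(queue: deque, workers: list) -> None:
    # Workers take turns polling until the queue is empty, so every
    # request is eventually picked up as long as one worker is available.
    while queue:
        for worker in workers:
            if not worker.poll(queue):
                break


queue = deque([
    QueueTask(1, "start", "wf-a"),
    QueueTask(2, "resume", "wf-b"),
    QueueTask(3, "terminate", "wf-a"),
])
workers = [Worker("worker-1"), Worker("worker-2")]
drain(queue, workers)
```

Because requests wait in the queue rather than being sent to a specific worker, a temporarily unavailable worker only delays processing; it does not lose the request.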
There are many reasons a Workflow Instance can enter the Failed state. In some cases, when the Workflow Instance is business-critical, the reason for the failure can be determined and resolved, and afterward the Workflow Instance can continue its execution. For more details, see Reanimate Workflow Instances.
- Stop service: when the Application Server becomes unavailable during the execution of the Workflow (e.g. due to an IIS restart, entering Maintenance Mode, etc.) or the Matrix42 Worker service has been stopped (not killed), the Workflow Engine proceeds with the "graceful shutdown" procedure. This means that all running Workflows are automatically suspended, persisted in the Instance Store, and then automatically resumed as soon as the overall infrastructure (AppServer and Worker) is up again.
The graceful shutdown procedure runs until all Workflows are unloaded. If the Workflow executes a long-running Activity (e.g. "Invoke Powershell" with a heavy script), stopping the Worker can take a while, because it waits until the Activity execution is finished.
- Kill process: if the Workflow Worker Windows Service is killed (not stopped), the "graceful shutdown" procedure is not executed, and all Workflow Instances that were running on the Worker go, after some delay, to the Failed state. Afterward, they can be manually reanimated from the last Persistence point.
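The difference between the two scenarios can be sketched as a simple state transition model. This is a simplified illustration only: the `State` values and the three functions are hypothetical, and details such as the delay before instances are marked Failed are not modeled.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    FAILED = "Failed"


def stop_worker(instances: dict) -> dict:
    """Graceful stop: every running instance is suspended and persisted."""
    return {
        name: State.SUSPENDED if state is State.RUNNING else state
        for name, state in instances.items()
    }


def kill_worker(instances: dict) -> dict:
    """Hard kill: no graceful shutdown, running instances end up Failed."""
    return {
        name: State.FAILED if state is State.RUNNING else state
        for name, state in instances.items()
    }


def start_worker(instances: dict) -> dict:
    """Infrastructure is back: suspended instances resume automatically.

    Failed instances stay Failed until they are reanimated manually.
    """
    return {
        name: State.RUNNING if state is State.SUSPENDED else state
        for name, state in instances.items()
    }
```

In this model, `start_worker(stop_worker(instances))` returns every instance to Running, while `start_worker(kill_worker(instances))` leaves them Failed and waiting for manual reanimation.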
The Workspace Management application supports two alternative implementations of the Workflow Engine, the basic one which is based on Microsoft AppFabric, and a new one, based on Matrix42 Workers.
The first version of the Workflow Engine is fully based on the Microsoft AppFabric module, which provides an out-of-the-box implementation of all the basic tasks of the Workflow Engine, such as hosting Workflows, monitoring, and persistence. What is specific to the AppFabric Engine is the way the prepared Workflows are deployed and hosted. Using the "Publish" or "Publish Repository" action, available either in Matrix42 Software Asset and Service Management or in the Workflow Studio, the prepared Workflows are deployed to the Application Server, to the folder "/svc/WF/", and the System dynamically creates Web Service endpoints for each deployed Workflow version. As a result, each version of the Workflow represents a Workflow Service.
The previous Workflow Engine implementation based on AppFabric has several downsides, such as poor performance, heavy consumption of system resources, and problems with horizontal scaling of Workflow execution. For that reason, a new concept of Workflow execution called Matrix42 Workers has been introduced.
The Matrix42 Workers engine is designed to fully replace the AppFabric engine in upcoming releases. The Worker roll-out strategy includes the gradual replacement of AppFabric components from version to version, with an option to fall back to AppFabric always available when something goes wrong.
For more information on the Matrix42 Worker architecture, worker management, the default worker, the installation process, and updates, see the Matrix42 Worker Engine page.
Setup System Workflow Engine
The System settings define which Workflow Engine is used for the execution of a Workflow Instance. The settings can be found in the Global System Settings dialog in the Administration area, on the Workflows tab:
- Use legacy Workflow Engine (AppFabric): the same as in the previous product versions, the System keeps using the AppFabric for processing all kinds of Workflow commands.
- [TECHNICAL PREVIEW] Use Matrix42 Worker together with Legacy Workflow Engine (AppFabric): the System uses Matrix42 Workers for the starting and processing of all Workflows marked as “Worker Compatible” (see Set Execution Engine for Workflow). Workflows that are either not compatible with the Worker or have already been started on legacy Workflow Engine will keep using AppFabric for execution.
See Workflow Engine Migration Guide for more details on how to adjust the system for Matrix42 Worker.
If this option is not selected, Workflows keep using AppFabric for execution, even when they are marked as "Execute on Matrix42 Worker".
- Use Matrix42 Worker: the option is disabled in the latest Product version. It will be enabled once the Matrix42 Worker engine is production-ready and most of the Workflow Activities are compatible with the Matrix42 Worker.
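The selection rules described by the first two options can be summarized in a short sketch. The enum and function names below are hypothetical and only restate the rules from the text; they are not part of any product API.

```python
from enum import Enum


class SystemSetting(Enum):
    LEGACY_ONLY = "Use legacy Workflow Engine (AppFabric)"
    WORKER_WITH_LEGACY = (
        "Use Matrix42 Worker together with Legacy Workflow Engine (AppFabric)"
    )


class Engine(Enum):
    APPFABRIC = "AppFabric"
    WORKER = "Matrix42 Worker"


def select_engine(setting: SystemSetting,
                  worker_compatible: bool,
                  started_on_legacy: bool) -> Engine:
    """Pick the engine for a Workflow Instance (illustrative only)."""
    # With the legacy setting, the workflow's own marking is ignored.
    if setting is SystemSetting.LEGACY_ONLY:
        return Engine.APPFABRIC
    # Instances already started on AppFabric, or workflows that are not
    # marked as Worker-compatible, stay on AppFabric.
    if started_on_legacy or not worker_compatible:
        return Engine.APPFABRIC
    return Engine.WORKER
```

Only a Worker-compatible workflow that has not already been started on the legacy engine is routed to the Matrix42 Worker, and only when the combined option is selected.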
Configure Workflow Instance Monitoring
The new Workflow Engine based on Matrix42 Workers uses its own implementation of Workflow Instance Monitoring, which not only replicates the functionality of the AppFabric Monitoring module for the Workflow Instances running on Matrix42 Worker but also adds features that allow tailoring the monitoring for each specific environment.
The System provides two levels of monitoring capabilities for Workflow Instances running on Matrix42 Worker Engine. All the Workflow Instances regardless of durability can be configured to utilize Workflow event collection capabilities, allowing data at varying verbosity to be collected for monitoring and troubleshooting purposes.
The engine distinguishes two levels for collecting events (see mark 4 on the Workflow Engine Settings image above):
- Error Only: the Workflow Engine records a minimal set of Workflow Instance events to the Monitoring database. This set includes the start and finish of the workflow, suspend and resume points, and, in case of an error, all events accumulated from the last resume point up to the failed Workflow activity.
- Troubleshooting: as in AppFabric, the Workflow Engine records all events thrown by the Instance.
The Workflow Engine Definition property of the Global System Settings also requires appropriate configuration of the Monitoring Level in the Workflow settings. For more details, see Manage Workflows: General Dialog Page settings.
An environment that runs a massive number of Workflows generates huge amounts of events, and processing and recording them can cause serious performance problems and resource shortages. Usually, most of these events are recorded, automatically cleaned up within a few days, and never reviewed in Visual Tracking. To optimize the usage of System resources, the Error Only level was introduced, which provides enough information to figure out the issue in most cases. For special cases, when the Error Only level does not provide enough data, the Troubleshooting level can be used.
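The Error Only behavior can be pictured as a recorder that always writes milestone events but holds ordinary activity events in a buffer, flushing the buffer only when an error occurs. The sketch below is an illustration under assumptions: the class name, the event kinds, and the tuple format are hypothetical and do not reflect the engine's actual event model.

```python
class ErrorOnlyRecorder:
    """Hypothetical recorder illustrating the Error Only idea."""

    MILESTONES = {"started", "finished", "suspended", "resumed"}

    def __init__(self):
        self.recorded = []   # what would be written to the Monitoring database
        self._buffer = []    # activity events since the last milestone

    def on_event(self, kind: str, detail: str = "") -> None:
        event = (kind, detail)
        if kind in self.MILESTONES:
            # Milestones are always recorded; buffered activity events
            # before them are no longer needed and are dropped.
            self.recorded.append(event)
            self._buffer.clear()
        elif kind == "error":
            # On failure, write everything accumulated since the last
            # resume point, followed by the error itself.
            self.recorded.extend(self._buffer)
            self.recorded.append(event)
            self._buffer.clear()
        else:
            self._buffer.append(event)
```

A workflow that never fails therefore produces only a handful of milestone records, regardless of how many activities it executes.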
If the Error Only option is set for the overall system, it is still possible to elevate the level for a particular Workflow. This can be used for troubleshooting individual workflows without putting extra pressure on the overall System. In that case, the Monitoring Level can be set on the general page of the Workflow dialog (see Manage Workflows for more details).
Workflows running on the AppFabric Workflow Engine use the AppFabric Monitoring module. See "Monitoring Applications using Windows Server AppFabric" for more details.
The Workflow Worker Engine stores the persisted Workflow Instances and Workflow Monitoring events in a database. By default, the persisted instances are stored in the Production database in the table [dbo].[WFPInstancesTable]. The monitoring events of the Workflow Worker Engine are recorded to the table [dbo].[WorkerEvents], which by default is hosted in the AppFabric Monitoring database (e.g. M42Monitoring).
In environments with a high volume of executing Workflows, the size of the Instances and Monitoring data can be substantial, which significantly affects the size of the Production database. To improve the System performance and maintainability in such cases, it is recommended to relocate the Workflows tables to a dedicated database.
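One way to think about the per-workflow elevation is as taking the more verbose of the two settings. The sketch below assumes that a workflow-level setting can only raise the level, never lower it; the text does not state this explicitly, and the names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class MonitoringLevel(IntEnum):
    ERROR_ONLY = 1
    TROUBLESHOOTING = 2


def effective_level(system_level: MonitoringLevel,
                    workflow_level: Optional[MonitoringLevel]) -> MonitoringLevel:
    """Return the level used for one workflow (illustrative assumption)."""
    # No workflow-specific setting: the system-wide level applies.
    if workflow_level is None:
        return system_level
    # Assumption: the workflow setting can elevate the level only.
    return max(system_level, workflow_level)
```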
The Setup API provides a PowerShell cmdlet for relocating the Workflows tables to a specified database:
New-WMWorkerDatabase -WorkerDBName "NewDBName"
WorkerDBName - the name of the database, on the same SQL Server, to which the Workflows data is moved. The cmdlet creates the database if it does not exist.
To prevent data loss when relocating the Workflows data to a new database, put the System into Maintenance mode first. Use "Move Worker Workflow Data" as a template.
Workflow processing optimization
Cleaning obsolete Workflow Instances
The System automatically runs a background engine that periodically cleans up Workflow Instances from the Production database, along with all related data from the Persistence and Monitoring databases. To change how long such Workflow Instances stay in the System:
- In Administration application, open Engine Activations management area;
- Find and edit Clean Up Obsolete Objects engine activation;
- Open dialog view Active Engines and edit related engine Clean Up Obsolete Objects;
- Set the number of days for:
  - Workflow instances that completed successfully
  - Failed workflow instances
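The effect of the two retention settings can be sketched as a simple age check per instance status. The function name, the status strings, and the parameters below are hypothetical; the actual clean-up engine may apply additional conditions.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_obsolete(status: str,
                finished_at: Optional[datetime],
                now: datetime,
                keep_completed_days: int,
                keep_failed_days: int) -> bool:
    """Decide whether an instance is past its retention period."""
    retention_days = {
        "Completed": keep_completed_days,
        "Failed": keep_failed_days,
    }
    # Instances that are still running are never cleaned up.
    if status not in retention_days or finished_at is None:
        return False
    return now - finished_at > timedelta(days=retention_days[status])
```

Keeping failed instances longer than completed ones leaves more time to investigate and reanimate them before their data is removed.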
Workflow Infinite Loops Protection
A badly designed Workflow can lead to infinite loops during Workflow Instance execution and to overall blocking of the Workflow Engine, because some instances are always running and there is no capacity left to execute new Workflow commands. To prevent this, the System uses a protection mechanism that automatically terminates a Workflow Instance when an infinite loop is detected, that is, when the number of iterations exceeds the number configured in the Production database.
By default, the System allows 10000 iterations in a Workflow Instance before it is classified as an infinite loop.
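The protection can be understood as an iteration counter wrapped around loop execution. The following sketch is illustrative only: `run_guarded_loop` and `InfiniteLoopDetected` are hypothetical names, and the engine's actual detection logic is not documented here. The default of 10000 is taken from the text above.

```python
class InfiniteLoopDetected(RuntimeError):
    """Raised when a loop exceeds the allowed number of iterations."""


MAX_ITERATIONS = 10000  # default value mentioned in the text


def run_guarded_loop(condition, body, max_iterations: int = MAX_ITERATIONS) -> int:
    """Run body() while condition() holds, up to max_iterations times."""
    iterations = 0
    while condition():
        if iterations >= max_iterations:
            # The engine would terminate the Workflow Instance here.
            raise InfiniteLoopDetected(
                f"loop exceeded {max_iterations} iterations"
            )
        body()
        iterations += 1
    return iterations
```

A loop whose condition never becomes false is stopped after `max_iterations` passes instead of occupying the engine indefinitely.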
Regulate the maximum size of the Monitoring Database
Infinite loops & Database Purge
When the System runs monitoring in Troubleshooting mode and some Workflow Instances enter infinite loops, the Monitoring Database can grow very quickly and the Database Server can run out of free disk space. To prevent this scenario, the Workflow Engine supports an automatic purge mechanism for the Monitoring database, which removes the oldest Monitoring records when the database exceeds the maximum allowed size.
By default, the allowed size is 1 GB, but you can change it in the Production database using the MonitoringMaxTableSize attribute, which defines the database size in megabytes.
The System uses engine activation Workflow Monitoring Autopurge to automatically start the purge function, which is configured out-of-the-box to start once a day at night.
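The purge rule (remove the oldest records until the total size is back under the limit) can be sketched as follows. This is a simplified model: records are represented as hypothetical `(timestamp, size_mb)` tuples, whereas the real mechanism operates on database tables.

```python
def purge_oldest(records: list, max_size_mb: float) -> list:
    """Drop the oldest records until the total size fits the limit.

    records: list of (timestamp, size_mb) tuples (hypothetical format).
    Returns the records that are kept, oldest first.
    """
    kept = sorted(records, key=lambda record: record[0])
    total = sum(size for _, size in kept)
    while kept and total > max_size_mb:
        _, size = kept.pop(0)   # remove the oldest record
        total -= size
    return kept
```

With three 400 MB records and a 1024 MB limit, only the oldest record is removed, since the remaining 800 MB fits within the limit.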
Workflow Activities & Variables Max Length
When the System runs monitoring in Troubleshooting mode and some Workflow activities output a lot of variable data, the Monitoring Database can grow very quickly and the Database Server can run out of free disk space. To prevent this scenario, you can change the MonitoringMaxVariableSize attribute in the SPSGlobalConfigurationClassWorkflowEngine table of the Production database. By default, the maximum length of variable data is 2000 symbols.
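The effect of such a limit can be illustrated with a simple length cap applied before a variable value is recorded. This sketch assumes that longer values are truncated; the text only states the maximum length, so the exact handling is an assumption, and the function name is hypothetical.

```python
def cap_variable(value: str, max_length: int = 2000) -> str:
    """Limit a variable's recorded text to max_length symbols.

    Assumption: values over the limit are cut off rather than skipped.
    """
    return value[:max_length]
```

A lower limit reduces the storage used per monitoring event, at the cost of seeing only the beginning of large values in Visual Tracking.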