SEBA-702 Update synchronizer implementation documentation; expunge watchers
Change-Id: I91a9f636b81668c29fca0132f66d8987c7d3b9bb
diff --git a/docs/dev/sync_impl.md b/docs/dev/sync_impl.md
index 613bb57..a405e2d 100644
--- a/docs/dev/sync_impl.md
+++ b/docs/dev/sync_impl.md
@@ -8,137 +8,9 @@
modules. Event-based synchronizers are simpler to implement, but lack the
aforementioned guarantees.
-## Differences between Work-based and Event-based Synchronizers
+The current XOS Synchronizer implementation is work-based. The Synchronizer framework determines whether models are up-to-date by examining their content, typically using timestamps embedded in the models. _Sync Steps_, also known as _Actuators_, must be implemented in an idempotent manner. In particular, synchronizing a model multiple times, even when nothing has changed, must not cause an error. In the worst case, synchronizing more often than necessary costs performance, not correctness.
-| Mechanism | Work-Based Synchronizers | Event-based Synchronizers |
-|--------------------|------------------------------------------|------------------------------------------|
-| Control-logic binding | Check if models are up-to-date based on their content | React to events notifying of model updates |
-| Implementation constraints | Modules have to be idempotent | Modules are not required to be idempotent |
-| Dependencies | Modules are executed in dependency order | Modules are executed reactively in an arbitrary order |
-| Concurrency | Non-dependent modules are executed concurrently | Modules are executed sequentially |
-| Error handling | Errors are propagated to dependencies; retries on failure | No error dependency; it’s up to the Synchronizer to cope with event loss |
-| Ease of implementation | Moderate | Easy |
-
-### Implementing an Event-based Synchronizer
-
-An Event-based Synchronizer is a collection of _Watcher_ modules. Each Watcher
-module listens for (i.e., watches) events pertaining to a particular model. The
-Synchronizer developer must provide the set of these modules. The steps for
-assembling a synchronizer once these modules have been implemented, are as
-follows:
-
-1. Run the generate watcher script: `gen_watcher.py <name of your app>`
-
-2. Set your Synchronizer-specific config options in the config file, and also
- set `observer_enable_watchers` to true.
-
-3. Install python-redis by running `pip install redis` in your Synchronizer
- container
-
-4. Link the redis container that comes packaged with XOS with your Synchronizer
- container as `redis`.
-
-5. Drop your watcher modules in the directory `/opt/xos/synchronizers/<your
- synchronizer>/steps`
-
-6. Run your synchronizer by running `/opt/xos/synchronizers/<your
- synchronizer>/run-synchronizer.sh`
-
-### Watcher Module API
-
-* `def handle_watched_object(self, o)`: A method, called every time a watched
- object is added, deleted, or updated.
-
-* `int watch_degree`: A variable of type `int` that defines the set of watched
- models _implicitly_. If this module synchronizes models A and B, then the
- watched set is defined by the models that are a distance `watch_degree` from
- A or from B in the model dependency graph.
-
-* `ModelLink watched`: A list of type `ModelLink` that defines the set of
- watched models _explicitly_. If this is defined, then `watch_degree` is
- ignored.
-
-* `Model synchronizes`: A list of type `Model` that identifies the model that
- this module synchronizes.
-
-The main body of a watcher module is the function `handle_watched_object`,
-which responds to operations on objects that the module synchronizes. If the
-module responds to multiple object types, then it must determine the type of
-object, and proceed to process it accordingly.
-
-```python
-def handle_watched_object(self, o):
- if (type(o) is Slice):
- self.handle_changed_slice(o)
- elif (type(o) is Node):
- self.handle_changed_node(o)
-```
-
-#### Linking the Watcher into the Synchronizer
-
-There are two ways of linking in a Watcher. Using them both does not hurt. The
-first method is complex but robust, and involves making the declaration in the
-data model, by ensuring that the model that your synchronizer would like to
-watch is linked to the model that it actuates. For instance, if your
-synchronizer actuates a service model called Fabric, which links the Instance
-model, then you would ensure that Instance is a dependency of Fabric by making
-the following annotation in the Fabric model:
-
-```python
-class Fabric(Service):
- ...
- ...
- xos_links = [ModelLink(Instance,via='instance',into='ip')]
-```
-
-There can be several `ModelLink` specifications in a single `xos_links`
-declaration, each encapsulating the referenced model, the field in the current
-model that links to it, and the destination field in which the watcher is
-interested. If into is omitted, then the watcher is notified of all changes in
-the linked model, irrespective of the fields that change.
-
-The above change needs to be backed up with an instruction to the synchronizer
-that the watcher is interested in being notified of changes to its
-dependencies. This is done through a `watch_degree` annotation.
-
-```python
-class SyncFabricService(SyncStep):
- watch_degree=1
-```
-
-By default, `watch_degree = 0`, means the Synchronizer watches nothing. When
-watch degree is 1, it watches one level of dependencies removed, and so on. If
-the `watch_degree` in the above code were 2, then this module would also get
-notified of changes in dependencies of the `Instance` model.
-
-The second way of linking in a watcher is to hardcode the watched model
-directly in the synchronizer:
-
-```python
-class SyncFabricService(SyncStep):
- watched = [ModelLink(Instance,via='instance',into='ip')]
-```
-
-#### Activate the Watcher by Connecting to Redis
-
-* Set the `observer_enable_watchers` option to true in your XOS synchronizer
- config file.
-
-* Add a link between your synchronizer container and the redis container by
- including the following lines in the definition of your synchronizer's
- docker-compose file. You may need to adapt these to the name of the project
- used (e.g. cordpod)
-
- ```yaml
- - external_links:
- - xos_redis:redis
- ```
-
-* Ensure that there is a similar link between your XOS UI container and the
- redis container.
-
-In addition to the above development tasks, you also need to make the following
-changes to your configuration to activate watchers.
+The Synchronizer framework has facilities to assist with dependency sorting, concurrency, and error handling.
### Implementing a Work-based Synchronizer
@@ -146,100 +18,32 @@
module is invoked when a model is found to be outdated relative to its last
synchronization. An actuator module can be self-contained and written entirely
in Python, or it can be broken into a "dispatcher" and "payload", with the
-dispatcher implemented in Python and the payload implemented using Ansible. The
-Synchronizer core has built-in support for the dispatch of Ansible modules and
-helps extract parameters from the synchronized model and translate them into
-the parameters required by the corresponding Ansible script. It also tracks an
-hierarchically structured list of such ansible scripts on the filesystem, for
-operators to use to inspect and debug a system. The procedure for building a
-work-based synchronizer is as follows:
-
-1. Run the gen_workbased.py script. `gen_workbased <app name>`.
-
-2. Set your Synchronizer-specific config options in the config file, and also
- set observer_enable_watchers to False.
-
-3. Drop your actuator modules in the directory `/opt/xos/synchronizers/<your
- synchronizer>/steps`
-
-4. Run your synchronizer by running `/opt/xos/synchronizers/<your
- synchronizer>/run-synchronizer.sh`
+dispatcher implemented in Python and the payload implemented externally using
+a tool such as Ansible.
### Actuator Module API
-* `Model synchronizes`: A list of type `Model` that records the set of models
- that the module synchronizes.
+* `Model observes`: A list of the `Model` classes that this step observes.
* `def sync_record(self, object)`: A method that handles outdated objects.
-* `def delete_record(self, object)`" A method that handles object delection.
-
-* `def get_extra_attributes(self, object)`: A method that maps an object to the
- parameters required by its Ansible payload. Returns a `dict` with those
- parameters and their values.
+* `def delete_record(self, object)`: A method that handles object deletion.
* `def fetch_pending(self, deleted)`: A method that fetches the set of pending
- objects from the database. The synchronizer core provides a default
+ objects from the database. The synchronizer framework provides a default
implementation. Override only if you have a reason to do so.
-* `string template_name`: The name of the Ansible script that directly
- interacts with the underlying substrate.
+#### Sync Steps
-#### Implementing a Step with Ansible
-
-To implement a step using Ansible, a developer must provide two things: an
-Ansible recipe, and a `get_extra_attributes` method, which maps attributes of
-the object into a dictionary that configures that Ansible recipe. The Ansible
-recipe comes in two parts, an inner payload and a wrapper that delivers that
-payload to the VMs associated with the service. The wrapper itself comes in two
-parts. A part that sets up the preliminaries:
-
-```python
----
-- hosts: "{{ instance_name }}"
- connection: ssh
- user: ubuntu
- sudo: yes
- gather_facts: no
- vars:
- - package_location: "{{ package_location }}"
- - server_port: "{{ server_port }}"
-```
-
-The template variables `package_location` and `server_port` come out of the
-Python part of the Synchronizer implementation (discussed below). The outer
-wrapper then includes a set of Ansible roles that perform the required actions:
-
-```python
-roles:
- - download_packages
- - configure_packages
- - start_server
-```
-
-The "payload" of the Ansible recipe contains an implementation of the roles, in
-this case, `download_packages`, `configure_packages`, and `start_server`. The
-concrete values of parameters required by the Ansible payload are provided in
-the implementation of the `get_extra_attributes` method in the Python part of
-the Synchronizer. This method receives an object from the data model and is
-charged with the task of converting the properties of that object into the set
-of properties required by the Ansible recipe, which are returned as a Python
-dictionary.
-
-```python
-def get_extra_attributes(self, o):
- fields = {}
- fields['package_location'] = o.package_location
- fields['server_port'] = o.server_port
- return fields
-```
-
-#### Implementing a Step without Ansible
-
-To implement a step without using Ansible, a developer need only implement the
-`sync_record` and `delete_record` methods, which get called for every pending
+To implement a step, a developer need only provide the
+`sync_record` and `delete_record` methods, which are called for every pending
object. These methods interact directly with the underlying substrate.
+A variety of implementations are possible; for example, calling a REST API
+endpoint on an external service is a pattern used by many existing synchronizers.
+Executing an Ansible playbook is another option, one that was used in the
+past, though no current synchronizers follow that pattern.
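+
+Below is a minimal sketch of a self-contained sync step following the REST pattern
+described above, assuming a framework-provided base class named `SyncStep`. The
+`ExampleService` model, its fields, and the backend URL are illustrative assumptions,
+not part of the framework.
+
+```python
+import requests
+
+
+class SyncExampleService(SyncStep):
+    observes = [ExampleService]    # hypothetical model handled by this step
+
+    def sync_record(self, o):
+        # Push the model's current state to a hypothetical external REST API.
+        # Idempotent: repeating this call with unchanged data is harmless.
+        requests.put("http://backend.example.com/api/services/%s" % o.id,
+                     json={"name": o.name})
+
+    def delete_record(self, o):
+        # Remove the corresponding backend object when the model is deleted.
+        requests.delete("http://backend.example.com/api/services/%s" % o.id)
+```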
+
#### Managing Dependencies
If your data models have dependencies between them, so that for one to be
@@ -253,16 +57,20 @@
Synchronizer tries to execute your synchronization steps concurrently to
whatever extent this is possible while still honoring dependencies.
-```python
-<in the definition of your model>
-xos_links = [ModelLink(dest=MyServiceUser,via='user'),ModelLink(dest=MyServiceDevice,via='device') ]
+Dependencies are typically specified in a model-deps file that has a simple
+json-based syntax. For example,
+
+```json
+{
+ "User": [
+ ["Site", "site", "users"],
+ ]
+}
```
-In the above example, the `xos_links` field declares two dependencies. The name
-`xos_links` is key, and so the field should be named as such. The dependencies
-are contained in a list of type `ModelLink`, each of which defines a type of
-object (a model) and an "accessor" field via which a related object of that
-type can be accessed.
+The example above specifies that the `User` model depends on the `Site` model, and
+that the two models are linked by the fields `site` (in the `User` model) and `users` (in
+the `Site` model).
#### Handling Errors
@@ -296,7 +104,41 @@
2. It disables exponential backoff (i.e., the Synchronizer tries to synchronize
your object every single time).
-#### Synchronizer Configuration Options
+### Responding to external activity
+
+The original purpose of the Synchronizer framework was to implement top-down control flow, but it
+was quickly discovered that a Synchronizer is also a convenient place to implement bottom-up
+feedback flow. To support this, a few new classes of steps were added.
+
+#### Event Steps
+
+Event steps allow external events to update state in the data model. Event steps typically use
+`Kafka` as an event bus, registering on a specific `topic`. When a message arrives on that
+`topic`, the event step's `process_event` method is called with the contents of the event. The
+event step is then free to use the API to modify, delete, or create models as necessary.
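+
+A minimal sketch of an event step follows. The base class name `EventStep`, the topic,
+the `event.value` attribute, and the `ExampleDevice` model with its Django-style accessor
+are assumptions for illustration.
+
+```python
+import json
+
+
+class ExampleDeviceEventStep(EventStep):
+    topic = "example.device-status"    # Kafka topic this step subscribes to
+
+    def process_event(self, event):
+        # Assume the event payload is a JSON string describing a device update.
+        value = json.loads(event.value)
+        for device in ExampleDevice.objects.filter(serial=value["serial"]):
+            device.status = value["status"]
+            device.save()
+```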
+
+#### Pull Steps
+
+Pull steps are similar to event steps, but use a polling mechanism instead of an event mechanism.
+Pull steps must implement a method called `pull_records`. This method is called periodically and
+allows the step to conduct any polling that is necessary. The step is then free to alter the data
+model.
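+
+A minimal sketch of a pull step follows, assuming a framework-provided base class named
+`PullStep`; the external inventory API and the `ExampleDevice` model are hypothetical.
+
+```python
+import requests
+
+
+class ExampleDevicePullStep(PullStep):
+    def pull_records(self):
+        # Poll a hypothetical inventory API and mirror its contents into the data model.
+        for entry in requests.get("http://backend.example.com/api/devices").json():
+            existing = ExampleDevice.objects.filter(serial=entry["serial"])
+            device = existing[0] if existing else ExampleDevice(serial=entry["serial"])
+            device.status = entry["status"]
+            device.save()
+```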
+
+### Implementing model-to-model policies
+
+`model_policies` are yet another type of step. Rather than performing top-down control flow
+or bottom-up feedback flow, a `model_policy` implements a sideways action, a place for
+changes in one model to cause changes in another. For example, "When object A is created,
+also create object B and link it to object A" is one common policy pattern.
+
+`model_policies` must declare a `model_name` that the policy will operate on. The policy
+then declares a set of handlers:
+
+* `handle_create(obj)`. Called whenever an object is created.
+* `handle_update(obj)`. Called whenever an object is modified.
+* `handle_delete(obj)`. Called whenever an object is deleted.
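+
+A minimal sketch of a model policy implementing the "create B when A is created" pattern
+described above is shown here, assuming a framework-provided base class named `Policy`;
+the `ExampleService` and `ExampleServiceInstance` models are hypothetical.
+
+```python
+class ExampleServicePolicy(Policy):
+    model_name = "ExampleService"
+
+    def handle_create(self, obj):
+        # When an ExampleService is created, also create an ExampleServiceInstance
+        # and link it back to the service.
+        instance = ExampleServiceInstance(owner=obj)
+        instance.save()
+
+    def handle_update(self, obj):
+        # Keep dependent objects consistent with `obj` as needed.
+        pass
+
+    def handle_delete(self, obj):
+        # Clean up the objects created in handle_create.
+        for instance in ExampleServiceInstance.objects.filter(owner_id=obj.id):
+            instance.delete()
+```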
+
+### Synchronizer Configuration Options
The following table summarizes the available configuration options. For
historical reasons, they are called `observer_foo` since Synchronizers were
@@ -304,16 +146,15 @@
| Option | Default | Purpose |
|---------|----------|-----------|
-| `observer_disabled` | False | A directive to run without synchronizing. Events are not relayed to the Synchronizer and no bookkeeping is done. |
-| `observer_steps_dir` | N/A | The path of the directory in which the Synchronizer will look for your watcher and actuator modules. |
-| `observer_sys_dir` | N/A | The path of the directory that enlists backend objects your synchronizer creates. This is like the `/sys` directory in an operating system. Each entry is a file that contains an Ansible recipe that creates, updates or deletes your object. When you debug your synchronizer, you can run these files manually. |
-| `observer_pretend` | False | This option runs the Synchronizer in "pretend" mode, in which synchronizer modules that use Ansible run in emulated mode, and do not actually execute backend API calls. |
-| `observer_proxy_ssh` | N/A | |
-| `observer_name` | N/A | The name of your Synchronizer. This is a required option. |
-| `observer_applist` | core | A list consisting of the Django apps that your Synchronizer uses. |
-| `observer_dependency_graph` | `/opt/xos/model-deps` | Dependencies between various models that your Synchronizer services. These are generated automatically by the Synchronizer utility `dmdot`. |
-| `observer_backoff_disabled` | True | Models whose synchronization fails are re-executed, but with intervals that increase exponentially. This option disables the exponential growth of the intervals. |
-| `observer_logstash_hostport` | N/A | The host name and port number (e.g. `xosmonitor.org:4132`) to which the Synchronizer streams its logs, on which a logstash server is running. |
-| `observer_log_file~ | N/A | The log file into which the Synchronizer logs are published. |
-| `observer_model_policies_dir` | N/A | The directory in which model policies are stored.|
+| `name` | N/A | The name of the synchronizer |
+| `accessor` | N/A | A subsection of the config file that describes the `username`, `password`, and `endpoint` to contact the XOS core. |
+| `core_version` | N/A | Specifies the version of the core that is required by this synchronizer. |
+| `dependency_graph` | `/opt/xos/model-deps` | Dependencies between the various models that your Synchronizer services. These may be written manually or generated automatically using `xosgenx`. |
+| `models_dir` | N/A | The directory in which model xproto is stored.|
+| `steps_dir` | N/A | The path of the directory in which the Synchronizer will look for your actuator modules. |
+| `model_policies_dir` | N/A | The directory in which model policies are stored.|
+| `pull_steps_dir` | N/A | The directory in which pull steps are stored.|
+| `event_steps_dir` | N/A | The directory in which event steps are stored.|
+| `event_bus` | N/A | A subsection that describes the Kafka endpoint used by event steps. It has two required fields: `kind`, which must be set to `kafka`, and `endpoint`, the address of that Kafka endpoint.|
+| `logging` | N/A | A section that describes logging settings.|