Metadata ETL

Introduction to Metadata ETL

“ETL” refers to “extract, transform, and load” operations, usually applied to data. CumulusCI offers a suite of functionality we call Metadata ETL. Metadata ETL makes it easy to define automation that executes targeted transformations of metadata that already exists in an org.

Metadata ETL is particularly useful for building automation in projects that extend other managed packages or that perform complex setup operations during installations, such as through MetaDeploy. By using Metadata ETL tasks, projects can often avoid storing and deploying unpackaged metadata by instead extracting metadata from the target org, making changes, and then re-deploying. This mode of configuration is lower-risk and lower-maintenance than storing extensive unpackaged metadata, which may become out-of-sync, incur accidental feature dependencies, or entail more destructive deployment operations.

A primary example use case for Metadata ETL is deployment of Standard Value Sets. Standard Value Sets, which define the picklist values available on standard fields like Opportunity.StageName, are not packageable, and as such must be part of an application’s unpackaged metadata. They’re critical to many applications: a Business Process, for example, will fail to deploy if the Stage values it includes are not available. Finally, they carry a serious risk when deployed into subscriber orgs: deploying Standard Value Sets is an overwrite operation, so all existing values in the target org that aren’t part of the deployment are deactivated. This means that it’s neither safe nor maintainable to store static Standard Value Set metadata in a project and deploy it.

These three facets - non-packageability, application requirements, and deployment safety - all support a Metadata ETL approach. Rather than attempting to deploy static metadata stored in the repository, the product’s automation should extract the Standard Value Set metadata from the org, transform it to include the desired values (as well as all existing customization), and load the transformed metadata back into the org. CumulusCI now ships with a task, add_standard_value_set_entries, that makes it easy to do just this:

add_standard_value_set_entries:
    options:
        entries:
            - fullName: "New_Value"
              label: "New Value"
              closed: False
        api_names:
            - CaseStatus

This task would retrieve the existing Case.Status picklist value set from the org, add the New_Value entry to it, and redeploy the modified metadata - ensuring that the application’s needs are met with a safe, minimal intervention in the target org.
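
The transform step of that cycle can be illustrated with a minimal sketch. It is not the implementation of add_standard_value_set_entries; it uses Python's standard xml.etree module on a hand-written sample, and the names EXTRACTED, q, and add_entries are hypothetical. The point it shows is that new entries are appended while every value already in the org is left in place.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MD_NS = "http://soap.sforce.com/2006/04/metadata"
ET.register_namespace("", MD_NS)  # serialize without an "ns0:" prefix

# Hand-written stand-in for metadata extracted from an org (assumption).
EXTRACTED = (
    '<StandardValueSet xmlns="http://soap.sforce.com/2006/04/metadata">'
    "<sorted>false</sorted>"
    "<standardValue><fullName>New</fullName><default>true</default>"
    "<label>New</label><closed>false</closed></standardValue>"
    "</StandardValueSet>"
)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a Metadata API tag name."""
    return f"{{{MD_NS}}}{tag}"


def add_entries(xml_text: str, entries: List[Dict[str, str]]) -> str:
    """Append entries that are missing; never touch existing values."""
    root = ET.fromstring(xml_text)
    existing = {
        el.findtext(q("fullName")) for el in root.findall(q("standardValue"))
    }
    for entry in entries:
        if entry["fullName"] in existing:
            continue  # already present in the org, so leave it alone
        value = ET.SubElement(root, q("standardValue"))
        ET.SubElement(value, q("fullName")).text = entry["fullName"]
        ET.SubElement(value, q("default")).text = "false"
        ET.SubElement(value, q("label")).text = entry["label"]
        existing.add(entry["fullName"])
    return ET.tostring(root, encoding="unicode")


transformed = add_entries(
    EXTRACTED, [{"fullName": "New_Value", "label": "New Value"}]
)
```

Because the function skips names that already exist, running it twice on the same metadata leaves the result unchanged, which is the property that makes this approach safer than deploying a static file.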

Standard Metadata ETL Tasks

CumulusCI includes several Metadata ETL tasks in its standard library. For a list of all of the available tasks, run cci task list and look for the group Metadata Transformations.

Most Metadata ETL tasks accept the option api_names, which specifies the developer names of the specific metadata components which should be included in the operation. In most cases, more than one entity may be transformed in a single operation. Each task performs a single Metadata API retrieve and a single atomic deployment. Please note, however, that the extract-transform-load operation as a whole is not atomic; it is not safe to run Metadata ETL tasks in parallel or to mutate metadata by other means during the run of a Metadata ETL task.

Consult the Task Reference or use the cci task info command for more information on the usage of each task.

The Metadata ETL framework makes it easy to add more tasks. For information about implementing Metadata ETL tasks, see Implementation of Metadata ETL Tasks below.

Namespace Injection

All out-of-the-box Metadata ETL tasks accept a Boolean managed option. If True, CumulusCI will replace the token %%%NAMESPACE%%% in API names and in values used for transforming metadata with the project’s namespace; if False, the token will simply be removed. See Namespace Injection for more information.
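
The effect of the managed option on a single value can be sketched as follows. This is an illustration rather than CumulusCI's own code: the function name inject_namespace is hypothetical, and the sketch assumes the injected prefix is the namespace followed by two underscores, which is the usual Salesforce convention but is not stated above.

```python
TOKEN = "%%%NAMESPACE%%%"


def inject_namespace(value: str, namespace: str, managed: bool) -> str:
    """Replace the namespace token, or strip it when not managed."""
    # Assumption: a managed prefix is written as "<namespace>__".
    prefix = f"{namespace}__" if managed and namespace else ""
    return value.replace(TOKEN, prefix)


managed_name = inject_namespace("%%%NAMESPACE%%%Field__c", "npsp", True)
unmanaged_name = inject_namespace("%%%NAMESPACE%%%Field__c", "npsp", False)
```

Here "npsp" and "Field__c" are placeholder values chosen only for the example.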

Implementation of Metadata ETL Tasks

This section covers internals of the Metadata ETL framework, and is intended for users who wish to build their own Metadata ETL tasks.

The Metadata ETL framework, and out-of-the-box Metadata ETL tasks, are part of the cumulusci.tasks.metadata_etl package. The cumulusci.tasks.metadata_etl.base module contains all of the base classes inherited by Metadata ETL classes.

The easiest way to implement a Metadata ETL class that extracts, transforms, and loads a specific entity, such as CustomObject or Layout, is to subclass MetadataSingleEntityTransformTask.

This abstract base class has two override points. The class attribute entity must be set to the Metadata API entity that the class transforms, and the method _transform_entity(self, metadata: MetadataElement, api_name: str) must be overridden. That method should make any desired changes to the supplied MetadataElement and return either a MetadataElement for deployment or None to suppress deployment of this entity. Classes may also add their own options in task_options, but should generally incorporate the base class’s options as well. A class that overrides _init_options() should call the superclass implementation to ensure that supplied API names are processed appropriately.

The SetDuplicateRuleStatus class is a simple example of implementing a MetadataSingleEntityTransformTask subclass, presented here with additional comments:

from typing import Optional

from cumulusci.tasks.metadata_etl import MetadataSingleEntityTransformTask
from cumulusci.utils.xml.metadata_tree import MetadataElement
from cumulusci.core.utils import process_bool_arg


class SetDuplicateRuleStatus(MetadataSingleEntityTransformTask):
    # Subclasses *must* define `entity`
    entity = "DuplicateRule"

    # Most subclasses include the base class's options via
    # **MetadataSingleEntityTransformTask.task_options. Further
    # options may be added for this specific task. The base class
    # options include in particular the standard `api_names` option,
    # which base class functionality requires.
    task_options = {
        "active": {
            "description": "Boolean value, set the Duplicate Rule to either active or inactive",
            "required": True,
        },
        **MetadataSingleEntityTransformTask.task_options,
    }

    # The `_transform_entity()` method must be overridden.
    def _transform_entity(
        self, metadata: MetadataElement, api_name: str
    ) -> Optional[MetadataElement]:
        # This method modifies the supplied `MetadataElement`, using methods
        # from CumulusCI's metadata_tree module, to match the desired configuration.
        status = "true" if process_bool_arg(self.options["active"]) else "false"
        metadata.find("isActive").text = status

        # Always return the modified `MetadataElement` if deployment is desired.
        # To not deploy this element, return `None`.
        return metadata
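
The return contract (an element to deploy, or None to skip) can be explored without an org. The sketch below is a stand-in, not CumulusCI code: it uses xml.etree.ElementTree in place of MetadataElement, and the names RULE and set_active are hypothetical. It returns None when the rule is already in the desired state, so nothing would be redeployed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MD_NS = "http://soap.sforce.com/2006/04/metadata"

# Hand-written sample standing in for a retrieved DuplicateRule (assumption).
RULE = (
    '<DuplicateRule xmlns="http://soap.sforce.com/2006/04/metadata">'
    "<isActive>false</isActive>"
    "</DuplicateRule>"
)


def set_active(root: ET.Element, active: bool) -> Optional[ET.Element]:
    """Return the modified element, or None if no change is needed."""
    desired = "true" if active else "false"
    node = root.find(f"{{{MD_NS}}}isActive")
    if node is None:
        raise ValueError("isActive element not found")
    if node.text == desired:
        return None  # already correct, so signal "do not deploy"
    node.text = desired
    return root


rule = ET.fromstring(RULE)
first = set_active(rule, True)   # changes the element and returns it
second = set_active(rule, True)  # nothing left to change
```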

Advanced Metadata ETL Base Classes

Most Metadata ETL tasks subclass MetadataSingleEntityTransformTask. However, the framework also includes classes that provide more flexibility for complex metadata transformation and synthesis operations.

The most general base class available is BaseMetadataETLTask. Concrete tasks should rarely subclass BaseMetadataETLTask. Doing so requires you to generate package.xml content manually by overriding _get_package_xml_content(), and requires you to override _transform(), which directly accesses retrieved metadata files on disk in self.retrieve_dir and places transformed versions into self.deploy_dir. Subclasses must also set the Boolean class attributes deploy and retrieve to define the desired mode of operation.
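
The shape of a file-level transform can be sketched with plain directories. This is only an illustration of reading from one directory and writing transformed copies into another; it does not subclass BaseMetadataETLTask, and the function transform_tree and its edit callback are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable


def transform_tree(
    retrieve_dir: Path, deploy_dir: Path, edit: Callable[[str], str]
) -> int:
    """Write an edited copy of every retrieved file into deploy_dir."""
    count = 0
    for source in sorted(retrieve_dir.rglob("*")):
        if not source.is_file():
            continue
        # Mirror the relative path so the deploy tree matches the retrieve tree.
        target = deploy_dir / source.relative_to(retrieve_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        text = source.read_text(encoding="utf-8")
        target.write_text(edit(text), encoding="utf-8")
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    retrieve_dir = Path(tmp) / "retrieve"
    deploy_dir = Path(tmp) / "deploy"
    (retrieve_dir / "objects").mkdir(parents=True)
    (retrieve_dir / "objects" / "Account.object").write_text(
        "<enableFeeds>false</enableFeeds>", encoding="utf-8"
    )
    written = transform_tree(
        retrieve_dir, deploy_dir, lambda text: text.replace("false", "true")
    )
    result = (deploy_dir / "objects" / "Account.object").read_text(
        encoding="utf-8"
    )
```

In a real subclass the two directories would be self.retrieve_dir and self.deploy_dir, and the edit would parse the XML rather than replace text.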

Tasks which wish to synthesize metadata, without doing a retrieval, should subclass BaseMetadataSynthesisTask. Subclasses must override _synthesize() to generate metadata files in self.deploy_dir. The framework will automatically create a package.xml and perform a deployment.

BaseMetadataTransformTask can be used as the base class for ETL tasks that require more flexibility than is permitted by MetadataSingleEntityTransformTask, such as tasks that must mutate multiple Metadata API entities in a single operation. Subclasses must override _get_entities() to return a dict mapping Metadata API entities to collections of API names. (The base class will generate a corresponding package.xml). Subclasses must also implement _transform(), as with BaseMetadataETLTask.
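
To make the relationship between that dict and the manifest concrete, here is a minimal sketch of turning an entity-to-API-names mapping into package.xml content. It is not the base class's implementation; build_package_xml is a hypothetical name, and the API version shown is an arbitrary placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

MD_NS = "http://soap.sforce.com/2006/04/metadata"


def build_package_xml(
    entities: Dict[str, Iterable[str]], api_version: str
) -> str:
    """Render one <types> block per entity, listing its members."""
    root = ET.Element("Package", xmlns=MD_NS)
    for entity in sorted(entities):
        types = ET.SubElement(root, "types")
        for api_name in sorted(entities[entity]):
            ET.SubElement(types, "members").text = api_name
        ET.SubElement(types, "name").text = entity
    ET.SubElement(root, "version").text = api_version
    return ET.tostring(root, encoding="unicode")


package_xml = build_package_xml(
    {"CustomObject": {"Account", "Contact"}, "Layout": {"Account-Account Layout"}},
    api_version="58.0",  # placeholder value, not taken from the source
)
```

Sorting the entities and members keeps the generated manifest stable from run to run, which makes it easier to compare in logs.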

UpdateFirstAttributeTextTask is a base class and generic concrete task that makes it easy to perform a specific, common transformation: setting the value of the first instance of a specific top-level tag in a given metadata entity. Subclasses (or tasks defined in cumulusci.yml) must define the entity, targeted attribute, and desired value to set. Example:

assign_account_compact_layout:
  description: "Assigns the Fancy Compact Layout as Account's Compact Layout."
  class_path: cumulusci.tasks.metadata_etl.UpdateFirstAttributeTextTask
  options:
      managed: False
      namespace_inject: $project_config.project__package__namespace
      entity: CustomObject
      api_names: Account
      attribute: compactLayoutAssignment
      value: "%%%NAMESPACE%%%Fancy_Account_Compact_Layout"
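
The transformation itself is small enough to sketch with the standard library. The code below is an illustration, not the task's implementation: set_first_tag_text is a hypothetical name, the sample XML is hand-written, and appending the tag when it is absent is an assumption about the desired behavior.

```python
import xml.etree.ElementTree as ET

MD_NS = "http://soap.sforce.com/2006/04/metadata"

# Hand-written stand-in for a retrieved CustomObject (assumption).
ACCOUNT = (
    '<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">'
    "<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>"
    "<label>Account</label>"
    "</CustomObject>"
)


def set_first_tag_text(root: ET.Element, tag: str, value: str) -> ET.Element:
    """Set the text of the first top-level child named `tag`."""
    qualified = f"{{{MD_NS}}}{tag}"
    node = root.find(qualified)  # find() returns the first direct child only
    if node is None:
        # Assumption: create the tag when the entity does not have it yet.
        node = ET.SubElement(root, qualified)
    node.text = value
    return root


account = set_first_tag_text(
    ET.fromstring(ACCOUNT),
    "compactLayoutAssignment",
    "Fancy_Account_Compact_Layout",
)
```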