cumulusci.tasks.bulkdata package

Submodules

class cumulusci.tasks.bulkdata.base_generate_data_task.BaseGenerateDataTask(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.core.tasks.BaseTask

Abstract base class for any class that generates data in a SQL DB.

generate_data(session, engine, base, num_records, current_batch_num)[source]

Abstract method that subclasses override to generate the data into an open session.

static init_db(db_url, mappings)[source]
task_docs = '\n Use the `num_records` option to specify how many records to generate.\n Use the `mapping` option to specify a mapping file.\n '
task_options = {'database_url': {'description': 'A path to put a copy of the sqlite database (for debugging)', 'required': False}, 'mapping': {'description': 'A mapping YAML file to use', 'required': False}, 'num_records': {'description': 'How many records to generate: total number of opportunities.', 'required': False}}
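A minimal sketch of a concrete subclass, assuming init_db yields a SQLAlchemy session and an automap base whose mapped classes mirror the mapping file; the table and column names here are illustrative only:

    from cumulusci.tasks.bulkdata.base_generate_data_task import BaseGenerateDataTask

    class GenerateContacts(BaseGenerateDataTask):
        """Hypothetical generator that fills a mapped contacts table."""

        def generate_data(self, session, engine, base, num_records, current_batch_num):
            # Assumption: `base` is an automap base and "contacts" is the
            # table name declared in the mapping file.
            Contact = base.classes.contacts
            for i in range(num_records):
                session.add(Contact(first_name="Test", last_name="User %d" % i))
            session.commit()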
class cumulusci.tasks.bulkdata.delete.DeleteData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask, cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin

compose_query(obj, where)[source]
task_options = {'hardDelete': {'description': 'If True, perform a hard delete, bypassing the recycle bin. Default: False'}, 'objects': {'description': 'A list of objects to delete records from in order of deletion. If passed via command line, use a comma separated string', 'required': True}, 'where': {'description': "A SOQL where-clause (without the keyword WHERE). Only available when 'objects' is length 1.", 'required': False}}
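The docs do not show the generated SOQL, but given the `objects` and `where` options above, compose_query plausibly assembles the query used to select records for deletion. A standalone sketch of that assumption, not the library source:

    def compose_query(obj, where):
        # Only Id is needed: the bulk delete job consumes record Ids.
        query = "SELECT Id FROM {}".format(obj)
        if where:
            query += " WHERE {}".format(where)
        return query

    # compose_query("Contact", "LastName = 'Smith'")
    # -> "SELECT Id FROM Contact WHERE LastName = 'Smith'"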
class cumulusci.tasks.bulkdata.extract.ExtractData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin, cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

task_options = {'database_url': {'description': 'A DATABASE_URL where the query output should be written'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': True}, 'sql_path': {'description': 'If set, an SQL script will be generated at the path provided. This is useful for keeping data in the repository and allowing diffs.'}}
class cumulusci.tasks.bulkdata.factory_utils.Adder(x=0)[source]

Bases: object

A more flexible alternative to FactoryBoy sequences. You can create and
destroy them wherever you want.
>>> x = Adder(10)
>>> x(1)
11
>>> x(1)
12
>>> x.reset(5)
>>> x(2)
7
>>> x(2)
9
reset(x)[source]
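The doctest above pins down the behavior completely; a minimal equivalent implementation (not necessarily the library source) is:

    class Adder:
        """A resettable running total; calling the instance adds to it."""

        def __init__(self, x=0):
            self.x = x

        def __call__(self, value):
            self.x += value
            return self.x

        def reset(self, x):
            self.x = x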
class cumulusci.tasks.bulkdata.factory_utils.BaseDataFactory(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.base_generate_data_task.BaseGenerateDataTask

Abstract base class for any FactoryBoy-based generator.

generate_data(session, engine, base, num_records, current_batch_num)[source]

Abstract method that subclasses override to generate the data into an open session.

make_factories(classes)[source]

Override in a subclass to generate factory classes based on ORM classes.

make_records(num_records, factories, current_batch_num)[source]

Override in a subclass to make DB records using factories; see the sketch below.
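A sketch of a concrete subclass, assuming factory_boy's SQLAlchemyModelFactory and the Factories collection documented below; every class, key, and field name here is illustrative:

    import factory
    from factory.alchemy import SQLAlchemyModelFactory

    from cumulusci.tasks.bulkdata.factory_utils import BaseDataFactory

    class ContactDataFactory(BaseDataFactory):
        def make_factories(self, classes):
            class ContactFactory(SQLAlchemyModelFactory):
                class Meta:
                    # Assumption: ORM classes are keyed by table name.
                    model = classes["contacts"]

                first_name = "Test"
                last_name = factory.Sequence(lambda n: "User %d" % n)

            # Factories.add_session (below) attaches the DB session to a factory.
            self.factories = {"ContactFactory": ContactFactory}

        def make_records(self, num_records, factories, current_batch_num):
            # `factories` is assumed to be a Factories collection (see below).
            factories.create_batch("ContactFactory", num_records)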

class cumulusci.tasks.bulkdata.factory_utils.Factories(session, orm_classes, collection)[source]

Bases: object

Thin collector for the factories and a place to experiment with techniques for better scalability than the create_batch function from FactoryBoy.

static add_session(fact, session, orm_classes)[source]

Attach the session to the factory

create_batch(classname, batchsize, **kwargs)[source]
class cumulusci.tasks.bulkdata.factory_utils.ModuleDataFactory(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.factory_utils.BaseDataFactory

datafactory_classes_module = None
make_factories(classes)[source]

Override in a subclass to generate factory classes based on ORM classes.

class cumulusci.tasks.bulkdata.generate.GenerateMapping(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

core_fields = ['Id', 'Name', 'FirstName', 'LastName']
task_docs = '\n Generate a mapping file for use with the `extract_dataset` and `load_dataset` tasks.\n This task will examine the schema in the specified org and attempt to infer a\n mapping suitable for extracting data in packaged and custom objects as well as\n customized standard objects.\n\n Mappings must be serializable, and hence must resolve reference cycles - situations\n where Object A refers to B, and B also refers to A. Mapping generation will stop\n and request user input to resolve such cycles by identifying the correct load order.\n Alternately, specify the `ignore` option with the name of one of the\n lookup fields to suppress it and break the cycle. `ignore` can be specified as a list in\n `cumulusci.yml` or as a comma-separated string at the command line.\n\n In most cases, the mapping generated will need minor tweaking by the user. Note\n that the mapping omits features that are not currently well supported by the\n `extract_dataset` and `load_dataset` tasks, such as references to\n the `User` object.\n '
task_options = {'ignore': {'description': 'Object API names, or fields in Object.Field format, to ignore'}, 'namespace_prefix': {'description': 'The namespace prefix to use'}, 'path': {'description': 'Location to write the mapping file', 'required': True}}
class cumulusci.tasks.bulkdata.generate_and_load_data.GenerateAndLoadData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

Orchestrate creating tempfiles, generating data, loading data, cleaning up tempfiles and batching.

task_docs = "\n Orchestrate creating tempfiles, generating data, loading data, cleaning up tempfiles and batching.\n\n CCI has features for generating data and for loading it into orgs. This task pulls them\n together to give some useful additional features, such as storing the intermediate data in\n a tempfile (the default behavior) and generating the data in batches instead of all at\n once (controlled by the `batch_size` option).\n\n The simplest possible usage is to specify the number of records you'd like generated, a\n mapping file that defines the schema and a data generation task written in Python to actually\n generate the data.\n\n Use the `num_records` option to specify how many records to generate.\n Use the `mapping` option to specify a mapping file.\n Use `data_generation_task` to specify what Python class to use to generate the data.\n Use `batch_size` to specify how many records to generate and upload in every batch.\n\n By default it creates the data in a temporary file and then cleans it up later. Specify database_url if you\n need more control than that. Existing data tables will be emptied before being refilled.\n Your database will be completely deleted!\n\n If you use database_url and batch_size together, later batches will overwrite\n earlier batches in the database and the first batch will replace tables if they exist.\n\n A table mapping IDs to SFIds will persist across batches and will grow monotonically.\n\n If your generator class makes heavy use of Faker, you might be interested in this patch,\n which frequently speeds Faker up. Adding that code to the bottom of your generator file may\n help accelerate it.\n\n https://sfdc.co/bwKxDD\n "
task_options = {'batch_size': {'description': 'How many records to create and load at a time.', 'required': False}, 'bulk_mode': {'description': 'Set to Serial to force serial mode on all jobs. Parallel is the default.'}, 'data_generation_options': {'description': 'Options to pass to the data generator.', 'required': False}, 'data_generation_task': {'description': 'Fully qualified class path of a task to generate the data. Look at cumulusci.tasks.bulkdata.tests.dummy_data_factory to learn how to write them.', 'required': True}, 'database_url': {'description': 'The database url to a database containing the test data to load'}, 'debug_dir': {'description': 'Store temporary DB files in debug_dir for easier debugging.'}, 'ignore_row_errors': {'description': 'If True, allow the load to continue even if individual rows fail to load.'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': False}, 'num_records': {'description': 'How many records to generate. Precise calculation depends on the generator.', 'required': True}, 'num_records_tablename': {'description': 'Which table to count records in.', 'required': False}, 'replace_database': {'description': 'Confirmation that it is okay to delete the data in database_url'}, 'reset_oids': {'description': 'If True (the default), and the _sf_ids tables exist, reset them before continuing.', 'required': False}, 'sql_path': {'description': 'If specified, a database will be created from an SQL script at the provided path'}, 'start_step': {'description': 'If specified, skip steps before this one in the mapping', 'required': False}, 'vars': {'description': 'Variables that the generate or load tasks might need.'}}
class cumulusci.tasks.bulkdata.generate_mapping.GenerateMapping(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

core_fields = ['Id', 'Name', 'FirstName', 'LastName']
task_docs = '\n Generate a mapping file for use with the `extract_dataset` and `load_dataset` tasks.\n This task will examine the schema in the specified org and attempt to infer a\n mapping suitable for extracting data in packaged and custom objects as well as\n customized standard objects.\n\n Mappings must be serializable, and hence must resolve reference cycles - situations\n where Object A refers to B, and B also refers to A. Mapping generation will stop\n and request user input to resolve such cycles by identifying the correct load order.\n Alternately, specify the `ignore` option with the name of one of the\n lookup fields to suppress it and break the cycle. `ignore` can be specified as a list in\n `cumulusci.yml` or as a comma-separated string at the command line.\n\n In most cases, the mapping generated will need minor tweaking by the user. Note\n that the mapping omits features that are not currently well supported by the\n `extract_dataset` and `load_dataset` tasks, such as references to\n the `User` object.\n '
task_options = {'ignore': {'description': 'Object API names, or fields in Object.Field format, to ignore'}, 'namespace_prefix': {'description': 'The namespace prefix to use'}, 'path': {'description': 'Location to write the mapping file', 'required': True}}
class cumulusci.tasks.bulkdata.load.LoadData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin, cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

task_options = {'bulk_mode': {'description': 'Set to Serial to force serial mode on all jobs. Parallel is the default.'}, 'database_url': {'description': 'The database url to a database containing the test data to load'}, 'ignore_row_errors': {'description': 'If True, allow the load to continue even if individual rows fail to load.'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': False}, 'reset_oids': {'description': 'If True (the default), and the _sf_ids tables exist, reset them before continuing.', 'required': False}, 'sql_path': {'description': 'If specified, a database will be created from an SQL script at the provided path'}, 'start_step': {'description': 'If specified, skip steps before this one in the mapping', 'required': False}}
class cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin[source]

Bases: object

class cumulusci.tasks.bulkdata.utils.EpochType(*args, **kwargs)[source]

Bases: sqlalchemy.sql.type_api.TypeDecorator

epoch = datetime.datetime(1970, 1, 1, 0, 0)
impl

alias of sqlalchemy.sql.sqltypes.Integer

process_bind_param(value, dialect)[source]

Receive a bound parameter value to be converted.

Subclasses override this method to return the value that should be passed along to the underlying TypeEngine object, and from there to the DBAPI execute() method.

The operation could be anything desired to perform custom behavior, such as transforming or serializing data. This could also be used as a hook for validating logic.

This operation should be designed with the reverse operation in mind, which would be the process_result_value method of this class.

Parameters:
  • value – Data to operate upon, of any type expected by this method in the subclass. Can be None.
  • dialect – the Dialect in use.
process_result_value(value, dialect)[source]

Receive a result-row column value to be converted.

Subclasses should implement this method to operate on data fetched from the database.

Subclasses override this method to return the value that should be passed back to the application, given a value that is already processed by the underlying TypeEngine object, originally from the DBAPI cursor method fetchone() or similar.

The operation could be anything desired to perform custom behavior, such as transforming or serializing data. This could also be used as a hook for validating logic.

Parameters:
  • value – Data to operate upon, of any type expected by this method in the subclass. Can be None.
  • dialect – the Dialect in use.

This operation should be designed to be reversible by the “process_bind_param” method of this class.
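Combined with the epoch and impl attributes above, EpochType evidently stores datetimes as integer offsets from the Unix epoch. A sketch of that contract, assuming millisecond precision (an assumption, not stated in the docs above):

    import datetime
    from sqlalchemy.types import Integer, TypeDecorator

    class EpochSketch(TypeDecorator):
        """Store datetimes as integer milliseconds since the Unix epoch."""

        impl = Integer
        epoch = datetime.datetime(1970, 1, 1, 0, 0)

        def process_bind_param(self, value, dialect):
            # datetime -> integer for the DBAPI; None passes through.
            if value is not None:
                return int((value - self.epoch).total_seconds() * 1000)

        def process_result_value(self, value, dialect):
            # integer from the cursor -> datetime for the application.
            if value is not None:
                return self.epoch + datetime.timedelta(milliseconds=value)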

cumulusci.tasks.bulkdata.utils.create_table(mapping, metadata)[source]

Given a mapping data structure (from mapping.yml) and SQLAlchemy metadata, create a table matching the mapping.

The mapping should be a dict-like object with keys "fields" and "table", and optionally "oid_as_pk" and "record_type".
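A minimal usage sketch; the table name is illustrative, and "fields" is assumed to map Salesforce field names to local column names, following the mapping.yml convention:

    from sqlalchemy import MetaData, create_engine
    from cumulusci.tasks.bulkdata.utils import create_table

    mapping = {
        "table": "contacts",
        "fields": {"FirstName": "first_name", "LastName": "last_name"},
    }
    metadata = MetaData()
    contacts = create_table(mapping, metadata)
    metadata.create_all(create_engine("sqlite://"))  # materialize in memory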

cumulusci.tasks.bulkdata.utils.download_file(uri, bulk_api)[source]

Download the bulk API result file for a single batch

cumulusci.tasks.bulkdata.utils.fields_for_mapping(mapping)[source]

Summarize the list of fields in a table mapping

cumulusci.tasks.bulkdata.utils.generate_batches(num_records, batch_size)[source]

Generate a list of batch sizes for splitting a number of records into batch jobs.

Given a number of records to split up and a batch size, generate a stream of (batch_size, index) pairs.
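A sketch that matches the description; the exact ordering within each pair is an assumption:

    def generate_batches_sketch(num_records, batch_size):
        # Yield (batch_size, index) pairs that cover num_records, with a
        # smaller final batch when batch_size does not divide evenly.
        for index, start in enumerate(range(0, num_records, batch_size)):
            yield min(batch_size, num_records - start), index

    # list(generate_batches_sketch(10, 4)) -> [(4, 0), (4, 1), (2, 2)]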

cumulusci.tasks.bulkdata.utils.get_lookup_key_field(lookup, sf_field)[source]
cumulusci.tasks.bulkdata.utils.process_incoming_rows(f, record_type=None)[source]
cumulusci.tasks.bulkdata.utils.setup_epoch(inspector, table, column_info)[source]

Module contents

class cumulusci.tasks.bulkdata.DeleteData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask, cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin

compose_query(obj, where)[source]
task_options = {'hardDelete': {'description': 'If True, perform a hard delete, bypassing the recycle bin. Default: False'}, 'objects': {'description': 'A list of objects to delete records from in order of deletion. If passed via command line, use a comma separated string', 'required': True}, 'where': {'description': "A SOQL where-clause (without the keyword WHERE). Only available when 'objects' is length 1.", 'required': False}}
class cumulusci.tasks.bulkdata.ExtractData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin, cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

task_options = {'database_url': {'description': 'A DATABASE_URL where the query output should be written'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': True}, 'sql_path': {'description': 'If set, an SQL script will be generated at the path provided. This is useful for keeping data in the repository and allowing diffs.'}}
class cumulusci.tasks.bulkdata.GenerateMapping(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

core_fields = ['Id', 'Name', 'FirstName', 'LastName']
task_docs = '\n Generate a mapping file for use with the `extract_dataset` and `load_dataset` tasks.\n This task will examine the schema in the specified org and attempt to infer a\n mapping suitable for extracting data in packaged and custom objects as well as\n customized standard objects.\n\n Mappings must be serializable, and hence must resolve reference cycles - situations\n where Object A refers to B, and B also refers to A. Mapping generation will stop\n and request user input to resolve such cycles by identifying the correct load order.\n Alternately, specify the `ignore` option with the name of one of the\n lookup fields to suppress it and break the cycle. `ignore` can be specified as a list in\n `cumulusci.yml` or as a comma-separated string at the command line.\n\n In most cases, the mapping generated will need minor tweaking by the user. Note\n that the mapping omits features that are not currently well supported by the\n `extract_dataset` and `load_dataset` tasks, such as references to\n the `User` object.\n '
task_options = {'ignore': {'description': 'Object API names, or fields in Object.Field format, to ignore'}, 'namespace_prefix': {'description': 'The namespace prefix to use'}, 'path': {'description': 'Location to write the mapping file', 'required': True}}
class cumulusci.tasks.bulkdata.LoadData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.bulkdata.utils.BulkJobTaskMixin, cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

task_options = {'bulk_mode': {'description': 'Set to Serial to force serial mode on all jobs. Parallel is the default.'}, 'database_url': {'description': 'The database url to a database containing the test data to load'}, 'ignore_row_errors': {'description': 'If True, allow the load to continue even if individual rows fail to load.'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': False}, 'reset_oids': {'description': 'If True (the default), and the _sf_ids tables exist, reset them before continuing.', 'required': False}, 'sql_path': {'description': 'If specified, a database will be created from an SQL script at the provided path'}, 'start_step': {'description': 'If specified, skip steps before this one in the mapping', 'required': False}}
class cumulusci.tasks.bulkdata.GenerateAndLoadData(project_config, task_config, org_config=None, flow=None, name=None, stepnum=None, **kwargs)[source]

Bases: cumulusci.tasks.salesforce.BaseSalesforceApiTask.BaseSalesforceApiTask

Orchestrate creating tempfiles, generating data, loading data, cleaning up tempfiles and batching.

task_docs = "\n Orchestrate creating tempfiles, generating data, loading data, cleaning up tempfiles and batching.\n\n CCI has features for generating data and for loading it into orgs. This task pulls them\n together to give some useful additional features, such as storing the intermediate data in\n a tempfile (the default behavior) and generating the data in batches instead of all at\n once (controlled by the `batch_size` option).\n\n The simplest possible usage is to specify the number of records you'd like generated, a\n mapping file that defines the schema and a data generation task written in Python to actually\n generate the data.\n\n Use the `num_records` option to specify how many records to generate.\n Use the `mapping` option to specify a mapping file.\n Use `data_generation_task` to specify what Python class to use to generate the data.\n Use `batch_size` to specify how many records to generate and upload in every batch.\n\n By default it creates the data in a temporary file and then cleans it up later. Specify database_url if you\n need more control than that. Existing data tables will be emptied before being refilled.\n Your database will be completely deleted!\n\n If you use database_url and batch_size together, later batches will overwrite\n earlier batches in the database and the first batch will replace tables if they exist.\n\n A table mapping IDs to SFIds will persist across batches and will grow monotonically.\n\n If your generator class makes heavy use of Faker, you might be interested in this patch,\n which frequently speeds Faker up. Adding that code to the bottom of your generator file may\n help accelerate it.\n\n https://sfdc.co/bwKxDD\n "
task_options = {'batch_size': {'description': 'How many records to create and load at a time.', 'required': False}, 'bulk_mode': {'description': 'Set to Serial to force serial mode on all jobs. Parallel is the default.'}, 'data_generation_options': {'description': 'Options to pass to the data generator.', 'required': False}, 'data_generation_task': {'description': 'Fully qualified class path of a task to generate the data. Look at cumulusci.tasks.bulkdata.tests.dummy_data_factory to learn how to write them.', 'required': True}, 'database_url': {'description': 'The database url to a database containing the test data to load'}, 'debug_dir': {'description': 'Store temporary DB files in debug_dir for easier debugging.'}, 'ignore_row_errors': {'description': 'If True, allow the load to continue even if individual rows fail to load.'}, 'mapping': {'description': 'The path to a yaml file containing mappings of the database fields to Salesforce object fields', 'required': False}, 'num_records': {'description': 'How many records to generate. Precise calculation depends on the generator.', 'required': True}, 'num_records_tablename': {'description': 'Which table to count records in.', 'required': False}, 'replace_database': {'description': 'Confirmation that it is okay to delete the data in database_url'}, 'reset_oids': {'description': 'If True (the default), and the _sf_ids tables exist, reset them before continuing.', 'required': False}, 'sql_path': {'description': 'If specified, a database will be created from an SQL script at the provided path'}, 'start_step': {'description': 'If specified, skip steps before this one in the mapping', 'required': False}, 'vars': {'description': 'Variables that the generate or load tasks might need.'}}