python_pachyderm

Client

class python_pachyderm.client.Client(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None)

Bases: python_pachyderm.mixin.admin.AdminMixin, python_pachyderm.mixin.auth.AuthMixin, python_pachyderm.mixin.debug.DebugMixin, python_pachyderm.mixin.enterprise.EnterpriseMixin, python_pachyderm.mixin.health.HealthMixin, python_pachyderm.mixin.pfs.PFSMixin, python_pachyderm.mixin.pps.PPSMixin, python_pachyderm.mixin.transaction.TransactionMixin, python_pachyderm.mixin.version.VersionMixin, object

Attributes
auth_token
transaction_id

Methods

activate_auth(subject[, github_token, ...])

Activates auth, creating an initial set of admins.

activate_enterprise(activation_code[, expires])

Activates enterprise.

authenticate_github(github_token)

Authenticates a GitHub user to the Pachyderm cluster.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authenticate_one_time_password(one_time_password)

Authenticates a user to the Pachyderm cluster using a one-time password.

authorize(repo, scope)

Authorizes the user to a given repo/scope.

batch_transaction(requests)

Executes a batch transaction.

binary([filter])

Gets the pachd binary.

commit(repo_name[, branch, parent, description])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_repo(repo_name[, description, update])

Creates a new Repo object in PFS with the given name. Repos are the top-level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

create_tf_job_pipeline(pipeline_name, tf_job)

Creates a pipeline.

create_tmp_file_set()

Creates a temporary fileset (used internally).

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

deactivate_enterprise()

Deactivates enterprise.

delete_all()

Deletes everything in Pachyderm.

delete_all_pipelines([force])

Deletes all pipelines.

delete_all_repos([force])

Deletes all repos.

delete_all_transactions()

Deletes all transactions.

delete_branch(repo_name, branch_name[, force])

Deletes a branch, but leaves the commits themselves intact.

delete_commit(commit)

Deletes a commit.

delete_file(commit, path)

Deletes a file from a Commit.

delete_job(job_id)

Deletes a job by its ID.

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_repo(repo_name[, force, ...])

Deletes a repo and reclaims the storage space it was using.

delete_secret(secret_name)

Deletes a secret.

delete_transaction(transaction)

Deletes a given transaction.

diff_file(new_commit, new_path[, ...])

Diffs two files.

dump([filter, limit])

Gets a debug dump.

extend_auth_token(token, ttl)

Extends an existing auth token.

extract([url, no_objects, no_repos, ...])

Extracts cluster data for backup.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

extract_pipeline(pipeline_name)

Extracts a pipeline for backup.

finish_commit(commit[, description, ...])

Ends the process of committing data to a Repo and persists the Commit.

finish_transaction(transaction)

Finishes a given transaction.

flush_commit(commits[, repos])

Blocks until all of the commits which have a set of commits as provenance have finished.

flush_job(commits[, pipeline_names])

Blocks until all of the jobs which have a set of commits as provenance have finished.

fsck([fix])

Performs a file system consistency check for PFS.

garbage_collect([memory_bytes])

Runs garbage collection.

get_acl(repo)

Gets the ACL of a repo.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_admins()

Returns a list of strings specifying the cluster admins.

get_auth_configuration()

Gets the auth configuration.

get_auth_token(subject[, ttl])

Gets an auth token for a subject.

get_cluster_role_bindings()

Returns the current set of cluster role bindings.

get_enterprise_state()

Gets the current enterprise state of the cluster.

get_file(commit, path[, offset_bytes, ...])

Returns a PFSFile object, containing the contents of a file stored in PFS.

get_groups([username])

Gets which groups the given username belongs to.

get_job_logs(job_id[, data_filters, datum, ...])

Gets logs for a job.

get_oidc_login()

Returns the OIDC login configuration.

get_one_time_password([subject, ttl])

If this Client is authenticated as an admin, you can generate a one-time password for any given subject.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

get_remote_version()

Gets version of Pachyderm server.

get_scope(username, repos)

Gets the auth scope.

get_users(group)

Gets which users belong to the given group.

glob_file(commit, pattern)

Lists files that match a glob pattern.

health()

Returns a health check indicating if the server can handle RPCs.

inspect_branch(repo_name, branch_name)

Inspects a branch.

inspect_cluster()

Inspects a cluster.

inspect_commit(commit[, block_state])

Inspects a commit.

inspect_datum(job_id, datum_id)

Inspects a datum.

inspect_file(commit, path)

Inspects a file.

inspect_job(job_id[, block_state, ...])

Inspects a job with a given ID.

inspect_pipeline(pipeline_name[, history])

Inspects a pipeline.

inspect_repo(repo_name)

Returns info about a specific repo.

inspect_secret(secret_name)

Inspects a secret.

inspect_transaction(transaction)

Inspects a given transaction.

list_branch(repo_name[, reverse])

Lists the active branch objects on a repo.

list_commit(repo_name[, to_commit, ...])

Lists commits.

list_datum([job_id, page_size, page, input, ...])

Lists datums.

list_file(commit, path[, history, ...])

Lists the files in a directory.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_pipeline([history, allow_incomplete, ...])

Lists pipelines.

list_repo()

Returns info about all repos, as a list of RepoInfo objects.

list_secret()

Lists secrets.

list_transaction()

Lists transactions.

modify_admins([add, remove])

Adds and/or removes admins.

modify_cluster_role_binding(principal[, roles])

Sets the list of admin roles for a principal.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

new_from_config([config_file])

Creates a Pachyderm client from a config file. The file can be passed in as a file-like object; if unset, the PACH_CONFIG env var is checked for a path.

new_from_pachd_address(pachd_address[, ...])

Creates a Pachyderm client from a given pachd address.

new_in_cluster([auth_token, transaction_id])

Creates a Pachyderm client that operates within a Pachyderm cluster.

profile_cpu(duration[, filter])

Gets a CPU profile.

put_file_bytes(commit, path, value[, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_client()

A context manager that gives a PutFileClient.

put_file_url(commit, path, url[, delimiter, ...])

Puts a file using the content found at a URL.

renew_tmp_file_set(fileset_id, ttl_seconds)

Renews a temporary fileset (used internally).

restart_datum(job_id[, data_filters])

Restarts a datum.

restore(requests)

Restores a cluster.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

run_cron(pipeline_name)

Explicitly triggers a pipeline with one or more cron inputs to run now.

run_pipeline(pipeline_name[, provenance, job_id])

Runs a pipeline.

set_acl(repo, entries)

Sets the ACL of a repo.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

set_scope(username, repo, scope)

Sets the auth scope.

start_commit(repo_name[, branch, parent, ...])

Begins the process of committing data to a Repo.

start_pipeline(pipeline_name)

Starts a pipeline.

start_transaction()

Starts a transaction.

stop_job(job_id)

Stops a job by its ID.

stop_pipeline(pipeline_name)

Stops a pipeline.

subscribe_commit(repo_name, branch[, ...])

Yields CommitInfo objects as commits occur.

transaction()

A context manager for running operations within a transaction.

walk_file(commit, path)

Walks over all descendant files in a directory.

who_am_i()

Returns info about the user tied to this Client.
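
As a rough illustration of how several of the methods above fit together, the sketch below creates a repo, writes one file inside the commit() context manager, and reads it back. The helper name write_then_read, the repo name, and the file path are invented for the example. It also assumes that commit() finishes the commit when the block exits and that the PFSFile returned by get_file() can be read like a file object via read().

```python
def write_then_read(client, repo_name="example-repo", path="/greeting.txt"):
    """Write one file in a new commit on `master`, then read it back.

    `client` is expected to be a python_pachyderm Client (or any object
    exposing the same four methods).
    """
    client.create_repo(repo_name)
    # Assumption: commit() starts a commit on entry and finishes it on exit.
    with client.commit(repo_name, "master") as commit:
        client.put_file_bytes(commit, path, b"hello")
    # Assumption: the returned PFSFile supports a file-like read().
    return client.get_file(commit, path).read()


# Hypothetical usage against a local cluster:
#   import python_pachyderm
#   print(write_then_read(python_pachyderm.Client()))
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no cluster is available.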

__init__(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None)

Creates a Pachyderm client.

Parameters
host : str, optional

The pachd host. Default is ‘localhost’, which is used with pachctl port-forward.

port : int, optional

The port to connect to. Default is 30650.

auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

root_certs : bytes, optional

The PEM-encoded root certificates as a byte string.

transaction_id : str, optional

The ID of the transaction to run operations on.

tls : bool, optional

Whether TLS should be used. If root_certs are specified, they are used. Otherwise, we use the certs provided by certifi.
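
The constructor parameters can be assembled as in the minimal sketch below, which builds keyword arguments and, when a certificate file is given, reads it as bytes for root_certs and switches tls on. The helper client_kwargs and its pem_path parameter are hypothetical conveniences, not part of python_pachyderm.

```python
from pathlib import Path


def client_kwargs(host="localhost", port=30650, auth_token=None, pem_path=None):
    """Build keyword arguments for python_pachyderm.Client.

    `pem_path` is a hypothetical convenience parameter: when given, the file
    is read as bytes and passed as `root_certs`, and TLS is switched on.
    """
    kwargs = {"host": host, "port": port, "auth_token": auth_token}
    if pem_path is not None:
        # root_certs is documented as PEM-encoded bytes, not a path or a str.
        kwargs["root_certs"] = Path(pem_path).read_bytes()
        kwargs["tls"] = True
    return kwargs


# Hypothetical usage:
#   import python_pachyderm
#   client = python_pachyderm.Client(**client_kwargs(pem_path="ca.pem"))
```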

property auth_token

classmethod new_from_config(config_file=None)

Creates a Pachyderm client from a config file. The file can be passed in as a file-like object; if unset, the PACH_CONFIG env var is checked for a path. If that’s also unset, it defaults to loading from ‘~/.pachyderm/config.json’.

Parameters
config_file : TextIO, optional

A file-like object containing the JSON config file. If unspecified, we load the config from the default location (‘~/.pachyderm/config.json’).

Returns
Client

A python_pachyderm client instance.
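
The lookup order described for new_from_config can be sketched as a small path resolver. This is an illustration of the documented order only, not the library's own code. The function name resolve_config_path is hypothetical, and treating an empty PACH_CONFIG value as unset is an assumption.

```python
import os


def resolve_config_path(env=None):
    """Return the config path implied by the documented lookup order."""
    env = os.environ if env is None else env
    # PACH_CONFIG wins when set (assumption: an empty value counts as unset);
    # otherwise fall back to the default location in the home directory.
    return env.get("PACH_CONFIG") or os.path.expanduser("~/.pachyderm/config.json")


# Hypothetical usage, passing the opened file explicitly:
#   import python_pachyderm
#   with open(resolve_config_path()) as config_file:
#       client = python_pachyderm.Client.new_from_config(config_file)
```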

classmethod new_from_pachd_address(pachd_address, auth_token=None, root_certs=None, transaction_id=None)

Creates a Pachyderm client from a given pachd address.

Parameters
pachd_address : str

The address of the pachd server.

auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

root_certs : bytes, optional

The PEM-encoded root certificates as a byte string. If unspecified, this will load default certs from certifi.

transaction_id : str, optional

The ID of the transaction to run operations on.

Returns
Client

A python_pachyderm client instance.

classmethod new_in_cluster(auth_token=None, transaction_id=None)

Creates a Pachyderm client that operates within a Pachyderm cluster.

Parameters
auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

transaction_id : str, optional

The ID of the transaction to run operations on.

Returns
Client

A python_pachyderm client instance.

property transaction_id

Spout

class python_pachyderm.spout.SpoutCommit(pipe, marker_filename=None)

Represents a commit on a spout, permitting the addition of files.

Methods

close()

Closes the commit.

put_file_from_bytes(path, bytes)

Adds a file to the spout from a bytestring.

put_file_from_fileobj(path, size, fileobj)

Adds a file to the spout from a file-like object.

put_marker_from_bytes(bytes)

Adds to the marker from a bytestring.

put_marker_from_fileobj(size, fileobj)

Writes to the marker file from a file-like object.

__init__(pipe, marker_filename=None)

close()

Closes the commit.

put_file_from_bytes(path, bytes)

Adds a file to the spout from a bytestring.

Parameters
path : str

The path to the file in the spout.

bytes : bytes

The bytestring representing the file contents.

put_file_from_fileobj(path, size, fileobj)

Adds a file to the spout from a file-like object.

Parameters
path : str

The path to the file in the spout.

size : int

The size of the file.

fileobj : BinaryIO

The file-like object to add.

put_marker_from_bytes(bytes)

Adds to the marker from a bytestring.

Parameters
bytes : bytes

The bytestring representing the file contents.

put_marker_from_fileobj(size, fileobj)

Writes to the marker file from a file-like object.

Parameters
size : int

The size of the file.

fileobj : BinaryIO

The file-like object to add.
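
Because put_file_from_fileobj takes the size up front, a caller has to know the payload length before writing. The sketch below shows one way to do that for a local file. The helper name put_local_file is hypothetical, and it assumes that size is the file's length in bytes.

```python
import os


def put_local_file(commit, pfs_path, local_path):
    """Add a local file to a spout commit via put_file_from_fileobj.

    `commit` is expected to be a SpoutCommit (or any object with the same
    put_file_from_fileobj(path, size, fileobj) method).
    """
    # Assumption: `size` is the number of bytes that will be read from fileobj.
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as fileobj:
        commit.put_file_from_fileobj(pfs_path, size, fileobj)
    return size
```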

class python_pachyderm.spout.SpoutManager(marker_filename=None, pfs_directory='/pfs')

A convenience context manager for creating spouts.

Examples

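>>> import time
>>> from python_pachyderm.spout import SpoutManager
>>> spout = SpoutManager()
>>> # Open one commit per second, each adding a single file.
>>> while True:
...     with spout.commit() as commit:
...         commit.put_file_from_bytes("foo", b"#")
...     time.sleep(1.0)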

Methods

close()

Closes the SpoutManager.

commit()

Opens a commit on the spout.

marker()

Gets the marker file as a context manager.

__init__(marker_filename=None, pfs_directory='/pfs')

Creates a new spout manager.

Parameters
marker_filename : str, optional

The name of the file for storing markers. If unspecified, marker-related operations will fail.

pfs_directory : str, optional

The directory for PFS content. Usually this shouldn’t be explicitly specified, unless the spout manager is being tested outside of a real Pachyderm pipeline.

close()

Closes the SpoutManager.

commit()

Opens a commit on the spout. When the context manager exits, any added files will be committed.

marker()

Gets the marker file as a context manager.
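
To show how a commit and a marker might be written together, here is a minimal sketch that adds several files and some marker bytes inside a single spout commit. The helper name emit_batch and the shape of its arguments are hypothetical. It uses only the commit() context manager and the SpoutCommit methods listed above.

```python
def emit_batch(spout, files, marker):
    """Write `files` (a dict of path -> bytes) and marker bytes in one commit.

    `spout` is expected to be a SpoutManager (or any object whose commit()
    returns a context manager yielding a SpoutCommit-like object).
    """
    with spout.commit() as commit:
        for path, payload in files.items():
            commit.put_file_from_bytes(path, payload)
        # Marker operations need a manager created with marker_filename set.
        commit.put_marker_from_bytes(marker)
    return len(files)


# Hypothetical usage inside a spout pipeline:
#   from python_pachyderm.spout import SpoutManager
#   spout = SpoutManager(marker_filename="marker")
#   emit_batch(spout, {"a.txt": b"1", "b.txt": b"2"}, b"offset=2")
```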

Util Helper

python_pachyderm.util.create_python_pipeline(client, path, input=None, pipeline_name=None, image_pull_secrets=None, debug=None, env=None, secrets=None, image=None, update=False, **pipeline_kwargs)

Utility function for creating (or updating) a pipeline specially built for executing Python code that is stored locally at path.

A normal pipeline creation process requires you to first build and push a container image with the source and dependencies baked in. As an alternative process, this function circumvents container image creation by using build step-enabled pipelines. See the Pachyderm core docs for more info.

If path references a directory, it should have:

  • A main.py, as the pipeline entry-point.

  • An optional requirements.txt that specifies pip requirements.

Parameters
client : Client

The Client instance to use.

path : str

The directory containing the Python pipeline source, or an individual Python file.

input : Input protobuf, optional

An Input object specifying the pipeline input.

pipeline_name : str, optional

A string specifying the pipeline name. Defaults to using the last directory name in path.

image_pull_secrets : List[str], optional

A list of strings specifying the pipeline transform’s image pull secrets, which are used for pulling images from a private registry. Defaults to None, in which case the public Docker registry will be used. See the pipeline spec document for more details.

debug : bool, optional

Specifies whether debug logging should be enabled for the pipeline. Defaults to False.

env : Dict[str, str], optional

A mapping of string keys to string values specifying custom environment variables.

secrets : List[Secret protobufs], optional

A list of Secret objects for secret environment variables.

image : str, optional

A string specifying the Docker image to use for the pipeline. Defaults to using Pachyderm’s official Python language builder.

update : bool, optional

Whether to act as an upsert, updating the pipeline if it already exists.

**pipeline_kwargs : dict

Keyword arguments to forward to create_pipeline.
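
The directory layout expected for path (a main.py entry point plus an optional requirements.txt) can be produced with a small scaffold like the one below. The helper name scaffold_pipeline_dir and the placeholder contents of main.py are invented for illustration.

```python
from pathlib import Path


def scaffold_pipeline_dir(root, requirements=()):
    """Create a directory with main.py and, optionally, requirements.txt."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # main.py is the pipeline entry point; the body here is a placeholder.
    (root / "main.py").write_text('print("hello from the pipeline")\n')
    if requirements:
        # requirements.txt is optional; one pip requirement per line.
        (root / "requirements.txt").write_text("\n".join(requirements) + "\n")
    return root
```

The resulting directory could then be passed as path to create_python_pipeline, together with a Client and an Input object describing the pipeline input.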

python_pachyderm.util.parse_dict_pipeline_spec(d)

Parses a dict of serialized JSON into a CreatePipelineRequest protobuf.

python_pachyderm.util.parse_json_pipeline_spec(j)

Parses a string of JSON into a CreatePipelineRequest protobuf.
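
The two parsers differ only in the form of their input: one takes a dict, the other a JSON string. The sketch below builds the same small pipeline spec in both forms. The helper name spec_variants is hypothetical, and the field names (pipeline.name, transform.cmd, input.pfs) are assumed to follow the Pachyderm pipeline spec rather than being confirmed by this page.

```python
import json


def spec_variants(name, repo, cmd):
    """Return the same pipeline spec as a dict and as a JSON string."""
    # Assumption: these field names match the Pachyderm pipeline spec.
    spec = {
        "pipeline": {"name": name},
        "transform": {"cmd": list(cmd)},
        "input": {"pfs": {"repo": repo, "glob": "/*"}},
    }
    return spec, json.dumps(spec)


# Hypothetical usage:
#   from python_pachyderm.util import (
#       parse_dict_pipeline_spec, parse_json_pipeline_spec)
#   d, j = spec_variants("edges", "images", ["python3", "/edges.py"])
#   req = parse_dict_pipeline_spec(d)   # or parse_json_pipeline_spec(j)
#   client.create_pipeline_from_request(req)
```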

python_pachyderm.util.put_files(client, source_path, commit, dest_path, **kwargs)

Utility function for inserting files from the local source_path to Pachyderm. Roughly equivalent to pachctl put file [-r].

Parameters
client : Client

The Client instance to use.

source_path : str

The file/directory to recursively insert content from.

commit : Union[tuple, str, Commit protobuf]

The Commit object to use for inserting files.

dest_path : str

The destination path in PFS.

**kwargs : dict

Keyword arguments to forward. See PutFileClient.put_file_from_fileobj() for details.
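
To make the idea of a recursive insert concrete, the sketch below walks a local path and pairs each file with the PFS path it would land at under dest_path. This is an illustration only, not how put_files is implemented. The function name plan_uploads and the exact path mapping are assumptions.

```python
import os
import posixpath


def plan_uploads(source_path, dest_path):
    """List (local_file, pfs_path) pairs for a recursive upload."""
    if os.path.isfile(source_path):
        # A single file maps straight onto the destination path.
        return [(source_path, dest_path)]
    pairs = []
    for dirpath, dirnames, filenames in os.walk(source_path):
        dirnames.sort()  # make the traversal order deterministic
        rel_dir = os.path.relpath(dirpath, source_path)
        for filename in sorted(filenames):
            rel = filename if rel_dir == "." else os.path.join(rel_dir, filename)
            # Build the PFS path with forward slashes regardless of local OS.
            pfs_path = posixpath.join(dest_path, *rel.split(os.sep))
            pairs.append((os.path.join(dirpath, filename), pfs_path))
    return pairs
```

Listing the pairs first makes it easy to check what would be uploaded before calling put_files with a real client and commit.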