Information

Exposes a mixin for each pachyderm service. These mixins should not be used directly; instead, use python_pachyderm.Client(). The mixins exist exclusively to provide better code organization: we have several mixins rather than one giant Client class.
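
For example, a minimal sketch, assuming a pachd instance reachable on the default localhost:30650:

>>> import python_pachyderm
>>> # Client() aggregates every mixin, so methods from PFSMixin,
>>> # PPSMixin, etc. are all available on one object.
>>> client = python_pachyderm.Client()
>>> client.create_repo("test")
>>> print(client.get_remote_version())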

python_pachyderm.mixin.admin

class python_pachyderm.mixin.admin.AdminMixin[source]

Methods

extract([url, no_objects, no_repos, ...])

Extracts cluster data for backup.

extract_pipeline(pipeline_name)

Extracts a pipeline for backup.

inspect_cluster()

Inspects a cluster.

restore(requests)

Restores a cluster.

extract(url=None, no_objects=None, no_repos=None, no_pipelines=None, no_auth=None, no_enterprise=None)[source]

Extracts cluster data for backup. Yields Op objects.

Parameters
url : str, optional

A string specifying an object storage URL. If set, data will be extracted to this URL rather than returned.

no_objects : bool, optional

If true, will cause extract to omit objects (and tags).

no_repos : bool, optional

If true, will cause extract to omit repos, commits and branches.

no_pipelines : bool, optional

If true, will cause extract to omit pipelines.

no_auth : bool, optional

If true, will cause extract to omit ACLs, tokens, etc.

no_enterprise : bool, optional

If true, will cause extract to omit any enterprise activation key (which may break auth restore).
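
For example, a minimal sketch of extracting everything except object data to an object storage URL (the bucket path is hypothetical):

>>> # Since url is set, data goes to object storage rather than being
>>> # returned, so draining the iterator runs the extraction to completion.
>>> list(client.extract(url="s3://my-bucket/pach-backup", no_objects=True))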

extract_pipeline(pipeline_name)[source]

Extracts a pipeline for backup. Returns an Op object.

Parameters
pipeline_name : str

The pipeline name to extract.

inspect_cluster()[source]

Inspects a cluster. Returns a ClusterInfo object.

restore(requests)[source]

Restores a cluster.

Parameters
requests : Iterator[RestoreRequest protobuf]

A generator of RestoreRequest objects.

python_pachyderm.mixin.auth

class python_pachyderm.mixin.auth.AuthMixin[source]

Methods

activate_auth(subject[, github_token, ...])

Activates auth, creating an initial set of admins.

authenticate_github(github_token)

Authenticates a GitHub user to the Pachyderm cluster.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authenticate_one_time_password(one_time_password)

Authenticates a user to the Pachyderm cluster using a one-time password.

authorize(repo, scope)

Authorizes the user to a given repo/scope.

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extend_auth_token(token, ttl)

Extends an existing auth token.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

get_acl(repo)

Gets the ACL of a repo.

get_admins()

Returns a list of strings specifying the cluster admins.

get_auth_configuration()

Gets the auth configuration.

get_auth_token(subject[, ttl])

Gets an auth token for a subject.

get_cluster_role_bindings()

Returns the current set of cluster role bindings.

get_groups([username])

Gets which groups the given username belongs to.

get_oidc_login()

Returns the OIDC login configuration.

get_one_time_password([subject, ttl])

If this Client is authenticated as an admin, you can generate a one-time password for any given subject.

get_scope(username, repos)

Gets the auth scope.

get_users(group)

Gets which users belong to the given group.

modify_admins([add, remove])

Adds and/or removes admins.

modify_cluster_role_binding(principal[, roles])

Sets the list of admin roles for a principal.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

set_acl(repo, entries)

Sets the ACL of a repo.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

set_scope(username, repo, scope)

Sets the auth scope.

who_am_i()

Returns info about the user tied to this Client.

activate_auth(subject, github_token=None, root_token=None)[source]

Activates auth, creating an initial set of admins. Returns a string that can be used for making authenticated requests.

Parameters
subject : str

If set to a github user (i.e. it has a ‘github:’ prefix or no prefix) then Pachyderm will confirm that it matches the user associated with github_token. If set to a robot user (i.e. it has a ‘robot:’ prefix), then Pachyderm will generate a new token for the robot user; this token will be the only way to administer this cluster until more admins are added.

github_token : str, optional

The token returned by GitHub and used to authenticate the caller. When Pachyderm is deployed locally, setting this value to a given string will automatically authenticate the caller as a GitHub user whose username is that string (unless this “looks like” a GitHub access code, in which case Pachyderm does retrieve the corresponding GitHub username).

root_token : str, optional

Unused.
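
For example, a sketch of bootstrapping auth with a robot admin (the subject name is hypothetical):

>>> # The returned token is the only way to administer the cluster until
>>> # more admins are added, so store it somewhere safe.
>>> token = client.activate_auth("robot:admin")
>>> client = python_pachyderm.Client(auth_token=token)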

authenticate_github(github_token)[source]

Authenticates a GitHub user to the Pachyderm cluster. Returns a string that can be used for making authenticated requests.

Parameters
github_token : str

The token returned by GitHub and used to authenticate the caller. When Pachyderm is deployed locally, setting this value to a given string will automatically authenticate the caller as a GitHub user whose username is that string (unless this “looks like” a GitHub access code, in which case Pachyderm does retrieve the corresponding GitHub username).

authenticate_id_token(id_token)[source]

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider. The token must include the Pachyderm client_id in the set of audiences to be valid. Returns a string that can be used for making authenticated requests.

Parameters
id_token : str

The ID token.

authenticate_oidc(oidc_state)[source]

Authenticates a user to the Pachyderm cluster via OIDC. Returns a string that can be used for making authenticated requests.

Parameters
oidc_state : str

The OIDC state token.

authenticate_one_time_password(one_time_password)[source]

Authenticates a user to the Pachyderm cluster using a one-time password. Returns a string that can be used for making authenticated requests.

Parameters
one_time_password : str

A short-lived, one-time-use password generated by Pachyderm, for the purpose of propagating authentication to new clients (e.g. from the dash to pachd).

authorize(repo, scope)[source]

Authorizes the user to a given repo/scope. Returns a bool specifying whether the caller has at least scope-level access to repo.

Parameters
repo : str

The repo name that the caller wants access to.

scope : int

The access level that the caller needs to perform an action. See the Scope enum for variants.
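
For example, a sketch checking for reader-level access; the import of the Scope enum from the package root is an assumption:

>>> from python_pachyderm import Scope
>>> # True if the caller has at least READER access to the "images" repo.
>>> client.authorize("images", Scope.READER.value)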

deactivate_auth()[source]

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extend_auth_token(token, ttl)[source]

Extends an existing auth token.

Parameters
token : str

Indicates the Pachyderm token whose TTL is being extended.

ttl : int

Indicates the approximate remaining lifetime of this token, in seconds.

extract_auth_tokens()[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

get_acl(repo)[source]

Gets the ACL of a repo. Returns a GetACLResponse object.

Parameters
repo : str

The repo to get an ACL for.

get_admins()[source]

Returns a list of strings specifying the cluster admins.

get_auth_configuration()[source]

Gets the auth configuration. Returns an AuthConfig object.

get_auth_token(subject, ttl=None)[source]

Gets an auth token for a subject. Returns a GetAuthTokenResponse object.

Parameters
subject : str

The returned token will allow the caller to access resources as this subject.

ttl : int, optional

Indicates the approximate remaining lifetime of this token, in seconds.

get_cluster_role_bindings()[source]

Returns the current set of cluster role bindings.

get_groups(username=None)[source]

Gets which groups the given username belongs to. Returns a list of strings.

Parameters
username : str, optional

The username.

get_oidc_login()[source]

Returns the OIDC login configuration.

get_one_time_password(subject=None, ttl=None)[source]

If this Client is authenticated as an admin, you can generate a one-time password for any given subject. If the caller is not an admin or the subject is not set, a one-time password will be returned for the logged-in subject. Returns a string.

Parameters
subject : str, optional

The subject.

ttl : int, optional

Indicates the approximate remaining lifetime of this token, in seconds.

get_scope(username, repos)[source]

Gets the auth scope. Returns a list of Scope objects.

Parameters
username : str

A string specifying the principal (some of which belong to robots rather than users, but the name is preserved for now to provide compatibility with the pachyderm dash) whose access level is queried. To query the access level of a robot user, the caller must prefix username with “robot:”. If username has no prefix (i.e. no “:”), then it’s assumed to be a github user’s principal.

repos : List[str]

A list of strings specifying the objects to which username’s access level is being queried.

get_users(group)[source]

Gets which users belong to the given group. Returns a list of strings.

Parameters
group : str

The group to list users for.

modify_admins(add=None, remove=None)[source]

Adds and/or removes admins.

Parameters
add : List[str], optional

A list of strings specifying admins to add.

remove : List[str], optional

A list of strings specifying admins to remove.

modify_cluster_role_binding(principal, roles=None)[source]

Sets the list of admin roles for a principal.

Parameters
principal : str

A string specifying the principal.

roles : ClusterRoles protobuf, optional

A ClusterRoles object specifying cluster-wide permissions the principal has. If unspecified, all roles are revoked for the principal.

modify_members(group, add=None, remove=None)[source]

Adds and/or removes members of a group.

Parameters
group : str

The group to modify.

add : List[str], optional

A list of strings specifying members to add.

remove : List[str], optional

A list of strings specifying members to remove.

restore_auth_token(token=None)[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

revoke_auth_token(token)[source]

Revokes an auth token.

Parameters
token : str

Indicates the Pachyderm token that is being revoked.

set_acl(repo, entries)[source]

Sets the ACL of a repo.

Parameters
repo : str

The repo to set an ACL on.

entries : List[ACLEntry protobuf]

A list of ACLEntry objects.

set_auth_configuration(configuration)[source]

Sets the auth configuration.

Parameters
configuration : AuthConfig protobuf

The auth configuration.

set_groups_for_user(username, groups)[source]

Sets the group membership for a user.

Parameters
username : str

The username.

groups : List[str]

The groups to add username to.

set_scope(username, repo, scope)[source]

Sets the auth scope.

Parameters
username : str

A string specifying the principal (some of which belong to robots rather than users, but the name is preserved for now to provide compatibility with the pachyderm dash) whose access level is being set. To set the access level of a robot user, the caller must prefix username with “robot:”. If username has no prefix (i.e. no “:”), then it’s assumed to be a github user’s principal.

repo : str

A string specifying the object to which username’s access level is being granted/revoked.

scope : int

The access level that username will now have. See the Scope enum for variants.

who_am_i()[source]

Returns info about the user tied to this Client.

python_pachyderm.mixin.debug

class python_pachyderm.mixin.debug.DebugMixin[source]

Methods

binary([filter])

Gets the pachd binary.

dump([filter, limit])

Gets a debug dump.

profile_cpu(duration[, filter])

Gets a CPU profile.

binary(filter=None)[source]

Gets the pachd binary. Yields byte arrays.

Parameters
filter : Filter protobuf, optional

An optional Filter object.

dump(filter=None, limit=None)[source]

Gets a debug dump. Yields byte arrays.

Parameters
filter : Filter protobuf, optional

An optional Filter object.

limit : int, optional

Limits the number of commits/jobs returned for each repo/pipeline in the dump.

profile_cpu(duration, filter=None)[source]

Gets a CPU profile. Yields byte arrays.

Parameters
duration : Duration protobuf

A Duration object specifying how long to run the CPU profiler.

filter : Filter protobuf, optional

An optional Filter object.

python_pachyderm.mixin.enterprise

class python_pachyderm.mixin.enterprise.EnterpriseMixin[source]

Methods

activate_enterprise(activation_code[, expires])

Activates enterprise.

deactivate_enterprise()

Deactivates enterprise.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()

Gets the current enterprise state of the cluster.

activate_enterprise(activation_code, expires=None)[source]

Activates enterprise. Returns a TokenInfo object.

Parameters
activation_code : str

Specifies a Pachyderm enterprise activation code. New users can obtain trial activation codes.

expires : Timestamp protobuf, optional

An optional Timestamp object indicating when this activation code will expire. This should not generally be set (it’s primarily used for testing), and is only applied if it’s earlier than the signed expiration time in activation_code.

deactivate_enterprise()[source]

Deactivates enterprise.

get_activation_code()[source]

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()[source]

Gets the current enterprise state of the cluster. Returns a GetEnterpriseResponse object.

python_pachyderm.mixin.health

class python_pachyderm.mixin.health.HealthMixin[source]

Methods

health()

Returns a health check indicating if the server can handle RPCs.

health()[source]

Returns a health check indicating if the server can handle RPCs.

python_pachyderm.mixin.pfs

class python_pachyderm.mixin.pfs.AtomicOp(commit, path, **kwargs)[source]

Represents an operation in a PutFile call.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.AtomicPutFileobjOp(commit, path, value, **kwargs)[source]

A PutFile operation to put a file from a file-like object.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.AtomicPutFilepathOp(commit, pfs_path, local_path, **kwargs)[source]

A PutFile operation to put a file locally stored at a given path. This file is opened on-demand, which helps with minimizing the number of open files.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.PFSFile(res)[source]

The contents of a file stored in PFS.

Examples

You can treat these as either file-like objects, like so:

>>> import shutil
>>> source_file = client.get_file("montage/master", "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
...     shutil.copyfileobj(source_file, dest_file)

Or as an iterator of bytes, like so:

>>> source_file = client.get_file("montage/master", "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
...     for chunk in source_file:
...         dest_file.write(chunk)

Methods

close()

Closes the PFSFile.

read([size])

Reads from the PFSFile buffer.

close()[source]

Closes the PFSFile.

read(size=-1)[source]

Reads from the PFSFile buffer.

Parameters
size : int, optional

The number of bytes to read from the buffer.

class python_pachyderm.mixin.pfs.PFSMixin[source]

Methods

commit(repo_name[, branch, parent, description])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_repo(repo_name[, description, update])

Creates a new Repo object in PFS with the given name. Repos are the top level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

create_tmp_file_set()

Creates a temporary fileset (used internally).

delete_all_repos([force])

Deletes all repos.

delete_branch(repo_name, branch_name[, force])

Deletes a branch, but leaves the commits themselves intact.

delete_commit(commit)

Deletes a commit.

delete_file(commit, path)

Deletes a file from a Commit.

delete_repo(repo_name[, force, ...])

Deletes a repo and reclaims the storage space it was using.

diff_file(new_commit, new_path[, ...])

Diffs two files.

finish_commit(commit[, description, ...])

Ends the process of committing data to a Repo and persists the Commit.

flush_commit(commits[, repos])

Blocks until all of the commits which have a set of commits as provenance have finished.

fsck([fix])

Performs a file system consistency check for PFS.

get_file(commit, path[, offset_bytes, ...])

Returns a PFSFile object, containing the contents of a file stored in PFS.

glob_file(commit, pattern)

Lists files that match a glob pattern.

inspect_branch(repo_name, branch_name)

Inspects a branch.

inspect_commit(commit[, block_state])

Inspects a commit.

inspect_file(commit, path)

Inspects a file.

inspect_repo(repo_name)

Returns info about a specific repo.

list_branch(repo_name[, reverse])

Lists the active branch objects on a repo.

list_commit(repo_name[, to_commit, ...])

Lists commits.

list_file(commit, path[, history, ...])

Lists the files in a directory.

list_repo()

Returns info about all repos, as a list of RepoInfo objects.

put_file_bytes(commit, path, value[, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_client()

A context manager that gives a PutFileClient.

put_file_url(commit, path, url[, delimiter, ...])

Puts a file using the content found at a URL.

renew_tmp_file_set(fileset_id, ttl_seconds)

Renews a temporary fileset (used internally).

start_commit(repo_name[, branch, parent, ...])

Begins the process of committing data to a Repo.

subscribe_commit(repo_name, branch[, ...])

Yields CommitInfo objects as commits occur.

walk_file(commit, path)

Walks over all descendant files in a directory.

commit(repo_name, branch=None, parent=None, description=None)[source]

A context manager for running operations within a commit.

Parameters
repo_name : str

The name of the repo.

branch : str, optional

The branch name. This is a more convenient way to build linear chains of commits. When a commit is started with a non-empty branch the value of branch becomes an alias for the created Commit. This enables a more intuitive access pattern. When the commit is started on a branch the previous head of the branch is used as the parent of the commit.

parent : Union[tuple, str, Commit protobuf], optional

An optional Commit object specifying the parent commit. Upon creation the new commit will appear identical to the parent commit; data can safely be added to the new commit without affecting the contents of the parent commit.

description : str, optional

Description of the commit.
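
For example, a sketch of writing a file within a single commit (the repo name is hypothetical):

>>> # The commit is started on entry and finished on exit.
>>> with client.commit("images", "master") as commit:
...     client.put_file_bytes(commit, "/greeting.txt", b"hello world")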

copy_file(source_commit, source_path, dest_commit, dest_path, overwrite=None)[source]

Efficiently copies files already in PFS. Note that the destination repo cannot be an output repo, or the copy operation will (as of 1.9.0) silently fail.

Parameters
source_commit : Union[tuple, str, Commit protobuf]

Represents the commit with the source file.

source_path : str

The path of the source file.

dest_commit : Union[tuple, str, Commit protobuf]

Represents the commit for the destination file.

dest_path : str

The path of the destination file.

overwrite : bool, optional

Whether to overwrite the destination file if it already exists.

create_branch(repo_name, branch_name, commit=None, provenance=None, trigger=None)[source]

Creates a new branch.

Parameters
repo_name : str

The name of the repo.

branch_name : str

The new branch name.

commit : Union[tuple, str, Commit protobuf], optional

Represents the head commit of the new branch.

provenance : List[Branch protobuf], optional

An optional iterable of Branch objects representing the branch provenance.

trigger : Trigger protobuf, optional

An optional Trigger object controlling when the head of branch_name is moved.

create_repo(repo_name, description=None, update=None)[source]

Creates a new Repo object in PFS with the given name. Repos are the top level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

Parameters
repo_name : str

Name of the repo.

description : str, optional

Description of the repo.

update : bool, optional

Whether to update if the repo already exists.

create_tmp_file_set()[source]

Creates a temporary fileset (used internally). Currently, temp-fileset-related APIs are only used for Pachyderm internals (job merging), so we’re avoiding support for these functions until we find a use for them (feel free to file an issue in github.com/pachyderm/pachyderm)

delete_all_repos(force=None)[source]

Deletes all repos.

Parameters
force : bool, optional

If set to true, repos will be removed regardless of errors. This argument should be used with care.

delete_branch(repo_name, branch_name, force=None)[source]

Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.

Parameters
repo_name : str

The repo name.

branch_name : str

The name of the branch to delete.

force : bool, optional

Whether to force the branch deletion.

delete_commit(commit)[source]

Deletes a commit.

Parameters
commit : Union[tuple, str, Commit protobuf]

The commit to delete.

delete_file(commit, path)[source]

Deletes a file from a Commit. DeleteFile leaves a tombstone in the Commit: assuming the file isn’t written to later, attempting to get the file from the finished commit will result in a not-found error. The file will of course remain intact in the Commit’s parent.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

delete_repo(repo_name, force=None, split_transaction=None)[source]

Deletes a repo and reclaims the storage space it was using.

Parameters
repo_name : str

The name of the repo.

force : bool, optional

If set to true, the repo will be removed regardless of errors. This argument should be used with care.

split_transaction : bool, optional

Controls whether Pachyderm attempts to delete the entire repo in a single database transaction. Setting this to True can work around certain Pachyderm errors, but, if set, the delete_repo() call may need to be retried.

diff_file(new_commit, new_path, old_commit=None, old_path=None, shallow=None)[source]

Diffs two files. If old_commit or old_path are not specified, the same path in the parent of the file specified by new_commit and new_path will be used.

Parameters
new_commit : Union[tuple, str, Commit protobuf]

Represents the commit for the new file.

new_path : str

The path of the new file.

old_commit : Union[tuple, str, Commit protobuf], optional

Represents the commit for the old file.

old_path : str, optional

The path of the old file.

shallow : bool, optional

Whether to do a shallow diff.

finish_commit(commit, description=None, input_tree_object_hash=None, tree_object_hashes=None, datum_object_hash=None, size_bytes=None, empty=None)[source]

Ends the process of committing data to a Repo and persists the Commit. Once a Commit is finished the data becomes immutable and future attempts to write to it with PutFile will error.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

description : str, optional

Description of this commit.

input_tree_object_hash : str, optional

Specifies an input tree object hash.

tree_object_hashes : List[str], optional

A list of zero or more strings specifying object hashes for the output trees.

datum_object_hash : str, optional

Specifies an object hash.

size_bytes : int, optional

An optional int.

empty : bool, optional

If set, the commit will be closed (its finished field will be set to the current time) but its tree will be left None.

flush_commit(commits, repos=None)[source]

Blocks until all of the commits which have a set of commits as provenance have finished. For commits to be considered, they must have all of the specified commits as provenance. This in effect waits for all of the jobs that are triggered by a set of commits to complete. It returns an error if any of the commits it’s waiting on are cancelled due to one of the jobs encountering an error during runtime. Note that it’s never necessary to call FlushCommit to run jobs; they’ll run no matter what. FlushCommit just allows you to wait for them to complete and see their output once they do.

Yields CommitInfo objects.

Parameters
commits : List[Union[tuple, str, Commit protobuf]]

The commits to flush.

repos : List[str], optional

An optional list of strings specifying repo names. If specified, only commits within these repos will be flushed.
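
For example, a sketch of waiting for all downstream processing of the head commit of a branch (the repo name is hypothetical):

>>> # Blocks until every job triggered by the commit has finished.
>>> for commit_info in client.flush_commit([("images", "master")]):
...     print(commit_info.commit.id)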

fsck(fix=None)[source]

Performs a file system consistency check for PFS.

get_file(commit, path, offset_bytes=None, size_bytes=None)[source]

Returns a PFSFile object, containing the contents of a file stored in PFS.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path of the file.

offset_bytes : int, optional

Specifies the number of bytes that should be skipped in the beginning of the file.

size_bytes : int, optional

Limits the total amount of data returned; note that you will get fewer bytes than size_bytes if you pass a value larger than the size of the file. If 0, then all of the data will be returned.

glob_file(commit, pattern)[source]

Lists files that match a glob pattern. Yields FileInfo objects.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

pattern : str

The glob pattern.
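
For example, a sketch of listing all PNGs at the head of a branch (the repo name is hypothetical):

>>> for file_info in client.glob_file(("images", "master"), "/*.png"):
...     print(file_info.file.path, file_info.size_bytes)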

inspect_branch(repo_name, branch_name)[source]

Inspects a branch. Returns a BranchInfo object.

Parameters
repo_name : str

The repo name.

branch_name : str

The branch name.

inspect_commit(commit, block_state=None)[source]

Inspects a commit. Returns a CommitInfo object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

block_state : int, optional

Causes this method to block until the commit is in the desired commit state. See the CommitState enum.

inspect_file(commit, path)[source]

Inspects a file. Returns a FileInfo object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

inspect_repo(repo_name)[source]

Returns info about a specific repo. Returns a RepoInfo object.

Parameters
repo_name : str

Name of the repo.

list_branch(repo_name, reverse=None)[source]

Lists the active branch objects on a repo. Returns a list of BranchInfo objects.

Parameters
repo_name : str

The repo name.

reverse : bool, optional

If true, returns branches oldest to newest.

list_commit(repo_name, to_commit=None, from_commit=None, number=None, reverse=None)[source]

Lists commits. Yields CommitInfo objects.

Parameters
repo_name : str

If only repo_name is given, all commits in the repo are returned.

to_commit : Union[tuple, str, Commit protobuf], optional

Only the ancestors of to, including to itself, are considered.

from_commit : Union[tuple, str, Commit protobuf], optional

Only the descendants of from, including from itself, are considered.

number : int, optional

Determines how many commits are returned. If number is 0, all commits that match the aforementioned criteria are returned.

reverse : bool, optional

If true, returns commits oldest to newest.

list_file(commit, path, history=None, include_contents=None)[source]

Lists the files in a directory.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the directory.

history : int, optional

Indicates how many historical versions you want returned. Semantics are:

  • 0: Return the files as they are in commit.

  • 1: Return the above and the files as they were in the last commit they were modified in.

  • 2: etc.

  • -1: Return all historical versions.

include_contents : bool, optional

If True, file contents are included.

list_repo()[source]

Returns info about all repos, as a list of RepoInfo objects.

put_file_bytes(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file(s) will be written to.

value : Union[bytes, BinaryIO]

The file contents, represented as a file-like object, bytestring, or iterator of bytestrings.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.
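
For example, a sketch of a simple upload (the repo name is hypothetical):

>>> # The commit can be an open Commit object or, as here, a
>>> # (repo, branch) tuple, in which case the write lands on the branch head.
>>> client.put_file_bytes(("images", "master"), "/data.csv", b"a,b\n1,2\n")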

put_file_client()[source]

A context manager that gives a PutFileClient. When the context manager exits, any operations enqueued from the PutFileClient are executed in a single, atomic PutFile call.
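
For example, a sketch enqueueing several operations that land in one atomic PutFile call (repo and file names are hypothetical):

>>> with client.put_file_client() as pfc:
...     pfc.put_file_from_bytes(("images", "master"), "/a.txt", b"a")
...     pfc.put_file_from_filepath(("images", "master"), "/b.txt", "b.txt")
...     pfc.delete_file(("images", "master"), "/stale.txt")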

put_file_url(commit, path, url, delimiter=None, recursive=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Puts a file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

url : str

The URL of the file to put.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

recursive : bool, optional

Allow for recursive scraping of some types of URLs, for example s3:// URLs.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

renew_tmp_file_set(fileset_id, ttl_seconds)[source]

Renews a temporary fileset (used internally). Currently, temp-fileset-related APIs are only used for Pachyderm internals (job merging), so we’re avoiding support for these functions until we find a use for them (feel free to file an issue in github.com/pachyderm/pachyderm)

Parameters
fileset_id : str

The fileset ID.

ttl_seconds : int

The number of seconds to keep alive the temporary fileset.

start_commit(repo_name, branch=None, parent=None, description=None, provenance=None)[source]

Begins the process of committing data to a Repo. Once started you can write to the Commit with PutFile and when all the data has been written you must finish the Commit with FinishCommit. NOTE, data is not persisted until FinishCommit is called. A Commit object is returned.

Parameters
repo_name : str

The name of the repo.

branch : str, optional

The branch name. This is a more convenient way to build linear chains of commits. When a commit is started with a non-empty branch the value of branch becomes an alias for the created Commit. This enables a more intuitive access pattern. When the commit is started on a branch the previous head of the branch is used as the parent of the commit.

parent : Union[tuple, str, Commit protobuf], optional

An optional Commit object specifying the parent commit. Upon creation the new commit will appear identical to the parent commit; data can safely be added to the new commit without affecting the contents of the parent commit.

description : str, optional

Description of the commit.

provenance : List[CommitProvenance protobuf], optional

An optional iterable of CommitProvenance objects specifying the commit provenance.
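
For example, a sketch of the explicit start/write/finish flow (the repo name is hypothetical):

>>> commit = client.start_commit("images", branch="master", description="add raw data")
>>> client.put_file_bytes(commit, "/raw.bin", b"\x00\x01")
>>> # Data is not persisted until the commit is finished.
>>> client.finish_commit(commit)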

subscribe_commit(repo_name, branch, from_commit_id=None, state=None, prov=None)[source]

Yields CommitInfo objects as commits occur.

Parameters
repo_name : str

The name of the repo.

branch : str

The branch to subscribe to.

from_commit_id : str, optional

A commit ID. Only commits created since this commit are returned.

state : int, optional

The commit state to filter on. See the CommitState enum.

prov : CommitProvenance protobuf, optional

An optional CommitProvenance object.
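
For example, a sketch that blocks and reacts to new commits (the repo name is hypothetical):

>>> # Yields a CommitInfo each time a new commit lands on images/master.
>>> for commit_info in client.subscribe_commit("images", "master"):
...     print("new commit:", commit_info.commit.id)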

walk_file(commit, path)[source]

Walks over all descendant files in a directory. Returns a generator of FileInfo objects.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the directory.

class python_pachyderm.mixin.pfs.PutFileClient[source]

PutFileClient puts or deletes PFS files atomically.

Methods

delete_file(commit, path)

Deletes a file.

put_file_from_bytes(commit, path, value[, ...])

Uploads a PFS file from a bytestring.

put_file_from_fileobj(commit, path, value[, ...])

Uploads a PFS file from a file-like object.

put_file_from_filepath(commit, pfs_path, ...)

Uploads a PFS file from a local path at a specified path.

put_file_from_url(commit, path, url[, ...])

Puts a file using the content found at a URL.

delete_file(commit, path)[source]

Deletes a file.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

put_file_from_bytes(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a bytestring.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

value : bytes

The file contents as a bytestring.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_fileobj(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a file-like object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

value : BinaryIO

The file-like object.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_filepath(commit, pfs_path, local_path, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a local path at a specified path. This will lazily open files, which will prevent too many files from being opened, or too much memory being consumed, when atomically putting many files.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

pfs_path : str

The path in the repo the file will be written to.

local_path : str

The local file path.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_url(commit, path, url, delimiter=None, recursive=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Puts a file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

url : str

The URL of the file to put.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

recursive : bool, optional

Allow for recursive scraping of some types of URLs, for example s3:// URLs.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

python_pachyderm.mixin.pfs.put_file_from_fileobj_reqs(fileish, **kwargs)[source]
python_pachyderm.mixin.pfs.put_file_from_iterable_reqs(value, **kwargs)[source]

python_pachyderm.mixin.pps

class python_pachyderm.mixin.pps.PPSMixin[source]

Methods

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

create_tf_job_pipeline(pipeline_name, tf_job)

Creates a pipeline.

delete_all()

Deletes everything in Pachyderm.

delete_all_pipelines([force])

Deletes all pipelines.

delete_job(job_id)

Deletes a job by its ID.

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_secret(secret_name)

Deletes a secret.

flush_job(commits[, pipeline_names])

Blocks until all of the jobs which have a set of commits as provenance have finished.

garbage_collect([memory_bytes])

Runs garbage collection.

get_job_logs(job_id[, data_filters, datum, ...])

Gets logs for a job.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

inspect_datum(job_id, datum_id)

Inspects a datum.

inspect_job(job_id[, block_state, ...])

Inspects a job with a given ID.

inspect_pipeline(pipeline_name[, history])

Inspects a pipeline.

inspect_secret(secret_name)

Inspects a secret.

list_datum([job_id, page_size, page, input, ...])

Lists datums.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_pipeline([history, allow_incomplete, ...])

Lists pipelines.

list_secret()

Lists secrets.

restart_datum(job_id[, data_filters])

Restarts a datum.

run_cron(pipeline_name)

Explicitly triggers a pipeline with one or more cron inputs to run now.

run_pipeline(pipeline_name[, provenance, job_id])

Runs a pipeline.

start_pipeline(pipeline_name)

Starts a pipeline.

stop_job(job_id)

Stops a job by its ID.

stop_pipeline(pipeline_name)

Stops a pipeline.

create_pipeline(pipeline_name, transform, parallelism_spec=None, hashtree_spec=None, egress=None, update=None, output_branch=None, resource_requests=None, resource_limits=None, input=None, description=None, cache_size=None, enable_stats=None, reprocess=None, max_queue_size=None, service=None, chunk_spec=None, datum_timeout=None, job_timeout=None, salt=None, standby=None, datum_tries=None, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None, metadata=None, s3_out=None, sidecar_resource_limits=None, reprocess_spec=None, autoscaling=None)[source]

Creates a pipeline. For more info, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_name : str

The pipeline name.

transform : Transform protobuf

A Transform object.

parallelism_spec : ParallelismSpec protobuf, optional

An optional ParallelismSpec object.

hashtree_spec : HashtreeSpec protobuf, optional

An optional HashtreeSpec object.

egress : Egress protobuf, optional

An optional Egress object.

update : bool, optional

Whether this should behave as an upsert.

output_branch : str, optional

The branch to output results on.

resource_requests : ResourceSpec protobuf, optional

An optional ResourceSpec object.

resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec object.

input : Input protobuf, optional

An optional Input object.

description : str, optional

Description of the pipeline.

cache_size : str, optional

An optional string.

enable_stats : bool, optional

An optional bool.

reprocess : bool, optional

If true, Pachyderm forces the pipeline to reprocess all datums. It only has meaning if update is True.

max_queue_size : int, optional

An optional int.

service : Service protobuf, optional

An optional Service object.

chunk_spec : ChunkSpec protobuf, optional

An optional ChunkSpec object.

datum_timeout : Duration protobuf, optional

An optional Duration object.

job_timeout : Duration protobuf, optional

An optional Duration object.

salt : str, optional

An optional string.

standby : bool, optional

An optional bool.

datum_tries : int, optional

An optional int.

scheduling_spec : SchedulingSpec protobuf, optional

An optional SchedulingSpec object.

pod_patch : str, optional

An optional string.

spout : Spout protobuf, optional

An optional Spout object.

spec_commit : Commit protobuf, optional

An optional Commit object.

metadata : Metadata protobuf, optional

An optional Metadata object.

s3_out : bool, optional

Unused.

sidecar_resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec setting resource limits for the pipeline sidecar.
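
For example, a sketch based on the standard OpenCV “edges” example, assuming the Transform, Input, and PFSInput protobuf messages are re-exported at the package root:

>>> client.create_pipeline(
...     "edges",
...     transform=python_pachyderm.Transform(cmd=["python3", "/edges.py"], image="pachyderm/opencv"),
...     input=python_pachyderm.Input(pfs=python_pachyderm.PFSInput(glob="/*", repo="images")),
... )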

create_pipeline_from_request(req)[source]

Creates a pipeline from a CreatePipelineRequest object. Usually this would be used in conjunction with util.parse_json_pipeline_spec() or util.parse_dict_pipeline_spec(). If you’re in pure python and not working with a pipeline spec file, the sibling method create_pipeline() is more ergonomic.

Parameters
req : CreatePipelineRequest protobuf

A CreatePipelineRequest object.
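
For example, a sketch assuming a pipeline spec file on disk; the exact import path of the helper mentioned above is an assumption:

>>> from python_pachyderm import util
>>> with open("pipeline.json") as f:
...     req = util.parse_json_pipeline_spec(f.read())
>>> client.create_pipeline_from_request(req)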

create_secret(secret_name, data, labels=None, annotations=None)[source]

Creates a new secret.

Parameters
secret_name : str

The name of the secret to create.

data : Dict[str, Union[str, bytes]]

The data to store in the secret. Each key must consist of alphanumeric characters, -, _ or ..

labels : Dict[str, str], optional

Kubernetes labels to attach to the secret.

annotations : Dict[str, str], optional

Kubernetes annotations to attach to the secret.

create_tf_job_pipeline(pipeline_name, tf_job, parallelism_spec=None, hashtree_spec=None, egress=None, update=None, output_branch=None, scale_down_threshold=None, resource_requests=None, resource_limits=None, input=None, description=None, cache_size=None, enable_stats=None, reprocess=None, max_queue_size=None, service=None, chunk_spec=None, datum_timeout=None, job_timeout=None, salt=None, standby=None, datum_tries=None, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None)[source]

Creates a pipeline. For more info, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_name : str

The pipeline name.

tf_job : TFJob protobuf

Pachyderm uses this to create TFJobs when running in a Kubernetes cluster on which kubeflow has been installed.

parallelism_spec : ParallelismSpec protobuf, optional

An optional ParallelismSpec object.

hashtree_spec : HashtreeSpec protobuf, optional

An optional HashtreeSpec object.

egress : Egress protobuf, optional

An optional Egress object.

update : bool, optional

Whether this should behave as an upsert.

output_branch : str, optional

The branch to output results on.

scale_down_threshold : Duration protobuf, optional

An optional Duration object.

resource_requests : ResourceSpec protobuf, optional

An optional ResourceSpec object.

resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec object.

input : Input protobuf, optional

An optional Input object.

description : str, optional

Description of the pipeline.

cache_size : str, optional

An optional string.

enable_stats : bool, optional

An optional bool.

reprocess : bool, optional

If true, Pachyderm forces the pipeline to reprocess all datums. It only has meaning if update is True.

max_queue_size : int, optional

An optional int.

service : Service protobuf, optional

An optional Service object.

chunk_spec : ChunkSpec protobuf, optional

An optional ChunkSpec object.

datum_timeout : Duration protobuf, optional

An optional Duration object.

job_timeout : Duration protobuf, optional

An optional Duration object.

salt : str, optional

An optional string.

standby : bool, optional

An optional bool.

datum_tries : int, optional

An optional int.

scheduling_spec : SchedulingSpec protobuf, optional

An optional SchedulingSpec object.

pod_patch : str, optional

An optional string.

spout : Spout protobuf, optional

An optional Spout object.

spec_commit : Commit protobuf, optional

An optional Commit object.

delete_all()[source]

Deletes everything in Pachyderm.

delete_all_pipelines(force=None)[source]

Deletes all pipelines.

Parameters
force : bool, optional

Whether to force delete.

delete_job(job_id)[source]

Deletes a job by its ID.

Parameters
job_id : str

The ID of the job to delete.

delete_pipeline(pipeline_name, force=None, keep_repo=None, split_transaction=None)[source]

Deletes a pipeline.

Parameters
pipeline_name : str

The pipeline name.

force : bool, optional

Whether to force delete.

keep_repo : bool, optional

Whether to keep the output repo.

split_transaction : bool, optional

Whether Pachyderm attempts to delete the pipeline in a single database transaction. Setting this to True can work around certain Pachyderm errors, but, if set, the delete_pipeline() call may need to be retried.

delete_secret(secret_name)[source]

Deletes a secret.

Parameters
secret_name : str

The name of the secret to delete.

flush_job(commits, pipeline_names=None)[source]

Blocks until all of the jobs which have a set of commits as provenance have finished. Yields JobInfo objects.

Parameters
commits : List[Union[tuple, str, Commit protobuf]]

A list representing the commits to flush.

pipeline_names : List[str], optional

A list of strings specifying pipeline names. If specified, only jobs within these pipelines will be flushed.

garbage_collect(memory_bytes=None)[source]

Runs garbage collection.

Parameters
memory_bytes : int, optional

How much memory to use in computing which objects are alive. A larger number will result in more precise garbage collection (at the cost of more memory usage).

get_job_logs(job_id, data_filters=None, datum=None, follow=None, tail=None, use_loki_backend=None, since=None)[source]

Gets logs for a job. Yields LogMessage objects.

Parameters
job_id : str

The ID of the job.

data_filters : List[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

datum : Datum protobuf, optional

Filters log lines for the specified datum.

follow : bool, optional

If true, continue to follow new logs as they appear.

tail : int, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backend : bool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

since : Duration protobuf, optional

Specifies how far in the past to return logs from.

get_pipeline_logs(pipeline_name, data_filters=None, master=None, datum=None, follow=None, tail=None, use_loki_backend=None, since=None)[source]

Gets logs for a pipeline. Yields LogMessage objects.

Parameters
pipeline_name : str

The name of the pipeline.

data_filters : List[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

master : bool, optional

If true, includes logs from the master.

datum : Datum protobuf, optional

Filters log lines for the specified datum.

follow : bool, optional

If true, continue to follow new logs as they appear.

tail : int, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backend : bool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

since : Duration protobuf, optional

Specifies how far in the past to return logs from.
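
For example, a sketch that follows the last ten lines per container from a pipeline’s workers (the pipeline name is hypothetical):

>>> for msg in client.get_pipeline_logs("edges", tail=10, follow=True):
...     print(msg.message)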

inspect_datum(job_id, datum_id)[source]

Inspects a datum. Returns a DatumInfo object.

Parameters
job_id : str

The ID of the job.

datum_id : str

The ID of the datum.

inspect_job(job_id, block_state=None, output_commit=None, full=None)[source]

Inspects a job with a given ID. Returns a JobInfo.

Parameters
job_id : str

The ID of the job to inspect.

block_state : bool, optional

If true, block until the job completes.

output_commit : Union[tuple, str, Commit protobuf], optional

Represents an output commit to filter on.

full : bool, optional

If true, include worker status.

inspect_pipeline(pipeline_name, history=None)[source]

Inspects a pipeline. Returns a PipelineInfo object.

Parameters
pipeline_name : str

The pipeline name.

history : int, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current version of pipelines.

  • 1: Return the above and pipelines from the next most recent version.

  • 2: etc.

  • -1: Return pipelines from all historical versions.

inspect_secret(secret_name)[source]

Inspects a secret.

Parameters
secret_name : str

The name of the secret to inspect.

list_datum(job_id=None, page_size=None, page=None, input=None, status_only=None)[source]

Lists datums. Yields ListDatumStreamResponse objects.

Parameters
job_id : str, optional

The ID of a job. Exactly one of job_id (real) or input (hypothetical) must be set.

page_size : int, optional

The size of the page.

page : int, optional

The page number.

input : Input protobuf, optional

If set in lieu of job_id, list_datum() returns the datums that would be given to a hypothetical job that used input as its input spec. Exactly one of job_id (real) or input (hypothetical) must be set.

list_job(pipeline_name=None, input_commit=None, output_commit=None, history=None, full=None, jqFilter=None)[source]

Lists jobs. Yields JobInfo objects.

Parameters
pipeline_name : str, optional

A pipeline name to filter on.

input_commit : List[Union[tuple, str, Commit protobuf]], optional

An optional list representing input commits to filter on.

output_commit : Union[tuple, str, Commit protobuf], optional

Represents an output commit to filter on.

history : int, optional

Indicates to return jobs from historical versions of pipelines. Semantics are:

  • 0: Return jobs from the current version of the pipeline or pipelines.

  • 1: Return the above and jobs from the next most recent version.

  • 2: etc.

  • -1: Return jobs from all historical versions.

full : bool, optional

Whether the result should include all pipeline details in each JobInfo, or limited information including name and status, but excluding information in the pipeline spec. Leaving this None (or False) can make the call significantly faster in clusters with a large number of pipelines and jobs. Note that if input_commit is set, this field is coerced to True.

jqFilter : str, optional

A jq filter that can restrict the list of jobs returned.

list_pipeline(history=None, allow_incomplete=None, jqFilter=None)[source]

Lists pipelines. Returns a PipelineInfos object.

Parameters
history : int, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current version of pipelines.

  • 1: Return the above and pipelines from the next most recent version.

  • 2: etc.

  • -1: Return pipelines from all historical versions.

allow_incomplete : bool, optional

If True, causes list_pipeline() to return PipelineInfos with incomplete data where the pipeline spec cannot be retrieved. Incomplete PipelineInfos will have a None Transform field, but will have the fields present in EtcdPipelineInfo.

jqFilter : str, optional

A jq filter that can restrict the list of pipelines returned.

list_secret()[source]

Lists secrets. Returns a list of SecretInfo objects.

restart_datum(job_id, data_filters=None)[source]

Restarts a datum.

Parameters
job_id : str

The ID of the job.

data_filters : List[str], optional

An optional iterable of strings.

run_cron(pipeline_name)[source]

Explicitly triggers a pipeline with one or more cron inputs to run now.

Parameters
pipeline_name : str

The pipeline name.

run_pipeline(pipeline_name, provenance=None, job_id=None)[source]

Runs a pipeline.

Parameters
pipeline_name : str

The pipeline name.

provenance : List[CommitProvenance protobuf], optional

A list representing the pipeline execution provenance.

job_id : str, optional

A specific job ID to run.

start_pipeline(pipeline_name)[source]

Starts a pipeline.

Parameters
pipeline_name : str

The pipeline name.

stop_job(job_id)[source]

Stops a job by its ID.

Parameters
job_id : str

The ID of the job to stop.

stop_pipeline(pipeline_name)[source]

Stops a pipeline.

Parameters
pipeline_name : str

The pipeline name.

python_pachyderm.mixin.pps.pipeline_inputs(root)[source]

python_pachyderm.mixin.transaction

class python_pachyderm.mixin.transaction.TransactionMixin[source]

Methods

batch_transaction(requests)

Executes a batch transaction.

delete_all_transactions()

Deletes all transactions.

delete_transaction(transaction)

Deletes a given transaction.

finish_transaction(transaction)

Finishes a given transaction.

inspect_transaction(transaction)

Inspects a given transaction.

list_transaction()

Lists transactions.

start_transaction()

Starts a transaction.

transaction()

A context manager for running operations within a transaction.

batch_transaction(requests)[source]

Executes a batch transaction.

Parameters
requests : List[TransactionRequest protobuf]

A list of TransactionRequest objects.

delete_all_transactions()[source]

Deletes all transactions.

delete_transaction(transaction)[source]

Deletes a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

finish_transaction(transaction)[source]

Finishes a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

inspect_transaction(transaction)[source]

Inspects a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

list_transaction()[source]

Lists transactions.

start_transaction()[source]

Starts a transaction.

transaction()[source]

A context manager for running operations within a transaction. When the context manager completes, the transaction will be deleted if an error occurred, or otherwise finished.
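
For example, a sketch creating two repos atomically (the repo names are hypothetical):

>>> # Operations issued inside the block are queued into the transaction;
>>> # if the block raises, the transaction is deleted instead of finished.
>>> with client.transaction() as txn:
...     client.create_repo("images")
...     client.create_repo("labels")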

python_pachyderm.mixin.transaction.transaction_from(transaction)[source]

python_pachyderm.mixin.util

python_pachyderm.mixin.util.commit_from(src, allow_just_repo=False)[source]

python_pachyderm.mixin.version

class python_pachyderm.mixin.version.VersionMixin[source]

Methods

get_remote_version()

Gets the version of the Pachyderm server.

get_remote_version()[source]

Gets the version of the Pachyderm server.