Information

Exposes a mixin for each pachyderm service. These mixins should not be used directly; instead, use python_pachyderm.Client(). The mixins exist exclusively to provide better code organization: we have several mixins rather than one giant Client class.
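
For example, a minimal sketch, assuming a pachd instance reachable on the default localhost:30650:

>>> import python_pachyderm
>>> # Client() aggregates every mixin, so methods from PFSMixin,
>>> # PPSMixin, etc. are all available on one object.
>>> client = python_pachyderm.Client()
>>> client.create_repo("test")
>>> print(client.get_remote_version())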

python_pachyderm.mixin.admin

class python_pachyderm.mixin.admin.AdminMixin[source]

Methods

extract([url, no_objects, no_repos, ...])

Extracts cluster data for backup.

extract_pipeline(pipeline_name)

Extracts a pipeline for backup.

inspect_cluster()

Inspects a cluster.

restore(requests)

Restores a cluster.

extract(url=None, no_objects=None, no_repos=None, no_pipelines=None, no_auth=None, no_enterprise=None)[source]

Extracts cluster data for backup. Yields Op objects.

Parameters
url : str, optional

A string specifying an object storage URL. If set, data will be extracted to this URL rather than returned.

no_objects : bool, optional

If true, will cause extract to omit objects (and tags).

no_repos : bool, optional

If true, will cause extract to omit repos, commits and branches.

no_pipelines : bool, optional

If true, will cause extract to omit pipelines.

no_auth : bool, optional

If true, will cause extract to omit ACLs, tokens, etc.

no_enterprise : bool, optional

If true, will cause extract to omit any enterprise activation key (which may break auth restore).
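
For example, a minimal sketch of extracting everything except object data to an object storage URL (the bucket path is hypothetical):

>>> # Since url is set, data goes to object storage rather than being
>>> # returned, so draining the iterator runs the extraction to completion.
>>> list(client.extract(url="s3://my-bucket/pach-backup", no_objects=True))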

extract_pipeline(pipeline_name)[source]

Extracts a pipeline for backup. Returns an Op object.

Parameters
pipeline_name : str

The pipeline name to extract.

inspect_cluster()[source]

Inspects a cluster. Returns a ClusterInfo object.

restore(requests)[source]

Restores a cluster.

Parameters
requests : Iterator[RestoreRequest protobuf]

A generator of RestoreRequest objects.

python_pachyderm.mixin.auth

class python_pachyderm.mixin.auth.AuthMixin[source]

Methods

activate_auth(subject[, github_token, ...])

Activates auth, creating an initial set of admins.

authenticate_github(github_token)

Authenticates a GitHub user to the Pachyderm cluster.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authenticate_one_time_password(one_time_password)

Authenticates a user to the Pachyderm cluster using a one-time password.

authorize(repo, scope)

Authorizes the user to a given repo/scope.

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extend_auth_token(token, ttl)

Extends an existing auth token.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

get_acl(repo)

Gets the ACL of a repo.

get_admins()

Returns a list of strings specifying the cluster admins.

get_auth_configuration()

Gets the auth configuration.

get_auth_token(subject[, ttl])

Gets an auth token for a subject.

get_cluster_role_bindings()

Returns the current set of cluster role bindings.

get_groups([username])

Gets which groups the given username belongs to.

get_oidc_login()

Returns the OIDC login configuration.

get_one_time_password([subject, ttl])

If this Client is authenticated as an admin, you can generate a one-time password for any given subject.

get_scope(username, repos)

Gets the auth scope.

get_users(group)

Gets which users belong to the given group.

modify_admins([add, remove])

Adds and/or removes admins.

modify_cluster_role_binding(principal[, roles])

Sets the list of admin roles for a principal.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

set_acl(repo, entries)

Sets the ACL of a repo.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

set_scope(username, repo, scope)

Sets the auth scope.

who_am_i()

Returns info about the user tied to this Client.

activate_auth(subject, github_token=None, root_token=None)[source]

Activates auth, creating an initial set of admins. Returns a string that can be used for making authenticated requests.

Parameters
subject : str

If set to a github user (i.e. it has a ‘github:’ prefix or no prefix) then Pachyderm will confirm that it matches the user associated with github_token. If set to a robot user (i.e. it has a ‘robot:’ prefix), then Pachyderm will generate a new token for the robot user; this token will be the only way to administer this cluster until more admins are added.

github_token : str, optional

The token returned by GitHub and used to authenticate the caller. When Pachyderm is deployed locally, setting this value to a given string will automatically authenticate the caller as a GitHub user whose username is that string (unless this “looks like” a GitHub access code, in which case Pachyderm does retrieve the corresponding GitHub username).

root_token : str, optional

Unused.
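
For example, a sketch of bootstrapping auth with a robot admin (the subject name is hypothetical):

>>> # The returned token is the only way to administer the cluster until
>>> # more admins are added, so store it somewhere safe.
>>> token = client.activate_auth("robot:admin")
>>> client = python_pachyderm.Client(auth_token=token)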

authenticate_github(github_token)[source]

Authenticates a GitHub user to the Pachyderm cluster. Returns a string that can be used for making authenticated requests.

Parameters
github_token : str

The token returned by GitHub and used to authenticate the caller. When Pachyderm is deployed locally, setting this value to a given string will automatically authenticate the caller as a GitHub user whose username is that string (unless this “looks like” a GitHub access code, in which case Pachyderm does retrieve the corresponding GitHub username).

authenticate_id_token(id_token)[source]

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider. The token must include the Pachyderm client_id in the set of audiences to be valid. Returns a string that can be used for making authenticated requests.

Parameters
id_token : str

The ID token.

authenticate_oidc(oidc_state)[source]

Authenticates a user to the Pachyderm cluster via OIDC. Returns a string that can be used for making authenticated requests.

Parameters
oidc_state : str

The OIDC state token.

authenticate_one_time_password(one_time_password)[source]

Authenticates a user to the Pachyderm cluster using a one-time password. Returns a string that can be used for making authenticated requests.

Parameters
one_time_password : str

A short-lived, one-time-use password generated by Pachyderm, for the purpose of propagating authentication to new clients (e.g. from the dash to pachd).

authorize(repo, scope)[source]

Authorizes the user to a given repo/scope. Returns a bool specifying whether the caller has at least scope-level access to repo.

Parameters
repo : str

The repo name that the caller wants access to.

scope : int

The access level that the caller needs to perform an action. See the Scope enum for variants.
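
For example, a sketch checking for reader-level access; the import of the Scope enum from the package root is an assumption:

>>> from python_pachyderm import Scope
>>> # True if the caller has at least READER access to the "images" repo.
>>> client.authorize("images", Scope.READER.value)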

deactivate_auth()[source]

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extend_auth_token(token, ttl)[source]

Extends an existing auth token.

Parameters
token : str

Indicates the Pachyderm token whose TTL is being extended.

ttl : int

Indicates the approximate remaining lifetime of this token, in seconds.

extract_auth_tokens()[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

get_acl(repo)[source]

Gets the ACL of a repo. Returns a GetACLResponse object.

Parameters
repo : str

The repo to get an ACL for.

get_admins()[source]

Returns a list of strings specifying the cluster admins.

get_auth_configuration()[source]

Gets the auth configuration. Returns an AuthConfig object.

get_auth_token(subject, ttl=None)[source]

Gets an auth token for a subject. Returns a GetAuthTokenResponse object.

Parameters
subject : str

The returned token will allow the caller to access resources as this subject.

ttl : int, optional

Indicates the approximate remaining lifetime of this token, in seconds.

get_cluster_role_bindings()[source]

Returns the current set of cluster role bindings.

get_groups(username=None)[source]

Gets which groups the given username belongs to. Returns a list of strings.

Parameters
username : str, optional

The username.

get_oidc_login()[source]

Returns the OIDC login configuration.

get_one_time_password(subject=None, ttl=None)[source]

If this Client is authenticated as an admin, you can generate a one-time password for any given subject. If the caller is not an admin or the subject is not set, a one-time password will be returned for the logged-in subject. Returns a string.

Parameters
subject : str, optional

The subject.

ttl : int, optional

Indicates the approximate remaining lifetime of this token, in seconds.

get_scope(username, repos)[source]

Gets the auth scope. Returns a list of Scope objects.

Parameters
username : str

A string specifying the principal (some of which belong to robots rather than users, but the name is preserved for now to provide compatibility with the pachyderm dash) whose access level is queried. To query the access level of a robot user, the caller must prefix username with “robot:”. If username has no prefix (i.e. no “:”), then it’s assumed to be a github user’s principal.

repos : List[str]

A list of strings specifying the objects to which username’s access level is being queried.

get_users(group)[source]

Gets which users belong to the given group. Returns a list of strings.

Parameters
group : str

The group to list users for.

modify_admins(add=None, remove=None)[source]

Adds and/or removes admins.

Parameters
add : List[str], optional

A list of strings specifying admins to add.

remove : List[str], optional

A list of strings specifying admins to remove.

modify_cluster_role_binding(principal, roles=None)[source]

Sets the list of admin roles for a principal.

Parameters
principal : str

A string specifying the principal.

roles : ClusterRoles protobuf, optional

A ClusterRoles object specifying cluster-wide permissions the principal has. If unspecified, all roles are revoked for the principal.

modify_members(group, add=None, remove=None)[source]

Adds and/or removes members of a group.

Parameters
group : str

The group to modify.

add : List[str], optional

A list of strings specifying members to add.

remove : List[str], optional

A list of strings specifying members to remove.

restore_auth_token(token=None)[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

revoke_auth_token(token)[source]

Revokes an auth token.

Parameters
token : str

Indicates the Pachyderm token that is being revoked.

set_acl(repo, entries)[source]

Sets the ACL of a repo.

Parameters
repo : str

The repo to set an ACL on.

entries : List[ACLEntry protobuf]

A list of ACLEntry objects.

set_auth_configuration(configuration)[source]

Sets the auth configuration.

Parameters
configuration : AuthConfig protobuf

The auth configuration.

set_groups_for_user(username, groups)[source]

Sets the group membership for a user.

Parameters
username : str

The username.

groups : List[str]

The groups to add username to.

set_scope(username, repo, scope)[source]

Sets the auth scope.

Parameters
username : str

A string specifying the principal (some of which belong to robots rather than users, but the name is preserved for now to provide compatibility with the pachyderm dash) whose access level is being set. To set the access level of a robot user, the caller must prefix username with “robot:”. If username has no prefix (i.e. no “:”), then it’s assumed to be a github user’s principal.

repo : str

A string specifying the object to which username’s access level is being granted/revoked.

scope : int

The access level that username will now have. See the Scope enum for variants.

who_am_i()[source]

Returns info about the user tied to this Client.

python_pachyderm.mixin.debug

class python_pachyderm.mixin.debug.DebugMixin[source]

Methods

binary([filter])

Gets the pachd binary.

dump([filter, limit])

Gets a debug dump.

profile_cpu(duration[, filter])

Gets a CPU profile.

binary(filter=None)[source]

Gets the pachd binary. Yields byte arrays.

Parameters
filter : Filter protobuf, optional

An optional Filter object.

dump(filter=None, limit=None)[source]

Gets a debug dump. Yields byte arrays.

Parameters
filter : Filter protobuf, optional

An optional Filter object.

limit : int, optional

Limits the number of commits/jobs returned for each repo/pipeline in the dump.

profile_cpu(duration, filter=None)[source]

Gets a CPU profile. Yields byte arrays.

Parameters
duration : Duration protobuf

A Duration object specifying how long to run the CPU profiler.

filter : Filter protobuf, optional

An optional Filter object.

python_pachyderm.mixin.enterprise

class python_pachyderm.mixin.enterprise.EnterpriseMixin[source]

Methods

activate_enterprise(activation_code[, expires])

Activates enterprise.

deactivate_enterprise()

Deactivates enterprise.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()

Gets the current enterprise state of the cluster.

activate_enterprise(activation_code, expires=None)[source]

Activates enterprise. Returns a TokenInfo object.

Parameters
activation_code : str

Specifies a Pachyderm enterprise activation code. New users can obtain trial activation codes.

expires : Timestamp protobuf, optional

An optional Timestamp object indicating when this activation code will expire. This should not generally be set (it’s primarily used for testing), and is only applied if it’s earlier than the signed expiration time in activation_code.

deactivate_enterprise()[source]

Deactivates enterprise.

get_activation_code()[source]

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()[source]

Gets the current enterprise state of the cluster. Returns a GetEnterpriseResponse object.

python_pachyderm.mixin.health

class python_pachyderm.mixin.health.HealthMixin[source]

Methods

health()

Returns a health check indicating if the server can handle RPCs.

health()[source]

Returns a health check indicating if the server can handle RPCs.

python_pachyderm.mixin.pfs

class python_pachyderm.mixin.pfs.AtomicOp(commit, path, **kwargs)[source]

Represents an operation in a PutFile call.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.AtomicPutFileobjOp(commit, path, value, **kwargs)[source]

A PutFile operation to put a file from a file-like object.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.AtomicPutFilepathOp(commit, pfs_path, local_path, **kwargs)[source]

A PutFile operation to put a file locally stored at a given path. This file is opened on-demand, which helps with minimizing the number of open files.

Methods

reqs()

Yields one or more protobuf PutFileRequests, which are then enqueued into the request's channel.

reqs()[source]

Yields one or more protobuf PutFileRequests, which are then enqueued into the request’s channel.

class python_pachyderm.mixin.pfs.PFSFile(res)[source]

The contents of a file stored in PFS.

Examples

You can treat these as either file-like objects, like so:

>>> import shutil
>>> source_file = client.get_file("montage/master", "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
...     shutil.copyfileobj(source_file, dest_file)

Or as an iterator of bytes, like so:

>>> source_file = client.get_file("montage/master", "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
...     for chunk in source_file:
...         dest_file.write(chunk)

Methods

close()

Closes the PFSFile.

read([size])

Reads from the PFSFile buffer.

close()[source]

Closes the PFSFile.

read(size=-1)[source]

Reads from the PFSFile buffer.

Parameters
size : int, optional

The number of bytes to read from the buffer.

class python_pachyderm.mixin.pfs.PFSMixin[source]

Methods

commit(repo_name[, branch, parent, description])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_repo(repo_name[, description, update])

Creates a new Repo object in PFS with the given name. Repos are the top level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

create_tmp_file_set()

Creates a temporary fileset (used internally).

delete_all_repos([force])

Deletes all repos.

delete_branch(repo_name, branch_name[, force])

Deletes a branch, but leaves the commits themselves intact.

delete_commit(commit)

Deletes a commit.

delete_file(commit, path)

Deletes a file from a Commit.

delete_repo(repo_name[, force, ...])

Deletes a repo and reclaims the storage space it was using.

diff_file(new_commit, new_path[, ...])

Diffs two files.

finish_commit(commit[, description, ...])

Ends the process of committing data to a Repo and persists the Commit.

flush_commit(commits[, repos])

Blocks until all of the commits which have a set of commits as provenance have finished.

fsck([fix])

Performs a file system consistency check for PFS.

get_file(commit, path[, offset_bytes, ...])

Returns a PFSFile object, containing the contents of a file stored in PFS.

glob_file(commit, pattern)

Lists files that match a glob pattern.

inspect_branch(repo_name, branch_name)

Inspects a branch.

inspect_commit(commit[, block_state])

Inspects a commit.

inspect_file(commit, path)

Inspects a file.

inspect_repo(repo_name)

Returns info about a specific repo.

list_branch(repo_name[, reverse])

Lists the active branch objects on a repo.

list_commit(repo_name[, to_commit, ...])

Lists commits.

list_file(commit, path[, history, ...])

Lists the files in a directory.

list_repo()

Returns info about all repos, as a list of RepoInfo objects.

put_file_bytes(commit, path, value[, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_client()

A context manager that gives a PutFileClient.

put_file_url(commit, path, url[, delimiter, ...])

Puts a file using the content found at a URL.

renew_tmp_file_set(fileset_id, ttl_seconds)

Renews a temporary fileset (used internally).

start_commit(repo_name[, branch, parent, ...])

Begins the process of committing data to a Repo.

subscribe_commit(repo_name, branch[, ...])

Yields CommitInfo objects as commits occur.

walk_file(commit, path)

Walks over all descendant files in a directory.

commit(repo_name, branch=None, parent=None, description=None)[source]

A context manager for running operations within a commit.

Parameters
repo_name : str

The name of the repo.

branch : str, optional

The branch name. This is a more convenient way to build linear chains of commits. When a commit is started with a non-empty branch the value of branch becomes an alias for the created Commit. This enables a more intuitive access pattern. When the commit is started on a branch the previous head of the branch is used as the parent of the commit.

parent : Union[tuple, str, Commit protobuf], optional

An optional Commit object specifying the parent commit. Upon creation the new commit will appear identical to the parent commit; data can safely be added to the new commit without affecting the contents of the parent commit.

description : str, optional

Description of the commit.
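
For example, a sketch of writing a file within a single commit (the repo name is hypothetical):

>>> # The commit is started on entry and finished on exit.
>>> with client.commit("images", "master") as commit:
...     client.put_file_bytes(commit, "/greeting.txt", b"hello world")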

copy_file(source_commit, source_path, dest_commit, dest_path, overwrite=None)[source]

Efficiently copies files already in PFS. Note that the destination repo cannot be an output repo, or the copy operation will (as of 1.9.0) silently fail.

Parameters
source_commit : Union[tuple, str, Commit protobuf]

Represents the commit with the source file.

source_path : str

The path of the source file.

dest_commit : Union[tuple, str, Commit protobuf]

Represents the commit for the destination file.

dest_path : str

The path of the destination file.

overwrite : bool, optional

Whether to overwrite the destination file if it already exists.

create_branch(repo_name, branch_name, commit=None, provenance=None, trigger=None)[source]

Creates a new branch.

Parameters
repo_name : str

The name of the repo.

branch_name : str

The new branch name.

commit : Union[tuple, str, Commit protobuf], optional

Represents the head commit of the new branch.

provenance : List[Branch protobuf], optional

An optional iterable of Branch objects representing the branch provenance.

trigger : Trigger protobuf, optional

An optional Trigger object controlling when the head of branch_name is moved.

create_repo(repo_name, description=None, update=None)[source]

Creates a new Repo object in PFS with the given name. Repos are the top level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

Parameters
repo_name : str

Name of the repo.

description : str, optional

Description of the repo.

update : bool, optional

Whether to update if the repo already exists.

create_tmp_file_set()[source]

Creates a temporary fileset (used internally). Currently, temp-fileset-related APIs are only used for Pachyderm internals (job merging), so we’re avoiding support for these functions until we find a use for them (feel free to file an issue in github.com/pachyderm/pachyderm)

delete_all_repos(force=None)[source]

Deletes all repos.

Parameters
force : bool, optional

If set to true, repos will be removed regardless of errors. This argument should be used with care.

delete_branch(repo_name, branch_name, force=None)[source]

Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.

Parameters
repo_name : str

The repo name.

branch_name : str

The name of the branch to delete.

force : bool, optional

Whether to force the branch deletion.

delete_commit(commit)[source]

Deletes a commit.

Parameters
commit : Union[tuple, str, Commit protobuf]

The commit to delete.

delete_file(commit, path)[source]

Deletes a file from a Commit. DeleteFile leaves a tombstone in the Commit: assuming the file isn’t written to later, attempting to get the file from the finished commit will result in a not-found error. The file will of course remain intact in the Commit’s parent.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

delete_repo(repo_name, force=None, split_transaction=None)[source]

Deletes a repo and reclaims the storage space it was using.

Parameters
repo_name : str

The name of the repo.

force : bool, optional

If set to true, the repo will be removed regardless of errors. This argument should be used with care.

split_transaction : bool, optional

Controls whether Pachyderm attempts to delete the entire repo in a single database transaction. Setting this to True can work around certain Pachyderm errors, but, if set, the delete_repo() call may need to be retried.

diff_file(new_commit, new_path, old_commit=None, old_path=None, shallow=None)[source]

Diffs two files. If old_commit or old_path are not specified, the same path in the parent of the file specified by new_commit and new_path will be used.

Parameters
new_commit : Union[tuple, str, Commit protobuf]

Represents the commit for the new file.

new_path : str

The path of the new file.

old_commit : Union[tuple, str, Commit protobuf], optional

Represents the commit for the old file.

old_path : str, optional

The path of the old file.

shallow : bool, optional

Whether to do a shallow diff.

finish_commit(commit, description=None, input_tree_object_hash=None, tree_object_hashes=None, datum_object_hash=None, size_bytes=None, empty=None)[source]

Ends the process of committing data to a Repo and persists the Commit. Once a Commit is finished the data becomes immutable and future attempts to write to it with PutFile will error.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

description : str, optional

Description of this commit.

input_tree_object_hash : str, optional

Specifies an input tree object hash.

tree_object_hashes : List[str], optional

A list of zero or more strings specifying object hashes for the output trees.

datum_object_hash : str, optional

Specifies an object hash.

size_bytes : int, optional

An optional int.

empty : bool, optional

If set, the commit will be closed (its finished field will be set to the current time) but its tree will be left None.

flush_commit(commits, repos=None)[source]

Blocks until all of the commits which have a set of commits as provenance have finished. For commits to be considered, they must have all of the specified commits as provenance. This in effect waits for all of the jobs that are triggered by a set of commits to complete. It returns an error if any of the commits it’s waiting on are cancelled due to one of the jobs encountering an error during runtime. Note that it’s never necessary to call FlushCommit to run jobs; they’ll run no matter what. FlushCommit just allows you to wait for them to complete and see their output once they do.

Yields CommitInfo objects.

Parameters
commits : List[Union[tuple, str, Commit protobuf]]

The commits to flush.

repos : List[str], optional

An optional list of strings specifying repo names. If specified, only commits within these repos will be flushed.
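
For example, a sketch of waiting for all downstream processing of the head commit of a branch (the repo name is hypothetical):

>>> # Blocks until every job triggered by the commit has finished.
>>> for commit_info in client.flush_commit([("images", "master")]):
...     print(commit_info.commit.id)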

fsck(fix=None)[source]

Performs a file system consistency check for PFS.

get_file(commit, path, offset_bytes=None, size_bytes=None)[source]

Returns a PFSFile object, containing the contents of a file stored in PFS.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path of the file.

offset_bytes : int, optional

Specifies the number of bytes that should be skipped in the beginning of the file.

size_bytes : int, optional

Limits the total amount of data returned; note that you will get fewer bytes than size_bytes if you pass a value larger than the size of the file. If 0, then all of the data will be returned.

glob_file(commit, pattern)[source]

Lists files that match a glob pattern. Yields FileInfo objects.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

pattern : str

The glob pattern.
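
For example, a sketch of listing all PNGs at the head of a branch (the repo name is hypothetical):

>>> for file_info in client.glob_file(("images", "master"), "/*.png"):
...     print(file_info.file.path, file_info.size_bytes)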

inspect_branch(repo_name, branch_name)[source]

Inspects a branch. Returns a BranchInfo object.

Parameters
repo_name : str

The repo name.

branch_name : str

The branch name.

inspect_commit(commit, block_state=None)[source]

Inspects a commit. Returns a CommitInfo object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

block_state : int, optional

Causes this method to block until the commit is in the desired commit state. See the CommitState enum.

inspect_file(commit, path)[source]

Inspects a file. Returns a FileInfo object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

inspect_repo(repo_name)[source]

Returns info about a specific repo. Returns a RepoInfo object.

Parameters
repo_name : str

Name of the repo.

list_branch(repo_name, reverse=None)[source]

Lists the active branch objects on a repo. Returns a list of BranchInfo objects.

Parameters
repo_name : str

The repo name.

reverse : bool, optional

If true, returns branches oldest to newest.

list_commit(repo_name, to_commit=None, from_commit=None, number=None, reverse=None)[source]

Lists commits. Yields CommitInfo objects.

Parameters
repo_name : str

If only repo_name is given, all commits in the repo are returned.

to_commit : Union[tuple, str, Commit protobuf], optional

Only the ancestors of to, including to itself, are considered.

from_commit : Union[tuple, str, Commit protobuf], optional

Only the descendants of from, including from itself, are considered.

number : int, optional

Determines how many commits are returned. If number is 0, all commits that match the aforementioned criteria are returned.

reverse : bool, optional

If true, returns commits oldest to newest.

list_file(commit, path, history=None, include_contents=None)[source]

Lists the files in a directory.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the directory.

history : int, optional

Indicates how many historical versions you want returned. Semantics are:

  • 0: Return the files as they are in commit.

  • 1: Return the above and the files as they were in the last commit they were modified in.

  • 2: etc.

  • -1: Return all historical versions.

include_contents : bool, optional

If True, file contents are included.

list_repo()[source]

Returns info about all repos, as a list of RepoInfo objects.

put_file_bytes(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file(s) will be written to.

value : Union[bytes, BinaryIO]

The file contents, represented as a file-like object, bytestring, or iterator of bytestrings.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.
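
For example, a sketch of a simple upload (the repo name is hypothetical):

>>> # The commit can be an open Commit object or, as here, a
>>> # (repo, branch) tuple, in which case the write lands on the branch head.
>>> client.put_file_bytes(("images", "master"), "/data.csv", b"a,b\n1,2\n")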

put_file_client()[source]

A context manager that gives a PutFileClient. When the context manager exits, any operations enqueued from the PutFileClient are executed in a single, atomic PutFile call.
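
For example, a sketch enqueueing several operations that land in one atomic PutFile call (repo and file names are hypothetical):

>>> with client.put_file_client() as pfc:
...     pfc.put_file_from_bytes(("images", "master"), "/a.txt", b"a")
...     pfc.put_file_from_filepath(("images", "master"), "/b.txt", "b.txt")
...     pfc.delete_file(("images", "master"), "/stale.txt")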

put_file_url(commit, path, url, delimiter=None, recursive=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Puts a file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

url : str

The URL of the file to put.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

recursive : bool, optional

Allow for recursive scraping of some types of URLs, for example s3:// URLs.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

renew_tmp_file_set(fileset_id, ttl_seconds)[source]

Renews a temporary fileset (used internally). Currently, temp-fileset-related APIs are only used for Pachyderm internals (job merging), so we’re avoiding support for these functions until we find a use for them (feel free to file an issue in github.com/pachyderm/pachyderm)

Parameters
fileset_id : str

The fileset ID.

ttl_seconds : int

The number of seconds to keep alive the temporary fileset.

start_commit(repo_name, branch=None, parent=None, description=None, provenance=None)[source]

Begins the process of committing data to a Repo. Once started you can write to the Commit with PutFile and when all the data has been written you must finish the Commit with FinishCommit. NOTE, data is not persisted until FinishCommit is called. A Commit object is returned.

Parameters
repo_name : str

The name of the repo.

branch : str, optional

The branch name. This is a more convenient way to build linear chains of commits. When a commit is started with a non-empty branch the value of branch becomes an alias for the created Commit. This enables a more intuitive access pattern. When the commit is started on a branch the previous head of the branch is used as the parent of the commit.

parent : Union[tuple, str, Commit protobuf], optional

An optional Commit object specifying the parent commit. Upon creation the new commit will appear identical to the parent commit; data can safely be added to the new commit without affecting the contents of the parent commit.

description : str, optional

Description of the commit.

provenance : List[CommitProvenance protobuf], optional

An optional iterable of CommitProvenance objects specifying the commit provenance.
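
For example, a sketch of the explicit start/write/finish flow (the repo name is hypothetical):

>>> commit = client.start_commit("images", branch="master", description="add raw data")
>>> client.put_file_bytes(commit, "/raw.bin", b"\x00\x01")
>>> # Data is not persisted until the commit is finished.
>>> client.finish_commit(commit)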

subscribe_commit(repo_name, branch, from_commit_id=None, state=None, prov=None)[source]

Yields CommitInfo objects as commits occur.

Parameters
repo_name : str

The name of the repo.

branch : str

The branch to subscribe to.

from_commit_id : str, optional

A commit ID. Only commits created since this commit are returned.

state : int, optional

The commit state to filter on. See the CommitState enum.

prov : CommitProvenance protobuf, optional

An optional CommitProvenance object.
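
For example, a sketch that blocks and reacts to new commits (the repo name is hypothetical):

>>> # Yields a CommitInfo each time a new commit lands on images/master.
>>> for commit_info in client.subscribe_commit("images", "master"):
...     print("new commit:", commit_info.commit.id)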

walk_file(commit, path)[source]

Walks over all descendant files in a directory. Returns a generator of FileInfo objects.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the directory.

class python_pachyderm.mixin.pfs.PutFileClient[source]

PutFileClient puts or deletes PFS files atomically.

Methods

delete_file(commit, path)

Deletes a file.

put_file_from_bytes(commit, path, value[, ...])

Uploads a PFS file from a bytestring.

put_file_from_fileobj(commit, path, value[, ...])

Uploads a PFS file from a file-like object.

put_file_from_filepath(commit, pfs_path, ...)

Uploads a PFS file from a local path at a specified path.

put_file_from_url(commit, path, url[, ...])

Puts a file using the content found at a URL.

delete_file(commit, path)[source]

Deletes a file.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path to the file.

put_file_from_bytes(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a bytestring.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

value : bytes

The file contents as a bytestring.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_fileobj(commit, path, value, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a file-like object.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

value : BinaryIO

The file-like object.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_filepath(commit, pfs_path, local_path, delimiter=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Uploads a PFS file from a local path at a specified path. This will lazily open files, which will prevent too many files from being opened, or too much memory being consumed, when atomically putting many files.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

pfs_path : str

The path in the repo the file will be written to.

local_path : str

The local file path.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

put_file_from_url(commit, path, url, delimiter=None, recursive=None, target_file_datums=None, target_file_bytes=None, overwrite_index=None, header_records=None)[source]

Puts a file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commit : Union[tuple, str, Commit protobuf]

Represents the commit.

path : str

The path in the repo the file will be written to.

url : str

The URL of the file to put.

delimiter : int, optional

Causes data to be broken up into separate files by the delimiter, e.g. if you used Delimiter.CSV.value, a separate PFS file will be created for each row in the input CSV file, rather than one large CSV file.

recursive : bool, optional

Allow for recursive scraping of some types of URLs, for example s3:// URLs.

target_file_datums : int, optional

Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0.

target_file_bytes : int, optional

Specifies the target number of bytes in each written file; files may have more or fewer bytes than the target.

overwrite_index : int, optional

The object index where the write starts from. All existing objects starting from the index are deleted.

header_records : int, optional

An optional int for splitting data when delimiter is not NONE (or SQL). It specifies the number of records that are converted to a header and applied to all file shards.

python_pachyderm.mixin.pfs.put_file_from_fileobj_reqs(fileish, **kwargs)[source]
python_pachyderm.mixin.pfs.put_file_from_iterable_reqs(value, **kwargs)[source]

python_pachyderm.mixin.pps

class python_pachyderm.mixin.pps.PPSMixin[source]

Methods

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

create_tf_job_pipeline(pipeline_name, tf_job)

Creates a pipeline.

delete_all()

Deletes everything in Pachyderm.

delete_all_pipelines([force])

Deletes all pipelines.

delete_job(job_id)

Deletes a job by its ID.

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_secret(secret_name)

Deletes a secret.

flush_job(commits[, pipeline_names])

Blocks until all of the jobs which have a set of commits as provenance have finished.

garbage_collect([memory_bytes])

Runs garbage collection.

get_job_logs(job_id[, data_filters, datum, ...])

Gets logs for a job.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

inspect_datum(job_id, datum_id)

Inspects a datum.

inspect_job(job_id[, block_state, ...])

Inspects a job with a given ID.

inspect_pipeline(pipeline_name[, history])

Inspects a pipeline.

inspect_secret(secret_name)

Inspects a secret.

list_datum([job_id, page_size, page, input, ...])

Lists datums.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_pipeline([history, allow_incomplete, ...])

Lists pipelines.

list_secret()

Lists secrets.

restart_datum(job_id[, data_filters])

Restarts a datum.

run_cron(pipeline_name)

Explicitly triggers a pipeline with one or more cron inputs to run now.

run_pipeline(pipeline_name[, provenance, job_id])

Runs a pipeline.

start_pipeline(pipeline_name)

Starts a pipeline.

stop_job(job_id)

Stops a job by its ID.

stop_pipeline(pipeline_name)

Stops a pipeline.

create_pipeline(pipeline_name, transform, parallelism_spec=None, hashtree_spec=None, egress=None, update=None, output_branch=None, resource_requests=None, resource_limits=None, input=None, description=None, cache_size=None, enable_stats=None, reprocess=None, max_queue_size=None, service=None, chunk_spec=None, datum_timeout=None, job_timeout=None, salt=None, standby=None, datum_tries=None, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None, metadata=None, s3_out=None, sidecar_resource_limits=None, reprocess_spec=None, autoscaling=None)[source]

Creates a pipeline. For more info, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_name : str

The pipeline name.

transform : Transform protobuf

A Transform object.

parallelism_spec : ParallelismSpec protobuf, optional

An optional ParallelismSpec object.

hashtree_spec : HashtreeSpec protobuf, optional

An optional HashtreeSpec object.

egress : Egress protobuf, optional

An optional Egress object.

update : bool, optional

Whether this should behave as an upsert.

output_branch : str, optional

The branch to output results on.

resource_requests : ResourceSpec protobuf, optional

An optional ResourceSpec object.

resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec object.

input : Input protobuf, optional

An optional Input object.

description : str, optional

Description of the pipeline.

cache_size : str, optional

An optional string.

enable_stats : bool, optional

An optional bool.

reprocess : bool, optional

If true, Pachyderm forces the pipeline to reprocess all datums. It only has meaning if update is True.

max_queue_size : int, optional

An optional int.

service : Service protobuf, optional

An optional Service object.

chunk_spec : ChunkSpec protobuf, optional

An optional ChunkSpec object.

datum_timeout : Duration protobuf, optional

An optional Duration object.

job_timeout : Duration protobuf, optional

An optional Duration object.

salt : str, optional

An optional string.

standby : bool, optional

An optional bool.

datum_tries : int, optional

An optional int.

scheduling_spec : SchedulingSpec protobuf, optional

An optional SchedulingSpec object.

pod_patch : str, optional

An optional string.

spout : Spout protobuf, optional

An optional Spout object.

spec_commit : Commit protobuf, optional

An optional Commit object.

metadata : Metadata protobuf, optional

An optional Metadata object.

s3_out : bool, optional

Unused.

sidecar_resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec setting resource limits for the pipeline sidecar.
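
For example, a sketch based on the standard OpenCV “edges” example, assuming the Transform, Input, and PFSInput protobuf messages are re-exported at the package root:

>>> client.create_pipeline(
...     "edges",
...     transform=python_pachyderm.Transform(cmd=["python3", "/edges.py"], image="pachyderm/opencv"),
...     input=python_pachyderm.Input(pfs=python_pachyderm.PFSInput(glob="/*", repo="images")),
... )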

create_pipeline_from_request(req)[source]

Creates a pipeline from a CreatePipelineRequest object. Usually this would be used in conjunction with util.parse_json_pipeline_spec() or util.parse_dict_pipeline_spec(). If you’re in pure python and not working with a pipeline spec file, the sibling method create_pipeline() is more ergonomic.

Parameters
req : CreatePipelineRequest protobuf

A CreatePipelineRequest object.
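
For example, a sketch assuming a pipeline spec file on disk; the exact import path of the helper mentioned above is an assumption:

>>> from python_pachyderm import util
>>> with open("pipeline.json") as f:
...     req = util.parse_json_pipeline_spec(f.read())
>>> client.create_pipeline_from_request(req)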

create_secret(secret_name, data, labels=None, annotations=None)[source]

Creates a new secret.

Parameters
secret_name : str

The name of the secret to create.

data : Dict[str, Union[str, bytes]]

The data to store in the secret. Each key must consist of alphanumeric characters, -, _ or ..

labels : Dict[str, str], optional

Kubernetes labels to attach to the secret.

annotations : Dict[str, str], optional

Kubernetes annotations to attach to the secret.

create_tf_job_pipeline(pipeline_name, tf_job, parallelism_spec=None, hashtree_spec=None, egress=None, update=None, output_branch=None, scale_down_threshold=None, resource_requests=None, resource_limits=None, input=None, description=None, cache_size=None, enable_stats=None, reprocess=None, max_queue_size=None, service=None, chunk_spec=None, datum_timeout=None, job_timeout=None, salt=None, standby=None, datum_tries=None, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None)[source]

Creates a pipeline. For more info, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_name : str

The pipeline name.

tf_job : TFJob protobuf

Pachyderm uses this to create TFJobs when running in a Kubernetes cluster on which kubeflow has been installed.

parallelism_spec : ParallelismSpec protobuf, optional

An optional ParallelismSpec object.

hashtree_spec : HashtreeSpec protobuf, optional

An optional HashtreeSpec object.

egress : Egress protobuf, optional

An optional Egress object.

update : bool, optional

Whether this should behave as an upsert.

output_branch : str, optional

The branch to output results on.

scale_down_threshold : Duration protobuf, optional

An optional Duration object.

resource_requests : ResourceSpec protobuf, optional

An optional ResourceSpec object.

resource_limits : ResourceSpec protobuf, optional

An optional ResourceSpec object.

input : Input protobuf, optional

An optional Input object.

description : str, optional

Description of the pipeline.

cache_size : str, optional

An optional string.

enable_stats : bool, optional

An optional bool.

reprocess : bool, optional

If true, Pachyderm forces the pipeline to reprocess all datums. It only has meaning if update is True.

max_queue_size : int, optional

An optional int.

service : Service protobuf, optional

An optional Service object.

chunk_spec : ChunkSpec protobuf, optional

An optional ChunkSpec object.

datum_timeout : Duration protobuf, optional

An optional Duration object.

job_timeout : Duration protobuf, optional

An optional Duration object.

salt : str, optional

An optional string.

standby : bool, optional

An optional bool.

datum_tries : int, optional

An optional int.

scheduling_spec : SchedulingSpec protobuf, optional

An optional SchedulingSpec object.

pod_patch : str, optional

An optional string.

spout : Spout protobuf, optional

An optional Spout object.

spec_commit : Commit protobuf, optional

An optional Commit object.

delete_all()[source]

Deletes everything in Pachyderm.

delete_all_pipelines(force=None)[source]

Deletes all pipelines.

Parameters
force : bool, optional

Whether to force delete.

delete_job(job_id)[source]

Deletes a job by its ID.

Parameters
job_id : str

The ID of the job to delete.

delete_pipeline(pipeline_name, force=None, keep_repo=None, split_transaction=None)[source]

Deletes a pipeline.

Parameters
pipeline_name : str

The pipeline name.

force : bool, optional

Whether to force delete.

keep_repo : bool, optional

Whether to keep the output repo.

split_transaction : bool, optional

Whether Pachyderm attempts to delete the pipeline in a single database transaction. Setting this to True can work around certain Pachyderm errors, but, if set, the delete_pipeline() call may need to be retried.

delete_secret(secret_name)[source]

Deletes a secret.

Parameters
secret_name : str

The name of the secret to delete.

flush_job(commits, pipeline_names=None)[source]

Blocks until all of the jobs which have a set of commits as provenance have finished. Yields JobInfo objects.

Parameters
commits : List[Union[tuple, str, Commit protobuf]]

A list representing the commits to flush.

pipeline_names : List[str], optional

A list of strings specifying pipeline names. If specified, only jobs within these pipelines will be flushed.

garbage_collect(memory_bytes=None)[source]

Runs garbage collection.

Parameters
memory_bytes : int, optional

How much memory to use in computing which objects are alive. A larger number will result in more precise garbage collection (at the cost of more memory usage).

get_job_logs(job_id, data_filters=None, datum=None, follow=None, tail=None, use_loki_backend=None, since=None)[source]

Gets logs for a job. Yields LogMessage objects.

Parameters
job_id : str

The ID of the job.

data_filters : List[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

datum : Datum protobuf, optional

Filters log lines for the specified datum.

follow : bool, optional

If true, continue to follow new logs as they appear.

tail : int, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backend : bool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

since : Duration protobuf, optional

Specifies how far in the past to return logs from.

get_pipeline_logs(pipeline_name, data_filters=None, master=None, datum=None, follow=None, tail=None, use_loki_backend=None, since=None)[source]

Gets logs for a pipeline. Yields LogMessage objects.

Parameters
pipeline_name : str

The name of the pipeline.

data_filters : List[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

master : bool, optional

If true, includes logs from the master.

datum : Datum protobuf, optional

Filters log lines for the specified datum.

follow : bool, optional

If true, continue to follow new logs as they appear.

tail : int, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backend : bool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

since : Duration protobuf, optional

Specifies how far in the past to return logs from.
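
For example, a sketch that follows the last ten lines per container from a pipeline’s workers (the pipeline name is hypothetical):

>>> for msg in client.get_pipeline_logs("edges", tail=10, follow=True):
...     print(msg.message)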

inspect_datum(job_id, datum_id)[source]

Inspects a datum. Returns a DatumInfo object.

Parameters
job_id : str

The ID of the job.

datum_id : str

The ID of the datum.

inspect_job(job_id, block_state=None, output_commit=None, full=None)[source]

Inspects a job with a given ID. Returns a JobInfo.

Parameters
job_id : str

The ID of the job to inspect.

block_state : bool, optional

If true, block until the job completes.

output_commit : Union[tuple, str, Commit protobuf], optional

Represents an output commit to filter on.

full : bool, optional

If true, include worker status.

inspect_pipeline(pipeline_name, history=None)[source]

Inspects a pipeline. Returns a PipelineInfo object.

Parameters
pipeline_name : str

The pipeline name.

history : int, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current version of pipelines.

  • 1: Return the above and pipelines from the next most recent version.

  • 2: etc.

  • -1: Return pipelines from all historical versions.

inspect_secret(secret_name)[source]

Inspects a secret.

Parameters
secret_name : str

The name of the secret to inspect.

list_datum(job_id=None, page_size=None, page=None, input=None, status_only=None)[source]

Lists datums. Yields ListDatumStreamResponse objects.

Parameters
job_id : str, optional

The ID of a job. Exactly one of job_id (real) or input (hypothetical) must be set.

page_size : int, optional

The size of the page.

page : int, optional

The page number.

input : Input protobuf, optional

If set in lieu of job_id, list_datum() returns the datums that would be given to a hypothetical job that used input as its input spec. Exactly one of job_id (real) or input (hypothetical) must be set.

list_job(pipeline_name=None, input_commit=None, output_commit=None, history=None, full=None, jqFilter=None)[source]

Lists jobs. Yields JobInfo objects.

Parameters
pipeline_name : str, optional

A pipeline name to filter on.

input_commit : List[Union[tuple, str, Commit protobuf]], optional

An optional list representing input commits to filter on.

output_commit : Union[tuple, str, Commit protobuf], optional

Represents an output commit to filter on.

history : int, optional

Indicates to return jobs from historical versions of pipelines. Semantics are:

  • 0: Return jobs from the current version of the pipeline or pipelines.

  • 1: Return the above and jobs from the next most recent version.

  • 2: etc.

  • -1: Return jobs from all historical versions.

full : bool, optional

Whether the result should include all pipeline details in each JobInfo, or limited information including name and status, but excluding information in the pipeline spec. Leaving this None (or False) can make the call significantly faster in clusters with a large number of pipelines and jobs. Note that if input_commit is set, this field is coerced to True.

jqFilter : str, optional

A jq filter that can restrict the list of jobs returned.

list_pipeline(history=None, allow_incomplete=None, jqFilter=None)[source]

Lists pipelines. Returns a PipelineInfos object.

Parameters
history : int, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current version of pipelines.

  • 1: Return the above and pipelines from the next most recent version.

  • 2: etc.

  • -1: Return pipelines from all historical versions.

allow_incomplete : bool, optional

If True, causes list_pipeline() to return PipelineInfos with incomplete data where the pipeline spec cannot be retrieved. Incomplete PipelineInfos will have a None Transform field, but will have the fields present in EtcdPipelineInfo.

jqFilter : str, optional

A jq filter that can restrict the list of pipelines returned.

list_secret()[source]

Lists secrets. Returns a list of SecretInfo objects.

restart_datum(job_id, data_filters=None)[source]

Restarts a datum.

Parameters
job_id : str

The ID of the job.

data_filters : List[str], optional

An optional iterable of strings.

run_cron(pipeline_name)[source]

Explicitly triggers a pipeline with one or more cron inputs to run now.

Parameters
pipeline_name : str

The pipeline name.

run_pipeline(pipeline_name, provenance=None, job_id=None)[source]

Runs a pipeline.

Parameters
pipeline_name : str

The pipeline name.

provenance : List[CommitProvenance protobuf], optional

A list representing the pipeline execution provenance.

job_id : str, optional

A specific job ID to run.

start_pipeline(pipeline_name)[source]

Starts a pipeline.

Parameters
pipeline_name : str

The pipeline name.

stop_job(job_id)[source]

Stops a job by its ID.

Parameters
job_id : str

The ID of the job to stop.

stop_pipeline(pipeline_name)[source]

Stops a pipeline.

Parameters
pipeline_name : str

The pipeline name.

python_pachyderm.mixin.pps.pipeline_inputs(root)[source]

python_pachyderm.mixin.transaction

class python_pachyderm.mixin.transaction.TransactionMixin[source]

Methods

batch_transaction(requests)

Executes a batch transaction.

delete_all_transactions()

Deletes all transactions.

delete_transaction(transaction)

Deletes a given transaction.

finish_transaction(transaction)

Finishes a given transaction.

inspect_transaction(transaction)

Inspects a given transaction.

list_transaction()

Lists transactions.

start_transaction()

Starts a transaction.

transaction()

A context manager for running operations within a transaction.

batch_transaction(requests)[source]

Executes a batch transaction.

Parameters
requests : List[TransactionRequest protobuf]

A list of TransactionRequest objects.

delete_all_transactions()[source]

Deletes all transactions.

delete_transaction(transaction)[source]

Deletes a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

finish_transaction(transaction)[source]

Finishes a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

inspect_transaction(transaction)[source]

Inspects a given transaction.

Parameters
transaction : Union[str, Transaction protobuf]

Transaction ID or Transaction object.

list_transaction()[source]

Lists transactions.

start_transaction()[source]

Starts a transaction.

transaction()[source]

A context manager for running operations within a transaction. When the context manager completes, the transaction will be deleted if an error occurred, or otherwise finished.
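
For example, a sketch creating two repos atomically (the repo names are hypothetical):

>>> # Operations issued inside the block are queued into the transaction;
>>> # if the block raises, the transaction is deleted instead of finished.
>>> with client.transaction() as txn:
...     client.create_repo("images")
...     client.create_repo("labels")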

python_pachyderm.mixin.transaction.transaction_from(transaction)[source]

python_pachyderm.mixin.util

python_pachyderm.mixin.util.commit_from(src, allow_just_repo=False)[source]

python_pachyderm.mixin.version

class python_pachyderm.mixin.version.VersionMixin[source]

Methods

get_remote_version()

Gets the version of the Pachyderm server.

get_remote_version()[source]

Gets the version of the Pachyderm server.