Mixins

Client

class Client(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None, use_default_host=True)

Bases: python_pachyderm.mixin.admin.AdminMixin, python_pachyderm.mixin.auth.AuthMixin, python_pachyderm.mixin.debug.DebugMixin, python_pachyderm.mixin.enterprise.EnterpriseMixin, python_pachyderm.mixin.health.HealthMixin, python_pachyderm.mixin.identity.IdentityMixin, python_pachyderm.mixin.license.LicenseMixin, python_pachyderm.mixin.pfs.PFSMixin, python_pachyderm.mixin.pps.PPSMixin, python_pachyderm.mixin.transaction.TransactionMixin, python_pachyderm.mixin.version.VersionMixin, object

The Client class that users will primarily interact with. Initialize an instance with python_pachyderm.Client().

To see documentation on the methods Client can call, refer to the mixins module.

Attributes
auth_token
transaction_id

Methods

activate_auth([root_token])

Activates auth on the cluster.

activate_enterprise(license_server, id, secret)

Activates enterprise by registering with a license server.

activate_license(activation_code[, expires])

Activates the license service.

add_cluster(id, address[, secret, ...])

Register a cluster with the license service.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authorize(resource[, permissions])

Tests a list of permissions that the user might have on a resource.

batch_transaction(requests)

Executes a batch transaction.

binary([filter])

Gets the pachd binary.

commit(repo_name, branch[, parent, description])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_idp_connector(connector)

Create an IDP connector in the identity server.

create_oidc_client(client)

Create an OIDC client in the identity server.

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_repo(repo_name[, description, update])

Creates a new repo object in PFS with the given name.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

deactivate_enterprise()

Deactivates enterprise.

delete_all()

Delete all repos, commits, files, pipelines, and jobs.

delete_all_identity()

Delete all identity service information.

delete_all_license()

Remove all clusters and deactivate the license service.

delete_all_pipelines()

Deletes all pipelines.

delete_all_repos()

Deletes all repos.

delete_all_transactions()

Deletes all transactions.

delete_branch(repo_name, branch_name[, force])

Deletes a branch, but leaves the commits themselves intact.

delete_cluster(id)

Delete a cluster registered with the license service.

delete_file(commit, path)

Deletes a file from an open commit.

delete_idp_connector(id)

Delete an IDP connector in the identity server.

delete_job(job_id, pipeline_name)

Deletes a subjob (a job at the pipeline level).

delete_oidc_client(id)

Delete an OIDC client in the identity server.

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_repo(repo_name[, force])

Deletes a repo and reclaims the storage space it was using.

delete_secret(secret_name)

Deletes a secret.

delete_transaction(transaction)

Deletes a transaction.

diff_file(new_commit, new_path[, ...])

Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different.

drop_commit(commit_id)

Drops an entire commit.

dump([filter, limit])

Gets a debug dump.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

finish_commit(commit[, description, error, ...])

Ends the process of committing data to a repo and persists the commit.

finish_transaction(transaction)

Finishes a transaction.

fsck([fix])

Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_auth_configuration()

Gets the auth configuration.

get_enterprise_state()

Gets the current enterprise state of the cluster.

get_file(commit, path[, datum, URL, offset])

Gets a file from PFS.

get_file_tar(commit, path[, datum, URL, offset])

Gets a file from PFS in tar format.

get_groups()

Gets a list of groups this user belongs to.

get_identity_server_config()

Get the embedded identity server configuration.

get_idp_connector(id)

Get an IDP connector in the identity server.

get_job_logs(pipeline_name, job_id[, ...])

Gets logs for a job.

get_oidc_client(id)

Get an OIDC client in the identity server.

get_oidc_login()

Gets the OIDC login configuration.

get_pause_status()

Gets the pause status of the cluster.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

get_remote_version()

Gets the version of the Pachyderm server.

get_robot_token(robot[, ttl])

Gets a new auth token for a robot user.

get_role_binding(resource)

Returns the current set of role bindings on the specified resource.

get_roles_for_permission(permission)

Returns a list of all roles that have the specified permission.

get_users(group)

Gets users in a group.

glob_file(commit, pattern)

Lists files that match a glob pattern.

health_check()

Returns a health check indicating if the server can handle RPCs.

inspect_branch(repo_name, branch_name)

Inspects a branch.

inspect_cluster()

Inspects a cluster.

inspect_commit(commit[, commit_state])

Inspects a commit.

inspect_datum(pipeline_name, job_id, datum_id)

Inspects a datum.

inspect_file(commit, path[, datum])

Inspects a file.

inspect_job(job_id[, pipeline_name, wait, ...])

Inspects a job.

inspect_pipeline(pipeline_name[, history, ...])

Inspects a pipeline.

inspect_repo(repo_name)

Inspects a repo.

inspect_secret(secret_name)

Inspects a secret.

inspect_transaction(transaction)

Inspects a transaction.

list_branch(repo_name[, reverse])

Lists the active branch objects in a repo.

list_clusters()

List clusters registered with the license service.

list_commit([repo_name, to_commit, ...])

Lists commits.

list_datum([pipeline_name, job_id, input])

Lists datums.

list_file(commit, path[, datum])

Lists the files in a directory.

list_idp_connectors()

List IDP connectors in the identity server.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_oidc_clients()

List OIDC clients in the identity server.

list_pipeline([history, details, jqFilter])

Lists pipelines.

list_repo([type])

Lists all repos in PFS.

list_secret()

Lists secrets.

list_transaction()

Lists unfinished transactions.

list_user_clusters()

Lists all clusters available to user.

modify_file_client(commit)

A context manager that yields a ModifyFileClient.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

modify_role_binding(resource, principal[, roles])

Sets the roles for a given principal on a resource.

new_from_config(config_file)

Creates a Pachyderm client from a config file-like object.

new_from_pachd_address(pachd_address[, ...])

Creates a Pachyderm client from a given pachd address.

new_in_cluster([auth_token, transaction_id])

Creates a Pachyderm client that operates within a Pachyderm cluster.

path_exists(commit, path)

Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.

pause_enterprise()

Pauses the cluster.

profile_cpu(duration[, filter])

Gets a CPU profile.

put_file_bytes(commit, path, value[, datum, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_url(commit, path, url[, recursive, ...])

Uploads a PFS file using the content found at a URL.

restart_datum(pipeline_name, job_id[, ...])

Restarts a datum.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

run_cron(pipeline_name)

Triggers a cron pipeline to run now.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

set_identity_server_config(config)

Configure the embedded identity server.

squash_commit(commit_id)

Squashes a commit into its parent.

start_commit(repo_name, branch[, parent, ...])

Begins the process of committing data to a repo.

start_pipeline(pipeline_name)

Starts a pipeline.

start_transaction()

Starts a transaction.

stop_job(job_id, pipeline_name[, reason])

Stops a subjob (a job at the pipeline level).

stop_pipeline(pipeline_name)

Stops a pipeline.

subscribe_commit(repo_name, branch[, ...])

Returns all commits on the branch and then listens for new commits that are created.

transaction()

A context manager for running operations within a transaction.

unpause_enterprise()

Unpauses the cluster.

update_cluster(id, address[, user_address, ...])

Update a cluster registered with the license service.

update_idp_connector(connector)

Update an IDP connector in the identity server.

update_oidc_client(client)

Update an OIDC client in the identity server.

wait_commit(commit)

Waits for the specified commit to finish.

walk_file(commit, path[, datum])

Walks over all descendant files in a directory.

who_am_i()

Returns info about the user tied to this Client.
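Several entries above, such as commit() and transaction(), are context managers. The following is a minimal sketch of the start/finish pattern such a manager typically follows, written against a hypothetical FakeClient that records calls instead of contacting pachd. It assumes the commit is started on entry and finished on exit, including when the body raises; the library's exact error handling is not described in this reference.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for Client that records calls instead of contacting pachd."""

    def __init__(self):
        self.calls = []

    def start_commit(self, repo_name, branch):
        self.calls.append(("start_commit", repo_name, branch))
        return (repo_name, branch)  # placeholder for a real commit handle

    def finish_commit(self, commit):
        self.calls.append(("finish_commit", commit))

    @contextmanager
    def commit(self, repo_name, branch):
        # Open the commit on entry and hand it to the body of the with-block.
        handle = self.start_commit(repo_name, branch)
        try:
            yield handle
        finally:
            # Close the commit on exit, whether or not the body raised.
            self.finish_commit(handle)


client = FakeClient()
with client.commit("images", "master") as c:
    client.calls.append(("put_file", c, "/a.png"))  # stands in for real file operations
print(client.calls)
```

Putting finish_commit in a finally clause guarantees the commit is never left open by this sketch, which is the main reason to prefer the context manager over paired start_commit/finish_commit calls.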

__init__(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None, use_default_host=True)

Creates a Pachyderm client. If neither host nor port is set, the client checks the PACH_CONFIG environment variable for a config file path. If that variable is unset, it checks two default file paths for a config file. If neither file exists, a client with default settings is created.

Parameters
host : str, optional

The pachd host. Default is ‘localhost’, which is used with pachctl port-forward.

port : int, optional

The port to connect to. Default is 30650.

auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

root_certs : bytes, optional

The PEM-encoded root certificates as a byte string.

transaction_id : str, optional

The ID of the transaction to run operations on.

tls : bool, optional

Whether TLS should be used. If root_certs is specified, those certificates are used; otherwise, the certificates provided by certifi are used.

use_default_host : bool, optional

Whether to replicate the pachctl behavior of searching for a config file.

Examples

>>> import python_pachyderm
>>> client = python_pachyderm.Client()
...
>>> # Manually set host and port
>>> client = python_pachyderm.Client("pachd.example.com", 12345)
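The config lookup that __init__ performs when host and port are unset can be summarized as an ordered search. This is a minimal sketch of that order only, not the library's code: resolve_config_source is a hypothetical helper, the candidate config paths are passed in because the two real paths are not named here, the file-existence check is injectable so the function can run without touching the filesystem, and use_default_host is not modeled.

```python
import os
from typing import Callable, Mapping, Optional, Sequence


def resolve_config_source(
    host: Optional[str],
    port: Optional[int],
    env: Mapping[str, str],
    candidate_paths: Sequence[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Describe where connection settings would come from (illustrative only)."""
    # An explicit host or port means no config search is needed.
    if host is not None or port is not None:
        return "explicit"
    # Otherwise, PACH_CONFIG may point at a config file.
    if "PACH_CONFIG" in env:
        return "env:" + env["PACH_CONFIG"]
    # Next, try each candidate config file in order.
    for path in candidate_paths:
        if exists(path):
            return "file:" + path
    # Nothing found: fall back to default settings.
    return "defaults"


print(resolve_config_source(None, None, {}, ["/missing.json"], exists=lambda p: False))  # defaults
```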

delete_all()

Delete all repos, commits, files, pipelines, and jobs. This resets the cluster to its initial state.

classmethod new_from_config(config_file)

Creates a Pachyderm client from a config file-like object.

Parameters
config_file : TextIO

A file-like object containing the JSON config file.

Returns
Client

A python_pachyderm client instance.

Examples

>>> import io
>>> from python_pachyderm import Client
>>> config = '''{
...   "v2": {
...     "active_context": "local",
...     "contexts": {
...       "local": {
...         "pachd_address": "grpcs://172.17.0.6:30650",
...         "server_cas": "foo",
...         "session_token": "bar",
...         "active_transaction": "baz"
...       }
...     }
...   }
... }'''
>>> client = Client.new_from_config(io.StringIO(config))
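In the v2 layout above, active_context names one entry of contexts, and that entry holds the connection settings. As a stdlib-only illustration of the layout (not of how the library parses it), the sketch below locates the active context's settings. read_active_context is a hypothetical helper, and which of these keys the client actually consumes is not documented here.

```python
import io
import json
from typing import TextIO, Tuple


def read_active_context(config_file: TextIO) -> Tuple[str, dict]:
    """Return (name, settings) for the active context of a v2 pachctl-style config.

    Hypothetical helper; it reads only the keys shown in the example above.
    """
    config = json.load(config_file)
    v2 = config["v2"]
    name = v2["active_context"]
    # The active context's settings live under contexts[<active_context>].
    return name, v2["contexts"][name]


config = io.StringIO(
    '{"v2": {"active_context": "local", "contexts": {"local": '
    '{"pachd_address": "grpcs://172.17.0.6:30650", "session_token": "bar"}}}}'
)
name, ctx = read_active_context(config)
print(name, ctx["pachd_address"])  # local grpcs://172.17.0.6:30650
```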

classmethod new_from_pachd_address(pachd_address, auth_token=None, root_certs=None, transaction_id=None)

Creates a Pachyderm client from a given pachd address.

Parameters
pachd_address : str

The address of the pachd server.

auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

root_certs : bytes, optional

The PEM-encoded root certificates as a byte string. If unspecified, default certificates are loaded from certifi.

transaction_id : str, optional

The ID of the transaction to run operations on.

Returns
Client

A python_pachyderm client instance.

Examples

>>> from python_pachyderm import Client
>>> client = Client.new_from_pachd_address("grpc://pachyderm.com:80/")
...
>>> client = Client.new_from_pachd_address("https://pachyderm.com:80", root_certs=b"foo")
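The addresses above combine a scheme, a host, and a port. A minimal sketch of splitting such an address with urllib.parse follows. parse_pachd_address and PachdAddress are hypothetical, the assumption that grpcs:// and https:// imply TLS is illustrative, and the fallback to port 30650 borrows the default listed for Client's port parameter. The library's own parser may accept other schemes or apply different rules.

```python
from typing import NamedTuple
from urllib.parse import urlparse

DEFAULT_PORT = 30650  # default port listed for Client's port parameter
TLS_SCHEMES = {"grpcs", "https"}  # assumption: these schemes imply TLS


class PachdAddress(NamedTuple):
    secure: bool
    host: str
    port: int


def parse_pachd_address(address: str) -> PachdAddress:
    """Split a pachd address into (secure, host, port). Illustrative only."""
    # urlparse needs a scheme to locate the host, so assume plain gRPC if absent.
    if "://" not in address:
        address = "grpc://" + address
    parsed = urlparse(address)
    if parsed.hostname is None:
        raise ValueError(f"no host in address: {address!r}")
    return PachdAddress(
        secure=parsed.scheme in TLS_SCHEMES,
        host=parsed.hostname,
        port=parsed.port or DEFAULT_PORT,
    )


print(parse_pachd_address("grpc://pachyderm.com:80/"))
# PachdAddress(secure=False, host='pachyderm.com', port=80)
```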

classmethod new_in_cluster(auth_token=None, transaction_id=None)

Creates a Pachyderm client that operates within a Pachyderm cluster.

Parameters
auth_token : str, optional

The authentication token. Used if authentication is enabled on the cluster.

transaction_id : str, optional

The ID of the transaction to run operations on.

Returns
Client

A python_pachyderm client instance.

Examples

>>> from python_pachyderm import Client
>>> client = Client.new_in_cluster()

PFS Helper

class Commit(repo, branch=None, id=None, repo_type='user')

Bases: tuple

A namedtuple subclass to specify a Commit.

Attributes
branch

Alias for field number 1

id

Alias for field number 2

repo

Alias for field number 0

repo_type

Alias for field number 3

Methods

count(value, /)

Return number of occurrences of value.

from_pb(commit)

Converts a pfs_pb2.Commit object into a Commit object.

index(value[, start, stop])

Return first index of value.

to_pb()

Converts itself into a pfs_pb2.Commit.

property branch

Alias for field number 1

static from_pb(commit)

Converts a pfs_pb2.Commit object into a Commit object.

property id

Alias for field number 2

property repo

Alias for field number 0

property repo_type

Alias for field number 3

to_pb()

Converts itself into a pfs_pb2.Commit.
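Because Commit is a namedtuple, each "Alias for field number N" property simply reads slot N of the tuple, and inherited tuple methods such as count and index operate on those four slots. The sketch below reproduces the documented field order and defaults with collections.namedtuple under the hypothetical name CommitSketch, so it runs without python_pachyderm installed; it omits from_pb and to_pb, which need the protobuf classes.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented signature
# Commit(repo, branch=None, id=None, repo_type='user'); not the library class.
CommitSketch = namedtuple(
    "CommitSketch",
    ["repo", "branch", "id", "repo_type"],
    defaults=(None, None, "user"),  # defaults apply to the last three fields
)

c = CommitSketch(repo="foo", branch="master")
print(c)  # CommitSketch(repo='foo', branch='master', id=None, repo_type='user')
print(c.repo == c[0])  # True: "repo" is an alias for field number 0
print(c.index("master"), c.count(None))  # 1 1 (tuple methods still work)

# Namedtuples are immutable; _replace returns a modified copy.
by_id = c._replace(branch=None, id="467c580611234cdb8cc9758c7aa96087")
```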

SubcommitType

Composite type for a subcommit, that is, a commit at the repo level.

Examples

Tuple:

>>> sc = ("foo", "master")
>>> sc2 = ("foo", "467c580611234cdb8cc9758c7aa96087")

Dict:

>>> sc = {"repo": "foo", "branch": "master", "repo_type": "spec"}

Commit:

>>> from python_pachyderm.pfs import Commit
>>> sc = Commit(repo="foo", branch="master")

pfs_pb2.Commit:

>>> from python_pachyderm.proto.v2.pfs import pfs_pb2
>>> sc = pfs_pb2.Commit(
...     branch=pfs_pb2.Branch(
...         repo=pfs_pb2.Repo(name="foo", type="user"),
...         name="master",
...     )
... )

alias of Union[tuple, dict, python_pachyderm.pfs.Commit, python_pachyderm.proto.v2.pfs.pfs_pb2.Commit]

commit_from(commit=None)

Helper function to convert objects that represent a Commit query into a protobuf Commit object. A commit can be identified by (repo, branch, commit_id, repo_type).

Parameters
commit : SubcommitType, optional

The commit representation to convert to a protobuf commit object.

Returns
pfs_pb2.Commit

A protobuf object that represents a commit.
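commit_from accepts any of the SubcommitType forms listed above. The following stdlib-only sketch shows the same normalization idea, converting tuples and dicts into one canonical four-field tuple instead of a protobuf. CommitTuple and normalize_subcommit are hypothetical names, and the rule used to tell a commit ID from a branch name in a two-element tuple (a 32-character lowercase hex string is treated as an ID) is an assumption based on the example ID above, not documented behavior.

```python
import re
from collections import namedtuple
from typing import Union

# Hypothetical canonical form with the same field order as Commit.
CommitTuple = namedtuple(
    "CommitTuple", ["repo", "branch", "id", "repo_type"], defaults=(None, None, "user")
)

# Assumption for illustration: a 32-character lowercase hex string is a commit ID.
_COMMIT_ID = re.compile(r"[0-9a-f]{32}")


def normalize_subcommit(commit: Union[tuple, dict]) -> CommitTuple:
    """Convert a tuple or dict subcommit into one canonical four-field tuple."""
    if isinstance(commit, CommitTuple):  # check first: it is also a tuple
        return commit
    if isinstance(commit, dict):
        return CommitTuple(**commit)
    if isinstance(commit, tuple) and len(commit) == 2:
        repo, ref = commit
        if _COMMIT_ID.fullmatch(ref):
            return CommitTuple(repo=repo, id=ref)
        return CommitTuple(repo=repo, branch=ref)
    if isinstance(commit, tuple):
        # Longer tuples are read positionally: (repo, branch, id, repo_type).
        return CommitTuple(*commit)
    raise TypeError(f"unsupported subcommit: {commit!r}")


print(normalize_subcommit(("foo", "master")))
# CommitTuple(repo='foo', branch='master', id=None, repo_type='user')
```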

Util Helper

parse_dict_pipeline_spec(d)

Parses a pipeline spec dict (for example, the result of calling json.loads on a JSON spec) into a CreatePipelineRequest protobuf.

Parameters
d : dict

Pipeline spec as a dictionary.

Returns
pps_pb2.CreatePipelineRequest

A protobuf object that contains the spec info necessary to create a pipeline.

Examples

Useful for creating a pipeline from a Pachyderm pipeline spec. Pipeline spec reference: https://docs.pachyderm.com/latest/reference/pipeline_spec/

>>> import json
>>> import python_pachyderm
>>> client = python_pachyderm.Client()
>>> spec = '''{
...     "pipeline": {
...         "name": "foobar"
...     },
...     "description": "A pipeline that performs image edge detection by using the OpenCV library.",
...     "input": {
...         "pfs": {
...             "glob": "/*",
...             "repo": "images"
...         }
...     },
...     "transform": {
...         "cmd": [ "python3", "/edges.py" ],
...         "image": "pachyderm/opencv"
...     }
... }'''
>>> req = python_pachyderm.parse_dict_pipeline_spec(json.loads(spec))
>>> client.create_pipeline_from_request(req)
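Since parse_dict_pipeline_spec takes a dict, the spec does not have to start out as a JSON string; it can be built in Python, for instance to reuse one template across several input repos. A minimal sketch follows. edge_detection_spec is a hypothetical helper that returns the same content as the example spec above, and the parse and create calls appear only as comments because they need a running cluster.

```python
import json


def edge_detection_spec(name: str, repo: str, glob: str = "/*") -> dict:
    """Build the example pipeline spec as a dict (hypothetical helper)."""
    return {
        "pipeline": {"name": name},
        "description": "A pipeline that performs image edge detection by using the OpenCV library.",
        "input": {"pfs": {"glob": glob, "repo": repo}},
        "transform": {
            "cmd": ["python3", "/edges.py"],
            "image": "pachyderm/opencv",
        },
    }


spec = edge_detection_spec("foobar", "images")
print(json.dumps(spec["input"]))  # {"pfs": {"glob": "/*", "repo": "images"}}

# With a cluster available, the dict goes straight to the parser:
# req = python_pachyderm.parse_dict_pipeline_spec(spec)
# client.create_pipeline_from_request(req)
```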

parse_json_pipeline_spec(j)

Parses a string of JSON into a CreatePipelineRequest protobuf.

Parameters
j : str

Pipeline spec as a JSON string.

Returns
pps_pb2.CreatePipelineRequest

A protobuf object that contains the spec info necessary to create a pipeline.

Examples

Useful for creating a pipeline from a Pachyderm pipeline spec. Pipeline spec reference: https://docs.pachyderm.com/latest/reference/pipeline_spec/

>>> import python_pachyderm
>>> client = python_pachyderm.Client()
>>> spec = '''{
...     "pipeline": {
...         "name": "foobar"
...     },
...     "description": "A pipeline that performs image edge detection by using the OpenCV library.",
...     "input": {
...         "pfs": {
...             "glob": "/*",
...             "repo": "images"
...         }
...     },
...     "transform": {
...         "cmd": [ "python3", "/edges.py" ],
...         "image": "pachyderm/opencv"
...     }
... }'''
>>> req = python_pachyderm.parse_json_pipeline_spec(spec)
>>> client.create_pipeline_from_request(req)
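A JSON spec often lives in a file on disk. A minimal sketch of reading such a file and failing early on malformed input before handing the text to parse_json_pipeline_spec follows. load_spec_text is a hypothetical helper, and its only checks (JSON syntax and a non-empty pipeline.name) are assumptions, far short of the validation pachd performs.

```python
import json
import os
import tempfile


def load_spec_text(path: str) -> str:
    """Read a pipeline spec file and fail early if it is not usable JSON.

    Hypothetical helper: it only checks JSON syntax and the pipeline name.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    spec = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not spec.get("pipeline", {}).get("name"):
        raise ValueError(f"{path}: missing pipeline.name")
    return text


# Usage: write a small spec to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"pipeline": {"name": "foobar"}}, f)
    text = load_spec_text(path)
    # req = python_pachyderm.parse_json_pipeline_spec(text)
```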

put_files(client, source_path, commit, dest_path, **kwargs)

Utility function for inserting files from the local source_path into Pachyderm. Roughly equivalent to pachctl put file [-r].

Parameters
client : Client

A python_pachyderm client instance.

source_path : str

The file/directory to recursively insert content from.

commit : SubcommitType

The open commit to add files to.

dest_path : str

The destination path in PFS.

**kwargs : dict

Keyword arguments to forward. See ModifyFileClient.put_file_from_filepath() for more details.

Examples

>>> import python_pachyderm
>>> client = python_pachyderm.Client()
>>> source_dir = "data/training/"
>>> with client.commit("repo_name", "master") as commit:
...     python_pachyderm.put_files(client, source_dir, commit, "/training_set/")
...
>>> with client.commit("repo_name", "master") as commit2:
...     python_pachyderm.put_files(client, "metadata/params.csv", commit2, "/hyperparams.csv")
...     python_pachyderm.put_files(client, "spec.json", commit2, "/spec.json")
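put_files is described as roughly equivalent to pachctl put file [-r]. The stdlib-only sketch below shows how a recursive upload could map local paths onto PFS paths, using the destinations from the examples above. plan_put_files is hypothetical and uploads nothing, and the library's exact path handling (trailing slashes, hidden files, symlinks) may differ.

```python
import os
import posixpath
import tempfile
from typing import List, Tuple


def plan_put_files(source_path: str, dest_path: str) -> List[Tuple[str, str]]:
    """Return (local path, PFS path) pairs for a file or directory upload.

    Hypothetical illustration of the path mapping only; it uploads nothing.
    """
    if os.path.isfile(source_path):
        # A single file is written exactly to dest_path.
        return [(source_path, dest_path)]
    pairs = []
    for root, _dirs, files in os.walk(source_path):
        for name in files:
            local = os.path.join(root, name)
            rel = os.path.relpath(local, source_path)
            # PFS paths use forward slashes regardless of the local OS.
            pairs.append((local, posixpath.join(dest_path, *rel.split(os.sep))))
    return sorted(pairs, key=lambda pair: pair[1])


# Usage: build a small local tree and print where each file would land.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.csv", os.path.join("sub", "b.csv")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write("x")
    for _local, pfs_path in plan_put_files(tmp, "/training_set/"):
        print(pfs_path)
# /training_set/a.csv
# /training_set/sub/b.csv
```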

Experimental Module