python_pachyderm
Mixins
- Information
- python_pachyderm.mixin.admin
- python_pachyderm.mixin.auth
- python_pachyderm.mixin.debug
- python_pachyderm.mixin.enterprise
- python_pachyderm.mixin.health
- python_pachyderm.mixin.pfs
- python_pachyderm.mixin.pps
- python_pachyderm.mixin.transaction
- python_pachyderm.mixin.util
- python_pachyderm.mixin.version
Client
- class python_pachyderm.client.Client(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None)[source]
Bases:
python_pachyderm.mixin.admin.AdminMixin
,python_pachyderm.mixin.auth.AuthMixin
,python_pachyderm.mixin.debug.DebugMixin
,python_pachyderm.mixin.enterprise.EnterpriseMixin
,python_pachyderm.mixin.health.HealthMixin
,python_pachyderm.mixin.pfs.PFSMixin
,python_pachyderm.mixin.pps.PPSMixin
,python_pachyderm.mixin.transaction.TransactionMixin
,python_pachyderm.mixin.version.VersionMixin
,object
- Attributes
- auth_token
- transaction_id
Methods
activate_auth
(subject[, github_token, ...])Activates auth, creating an initial set of admins.
activate_enterprise
(activation_code[, expires])Activates enterprise.
authenticate_github
(github_token)Authenticates a GitHub user to the Pachyderm cluster.
authenticate_id_token
(id_token)Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.
authenticate_oidc
(oidc_state)Authenticates a user to the Pachyderm cluster via OIDC.
authenticate_one_time_password
(one_time_password)Authenticates a user to the Pachyderm cluster using a one-time password.
authorize
(repo, scope)Authorizes the user to a given repo/scope.
batch_transaction
(requests)Executes a batch transaction.
binary
([filter])Gets the pachd binary.
commit
(repo_name[, branch, parent, description])A context manager for running operations within a commit.
copy_file
(source_commit, source_path, ...[, ...])Efficiently copies files already in PFS.
create_branch
(repo_name, branch_name[, ...])Creates a new branch.
create_pipeline
(pipeline_name, transform[, ...])Creates a pipeline.
create_pipeline_from_request
(req)Creates a pipeline from a
CreatePipelineRequest
object.create_repo
(repo_name[, description, update])Creates a new
Repo
object in PFS with the given name. Repos are the top level data object in PFS and should be used to store data of a similar type. For example rather than having a singleRepo
for an entire project you might have separate ``Repo``s for logs, metrics, database dumps etc.create_secret
(secret_name, data[, labels, ...])Creates a new secret.
create_tf_job_pipeline
(pipeline_name, tf_job)Creates a pipeline.
create_tmp_file_set
()Creates a temporary fileset (used internally).
deactivate_auth
()Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.
deactivate_enterprise
()Deactivates enterprise.
delete_all
()Deletes everything in Pachyderm.
delete_all_pipelines
([force])Deletes all pipelines.
delete_all_repos
([force])Deletes all repos.
delete_all_transactions
()Deletes all transactions.
delete_branch
(repo_name, branch_name[, force])Deletes a branch, but leaves the commits themselves intact.
delete_commit
(commit)Deletes a commit.
delete_file
(commit, path)Deletes a file from a Commit.
delete_job
(job_id)Deletes a job by its ID.
delete_pipeline
(pipeline_name[, force, ...])Deletes a pipeline.
delete_repo
(repo_name[, force, ...])Deletes a repo and reclaims the storage space it was using.
delete_secret
(secret_name)Deletes a secret.
delete_transaction
(transaction)Deletes a given transaction.
diff_file
(new_commit, new_path[, ...])Diffs two files.
dump
([filter, limit])Gets a debug dump.
extend_auth_token
(token, ttl)Extends an existing auth token.
extract
([url, no_objects, no_repos, ...])Extracts cluster data for backup.
extract_auth_tokens
()This maps to an internal function that is only used for migration.
extract_pipeline
(pipeline_name)Extracts a pipeline for backup.
finish_commit
(commit[, description, ...])Ends the process of committing data to a Repo and persists the Commit.
finish_transaction
(transaction)Finishes a given transaction.
flush_commit
(commits[, repos])Blocks until all of the commits which have a set of commits as provenance have finished.
flush_job
(commits[, pipeline_names])Blocks until all of the jobs which have a set of commits as provenance have finished.
fsck
([fix])Performs a file system consistency check for PFS.
garbage_collect
([memory_bytes])Runs garbage collection.
get_acl
(repo)Gets the ACL of a repo.
get_activation_code
()Returns the enterprise code used to activate Pachdyerm Enterprise in this cluster.
get_admins
()Returns a list of strings specifying the cluster admins.
get_auth_configuration
()Gets the auth configuration.
get_auth_token
(subject[, ttl])Gets an auth token for a subject.
get_cluster_role_bindings
()Returns the current set of cluster role bindings.
get_enterprise_state
()Gets the current enterprise state of the cluster.
get_file
(commit, path[, offset_bytes, ...])Returns a
PFSFile
object, containing the contents of a file stored in PFS.get_groups
([username])Gets which groups the given username belongs to.
get_job_logs
(job_id[, data_filters, datum, ...])Gets logs for a job.
get_oidc_login
()Returns the OIDC login configuration.
get_one_time_password
([subject, ttl])If this
Client
is authenticated as an admin, you can generate a one-time password for any given subject.get_pipeline_logs
(pipeline_name[, ...])Gets logs for a pipeline.
get_remote_version
()Gets version of Pachyderm server.
get_scope
(username, repos)Gets the auth scope.
get_users
(group)Gets which users below to the given.
glob_file
(commit, pattern)Lists files that match a glob pattern.
health
()Returns a health check indicating if the server can handle RPCs.
inspect_branch
(repo_name, branch_name)Inspects a branch.
inspect_cluster
()Inspects a cluster.
inspect_commit
(commit[, block_state])Inspects a commit.
inspect_datum
(job_id, datum_id)Inspects a datum.
inspect_file
(commit, path)Inspects a file.
inspect_job
(job_id[, block_state, ...])Inspects a job with a given ID.
inspect_pipeline
(pipeline_name[, history])inspect_repo
(repo_name)Returns info about a specific repo.
inspect_secret
(secret_name)Inspects a secret.
inspect_transaction
(transaction)Inspects a given transaction.
list_branch
(repo_name[, reverse])Lists the active branch objects on a repo.
list_commit
(repo_name[, to_commit, ...])Lists commits.
list_datum
([job_id, page_size, page, input, ...])Lists datums.
list_file
(commit, path[, history, ...])list_job
([pipeline_name, input_commit, ...])list_pipeline
([history, allow_incomplete, ...])list_repo
()Returns info about all repos, as a list of
RepoInfo
objects.list_secret
()Lists secrets.
list_transaction
()Lists transactions.
modify_admins
([add, remove])Adds and/or removes admins.
modify_cluster_role_binding
(principal[, roles])Sets the list of admin roles for a principal.
modify_members
(group[, add, remove])Adds and/or removes members of a group.
new_from_config
([config_file])Creates a Pachyderm client from a config file, which can either be passed in as a file-like object, or if unset, checks the PACH_CONFIG env var for a path.
new_from_pachd_address
(pachd_address[, ...])Creates a Pachyderm client from a given pachd address.
new_in_cluster
([auth_token, transaction_id])Creates a Pachyderm client that operates within a Pachyderm cluster.
profile_cpu
(duration[, filter])Gets a CPU profile.
put_file_bytes
(commit, path, value[, ...])Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.
put_file_client
()A context manager that gives a
PutFileClient
.put_file_url
(commit, path, url[, delimiter, ...])Puts a file using the content found at a URL.
renew_tmp_file_set
(fileset_id, ttl_seconds)Renews a temporary fileset (used internally).
restart_datum
(job_id[, data_filters])Restarts a datum.
restore
(requests)Restores a cluster.
restore_auth_token
([token])This maps to an internal function that is only used for migration.
revoke_auth_token
(token)Revokes an auth token.
run_cron
(pipeline_name)Explicitly triggers a pipeline with one or more cron inputs to run now.
run_pipeline
(pipeline_name[, provenance, job_id])Runs a pipeline.
set_acl
(repo, entries)Sets the ACL of a repo.
set_auth_configuration
(configuration)Set the auth configuration.
set_groups_for_user
(username, groups)Sets the group membership for a user.
set_scope
(username, repo, scope)Set the auth scope.
start_commit
(repo_name[, branch, parent, ...])Begins the process of committing data to a Repo.
start_pipeline
(pipeline_name)Starts a pipeline.
start_transaction
()Starts a transaction.
stop_job
(job_id)Stops a job by its ID.
stop_pipeline
(pipeline_name)Stops a pipeline.
subscribe_commit
(repo_name, branch[, ...])Yields
CommitInfo
objects as commits occur.transaction
()A context manager for running operations within a transaction.
walk_file
(commit, path)Walks over all descendant files in a directory.
who_am_i
()Returns info about the user tied to this
Client
.- __init__(host=None, port=None, auth_token=None, root_certs=None, transaction_id=None, tls=None)[source]
Creates a Pachyderm client.
- Parameters
- hoststr, optional
The pachd host. Default is ‘localhost’, which is used with
pachctl port-forward
.- portint, optional
The port to connect to. Default is 30650.
- auth_tokenstr, optional
The authentication token. Used if authentication is enabled on the cluster.
- root_certsbytes, optional
The PEM-encoded root certificates as byte string.
- transaction_idstr, optional
The ID of the transaction to run operations on.
- tlsbool, optional
Whether TLS should be used. If root_certs are specified, they are used. Otherwise, we use the certs provided by certifi.
- property auth_token
- classmethod new_from_config(config_file=None)[source]
Creates a Pachyderm client from a config file, which can either be passed in as a file-like object, or if unset, checks the PACH_CONFIG env var for a path. If that’s also unset, it defaults to loading from ‘~/.pachyderm/config.json’.
- Parameters
- config_fileTextIO, optional
A file-like object containing the config json file. If unspecified, we load the config from the default location (‘~/.pachyderm/config.json’).
- Returns
- Client
A python_pachyderm client instance.
- classmethod new_from_pachd_address(pachd_address, auth_token=None, root_certs=None, transaction_id=None)[source]
Creates a Pachyderm client from a given pachd address.
- Parameters
- pachd_addressstr
The address of pachd server
- auth_tokenstr, optional
The authentication token. Used if authentication is enabled on the cluster.
- root_certsbytes, optional
The PEM-encoded root certificates as byte string. If unspecified, this will load default certs from certifi.
- transaction_idstr, optional
The ID of the transaction to run operations on.
- Returns
- Client
A python_pachyderm client instance.
- classmethod new_in_cluster(auth_token=None, transaction_id=None)[source]
Creates a Pachyderm client that operates within a Pachyderm cluster.
- Parameters
- auth_tokenstr, optional
The authentication token. Used if authentication is enabled on the cluster.
- transaction_idstr, optional
The ID of the transaction to run operations on.
- Returns
- Client
A python_pachyderm client instance.
- property transaction_id
Spout
- class python_pachyderm.spout.SpoutCommit(pipe, marker_filename=None)[source]
Represents a commit on a spout, permitting the addition of files.
Methods
close
()Closes the commit
put_file_from_bytes
(path, bytes)Adds a file to the spout from a bytestring.
put_file_from_fileobj
(path, size, fileobj)Adds a file to the spout from a file-like object.
put_marker_from_bytes
(bytes)Adds to the marker from a bytestring.
put_marker_from_fileobj
(size, fileobj)Writes to the marker file from a file-like object.
- put_file_from_bytes(path, bytes)[source]
Adds a file to the spout from a bytestring.
- Parameters
- pathstr
The path to the file in the spout.
- bytesbytes
The bytestring representing the file contents.
- put_file_from_fileobj(path, size, fileobj)[source]
Adds a file to the spout from a file-like object.
- Parameters
- pathstr
The path to the file in the spout.
- sizeint
The size of the file.
- fileobjBinaryIO
The file-like object to add.
- class python_pachyderm.spout.SpoutManager(marker_filename=None, pfs_directory='/pfs')[source]
A convenience context manager for creating spouts.
Examples
>>> spout = SpoutManager() >>> while True: >>> with spout.commit() as commit: >>> commit.put_file_from_bytes("foo", b"#") >>> time.sleep(1.0)
Methods
close
()Closes the
SpoutManager
commit
()Opens a commit on the spout.
marker
()Gets the marker file as a context manager.
- __init__(marker_filename=None, pfs_directory='/pfs')[source]
Creates a new spout manager.
- Parameters
- marker_filenamestr, optional
The name of the file for storing markers. If unspecified, marker-related operations will fail.
- pfs_directorystr, optional
The directory for PFS content. Usually this shouldn’t be explicitly specified, unless the spout manager is being tested outside of a real Pachyderm pipeline.
- close()[source]
Closes the
SpoutManager
Util Helper
- python_pachyderm.util.create_python_pipeline(client, path, input=None, pipeline_name=None, image_pull_secrets=None, debug=None, env=None, secrets=None, image=None, update=False, **pipeline_kwargs)[source]
Utility function for creating (or updating) a pipeline specially built for executing python code that is stored locally at path.
A normal pipeline creation process requires you to first build and push a container image with the source and dependencies baked in. As an alternative process, this function circumvents container image creation by using build step-enabled pipelines. See the pachyderm core docs for more info.
If path references a directory, it should have:
A
main.py
, as the pipeline entry-point.An optional
requirements.txt
that specifies pip requirements.
- Parameters
- clientClient
The Client instance to use.
- pathstr
The directory containing the python pipeline source, or an individual python file.
- inputInput protobuf, optional
An
Input
object specifying the pipeline input.- pipeline_namestr, optional
A string specifying the pipeline name. Defaults to using the last directory name in path.
- image_pull_secretsList[str], optional
A list of strings specifying the pipeline transform’s image pull secrets, which are used for pulling images from a private registry. Defaults to None, in which case the public docker registry will be used. See the pipeline spec document for more details.
- debugbool, optional
Specifies whether debug logging should be enabled for the pipeline. Defaults to False.
- envDict[str, str], optional
A mapping of string keys to string values specifying custom environment variables.
- secretsList[Secret protobufs], optional
A list of Secret objects for secret environment variables.
- imagestr, optional
A string specifying the docker image to use for the pipeline. Defaults to using pachyderm’s official python language builder.
- updatebool, optional
Whether to act as an upsert.
- **pipeline_kwargsdict
Keyword arguments to forward to create_pipeline.
- python_pachyderm.util.parse_dict_pipeline_spec(d)[source]
Parses a dict of serialized JSON into a CreatePipelineRequest protobuf.
- python_pachyderm.util.parse_json_pipeline_spec(j)[source]
Parses a string of JSON into a CreatePipelineRequest protobuf.
- python_pachyderm.util.put_files(client, source_path, commit, dest_path, **kwargs)[source]
Utility function for inserting files from the local source_path to Pachyderm. Roughly equivalent to
pachctl put file [-r]
.- Parameters
- clientClient
The
Client
instance to use.- source_pathstr
The file/directory to recursively insert content from.
- commitUnion[tuple, str, Commit protobuf]
The
Commit
object to use for inserting files.- dest_pathstr
The destination path in PFS.
- **kwargsdict
Keyword arguments to forward. See
PutFileClient.put_file_from_fileobj()
for details.