Information
Exposes a mixin for each pachyderm service. These mixins should not be used
directly; instead, you should use python_pachyderm.Client(). The mixins exist
exclusively to provide better code organization (we have several mixins rather
than one giant Client class).
python_pachyderm.mixin.admin
python_pachyderm.mixin.auth
- class AuthMixin[source]
A mixin for auth-related functionality.
Methods
- activate_auth([root_token]): Activates auth on the cluster.
- authenticate_id_token(id_token): Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.
- authenticate_oidc(oidc_state): Authenticates a user to the Pachyderm cluster via OIDC.
- authorize(resource[, permissions]): Tests a list of permissions that the user might have on a resource.
- deactivate_auth(): Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.
- extract_auth_tokens(): Maps to an internal function that is only used for migration.
- get_auth_configuration(): Gets the auth configuration.
- get_groups(): Gets a list of groups this user belongs to.
- get_oidc_login(): Gets the OIDC login configuration.
- get_robot_token(robot[, ttl]): Gets a new auth token for a robot user.
- get_role_binding(resource): Returns the current set of role bindings to the specified resource.
- get_roles_for_permission(permission): Returns a list of all roles that have the specified permission.
- get_users(group): Gets users in a group.
- modify_members(group[, add, remove]): Adds and/or removes members of a group.
- modify_role_binding(resource, principal[, roles]): Sets the roles for a given principal on a resource.
- restore_auth_token([token]): Maps to an internal function that is only used for migration.
- revoke_auth_token(token): Revokes an auth token.
- set_auth_configuration(configuration): Sets the auth configuration.
- set_groups_for_user(username, groups): Sets the group membership for a user.
- who_am_i(): Returns info about the user tied to this Client.
- activate_auth(root_token=None)[source]
Activates auth on the cluster. Returns the root token, an irrevocable superuser credential that should be stored securely.
- Parameters
- root_token : str, optional
If set, the token to use as the root user login token. In general, it is safer to leave this unset and let Pachyderm generate one for you.
- Returns
- str
A token used as the root user login token.
- authenticate_id_token(id_token)[source]
Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider. The token must include the Pachyderm client_id in the set of audiences to be valid.
- Parameters
- id_token : str
The ID token.
- Returns
- str
A token that can be used for making authenticate requests.
- authenticate_oidc(oidc_state)[source]
Authenticates a user to the Pachyderm cluster via OIDC.
- Parameters
- oidc_state : str
An OIDC state token.
- Returns
- str
A token that can be used for making authenticate requests.
- authorize(resource, permissions=None)[source]
Tests a list of permissions that the user might have on a resource.
- Parameters
- resource : auth_pb2.Resource
The resource the user wants to test on.
- permissions : List[auth_pb2.Permission], optional
The list of permissions the user wants to test.
- Returns
- auth_pb2.AuthorizeResponse
A protobuf object that indicates whether the user/principal had all the inputted permissions on the resource, which permissions the user had, which permissions the user lacked, and the name of the principal.
Examples
>>> client.authorize(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     [auth_pb2.Permission.REPO_READ]
... )
authorized: true
satisfied: REPO_READ
principal: "pach:root"
- deactivate_auth()[source]
Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.
- Raises
- AuthServiceNotActivated
- extract_auth_tokens()[source]
This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).
- get_auth_configuration()[source]
Gets the auth configuration.
- Returns
- auth_pb2.OIDCConfig
A protobuf object with auth configuration information.
- get_groups()[source]
Gets a list of groups this user belongs to.
- Returns
- List[str]
List of groups the user belongs to.
- get_oidc_login()[source]
Gets the OIDC login configuration.
- Returns
- auth_pb2.GetOIDCLoginResponse
A protobuf object with the login configuration information.
- get_robot_token(robot, ttl=None)[source]
Gets a new auth token for a robot user.
- Parameters
- robot : str
The name of the robot user.
- ttl : int, optional
The remaining lifetime of this token in seconds. If unset, the token doesn't expire.
- Returns
- str
The new auth token.
- get_role_binding(resource)[source]
Returns the current set of role bindings to the resource specified.
- Parameters
- resource : auth_pb2.Resource
A protobuf object representing the resource being checked.
- Returns
- Dict[str, auth_pb2.Roles]
A dictionary mapping the principals to the roles they have.
Examples
>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.CLUSTER))
{
    'robot:someuser': roles { key: "clusterAdmin" value: true },
    'pach:root': roles { key: "clusterAdmin" value: true }
}
>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"))
{
    'user:person_a': roles { key: "repoWriter" value: true },
    'pach:root': roles { key: "repoOwner" value: true }
}
- get_roles_for_permission(permission)[source]
Returns a list of all roles that have the specified permission.
- Parameters
- permission : auth_pb2.Permission
The Permission enum to check for.
- Returns
- List[auth_pb2.Role]
A list of Role protobuf objects that all have the specified permission.
Examples
All available permissions can be found in the auth proto Permissions enum.
>>> roles = client.get_roles_for_permission(auth_pb2.Permission.REPO_READ)
- get_users(group)[source]
Gets users in a group.
- Parameters
- group : str
The group to list users from.
- Returns
- List[str]
All the users in the specified group.
- modify_members(group, add=None, remove=None)[source]
Adds and/or removes members of a group.
- Parameters
- group : str
The group to modify.
- add : List[str], optional
A list of strings specifying members to add.
- remove : List[str], optional
A list of strings specifying members to remove.
Examples
>>> client.modify_members(
...     "foogroup",
...     add=["user:otheruser"],
...     remove=["user:someuser"]
... )
- modify_role_binding(resource, principal, roles=None)[source]
Sets the roles for a given principal on a resource.
- Parameters
- resource : auth_pb2.Resource
A protobuf object representing the resource to grant the roles on.
- principal : str
The principal to grant the roles for.
- roles : List[str], optional
The absolute list of roles for the principal to have. If roles is unset, the principal will have no roles.
Examples
You can find some of the built-in roles here: https://github.com/pachyderm/pachyderm/blob/master/src/auth/auth.go#L27
>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.CLUSTER),
...     "user:someuser",
...     roles=["clusterAdmin"]
... )
>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     "user:someuser",
...     roles=["repoWriter"]
... )
- restore_auth_token(token=None)[source]
This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).
- revoke_auth_token(token)[source]
Revokes an auth token.
- Parameters
- token : str
The Pachyderm token being revoked.
- set_auth_configuration(configuration)[source]
Sets the auth configuration.
- Parameters
- configuration : auth_pb2.OIDCConfig
A protobuf object with auth configuration information.
Examples
>>> client.set_auth_configuration(auth_pb2.OIDCConfig(
...     issuer="http://localhost:1658",
...     client_id="client",
...     client_secret="secret",
...     redirect_uri="http://test.example.com",
... ))
python_pachyderm.mixin.debug
- class DebugMixin[source]
A mixin for debug-related functionality.
Methods
- binary([filter]): Gets the pachd binary.
- dump([system, pipelines, input_repos, timeout]): Collects a standard set of debugging information using the DumpV2 API.
- get_dump_template([filters]): Generates a template request to be used by the DumpV2 API.
- profile_cpu(duration[, filter]): Gets a CPU profile.
- set_log_level([pachyderm_level, grpc_level, ...]): Sets the logging level of either pachyderm or grpc. Note: only one level can be set at a time; to set multiple logging levels, make multiple calls.
- binary(filter=None)[source]
Gets the pachd binary.
- Parameters
- filter : debug_pb2.Filter, optional
A protobuf object that filters what info is returned. One of: pachd bool, pipeline protobuf, or worker protobuf.
- Yields
- bytes
The pachd binary as a sequence of bytearrays.
Examples
>>> for b in client.binary():
...     print(b)
- dump(system=None, pipelines=None, input_repos=False, timeout=None)[source]
Collects a standard set of debugging information using the DumpV2 API rather than the now-deprecated Dump API. This method is intended to be used in tandem with the debug.get_dump_v2_template endpoint; however, if no system or pipelines are specified, that call is performed automatically for the user, and debug information for all systems and pipelines is returned.
- Parameters
- system : debug_pb2.System, optional
A protobuf object that filters what info is returned.
- pipelines : List[pps_pb2.Pipeline], optional
A list of pipelines from which to collect debug information.
- input_repos : bool
Whether to collect debug information for input repos. Default: False.
- timeout : duration_pb2.Duration, optional
Duration until timeout occurs. Default is no timeout.
- Yields
- debug_pb2.DumpChunk
Chunks of the debug dump
Examples
>>> for b in client.dump():
...     print(b)
- get_dump_template(filters=None)[source]
Generate a template request to be used by the DumpV2 API.
- Parameters
- filters : List[str], optional
Currently unsupported; this argument has no effect.
- Returns
- debug_pb2.DumpV2Request
The request that can be sent to the DumpV2 API.
- profile_cpu(duration, filter=None)[source]
Gets a CPU profile.
- Parameters
- duration : duration_pb2.Duration
A google protobuf duration object indicating how long the profile should run for.
- filter : debug_pb2.Filter, optional
A protobuf object that filters what info is returned. One of: pachd bool, pipeline protobuf, or worker protobuf.
- Yields
- bytes
The cpu profile as a sequence of bytearrays.
Examples
>>> for b in client.profile_cpu(duration_pb2.Duration(seconds=1)):
...     print(b)
- set_log_level(pachyderm_level=None, grpc_level=None, duration=None, recurse=True)[source]
Sets the logging level of either pachyderm or grpc. Note: only one level can be set at a time. If you are attempting to set multiple logging levels, you must do so with multiple calls.
- Parameters
- pachyderm_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof
The desired pachyderm logging level.
- grpc_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof
The desired grpc logging level.
- duration: duration_pb2.Duration, optional
How long to log at the non-default level. (default 5m0s)
- recurse: bool
Set the log level on all pachyderm pods; if false, only the pachd that handles this RPC. (default true)
python_pachyderm.mixin.enterprise
- class EnterpriseMixin[source]
A mixin for enterprise-related functionality.
Methods
- activate_enterprise(license_server, id, secret): Activates enterprise by registering with a license server.
- Deactivates enterprise.
- get_activation_code(): Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.
- get_enterprise_state(): Gets the current enterprise state of the cluster.
- Gets the pause status of the cluster.
- Pauses the cluster.
- Unpauses the cluster.
- activate_enterprise(license_server, id, secret)[source]
Activates enterprise by registering with a license server.
- Parameters
- license_server : str
The Pachyderm Enterprise Server to register with.
- id : str
The unique ID for this cluster.
- secret : str
The secret for registering this cluster.
- get_activation_code()[source]
Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.
- Returns
- enterprise_pb2.GetActivationCodeResponse
A protobuf object that returns a state enum, info on the token, and the activation code.
- get_enterprise_state()[source]
Gets the current enterprise state of the cluster.
- Returns
- enterprise_pb2.GetStateResponse
A protobuf object that returns a state enum, info on the token, and an empty activation code.
python_pachyderm.mixin.health
python_pachyderm.mixin.identity
- class IdentityMixin[source]
A mixin for identity-related functionality.
Methods
- create_idp_connector(connector): Create an IDP connector in the identity server.
- create_oidc_client(client): Create an OIDC client in the identity server.
- delete_all_identity(): Delete all identity service information.
- delete_idp_connector(id): Delete an IDP connector in the identity server.
- delete_oidc_client(id): Delete an OIDC client in the identity server.
- get_identity_server_config(): Get the embedded identity server configuration.
- get_idp_connector(id): Get an IDP connector in the identity server.
- get_oidc_client(id): Get an OIDC client in the identity server.
- list_idp_connectors(): List IDP connectors in the identity server.
- list_oidc_clients(): List OIDC clients in the identity server.
- set_identity_server_config(config): Configure the embedded identity server.
- update_idp_connector(connector): Update an IDP connector in the identity server.
- update_oidc_client(client): Update an OIDC client in the identity server.
- create_idp_connector(connector)[source]
Create an IDP connector in the identity server.
- Parameters
- connector : identity_pb2.IDPConnector
A protobuf object that represents a connection to an identity provider.
- create_oidc_client(client)[source]
Create an OIDC client in the identity server.
- Parameters
- client : identity_pb2.OIDCClient
A protobuf object representing an OIDC client.
- Returns
- identity_pb2.OIDCClient
A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.
- delete_all_identity()[source]
Delete all identity service information.
- Raises
- AuthServiceNotActivated
- delete_idp_connector(id)[source]
Delete an IDP connector in the identity server.
- Parameters
- id : str
The connector ID.
- delete_oidc_client(id)[source]
Delete an OIDC client in the identity server.
- Parameters
- id : str
The client ID.
- get_identity_server_config()[source]
Get the embedded identity server configuration.
- Returns
- identity_pb2.IdentityServerConfig
A protobuf object that holds configuration info (issuer and ID token expiry) for the identity web server.
- get_idp_connector(id)[source]
Get an IDP connector in the identity server.
- Parameters
- id : str
The connector ID.
- Returns
- identity_pb2.IDPConnector
A protobuf object that returns info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.
- get_oidc_client(id)[source]
Get an OIDC client in the identity server.
- Parameters
- id : str
The client ID.
- Returns
- identity_pb2.OIDCClient
A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.
- list_idp_connectors()[source]
List IDP connectors in the identity server.
- Returns
- List[identity_pb2.IDPConnector]
A list of protobuf objects with info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.
- list_oidc_clients()[source]
List OIDC clients in the identity server.
- Returns
- List[identity_pb2.OIDCClient]
A list of protobuf objects that return a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.
- set_identity_server_config(config)[source]
Configure the embedded identity server.
- Parameters
- config : identity_pb2.IdentityServerConfig
A protobuf object that is the configuration for the identity web server.
python_pachyderm.mixin.license
- class LicenseMixin[source]
A mixin for license-related functionality.
Methods
- activate_license(activation_code[, expires]): Activates the license service.
- add_cluster(id, address[, secret, ...]): Register a cluster with the license service.
- delete_all_license(): Remove all clusters and deactivate the license service.
- delete_cluster(id): Delete a cluster registered with the license service.
- get_activation_code(): Gets the enterprise code used to activate the server.
- list_clusters(): List clusters registered with the license service.
- list_user_clusters(): Lists all clusters available to the user.
- update_cluster(id, address[, user_address, ...]): Update a cluster registered with the license service.
- activate_license(activation_code, expires=None)[source]
Activates the license service.
- Parameters
- activation_code : str
A Pachyderm enterprise activation code. New users can obtain trial activation codes.
- expires : timestamp_pb2.Timestamp, optional
A protobuf object indicating when this activation code will expire. This should generally not be set, and is only applied if it is earlier than the signed expiration time of activation_code.
- Returns
- enterprise_pb2.TokenInfo
A protobuf object that has the expiration for the current token. Field will be unset if there is no current token.
- add_cluster(id, address, secret=None, user_address=None, cluster_deployment_id=None, enterprise_server=False)[source]
Register a cluster with the license service.
- Parameters
- id : str
The unique ID to identify the cluster.
- address : str
The public GRPC address for the license server to reach the cluster.
- secret : str, optional
A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.
- user_address : str, optional
The pachd address for users to connect to.
- cluster_deployment_id : str, optional
The deployment ID value generated by each cluster.
- enterprise_server : bool, optional
Indicates whether the address points to an enterprise server.
- Returns
- license_pb2.AddClusterResponse
A protobuf object that returns the secret.
- delete_all_license()[source]
Remove all clusters and deactivate the license service.
- Raises
- AuthServiceNotActivated
- delete_cluster(id)[source]
Delete a cluster registered with the license service.
- Parameters
- id : str
The unique ID to identify the cluster.
- get_activation_code()[source]
Gets the enterprise code used to activate the server.
- Returns
- license_pb2.GetActivationCodeResponse
A protobuf object that returns a state enum, info on the token, and the activation code.
- list_clusters()[source]
List clusters registered with the license service.
- Returns
- List[license_pb2.ClusterStatus]
A list of protobuf objects that return info on a cluster.
- list_user_clusters()[source]
Lists all clusters available to the user.
- Returns
- List[license_pb2.UserClusterInfo]
A list of protobuf objects with info on the clusters available to the user.
- update_cluster(id, address, user_address=None, cluster_deployment_id=None, secret=None)[source]
Update a cluster registered with the license service.
- Parameters
- id : str
The unique ID to identify the cluster.
- address : str
The public GRPC address for the license server to reach the cluster.
- user_address : str, optional
The pachd address for users to connect to.
- cluster_deployment_id : str, optional
The deployment ID value generated by each cluster.
- secret : str, optional
A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.
python_pachyderm.mixin.pfs
- class ModifyFileClient(commit)[source]
ModifyFileClient puts or deletes PFS files atomically. Replaces PutFileClient from python_pachyderm 6.x.
Methods
- copy_file(source_commit, source_path, dest_path): Copy a file.
- delete_file(path[, datum]): Deletes a file.
- put_file_from_bytes(path, value[, datum, append]): Uploads a PFS file from a bytestring.
- put_file_from_fileobj(path, value[, datum, ...]): Uploads a PFS file from a file-like object.
- put_file_from_filepath(pfs_path, local_path): Uploads a PFS file from a local path at a specified path.
- put_file_from_url(path, url[, datum, ...]): Uploads a PFS file from the content found at a URL.
- copy_file(source_commit, source_path, dest_path, datum=None, append=False)[source]
Copy a file.
- Parameters
- source_commit : SubcommitType
The commit the source file is in.
- source_path : str
The path to the source file.
- dest_path : str
The path to the destination file.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content of the source file to the destination file if it already exists. Otherwise, overwrites the file.
- delete_file(path, datum=None)[source]
Deletes a file.
- Parameters
- path : str
The path to the file.
- datum : str, optional
A tag that filters the files.
- put_file_from_bytes(path, value, datum=None, append=False)[source]
Uploads a PFS file from a bytestring.
- Parameters
- path : str
The path in the repo the file will be written to.
- value : bytes
The file contents as a bytestring.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content of value to the file at path if it already exists. Otherwise, overwrites the file.
- put_file_from_fileobj(path, value, datum=None, append=False)[source]
Uploads a PFS file from a file-like object.
- Parameters
- path : str
The path in the repo the file will be written to.
- value : BinaryIO
The file-like object.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content of value to the file at path if it already exists. Otherwise, overwrites the file.
- put_file_from_filepath(pfs_path, local_path, datum=None, append=False)[source]
Uploads a PFS file from a local path at a specified path. This will lazily open files, which will prevent too many files from being opened, or too much memory being consumed, when atomically putting many files.
- Parameters
- pfs_path : str
The path in the repo the file will be written to.
- local_path : str
The local file path.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content of local_path to the file at pfs_path if it already exists. Otherwise, overwrites the file.
- put_file_from_url(path, url, datum=None, append=False, recursive=False, concurrency=0)[source]
Uploads a PFS File from the content found at a URL. The URL is sent to the server which performs the request.
- Parameters
- path : str
The path in the repo the file will be written to.
- url : str
The URL of the file to upload.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content to the file at path if it already exists. Otherwise, overwrites the file.
- recursive : bool, optional
If true, allows recursive scraping for some types of URLs, for example s3:// URLs.
- concurrency : int
The maximum number of threads used to complete the request. Defaults to 50.
- class PFSFile(stream)[source]
A file-like object containing the content of a file stored in PFS.
Examples
>>> # client.get_file() returns a PFSFile
>>> source_file = client.get_file(("montage", "master"), "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
...     shutil.copyfileobj(source_file, dest_file)
...
>>> with client.get_file(("montage", "master"), "/montage2.png") as f:
...     content = f.read()
Methods
- close(): Closes the PFSFile.
- read([size]): Reads from the PFSFile buffer.
- class PFSMixin[source]
A mixin with pfs-related functionality.
Methods
- commit(repo_name, branch[, parent, ...]): A context manager for running operations within a commit.
- copy_file(source_commit, source_path, ...): Efficiently copies files already in PFS.
- create_branch(repo_name, branch_name[, ...]): Creates a new branch.
- create_project(project_name[, description, ...]): Creates a new project with the given name.
- create_repo(repo_name[, description, ...]): Creates a new repo object in PFS with the given name.
- Deletes all repos.
- delete_branch(repo_name, branch_name[, ...]): Deletes a branch, but leaves the commits themselves intact.
- delete_file(commit, path): Deletes a file from an open commit.
- delete_project(project_name[, force]): Deletes a project and reclaims the storage space it was using.
- delete_repo(repo_name[, force, project_name]): Deletes a repo and reclaims the storage space it was using.
- diff_file(new_commit, new_path[, ...]): Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different.
- drop_commit(commit_id): Drops an entire commit.
- find_commits(start, file_path[, limit]): Searches for commits that reference the specified file being modified in a branch.
- finish_commit(commit[, description, error, ...]): Ends the process of committing data to a repo and persists the commit.
- fsck([fix]): Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.
- get_file(commit, path[, datum, URL, offset]): Gets a file from PFS.
- get_file_tar(commit, path[, datum, URL, offset]): Gets a file from PFS.
- glob_file(commit, pattern[, path_range]): Lists files that match a glob pattern.
- inspect_branch(repo_name, branch_name[, ...]): Inspects a branch.
- inspect_commit(commit[, commit_state]): Inspects a commit.
- inspect_file(commit, path[, datum]): Inspects a file.
- inspect_project(project_name): Inspects a project.
- inspect_repo(repo_name[, project_name]): Inspects a repo.
- list_branch(repo_name[, reverse, project_name]): Lists the active branch objects in a repo.
- list_commit([repo_name, to_commit, ...]): Lists commits.
- list_file(commit, path[, datum, ...]): Lists the files in a directory.
- Lists all projects in PFS.
- list_repo([type, projects_filter]): Lists all repos in PFS.
- modify_file_client(commit): A context manager that gives a ModifyFileClient.
- path_exists(commit, path): Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.
- put_file_bytes(commit, path, value[, datum, ...]): Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.
- put_file_url(commit, path, url[, recursive, ...]): Uploads a PFS file using the content found at a URL.
- squash_commit(commit_id): Squashes a commit into its parent.
- start_commit(repo_name, branch[, parent, ...]): Begins the process of committing data to a repo.
- subscribe_commit(repo_name, branch[, ...]): Returns all commits on the branch and then listens for new commits that are created.
- wait_commit(commit): Waits for the specified commit to finish.
- walk_file(commit, path[, datum, ...]): Walks over all descendant files in a directory.
- commit(repo_name, branch, parent=None, description=None, project_name=None)[source]
A context manager for running operations within a commit.
- Parameters
- repo_name : str
The name of the repo.
- branch : str
A string specifying the branch.
- parent : Union[str, SubcommitType], optional
A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.
- description : str, optional
A description of the commit.
- project_name : str
The name of the project.
- Yields
- pfs_pb2.Commit
A protobuf object that represents a commit.
Examples
>>> with client.commit("foo", "master") as c:
...     client.delete_file(c, "/dir/delete_me.txt")
...     client.put_file_bytes(c, "/new_file.txt", b"DATA")
- copy_file(source_commit, source_path, dest_commit, dest_path, datum=None, append=False)[source]
Efficiently copies files already in PFS. Note that the destination repo cannot be an output repo, or the copy operation will silently fail.
- Parameters
- source_commit : SubcommitType
The subcommit (commit at the repo-level) which holds the source file.
- source_path : str
The path of the source file.
- dest_commit : SubcommitType
The open subcommit (commit at the repo-level) to which to add the file.
- dest_path : str
The path of the destination file.
- datum : str, optional
A tag for the added file.
- append : bool, optional
If true, appends the content of source_path to the file at dest_path if it already exists. Otherwise, overwrites the file.
Examples
The destination commit needs to be open still, either from the result of start_commit() or within the scope of commit().
>>> with client.commit("bar", "master") as c:
...     client.copy_file(("foo", "master"), "/src/file.txt", c, "/file.txt")
- create_branch(repo_name, branch_name, head_commit=None, provenance=None, trigger=None, new_commit=False, project_name=None)[source]
Creates a new branch.
- Parameters
- repo_name : str
The name of the repo.
- branch_name : str
The name of the new branch.
- head_commit : SubcommitType, optional
A subcommit (commit at repo-level) indicating the head of the new branch.
- provenance : List[pfs_pb2.Branch], optional
A list of branches to establish provenance with this newly created branch.
- trigger : pfs_pb2.Trigger, optional
Sets the conditions under which the head of this branch moves.
- new_commit : bool, optional
If true and head_commit is specified, uses a different commit ID for head than head_commit.
- project_name : str
The name of the project.
Examples
>>> client.create_branch(
...     "bar",
...     "master",
...     provenance=[
...         pfs_pb2.Branch(
...             repo=pfs_pb2.Repo(name="foo", type="user"), name="master"
...         )
...     ]
... )
- create_project(project_name, description=None, update=False)[source]
Creates a new project with the given name.
- Parameters
- project_name : str
Name of the project.
- description : str, optional
Description of the project.
- update : bool, optional
Whether to update if the project already exists.
- create_repo(repo_name, description=None, update=False, project_name=None)[source]
Creates a new repo object in PFS with the given name. Repos are the top-level data objects in PFS and should be used to store data of a similar type. For example, rather than having a single repo for an entire project, you might have separate repos for logs, metrics, database dumps, etc.
- Parameters
- repo_name : str
Name of the repo.
- description : str, optional
Description of the repo.
- update : bool, optional
Whether to update if the repo already exists.
- project_name : str
The name of the project.
- delete_branch(repo_name, branch_name, force=False, project_name=None)[source]
Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.
- Parameters
- repo_name : str
The name of the repo.
- branch_name : str
The name of the branch.
- force : bool, optional
If true, forces the branch deletion.
- project_name : str
The name of the project.
- delete_file(commit, path)[source]
Deletes a file from an open commit. This leaves a tombstone in the commit, assuming the file isn’t written to later while the commit is still open. Attempting to get the file from the finished commit will result in a not found error. The file will of course remain intact in the commit’s parent.
- Parameters
- commit : SubcommitType
The open subcommit (commit at the repo-level) to delete a file from.
- path : str
The path to the file.
Examples
The commit needs to be open still, either from the result of start_commit() or within the scope of commit().
>>> with client.commit("bar", "master") as c:
...     client.delete_file(c, "/delete_me.txt")
- delete_project(project_name, force=False)[source]
Deletes a project and reclaims the storage space it was using.
- Parameters
- project_name : str
The name of the project.
- force : bool, optional
If set to true, the project will be removed regardless of errors. Use with care.
- delete_repo(repo_name, force=False, project_name=None)[source]
Deletes a repo and reclaims the storage space it was using.
- Parameters
- repo_namestr
The name of the repo.
- forcebool, optional
If set to true, the repo will be removed regardless of errors. Use with care.
- project_namestr
The name of the project.
- diff_file(new_commit, new_path, old_commit=None, old_path=None, shallow=False)[source]
Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different. Similar to git diff. If old_commit or old_path are not specified, old_commit will be set to the parent of new_commit and old_path will be set to new_path.
- Parameters
- new_commitSubcommitType
The newer subcommit (commit at the repo-level).
- new_pathstr
The path in new_commit to compare with.
- old_commitSubcommitType, optional
The older subcommit (commit at the repo-level).
- old_pathstr, optional
The path in old_commit to compare with.
- shallowbool, optional
Unused.
- Returns
- Iterator[pfs_pb2.DiffFileResponse]
An iterator of protobuf objects that contain info on files whose content has changed between commits. If a file under one of the paths is only in one commit, then the DiffFileResponse for it will only have one FileInfo set.
Examples
>>> # Compare files
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/file.txt",
...     ("foo", "master~2"),
...     "/a/file.txt"
... )
>>> diff = next(res)
...
>>> # Compare files in directories
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/",
...     ("foo", "master~2"),
...     "/a/"
... )
>>> diff = next(res)
- drop_commit(commit_id)[source]
Drops an entire commit.
- Parameters
- commit_idstr
The ID of the commit.
- find_commits(start, file_path, limit=0)[source]
Searches for commits that reference the specified file being modified in a branch.
- Parameters
- startSubcommitType
The commit where the search should begin.
- file_pathstr
The path to the file being queried.
- limit: int, optional
The number of matching commits to return. (default no limit)
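Examples
A sketch, assuming a repo named foo containing a file /data.csv (both placeholder names):
>>> commits = list(client.find_commits(("foo", "master"), "/data.csv", limit=10))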
- finish_commit(commit, description=None, error=None, force=False)[source]
Ends the process of committing data to a repo and persists the commit. Once a commit is finished the data becomes immutable and future attempts to write to it with ModifyFile will error.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) object to close.
- descriptionstr, optional
A description of the commit. It will overwrite the description set in start_commit().
- errorstr, optional
If set, a message that errors out the commit. Don’t use unless you want the finish commit request to fail.
- forcebool, optional
If true, forces commit to finish, even if it breaks provenance.
Examples
Commit needs to be open still, either from the result of start_commit() or within scope of commit():
>>> client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(("foo", "master"))
...
>>> # same as above
>>> c = client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(c)
- fsck(fix=False)[source]
Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.
- Parameters
- fixbool, optional
If true, attempts to fix as many problems as possible.
- Returns
- Iterator[pfs_pb2.FsckResponse]
An iterator of protobuf objects that contain either the error that was encountered (and could not be fixed, if fix is set to True) or a fix message (if fix is set to True).
Examples
>>> for action in client.fsck(True):
...     print(action)
- get_file(commit, path, datum=None, URL=None, offset=0)[source]
Gets a file from PFS.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to get the file from.
- pathstr
The path of the file.
- datumstr, optional
A tag that filters the files.
- URLstr, optional
Specifies an object storage URL that the file will be uploaded to.
- offsetint, optional
Allows file read to begin at offset number of bytes.
- Returns
- PFSFile
The contents of the file in a file-like object.
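Examples
A sketch, assuming a repo foo with a file /file.txt on master (placeholder names); the returned PFSFile is file-like:
>>> f = client.get_file(("foo", "master"), "/file.txt")
>>> data = f.read()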
- get_file_tar(commit, path, datum=None, URL=None, offset=0)[source]
Gets a file from PFS as a tar stream.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to get the file from.
- pathstr
The path of the file.
- datumstr, optional
A tag that filters the files.
- URLstr, optional
Specifies an object storage URL that the file will be uploaded to.
- offsetint, optional
Allows file read to begin at offset number of bytes.
- Returns
- PFSTarFile
The contents of the file in a file-like object.
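Examples
A sketch, assuming a directory /dir/ exists in repo foo (placeholder names); the returned archive can be extracted like a tarfile.TarFile:
>>> with client.get_file_tar(("foo", "master"), "/dir/") as tar:
...     tar.extractall("/tmp/dir")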
- glob_file(commit, pattern, path_range=None)[source]
Lists files that match a glob pattern.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to query against.
- patternstr
A glob pattern.
- Returns
- Iterator[pfs_pb2.FileInfo]
An iterator of protobuf objects that contain info on files.
Examples
>>> files = list(client.glob_file(("foo", "master"), "/*.txt"))
- inspect_branch(repo_name, branch_name, project_name=None)[source]
Inspects a branch.
- Parameters
- repo_namestr
The name of the repo.
- branch_namestr
The name of the branch.
- project_namestr
The name of the project.
- Returns
- pfs_pb2.BranchInfo
A protobuf object with info on a branch.
- inspect_commit(commit, commit_state=1)[source]
Inspects a commit.
- Parameters
- commitUnion[str, SubcommitType]
The commit to inspect. Can either be a commit ID or a commit object that represents a subcommit (commit at the repo-level).
- commit_state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional
An enum that causes the method to block until the commit is in the specified state. (Default value = pfs_pb2.CommitState.STARTED)
- Returns
- Iterator[pfs_pb2.CommitInfo]
An iterator of protobuf objects that contain info on a subcommit (commit at the repo-level).
Examples
>>> # commit at repo-level
>>> list(client.inspect_commit(("foo", "master~2")))
...
>>> # an entire commit
>>> for commit in client.inspect_commit("467c580611234cdb8cc9758c7aa96087", pfs_pb2.CommitState.FINISHED):
...     print(commit)
- inspect_file(commit, path, datum=None)[source]
Inspects a file.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to inspect the file from.
- pathstr
The path of the file.
- datumstr, optional
A tag that filters the files.
- Returns
- pfs_pb2.FileInfo
A protobuf object that contains info on a file.
- inspect_project(project_name)[source]
Inspects a project.
- Parameters
- project_namestr
Name of the project.
- Returns
- pfs_proto.ProjectInfo
A protobuf object with info on the project.
- inspect_repo(repo_name, project_name=None)[source]
Inspects a repo.
- Parameters
- repo_namestr
Name of the repo.
- project_namestr
The name of the project.
- Returns
- pfs_pb2.RepoInfo
A protobuf object with info on the repo.
- list_branch(repo_name, reverse=False, project_name=None)[source]
Lists the active branch objects in a repo.
- Parameters
- repo_namestr
The name of the repo.
- reversebool, optional
If true, returns branches oldest to newest.
- project_namestr
The name of the project.
- Returns
- Iterator[pfs_pb2.BranchInfo]
An iterator of protobuf objects that contain info on a branch.
Examples
>>> branches = list(client.list_branch("foo"))
- list_commit(repo_name=None, to_commit=None, from_commit=None, number=None, reverse=False, all=False, origin_kind=1, started_time=None, project_name=None)[source]
Lists commits.
- Parameters
- repo_namestr, optional
The name of a repo. If set, returns subcommits (commit at repo-level) only in this repo.
- to_commitSubcommitType, optional
A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the ancestors of to_commit, including to_commit, are returned.
- from_commitSubcommitType, optional
A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the descendants of from_commit, including from_commit, are returned.
- numberint, optional
The number of subcommits to return. If unset, all subcommits that matched the aforementioned criteria are returned. Only impacts results if repo_name is specified.
- reversebool, optional
If true, returns the subcommits oldest to newest. Only impacts results if repo_name is specified.
- allbool, optional
If true, returns all types of subcommits. Otherwise, alias subcommits are excluded. Only impacts results if repo_name is specified.
- origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional
An enum that specifies how a subcommit originated. Returns only subcommits of this enum type. Only impacts results if repo_name is specified.
- started_timedatetime
- project_namestr, optional
The name of the project containing the repo.
- Returns
- Union[Iterator[pfs_pb2.CommitInfo], Iterator[pfs_pb2.CommitSetInfo]]
An iterator of protobuf objects that either contain info on a subcommit (commit at the repo-level), if repo_name was specified, or a commit, if repo_name wasn’t specified.
Examples
>>> # all commits at repo-level
>>> for c in client.list_commit("foo"):
...     print(c)
...
>>> # all commits
>>> commits = list(client.list_commit())
- list_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]
Lists the files in a directory.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to list files from.
- pathstr
The path to the directory.
- datumstr, optional
A tag that filters the files.
- pagination_marker:
Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.
- numberint, optional
Number of files to return.
- reversebool, optional
If true, returns files in reverse order.
- Returns
- Iterator[pfs_pb2.FileInfo]
An iterator of protobuf objects that contain info on files.
Examples
>>> files = list(client.list_file(("foo", "master"), "/dir/subdir/"))
- list_project()[source]
Lists all projects in PFS.
- Returns
- Iterator[pfs_proto.ProjectInfo]
An iterator of protobuf objects that contain info on a project.
- list_repo(type='user', projects_filter=None)[source]
Lists all repos in PFS.
- Parameters
- typestr, optional
The type of repos that should be returned (“user”, “meta”, “spec”). If unset, returns all types of repos.
- projects_filter[Project], optional
Filters out repos that do not belong in the list, while no projects means to list all repos
- Returns
- Iterator[pfs_pb2.RepoInfo]
An iterator of protobuf objects that contain info on a repo.
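Examples
For instance, to print the name of every repo visible to the client:
>>> for info in client.list_repo():
...     print(info.repo.name)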
- modify_file_client(commit)[source]
A context manager that yields a ModifyFileClient. When the context manager exits, any operations enqueued from the ModifyFileClient are executed in a single, atomic ModifyFile gRPC call.
- Parameters
- commitUnion[tuple, dict, Commit, pfs_pb2.Commit]
A subcommit (commit at the repo-level) to modify. If this subcommit is opened before modify_file_client() is called, it will remain open after. If modify_file_client() opens the subcommit, it will close when exiting the with scope.
- Yields
- ModifyFileClient
An object that can queue operations to modify a commit atomically.
Examples
On an open subcommit:
>>> c = client.start_commit("foo", "master")
>>> with client.modify_file_client(c) as mfc:
...     mfc.delete_file("/delete_me.txt")
...     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
>>> client.finish_commit(c)
Opening a subcommit:
>>> with client.modify_file_client(("foo", "master")) as mfc:
...     mfc.delete_file("/delete_me.txt")
...     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
- path_exists(commit, path)[source]
Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to check in.
- pathstr
The file or directory path in commit.
- Returns
- bool
Returns True if path exists in commit. Otherwise, returns False.
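Examples
A sketch; the repo name and path are placeholders:
>>> if client.path_exists(("foo", "master"), "/model/"):
...     print("model directory is present")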
- put_file_bytes(commit, path, value, datum=None, append=False)[source]
Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.
- Parameters
- commitSubcommitType
An open subcommit (commit at the repo-level) to modify.
- pathstr
The path in the repo the file(s) will be written to.
- valueUnion[bytes, BinaryIO]
The file contents as bytes, represented as a file-like object, bytestring, or iterator of bytestrings.
- datumstr, optional
A tag for the added file(s).
- appendbool, optional
If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.
Examples
Commit needs to be open still, either from the result of start_commit() or within scope of commit():
>>> with client.commit("foo", "master") as c:
...     client.put_file_bytes(c, "/file.txt", b"SOME BYTES")
- put_file_url(commit, path, url, recursive=False, datum=None, append=False, concurrency=0)[source]
Uploads a PFS file using the content found at a URL. The URL is sent to the server which performs the request.
- Parameters
- commitSubcommitType
An open subcommit (commit at the repo-level) to modify.
- pathstr
The path in the repo the file(s) will be written to.
- urlstr
The URL of the file to put.
- recursivebool, optional
If true, allows for recursive scraping on some types of URLs, for example on s3:// URLs.
- datumstr, optional
A tag for the added file(s).
- appendbool, optional
If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.
- concurrencyint
The maximum number of threads used to complete the request. Defaults to 50.
Examples
Commit needs to be open still, either from the result of start_commit() or within scope of commit():
>>> with client.commit("foo", "master") as c:
...     client.put_file_url(
...         c,
...         "/file.txt",
...         "https://example.com/data/train/input.txt"
...     )
- squash_commit(commit_id)[source]
Squashes a commit into its parent.
- Parameters
- commit_idstr
The ID of the commit.
- start_commit(repo_name, branch, parent=None, description=None, project_name=None)[source]
Begins the process of committing data to a repo. Once started you can write to the commit with ModifyFile. When all the data has been written, you must finish the commit with FinishCommit. NOTE: data is not persisted until FinishCommit is called.
- Parameters
- repo_namestr
The name of the repo.
- branchstr
A string specifying the branch.
- parentUnion[str, SubcommitType], optional
A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.
- descriptionstr, optional
A description of the commit.
- project_namestr
The name of the project.
- Returns
- pfs_pb2.Commit
A protobuf object that represents an open subcommit (commit at the repo-level).
Examples
>>> c = client.start_commit("foo", "master", ("foo", "staging"))
- subscribe_commit(repo_name, branch, from_commit=None, state=1, all=False, origin_kind=1, project_name=None)[source]
Returns all commits on the branch and then listens for new commits that are created.
- Parameters
- repo_namestr
The name of the repo.
- branchstr
The name of the branch.
- from_commitUnion[str, SubcommitType], optional
Return commits only from this commit and onwards. Can either be an entire commit or a subcommit (commit at the repo-level).
- state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional
Return commits only when they’re at least in the specified enum state. (Default value = pfs_pb2.CommitState.STARTED)
- allbool, optional
If true, returns all types of commits. Otherwise, alias commits are excluded.
- origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional
An enum that specifies how a commit originated. Returns only commits of this enum type. (Default value = pfs_pb2.OriginKind.USER)
- project_namestr
The name of the project.
- Returns
- Iterator[pfs_pb2.CommitInfo]
An iterator of protobuf objects that contain info on subcommits (commits at the repo-level). Use next() to iterate through, as the returned stream is potentially endless and might otherwise block your code.
Examples
>>> commits = client.subscribe_commit("foo", "master", state=pfs_pb2.CommitState.FINISHED)
>>> c = next(commits)
- wait_commit(commit)[source]
Waits for the specified commit to finish.
- Parameters
- commitUnion[str, SubcommitType]
A commit object to wait on. Can either be an entire commit or a subcommit (commit at the repo-level).
- Returns
- List[pfs_pb2.CommitInfo]
A list of protobuf objects that contain info on subcommits (commit at the repo-level). These are the individual subcommits this function waited on.
Examples
>>> # wait for an entire commit to finish
>>> subcommits = client.wait_commit("467c580611234cdb8cc9758c7aa96087")
...
>>> # wait for a commit to finish at a certain repo
>>> client.wait_commit(("foo", "master"))
- walk_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]
Walks over all descendant files in a directory.
- Parameters
- commitSubcommitType
The subcommit (commit at the repo-level) to walk files in.
- pathstr
The path to the directory.
- datumstr, optional
A tag that filters the files.
- pagination_marker:
Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.
- numberint, optional
Number of files to return.
- reversebool, optional
If true, returns files in reverse order.
- Returns
- Iterator[pfs_pb2.FileInfo]
An iterator of protobuf objects that contain info on files.
Examples
>>> files = list(client.walk_file(("foo", "master"), "/dir/subdir/"))
- class PFSTarFile(name=None, mode='r', fileobj=None, format=None, tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, errors='surrogateescape', pax_headers=None, debug=None, errorlevel=None, copybufsize=None)[source]
- Attributes
- errors
Methods
add(name[, arcname, recursive, filter])
Add the file `name' to the archive. `name' may be any type of file (directory, fifo, symbolic link, etc.). If given, `arcname' specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting `recursive' to False. `filter' is a function that expects a TarInfo object argument and returns the changed TarInfo object, if it returns None the TarInfo object will be excluded from the archive.
addfile(tarinfo[, fileobj])
Add the TarInfo object `tarinfo' to the archive. If `fileobj' is given, it should be a binary file, and tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects directly, or by using gettarinfo().
bz2open(name[, mode, fileobj, compresslevel])
Open bzip2 compressed tar archive name for reading or writing.
chmod(tarinfo, targetpath)
Set file permissions of targetpath according to tarinfo.
chown(tarinfo, targetpath, numeric_owner)
Set owner of targetpath according to tarinfo.
close()
Close the TarFile.
extract(member[, path, set_attrs, numeric_owner])
Extract a member from the archive to the current working directory, using its full name.
extractall([path, members, numeric_owner])
Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards.
extractfile(member)
Extract a member from the archive as a file object. `member' may be a filename or a TarInfo object. If `member' is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned.
fileobject
alias of tarfile.ExFileObject
getmember(name)
Return a TarInfo object for member `name'. If `name' can not be found in the archive, KeyError is raised. If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version.
getmembers()
Return the members of the archive as a list of TarInfo objects.
getnames()
Return the members of the archive as a list of their names.
gettarinfo([name, arcname, fileobj])
Create a TarInfo object from the result of os.stat or equivalent on an existing file. The file is either named by `name', or specified as a file object `fileobj' with a file descriptor. If given, `arcname' specifies an alternative name for the file in the archive, otherwise, the name is taken from the 'name' attribute of 'fileobj', or the 'name' argument. The name should be a text string.
gzopen(name[, mode, fileobj, compresslevel])
Open gzip compressed tar archive name for reading or writing.
list([verbose, members])
Print a table of contents to sys.stdout. If `verbose' is False, only the names of the members are printed. If it is True, an `ls -l'-like output is produced. `members' is optional and must be a subset of the list returned by getmembers().
makedev(tarinfo, targetpath)
Make a character or block device called targetpath.
makedir(tarinfo, targetpath)
Make a directory called targetpath.
makefifo(tarinfo, targetpath)
Make a fifo called targetpath.
makefile(tarinfo, targetpath)
Make a file called targetpath.
makelink(tarinfo, targetpath)
Make a (symbolic) link called targetpath.
makeunknown(tarinfo, targetpath)
Make a file from a TarInfo object with an unknown type at targetpath.
next()
Return the next member of the archive as a TarInfo object, when TarFile is opened for reading.
open([name, mode, fileobj, bufsize])
Open a tar archive for reading, writing or appending.
tarinfo
alias of tarfile.TarInfo
taropen(name[, mode, fileobj])
Open uncompressed tar archive name for reading or writing.
utime(tarinfo, targetpath)
Set modification time of targetpath according to tarinfo.
xzopen(name[, mode, fileobj, preset])
Open lzma compressed tar archive name for reading or writing.
python_pachyderm.mixin.pps
- class PPSMixin[source]
A mixin for pps-related functionality.
Methods
create_pipeline(pipeline_name, transform[, ...])
Creates a pipeline.
create_pipeline_from_request(req)
Creates a pipeline from a CreatePipelineRequest object.
create_secret(secret_name, data[, labels, ...])
Creates a new secret.
delete_all_pipelines()
Deletes all pipelines.
delete_job(job_id, pipeline_name[, project_name])
Deletes a subjob (job at the pipeline-level).
delete_pipeline(pipeline_name[, force, ...])
Deletes a pipeline.
delete_secret(secret_name)
Deletes a secret.
get_job_logs(pipeline_name, job_id[, ...])
Gets logs for a job.
get_kube_events(since)
Returns a stream of Kubernetes events.
get_pipeline_logs(pipeline_name[, ...])
Gets logs for a pipeline.
inspect_datum(pipeline_name, job_id, datum_id)
Inspects a datum.
inspect_job(job_id[, pipeline_name, wait, ...])
Inspects a job.
inspect_pipeline(pipeline_name[, history, ...])
Inspects a pipeline.
inspect_secret(secret_name)
Inspects a secret.
list_datum([pipeline_name, job_id, input, ...])
Lists datums.
list_job([pipeline_name, input_commit, ...])
Lists jobs.
list_pipeline([history, details, jqFilter, ...])
Lists pipelines.
list_secret()
Lists secrets.
query_loki(query[, since])
Returns a stream of Loki log messages given a query string.
restart_datum(pipeline_name, job_id[, ...])
Restarts a datum.
run_cron(pipeline_name[, project_name])
Triggers a cron pipeline to run now.
start_pipeline(pipeline_name[, project_name])
Starts a pipeline.
stop_job(job_id, pipeline_name[, reason, ...])
Stops a subjob (job at the pipeline-level).
stop_pipeline(pipeline_name[, project_name])
Stops a pipeline.
- create_pipeline(pipeline_name, transform, project_name=None, parallelism_spec=None, egress=None, reprocess_spec=None, update=False, output_branch_name=None, s3_out=False, resource_requests=None, resource_limits=None, sidecar_resource_limits=None, input=None, description=None, reprocess=False, service=None, datum_set_spec=None, datum_timeout=None, job_timeout=None, salt=None, datum_tries=3, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None, metadata=None, autoscaling=False, tolerations=None, sidecar_resource_requests=None, dry_run=False, determined=None)[source]
Creates a pipeline.
For info on the params, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html
- Parameters
- pipeline_namestr
The pipeline name.
- transformpps_pb2.Transform
The image and commands run during pipeline execution.
- project_namestr
The name of the project.
- parallelism_specpps_pb2.ParallelismSpec, optional
Specifies how the pipeline is parallelized.
- egresspps_pb2.Egress, optional
An external data store to publish the results of the pipeline to.
- reprocess_specstr, optional
Specifies how to handle already-processed datums.
- updatebool, optional
If true, updates the existing pipeline with new args.
- output_branch_namestr, optional
The branch name to output results on.
- s3_outbool, optional
If true, the output repo is exposed as an S3 gateway bucket.
- resource_requestspps_pb2.ResourceSpec, optional
The amount of resources that the pipeline workers will consume.
- resource_limits: pps_pb2.ResourceSpec, optional
The upper threshold of allowed resources a given worker can consume. If a worker exceeds this value, it will be evicted.
- sidecar_resource_limitspps_pb2.ResourceSpec, optional
The upper threshold of resources allocated to the sidecar containers.
- inputpps_pb2.Input, optional
The input repos to the pipeline. Commits to these repos will automatically trigger the pipeline to create new jobs to process them.
- descriptionstr, optional
A description of the pipeline.
- reprocessbool, optional
If true, forces the pipeline to reprocess all datums. Only has meaning if update is True.
- servicepps_pb2.Service, optional
Creates a Service pipeline instead of a normal pipeline.
- datum_set_specpps_pb2.DatumSetSpec, optional
Specifies how a pipeline should split its datums into datum sets.
- datum_timeoutduration_pb2.Duration, optional
The maximum execution time allowed for each datum.
- job_timeoutduration_pb2.Duration, optional
The maximum execution time allowed for a job.
- saltstr, optional
A tag for the pipeline.
- datum_triesint, optional
The number of times a job attempts to run on a datum when a failure occurs.
- scheduling_specpps_pb2.SchedulingSpec, optional
Specifies how the pods for a pipeline should be scheduled.
- pod_patchstr, optional
Allows one to set fields in the pod spec that haven’t been explicitly exposed in the rest of the pipeline spec.
- spoutpps_pb2.Spout, optional
Creates a Spout pipeline instead of a normal pipeline.
- spec_commitpfs_pb2.Commit, optional
A spec commit to base the pipeline spec from.
- metadatapps_pb2.Metadata, optional
Kubernetes labels and annotations to add as metadata to the pipeline pods.
- autoscalingbool, optional
If true, automatically scales the worker pool based on the datums it has to process.
- tolerations: List[pps_pb2.Toleration]
A list of Kubernetes tolerations to be applied to the worker pod.
- sidecar_resource_requestspps_pb2.ResourceSpec, optional
The amount of resources that the sidecar containers will consume.
Notes
If creating a Spout pipeline, when committing data to the repo, use commit methods (client.commit(), client.start_commit(), etc.) or ModifyFileClient methods (mfc.put_file_from_bytes(), mfc.delete_file(), etc.)
For other pipelines, when committing data to the repo, write out to /pfs/out/.
Examples
>>> client.create_pipeline(
...     "foo",
...     transform=pps_pb2.Transform(
...         cmd=["python3", "main.py"],
...         image="example/image",
...     ),
...     input=pps_pb2.Input(pfs=pps_pb2.PFSInput(
...         repo="foo",
...         branch="master",
...         glob="/*"
...     ))
... )
- create_pipeline_from_request(req)[source]
Creates a pipeline from a CreatePipelineRequest object. Usually used in conjunction with util.parse_json_pipeline_spec() or util.parse_dict_pipeline_spec().
- Parameters
- reqpps_pb2.CreatePipelineRequest
The CreatePipelineRequest object.
- create_secret(secret_name, data, labels=None, annotations=None)[source]
Creates a new secret.
- Parameters
- secret_namestr
The name of the secret.
- dataDict[str, Union[str, bytes]]
The data to store in the secret. Each key must consist of alphanumeric characters, -, _, or . (period).
- labelsDict[str, str], optional
Kubernetes labels to attach to the secret.
- annotationsDict[str, str], optional
Kubernetes annotations to attach to the secret.
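Examples
A sketch; the secret name and keys here are placeholders:
>>> client.create_secret(
...     "my-credentials",
...     {"ACCESS_KEY": "abc123", "SECRET_KEY": b"s3cr3t"},
... )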
- delete_job(job_id, pipeline_name, project_name=None)[source]
Deletes a subjob (job at the pipeline-level).
- Parameters
- job_idstr
The ID of the job.
- pipeline_namestr
The name of the pipeline.
- project_namestr
The name of the project.
- delete_pipeline(pipeline_name, force=False, keep_repo=False, project_name=None)[source]
Deletes a pipeline.
- Parameters
- pipeline_namestr
The name of the pipeline.
- forcebool, optional
If true, forces the pipeline deletion.
- keep_repobool, optional
If true, keeps the output repo.
- project_namestr
The name of the project.
- delete_secret(secret_name)[source]
Deletes a secret.
- Parameters
- secret_namestr
The name of the secret.
- get_job_logs(pipeline_name, job_id, project_name=None, data_filters=None, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]
Gets logs for a job.
- Parameters
- pipeline_namestr
The name of the pipeline.
- job_idstr
The ID of the job.
- project_namestr
The name of the project.
- data_filtersList[str], optional
A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).
- datumpps_pb2.Datum, optional
Filters log lines for the specified datum.
- followbool, optional
If true, continue to follow new logs as they appear.
- tailint, optional
If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.
- use_loki_backendbool, optional
If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.
- sinceduration_pb2.Duration, optional
Specifies how far in the past to return logs from.
- Returns
- Iterator[pps_pb2.LogMessage]
An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is set to True, use next() to iterate through, as the returned stream is potentially endless and might otherwise block your code.
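Examples
A sketch, assuming a pipeline foo and a known job ID (both placeholders):
>>> for msg in client.get_job_logs("foo", "467c580611234cdb8cc9758c7aa96087", follow=True):
...     print(msg.message)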
- get_pipeline_logs(pipeline_name, project_name=None, data_filters=None, master=False, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]
Gets logs for a pipeline.
- Parameters
- pipeline_namestr
The name of the pipeline.
- project_namestr
The name of the project.
- data_filtersList[str], optional
A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).
- masterbool, optional
If true, includes logs from the master.
- datumpps_pb2.Datum, optional
Filters log lines for the specified datum.
- followbool, optional
If true, continue to follow new logs as they appear.
- tailint, optional
If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.
- use_loki_backendbool, optional
If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.
- sinceduration_pb2.Duration, optional
Specifies how far in the past to return logs from.
- Returns
- Iterator[pps_pb2.LogMessage]
An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is set to True, use next() to iterate through, as the returned stream is potentially endless and might otherwise block your code.
- inspect_datum(pipeline_name, job_id, datum_id, project_name=None)[source]
Inspects a datum.
- Parameters
- pipeline_namestr
The name of the pipeline.
- job_idstr
The ID of the job.
- datum_idstr
The ID of the datum.
- project_namestr
The name of the project.
- Returns
- pps_pb2.DatumInfo
A protobuf object with info on the datum.
- inspect_job(job_id, pipeline_name=None, wait=False, details=False, project_name=None)[source]
Inspects a job.
- Parameters
- job_idstr
The ID of the job.
- pipeline_namestr, optional
The name of a pipeline.
- waitbool, optional
If true, wait until the job completes.
- detailsbool, optional
If true, return worker details.
- project_namestr
The name of the project.
- Returns
- Iterator[pps_pb2.JobInfo]
An iterator of protobuf objects that contain info on a subjob (jobs at the pipeline-level).
Examples
>>> # Look at all subjobs in a job
>>> subjobs = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087"))
...
>>> # Look at single subjob (job at the pipeline-level)
>>> subjob = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087", "foo"))[0]
- inspect_pipeline(pipeline_name, history=0, details=False, project_name=None)[source]
Inspects a pipeline.
- Parameters
- pipeline_namestr
The name of the pipeline.
- historyint, optional
Indicates to return historical versions of pipeline_name. Semantics are:
0: Return current version of pipeline_name
1: Return the above and pipeline_name from the next most recent version.
2: etc.
-1: Return all historical versions of pipeline_name.
- detailsbool, optional
If true, return pipeline details.
- project_namestr
The name of the project.
- Returns
- Iterator[pps_pb2.PipelineInfo]
An iterator of protobuf objects that contain info on a pipeline.
Examples
>>> pipeline = next(client.inspect_pipeline("foo"))
...
>>> for p in client.inspect_pipeline("foo", 2):
>>>     print(p)
- inspect_secret(secret_name)[source]
Inspects a secret.
- Parameters
- secret_namestr
The name of the secret.
- Returns
- pps_pb2.SecretInfo
A protobuf object with info on the secret.
- list_datum(pipeline_name=None, job_id=None, input=None, project_name=None, datum_filter=None, pagination_marker=None, number=None, reverse=False)[source]
Lists datums. Exactly one of (pipeline_name, job_id) (real) or input (hypothetical) must be set.
- Parameters
- pipeline_namestr, optional
The name of the pipeline.
- job_idstr, optional
The ID of a job.
- inputpps_pb2.Input, optional
A protobuf object that filters the datums returned. The datums listed are ones that would be run if a pipeline was created with the provided input.
- project_namestr
The name of the project.
- datum_filter: pps_proto.ListDatumRequest.Filter
Filter restricts returned DatumInfo messages to those which match all the filtered attributes.
- pagination_marker:
Marker for pagination. If set, the datums that come after the marker in lexicographical order will be returned. If reverse is also set, the datums that come before the marker in lexicographical order will be returned.
- numberint, optional
Number of datums to return
- reversebool, optional
If true, return datums in reverse order
- Returns
- Iterator[pps_pb2.DatumInfo]
An iterator of protobuf objects that contain info on a datum.
Examples
>>> # See hypothetical datums with specified input cross
>>> datums = list(client.list_datum(input=pps_pb2.Input(
...     pfs=pps_pb2.PFSInput(repo="foo", branch="master", glob="/*"),
...     cross=[
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="bar", branch="master", glob="/")),
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="baz", branch="master", glob="/*/*")),
...     ]
... )))
- list_job(pipeline_name=None, input_commit=None, history=0, details=False, jqFilter=None, project_name=None, projects_filter=None, pagination_marker=None, number=None, reverse=False)[source]
Lists jobs.
- Parameters
- pipeline_namestr, optional
The name of a pipeline. If set, returns subjobs (job at the pipeline-level) only from this pipeline.
- input_commitSubcommitType, optional
A commit or list of commits from the input repo to filter jobs on. Only impacts returned results if pipeline_name is specified.
- historyint, optional
Indicates to return jobs from historical versions of pipeline_name. Semantics are:
0: Return jobs from the current version of pipeline_name
1: Return the above and jobs from the next most recent version
2: etc.
-1: Return jobs from all historical versions of pipeline_name
- detailsbool, optional
If true, return pipeline details for pipeline_name. Leaving this None (or False) can make the call significantly faster in clusters with a large number of pipelines and jobs. Note that if input_commit is valid, this field is coerced to True.
- jqFilterstr, optional
A jq filter that can filter the list of jobs returned, only if pipeline_name is provided.
- project_namestr, optional
The name of the project containing the pipeline.
- projects_filter: List[str], optional
A list of projects to filter jobs on, None means don’t filter.
- pagination_marker:
Marker for pagination. If set, the jobs that come after the marker in lexicographical order will be returned. If reverse is also set, the jobs that come before the marker in lexicographical order will be returned.
- numberint, optional
Number of jobs to return
- reversebool, optional
If true, return jobs in reverse order
- Returns
- Union[Iterator[pps_pb2.JobInfo], Iterator[pps_pb2.JobSetInfo]]
An iterator of protobuf objects that either contain info on a subjob (job at the pipeline-level), if pipeline_name was specified, or a job, if pipeline_name wasn’t specified.
Examples
>>> # List all jobs
>>> jobs = list(client.list_job())
...
>>> # List all jobs at a pipeline-level
>>> subjobs = list(client.list_job("foo"))
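The pagination_marker, number, and reverse parameters describe cursor-style pagination. The sketch below shows the walk-all-pages pattern against a stub list_page function; the stub is hypothetical, and a real loop would call client.list_job with the same keywords, feeding the last returned item back in as the next marker:

```python
ITEMS = [f"job-{i:02d}" for i in range(7)]  # pretend server-side ordering

def list_page(pagination_marker=None, number=3, reverse=False):
    """Stub: return up to `number` items after (or, if reverse is set,
    before) the marker in lexicographical order."""
    items = list(reversed(ITEMS)) if reverse else ITEMS
    if pagination_marker is not None:
        items = items[items.index(pagination_marker) + 1:]
    return items[:number]

# Walk all pages by feeding the last item back in as the marker.
all_items, marker = [], None
while True:
    page = list_page(pagination_marker=marker, number=3)
    if not page:
        break
    all_items.extend(page)
    marker = page[-1]
```

An empty page signals the end of the listing; stopping only when a page is shorter than `number` would miss the case where the total is an exact multiple of the page size.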
- list_pipeline(history=0, details=False, jqFilter=None, commit_set=None, projects_filter=None)[source]
Lists pipelines.
- Parameters
- historyint, optional
Indicates to return historical versions of pipelines. Semantics are:
0: Return the current version of each pipeline
1: Return the above and the next most recent version of each pipeline.
2: etc.
-1: Return all historical versions of each pipeline.
- detailsbool, optional
If true, return pipeline details.
- jqFilterstr, optional
A jq filter that can filter the list of pipelines returned.
- commit_setpfs_pb2.CommitSet, optional
If set, returns all the pipeline infos at this commit set
- projects_filter: List[str], optional
A list of projects to filter pipelines on, None means don’t filter.
- Returns
- Iterator[pps_pb2.PipelineInfo]
An iterator of protobuf objects that contain info on a pipeline.
Examples
>>> pipelines = list(client.list_pipeline())
- list_secret()[source]
Lists secrets.
- Returns
- List[pps_pb2.SecretInfo]
A list of protobuf objects that contain info on a secret.
- query_loki(query, since=None)[source]
Returns a stream of loki log messages given a query string.
- Parameters
- querystr
The Loki query.
- sinceduration_pb2.Duration, optional
Return log messages more recent than “since”. (default now)
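The since argument is a duration_pb2.Duration. Assuming protobuf's Duration representation (an integer seconds field plus an integer nanos field), a sketch of splitting a datetime.timedelta into those two fields; in practice the well-known type also offers Duration.FromTimedelta, so this helper is only illustrative:

```python
from datetime import timedelta

def to_duration_fields(td):
    """Split a timedelta into the (seconds, nanos) pair that a
    protobuf duration_pb2.Duration carries."""
    total_ns = int(td.total_seconds() * 1_000_000_000)
    return total_ns // 1_000_000_000, total_ns % 1_000_000_000

# e.g. logs from the last 5.25 minutes
seconds, nanos = to_duration_fields(timedelta(minutes=5, milliseconds=250))
```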
- restart_datum(pipeline_name, job_id, data_filters=None, project_name=None)[source]
Restarts a datum.
- Parameters
- pipeline_namestr
The name of the pipeline.
- job_idstr
The ID of the job.
- data_filtersList[str], optional
A list of paths or hashes of datums that filter which datums are restarted.
- project_namestr
The name of the project.
- run_cron(pipeline_name, project_name=None)[source]
Triggers a cron pipeline to run now.
For more info on cron pipelines: https://docs.pachyderm.com/latest/concepts/pipeline-concepts/pipeline/cron/
- Parameters
- pipeline_namestr
The name of the pipeline.
- project_namestr
The name of the project.
- start_pipeline(pipeline_name, project_name=None)[source]
Starts a pipeline.
- Parameters
- pipeline_namestr
The name of the pipeline.
- project_namestr
The name of the project.
python_pachyderm.mixin.transaction
- class TransactionMixin[source]
A mixin for transaction-related functionality.
Methods
batch_transaction
(requests)Executes a batch transaction.
Deletes all transactions.
delete_transaction
(transaction)Deletes a transaction.
finish_transaction
(transaction)Finishes a transaction.
inspect_transaction
(transaction)Inspects a transaction.
Lists unfinished transactions.
Starts a transaction.
A context manager for running operations within a transaction.
- batch_transaction(requests)[source]
Executes a batch transaction.
- Parameters
- requestsList[transaction_pb2.TransactionRequest]
A list of
TransactionRequest
protobufs. Each protobuf must only have one field set.
- Returns
- transaction_pb2.TransactionInfo
A protobuf object with info on the transaction.
Examples
>>> # Deletes one repo and creates a branch in another repo atomically
>>> client.batch_transaction([
...     transaction_pb2.TransactionRequest(delete_repo=pfs_pb2.DeleteRepoRequest(repo=pfs_pb2.Repo(name="foo"))),
...     transaction_pb2.TransactionRequest(create_branch=pfs_pb2.CreateBranchRequest(branch=pfs_pb2.Branch(
...         repo=pfs_pb2.Repo(name="bar", type="user"), name="staging"
...     )))
... ])
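The one-field-set constraint on each TransactionRequest can be sketched with plain dicts standing in for the protobufs; check_one_field_set is a hypothetical helper, and a real check would inspect each request's ListFields():

```python
def check_one_field_set(requests):
    """Stand-in check that each request sets exactly one operation.
    Dicts model the TransactionRequest protos here; a real check
    would use request.ListFields() instead."""
    for i, req in enumerate(requests):
        set_fields = [k for k, v in req.items() if v is not None]
        if len(set_fields) != 1:
            raise ValueError(f"request {i} sets {len(set_fields)} fields")
    return True

ok = check_one_field_set([
    {"delete_repo": "foo", "create_branch": None},
    {"delete_repo": None, "create_branch": "bar@staging"},
])
```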
- delete_transaction(transaction)[source]
Deletes a transaction.
- Parameters
- transactionUnion[str, transaction_pb2.Transaction]
The ID or protobuf object representing the transaction.
Examples
>>> client.delete_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.finish_transaction("a3ak09467c580611234cdb8cc9758c7a")
>>> client.delete_transaction(transaction)
- finish_transaction(transaction)[source]
Finishes a transaction.
- Parameters
- transactionUnion[str, transaction_pb2.Transaction]
The ID or protobuf object representing the transaction.
- Returns
- transaction_pb2.TransactionInfo
A protobuf object with info on the transaction.
Examples
>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
- inspect_transaction(transaction)[source]
Inspects a transaction.
- Parameters
- transactionUnion[str, transaction_pb2.Transaction]
The ID or protobuf object representing the transaction.
- Returns
- transaction_pb2.TransactionInfo
A protobuf object with info on the transaction.
Examples
>>> transaction = client.inspect_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.inspect_transaction(transaction_protobuf)
- list_transaction()[source]
Lists unfinished transactions.
- Returns
- List[transaction_pb2.TransactionInfo]
A list of protobuf objects that contain info on a transaction.
- start_transaction()[source]
Starts a transaction.
- Returns
- transaction_pb2.Transaction
A protobuf object that represents the transaction.
Examples
>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
- transaction()[source]
A context manager for running operations within a transaction. When the context manager completes, the transaction will be deleted if an error occurred, or otherwise finished.
- Yields
- transaction_pb2.Transaction
A protobuf object that represents a transaction.
Examples
If a pipeline has two input repos, foo and bar, a transaction is useful for adding data to both atomically before the pipeline runs even once.
>>> with client.transaction() as t:
>>>     c1 = client.start_commit("foo", "master")
>>>     c2 = client.start_commit("bar", "master")
>>>
>>>     # File operations, such as put file, must occur outside
>>>     # the transaction block.
>>>     client.put_file_bytes(c1, "/joint_data.txt", b"DATA1")
>>>     client.put_file_bytes(c2, "/joint_data.txt", b"DATA2")
>>>
>>>     client.finish_commit(c1)
>>>     client.finish_commit(c2)
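The finish-on-success, delete-on-error behavior described above can be sketched with contextlib; FakeClient and this transaction function are stand-ins for illustration, not the mixin's actual implementation:

```python
from contextlib import contextmanager

class FakeClient:
    """Stand-in that records transaction lifecycle calls."""
    def __init__(self):
        self.calls = []
    def start_transaction(self):
        self.calls.append("start")
        return "txn-1"
    def finish_transaction(self, txn):
        self.calls.append("finish")
    def delete_transaction(self, txn):
        self.calls.append("delete")

@contextmanager
def transaction(client):
    txn = client.start_transaction()
    try:
        yield txn
    except Exception:
        client.delete_transaction(txn)   # roll back on error
        raise
    else:
        client.finish_transaction(txn)   # commit on success

client = FakeClient()
with transaction(client):
    pass  # transactional operations would go here
```

Placing the delete in the except branch (and re-raising) is what guarantees a failed block leaves no dangling open transaction behind.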