Information

Exposes a mixin for each Pachyderm service. These mixins should not be used directly; instead, use python_pachyderm.Client(). The mixins exist solely to keep the code organized: several focused mixins rather than one giant Client class.

python_pachyderm.mixin.admin

class AdminMixin[source]

A mixin for admin-related functionality.

Methods

inspect_cluster()

Inspects a cluster.

inspect_cluster()[source]

Inspects a cluster.

Returns
admin_pb2.ClusterInfo

A protobuf object with info on the cluster.
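
Examples

A minimal sketch, assuming a connected client:

>>> cluster_info = client.inspect_cluster()
>>> print(cluster_info)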

python_pachyderm.mixin.auth

class AuthMixin[source]

A mixin for auth-related functionality.

Methods

activate_auth([root_token])

Activates auth on the cluster.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authorize(resource[, permissions])

Tests a list of permissions that the user might have on a resource.

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

get_auth_configuration()

Gets the auth configuration.

get_groups()

Gets a list of groups this user belongs to.

get_oidc_login()

Gets the OIDC login configuration.

get_robot_token(robot[, ttl])

Gets a new auth token for a robot user.

get_role_binding(resource)

Returns the current set of role bindings to the resource specified.

get_roles_for_permission(permission)

Returns a list of all roles that have the specified permission.

get_users(group)

Gets users in a group.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

modify_role_binding(resource, principal[, roles])

Sets the roles for a given principal on a resource.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

who_am_i()

Returns info about the user tied to this Client.

activate_auth(root_token=None)[source]

Activates auth on the cluster. Returns the root token, an irrevocable superuser credential that should be stored securely.

Parameters
root_tokenstr, optional

If set, the token used as the root user login token. In general, it is safer to leave this unset and let Pachyderm generate one for you.

Returns
str

A token used as the root user login token.

authenticate_id_token(id_token)[source]

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider. The token must include the Pachyderm client_id in the set of audiences to be valid.

Parameters
id_tokenstr

The ID token.

Returns
str

A token that can be used for making authenticate requests.
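
Examples

A sketch, assuming id_token holds a token issued by your OIDC provider; assigning client.auth_token to apply the credential to subsequent requests is an assumption about the Client interface:

>>> token = client.authenticate_id_token(id_token)
>>> client.auth_token = token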

authenticate_oidc(oidc_state)[source]

Authenticates a user to the Pachyderm cluster via OIDC.

Parameters
oidc_statestr

An OIDC state token.

Returns
str

A token that can be used for making authenticate requests.

authorize(resource, permissions=None)[source]

Tests a list of permissions that the user might have on a resource.

Parameters
resourceauth_pb2.Resource

The resource the user wants to test on.

permissionsList[auth_pb2.Permission], optional

The list of permissions the users wants to test.

Returns
auth_pb2.AuthorizeResponse

A protobuf object that indicates whether the user/principal had all the given permissions on the resource, which permissions the user had, which permissions the user lacked, and the name of the principal.

Examples

>>> client.authorize(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     [auth_pb2.Permission.REPO_READ]
... )
authorized: true
satisfied: REPO_READ
principal: "pach:root"
deactivate_auth()[source]

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

Raises
AuthServiceNotActivated
extract_auth_tokens()[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

get_auth_configuration()[source]

Gets the auth configuration.

Returns
auth_pb2.OIDCConfig

A protobuf object with auth configuration information.

get_groups()[source]

Gets a list of groups this user belongs to.

Returns
List[str]

List of groups the user belongs to.

get_oidc_login()[source]

Gets the OIDC login configuration.

Returns
auth_pb2.GetOIDCLoginResponse

A protobuf object with the login configuration information.

get_robot_token(robot, ttl=None)[source]

Gets a new auth token for a robot user.

Parameters
robotstr

The name of the robot user.

ttlint, optional

The remaining lifetime of this token in seconds. If unset, token doesn’t expire.

Returns
str

The new auth token.
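
Examples

A sketch with a hypothetical robot name:

>>> token = client.get_robot_token("my-robot", ttl=3600)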

get_role_binding(resource)[source]

Returns the current set of role bindings to the resource specified.

Parameters
resourceauth_pb2.Resource

A protobuf object representing the resource being checked.

Returns
Dict[str, auth_pb2.Roles]

A dictionary mapping the principals to the roles they have.

Examples

>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.CLUSTER))
{
    'robot:someuser': roles {
        key: "clusterAdmin"
        value: true
    },
    'pach:root': roles {
        key: "clusterAdmin"
        value: true
    }
}
...
>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"))
{
    'user:person_a': roles {
        key: "repoWriter"
        value: true
    },
    'pach:root': roles {
        key: "repoOwner"
        value: true
    }
}
get_roles_for_permission(permission)[source]

Returns a list of all roles that have the specified permission.

Parameters
permissionauth_pb2.Permission

The Permission enum to check for.

Returns
List[auth_pb2.Role]

A list of Role protobuf objects that all have the specified permission.

Examples

All available permissions can be found in the auth proto Permission enum.

>>> roles = client.get_roles_for_permission(auth_pb2.Permission.REPO_READ)
get_users(group)[source]

Gets users in a group.

Parameters
groupstr

The group to list users from.

Returns
List[str]

All the users in the specified group.
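
Examples

Using the group name from the modify_members() example below:

>>> users = client.get_users("foogroup")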

modify_members(group, add=None, remove=None)[source]

Adds and/or removes members of a group.

Parameters
groupstr

The group to modify.

addList[str], optional

A list of strings specifying members to add.

removeList[str], optional

A list of strings specifying members to remove.

Examples

>>> client.modify_members(
...     "foogroup",
...     add=["user:otheruser"],
...     remove=["user:someuser"]
... )
modify_role_binding(resource, principal, roles=None)[source]

Sets the roles for a given principal on a resource.

Parameters
resourceauth_pb2.Resource

A protobuf object representing the resource to grant the roles on.

principalstr

The principal to grant the roles for.

rolesList[str], optional

The absolute list of roles for the principal to have. If roles is unset, the principal will have no roles.

Examples

You can find some of the built-in roles here: https://github.com/pachyderm/pachyderm/blob/master/src/auth/auth.go#L27

>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.CLUSTER),
...     "user:someuser",
...     roles=["clusterAdmin"]
... )
>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     "user:someuser",
...     roles=["repoWriter"]
... )
restore_auth_token(token=None)[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

revoke_auth_token(token)[source]

Revokes an auth token.

Parameters
tokenstr

The Pachyderm token being revoked.
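
Examples

A sketch that revokes a freshly issued robot token ("my-robot" is a placeholder):

>>> token = client.get_robot_token("my-robot")
>>> client.revoke_auth_token(token)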

set_auth_configuration(configuration)[source]

Sets the auth configuration.

Parameters
configurationauth_pb2.OIDCConfig

A protobuf object with auth configuration information.

Examples

>>> client.set_auth_configuration(auth_pb2.OIDCConfig(
...     issuer="http://localhost:1658",
...     client_id="client",
...     client_secret="secret",
...     redirect_uri="http://test.example.com",
... ))
set_groups_for_user(username, groups)[source]

Sets the group membership for a user.

Parameters
usernamestr

The username whose group membership will be set.

groupsList[str]

The groups to add the username to.

Examples

>>> client.set_groups_for_user("user:someuser", ["foogroup", "bargroup"])
who_am_i()[source]

Returns info about the user tied to this Client.

Returns
auth_pb2.WhoAmIResponse

A protobuf object that returns the username and expiration for the token used.
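
Examples

A minimal sketch; the username field follows from the WhoAmIResponse description above:

>>> resp = client.who_am_i()
>>> print(resp.username)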

python_pachyderm.mixin.debug

class DebugMixin[source]

A mixin for debug-related functionality.

Methods

binary([filter])

Gets the pachd binary.

dump([system, pipelines, input_repos, timeout])

Collect a standard set of debugging information using the DumpV2 API.

get_dump_template([filters])

Generate a template request to be used by the DumpV2 API.

profile_cpu(duration[, filter])

Gets a CPU profile.

set_log_level([pachyderm_level, grpc_level, ...])

Sets the logging level of either pachyderm or grpc. Note: Only one level can be set at a time. If you are attempting to set multiple logging levels you must do so with multiple calls.

binary(filter=None)[source]

Gets the pachd binary.

Parameters
filterdebug_pb2.Filter, optional

A protobuf object that filters what info is returned; the filter is one of a pachd bool, a pipeline protobuf, or a worker protobuf.

Yields
bytes

The pachd binary as a sequence of bytearrays.

Examples

>>> for b in client.binary():
>>>     print(b)
dump(system=None, pipelines=None, input_repos=False, timeout=None)[source]

Collect a standard set of debugging information using the DumpV2 API rather than the now-deprecated Dump API. This method is intended to be used in tandem with the debug.get_dump_v2_template endpoint; if no system or pipelines are specified, that call is performed automatically on the user's behalf, and debug information for all systems and pipelines is returned.

Parameters
systemdebug_pb2.System, optional

A protobuf object that filters what info is returned.

pipelinesList[pps_pb2.Pipeline], optional

A list of pipelines from which to collect debug information.

input_reposbool

Whether to collect debug information for input repos. Default: False

timeoutduration_pb2.Duration, optional

Duration until timeout occurs. Default is no timeout.

Yields
debug_pb2.DumpChunk

Chunks of the debug dump

Examples

>>> for b in client.dump():
>>>     print(b)
get_dump_template(filters=None)[source]

Generate a template request to be used by the DumpV2 API.

Parameters
filtersList[str], optional

No supported filters - this argument has no effect.

Returns
debug_pb2.DumpV2Request

The request that can be sent to the DumpV2 API.

profile_cpu(duration, filter=None)[source]

Gets a CPU profile.

Parameters
durationduration_pb2.Duration

A google protobuf duration object indicating how long the profile should run for.

filterdebug_pb2.Filter, optional

A protobuf object that filters what info is returned; the filter is one of a pachd bool, a pipeline protobuf, or a worker protobuf.

Yields
bytes

The cpu profile as a sequence of bytearrays.

Examples

>>> for b in client.profile_cpu(duration_pb2.Duration(seconds=1)):
>>>     print(b)
set_log_level(pachyderm_level=None, grpc_level=None, duration=None, recurse=True)[source]

Sets the logging level of either pachyderm or grpc. Note: Only one level can be set at a time. If you are attempting to set multiple logging levels you must do so with multiple calls.

Parameters
pachyderm_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof

The desired pachyderm logging level.

grpc_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof

The desired grpc logging level.

duration: duration_pb2.Duration, optional

How long to log at the non-default level. (default 5m0s)

recurse: bool

Set the log level on all pachyderm pods; if false, only the pachd that handles this RPC. (default true)
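
Examples

A sketch; the DEBUG value is an assumption about the debug_pb2.SetLogLevelRequest.LogLevel enum:

>>> client.set_log_level(pachyderm_level=debug_pb2.SetLogLevelRequest.LogLevel.DEBUG)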

python_pachyderm.mixin.enterprise

class EnterpriseMixin[source]

A mixin for enterprise-related functionality.

Methods

activate_enterprise(license_server, id, secret)

Activates enterprise by registering with a license server.

deactivate_enterprise()

Deactivates enterprise.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()

Gets the current enterprise state of the cluster.

get_pause_status()

Gets the pause status of the cluster.

pause_enterprise()

Pauses the cluster.

unpause_enterprise()

Unpauses the cluster.

activate_enterprise(license_server, id, secret)[source]

Activates enterprise by registering with a license server.

Parameters
license_serverstr

The Pachyderm Enterprise Server to register with.

idstr

The unique ID for this cluster.

secretstr

The secret for registering this cluster.

deactivate_enterprise()[source]

Deactivates enterprise.

get_activation_code()[source]

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

Returns
enterprise_pb2.GetActivationCodeResponse

A protobuf object that returns a state enum, info on the token, and the activation code.

get_enterprise_state()[source]

Gets the current enterprise state of the cluster.

Returns
enterprise_pb2.GetStateResponse

A protobuf object that returns a state enum, info on the token, and an empty activation code.

get_pause_status()[source]

Gets the pause status of the cluster.

Returns
enterprise_pb2.PauseStatusResponse

A protobuf object that returns a status enum.

pause_enterprise()[source]

Pauses the cluster.

unpause_enterprise()[source]

Unpauses the cluster.

python_pachyderm.mixin.health

class HealthMixin[source]

A mixin for health-related functionality.

Methods

health_check()

Returns a health check indicating if the server can handle RPCs.

health_check()[source]

Returns a health check indicating if the server can handle RPCs.

Returns
health_pb2.HealthCheckResponse

A protobuf object with a status enum indicating server health.
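
Examples

A minimal sketch; status is the enum described above:

>>> resp = client.health_check()
>>> print(resp.status)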

python_pachyderm.mixin.identity

class IdentityMixin[source]

A mixin for identity-related functionality.

Methods

create_idp_connector(connector)

Create an IDP connector in the identity server.

create_oidc_client(client)

Create an OIDC client in the identity server.

delete_all_identity()

Delete all identity service information.

delete_idp_connector(id)

Delete an IDP connector in the identity server.

delete_oidc_client(id)

Delete an OIDC client in the identity server.

get_identity_server_config()

Get the embedded identity server configuration.

get_idp_connector(id)

Get an IDP connector in the identity server.

get_oidc_client(id)

Get an OIDC client in the identity server.

list_idp_connectors()

List IDP connectors in the identity server.

list_oidc_clients()

List OIDC clients in the identity server.

set_identity_server_config(config)

Configure the embedded identity server.

update_idp_connector(connector)

Update an IDP connector in the identity server.

update_oidc_client(client)

Update an OIDC client in the identity server.

create_idp_connector(connector)[source]

Create an IDP connector in the identity server.

Parameters
connectoridentity_pb2.IDPConnector

A protobuf object that represents a connection to an identity provider.

create_oidc_client(client)[source]

Create an OIDC client in the identity server.

Parameters
clientidentity_pb2.OIDCClient

A protobuf object representing an OIDC client.

Returns
identity_pb2.OIDCClient

A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

delete_all_identity()[source]

Delete all identity service information.

Raises
AuthServiceNotActivated
delete_idp_connector(id)[source]

Delete an IDP connector in the identity server.

Parameters
idstr

The connector ID.

delete_oidc_client(id)[source]

Delete an OIDC client in the identity server.

Parameters
idstr

The client ID.

get_identity_server_config()[source]

Get the embedded identity server configuration.

Returns
identity_pb2.IdentityServerConfig

A protobuf object that holds configuration info (issuer and ID token expiry) for the identity web server.

get_idp_connector(id)[source]

Get an IDP connector in the identity server.

Parameters
idstr

The connector ID.

Returns
identity_pb2.IDPConnector

A protobuf object that returns info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.

get_oidc_client(id)[source]

Get an OIDC client in the identity server.

Parameters
idstr

The client ID.

Returns
identity_pb2.OIDCClient

A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

list_idp_connectors()[source]

List IDP connectors in the identity server.

Returns
List[identity_pb2.IDPConnector]

A list of protobuf objects that return info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.

list_oidc_clients()[source]

List OIDC clients in the identity server.

Returns
List[identity_pb2.OIDCClient]

A list of protobuf objects that return a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

set_identity_server_config(config)[source]

Configure the embedded identity server.

Parameters
configidentity_pb2.IdentityServerConfig

A protobuf object that is the configuration for the identity web server.

update_idp_connector(connector)[source]

Update an IDP connector in the identity server.

Parameters
connectoridentity_pb2.IDPConnector

A protobuf object that represents a connection to an identity provider.

update_oidc_client(client)[source]

Update an OIDC client in the identity server.

Parameters
clientidentity_pb2.OIDCClient

A protobuf object representing an OIDC client.

python_pachyderm.mixin.license

class LicenseMixin[source]

A mixin for license-related functionality.

Methods

activate_license(activation_code[, expires])

Activates the license service.

add_cluster(id, address[, secret, ...])

Register a cluster with the license service.

delete_all_license()

Remove all clusters and deactivate the license service.

delete_cluster(id)

Delete a cluster registered with the license service.

get_activation_code()

Gets the enterprise code used to activate the server.

list_clusters()

List clusters registered with the license service.

list_user_clusters()

Lists all clusters available to user.

update_cluster(id, address[, user_address, ...])

Update a cluster registered with the license service.

activate_license(activation_code, expires=None)[source]

Activates the license service.

Parameters
activation_codestr

A Pachyderm enterprise activation code. New users can obtain trial activation codes.

expirestimestamp_pb2.Timestamp, optional

A protobuf object indicating when this activation code will expire. This should generally not be set and is only applied if it is earlier than the signed expiration time of activation_code.

Returns
enterprise_pb2.TokenInfo

A protobuf object that has the expiration for the current token. Field will be unset if there is no current token.
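
Examples

A sketch with a placeholder activation code:

>>> token_info = client.activate_license("<activation-code>")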

add_cluster(id, address, secret=None, user_address=None, cluster_deployment_id=None, enterprise_server=False)[source]

Register a cluster with the license service.

Parameters
idstr

The unique ID to identify the cluster.

addressstr

The public GRPC address for the license server to reach the cluster.

secretstr, optional

A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.

user_addressstr, optional

The pachd address for users to connect to.

cluster_deployment_idstr, optional

The deployment ID value generated by each cluster.

enterprise_serverbool, optional

Indicates whether the address points to an enterprise server.

Returns
license_pb2.AddClusterResponse

A protobuf object that returns the secret.
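
Examples

A sketch with placeholder ID and address values; the secret field name is an assumption about license_pb2.AddClusterResponse:

>>> resp = client.add_cluster("new-cluster", "grpc://new-cluster-pachd:1653")
>>> secret = resp.secret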

delete_all_license()[source]

Remove all clusters and deactivate the license service.

Raises
AuthServiceNotActivated
delete_cluster(id)[source]

Delete a cluster registered with the license service.

Parameters
idstr

The unique ID to identify the cluster.

get_activation_code()[source]

Gets the enterprise code used to activate the server.

Returns
license_pb2.GetActivationCodeResponse

A protobuf object that returns a state enum, info on the token, and the activation code.

list_clusters()[source]

List clusters registered with the license service.

Returns
List[license_pb2.ClusterStatus]

A list of protobuf objects that return info on a cluster.

list_user_clusters()[source]

Lists all clusters available to user.

Returns
List[license_pb2.UserClusterInfo]

A list of protobuf objects that return info on clusters available to the users.

update_cluster(id, address, user_address=None, cluster_deployment_id=None, secret=None)[source]

Update a cluster registered with the license service.

Parameters
idstr

The unique ID to identify the cluster.

addressstr

The public GRPC address for the license server to reach the cluster.

user_addressstr, optional

The pachd address for users to connect to.

cluster_deployment_idstr, optional

The deployment ID value generated by each cluster.

secretstr, optional

A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.

python_pachyderm.mixin.pfs

class ModifyFileClient(commit)[source]

ModifyFileClient puts or deletes PFS files atomically. Replaces PutFileClient from python_pachyderm 6.x.

Methods

copy_file(source_commit, source_path, dest_path)

Copy a file.

delete_file(path[, datum])

Deletes a file.

put_file_from_bytes(path, value[, datum, append])

Uploads a PFS file from a bytestring.

put_file_from_fileobj(path, value[, datum, ...])

Uploads a PFS file from a file-like object.

put_file_from_filepath(pfs_path, local_path)

Uploads a PFS file from a local path at a specified path.

put_file_from_url(path, url[, datum, ...])

Uploads a PFS File from the content found at a URL.

copy_file(source_commit, source_path, dest_path, datum=None, append=False)[source]

Copy a file.

Parameters
source_commitSubcommitType

The commit the source file is in.

source_pathstr

The path to the source file.

dest_pathstr

The path to the destination file.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of the source file to the destination file, if it already exists. Otherwise, overwrites the file.

delete_file(path, datum=None)[source]

Deletes a file.

Parameters
pathstr

The path to the file.

datumstr, optional

A tag that filters the files.

put_file_from_bytes(path, value, datum=None, append=False)[source]

Uploads a PFS file from a bytestring.

Parameters
pathstr

The path in the repo the file will be written to.

valuebytes

The file contents as a bytestring.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of value to the file at path, if it already exists. Otherwise, overwrites the file.
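
Examples

A sketch using modify_file_client() (documented under PFSMixin) to obtain a ModifyFileClient:

>>> with client.modify_file_client(("foo", "master")) as mfc:
>>>     mfc.put_file_from_bytes("/greeting.txt", b"hello")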

put_file_from_fileobj(path, value, datum=None, append=False)[source]

Uploads a PFS file from a file-like object.

Parameters
pathstr

The path in the repo the file will be written to.

valueBinaryIO

The file-like object.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of value to the file at path, if it already exists. Otherwise, overwrites the file.

put_file_from_filepath(pfs_path, local_path, datum=None, append=False)[source]

Uploads a PFS file from a local path at a specified path. This will lazily open files, which will prevent too many files from being opened, or too much memory being consumed, when atomically putting many files.

Parameters
pfs_pathstr

The path in the repo the file will be written to.

local_pathstr

The local file path.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of local_path to the file at pfs_path, if it already exists. Otherwise, overwrites the file.

put_file_from_url(path, url, datum=None, append=False, recursive=False, concurrency=0)[source]

Uploads a PFS File from the content found at a URL. The URL is sent to the server which performs the request.

Parameters
pathstr

The path in the repo the file will be written to.

urlstr

The URL of the file to upload.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content to the file at path, if it already exists. Otherwise, overwrites the file.

recursivebool, optional

If true, allows for recursive scraping on some types of URLs, for example on s3:// URLs.

concurrencyint

The maximum number of threads used to complete the request. Defaults to 50.

class PFSFile(stream)[source]

File-like objects containing content of a file stored in PFS.

Examples

>>> # client.get_file() returns a PFSFile
>>> source_file = client.get_file(("montage", "master"), "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
>>>     shutil.copyfileobj(source_file, dest_file)
...
>>> with client.get_file(("montage", "master"), "/montage2.png") as f:
>>>     content = f.read()

Methods

close()

Closes the PFSFile.

read([size])

Reads from the PFSFile buffer.

close()[source]

Closes the PFSFile.

read(size=-1)[source]

Reads from the PFSFile buffer.

Parameters
sizeint, optional

If set, the number of bytes to read from the buffer.

Returns
bytes

Content from the stream.
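
Examples

A sketch reusing the montage repo from the class example above:

>>> with client.get_file(("montage", "master"), "/montage.png") as f:
>>>     header = f.read(8)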

class PFSMixin[source]

A mixin with pfs-related functionality.

Methods

commit(repo_name, branch[, parent, ...])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_project(project_name[, description, ...])

Creates a new project with the given name.

create_repo(repo_name[, description, ...])

Creates a new repo object in PFS with the given name.

delete_all_repos()

Deletes all repos.

delete_branch(repo_name, branch_name[, ...])

Deletes a branch, but leaves the commits themselves intact.

delete_file(commit, path)

Deletes a file from an open commit.

delete_project(project_name[, force])

Deletes a project and reclaims the storage space it was using.

delete_repo(repo_name[, force, project_name])

Deletes a repo and reclaims the storage space it was using.

diff_file(new_commit, new_path[, ...])

Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different.

drop_commit(commit_id)

Drops an entire commit.

find_commits(start, file_path[, limit])

Searches for commits that reference the specified file being modified in a branch.

finish_commit(commit[, description, error, ...])

Ends the process of committing data to a repo and persists the commit.

fsck([fix])

Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.

get_file(commit, path[, datum, URL, offset])

Gets a file from PFS.

get_file_tar(commit, path[, datum, URL, offset])

Gets a file from PFS.

glob_file(commit, pattern[, path_range])

Lists files that match a glob pattern.

inspect_branch(repo_name, branch_name[, ...])

Inspects a branch.

inspect_commit(commit[, commit_state])

Inspects a commit.

inspect_file(commit, path[, datum])

Inspects a file.

inspect_project(project_name)

Inspects a project.

inspect_repo(repo_name[, project_name])

Inspects a repo.

list_branch(repo_name[, reverse, project_name])

Lists the active branch objects in a repo.

list_commit([repo_name, to_commit, ...])

Lists commits.

list_file(commit, path[, datum, ...])

Lists the files in a directory.

list_project()

Lists all projects in PFS.

list_repo([type, projects_filter])

Lists all repos in PFS.

modify_file_client(commit)

A context manager that gives a ModifyFileClient.

path_exists(commit, path)

Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.

put_file_bytes(commit, path, value[, datum, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_url(commit, path, url[, recursive, ...])

Uploads a PFS file using the content found at a URL.

squash_commit(commit_id)

Squashes a commit into its parent.

start_commit(repo_name, branch[, parent, ...])

Begins the process of committing data to a repo.

subscribe_commit(repo_name, branch[, ...])

Returns all commits on the branch and then listens for new commits that are created.

wait_commit(commit)

Waits for the specified commit to finish.

walk_file(commit, path[, datum, ...])

Walks over all descendant files in a directory.

commit(repo_name, branch, parent=None, description=None, project_name=None)[source]

A context manager for running operations within a commit.

Parameters
repo_namestr

The name of the repo.

branchstr

A string specifying the branch.

parentUnion[str, SubcommitType], optional

A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.

descriptionstr, optional

A description of the commit.

project_namestr

The name of the project.

Yields
pfs_pb2.Commit

A protobuf object that represents a commit.

Examples

>>> with client.commit("foo", "master") as c:
>>>     client.delete_file(c, "/dir/delete_me.txt")
>>>     client.put_file_bytes(c, "/new_file.txt", b"DATA")
copy_file(source_commit, source_path, dest_commit, dest_path, datum=None, append=False)[source]

Efficiently copies files already in PFS. Note that the destination repo cannot be an output repo, or the copy operation will silently fail.

Parameters
source_commitSubcommitType

The subcommit (commit at the repo-level) which holds the source file.

source_pathstr

The path of the source file.

dest_commitSubcommitType

The open subcommit (commit at the repo-level) to which to add the file.

dest_pathstr

The path of the destination file.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of source_path to the file at dest_path, if it already exists. Otherwise, overwrites the file.

Examples

The destination commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("bar", "master") as c:
>>>     client.copy_file(("foo", "master"), "/src/file.txt", c, "/file.txt")
create_branch(repo_name, branch_name, head_commit=None, provenance=None, trigger=None, new_commit=False, project_name=None)[source]

Creates a new branch.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the new branch.

head_commitSubcommitType, optional

A subcommit (commit at repo-level) indicating the head of the new branch.

provenanceList[pfs_pb2.Branch], optional

A list of branches to establish provenance with this newly created branch.

triggerpfs_pb2.Trigger, optional

Sets the conditions under which the head of this branch moves.

new_commitbool, optional

If true and head_commit is specified, uses a different commit ID for head than head_commit.

project_namestr

The name of the project.

Examples

>>> client.create_branch(
...     "bar",
...     "master",
...     provenance=[
...         pfs_pb2.Branch(
...             repo=pfs_pb2.Repo(name="foo", type="user"), name="master"
...         )
...     ]
... )
create_project(project_name, description=None, update=False)[source]

Creates a new project with the given name.

Parameters
project_namestr

Name of the project.

descriptionstr, optional

Description of the project.

updatebool, optional

Whether to update if the project already exists.

create_repo(repo_name, description=None, update=False, project_name=None)[source]

Creates a new repo object in PFS with the given name. Repos are the top-level data objects in PFS and should be used to store data of a similar type. For example, rather than having a single repo for an entire project, you might have separate repos for logs, metrics, database dumps, etc.

Parameters
repo_namestr

Name of the repo.

descriptionstr, optional

Description of the repo.

updatebool, optional

Whether to update if the repo already exists.

project_namestr

The name of the project.
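
Examples

A minimal sketch:

>>> client.create_repo("foo", description="An example repo")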

delete_all_repos()[source]

Deletes all repos.

delete_branch(repo_name, branch_name, force=False, project_name=None)[source]

Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the branch.

forcebool, optional

If true, forces the branch deletion.

project_namestr

The name of the project.

delete_file(commit, path)[source]

Deletes a file from an open commit. This leaves a tombstone in the commit, assuming the file isn’t written to later while the commit is still open. Attempting to get the file from the finished commit will result in a not found error. The file will of course remain intact in the commit’s parent.

Parameters
commitSubcommitType

The open subcommit (commit at the repo-level) to delete a file from.

pathstr

The path to the file.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("bar", "master") as c:
>>>     client.delete_file(c, "/delete_me.txt")
delete_project(project_name, force=False)[source]

Deletes a project and reclaims the storage space it was using.

Parameters
project_namestr

The name of the project.

forcebool, optional

If set to true, the repo will be removed regardless of errors. Use with care.

delete_repo(repo_name, force=False, project_name=None)[source]

Deletes a repo and reclaims the storage space it was using.

Parameters
repo_namestr

The name of the repo.

forcebool, optional

If set to true, the repo will be removed regardless of errors. Use with care.

project_namestr

The name of the project.

diff_file(new_commit, new_path, old_commit=None, old_path=None, shallow=False)[source]

Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different. Similar to git diff.

If old_commit or old_path are not specified, old_commit will be set to the parent of new_commit and old_path will be set to new_path.

Parameters
new_commitSubcommitType

The newer subcommit (commit at the repo-level).

new_pathstr

The path in new_commit to compare with.

old_commitSubcommitType, optional

The older subcommit (commit at the repo-level).

old_pathstr, optional

The path in old_commit to compare with.

shallowbool, optional

Unused.

Returns
Iterator[pfs_pb2.DiffFileResponse]

An iterator of protobuf objects that contain info on files whose content has changed between commits. If a file under one of the paths is only in one commit, then the DiffFileResponse for it will only have one FileInfo set.

Examples

>>> # Compare files
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/file.txt",
...     ("foo", "master~2"),
...     "/a/file.txt"
... )
>>> diff = next(res)
...
>>> # Compare files in directories
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/",
...     ("foo", "master~2"),
...     "/a/"
... )
>>> diff = next(res)
drop_commit(commit_id)[source]

Drops an entire commit.

Parameters
commit_idstr

The ID of the commit.

find_commits(start, file_path, limit=0)[source]

Searches for commits that reference the specified file being modified in a branch.

Parameters
startSubcommitType

The commit where the search should begin.

file_pathstr

The path to the file being queried.

limit: int, optional

The number of matching commits to return. (default no limit)

finish_commit(commit, description=None, error=None, force=False)[source]

Ends the process of committing data to a repo and persists the commit. Once a commit is finished the data becomes immutable and future attempts to write to it with ModifyFile will error.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) object to close.

descriptionstr, optional

A description of the commit. It will overwrite the description set in start_commit().

errorstr, optional

If set, a message that errors out the commit. Don’t use unless you want the finish commit request to fail.

forcebool, optional

If true, forces commit to finish, even if it breaks provenance.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(("foo", "master"))
...
>>> # same as above
>>> c = client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(c)
fsck(fix=False)[source]

Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.

Parameters
fixbool, optional

If true, attempts to fix as many problems as possible.

Returns
Iterator[pfs_pb2.FsckResponse]

An iterator of protobuf objects that contain info on either what error was encountered (and was unable to be fixed, if fix is set to True) or a fix message (if fix is set to True).

Examples

>>> for action in client.fsck(True):
>>>     print(action)
get_file(commit, path, datum=None, URL=None, offset=0)[source]

Gets a file from PFS.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to get the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

URLstr, optional

Specifies an object storage URL that the file will be uploaded to.

offsetint, optional

Allows file read to begin at offset number of bytes.

Returns
PFSFile

The contents of the file in a file-like object.
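
Examples

A minimal sketch:

>>> f = client.get_file(("foo", "master"), "/file.txt")
>>> content = f.read()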

get_file_tar(commit, path, datum=None, URL=None, offset=0)[source]

Gets a file from PFS.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to get the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

URLstr, optional

Specifies an object storage URL that the file will be uploaded to.

offsetint, optional

Allows file read to begin at offset number of bytes.

Returns
PFSFile

The contents of the file in a file-like object.
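
Examples

A sketch; it assumes the returned object supports the tarfile-style extractall() described under PFSTarFile below:

>>> with client.get_file_tar(("foo", "master"), "/dir/") as tar:
>>>     tar.extractall("/tmp/out")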

glob_file(commit, pattern, path_range=None)[source]

Lists files that match a glob pattern.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to query against.

patternstr

A glob pattern.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.glob_file(("foo", "master"), "/*.txt"))
inspect_branch(repo_name, branch_name, project_name=None)[source]

Inspects a branch.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the branch.

project_namestr

The name of the project.

Returns
pfs_pb2.BranchInfo

A protobuf object with info on a branch.

inspect_commit(commit, commit_state=1)[source]

Inspects a commit.

Parameters
commitUnion[str, SubcommitType]

The commit to inspect. Can either be a commit ID or a commit object that represents a subcommit (commit at the repo-level).

commit_state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional

An enum that causes the method to block until the commit is in the specified state. (Default value = pfs_pb2.CommitState.STARTED)

Returns
Iterator[pfs_pb2.CommitInfo]

An iterator of protobuf objects that contain info on a subcommit (commit at the repo-level).

Examples

>>> # commit at repo-level
>>> list(client.inspect_commit(("foo", "master~2")))
...
>>> # an entire commit
>>> for commit in client.inspect_commit("467c580611234cdb8cc9758c7aa96087", pfs_pb2.CommitState.FINISHED):
>>>     print(commit)
inspect_file(commit, path, datum=None)[source]

Inspects a file.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to inspect the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

Returns
pfs_pb2.FileInfo

A protobuf object that contains info on a file.

inspect_project(project_name)[source]

Inspects a project.

Parameters
project_namestr

Name of the project.

Returns
pfs_proto.ProjectInfo

A protobuf object with info on the project.

inspect_repo(repo_name, project_name=None)[source]

Inspects a repo.

Parameters
repo_namestr

Name of the repo.

project_namestr

The name of the project.

Returns
pfs_pb2.RepoInfo

A protobuf object with info on the repo.

list_branch(repo_name, reverse=False, project_name=None)[source]

Lists the active branch objects in a repo.

Parameters
repo_namestr

The name of the repo.

reversebool, optional

If true, returns branches oldest to newest.

project_namestr

The name of the project.

Returns
Iterator[pfs_pb2.BranchInfo]

An iterator of protobuf objects that contain info on a branch.

Examples

>>> branches = list(client.list_branch("foo"))
list_commit(repo_name=None, to_commit=None, from_commit=None, number=None, reverse=False, all=False, origin_kind=1, started_time=None, project_name=None)[source]

Lists commits.

Parameters
repo_namestr, optional

The name of a repo. If set, returns subcommits (commit at repo-level) only in this repo.

to_commitSubcommitType, optional

A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the ancestors of to_commit, including to_commit, are returned.

from_commitSubcommitType, optional

A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the descendants of from_commit, including from_commit, are returned.

numberint, optional

The number of subcommits to return. If unset, all subcommits that matched the aforementioned criteria are returned. Only impacts results if repo_name is specified.

reversebool, optional

If true, returns the subcommits oldest to newest. Only impacts results if repo_name is specified.

allbool, optional

If true, returns all types of subcommits. Otherwise, alias subcommits are excluded. Only impacts results if repo_name is specified.

origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional

An enum that specifies how a subcommit originated. Returns only subcommits of this enum type. Only impacts results if repo_name is specified.

started_timedatetime
project_namestr, optional

The name of the project containing the repo.

Returns
Union[Iterator[pfs_pb2.CommitInfo], Iterator[pfs_pb2.CommitSetInfo]]

An iterator of protobuf objects that either contain info on a subcommit (commit at the repo-level), if repo_name was specified, or a commit, if repo_name wasn’t specified.

Examples

>>> # all commits at repo-level
>>> for c in client.list_commit("foo"):
>>>     print(c)
...
>>> # all commits
>>> commits = list(client.list_commit())
list_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]

Lists the files in a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to list files from.

pathstr

The path to the directory.

datumstr, optional

A tag that filters the files.

pagination_marker:

Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.

numberint, optional

Number of files to return.

reversebool, optional

If true, returns files in reverse order.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.list_file(("foo", "master"), "/dir/subdir/"))
list_project()[source]

Lists all projects in PFS.

Returns
Iterator[pfs_proto.ProjectInfo]

An iterator of protobuf objects that contain info on a project.
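
Examples

A minimal sketch:

>>> projects = list(client.list_project())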

list_repo(type='user', projects_filter=None)[source]

Lists all repos in PFS.

Parameters
typestr, optional

The type of repos that should be returned (“user”, “meta”, “spec”). If unset, returns all types of repos.

projects_filter[Project], optional

Filters out repos that do not belong to the given projects; if no projects are specified, repos from all projects are listed.

Returns
Iterator[pfs_pb2.RepoInfo]

An iterator of protobuf objects that contain info on a repo.

modify_file_client(commit)[source]

A context manager that gives a ModifyFileClient. When the context manager exits, any operations enqueued from the ModifyFileClient are executed in a single, atomic ModifyFile gRPC call.

Parameters
commitUnion[tuple, dict, Commit, pfs_pb2.Commit]

A subcommit (commit at the repo-level) to modify. If this subcommit is opened before modify_file_client() is called, it will remain open after. If modify_file_client() opens the subcommit, it will close when exiting the with scope.

Yields
ModifyFileClient

An object that can queue operations to modify a commit atomically.

Examples

On an open subcommit:

>>> c = client.start_commit("foo", "master")
>>> with client.modify_file_client(c) as mfc:
>>>     mfc.delete_file("/delete_me.txt")
>>>     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
>>> client.finish_commit(c)

Opening a subcommit:

>>> with client.modify_file_client(("foo", "master")) as mfc:
>>>     mfc.delete_file("/delete_me.txt")
>>>     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
path_exists(commit, path)[source]

Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to check in.

pathstr

The file or directory path in commit.

Returns
bool

Returns True if path exists in commit. Otherwise, returns False.
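
Examples

A minimal sketch:

>>> if client.path_exists(("foo", "master"), "/dir/file.txt"):
>>>     print("file.txt exists")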

put_file_bytes(commit, path, value, datum=None, append=False)[source]

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

Parameters
commitSubcommitType

An open subcommit (commit at the repo-level) to modify.

pathstr

The path in the repo the file(s) will be written to.

valueUnion[bytes, BinaryIO]

The file contents as bytes, represented as a file-like object, bytestring, or iterator of bytestrings.

datumstr, optional

A tag for the added file(s).

appendbool, optional

If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("foo", "master") as c:
>>>     client.put_file_bytes(c, "/file.txt", b"SOME BYTES")
put_file_url(commit, path, url, recursive=False, datum=None, append=False, concurrency=0)[source]

Uploads a PFS file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commitSubcommitType

An open subcommit (commit at the repo-level) to modify.

pathstr

The path in the repo the file(s) will be written to.

urlstr

The URL of the file to put.

recursivebool, optional

If true, allows for recursive scraping on some types of URLs, for example on s3:// URLs.

datumstr, optional

A tag for the added file(s).

appendbool, optional

If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.

concurrencyint

The maximum number of threads used to complete the request. Defaults to 50.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("foo", "master") as c:
>>>     client.put_file_url(
...         c,
...         "/file.txt",
...         "https://example.com/data/train/input.txt"
...     )
squash_commit(commit_id)[source]

Squashes a commit into its parent.

Parameters
commit_idstr

The ID of the commit.
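
Examples

Reusing the commit ID from the wait_commit() example above:

>>> client.squash_commit("467c580611234cdb8cc9758c7aa96087")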

start_commit(repo_name, branch, parent=None, description=None, project_name=None)[source]

Begins the process of committing data to a repo. Once started you can write to the commit with ModifyFile. When all the data has been written, you must finish the commit with FinishCommit. NOTE: data is not persisted until FinishCommit is called.

Parameters
repo_namestr

The name of the repo.

branchstr

A string specifying the branch.

parentUnion[str, SubcommitType], optional

A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.

descriptionstr, optional

A description of the commit.

project_namestr

The name of the project.

Returns
pfs_pb2.Commit

A protobuf object that represents an open subcommit (commit at the repo-level).

Examples

>>> c = client.start_commit("foo", "master", ("foo", "staging"))
subscribe_commit(repo_name, branch, from_commit=None, state=1, all=False, origin_kind=1, project_name=None)[source]

Returns all commits on the branch and then listens for new commits that are created.

Parameters
repo_namestr

The name of the repo.

branchstr

The name of the branch.

from_commitUnion[str, SubcommitType], optional

Return commits only from this commit and onwards. Can either be an entire commit or a subcommit (commit at the repo-level).

state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional

Return commits only when they're at least in the specified enum state. (Default value = pfs_pb2.CommitState.STARTED)

allbool, optional

If true, returns all types of commits. Otherwise, alias commits are excluded.

origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional

An enum that specifies how a commit originated. Returns only commits of this enum type. (Default value = pfs_pb2.OriginKind.USER)

project_namestr

The name of the project.

Returns
Iterator[pfs_pb2.CommitInfo]

An iterator of protobuf objects that contain info on subcommits (commits at the repo-level). Use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.

Examples

>>> commits = client.subscribe_commit("foo", "master", state=pfs_pb2.CommitState.FINISHED)
>>> c = next(commits)
wait_commit(commit)[source]

Waits for the specified commit to finish.

Parameters
commitUnion[str, SubcommitType]

A commit object to wait on. Can either be an entire commit or a subcommit (commit at the repo-level).

Returns
List[pfs_pb2.CommitInfo]

A list of protobuf objects that contain info on subcommits (commit at the repo-level). These are the individual subcommits this function waited on.

Examples

>>> # wait for an entire commit to finish
>>> subcommits = client.wait_commit("467c580611234cdb8cc9758c7aa96087")
...
>>> # wait for a commit to finish at a certain repo
>>> client.wait_commit(("foo", "master"))
walk_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]

Walks over all descendant files in a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to walk files in.

pathstr

The path to the directory.

datumstr, optional

A tag that filters the files.

pagination_marker:

Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.

numberint, optional

Number of files to return.

reversebool, optional

If true, returns files in reverse order.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.walk_file(("foo", "master"), "/dir/subdir/"))
class PFSTarFile(name=None, mode='r', fileobj=None, format=None, tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, errors='surrogateescape', pax_headers=None, debug=None, errorlevel=None, copybufsize=None)[source]
Attributes
errors

Methods

add(name[, arcname, recursive, filter])

Add the file `name' to the archive. `name' may be any type of file (directory, fifo, symbolic link, etc.). If given, `arcname' specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting `recursive' to False. `filter' is a function that expects a TarInfo object argument and returns the changed TarInfo object, if it returns None the TarInfo object will be excluded from the archive.

addfile(tarinfo[, fileobj])

Add the TarInfo object `tarinfo' to the archive. If `fileobj' is given, it should be a binary file, and tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects directly, or by using gettarinfo().

bz2open(name[, mode, fileobj, compresslevel])

Open bzip2 compressed tar archive name for reading or writing.

chmod(tarinfo, targetpath)

Set file permissions of targetpath according to tarinfo.

chown(tarinfo, targetpath, numeric_owner)

Set owner of targetpath according to tarinfo.

close()

Close the TarFile.

extract(member[, path, set_attrs, numeric_owner])

Extract a member from the archive to the current working directory, using its full name.

extractall([path, members, numeric_owner])

Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards.

extractfile(member)

Extract a member from the archive as a file object. `member' may be a filename or a TarInfo object. If `member' is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned.

fileobject

alias of tarfile.ExFileObject

getmember(name)

Return a TarInfo object for member `name'. If `name' can not be found in the archive, KeyError is raised. If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version.

getmembers()

Return the members of the archive as a list of TarInfo objects.

getnames()

Return the members of the archive as a list of their names.

gettarinfo([name, arcname, fileobj])

Create a TarInfo object from the result of os.stat or equivalent on an existing file. The file is either named by `name', or specified as a file object `fileobj' with a file descriptor. If given, `arcname' specifies an alternative name for the file in the archive, otherwise, the name is taken from the 'name' attribute of 'fileobj', or the 'name' argument. The name should be a text string.

gzopen(name[, mode, fileobj, compresslevel])

Open gzip compressed tar archive name for reading or writing.

list([verbose, members])

Print a table of contents to sys.stdout. If `verbose' is False, only the names of the members are printed. If it is True, an `ls -l'-like output is produced. `members' is optional and must be a subset of the list returned by getmembers().

makedev(tarinfo, targetpath)

Make a character or block device called targetpath.

makedir(tarinfo, targetpath)

Make a directory called targetpath.

makefifo(tarinfo, targetpath)

Make a fifo called targetpath.

makefile(tarinfo, targetpath)

Make a file called targetpath.

makelink(tarinfo, targetpath)

Make a (symbolic) link called targetpath.

makeunknown(tarinfo, targetpath)

Make a file from a TarInfo object with an unknown type at targetpath.

next()

Return the next member of the archive as a TarInfo object, when TarFile is opened for reading.

open([name, mode, fileobj, bufsize])

Open a tar archive for reading, writing or appending.

tarinfo

alias of tarfile.TarInfo

taropen(name[, mode, fileobj])

Open uncompressed tar archive name for reading or writing.

utime(tarinfo, targetpath)

Set modification time of targetpath according to tarinfo.

xzopen(name[, mode, fileobj, preset])

Open lzma compressed tar archive `name' for reading or writing.

transaction_incompatible(pfs_method)[source]

Decorator for marking methods of the PFS API which are not allowed to occur during a transaction.
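
Examples

A hypothetical sketch of what such a guard could look like; this is not the library’s actual implementation, and the _transaction_id attribute is invented purely for illustration:

>>> import functools
>>> def transaction_incompatible(pfs_method):
...     @functools.wraps(pfs_method)
...     def wrapper(self, *args, **kwargs):
...         # _transaction_id is a hypothetical attribute, for illustration only
...         if getattr(self, "_transaction_id", None) is not None:
...             raise RuntimeError(f"{pfs_method.__name__} cannot run inside a transaction")
...         return pfs_method(self, *args, **kwargs)
...     return wrapper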

python_pachyderm.mixin.pps

class PPSMixin[source]

A mixin for pps-related functionality.

Methods

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

delete_all_pipelines()

Deletes all pipelines.

delete_job(job_id, pipeline_name[, project_name])

Deletes a subjob (job at the pipeline-level).

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_secret(secret_name)

Deletes a secret.

get_job_logs(pipeline_name, job_id[, ...])

Gets logs for a job.

get_kube_events(since)

Returns a stream of Kubernetes events.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

inspect_datum(pipeline_name, job_id, datum_id)

Inspects a datum.

inspect_job(job_id[, pipeline_name, wait, ...])

Inspects a job.

inspect_pipeline(pipeline_name[, history, ...])

Inspects a pipeline.

inspect_secret(secret_name)

Inspects a secret.

list_datum([pipeline_name, job_id, input, ...])

Lists datums.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_pipeline([history, details, jqFilter, ...])

Lists pipelines.

list_secret()

Lists secrets.

query_loki(query[, since])

Returns a stream of loki log messages given a query string.

restart_datum(pipeline_name, job_id[, ...])

Restarts a datum.

run_cron(pipeline_name[, project_name])

Triggers a cron pipeline to run now.

start_pipeline(pipeline_name[, project_name])

Starts a pipeline.

stop_job(job_id, pipeline_name[, reason, ...])

Stops a subjob (job at the pipeline-level).

stop_pipeline(pipeline_name[, project_name])

Stops a pipeline.

create_pipeline(pipeline_name, transform, project_name=None, parallelism_spec=None, egress=None, reprocess_spec=None, update=False, output_branch_name=None, s3_out=False, resource_requests=None, resource_limits=None, sidecar_resource_limits=None, input=None, description=None, reprocess=False, service=None, datum_set_spec=None, datum_timeout=None, job_timeout=None, salt=None, datum_tries=3, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None, metadata=None, autoscaling=False, tolerations=None, sidecar_resource_requests=None, dry_run=False, determined=None)[source]

Creates a pipeline.

For info on the params, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_namestr

The pipeline name.

transformpps_pb2.Transform

The image and commands run during pipeline execution.

project_namestr

The name of the project.

parallelism_specpps_pb2.ParallelismSpec, optional

Specifies how the pipeline is parallelized.

egresspps_pb2.Egress, optional

An external data store to publish the results of the pipeline to.

reprocess_specstr, optional

Specifies how to handle already-processed datums.

updatebool, optional

If true, updates the existing pipeline with new args.

output_branch_namestr, optional

The branch name to output results on.

s3_outbool, optional

If true, the output repo is exposed as an S3 gateway bucket.

resource_requestspps_pb2.ResourceSpec, optional

The amount of resources that the pipeline workers will consume.

resource_limitspps_pb2.ResourceSpec, optional

The upper threshold of allowed resources a given worker can consume. If a worker exceeds this value, it will be evicted.

sidecar_resource_limitspps_pb2.ResourceSpec, optional

The upper threshold of resources allocated to the sidecar containers.

inputpps_pb2.Input, optional

The input repos to the pipeline. Commits to these repos will automatically trigger the pipeline to create new jobs to process them.

descriptionstr, optional

A description of the pipeline.

reprocessbool, optional

If true, forces the pipeline to reprocess all datums. Only has meaning if update is True.

servicepps_pb2.Service, optional

Creates a Service pipeline instead of a normal pipeline.

datum_set_specpps_pb2.DatumSetSpec, optional

Specifies how a pipeline should split its datums into datum sets.

datum_timeoutduration_pb2.Duration, optional

The maximum execution time allowed for each datum.

job_timeoutduration_pb2.Duration, optional

The maximum execution time allowed for a job.

saltstr, optional

A tag for the pipeline.

datum_triesint, optional

The number of times a job attempts to run on a datum when a failure occurs.

scheduling_specpps_pb2.SchedulingSpec, optional

Specifies how the pods for a pipeline should be scheduled.

pod_patchstr, optional

Allows one to set fields in the pod spec that haven’t been explicitly exposed in the rest of the pipeline spec.

spoutpps_pb2.Spout, optional

Creates a Spout pipeline instead of a normal pipeline.

spec_commitpfs_pb2.Commit, optional

A spec commit to base the pipeline spec from.

metadatapps_pb2.Metadata, optional

Kubernetes labels and annotations to add as metadata to the pipeline pods.

autoscalingbool, optional

If true, automatically scales the worker pool based on the datums it has to process.

tolerationsList[pps_pb2.Toleration], optional

A list of Kubernetes tolerations to be applied to the worker pod.

sidecar_resource_requestspps_pb2.ResourceSpec, optional

The amount of resources that the sidecar containers will consume.

Notes

If creating a Spout pipeline, when committing data to the repo, use commit methods (client.commit(), client.start_commit(), etc.) or ModifyFileClient methods (mfc.put_file_from_bytes(), mfc.delete_file(), etc.).

For other pipelines, when committing data to the repo, write out to /pfs/out/.

Examples

>>> client.create_pipeline(
...     "foo",
...     transform=pps_pb2.Transform(
...         cmd=["python3", "main.py"],
...         image="example/image",
...     ),
...     input=pps_pb2.Input(pfs=pps_pb2.PFSInput(
...         repo="foo",
...         branch="master",
...         glob="/*"
...     ))
... )
create_pipeline_from_request(req)[source]

Creates a pipeline from a CreatePipelineRequest object. Usually used in conjunction with util.parse_json_pipeline_spec() or util.parse_dict_pipeline_spec().

Parameters
reqpps_pb2.CreatePipelineRequest

The CreatePipelineRequest object.
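
Examples

A minimal sketch, assuming the illustrative pipeline spec dict below; util.parse_dict_pipeline_spec() is the helper mentioned above:

>>> spec = {
...     "pipeline": {"name": "foo"},
...     "transform": {"cmd": ["python3", "main.py"], "image": "example/image"},
...     "input": {"pfs": {"repo": "foo", "glob": "/*"}},
... }
>>> req = python_pachyderm.util.parse_dict_pipeline_spec(spec)
>>> client.create_pipeline_from_request(req)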

create_secret(secret_name, data, labels=None, annotations=None)[source]

Creates a new secret.

Parameters
secret_namestr

The name of the secret.

dataDict[str, Union[str, bytes]]

The data to store in the secret. Each key must consist only of alphanumeric characters, dashes (-), underscores (_), or dots (.).

labelsDict[str, str], optional

Kubernetes labels to attach to the secret.

annotationsDict[str, str], optional

Kubernetes annotations to attach to the secret.
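
Examples

A minimal sketch; the secret name and data are illustrative:

>>> client.create_secret("example-secret", {"password": "hunter2"})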

delete_all_pipelines()[source]

Deletes all pipelines.

delete_job(job_id, pipeline_name, project_name=None)[source]

Deletes a subjob (job at the pipeline-level).

Parameters
job_idstr

The ID of the job.

pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

delete_pipeline(pipeline_name, force=False, keep_repo=False, project_name=None)[source]

Deletes a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

forcebool, optional

If true, forces the pipeline deletion.

keep_repobool, optional

If true, keeps the output repo.

project_namestr

The name of the project.

delete_secret(secret_name)[source]

Deletes a secret.

Parameters
secret_namestr

The name of the secret.

get_job_logs(pipeline_name, job_id, project_name=None, data_filters=None, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]

Gets logs for a job.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

project_namestr

The name of the project.

data_filtersList[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

datumpps_pb2.Datum, optional

Filters log lines for the specified datum.

followbool, optional

If true, continue to follow new logs as they appear.

tailint, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backendbool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

sinceduration_pb2.Duration, optional

Specifies how far in the past to return logs from.

Returns
Iterator[pps_pb2.LogMessage]

An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is True, use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.
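
Examples

A minimal sketch; the pipeline name and job ID are illustrative:

>>> for msg in client.get_job_logs("foo", "467c580611234cdb8cc9758c7aa96087"):
...     print(msg.message)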

get_kube_events(since)[source]

Returns a stream of Kubernetes events.

Parameters
sinceduration_pb2.Duration

Specifies how far in the past to return events from.
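
Examples

A minimal sketch, assuming a one-hour window:

>>> from google.protobuf import duration_pb2
>>> for event in client.get_kube_events(duration_pb2.Duration(seconds=3600)):
...     print(event)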

get_pipeline_logs(pipeline_name, project_name=None, data_filters=None, master=False, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]

Gets logs for a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

data_filtersList[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

masterbool, optional

If true, includes logs from the master.

datumpps_pb2.Datum, optional

Filters log lines for the specified datum.

followbool, optional

If true, continue to follow new logs as they appear.

tailint, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backendbool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

sinceduration_pb2.Duration, optional

Specifies how far in the past to return logs from.

Returns
Iterator[pps_pb2.LogMessage]

An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is True, use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.
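
Examples

A minimal sketch; the pipeline name is illustrative:

>>> for msg in client.get_pipeline_logs("foo", tail=10):
...     print(msg.message)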

inspect_datum(pipeline_name, job_id, datum_id, project_name=None)[source]

Inspects a datum.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

datum_idstr

The ID of the datum.

project_namestr

The name of the project.

Returns
pps_pb2.DatumInfo

A protobuf object with info on the datum.

inspect_job(job_id, pipeline_name=None, wait=False, details=False, project_name=None)[source]

Inspects a job.

Parameters
job_idstr

The ID of the job.

pipeline_namestr, optional

The name of a pipeline.

waitbool, optional

If true, wait until the job completes.

detailsbool, optional

If true, return worker details.

project_namestr

The name of the project.

Returns
Iterator[pps_pb2.JobInfo]

An iterator of protobuf objects that contain info on a subjob (jobs at the pipeline-level).

Examples

>>> # Look at all subjobs in a job
>>> subjobs = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087"))
...
>>> # Look at single subjob (job at the pipeline-level)
>>> subjob = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087", "foo"))[0]
inspect_pipeline(pipeline_name, history=0, details=False, project_name=None)[source]

Inspects a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

historyint, optional

Indicates to return historical versions of pipeline_name. Semantics are:

  • 0: Return current version of pipeline_name

  • 1: Return the above and pipeline_name from the next most recent version.

  • 2: etc.

  • -1: Return all historical versions of pipeline_name.

detailsbool, optional

If true, return pipeline details.

project_namestr

The name of the project.

Returns
Iterator[pps_pb2.PipelineInfo]

An iterator of protobuf objects that contain info on a pipeline.

Examples

>>> pipeline = next(client.inspect_pipeline("foo"))
...
>>> for p in client.inspect_pipeline("foo", 2):
>>>     print(p)
inspect_secret(secret_name)[source]

Inspects a secret.

Parameters
secret_namestr

The name of the secret.

Returns
pps_pb2.SecretInfo

A protobuf object with info on the secret.

list_datum(pipeline_name=None, job_id=None, input=None, project_name=None, datum_filter=None, pagination_marker=None, number=None, reverse=False)[source]

Lists datums. Exactly one of (pipeline_name, job_id) (for real datums from a job) or input (for hypothetical datums) must be set.

Parameters
pipeline_namestr, optional

The name of the pipeline.

job_idstr, optional

The ID of a job.

inputpps_pb2.Input, optional

A protobuf object that filters the datums returned. The datums listed are ones that would be run if a pipeline was created with the provided input.

project_namestr

The name of the project.

datum_filterpps_pb2.ListDatumRequest.Filter, optional

Restricts the returned DatumInfo messages to those that match all of the filtered attributes.

pagination_marker, optional

Marker for pagination. If set, the datums that come after the marker in lexicographical order will be returned. If reverse is also set, the datums that come before the marker in lexicographical order will be returned.

numberint, optional

The number of datums to return.

reversebool, optional

If true, return datums in reverse order.

Returns
Iterator[pps_pb2.DatumInfo]

An iterator of protobuf objects that contain info on a datum.

Examples

>>> # See hypothetical datums with specified input cross
>>> datums = list(client.list_datum(input=pps_pb2.Input(
...     pfs=pps_pb2.PFSInput(repo="foo", branch="master", glob="/*"),
...     cross=[
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="bar", branch="master", glob="/")),
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="baz", branch="master", glob="/*/*")),
...     ]
... )))
list_job(pipeline_name=None, input_commit=None, history=0, details=False, jqFilter=None, project_name=None, projects_filter=None, pagination_marker=None, number=None, reverse=False)[source]

Lists jobs.

Parameters
pipeline_namestr, optional

The name of a pipeline. If set, returns subjobs (job at the pipeline-level) only from this pipeline.

input_commitSubcommitType, optional

A commit or list of commits from the input repo to filter jobs on. Only impacts returned results if pipeline_name is specified.

historyint, optional

Indicates to return jobs from historical versions of pipeline_name. Semantics are:

  • 0: Return jobs from the current version of pipeline_name

  • 1: Return the above and jobs from the next most recent version

  • 2: etc.

  • -1: Return jobs from all historical versions of pipeline_name

detailsbool, optional

If true, return pipeline details for pipeline_name. Leaving this None (or False) can make the call significantly faster in clusters with a large number of pipelines and jobs. Note that if input_commit is valid, this field is coerced to True.

jqFilterstr, optional

A jq filter that can filter the list of jobs returned; applied only if pipeline_name is provided.

project_namestr, optional

The name of the project containing the pipeline.

projects_filterList[str], optional

A list of projects to filter jobs on; None means don’t filter.

pagination_marker, optional

Marker for pagination. If set, the jobs that come after the marker in lexicographical order will be returned. If reverse is also set, the jobs that come before the marker in lexicographical order will be returned.

numberint, optional

The number of jobs to return.

reversebool, optional

If true, return jobs in reverse order.

Returns
Union[Iterator[pps_pb2.JobInfo], Iterator[pps_pb2.JobSetInfo]]

An iterator of protobuf objects that either contain info on a subjob (job at the pipeline-level), if pipeline_name was specified, or a job, if pipeline_name wasn’t specified.

Examples

>>> # List all jobs
>>> jobs = list(client.list_job())
...
>>> # List all jobs at a pipeline-level
>>> subjobs = list(client.list_job("foo"))
list_pipeline(history=0, details=False, jqFilter=None, commit_set=None, projects_filter=None)[source]

Lists pipelines.

Parameters
historyint, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current versions of pipelines.

  • 1: Return the above and the next most recent version of each pipeline.

  • 2: etc.

  • -1: Return all historical versions of pipelines.

detailsbool, optional

If true, return pipeline details.

jqFilterstr, optional

A jq filter that can filter the list of pipelines returned.

commit_setpfs_pb2.CommitSet, optional

If not None, returns all the pipeline infos at this commit set.

projects_filterList[str], optional

A list of projects to filter pipelines on; None means don’t filter.

Returns
Iterator[pps_pb2.PipelineInfo]

An iterator of protobuf objects that contain info on a pipeline.

Examples

>>> pipelines = list(client.list_pipeline())
list_secret()[source]

Lists secrets.

Returns
List[pps_pb2.SecretInfo]

A list of protobuf objects that contain info on a secret.

query_loki(query, since=None)[source]

Returns a stream of loki log messages given a query string.

Parameters
querystr

The Loki query.

sinceduration_pb2.Duration, optional

Return log messages more recent than since (default: now).
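
Examples

A minimal sketch; the LogQL query string is illustrative:

>>> for msg in client.query_loki('{app="pipeline"}'):
...     print(msg)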

restart_datum(pipeline_name, job_id, data_filters=None, project_name=None)[source]

Restarts a datum.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

data_filtersList[str], optional

A list of paths or hashes of datums that filter which datums are restarted.

project_namestr

The name of the project.

run_cron(pipeline_name, project_name=None)[source]

Triggers a cron pipeline to run now.

For more info on cron pipelines: https://docs.pachyderm.com/latest/concepts/pipeline-concepts/pipeline/cron/

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

start_pipeline(pipeline_name, project_name=None)[source]

Starts a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

stop_job(job_id, pipeline_name, reason=None, project_name=None)[source]

Stops a subjob (job at the pipeline-level).

Parameters
job_idstr

The ID of the job.

pipeline_namestr

The name of the pipeline.

reasonstr, optional

A reason for stopping the job.

project_namestr

The name of the project.

stop_pipeline(pipeline_name, project_name=None)[source]

Stops a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

python_pachyderm.mixin.transaction

class TransactionMixin[source]

A mixin for transaction-related functionality.

Methods

batch_transaction(requests)

Executes a batch transaction.

delete_all_transactions()

Deletes all transactions.

delete_transaction(transaction)

Deletes a transaction.

finish_transaction(transaction)

Finishes a transaction.

inspect_transaction(transaction)

Inspects a transaction.

list_transaction()

Lists unfinished transactions.

start_transaction()

Starts a transaction.

transaction()

A context manager for running operations within a transaction.

batch_transaction(requests)[source]

Executes a batch transaction.

Parameters
requestsList[transaction_pb2.TransactionRequest]

A list of TransactionRequest protobufs. Each protobuf must only have one field set.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> # Deletes one repo and creates a branch in another repo atomically
>>> client.batch_transaction([
...     transaction_pb2.TransactionRequest(delete_repo=pfs_pb2.DeleteRepoRequest(repo=pfs_pb2.Repo(name="foo"))),
...     transaction_pb2.TransactionRequest(create_branch=pfs_pb2.CreateBranchRequest(branch=pfs_pb2.Branch(
...         repo=pfs_pb2.Repo(name="bar", type="user"), name="staging"
...     )))
... ])
delete_all_transactions()[source]

Deletes all transactions.

delete_transaction(transaction)[source]

Deletes a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Examples

>>> client.delete_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.finish_transaction("a3ak09467c580611234cdb8cc9758c7a")
>>> client.delete_transaction(transaction)
finish_transaction(transaction)[source]

Finishes a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
inspect_transaction(transaction)[source]

Inspects a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> transaction = client.inspect_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.inspect_transaction(transaction_protobuf)
list_transaction()[source]

Lists unfinished transactions.

Returns
List[transaction_pb2.TransactionInfo]

A list of protobuf objects that contain info on a transaction.
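
Examples

A minimal sketch:

>>> for info in client.list_transaction():
...     print(info.transaction.id)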

start_transaction()[source]

Starts a transaction.

Returns
transaction_pb2.Transaction

A protobuf object that represents the transaction.

Examples

>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
transaction()[source]

A context manager for running operations within a transaction. When the context manager exits, the transaction is finished if no error occurred; otherwise it is deleted.

Yields
transaction_pb2.Transaction

A protobuf object that represents a transaction.

Examples

If a pipeline has two input repos, foo and bar, a transaction is useful for adding data to both atomically before the pipeline runs even once.

>>> with client.transaction() as t:
>>>     c1 = client.start_commit("foo", "master")
>>>     c2 = client.start_commit("bar", "master")
>>>
>>> # File operations, such as put file, must occur outside
>>> #  the transaction block.
>>> client.put_file_bytes(c1, "/joint_data.txt", b"DATA1")
>>> client.put_file_bytes(c2, "/joint_data.txt", b"DATA2")
>>>
>>> client.finish_commit(c1)
>>> client.finish_commit(c2)

python_pachyderm.mixin.version

class VersionMixin[source]

A mixin for version-related functionality.

Methods

get_remote_version()

Gets version of Pachyderm server.

get_remote_version()[source]

Gets version of Pachyderm server.

Returns
version_pb2.Version

A protobuf object with info on the pachd version.
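
Examples

A minimal sketch, assuming the standard major/minor/micro fields on the Version message:

>>> version = client.get_remote_version()
>>> print(f"{version.major}.{version.minor}.{version.micro}")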