Information

Exposes a mixin for each Pachyderm service. These mixins should not be used directly; instead, use python_pachyderm.Client(). The mixins exist solely to keep the code organized: several focused mixins rather than one giant Client class.

python_pachyderm.mixin.admin

class AdminMixin[source]

A mixin for admin-related functionality.

Methods

inspect_cluster()

Inspects a cluster.

inspect_cluster()[source]

Inspects a cluster.

Returns
admin_pb2.ClusterInfo

A protobuf object with info on the cluster.
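
Examples

A minimal sketch, assuming a connected client:

>>> cluster_info = client.inspect_cluster()
>>> print(cluster_info)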

python_pachyderm.mixin.auth

class AuthMixin[source]

A mixin for auth-related functionality.

Methods

activate_auth([root_token])

Activates auth on the cluster.

authenticate_id_token(id_token)

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider.

authenticate_oidc(oidc_state)

Authenticates a user to the Pachyderm cluster via OIDC.

authorize(resource[, permissions])

Tests a list of permissions that the user might have on a resource.

deactivate_auth()

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

extract_auth_tokens()

This maps to an internal function that is only used for migration.

get_auth_configuration()

Gets the auth configuration.

get_groups()

Gets a list of groups this user belongs to.

get_oidc_login()

Gets the OIDC login configuration.

get_robot_token(robot[, ttl])

Gets a new auth token for a robot user.

get_role_binding(resource)

Returns the current set of role bindings to the resource specified.

get_roles_for_permission(permission)

Returns a list of all roles that have the specified permission.

get_users(group)

Gets users in a group.

modify_members(group[, add, remove])

Adds and/or removes members of a group.

modify_role_binding(resource, principal[, roles])

Sets the roles for a given principal on a resource.

restore_auth_token([token])

This maps to an internal function that is only used for migration.

revoke_auth_token(token)

Revokes an auth token.

set_auth_configuration(configuration)

Sets the auth configuration.

set_groups_for_user(username, groups)

Sets the group membership for a user.

who_am_i()

Returns info about the user tied to this Client.

activate_auth(root_token=None)[source]

Activates auth on the cluster. Returns the root token, an irrevocable superuser credential that should be stored securely.

Parameters
root_tokenstr, optional

If set, the token used as the root user login token. In general, it is safer to leave this unset and let Pachyderm generate one for you.

Returns
str

A token used as the root user login token.

authenticate_id_token(id_token)[source]

Authenticates a user to the Pachyderm cluster using an ID token issued by the OIDC provider. The token must include the Pachyderm client_id in the set of audiences to be valid.

Parameters
id_tokenstr

The ID token.

Returns
str

A token that can be used for making authenticate requests.
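
Examples

A sketch, assuming id_token holds a token issued by your OIDC provider; assigning client.auth_token to apply the credential to subsequent requests is an assumption about the Client interface:

>>> token = client.authenticate_id_token(id_token)
>>> client.auth_token = token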

authenticate_oidc(oidc_state)[source]

Authenticates a user to the Pachyderm cluster via OIDC.

Parameters
oidc_statestr

An OIDC state token.

Returns
str

A token that can be used for making authenticate requests.

authorize(resource, permissions=None)[source]

Tests a list of permissions that the user might have on a resource.

Parameters
resourceauth_pb2.Resource

The resource the user wants to test on.

permissionsList[auth_pb2.Permission], optional

The list of permissions the users wants to test.

Returns
auth_pb2.AuthorizeResponse

A protobuf object that indicates whether the user/principal had all the given permissions on the resource, which permissions the user had, which permissions the user lacked, and the name of the principal.

Examples

>>> client.authorize(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     [auth_pb2.Permission.REPO_READ]
... )
authorized: true
satisfied: REPO_READ
principal: "pach:root"
deactivate_auth()[source]

Deactivates auth, removing all ACLs, tokens, and admins from the Pachyderm cluster and making all data publicly accessible.

Raises
AuthServiceNotActivated
extract_auth_tokens()[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

get_auth_configuration()[source]

Gets the auth configuration.

Returns
auth_pb2.OIDCConfig

A protobuf object with auth configuration information.

get_groups()[source]

Gets a list of groups this user belongs to.

Returns
List[str]

List of groups the user belongs to.

get_oidc_login()[source]

Gets the OIDC login configuration.

Returns
auth_pb2.GetOIDCLoginResponse

A protobuf object with the login configuration information.

get_robot_token(robot, ttl=None)[source]

Gets a new auth token for a robot user.

Parameters
robotstr

The name of the robot user.

ttlint, optional

The remaining lifetime of this token in seconds. If unset, token doesn’t expire.

Returns
str

The new auth token.
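
Examples

A sketch with a hypothetical robot name:

>>> token = client.get_robot_token("my-robot", ttl=3600)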

get_role_binding(resource)[source]

Returns the current set of role bindings to the resource specified.

Parameters
resourceauth_pb2.Resource

A protobuf object representing the resource being checked.

Returns
Dict[str, auth_pb2.Roles]

A dictionary mapping the principals to the roles they have.

Examples

>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.CLUSTER))
{
    'robot:someuser': roles {
        key: "clusterAdmin"
        value: true
    },
    'pach:root': roles {
        key: "clusterAdmin"
        value: true
    }
}
...
>>> client.get_role_binding(auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"))
{
    'user:person_a': roles {
        key: "repoWriter"
        value: true
    },
    'pach:root': roles {
        key: "repoOwner"
        value: true
    }
}
get_roles_for_permission(permission)[source]

Returns a list of all roles that have the specified permission.

Parameters
permissionauth_pb2.Permission

The Permission enum to check for.

Returns
List[auth_pb2.Role]

A list of Role protobuf objects that all have the specified permission.

Examples

All available permissions can be found in the auth proto Permission enum.

>>> roles = client.get_roles_for_permission(auth_pb2.Permission.REPO_READ)
get_users(group)[source]

Gets users in a group.

Parameters
groupstr

The group to list users from.

Returns
List[str]

All the users in the specified group.
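
Examples

Using the group name from the modify_members() example below:

>>> users = client.get_users("foogroup")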

modify_members(group, add=None, remove=None)[source]

Adds and/or removes members of a group.

Parameters
groupstr

The group to modify.

addList[str], optional

A list of strings specifying members to add.

removeList[str], optional

A list of strings specifying members to remove.

Examples

>>> client.modify_members(
...     "foogroup",
...     add=["user:otheruser"],
...     remove=["user:someuser"]
... )
modify_role_binding(resource, principal, roles=None)[source]

Sets the roles for a given principal on a resource.

Parameters
resourceauth_pb2.Resource

A protobuf object representing the resource to grant the roles on.

principalstr

The principal to grant the roles for.

rolesList[str], optional

The absolute list of roles for the principal to have. If roles is unset, the principal will have no roles.

Examples

You can find some of the built-in roles here: https://github.com/pachyderm/pachyderm/blob/master/src/auth/auth.go#L27

>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.CLUSTER),
...     "user:someuser",
...     roles=["clusterAdmin"]
... )
>>> client.modify_role_binding(
...     auth_pb2.Resource(type=auth_pb2.REPO, name="foobar"),
...     "user:someuser",
...     roles=["repoWriter"]
... )
restore_auth_token(token=None)[source]

This maps to an internal function that is only used for migration. Pachyderm’s extract and restore functionality calls extract_auth_tokens and restore_auth_tokens to move Pachyderm tokens between clusters during migration. Currently this function is only used for Pachyderm internals, so we’re avoiding support for this function in python-pachyderm client until we find a use for it (feel free to file an issue in github.com/pachyderm/pachyderm).

revoke_auth_token(token)[source]

Revokes an auth token.

Parameters
tokenstr

The Pachyderm token being revoked.
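
Examples

A sketch that revokes a freshly issued robot token ("my-robot" is a placeholder):

>>> token = client.get_robot_token("my-robot")
>>> client.revoke_auth_token(token)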

set_auth_configuration(configuration)[source]

Sets the auth configuration.

Parameters
configurationauth_pb2.OIDCConfig

A protobuf object with auth configuration information.

Examples

>>> client.set_auth_configuration(auth_pb2.OIDCConfig(
...     issuer="http://localhost:1658",
...     client_id="client",
...     client_secret="secret",
...     redirect_uri="http://test.example.com",
... ))
set_groups_for_user(username, groups)[source]

Sets the group membership for a user.

Parameters
usernamestr

The username whose group membership will be set.

groupsList[str]

The groups to add the username to.

Examples

>>> client.set_groups_for_user("user:someuser", ["foogroup", "bargroup"])
who_am_i()[source]

Returns info about the user tied to this Client.

Returns
auth_pb2.WhoAmIResponse

A protobuf object that returns the username and expiration for the token used.
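
Examples

A minimal sketch; the username field follows from the WhoAmIResponse description above:

>>> resp = client.who_am_i()
>>> print(resp.username)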

python_pachyderm.mixin.debug

class DebugMixin[source]

A mixin for debug-related functionality.

Methods

binary([filter])

Gets the pachd binary.

dump([system, pipelines, input_repos, timeout])

Collect a standard set of debugging information using the DumpV2 API.

get_dump_template([filters])

Generate a template request to be used by the DumpV2 API.

profile_cpu(duration[, filter])

Gets a CPU profile.

set_log_level([pachyderm_level, grpc_level, ...])

Sets the logging level of either pachyderm or grpc. Note: Only one level can be set at a time. If you are attempting to set multiple logging levels you must do so with multiple calls.

binary(filter=None)[source]

Gets the pachd binary.

Parameters
filterdebug_pb2.Filter, optional

A protobuf object that filters what info is returned; the filter is one of a pachd bool, a pipeline protobuf, or a worker protobuf.

Yields
bytes

The pachd binary as a sequence of bytearrays.

Examples

>>> for b in client.binary():
>>>     print(b)
dump(system=None, pipelines=None, input_repos=False, timeout=None)[source]

Collect a standard set of debugging information using the DumpV2 API rather than the now-deprecated Dump API. This method is intended to be used in tandem with the debug.get_dump_v2_template endpoint; if no system or pipelines are specified, that call is performed automatically on the user's behalf, and debug information for all systems and pipelines is returned.

Parameters
systemdebug_pb2.System, optional

A protobuf object that filters what info is returned.

pipelinesList[pps_pb2.Pipeline], optional

A list of pipelines from which to collect debug information.

input_reposbool

Whether to collect debug information for input repos. Default: False

timeoutduration_pb2.Duration, optional

Duration until timeout occurs. Default is no timeout.

Yields
debug_pb2.DumpChunk

Chunks of the debug dump

Examples

>>> for b in client.dump():
>>>     print(b)
get_dump_template(filters=None)[source]

Generate a template request to be used by the DumpV2 API.

Parameters
filtersList[str], optional

No supported filters - this argument has no effect.

Returns
debug_pb2.DumpV2Request

The request that can be sent to the DumpV2 API.

profile_cpu(duration, filter=None)[source]

Gets a CPU profile.

Parameters
durationduration_pb2.Duration

A google protobuf duration object indicating how long the profile should run for.

filterdebug_pb2.Filter, optional

A protobuf object that filters what info is returned; the filter is one of a pachd bool, a pipeline protobuf, or a worker protobuf.

Yields
bytes

The cpu profile as a sequence of bytearrays.

Examples

>>> for b in client.profile_cpu(duration_pb2.Duration(seconds=1)):
>>>     print(b)
set_log_level(pachyderm_level=None, grpc_level=None, duration=None, recurse=True)[source]

Sets the logging level of either pachyderm or grpc. Note: Only one level can be set at a time. If you are attempting to set multiple logging levels you must do so with multiple calls.

Parameters
pachyderm_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof

The desired pachyderm logging level.

grpc_level: debug_pb2.SetLogLevelRequest.LogLevel, oneof

The desired grpc logging level.

duration: duration_pb2.Duration, optional

How long to log at the non-default level. (default 5m0s)

recurse: bool

Set the log level on all pachyderm pods; if false, only the pachd that handles this RPC. (default true)
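
Examples

A sketch; the DEBUG value is an assumption about the debug_pb2.SetLogLevelRequest.LogLevel enum:

>>> client.set_log_level(pachyderm_level=debug_pb2.SetLogLevelRequest.LogLevel.DEBUG)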

python_pachyderm.mixin.enterprise

class EnterpriseMixin[source]

A mixin for enterprise-related functionality.

Methods

activate_enterprise(license_server, id, secret)

Activates enterprise by registering with a license server.

deactivate_enterprise()

Deactivates enterprise.

get_activation_code()

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

get_enterprise_state()

Gets the current enterprise state of the cluster.

get_pause_status()

Gets the pause status of the cluster.

pause_enterprise()

Pauses the cluster.

unpause_enterprise()

Unpauses the cluster.

activate_enterprise(license_server, id, secret)[source]

Activates enterprise by registering with a license server.

Parameters
license_serverstr

The Pachyderm Enterprise Server to register with.

idstr

The unique ID for this cluster.

secretstr

The secret for registering this cluster.

deactivate_enterprise()[source]

Deactivates enterprise.

get_activation_code()[source]

Returns the enterprise code used to activate Pachyderm Enterprise in this cluster.

Returns
enterprise_pb2.GetActivationCodeResponse

A protobuf object that returns a state enum, info on the token, and the activation code.

get_enterprise_state()[source]

Gets the current enterprise state of the cluster.

Returns
enterprise_pb2.GetStateResponse

A protobuf object that returns a state enum, info on the token, and an empty activation code.

get_pause_status()[source]

Gets the pause status of the cluster.

Returns
enterprise_pb2.PauseStatusResponse

A protobuf object that returns a status enum.

pause_enterprise()[source]

Pauses the cluster.

unpause_enterprise()[source]

Unpauses the cluster.

python_pachyderm.mixin.health

class HealthMixin[source]

A mixin for health-related functionality.

Methods

health_check()

Returns a health check indicating if the server can handle RPCs.

health_check()[source]

Returns a health check indicating if the server can handle RPCs.

Returns
health_pb2.HealthCheckResponse

A protobuf object with a status enum indicating server health.
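
Examples

A minimal sketch; status is the enum described above:

>>> resp = client.health_check()
>>> print(resp.status)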

python_pachyderm.mixin.identity

class IdentityMixin[source]

A mixin for identity-related functionality.

Methods

create_idp_connector(connector)

Create an IDP connector in the identity server.

create_oidc_client(client)

Create an OIDC client in the identity server.

delete_all_identity()

Delete all identity service information.

delete_idp_connector(id)

Delete an IDP connector in the identity server.

delete_oidc_client(id)

Delete an OIDC client in the identity server.

get_identity_server_config()

Get the embedded identity server configuration.

get_idp_connector(id)

Get an IDP connector in the identity server.

get_oidc_client(id)

Get an OIDC client in the identity server.

list_idp_connectors()

List IDP connectors in the identity server.

list_oidc_clients()

List OIDC clients in the identity server.

set_identity_server_config(config)

Configure the embedded identity server.

update_idp_connector(connector)

Update an IDP connector in the identity server.

update_oidc_client(client)

Update an OIDC client in the identity server.

create_idp_connector(connector)[source]

Create an IDP connector in the identity server.

Parameters
connectoridentity_pb2.IDPConnector

A protobuf object that represents a connection to an identity provider.

create_oidc_client(client)[source]

Create an OIDC client in the identity server.

Parameters
clientidentity_pb2.OIDCClient

A protobuf object representing an OIDC client.

Returns
identity_pb2.OIDCClient

A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

delete_all_identity()[source]

Delete all identity service information.

Raises
AuthServiceNotActivated
delete_idp_connector(id)[source]

Delete an IDP connector in the identity server.

Parameters
idstr

The connector ID.

delete_oidc_client(id)[source]

Delete an OIDC client in the identity server.

Parameters
idstr

The client ID.

get_identity_server_config()[source]

Get the embedded identity server configuration.

Returns
identity_pb2.IdentityServerConfig

A protobuf object that holds configuration info (issuer and ID token expiry) for the identity web server.

get_idp_connector(id)[source]

Get an IDP connector in the identity server.

Parameters
idstr

The connector ID.

Returns
identity_pb2.IDPConnector

A protobuf object that returns info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.

get_oidc_client(id)[source]

Get an OIDC client in the identity server.

Parameters
idstr

The client ID.

Returns
identity_pb2.OIDCClient

A protobuf object that returns a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

list_idp_connectors()[source]

List IDP connectors in the identity server.

Returns
List[identity_pb2.IDPConnector]

A list of protobuf objects that return info on the connector ID, name, type, config version, and configuration of the upstream IDP connector.

list_oidc_clients()[source]

List OIDC clients in the identity server.

Returns
List[identity_pb2.OIDCClient]

A list of protobuf objects that return a client with info on the client ID, name, secret, and lists of redirect URIs and trusted peers.

set_identity_server_config(config)[source]

Configure the embedded identity server.

Parameters
configidentity_pb2.IdentityServerConfig

A protobuf object that is the configuration for the identity web server.

update_idp_connector(connector)[source]

Update an IDP connector in the identity server.

Parameters
connectoridentity_pb2.IDPConnector

A protobuf object that represents a connection to an identity provider.

update_oidc_client(client)[source]

Update an OIDC client in the identity server.

Parameters
clientidentity_pb2.OIDCClient

A protobuf object representing an OIDC client.

python_pachyderm.mixin.license

class LicenseMixin[source]

A mixin for license-related functionality.

Methods

activate_license(activation_code[, expires])

Activates the license service.

add_cluster(id, address[, secret, ...])

Register a cluster with the license service.

delete_all_license()

Remove all clusters and deactivate the license service.

delete_cluster(id)

Delete a cluster registered with the license service.

get_activation_code()

Gets the enterprise code used to activate the server.

list_clusters()

List clusters registered with the license service.

list_user_clusters()

Lists all clusters available to user.

update_cluster(id, address[, user_address, ...])

Update a cluster registered with the license service.

activate_license(activation_code, expires=None)[source]

Activates the license service.

Parameters
activation_codestr

A Pachyderm enterprise activation code. New users can obtain trial activation codes.

expirestimestamp_pb2.Timestamp, optional

A protobuf object indicating when this activation code will expire. This should generally not be set and is only applied if it is earlier than the signed expiration time of activation_code.

Returns
enterprise_pb2.TokenInfo

A protobuf object that has the expiration for the current token. Field will be unset if there is no current token.
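
Examples

A sketch with a placeholder activation code:

>>> token_info = client.activate_license("<activation-code>")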

add_cluster(id, address, secret=None, user_address=None, cluster_deployment_id=None, enterprise_server=False)[source]

Register a cluster with the license service.

Parameters
idstr

The unique ID to identify the cluster.

addressstr

The public GRPC address for the license server to reach the cluster.

secretstr, optional

A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.

user_addressstr, optional

The pachd address for users to connect to.

cluster_deployment_idstr, optional

The deployment ID value generated by each cluster.

enterprise_serverbool, optional

Indicates whether the address points to an enterprise server.

Returns
license_pb2.AddClusterResponse

A protobuf object that returns the secret.
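
Examples

A sketch with placeholder ID and address values; the secret field name is an assumption about license_pb2.AddClusterResponse:

>>> resp = client.add_cluster("new-cluster", "grpc://new-cluster-pachd:1653")
>>> secret = resp.secret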

delete_all_license()[source]

Remove all clusters and deactivate the license service.

Raises
AuthServiceNotActivated
delete_cluster(id)[source]

Delete a cluster registered with the license service.

Parameters
idstr

The unique ID to identify the cluster.

get_activation_code()[source]

Gets the enterprise code used to activate the server.

Returns
license_pb2.GetActivationCodeResponse

A protobuf object that returns a state enum, info on the token, and the activation code.

list_clusters()[source]

List clusters registered with the license service.

Returns
List[license_pb2.ClusterStatus]

A list of protobuf objects that return info on a cluster.

list_user_clusters()[source]

Lists all clusters available to user.

Returns
List[license_pb2.UserClusterInfo]

A list of protobuf objects that return info on clusters available to the users.

update_cluster(id, address, user_address=None, cluster_deployment_id=None, secret=None)[source]

Update a cluster registered with the license service.

Parameters
idstr

The unique ID to identify the cluster.

addressstr

The public GRPC address for the license server to reach the cluster.

user_addressstr, optional

The pachd address for users to connect to.

cluster_deployment_idstr, optional

The deployment ID value generated by each cluster.

secretstr, optional

A shared secret for the cluster to use to authenticate. If not specified, a random secret will be generated and returned.

python_pachyderm.mixin.pfs

class ModifyFileClient(commit)[source]

ModifyFileClient puts or deletes PFS files atomically. Replaces PutFileClient from python_pachyderm 6.x.

Methods

copy_file(source_commit, source_path, dest_path)

Copy a file.

delete_file(path[, datum])

Deletes a file.

put_file_from_bytes(path, value[, datum, append])

Uploads a PFS file from a bytestring.

put_file_from_fileobj(path, value[, datum, ...])

Uploads a PFS file from a file-like object.

put_file_from_filepath(pfs_path, local_path)

Uploads a PFS file from a local path at a specified path.

put_file_from_url(path, url[, datum, ...])

Uploads a PFS File from the content found at a URL.

copy_file(source_commit, source_path, dest_path, datum=None, append=False)[source]

Copy a file.

Parameters
source_commitSubcommitType

The commit the source file is in.

source_pathstr

The path to the source file.

dest_pathstr

The path to the destination file.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of the source file to the destination file, if it already exists. Otherwise, overwrites the file.

delete_file(path, datum=None)[source]

Deletes a file.

Parameters
pathstr

The path to the file.

datumstr, optional

A tag that filters the files.

put_file_from_bytes(path, value, datum=None, append=False)[source]

Uploads a PFS file from a bytestring.

Parameters
pathstr

The path in the repo the file will be written to.

valuebytes

The file contents as a bytestring.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of value to the file at path, if it already exists. Otherwise, overwrites the file.
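
Examples

A sketch using modify_file_client() (documented under PFSMixin) to obtain a ModifyFileClient:

>>> with client.modify_file_client(("foo", "master")) as mfc:
>>>     mfc.put_file_from_bytes("/greeting.txt", b"hello")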

put_file_from_fileobj(path, value, datum=None, append=False)[source]

Uploads a PFS file from a file-like object.

Parameters
pathstr

The path in the repo the file will be written to.

valueBinaryIO

The file-like object.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of value to the file at path, if it already exists. Otherwise, overwrites the file.

put_file_from_filepath(pfs_path, local_path, datum=None, append=False)[source]

Uploads a PFS file from a local path at a specified path. This will lazily open files, which will prevent too many files from being opened, or too much memory being consumed, when atomically putting many files.

Parameters
pfs_pathstr

The path in the repo the file will be written to.

local_pathstr

The local file path.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of local_path to the file at pfs_path, if it already exists. Otherwise, overwrites the file.

put_file_from_url(path, url, datum=None, append=False, recursive=False, concurrency=0)[source]

Uploads a PFS File from the content found at a URL. The URL is sent to the server which performs the request.

Parameters
pathstr

The path in the repo the file will be written to.

urlstr

The URL of the file to upload.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content to the file at path, if it already exists. Otherwise, overwrites the file.

recursivebool, optional

If true, allows for recursive scraping on some types of URLs, for example on s3:// URLs.

concurrencyint

The maximum number of threads used to complete the request. Defaults to 50.

class PFSFile(stream)[source]

File-like objects containing content of a file stored in PFS.

Examples

>>> # client.get_file() returns a PFSFile
>>> source_file = client.get_file(("montage", "master"), "/montage.png")
>>> with open("montage.png", "wb") as dest_file:
>>>     shutil.copyfileobj(source_file, dest_file)
...
>>> with client.get_file(("montage", "master"), "/montage2.png") as f:
>>>     content = f.read()

Methods

close()

Closes the PFSFile.

read([size])

Reads from the PFSFile buffer.

close()[source]

Closes the PFSFile.

read(size=-1)[source]

Reads from the PFSFile buffer.

Parameters
sizeint, optional

If set, the number of bytes to read from the buffer.

Returns
bytes

Content from the stream.
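
Examples

A sketch reusing the montage repo from the class example above:

>>> with client.get_file(("montage", "master"), "/montage.png") as f:
>>>     header = f.read(8)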

class PFSMixin[source]

A mixin with pfs-related functionality.

Methods

commit(repo_name, branch[, parent, ...])

A context manager for running operations within a commit.

copy_file(source_commit, source_path, ...[, ...])

Efficiently copies files already in PFS.

create_branch(repo_name, branch_name[, ...])

Creates a new branch.

create_project(project_name[, description, ...])

Creates a new project with the given name.

create_repo(repo_name[, description, ...])

Creates a new repo object in PFS with the given name.

delete_all_repos()

Deletes all repos.

delete_branch(repo_name, branch_name[, ...])

Deletes a branch, but leaves the commits themselves intact.

delete_file(commit, path)

Deletes a file from an open commit.

delete_project(project_name[, force])

Deletes a project and reclaims the storage space it was using.

delete_repo(repo_name[, force, project_name])

Deletes a repo and reclaims the storage space it was using.

diff_file(new_commit, new_path[, ...])

Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different.

drop_commit(commit_id)

Drops an entire commit.

find_commits(start, file_path[, limit])

Searches for commits that reference the specified file being modified in a branch.

finish_commit(commit[, description, error, ...])

Ends the process of committing data to a repo and persists the commit.

fsck([fix])

Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.

get_file(commit, path[, datum, URL, offset])

Gets a file from PFS.

get_file_tar(commit, path[, datum, URL, offset])

Gets a file from PFS.

glob_file(commit, pattern[, path_range])

Lists files that match a glob pattern.

inspect_branch(repo_name, branch_name[, ...])

Inspects a branch.

inspect_commit(commit[, commit_state])

Inspects a commit.

inspect_file(commit, path[, datum])

Inspects a file.

inspect_project(project_name)

Inspects a project.

inspect_repo(repo_name[, project_name])

Inspects a repo.

list_branch(repo_name[, reverse, project_name])

Lists the active branch objects in a repo.

list_commit([repo_name, to_commit, ...])

Lists commits.

list_file(commit, path[, datum, ...])

Lists the files in a directory.

list_project()

Lists all projects in PFS.

list_repo([type, projects_filter])

Lists all repos in PFS.

modify_file_client(commit)

A context manager that gives a ModifyFileClient.

path_exists(commit, path)

Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.

put_file_bytes(commit, path, value[, datum, ...])

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

put_file_url(commit, path, url[, recursive, ...])

Uploads a PFS file using the content found at a URL.

squash_commit(commit_id)

Squashes a commit into its parent.

start_commit(repo_name, branch[, parent, ...])

Begins the process of committing data to a repo.

subscribe_commit(repo_name, branch[, ...])

Returns all commits on the branch and then listens for new commits that are created.

wait_commit(commit)

Waits for the specified commit to finish.

walk_file(commit, path[, datum, ...])

Walks over all descendant files in a directory.

commit(repo_name, branch, parent=None, description=None, project_name=None)[source]

A context manager for running operations within a commit.

Parameters
repo_namestr

The name of the repo.

branchstr

A string specifying the branch.

parentUnion[str, SubcommitType], optional

A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.

descriptionstr, optional

A description of the commit.

project_namestr

The name of the project.

Yields
pfs_pb2.Commit

A protobuf object that represents a commit.

Examples

>>> with client.commit("foo", "master") as c:
>>>     client.delete_file(c, "/dir/delete_me.txt")
>>>     client.put_file_bytes(c, "/new_file.txt", b"DATA")
copy_file(source_commit, source_path, dest_commit, dest_path, datum=None, append=False)[source]

Efficiently copies files already in PFS. Note that the destination repo cannot be an output repo, or the copy operation will silently fail.

Parameters
source_commitSubcommitType

The subcommit (commit at the repo-level) which holds the source file.

source_pathstr

The path of the source file.

dest_commitSubcommitType

The open subcommit (commit at the repo-level) to which to add the file.

dest_pathstr

The path of the destination file.

datumstr, optional

A tag for the added file.

appendbool, optional

If true, appends the content of source_path to the file at dest_path, if it already exists. Otherwise, overwrites the file.

Examples

The destination commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("bar", "master") as c:
>>>     client.copy_file(("foo", "master"), "/src/file.txt", c, "/file.txt")
create_branch(repo_name, branch_name, head_commit=None, provenance=None, trigger=None, new_commit=False, project_name=None)[source]

Creates a new branch.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the new branch.

head_commitSubcommitType, optional

A subcommit (commit at repo-level) indicating the head of the new branch.

provenanceList[pfs_pb2.Branch], optional

A list of branches to establish provenance with this newly created branch.

triggerpfs_pb2.Trigger, optional

Sets the conditions under which the head of this branch moves.

new_commitbool, optional

If true and head_commit is specified, uses a different commit ID for head than head_commit.

project_namestr

The name of the project.

Examples

>>> client.create_branch(
...     "bar",
...     "master",
...     provenance=[
...         pfs_pb2.Branch(
...             repo=pfs_pb2.Repo(name="foo", type="user"), name="master"
...         )
...     ]
... )
create_project(project_name, description=None, update=False)[source]

Creates a new project with the given name.

Parameters
project_namestr

Name of the project.

descriptionstr, optional

Description of the project.

updatebool, optional

Whether to update if the project already exists.

create_repo(repo_name, description=None, update=False, project_name=None)[source]

Creates a new repo object in PFS with the given name. Repos are the top-level data objects in PFS and should be used to store data of a similar type. For example, rather than having a single repo for an entire project, you might have separate repos for logs, metrics, database dumps, etc.

Parameters
repo_namestr

Name of the repo.

descriptionstr, optional

Description of the repo.

updatebool, optional

Whether to update if the repo already exists.

project_namestr

The name of the project.
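
Examples

A minimal sketch:

>>> client.create_repo("foo", description="An example repo")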

delete_all_repos()[source]

Deletes all repos.

delete_branch(repo_name, branch_name, force=False, project_name=None)[source]

Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the branch.

forcebool, optional

If true, forces the branch deletion.

project_namestr

The name of the project.

delete_file(commit, path)[source]

Deletes a file from an open commit. This leaves a tombstone in the commit, assuming the file isn’t written to later while the commit is still open. Attempting to get the file from the finished commit will result in a not found error. The file will of course remain intact in the commit’s parent.

Parameters
commitSubcommitType

The open subcommit (commit at the repo-level) to delete a file from.

pathstr

The path to the file.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("bar", "master") as c:
>>>     client.delete_file(c, "/delete_me.txt")
delete_project(project_name, force=False)[source]

Deletes a project and reclaims the storage space it was using.

Parameters
project_namestr

The name of the project.

forcebool, optional

If set to true, the repo will be removed regardless of errors. Use with care.

delete_repo(repo_name, force=False, project_name=None)[source]

Deletes a repo and reclaims the storage space it was using.

Parameters
repo_namestr

The name of the repo.

forcebool, optional

If set to true, the repo will be removed regardless of errors. Use with care.

project_namestr

The name of the project.

diff_file(new_commit, new_path, old_commit=None, old_path=None, shallow=False)[source]

Diffs two PFS files (file = commit + path in Pachyderm) and returns files that are different. Similar to git diff.

If old_commit or old_path are not specified, old_commit will be set to the parent of new_commit and old_path will be set to new_path.

Parameters
new_commitSubcommitType

The newer subcommit (commit at the repo-level).

new_pathstr

The path in new_commit to compare with.

old_commitSubcommitType, optional

The older subcommit (commit at the repo-level).

old_pathstr, optional

The path in old_commit to compare with.

shallowbool, optional

Unused.

Returns
Iterator[pfs_pb2.DiffFileResponse]

An iterator of protobuf objects that contain info on files whose content has changed between commits. If a file under one of the paths is only in one commit, then the DiffFileResponse for it will only have one FileInfo set.

Examples

>>> # Compare files
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/file.txt",
...     ("foo", "master~2"),
...     "/a/file.txt"
... )
>>> diff = next(res)
...
>>> # Compare files in directories
>>> res = client.diff_file(
...     ("foo", "master"),
...     "/a/",
...     ("foo", "master~2"),
...     "/a/"
... )
>>> diff = next(res)
drop_commit(commit_id)[source]

Drops an entire commit.

Parameters
commit_idstr

The ID of the commit.

find_commits(start, file_path, limit=0)[source]

Searches for commits that reference the specified file being modified in a branch.

Parameters
startSubcommitType

The commit where the search should begin.

file_pathstr

The path to the file being queried.

limit: int, optional

The number of matching commits to return. (default no limit)

finish_commit(commit, description=None, error=None, force=False)[source]

Ends the process of committing data to a repo and persists the commit. Once a commit is finished the data becomes immutable and future attempts to write to it with ModifyFile will error.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) object to close.

descriptionstr, optional

A description of the commit. It will overwrite the description set in start_commit().

errorstr, optional

If set, a message that errors out the commit. Don’t use unless you want the finish commit request to fail.

forcebool, optional

If true, forces commit to finish, even if it breaks provenance.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(("foo", "master"))
...
>>> # same as above
>>> c = client.start_commit("foo", "master")
>>> # modify open commit
>>> client.finish_commit(c)
fsck(fix=False)[source]

Performs a file system consistency check on PFS, ensuring the correct provenance relationships are satisfied.

Parameters
fixbool, optional

If true, attempts to fix as many problems as possible.

Returns
Iterator[pfs_pb2.FsckResponse]

An iterator of protobuf objects that contain info on either what error was encountered (and was unable to be fixed, if fix is set to True) or a fix message (if fix is set to True).

Examples

>>> for action in client.fsck(True):
>>>     print(action)
get_file(commit, path, datum=None, URL=None, offset=0)[source]

Gets a file from PFS.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to get the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

URLstr, optional

Specifies an object storage URL that the file will be uploaded to.

offsetint, optional

Allows file read to begin at offset number of bytes.

Returns
PFSFile

The contents of the file in a file-like object.
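
Examples

A minimal sketch:

>>> f = client.get_file(("foo", "master"), "/file.txt")
>>> content = f.read()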

get_file_tar(commit, path, datum=None, URL=None, offset=0)[source]

Gets a file from PFS.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to get the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

URLstr, optional

Specifies an object storage URL that the file will be uploaded to.

offsetint, optional

Allows file read to begin at offset number of bytes.

Returns
PFSFile

The contents of the file in a file-like object.
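
Examples

A sketch; it assumes the returned object supports the tarfile-style extractall() described under PFSTarFile below:

>>> with client.get_file_tar(("foo", "master"), "/dir/") as tar:
>>>     tar.extractall("/tmp/out")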

glob_file(commit, pattern, path_range=None)[source]

Lists files that match a glob pattern.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to query against.

patternstr

A glob pattern.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.glob_file(("foo", "master"), "/*.txt"))
inspect_branch(repo_name, branch_name, project_name=None)[source]

Inspects a branch.

Parameters
repo_namestr

The name of the repo.

branch_namestr

The name of the branch.

project_namestr

The name of the project.

Returns
pfs_pb2.BranchInfo

A protobuf object with info on a branch.

inspect_commit(commit, commit_state=1)[source]

Inspects a commit.

Parameters
commitUnion[str, SubcommitType]

The commit to inspect. Can either be a commit ID or a commit object that represents a subcommit (commit at the repo-level).

commit_state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional

An enum that causes the method to block until the commit is in the specified state. (Default value = pfs_pb2.CommitState.STARTED)

Returns
Iterator[pfs_pb2.CommitInfo]

An iterator of protobuf objects that contain info on a subcommit (commit at the repo-level).

Examples

>>> # commit at repo-level
>>> list(client.inspect_commit(("foo", "master~2")))
...
>>> # an entire commit
>>> for commit in client.inspect_commit("467c580611234cdb8cc9758c7aa96087", pfs_pb2.CommitState.FINISHED):
>>>     print(commit)
inspect_file(commit, path, datum=None)[source]

Inspects a file.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to inspect the file from.

pathstr

The path of the file.

datumstr, optional

A tag that filters the files.

Returns
pfs_pb2.FileInfo

A protobuf object that contains info on a file.

inspect_project(project_name)[source]

Inspects a project.

Parameters
project_namestr

Name of the project.

Returns
pfs_proto.ProjectInfo

A protobuf object with info on the project.

inspect_repo(repo_name, project_name=None)[source]

Inspects a repo.

Parameters
repo_namestr

Name of the repo.

project_namestr

The name of the project.

Returns
pfs_pb2.RepoInfo

A protobuf object with info on the repo.

list_branch(repo_name, reverse=False, project_name=None)[source]

Lists the active branch objects in a repo.

Parameters
repo_namestr

The name of the repo.

reversebool, optional

If true, returns branches oldest to newest.

project_namestr

The name of the project.

Returns
Iterator[pfs_pb2.BranchInfo]

An iterator of protobuf objects that contain info on a branch.

Examples

>>> branches = list(client.list_branch("foo"))
list_commit(repo_name=None, to_commit=None, from_commit=None, number=None, reverse=False, all=False, origin_kind=1, started_time=None, project_name=None)[source]

Lists commits.

Parameters
repo_namestr, optional

The name of a repo. If set, returns subcommits (commit at repo-level) only in this repo.

to_commitSubcommitType, optional

A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the ancestors of to_commit, including to_commit, are returned.

from_commitSubcommitType, optional

A subcommit (commit at repo-level) that only impacts results if repo_name is specified. If set, only the descendants of from_commit, including from_commit, are returned.

numberint, optional

The number of subcommits to return. If unset, all subcommits that matched the aforementioned criteria are returned. Only impacts results if repo_name is specified.

reversebool, optional

If true, returns the subcommits oldest to newest. Only impacts results if repo_name is specified.

allbool, optional

If true, returns all types of subcommits. Otherwise, alias subcommits are excluded. Only impacts results if repo_name is specified.

origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional

An enum that specifies how a subcommit originated. Returns only subcommits of this enum type. Only impacts results if repo_name is specified.

started_timedatetime
project_namestr, optional

The name of the project containing the repo.

Returns
Union[Iterator[pfs_pb2.CommitInfo], Iterator[pfs_pb2.CommitSetInfo]]

An iterator of protobuf objects that either contain info on a subcommit (commit at the repo-level), if repo_name was specified, or a commit, if repo_name wasn’t specified.

Examples

>>> # all commits at repo-level
>>> for c in client.list_commit("foo"):
>>>     print(c)
...
>>> # all commits
>>> commits = list(client.list_commit())
list_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]

Lists the files in a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to list files from.

pathstr

The path to the directory.

datumstr, optional

A tag that filters the files.

pagination_marker:

Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.

numberint, optional

Number of files to return.

reversebool, optional

If true, returns files in reverse order.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.list_file(("foo", "master"), "/dir/subdir/"))
list_project()[source]

Lists all projects in PFS.

Returns
Iterator[pfs_proto.ProjectInfo]

An iterator of protobuf objects that contain info on a project.
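
Examples

A minimal sketch:

>>> projects = list(client.list_project())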

list_repo(type='user', projects_filter=None)[source]

Lists all repos in PFS.

Parameters
typestr, optional

The type of repos that should be returned (“user”, “meta”, “spec”). If unset, returns all types of repos.

projects_filter[Project], optional

Filters out repos that do not belong to the given projects; if no projects are specified, repos from all projects are listed.

Returns
Iterator[pfs_pb2.RepoInfo]

An iterator of protobuf objects that contain info on a repo.

modify_file_client(commit)[source]

A context manager that gives a ModifyFileClient. When the context manager exits, any operations enqueued from the ModifyFileClient are executed in a single, atomic ModifyFile gRPC call.

Parameters
commitUnion[tuple, dict, Commit, pfs_pb2.Commit]

A subcommit (commit at the repo-level) to modify. If this subcommit is opened before modify_file_client() is called, it will remain open after. If modify_file_client() opens the subcommit, it will close when exiting the with scope.

Yields
ModifyFileClient

An object that can queue operations to modify a commit atomically.

Examples

On an open subcommit:

>>> c = client.start_commit("foo", "master")
>>> with client.modify_file_client(c) as mfc:
>>>     mfc.delete_file("/delete_me.txt")
>>>     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
>>> client.finish_commit(c)

Opening a subcommit:

>>> with client.modify_file_client(("foo", "master")) as mfc:
>>>     mfc.delete_file("/delete_me.txt")
>>>     mfc.put_file_from_url(
...         "/new_file.txt",
...         "https://example.com/data/train/input.txt"
...     )
path_exists(commit, path)[source]

Checks whether the path exists in the specified commit, agnostic to whether path is a file or a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to check in.

pathstr

The file or directory path in commit.

Returns
bool

Returns True if path exists in commit. Otherwise, returns False.
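
Examples

A minimal sketch:

>>> if client.path_exists(("foo", "master"), "/dir/file.txt"):
>>>     print("file.txt exists")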

put_file_bytes(commit, path, value, datum=None, append=False)[source]

Uploads a PFS file from a file-like object, bytestring, or iterator of bytestrings.

Parameters
commitSubcommitType

An open subcommit (commit at the repo-level) to modify.

pathstr

The path in the repo the file(s) will be written to.

valueUnion[bytes, BinaryIO]

The file contents as bytes, represented as a file-like object, bytestring, or iterator of bytestrings.

datumstr, optional

A tag for the added file(s).

appendbool, optional

If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("foo", "master") as c:
>>>     client.put_file_bytes(c, "/file.txt", b"SOME BYTES")
put_file_url(commit, path, url, recursive=False, datum=None, append=False, concurrency=0)[source]

Uploads a PFS file using the content found at a URL. The URL is sent to the server which performs the request.

Parameters
commitSubcommitType

An open subcommit (commit at the repo-level) to modify.

pathstr

The path in the repo the file(s) will be written to.

urlstr

The URL of the file to put.

recursivebool, optional

If true, allows for recursive scraping on some types of URLs, for example on s3:// URLs.

datumstr, optional

A tag for the added file(s).

appendbool, optional

If true, appends the data to the file(s) specified at path, if they already exist. Otherwise, overwrites them.

concurrencyint

The maximum number of threads used to complete the request. Defaults to 50.

Examples

The commit still needs to be open, either from the result of start_commit() or within the scope of commit().

>>> with client.commit("foo", "master") as c:
>>>     client.put_file_url(
...         c,
...         "/file.txt",
...         "https://example.com/data/train/input.txt"
...     )
squash_commit(commit_id)[source]

Squashes a commit into its parent.

Parameters
commit_idstr

The ID of the commit.
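
Examples

Reusing the commit ID from the wait_commit() example above:

>>> client.squash_commit("467c580611234cdb8cc9758c7aa96087")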

start_commit(repo_name, branch, parent=None, description=None, project_name=None)[source]

Begins the process of committing data to a repo. Once started you can write to the commit with ModifyFile. When all the data has been written, you must finish the commit with FinishCommit. NOTE: data is not persisted until FinishCommit is called.

Parameters
repo_namestr

The name of the repo.

branchstr

A string specifying the branch.

parentUnion[str, SubcommitType], optional

A commit specifying the parent of the newly created commit. Upon creation, before data is modified, the new commit will appear identical to the parent.

descriptionstr, optional

A description of the commit.

project_namestr

The name of the project.

Returns
pfs_pb2.Commit

A protobuf object that represents an open subcommit (commit at the repo-level).

Examples

>>> c = client.start_commit("foo", "master", ("foo", "staging"))
subscribe_commit(repo_name, branch, from_commit=None, state=1, all=False, origin_kind=1, project_name=None)[source]

Returns all commits on the branch and then listens for new commits that are created.

Parameters
repo_namestr

The name of the repo.

branchstr

The name of the branch.

from_commitUnion[str, SubcommitType], optional

Return commits only from this commit and onwards. Can either be an entire commit or a subcommit (commit at the repo-level).

state{pfs_pb2.CommitState.STARTED, pfs_pb2.CommitState.READY, pfs_pb2.CommitState.FINISHING, pfs_pb2.CommitState.FINISHED}, optional

Return commits only when they're at least in the specified enum state. (Default value = pfs_pb2.CommitState.STARTED)

allbool, optional

If true, returns all types of commits. Otherwise, alias commits are excluded.

origin_kind{pfs_pb2.OriginKind.USER, pfs_pb2.OriginKind.AUTO, pfs_pb2.OriginKind.FSCK, pfs_pb2.OriginKind.ALIAS}, optional

An enum that specifies how a commit originated. Returns only commits of this enum type. (Default value = pfs_pb2.OriginKind.USER)

project_namestr

The name of the project.

Returns
Iterator[pfs_pb2.CommitInfo]

An iterator of protobuf objects that contain info on subcommits (commits at the repo-level). Use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.

Examples

>>> commits = client.subscribe_commit("foo", "master", state=pfs_pb2.CommitState.FINISHED)
>>> c = next(commits)
wait_commit(commit)[source]

Waits for the specified commit to finish.

Parameters
commitUnion[str, SubcommitType]

A commit object to wait on. Can either be an entire commit or a subcommit (commit at the repo-level).

Returns
List[pfs_pb2.CommitInfo]

A list of protobuf objects that contain info on subcommits (commit at the repo-level). These are the individual subcommits this function waited on.

Examples

>>> # wait for an entire commit to finish
>>> subcommits = client.wait_commit("467c580611234cdb8cc9758c7aa96087")
...
>>> # wait for a commit to finish at a certain repo
>>> client.wait_commit(("foo", "master"))
walk_file(commit, path, datum=None, pagination_marker=None, number=None, reverse=False)[source]

Walks over all descendant files in a directory.

Parameters
commitSubcommitType

The subcommit (commit at the repo-level) to walk files in.

pathstr

The path to the directory.

datumstr, optional

A tag that filters the files.

pagination_marker:

Marker for pagination. If set, the files that come after the marker in lexicographical order will be returned. If reverse is also set, the files that come before the marker in lexicographical order will be returned.

numberint, optional

Number of files to return.

reversebool, optional

If true, returns files in reverse order.

Returns
Iterator[pfs_pb2.FileInfo]

An iterator of protobuf objects that contain info on files.

Examples

>>> files = list(client.walk_file(("foo", "master"), "/dir/subdir/"))
class PFSTarFile(name=None, mode='r', fileobj=None, format=None, tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, errors='surrogateescape', pax_headers=None, debug=None, errorlevel=None, copybufsize=None)[source]
Attributes
errors

Methods

add(name[, arcname, recursive, filter])

Add the file `name' to the archive. `name' may be any type of file (directory, fifo, symbolic link, etc.). If given, `arcname' specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting `recursive' to False. `filter' is a function that expects a TarInfo object argument and returns the changed TarInfo object, if it returns None the TarInfo object will be excluded from the archive.

addfile(tarinfo[, fileobj])

Add the TarInfo object `tarinfo' to the archive. If `fileobj' is given, it should be a binary file, and tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects directly, or by using gettarinfo().

bz2open(name[, mode, fileobj, compresslevel])

Open bzip2 compressed tar archive name for reading or writing.

chmod(tarinfo, targetpath)

Set file permissions of targetpath according to tarinfo.

chown(tarinfo, targetpath, numeric_owner)

Set owner of targetpath according to tarinfo.

close()

Close the TarFile.

extract(member[, path, set_attrs, numeric_owner])

Extract a member from the archive to the current working directory, using its full name.

extractall([path, members, numeric_owner])

Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards.

extractfile(member)

Extract a member from the archive as a file object. `member' may be a filename or a TarInfo object. If `member' is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned.

fileobject

alias of tarfile.ExFileObject

getmember(name)

Return a TarInfo object for member `name'. If `name' can not be found in the archive, KeyError is raised. If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version.

getmembers()

Return the members of the archive as a list of TarInfo objects.

getnames()

Return the members of the archive as a list of their names.

gettarinfo([name, arcname, fileobj])

Create a TarInfo object from the result of os.stat or equivalent on an existing file. The file is either named by `name', or specified as a file object `fileobj' with a file descriptor. If given, `arcname' specifies an alternative name for the file in the archive, otherwise, the name is taken from the 'name' attribute of 'fileobj', or the 'name' argument. The name should be a text string.

gzopen(name[, mode, fileobj, compresslevel])

Open gzip compressed tar archive name for reading or writing.

list([verbose, members])

Print a table of contents to sys.stdout. If `verbose' is False, only the names of the members are printed. If it is True, an `ls -l'-like output is produced. `members' is optional and must be a subset of the list returned by getmembers().

makedev(tarinfo, targetpath)

Make a character or block device called targetpath.

makedir(tarinfo, targetpath)

Make a directory called targetpath.

makefifo(tarinfo, targetpath)

Make a fifo called targetpath.

makefile(tarinfo, targetpath)

Make a file called targetpath.

makelink(tarinfo, targetpath)

Make a (symbolic) link called targetpath.

makeunknown(tarinfo, targetpath)

Make a file from a TarInfo object with an unknown type at targetpath.

next()

Return the next member of the archive as a TarInfo object, when TarFile is opened for reading.

open([name, mode, fileobj, bufsize])

Open a tar archive for reading, writing or appending.

tarinfo

alias of tarfile.TarInfo

taropen(name[, mode, fileobj])

Open uncompressed tar archive name for reading or writing.

utime(tarinfo, targetpath)

Set modification time of targetpath according to tarinfo.

xzopen(name[, mode, fileobj, preset])

Open lzma compressed tar archive `name' for reading or writing.

transaction_incompatible(pfs_method)[source]

Decorator for marking methods of the PFS API which are not allowed to occur during a transaction.
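
Examples

A hypothetical sketch of what such a guard could look like; this is not the library’s actual implementation, and the _transaction_id attribute is invented purely for illustration:

>>> import functools
>>> def transaction_incompatible(pfs_method):
...     @functools.wraps(pfs_method)
...     def wrapper(self, *args, **kwargs):
...         # _transaction_id is a hypothetical attribute, for illustration only
...         if getattr(self, "_transaction_id", None) is not None:
...             raise RuntimeError(f"{pfs_method.__name__} cannot run inside a transaction")
...         return pfs_method(self, *args, **kwargs)
...     return wrapper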

python_pachyderm.mixin.pps

class PPSMixin[source]

A mixin for pps-related functionality.

Methods

create_pipeline(pipeline_name, transform[, ...])

Creates a pipeline.

create_pipeline_from_request(req)

Creates a pipeline from a CreatePipelineRequest object.

create_secret(secret_name, data[, labels, ...])

Creates a new secret.

delete_all_pipelines()

Deletes all pipelines.

delete_job(job_id, pipeline_name[, project_name])

Deletes a subjob (job at the pipeline-level).

delete_pipeline(pipeline_name[, force, ...])

Deletes a pipeline.

delete_secret(secret_name)

Deletes a secret.

get_job_logs(pipeline_name, job_id[, ...])

Gets logs for a job.

get_kube_events(since)

Returns a stream of Kubernetes events.

get_pipeline_logs(pipeline_name[, ...])

Gets logs for a pipeline.

inspect_datum(pipeline_name, job_id, datum_id)

Inspects a datum.

inspect_job(job_id[, pipeline_name, wait, ...])

Inspects a job.

inspect_pipeline(pipeline_name[, history, ...])

Inspects a pipeline.

inspect_secret(secret_name)

Inspects a secret.

list_datum([pipeline_name, job_id, input, ...])

Lists datums.

list_job([pipeline_name, input_commit, ...])

Lists jobs.

list_pipeline([history, details, jqFilter, ...])

Lists pipelines.

list_secret()

Lists secrets.

query_loki(query[, since])

Returns a stream of loki log messages given a query string.

restart_datum(pipeline_name, job_id[, ...])

Restarts a datum.

run_cron(pipeline_name[, project_name])

Triggers a cron pipeline to run now.

start_pipeline(pipeline_name[, project_name])

Starts a pipeline.

stop_job(job_id, pipeline_name[, reason, ...])

Stops a subjob (job at the pipeline-level).

stop_pipeline(pipeline_name[, project_name])

Stops a pipeline.

create_pipeline(pipeline_name, transform, project_name=None, parallelism_spec=None, egress=None, reprocess_spec=None, update=False, output_branch_name=None, s3_out=False, resource_requests=None, resource_limits=None, sidecar_resource_limits=None, input=None, description=None, reprocess=False, service=None, datum_set_spec=None, datum_timeout=None, job_timeout=None, salt=None, datum_tries=3, scheduling_spec=None, pod_patch=None, spout=None, spec_commit=None, metadata=None, autoscaling=False, tolerations=None, sidecar_resource_requests=None, dry_run=False, determined=None)[source]

Creates a pipeline.

For info on the params, please refer to the pipeline spec document: http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html

Parameters
pipeline_namestr

The pipeline name.

transformpps_pb2.Transform

The image and commands run during pipeline execution.

project_namestr

The name of the project.

parallelism_specpps_pb2.ParallelismSpec, optional

Specifies how the pipeline is parallelized.

egresspps_pb2.Egress, optional

An external data store to publish the results of the pipeline to.

reprocess_specstr, optional

Specifies how to handle already-processed datums.

updatebool, optional

If true, updates the existing pipeline with new args.

output_branch_namestr, optional

The branch name to output results on.

s3_outbool, optional

If true, the output repo is exposed as an S3 gateway bucket.

resource_requestspps_pb2.ResourceSpec, optional

The amount of resources that the pipeline workers will consume.

resource_limitspps_pb2.ResourceSpec, optional

The upper threshold of allowed resources a given worker can consume. If a worker exceeds this value, it will be evicted.

sidecar_resource_limitspps_pb2.ResourceSpec, optional

The upper threshold of resources allocated to the sidecar containers.

inputpps_pb2.Input, optional

The input repos to the pipeline. Commits to these repos will automatically trigger the pipeline to create new jobs to process them.

descriptionstr, optional

A description of the pipeline.

reprocessbool, optional

If true, forces the pipeline to reprocess all datums. Only has meaning if update is True.

servicepps_pb2.Service, optional

Creates a Service pipeline instead of a normal pipeline.

datum_set_specpps_pb2.DatumSetSpec, optional

Specifies how a pipeline should split its datums into datum sets.

datum_timeoutduration_pb2.Duration, optional

The maximum execution time allowed for each datum.

job_timeoutduration_pb2.Duration, optional

The maximum execution time allowed for a job.

saltstr, optional

A tag for the pipeline.

datum_triesint, optional

The number of times a job attempts to run on a datum when a failure occurs.

scheduling_specpps_pb2.SchedulingSpec, optional

Specifies how the pods for a pipeline should be scheduled.

pod_patchstr, optional

Allows one to set fields in the pod spec that haven’t been explicitly exposed in the rest of the pipeline spec.

spoutpps_pb2.Spout, optional

Creates a Spout pipeline instead of a normal pipeline.

spec_commitpfs_pb2.Commit, optional

A spec commit to base the pipeline spec from.

metadatapps_pb2.Metadata, optional

Kubernetes labels and annotations to add as metadata to the pipeline pods.

autoscalingbool, optional

If true, automatically scales the worker pool based on the datums it has to process.

tolerationsList[pps_pb2.Toleration], optional

A list of Kubernetes tolerations to be applied to the worker pod.

sidecar_resource_requestspps_pb2.ResourceSpec, optional

The amount of resources that the sidecar containers will consume.

Notes

If creating a Spout pipeline, when committing data to the repo, use commit methods (client.commit(), client.start_commit(), etc.) or ModifyFileClient methods (mfc.put_file_from_bytes(), mfc.delete_file(), etc.).

For other pipelines, when committing data to the repo, write out to /pfs/out/.

Examples

>>> client.create_pipeline(
...     "foo",
...     transform=pps_pb2.Transform(
...         cmd=["python3", "main.py"],
...         image="example/image",
...     ),
...     input=pps_pb2.Input(pfs=pps_pb2.PFSInput(
...         repo="foo",
...         branch="master",
...         glob="/*"
...     ))
... )
create_pipeline_from_request(req)[source]

Creates a pipeline from a CreatePipelineRequest object. Usually used in conjunction with util.parse_json_pipeline_spec() or util.parse_dict_pipeline_spec().

Parameters
reqpps_pb2.CreatePipelineRequest

The CreatePipelineRequest object.
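
Examples

A minimal sketch, assuming the illustrative pipeline spec dict below; util.parse_dict_pipeline_spec() is the helper mentioned above:

>>> spec = {
...     "pipeline": {"name": "foo"},
...     "transform": {"cmd": ["python3", "main.py"], "image": "example/image"},
...     "input": {"pfs": {"repo": "foo", "glob": "/*"}},
... }
>>> req = python_pachyderm.util.parse_dict_pipeline_spec(spec)
>>> client.create_pipeline_from_request(req)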

create_secret(secret_name, data, labels=None, annotations=None)[source]

Creates a new secret.

Parameters
secret_namestr

The name of the secret.

dataDict[str, Union[str, bytes]]

The data to store in the secret. Each key must consist only of alphanumeric characters, dashes (-), underscores (_), or dots (.).

labelsDict[str, str], optional

Kubernetes labels to attach to the secret.

annotationsDict[str, str], optional

Kubernetes annotations to attach to the secret.
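
Examples

A minimal sketch; the secret name and data are illustrative:

>>> client.create_secret("example-secret", {"password": "hunter2"})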

delete_all_pipelines()[source]

Deletes all pipelines.

delete_job(job_id, pipeline_name, project_name=None)[source]

Deletes a subjob (job at the pipeline-level).

Parameters
job_idstr

The ID of the job.

pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

delete_pipeline(pipeline_name, force=False, keep_repo=False, project_name=None)[source]

Deletes a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

forcebool, optional

If true, forces the pipeline deletion.

keep_repobool, optional

If true, keeps the output repo.

project_namestr

The name of the project.

delete_secret(secret_name)[source]

Deletes a secret.

Parameters
secret_namestr

The name of the secret.

get_job_logs(pipeline_name, job_id, project_name=None, data_filters=None, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]

Gets logs for a job.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

project_namestr

The name of the project.

data_filtersList[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

datumpps_pb2.Datum, optional

Filters log lines for the specified datum.

followbool, optional

If true, continue to follow new logs as they appear.

tailint, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backendbool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

sinceduration_pb2.Duration, optional

Specifies how far in the past to return logs from.

Returns
Iterator[pps_pb2.LogMessage]

An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is True, use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.
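
Examples

A minimal sketch; the pipeline name and job ID are illustrative:

>>> for msg in client.get_job_logs("foo", "467c580611234cdb8cc9758c7aa96087"):
...     print(msg.message)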

get_kube_events(since)[source]

Returns a stream of Kubernetes events.

Parameters
sinceduration_pb2.Duration

Specifies how far in the past to return events from.
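
Examples

A minimal sketch, assuming a one-hour window:

>>> from google.protobuf import duration_pb2
>>> for event in client.get_kube_events(duration_pb2.Duration(seconds=3600)):
...     print(event)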

get_pipeline_logs(pipeline_name, project_name=None, data_filters=None, master=False, datum=None, follow=False, tail=0, use_loki_backend=False, since=None)[source]

Gets logs for a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

data_filtersList[str], optional

A list of the names of input files from which we want processing logs. This may contain multiple files, in case pipeline_name contains multiple inputs. Each filter may be an absolute path of a file within a repo, or it may be a hash for that file (to search for files at specific versions).

masterbool, optional

If true, includes logs from the master.

datumpps_pb2.Datum, optional

Filters log lines for the specified datum.

followbool, optional

If true, continue to follow new logs as they appear.

tailint, optional

If nonzero, the number of lines from the end of the logs to return. Note: tail applies per container, so you will get tail * <number of pods> total lines back.

use_loki_backendbool, optional

If true, use loki as a backend, rather than Kubernetes, for fetching logs. Requires a loki-enabled cluster.

sinceduration_pb2.Duration, optional

Specifies how far in the past to return logs from.

Returns
Iterator[pps_pb2.LogMessage]

An iterator of protobuf objects that contain info on a log from a PPS worker. If follow is True, use next() to iterate, as the returned stream is potentially endless and may otherwise block your code.
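
Examples

A minimal sketch; the pipeline name is illustrative:

>>> for msg in client.get_pipeline_logs("foo", tail=10):
...     print(msg.message)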

inspect_datum(pipeline_name, job_id, datum_id, project_name=None)[source]

Inspects a datum.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

datum_idstr

The ID of the datum.

project_namestr

The name of the project.

Returns
pps_pb2.DatumInfo

A protobuf object with info on the datum.

inspect_job(job_id, pipeline_name=None, wait=False, details=False, project_name=None)[source]

Inspects a job.

Parameters
job_idstr

The ID of the job.

pipeline_namestr, optional

The name of a pipeline.

waitbool, optional

If true, wait until the job completes.

detailsbool, optional

If true, return worker details.

project_namestr

The name of the project.

Returns
Iterator[pps_pb2.JobInfo]

An iterator of protobuf objects that contain info on a subjob (jobs at the pipeline-level).

Examples

>>> # Look at all subjobs in a job
>>> subjobs = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087"))
...
>>> # Look at single subjob (job at the pipeline-level)
>>> subjob = list(client.inspect_job("467c580611234cdb8cc9758c7aa96087", "foo"))[0]
inspect_pipeline(pipeline_name, history=0, details=False, project_name=None)[source]

Inspects a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

historyint, optional

Indicates to return historical versions of pipeline_name. Semantics are:

  • 0: Return current version of pipeline_name

  • 1: Return the above and pipeline_name from the next most recent version.

  • 2: etc.

  • -1: Return all historical versions of pipeline_name.

detailsbool, optional

If true, return pipeline details.

project_namestr

The name of the project.

Returns
Iterator[pps_pb2.PipelineInfo]

An iterator of protobuf objects that contain info on a pipeline.

Examples

>>> pipeline = next(client.inspect_pipeline("foo"))
...
>>> for p in client.inspect_pipeline("foo", 2):
>>>     print(p)
inspect_secret(secret_name)[source]

Inspects a secret.

Parameters
secret_namestr

The name of the secret.

Returns
pps_pb2.SecretInfo

A protobuf object with info on the secret.

list_datum(pipeline_name=None, job_id=None, input=None, project_name=None, datum_filter=None, pagination_marker=None, number=None, reverse=False)[source]

Lists datums. Exactly one of (pipeline_name, job_id) (for real datums from a job) or input (for hypothetical datums) must be set.

Parameters
pipeline_namestr, optional

The name of the pipeline.

job_idstr, optional

The ID of a job.

inputpps_pb2.Input, optional

A protobuf object that filters the datums returned. The datums listed are ones that would be run if a pipeline was created with the provided input.

project_namestr

The name of the project.

datum_filterpps_pb2.ListDatumRequest.Filter, optional

Restricts the returned DatumInfo messages to those that match all of the filtered attributes.

pagination_marker, optional

Marker for pagination. If set, the datums that come after the marker in lexicographical order will be returned. If reverse is also set, the datums that come before the marker in lexicographical order will be returned.

numberint, optional

The number of datums to return.

reversebool, optional

If true, return datums in reverse order.

Returns
Iterator[pps_pb2.DatumInfo]

An iterator of protobuf objects that contain info on a datum.

Examples

>>> # See hypothetical datums with specified input cross
>>> datums = list(client.list_datum(input=pps_pb2.Input(
...     pfs=pps_pb2.PFSInput(repo="foo", branch="master", glob="/*"),
...     cross=[
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="bar", branch="master", glob="/")),
...         pps_pb2.Input(pfs=pps_pb2.PFSInput(repo="baz", branch="master", glob="/*/*")),
...     ]
... )))
list_job(pipeline_name=None, input_commit=None, history=0, details=False, jqFilter=None, project_name=None, projects_filter=None, pagination_marker=None, number=None, reverse=False)[source]

Lists jobs.

Parameters
pipeline_namestr, optional

The name of a pipeline. If set, returns subjobs (job at the pipeline-level) only from this pipeline.

input_commitSubcommitType, optional

A commit or list of commits from the input repo to filter jobs on. Only impacts returned results if pipeline_name is specified.

historyint, optional

Indicates to return jobs from historical versions of pipeline_name. Semantics are:

  • 0: Return jobs from the current version of pipeline_name

  • 1: Return the above and jobs from the next most recent version

  • 2: etc.

  • -1: Return jobs from all historical versions of pipeline_name

detailsbool, optional

If true, return pipeline details for pipeline_name. Leaving this None (or False) can make the call significantly faster in clusters with a large number of pipelines and jobs. Note that if input_commit is valid, this field is coerced to True.

jqFilterstr, optional

A jq filter that can filter the list of jobs returned; applied only if pipeline_name is provided.

project_namestr, optional

The name of the project containing the pipeline.

projects_filterList[str], optional

A list of projects to filter jobs on; None means don’t filter.

pagination_marker, optional

Marker for pagination. If set, the jobs that come after the marker in lexicographical order will be returned. If reverse is also set, the jobs that come before the marker in lexicographical order will be returned.

numberint, optional

The number of jobs to return.

reversebool, optional

If true, return jobs in reverse order.

Returns
Union[Iterator[pps_pb2.JobInfo], Iterator[pps_pb2.JobSetInfo]]

An iterator of protobuf objects that either contain info on a subjob (job at the pipeline-level), if pipeline_name was specified, or a job, if pipeline_name wasn’t specified.

Examples

>>> # List all jobs
>>> jobs = list(client.list_job())
...
>>> # List all jobs at a pipeline-level
>>> subjobs = list(client.list_job("foo"))
list_pipeline(history=0, details=False, jqFilter=None, commit_set=None, projects_filter=None)[source]

Lists pipelines.

Parameters
historyint, optional

Indicates to return historical versions of pipelines. Semantics are:

  • 0: Return current versions of pipelines.

  • 1: Return the above and the next most recent version of each pipeline.

  • 2: etc.

  • -1: Return all historical versions of pipelines.

detailsbool, optional

If true, return pipeline details.

jqFilterstr, optional

A jq filter that can filter the list of pipelines returned.

commit_setpfs_pb2.CommitSet, optional

If not None, returns all the pipeline infos at this commit set.

projects_filterList[str], optional

A list of projects to filter pipelines on; None means don’t filter.

Returns
Iterator[pps_pb2.PipelineInfo]

An iterator of protobuf objects that contain info on a pipeline.

Examples

>>> pipelines = list(client.list_pipeline())
list_secret()[source]

Lists secrets.

Returns
List[pps_pb2.SecretInfo]

A list of protobuf objects that contain info on a secret.

query_loki(query, since=None)[source]

Returns a stream of loki log messages given a query string.

Parameters
querystr

The Loki query.

sinceduration_pb2.Duration, optional

Return log messages more recent than since (default: now).
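
Examples

A minimal sketch; the LogQL query string is illustrative:

>>> for msg in client.query_loki('{app="pipeline"}'):
...     print(msg)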

restart_datum(pipeline_name, job_id, data_filters=None, project_name=None)[source]

Restarts a datum.

Parameters
pipeline_namestr

The name of the pipeline.

job_idstr

The ID of the job.

data_filtersList[str], optional

A list of paths or hashes of datums that filter which datums are restarted.

project_namestr

The name of the project.

run_cron(pipeline_name, project_name=None)[source]

Triggers a cron pipeline to run now.

For more info on cron pipelines: https://docs.pachyderm.com/latest/concepts/pipeline-concepts/pipeline/cron/

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

start_pipeline(pipeline_name, project_name=None)[source]

Starts a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

stop_job(job_id, pipeline_name, reason=None, project_name=None)[source]

Stops a subjob (job at the pipeline-level).

Parameters
job_idstr

The ID of the job.

pipeline_namestr

The name of the pipeline.

reasonstr, optional

A reason for stopping the job.

project_namestr

The name of the project.

stop_pipeline(pipeline_name, project_name=None)[source]

Stops a pipeline.

Parameters
pipeline_namestr

The name of the pipeline.

project_namestr

The name of the project.

python_pachyderm.mixin.transaction

class TransactionMixin[source]

A mixin for transaction-related functionality.

Methods

batch_transaction(requests)

Executes a batch transaction.

delete_all_transactions()

Deletes all transactions.

delete_transaction(transaction)

Deletes a transaction.

finish_transaction(transaction)

Finishes a transaction.

inspect_transaction(transaction)

Inspects a transaction.

list_transaction()

Lists unfinished transactions.

start_transaction()

Starts a transaction.

transaction()

A context manager for running operations within a transaction.

batch_transaction(requests)[source]

Executes a batch transaction.

Parameters
requestsList[transaction_pb2.TransactionRequest]

A list of TransactionRequest protobufs. Each protobuf must only have one field set.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> # Deletes one repo and creates a branch in another repo atomically
>>> client.batch_transaction([
...     transaction_pb2.TransactionRequest(delete_repo=pfs_pb2.DeleteRepoRequest(repo=pfs_pb2.Repo(name="foo"))),
...     transaction_pb2.TransactionRequest(create_branch=pfs_pb2.CreateBranchRequest(branch=pfs_pb2.Branch(
...         repo=pfs_pb2.Repo(name="bar", type="user"), name="staging"
...     )))
... ])
delete_all_transactions()[source]

Deletes all transactions.

delete_transaction(transaction)[source]

Deletes a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Examples

>>> client.delete_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.finish_transaction("a3ak09467c580611234cdb8cc9758c7a")
>>> client.delete_transaction(transaction)
finish_transaction(transaction)[source]

Finishes a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
inspect_transaction(transaction)[source]

Inspects a transaction.

Parameters
transactionUnion[str, transaction_pb2.Transaction]

The ID or protobuf object representing the transaction.

Returns
transaction_pb2.TransactionInfo

A protobuf object with info on the transaction.

Examples

>>> transaction = client.inspect_transaction("6fe754facd9c41e99d04e1037e3be9ee")
...
>>> transaction = client.inspect_transaction(transaction_protobuf)
list_transaction()[source]

Lists unfinished transactions.

Returns
List[transaction_pb2.TransactionInfo]

A list of protobuf objects that contain info on a transaction.
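
Examples

A minimal sketch:

>>> for info in client.list_transaction():
...     print(info.transaction.id)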

start_transaction()[source]

Starts a transaction.

Returns
transaction_pb2.Transaction

A protobuf object that represents the transaction.

Examples

>>> transaction = client.start_transaction()
>>> # do stuff
>>> client.finish_transaction(transaction)
transaction()[source]

A context manager for running operations within a transaction. When the context manager exits, the transaction is finished if no error occurred; otherwise it is deleted.

Yields
transaction_pb2.Transaction

A protobuf object that represents a transaction.

Examples

If a pipeline has two input repos, foo and bar, a transaction is useful for adding data to both atomically before the pipeline runs even once.

>>> with client.transaction() as t:
>>>     c1 = client.start_commit("foo", "master")
>>>     c2 = client.start_commit("bar", "master")
>>>
>>> # File operations, such as put file, must occur outside
>>> #  the transaction block.
>>> client.put_file_bytes(c1, "/joint_data.txt", b"DATA1")
>>> client.put_file_bytes(c2, "/joint_data.txt", b"DATA2")
>>>
>>> client.finish_commit(c1)
>>> client.finish_commit(c2)

python_pachyderm.mixin.version

class VersionMixin[source]

A mixin for version-related functionality.

Methods

get_remote_version()

Gets version of Pachyderm server.

get_remote_version()[source]

Gets version of Pachyderm server.

Returns
version_pb2.Version

A protobuf object with info on the pachd version.
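
Examples

A minimal sketch, assuming the standard major/minor/micro fields on the Version message:

>>> version = client.get_remote_version()
>>> print(f"{version.major}.{version.minor}.{version.micro}")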