This page shows you how to create a W&B artifact, add content to it, and save it so that you can version and share datasets, models, and other files across your machine learning workflows.
Use the W&B Python SDK to construct artifacts from W&B Runs. You can add files, directories, URIs, and files from parallel runs to artifacts. After you add a file to an artifact, save the artifact to the W&B Server or your own private server. Each artifact is associated with a run.
For information on how to track external files, such as files stored in Amazon S3, see the Track external files page.

Construct an artifact

Construct a W&B artifact in three steps:
  1. Create an artifact Python object with wandb.Artifact()
  2. Add one or more files to the artifact
  3. Save your artifact to the W&B server

1. Create an artifact Python object with wandb.Artifact()

Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:
  • Name: The name of your artifact. The name should be unique, descriptive, and memorable.
  • Type: The type of artifact. The type should be short, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.
W&B uses the name and type you provide to create a directed acyclic graph in the W&B App. See Explore and traverse artifact graphs for more information.
Artifacts can’t have the same name, regardless of type. In other words, you can’t create an artifact named cats of type dataset and another artifact with the same name of type model.
You can optionally provide a description and metadata when you initialize an artifact object. For more information about available attributes and parameters, see the wandb.Artifact class definition in the Python SDK Reference Guide.
Copy and paste the following code snippet to create an artifact object. Replace the [NAME] and [TYPE] placeholders with your own values:
import wandb

# Create an artifact object
artifact = wandb.Artifact(name="[NAME]", type="[TYPE]")
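The rule that one name can't be shared by artifacts of different types can be checked locally before you construct anything. The following is a minimal sketch under that assumption. ArtifactNameRegistry is a hypothetical helper written for this illustration and is not part of the wandb SDK.

```python
class ArtifactNameRegistry:
    """Hypothetical local helper, not part of the wandb SDK.

    Remembers which type each artifact name was first used with and
    rejects any later attempt to reuse that name with another type.
    """

    def __init__(self):
        self._type_by_name = {}  # artifact name -> artifact type

    def check(self, name, artifact_type):
        # setdefault() stores the type on first use and returns the
        # previously stored type on every later call.
        known_type = self._type_by_name.setdefault(name, artifact_type)
        if known_type != artifact_type:
            raise ValueError(
                f"Artifact name {name!r} is already used with type {known_type!r}"
            )
        return name, artifact_type


registry = ArtifactNameRegistry()
registry.check("cats", "dataset")
registry.check("cats", "dataset")  # Same name and same type: allowed

try:
    registry.check("cats", "model")  # Same name, different type: rejected
except ValueError as err:
    conflict_message = str(err)
```

The tuple returned by check() could be passed on as the name and type arguments of wandb.Artifact().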

2. Add one or more files to the artifact

Add files, directories, external URI references (such as Amazon S3) and more to your artifact object. To add a single file, use the artifact object’s Artifact.add_file() method:
artifact.add_file(local_path="path/to/file.txt", name="[NAME]")
To add a directory, use the Artifact.add_dir() method:
artifact.add_dir(local_path="path/to/directory", name="[NAME]")
See the next section, Add files to an artifact, for more information about how to add different file types to an artifact.

3. Save your artifact to the W&B server

When you save the artifact, W&B uploads its contents and registers it with the run, which makes the artifact available for downstream use and versioning. Use the run object’s wandb.Run.log_artifact() method to save the artifact.
with wandb.init(project="[PROJECT]", job_type="[JOB-TYPE]") as run:
    run.log_artifact(artifact)
When to use wandb.Run.log_artifact() or Artifact.save()
  • Use wandb.Run.log_artifact() to create a new artifact and associate it with a specific run.
  • Use Artifact.save() to update an existing artifact without creating a new run.
Putting this all together, the following code snippet shows how to create an artifact, add a file and a directory to it, and save the artifact to W&B:
import wandb

# Create the artifact; here name is the name of the artifact itself
artifact = wandb.Artifact(name="[NAME]", type="[TYPE]")

# In add_file() and add_dir(), name is the path inside the artifact,
# not the artifact name
artifact.add_file(local_path="path/to/file.txt", name="[NAME]")
artifact.add_dir(local_path="path/to/directory", name="[NAME]")

# Save the artifact and associate it with a run
with wandb.init(project="[PROJECT]", job_type="[JOB-TYPE]") as run:
    run.log_artifact(artifact)
Each time you log an artifact with the same name and type, W&B creates a new version of that artifact. For more information, see Create a new artifact version.
W&B performs wandb.Run.log_artifact() calls asynchronously for faster uploads. This can cause surprising behavior when you log artifacts in a loop. For example:
with wandb.init() as run:
    for i in range(10):
        a = wandb.Artifact(
            name="race",
            type="dataset",
            metadata={"index": i},
        )
        # ... add files to artifact a ...
        run.log_artifact(a)
The artifact version v0 is not guaranteed to have an index of 0 in its metadata because W&B might log artifacts in an arbitrary order.
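If version order matters, one option is to wait for each artifact to finish logging before logging the next one. The following is a minimal sketch, assuming that run.log_artifact() returns the artifact and that Artifact.wait() blocks until that artifact has finished logging, as in recent SDK versions. The helper name log_artifacts_in_order is hypothetical.

```python
def log_artifacts_in_order(run, artifacts):
    """Log artifacts one at a time so that versions follow list order.

    `run` is expected to behave like a wandb run, and each item in
    `artifacts` like a wandb.Artifact. This helper is illustrative only.
    """
    logged = []
    for artifact in artifacts:
        # Start logging this artifact.
        logged_artifact = run.log_artifact(artifact)
        # Block until logging has finished before moving on, so the
        # next artifact can't be committed ahead of this one.
        logged_artifact.wait()
        logged.append(logged_artifact)
    return logged
```

The trade-off is speed: uploads run one after another instead of overlapping, so use this only when the mapping between versions and loop order matters.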

Add files to an artifact

After you create an artifact object, populate it with the content you want to track. The following sections show how to add different types of objects to an artifact. Assume you have a directory with the following structure as you read through the examples:
root-directory
|-- hello.txt
|-- images/
|   |-- cat.png
|   |-- dog.png
|-- checkpoints/
|   |-- model.h5
|-- models/
|   |-- model.h5

Add a single file

Use wandb.Artifact.add_file() to add a single local file to an artifact. Provide the local path to the file as the local_path parameter:
import wandb

# Initialize an artifact object
artifact = wandb.Artifact(name="[NAME]", type="[TYPE]")

# Add a single file
artifact.add_file(local_path="path/file.format")
For example, suppose you have a file called 'hello.txt' in your local working directory:
artifact.add_file("hello.txt")
The artifact now has the following content:
hello.txt
Optionally, pass a different name to the name parameter to rename the file within the artifact object itself. Continuing the previous example:
artifact.add_file(
    local_path="hello.txt",
    name="new/path/hello_world.txt",
)
The file is stored in the artifact as:
new/path/hello_world.txt
The following table shows how different API calls produce different artifact contents:
| API call | Resulting artifact |
| --- | --- |
| artifact.new_file('hello.txt') | hello.txt |
| artifact.add_file('model.h5') | model.h5 |
| artifact.add_file('checkpoints/model.h5') | model.h5 |
| artifact.add_file('model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
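The naming pattern in the table can be summarized as: an explicit name wins, otherwise only the file's base name is kept. The following sketch reproduces that pattern in plain Python. artifact_path_for_file is a hypothetical function written for illustration. It mirrors the table and is not the SDK's implementation.

```python
import posixpath


def artifact_path_for_file(local_path, name=None):
    """Return the path a file would get inside the artifact.

    Illustrative rule taken from the table above: use `name` when it
    is given, otherwise drop the directories and keep the base name.
    """
    if name is not None:
        return name
    return posixpath.basename(local_path)


examples = {
    "root file": artifact_path_for_file("model.h5"),
    "nested file": artifact_path_for_file("checkpoints/model.h5"),
    "renamed file": artifact_path_for_file("model.h5", name="models/mymodel.h5"),
}
```

One consequence is that two files with the same base name from different directories map to the same artifact path unless you pass distinct name values.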

Add multiple files

Use the wandb.Artifact.add_dir() method to add multiple files from a local directory to an artifact. Provide the local path to the directory as the local_path parameter.
import wandb

# Initialize an artifact object
artifact = wandb.Artifact(name="[NAME]", type="[TYPE]")

# Add a local directory to the artifact
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")
The following table shows how different API calls produce different artifact contents:
| API call | Resulting artifact |
| --- | --- |
| artifact.add_dir('images') | cat.png, dog.png |
| artifact.add_dir('images', name='images') | images/cat.png, images/dog.png |
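To see how the optional name prefix changes the resulting paths, the following sketch walks a temporary copy of the images directory and computes the paths that the table describes. artifact_paths_for_dir is a hypothetical function written for illustration. It imitates the documented result and does not call the SDK.

```python
import os
import tempfile


def artifact_paths_for_dir(local_path, name=None):
    """List the artifact paths for every file under local_path.

    Illustrative rule taken from the table above: paths are relative
    to local_path, with `name` prepended as a folder when it is given.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(local_path):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, local_path)
            relative = relative.replace(os.sep, "/")
            paths.append(f"{name}/{relative}" if name else relative)
    return sorted(paths)


# Build a throwaway images/ directory with two empty files.
with tempfile.TemporaryDirectory() as root:
    images = os.path.join(root, "images")
    os.makedirs(images)
    for filename in ("cat.png", "dog.png"):
        open(os.path.join(images, filename), "wb").close()

    without_prefix = artifact_paths_for_dir(images)
    with_prefix = artifact_paths_for_dir(images, name="images")
```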

Add a URI reference

Use a URI reference when you want an artifact to point to content stored outside of W&B, such as in an object store, without copying the underlying bytes. Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library supports. Add an external URI reference to an artifact with the wandb.Artifact.add_reference() method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.
# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")
Artifacts support the following URI schemes:
  • http(s)://: A path to a file accessible over HTTP. The artifact tracks checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
  • s3://: A path to an object or object prefix in S3. The artifact tracks checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. W&B expands object prefixes to include the objects under the prefix, up to a maximum of 10,000 objects.
  • gs://: A path to an object or object prefix in GCS. The artifact tracks checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. W&B expands object prefixes to include the objects under the prefix, up to a maximum of 10,000 objects.
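Before you add a reference, it can be useful to check whether a URI uses one of the schemes listed above. The following is a minimal sketch using only the Python standard library. LISTED_SCHEMES and has_listed_scheme are hypothetical names, and the set contains only the schemes named on this page.

```python
from urllib.parse import urlparse

# Schemes named in the list above. This set is an assumption of the
# sketch and is not read from the wandb library.
LISTED_SCHEMES = {"http", "https", "s3", "gs"}


def has_listed_scheme(uri):
    """Return True if the URI uses one of the schemes listed above."""
    return urlparse(uri).scheme in LISTED_SCHEMES


checks = {
    uri: has_listed_scheme(uri)
    for uri in (
        "s3://my-bucket/model.h5",
        "gs://my-bucket/images",
        "https://example.com/data.csv",
        "path/to/file.txt",
    )
}
```

A local path such as path/to/file.txt has no scheme, so it is a candidate for add_file() or add_dir() rather than add_reference().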
The following table shows how different API calls produce different artifact contents:
| API call | Resulting artifact contents |
| --- | --- |
| artifact.add_reference('s3://my-bucket/model.h5') | model.h5 |
| artifact.add_reference('s3://my-bucket/checkpoints/model.h5') | model.h5 |
| artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
| artifact.add_reference('s3://my-bucket/images') | cat.png, dog.png |
| artifact.add_reference('s3://my-bucket/images', name='images') | images/cat.png, images/dog.png |
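Prefix expansion can be pictured as listing every object key under the prefix, stripping the prefix, and optionally prepending the name. The following sketch imitates that on an in-memory list of keys. expand_prefix is a hypothetical function, and raising an error at the 10,000-object limit is an assumption of this sketch, because this page does not state what the SDK does when the limit is exceeded.

```python
def expand_prefix(object_keys, prefix, name=None, max_objects=10_000):
    """Map object keys under a prefix to paths inside an artifact.

    Illustrative only: object_keys stands in for a bucket listing.
    """
    prefix = prefix.rstrip("/") + "/"
    matches = sorted(key for key in object_keys if key.startswith(prefix))
    if len(matches) > max_objects:
        # Assumption of this sketch: refuse to expand past the limit.
        raise ValueError(f"Prefix expands to more than {max_objects} objects")
    paths = []
    for key in matches:
        relative = key[len(prefix):]
        paths.append(f"{name}/{relative}" if name else relative)
    return paths


# Keys as they might appear in the bucket from the table above.
bucket_keys = [
    "model.h5",
    "checkpoints/model.h5",
    "images/cat.png",
    "images/dog.png",
]

plain = expand_prefix(bucket_keys, "images")
prefixed = expand_prefix(bucket_keys, "images", name="images")
```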

Add files to artifacts from parallel runs

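For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact. In the following example, each writer run calls run.upsert_artifact() to add its data to the same artifact, and a final run calls run.finish_artifact() to finalize the artifact version: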
import wandb
import time

# This example uses Ray to run in parallel
# for demonstration purposes.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    The writer job. Each writer adds one table to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table.
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to a folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upsert the artifact to create or append data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# After all the writers finish, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionedTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish the artifact to finalize it, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)

Find path for logged artifacts and other metadata

After you log an artifact, you might want to inspect the files associated with the run that produced it. The following code snippet shows how to use the W&B Public API to list the files in a run, including their names and URLs. Replace the [ENTITY/PROJECT/RUN-ID] placeholder with your own values:
from wandb.apis.public.files import Files
from wandb.apis.public.api import Api

# Create an API client and fetch the run
api = Api()
run = api.run("[ENTITY/PROJECT/RUN-ID]")

# Create a Files object to iterate over files in the run
files = Files(api.client, run)

# Iterate over files
for file in files:
    print(f"File Name: {file.name}")
    print(f"File URL: {file.url}")
    print(f"Path to file in the bucket: {file.direct_url}")
See the File class for more information about available attributes and methods.