Create an artifact version - Weights & Biases Documentation

This page shows you how to create a new artifact version so you can track, share, and reuse datasets, models, or other files across experiments. You can create a new artifact version with a single run or collaboratively with distributed runs. You can also create a new artifact version from a previous version. The result is known as an incremental artifact, and it avoids re-uploading files that didn’t change.

Create an incremental artifact when you need to change only a subset of files in an artifact and the original artifact is much larger than the set of files you change.

Create new artifact versions from scratch

You can create a new artifact version in two ways: from a single run and from distributed runs. The following list defines each method:

Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example, to output saved models or model predictions in a table for analysis.
Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs that generate data, often in parallel. For example, to evaluate a model in a distributed manner and output the predictions.

W&B creates a new artifact and assigns it a v0 alias if you pass a name to the wandb.Artifact API that doesn’t exist in your project. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1. W&B retrieves an existing artifact if you pass a name and artifact type to the wandb.Artifact API that matches an existing artifact in your project. The retrieved artifact already has at least one version (v0 or later).
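The versioning behavior described above can be illustrated with a small toy model. This is only a sketch of the idea, not W&B's implementation: `ToyArtifactStore` and `contents_digest` are hypothetical names, the SHA-256 digest is an assumption chosen for illustration, and the model compares new contents only against the most recent version.

```python
import hashlib


def contents_digest(files: dict[str, bytes]) -> str:
    """Return one digest for a set of named files (toy stand-in for a checksum)."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


class ToyArtifactStore:
    """Hypothetical in-memory store that only mimics the v0, v1, ... numbering."""

    def __init__(self) -> None:
        # Maps an artifact name to the list of digests, one per version.
        self._versions: dict[str, list[str]] = {}

    def log(self, name: str, files: dict[str, bytes]) -> str:
        digest = contents_digest(files)
        versions = self._versions.setdefault(name, [])
        # Unknown name: this becomes v0. Known name: a new version is
        # created only if the contents differ from the latest version.
        if not versions or versions[-1] != digest:
            versions.append(digest)
        return f"v{len(versions) - 1}"


store = ToyArtifactStore()
first = store.log("my_dataset", {"image1.png": b"pixels"})
repeat = store.log("my_dataset", {"image1.png": b"pixels"})
changed = store.log("my_dataset", {"image1.png": b"new pixels"})
print(first, repeat, changed)  # prints: v0 v0 v1
```

Logging identical contents a second time returns the existing version, while changed contents produce the next version number.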

Single run

Log a new version of an artifact with a single run that produces all the files in the artifact. You can create the new version either as part of an active W&B run (so the artifact is tracked as that run’s output) or outside of a run (when you want to log artifacts independently of experiment tracking). The following procedures cover both cases.

Create an artifact version within a W&B run:

Create a run with wandb.init().
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Log the artifact to the run with .log_artifact.

import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Create an artifact version outside of a W&B run:

Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Save the artifact with .save.

import wandb

artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add files and assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()
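When a run produces many files, it can help to gather them first and then add each one under a stable entry name. The following is a minimal sketch under stated assumptions: `collect_files` and `log_image_artifact` are hypothetical helper names that are not part of W&B, and the `.png` filter is only an example. The sketch relies on the `name` argument of `add_file` to set the path of the entry inside the artifact.

```python
from pathlib import Path


def collect_files(root: str, suffixes: tuple[str, ...] = (".png",)) -> list[tuple[str, str]]:
    """Return (local_path, entry_name) pairs for matching files under root."""
    base = Path(root)
    pairs = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # The entry name is the path relative to root, with "/" separators.
            pairs.append((str(path), path.relative_to(base).as_posix()))
    return pairs


def log_image_artifact(root: str, artifact_name: str, artifact_type: str) -> None:
    """Log every collected file as one artifact version from a single run."""
    # Imported here so that collect_files can be used without wandb installed.
    import wandb

    with wandb.init() as run:
        artifact = wandb.Artifact(artifact_name, artifact_type)
        for local_path, entry_name in collect_files(root):
            artifact.add_file(local_path, name=entry_name)
        run.log_artifact(artifact)
```

Calling `log_image_artifact("outputs", "artifact_name", "artifact_type")` would log all `.png` files under a hypothetical `outputs` directory as a single version. If you want to add every file in a directory without filtering, `add_dir` is the shorter option.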

Distributed runs

Allow a collection of runs to collaborate on a version before they commit it. This is in contrast to single-run mode described previously, where one run provides all the data for a new version. Use distributed runs when no single run has access to all the files that belong in the artifact (for example, when several parallel jobs each produce a portion of the output).

Each run in the collection needs the same unique ID (called distributed_id) to collaborate on the same version. By default, if present, W&B uses the run’s group as set by wandb.init(group=GROUP) as the distributed_id.
A final run must “commit” the version, permanently locking its state.
Use upsert_artifact to add to the collaborative artifact and finish_artifact to finalize the commit.

Consider the following example, which demonstrates how multiple runs share a distributed_id to contribute to a single artifact version and how a final run commits it. Different runs (labeled as Run 1, Run 2, and Run 3 in the following examples) each add a different image file to the same artifact with upsert_artifact.

Run 1:

import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 2:

import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 3:

Run 3 commits the artifact version and permanently locks its state, so no further runs can add files under the same distributed_id. Run 3 must run after Run 1 and Run 2 complete. The run that calls wandb.Run.finish_artifact() can include files in the artifact, but doesn’t need to.

import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(artifact, distributed_id="my_dist_artifact")
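In practice the contributing runs are often the same script started with different arguments. The following sketch shows one way to structure that, assuming a round-robin split of the files across workers. `shard_files`, `contribute`, and `commit` are hypothetical helper names, and launching the workers and waiting for them to finish before calling `commit` is left to your job scheduler.

```python
def shard_files(files: list[str], worker_index: int, num_workers: int) -> list[str]:
    """Return the files that one worker is responsible for (round-robin split)."""
    return [f for i, f in enumerate(sorted(files)) if i % num_workers == worker_index]


def contribute(files: list[str], worker_index: int, num_workers: int, distributed_id: str) -> None:
    """Run by each worker: add this worker's share of files to the shared version."""
    # Imported here so that shard_files can be used without wandb installed.
    import wandb

    with wandb.init() as run:
        artifact = wandb.Artifact("artifact_name", "artifact_type")
        for path in shard_files(files, worker_index, num_workers):
            artifact.add_file(path)
        run.upsert_artifact(artifact, distributed_id=distributed_id)


def commit(distributed_id: str) -> None:
    """Run once, after every worker has finished: commit and lock the version."""
    import wandb

    with wandb.init() as run:
        # The committing run does not need to add files of its own.
        artifact = wandb.Artifact("artifact_name", "artifact_type")
        run.finish_artifact(artifact, distributed_id=distributed_id)
```

Every call must use the same `distributed_id`, artifact name, and artifact type so that all runs contribute to the same version.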

Create a new artifact version from an existing version

Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn’t change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an incremental artifact.

The following list describes scenarios for each type of incremental change you might encounter:

Add: You periodically add a new subset of files to a dataset after you collect a new batch.
Remove: You discovered several duplicate files and want to remove them from your artifact.
Update: You corrected annotations for a subset of files and want to replace the old files with the correct ones.

You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you need to have all the contents of your artifact on your local disk. When you make an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.

You can create an incremental artifact within a single run or with a set of runs (distributed mode).

To incrementally change an artifact, follow this procedure:

Obtain the artifact version you want to incrementally change:

Inside a run:

saved_artifact = run.use_artifact("my_artifact:latest")

Outside of a run:

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")

Create a draft with:

draft_artifact = saved_artifact.new_draft()

Perform any incremental changes you want to see in the next version. You can add, remove, or modify an existing entry.

The following examples show how to perform each of these changes.

Add a file to an existing artifact version with the add_file method:

draft_artifact.add_file("file_to_add.txt")

You can also add multiple files by adding a directory with the add_dir method.

Remove a file from an existing artifact version with the remove method:

draft_artifact.remove("file_to_remove.txt")

You can also remove multiple files with the remove method by passing in a directory path.

Modify or replace contents by removing the old contents from the draft and adding the new contents back in:

draft_artifact.remove("modified_file.txt")
draft_artifact.add_file("modified_file.txt")

Log or save your changes to commit the draft as a new artifact version.

Inside a run, log the draft:

run.log_artifact(draft_artifact)

Outside of a run, save the draft:

draft_artifact.save()

After you log or save the draft, W&B creates a new artifact version that records only the incremental changes while reusing the unchanged files from the previous version. The preceding code examples combined look like the following:

Inside a run:

import wandb

with wandb.init(job_type="modify dataset") as run:
    # Fetch the artifact and mark it as an input to your run
    saved_artifact = run.use_artifact("my_artifact:latest")
    # Create a draft version
    draft_artifact = saved_artifact.new_draft()

    # Modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")

    # Log your changes to create a new version and mark it as an output of your run
    run.log_artifact(draft_artifact)

Outside of a run:

import wandb

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")  # load your artifact
draft_artifact = saved_artifact.new_draft()  # create a draft version

# Modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")
draft_artifact.save()  # commit changes to the draft
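If you track which files changed yourself, you can turn that information into the add, remove, and modify calls shown above. The following is a minimal sketch, not a W&B feature: `plan_changes` and `apply_changes` are hypothetical helper names, and the two name-to-digest mappings are assumed to be supplied by you (for example, from your own bookkeeping of file checksums). The `draft` argument is any object with `add_file` and `remove` methods, such as the result of `new_draft()`.

```python
def plan_changes(previous: dict[str, str], current: dict[str, str]):
    """Compare two {entry_name: digest} mappings and classify the differences."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(
        name for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return added, removed, modified


def apply_changes(draft, local_paths: dict[str, str], added, removed, modified) -> None:
    """Apply the planned changes to a draft artifact."""
    for name in removed:
        draft.remove(name)
    for name in modified:
        # Modify by removing the old entry and adding the new contents back in.
        draft.remove(name)
        draft.add_file(local_paths[name], name=name)
    for name in added:
        draft.add_file(local_paths[name], name=name)


# Example with made-up digests: b.txt changed, c.txt was deleted, d.txt is new.
previous = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}
current = {"a.txt": "1", "b.txt": "9", "d.txt": "4"}
added, removed, modified = plan_changes(previous, current)
print(added, removed, modified)  # prints: ['d.txt'] ['c.txt'] ['b.txt']
```

After `apply_changes(draft_artifact, local_paths, added, removed, modified)`, log or save the draft as described in the procedure. Only the files named in `added` and `modified` need to exist on your local disk.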
