Tutorial: Create, track, and use a dataset artifact

This walkthrough shows you how to create, track, and use a dataset artifact with W&B. You log a dataset as a versioned artifact to W&B, then download it in a later run. This workflow helps you reproducibly share datasets across experiments and track them as inputs and outputs of your runs.

Log in to W&B

Import the W&B library and log in to W&B. If you haven’t done so already, sign up for a free W&B account.

import wandb

wandb.login()

Initialize a run

Use wandb.init() to initialize a run. This starts a background process that syncs and logs data. Provide a project name and a job type:

# Create a W&B run. This example sets the job type to "upload-dataset"
# because the run uploads a dataset artifact.
with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
    pass  # Your code here

Create an artifact object

Create an artifact object with wandb.Artifact(). Pass the artifact's name as the name parameter and a label that describes what kind of artifact it is as the type parameter. For example, the following code snippet creates an artifact called bicycle-dataset with the type dataset:

artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")

For more information about how to construct an artifact, see Construct artifacts.

Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on your machine to the artifact:

# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")

Replace the filename dataset.h5 in the previous code snippet with the path to the file you want to add to the artifact.
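
Because the path you pass to add_file() must point to a file on your machine, it can help to check the path before you add it. The following is a minimal sketch using only the Python standard library. The helper name resolve_dataset_path is hypothetical and is not part of the wandb library.

```python
from pathlib import Path


def resolve_dataset_path(local_path: str) -> str:
    """Return the absolute path of local_path, or raise if it is not a file.

    Hypothetical helper for illustration; not part of the wandb library.
    """
    path = Path(local_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return str(path)


# Intended use inside the run from the previous steps (not executed here):
#     artifact.add_file(local_path=resolve_dataset_path("dataset.h5"))
```

Failing early with an explicit message makes a mistyped path easier to spot than an error raised later in the run.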

Log the dataset

Use the run object’s wandb.Run.log_artifact() method to both save your artifact version and declare the artifact as an output of the run.

# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)

When you log an artifact, W&B creates a latest alias by default. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.

Putting this together, your script so far should look like this:

import wandb

wandb.login()

with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
    artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")
    artifact.add_file(local_path="dataset.h5")
    run.log_artifact(artifact)
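
The latest alias mentioned above is what you later combine with the artifact name, separated by a colon, to refer to a specific version (for example, bicycle-dataset:latest). As a rough illustration of that name:alias form, the sketch below splits such a reference into its two parts. The helper split_artifact_ref is hypothetical and not part of wandb, and the v0 example assumes that a version index is also accepted in the alias position.

```python
from typing import Tuple


def split_artifact_ref(ref: str) -> Tuple[str, str]:
    """Split a 'name:alias' reference into (name, alias).

    Hypothetical helper for illustration; not part of the wandb library.
    """
    name, sep, alias = ref.partition(":")
    if not name or not sep or not alias:
        raise ValueError(f"Expected 'name:alias', got {ref!r}")
    return (name, alias)


print(split_artifact_ref("bicycle-dataset:latest"))  # ('bicycle-dataset', 'latest')
print(split_artifact_ref("bicycle-dataset:v0"))      # ('bicycle-dataset', 'v0')
```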

Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you’ve logged and saved to the W&B servers:

Initialize a new run object with wandb.init().
Use the run object’s wandb.Run.use_artifact() method to specify which artifact to use. This returns an artifact object.
Use the artifact’s wandb.Artifact.download() method to download the contents of the artifact.

# Create a W&B run. This example sets the job type to "training"
# because you use this run to track training.
with wandb.init(project="artifacts-example", job_type="training") as run:

    # Query W&B for an artifact and mark it as input to this run
    artifact = run.use_artifact("bicycle-dataset:latest")

    # Download the artifact's contents
    artifact_dir = artifact.download()
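
After the download, artifact_dir holds the path to a local directory with the artifact's contents. To confirm what arrived, you could list the files under that directory. The following sketch uses only the standard library. The helper name list_downloaded_files is hypothetical, and the sketch assumes artifact_dir is a directory path string as returned in the previous snippet.

```python
import os
from typing import List


def list_downloaded_files(artifact_dir: str) -> List[str]:
    """Return sorted paths of all files under artifact_dir, relative to it.

    Hypothetical helper for illustration; not part of the wandb library.
    """
    found = []
    for root, _dirs, files in os.walk(artifact_dir):
        for filename in files:
            full_path = os.path.join(root, filename)
            found.append(os.path.relpath(full_path, artifact_dir))
    return sorted(found)


# Intended use after artifact.download() (not executed here):
#     for relative_path in list_downloaded_files(artifact_dir):
#         print(relative_path)
```

For the artifact logged earlier in this tutorial, you would expect the list to contain dataset.h5.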

Alternatively, you can use the Public API (wandb.Api) to export or update data already saved in W&B outside of a run. For more information, see Track external files.

You now have a versioned dataset artifact logged to W&B and consumed by a downstream run.
