Inputs, sources, images, formats: making sense of it all

Throughout the documentation, there are many references to sources, images, and inputs. The various I/O options for Buf can seem daunting and overly complex, so we'll break down how this all fits together.

The default build and lint operations of Buf use your current directory. If all you are doing is linting Protobuf files, the below can be largely ignored, as Buf does what you would expect by default.

Terminology

First, some basic terminology to help our discussion:

  • A Source is a set of .proto files that can be compiled.

  • An Image is a compiled set of .proto files. This is itself a Protobuf message. The exact mechanics of Images are described in the Image documentation, which we encourage you to read.

    Images are created from Sources using buf build or protoc.

  • An Input is either a Source or an Image.

  • All Inputs have a Format, which describes the type of the Input. The Format is usually derived automatically, but it can also be set explicitly.

Why?

At first glance, this may seem extremely complex for a Protobuf tool. For most current use cases, that is accurate. Generally, your only goal is to work with .proto files on disk. This is how Buf works by default. However, there are cases where one wants to work with more than just local files, some of which apply to Buf's current feature set, and some of which are for the future.

Breaking change detection

The biggest current use case is breaking change detection. When you compare your current Protobuf schema to an old version of your schema, you have to decide where the old version is stored. Buf provides multiple options for this, including the ability to directly compile and compare against a git branch or git tag. It is generally preferable, however, to store a representation of your old version in a file. Buf does this via Images, allowing you to store your golden state and then compare your current Protobuf schema against it. This includes support for partial comparisons, as well as storing the golden state in a remote location.

Using protoc instead of the internal compiler

Existing lint and breaking change detection tools produce an internal representation of your Protobuf schema in one of two ways:

  • By using a third-party Protobuf parser, which is usually error-prone and almost never covers every edge case of the Protobuf grammar.
  • By shelling out to protoc itself and parsing the result, which not only requires specific management of protoc in relation to the lint/breaking change detection tool, but can be cumbersome and error-prone itself, especially if the tool parses error output from protoc.

Buf tackles this issue by using FileDescriptorSets (which we extend into Images) internally for all operations, and allowing these FileDescriptorSets to be produced in one of two ways:

  • By using a newly-developed Golang Protobuf compiler that is continuously tested against thousands of known Protobuf definitions, including all known edge cases of the Protobuf grammar.
  • By allowing users to provide protoc output as buf input, thereby bypassing any compiling or parsing on the part of buf entirely, and instead using protoc, the gold standard of Protobuf compilation.

See the Image and compiler documentation for more details.

In short, we don't expect you to naively trust that the internal compiler is actually equivalent to protoc. We would want to verify this claim ourselves. There are also cases (such as Bazel setups) where you may already have infrastructure around calling protoc, and may want to just use artifacts from protoc as input to buf.

Buf Schema Registry

Buf's primary functionality right now is linting and breaking change detection.

These features, and the corresponding CLI tool buf, will always remain free and open source.

However, our goal is to develop this into a new way of working with Protocol Buffers. One of the first products we'll be releasing is the Buf Schema Registry, which will handle stub generation and consumption. See the future for more details.

The core primitive for Buf is the Image, which will be used for the Buf Schema Registry.

Specifying an Input

Inputs are specified as the first argument on the command line. For buf check breaking, the Input to compare against is specified with the --against flag.

  • For buf build, the Input is specified as the first argument. The location to write the output Image is specified with --output, which defaults to the OS-equivalent of /dev/null.
  • For buf check lint, the Input to lint is specified as the first argument.
  • For buf check breaking, the Input is specified as the first argument, and the Input to compare against is specified with --against.
  • For buf ls-files, the Input to list is specified as the first argument.
  • For buf generate, the Input to generate from is specified as the first argument.

Inputs are specified as a string, which has the following structure:

path#option_key1=option_value1,option_key2=option_value2

The path specifies the path to the Input. The options specify options to interpret the Input at the path.

The option format can be used on any Input string to override the derived Format.

Examples (the mechanics of which are described below):

  • path/to/file.data#format=bin explicitly sets the Format to bin, as by default this path would be interpreted as Format dir.
  • https://github.com/googleapis/googleapis#format=git explicitly sets the Format to git. Note, however, that https://github.com/googleapis/googleapis.git has the same effect, because the .git extension lets Buf derive the git Format automatically (see below for derived Formats).
  • -#format=json explicitly sets the Format to json, i.e. read from stdin as JSON, or in the case of buf build --output, write to stdout as JSON.
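The path#options structure can be taken apart with plain shell parameter expansion. The following is only a sketch to make the structure concrete: input_path and input_option are hypothetical helper names, not buf commands, and Buf's own parsing may differ in its details.

```shell
# Sketch only: input_path and input_option are hypothetical helpers, not buf commands.

# Print the path part of an Input string (everything before the first '#').
input_path() {
  printf '%s\n' "${1%%#*}"
}

# Print the value of one option key, or nothing if the key is absent.
# The body runs in a subshell so IFS and globbing changes do not leak out.
input_option() (
  input=$1
  key=$2
  case $input in
    *'#'*) options=${input#*'#'} ;;   # everything after the first '#'
    *) exit 0 ;;                      # no options at all
  esac
  set -f      # do not glob-expand option values
  IFS=,       # options are comma-separated key=value pairs
  for pair in $options; do
    case $pair in
      "$key"=*) printf '%s\n' "${pair#*=}" ;;
    esac
  done
)

input_path 'path/to/file.data#format=bin'               # prints path/to/file.data
input_option 'path/to/file.data#format=bin' format      # prints bin
input_option '.git#branch=master,subdir=proto' subdir   # prints proto
```

An Input string without a # has no options, so input_option prints nothing for it and the Format falls back to the derived one.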

As of now, the following other options are available, all of which are Format-specific:

  • The branch option specifies the branch to clone for git Inputs.
  • The tag option specifies the tag to clone for git Inputs.
  • The ref option specifies an explicit git reference for git Inputs. Any ref that is a valid input to git checkout is accepted.
  • The depth option specifies how deep a clone to perform for git Inputs. This defaults to 50 if ref is set, and 1 otherwise.
  • The recurse_submodules option says to clone submodules recursively for git Inputs.
  • The strip_components option specifies the number of directories to strip for tar or zip Inputs.
  • The subdir option specifies a subdirectory to use within a git, tar, or zip Input.

If ref is specified, branch can be further specified to clone a specific branch before checking out the ref.
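The depth default described above can be written out as a small decision. This is a sketch of the stated rule only: git_clone_depth is a hypothetical helper, not part of buf, and it assumes that an explicitly set depth always takes precedence.

```shell
# Sketch only: git_clone_depth is a hypothetical helper, not part of buf.
# Usage: git_clone_depth REF DEPTH   (pass '' for an option that is not set)
git_clone_depth() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"   # assumption: an explicit depth always wins
  elif [ -n "$1" ]; then
    printf '50\n'        # ref is set: default depth is 50
  else
    printf '1\n'         # no ref (branch or tag only): default depth is 1
  fi
}

git_clone_depth '' ''                        # prints 1
git_clone_depth '7c0dc2f' ''                 # prints 50
git_clone_depth 'refs/remotes/pull/3' '100'  # prints 100
```

If the ref you need is further back than 50 commits, setting depth explicitly is the way to reach it.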

Source Formats

All Sources contain a set of .proto files that can be compiled.

dir

A local directory. The path can be either relative or absolute.

This is the default Format. By default, Buf uses the current directory as the Input for all commands.

Examples:

  • path/to/dir says to compile the files in this relative directory path.
  • /absolute/path/to/dir says to compile the files in this absolute directory path.

tar

A tarball. The path to this tarball can be either a local file, a remote http/https location, or - for stdin.

Use compression=gzip to specify that the tarball is compressed with Gzip. This is automatically detected if the file extension is .tgz or .tar.gz.

Use compression=zstd to specify that the tarball is compressed with Zstandard. This is automatically detected if the file extension is .tar.zst.

The strip_components and subdir options are optional. Note that strip_components is applied before subdir.

Examples:

  • foo.tar says to read the tarball at this relative path.
  • foo.tar.gz says to read the gzipped tarball at this relative path.
  • foo.tgz says to read the gzipped tarball at this relative path.
  • foo.tar.zst says to read the zstandard tarball at this relative path.
  • foo.tar#strip_components=2 says to read the tarball at this relative path and strip the first two directories.
  • foo.tgz#subdir=proto says to read the gzipped tarball at this relative path, and use the subdirectory proto within the archive as the base directory.
  • https://github.com/googleapis/googleapis/archive/master.tar.gz#strip_components=1 says to read the gzipped tarball at this http location, and strip one directory.
  • -#format=tar says to read a tarball from stdin.
  • -#format=tar,compression=gzip says to read a gzipped tarball from stdin.
  • -#format=tar,compression=zstd says to read a zstandard tarball from stdin.
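Because strip_components is applied before subdir, the subdir value must be written relative to the already-stripped paths. The sketch below shows that ordering for a single file inside an archive. archive_member_path is a hypothetical helper, not part of buf, and it assumes that files with too few leading directories, or files outside subdir, are simply left out.

```shell
# Sketch only: archive_member_path is a hypothetical helper, not part of buf.
# Usage: archive_member_path MEMBER STRIP_COMPONENTS SUBDIR
# Prints the path the file ends up with, or nothing if the file is left out.
archive_member_path() (
  path=$1
  n=$2
  subdir=$3
  # Step 1, strip_components: remove the first n leading directories.
  while [ "$n" -gt 0 ]; do
    case $path in
      */*) path=${path#*/} ;;
      *) exit 0 ;;   # assumption: too few directories means the file is left out
    esac
    n=$((n - 1))
  done
  # Step 2, subdir: keep only files under subdir and make it the base directory.
  if [ -n "$subdir" ]; then
    case $path in
      "$subdir"/*) path=${path#"$subdir"/} ;;
      *) exit 0 ;;   # assumption: files outside subdir are left out
    esac
  fi
  printf '%s\n' "$path"
)

archive_member_path 'googleapis-master/google/type/date.proto' 1 ''        # prints google/type/date.proto
archive_member_path 'googleapis-master/google/type/date.proto' 1 'google'  # prints type/date.proto
```

Note that subdir=googleapis-master/google would match nothing in the second call, since the leading directory is already gone by the time subdir is applied.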

zip

A zip archive. The path to this archive can be either a local file, a remote http/https location, or - for stdin.

The strip_components and subdir options are optional. Note that strip_components is applied before subdir.

Examples:

  • foo.zip says to read the zip archive at this relative path.
  • foo.zip#strip_components=2 says to read the zip archive at this relative path and strip the first two directories.
  • foo.zip#subdir=proto says to read the zip archive at this relative path, and use the subdirectory proto within the archive as the base directory.
  • https://github.com/googleapis/googleapis/archive/master.zip#strip_components=1 says to read the zip archive at this http location, and strip one directory.
  • -#format=zip says to read a zip archive from stdin.

git

A git repository. The path to the git repository can be either a local .git directory, or a remote git http://, https://, ssh://, or git:// location.

  • The branch option specifies the branch to clone.
  • The tag option specifies the tag to clone.
  • The ref option specifies an explicit git reference. Any ref that is a valid input to git checkout is accepted.
  • The depth option specifies how deep of a clone to perform. It defaults to 50 if ref is used and 1 otherwise.
  • The recurse_submodules option says to clone submodules recursively.
  • The subdir option says to use this subdirectory as the base directory.

Note that http://, https://, ssh://, and git:// locations must be prefixed with their scheme:

  • http locations must start with http://.
  • https locations must start with https://.
  • ssh locations must start with ssh://.
  • git locations must start with git://.
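A quick way to see the consequence of the scheme rule is to classify a location by its prefix. This is a sketch only: git_location_kind is a hypothetical helper, not part of buf, and it assumes that anything without one of the four schemes is treated as a local path.

```shell
# Sketch only: git_location_kind is a hypothetical helper, not part of buf.
git_location_kind() {
  case $1 in
    http://*|https://*|ssh://*|git://*) echo remote ;;
    *) echo local ;;   # assumption: no scheme means a local path
  esac
}

git_location_kind 'https://github.com/googleapis/googleapis.git'  # prints remote
git_location_kind 'github.com/googleapis/googleapis.git'          # prints local
git_location_kind '.git'                                          # prints local
```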

Examples:

  • .git#branch=master says to clone the master branch of the git repository at the relative path .git. This is particularly useful for local breaking change detection.
  • .git#tag=v1.0.0 says to clone the v1.0.0 tag of the git repository at the relative path .git.
  • .git#branch=master,subdir=proto says to clone the master branch and use the proto directory as the base directory.
  • .git#branch=master,recurse_submodules=true says to clone the master branch along with all recursive submodules.
  • .git#ref=7c0dc2fee4d20dcee8a982268ce35e66fc19cac8 says to clone the repo and check out the specific ref. Any ref that is a valid input to git checkout can be used.
  • .git#ref=refs/remotes/pull/3,branch=my_feature,depth=100 says to clone the specified branch to a depth of 100 and check out refs/remotes/pull/3.
  • https://github.com/googleapis/googleapis.git says to clone the default branch of the git repository at the remote location.
  • https://github.com/googleapis/googleapis.git#branch=master says to clone the master branch of the git repository at the remote location.
  • https://github.com/googleapis/googleapis.git#tag=v1.0.0 says to clone the v1.0.0 tag of the git repository at the remote location.
  • git://github.com/googleapis/googleapis.git#branch=master is also valid.
  • ssh://git@github.com/org/private-repo.git#branch=master is also valid.
  • https://github.com/googleapis/googleapis#format=git,branch=master is also valid.

Symlinks

Note that symlinks are supported for dir Inputs only, while git, tar, and zip Inputs will ignore all symlinks.

Image Formats

All Images are files. Files can be read from a local path, a remote http/https location, or - for stdin.

Images are created using buf build. Examples:

  • buf build -o image.bin
  • buf build -o image.bin.gz
  • buf build -o image.bin.zst
  • buf build -o image.json
  • buf build -o image.json.gz
  • buf build -o image.json.zst
  • buf build -o -
  • buf build -o -#format=json
  • buf build -o -#format=json,compression=gzip
  • buf build -o -#format=json,compression=zstd

Note that -o is an alias for --output.

Images can also be created in the bin Format using protoc. See the compiler documentation for more details.

For example, the following is a valid way to compile all Protobuf files in your current directory, produce a FileDescriptorSet (which is also an Image, as described in the Image documentation) to stdout, and read this Image as binary from stdin:

protoc -I . $(find . -name '*.proto') -o /dev/stdout | buf check lint -

bin

A binary Image.

Use compression=gzip to specify the Image is compressed with Gzip. This is automatically detected if the file extension is .bin.gz.

Use compression=zstd to specify the Image is compressed with Zstandard. This is automatically detected if the file extension is .bin.zst.

Examples:

  • image.bin says to read the file at this relative path.
  • image.bin.gz says to read the gzipped file at this relative path.
  • image.bin.zst says to read the zstandard file at this relative path.
  • - says to read a binary Image from stdin.
  • -#compression=gzip says to read a gzipped binary Image from stdin.
  • -#compression=zstd says to read a zstandard binary Image from stdin.

json

A JSON Image. This creates Images that take much more space, and are slower to parse, but will result in diffs that show the actual differences between two Images in a readable format.

Use compression=gzip to specify the Image is compressed with Gzip. This is automatically detected if the file extension is .json.gz.

Use compression=zstd to specify the Image is compressed with Zstandard. This is automatically detected if the file extension is .json.zst.

Examples:

  • image.json says to read the file at this relative path.
  • image.json.gz says to read the gzipped file at this relative path.
  • image.json.zst says to read the zstandard file at this relative path.
  • -#format=json says to read a JSON Image from stdin.
  • -#format=json,compression=gzip says to read a gzipped JSON Image from stdin.
  • -#format=json,compression=zstd says to read a zstandard JSON Image from stdin.

When combined with jq, this also allows for introspection. For example, to see a list of all packages:

$ buf build -o -#format=json | jq '.file[] | .package' | sort | uniq | head
"google.actions.type"
"google.ads.admob.v1"
"google.ads.googleads.v1.common"
"google.ads.googleads.v1.enums"
"google.ads.googleads.v1.errors"
"google.ads.googleads.v1.resources"
"google.ads.googleads.v1.services"
"google.ads.googleads.v2.common"
"google.ads.googleads.v2.enums"
"google.ads.googleads.v2.errors"

Automatically derived Formats

By default, Buf will derive the Format and compression of an Input from the path via the file extension.

Extension    Derived Format    Derived Compression
.bin         bin               none
.bin.gz      bin               gzip
.bin.zst     bin               zstd
.json        json              none
.json.gz     json              gzip
.json.zst    json              zstd
.tar         tar               none
.tar.gz      tar               gzip
.tgz         tar               gzip
.tar.zst     tar               zstd
.zip         zip               n/a
.git         git               none

There are also two special cases:

  • If the path is -, this is interpreted to mean stdin. By default, this is interpreted as the bin Format.

    Of note, the special value - can also be used as a value to the --output flag of buf build, which is interpreted to mean stdout, and also interpreted by default as the bin Format.

  • If the path is /dev/null on Linux or Mac (or, in the future, nul on Windows), this is interpreted as the bin Format.

If no Format can be automatically derived, the dir Format is assumed, i.e. Buf assumes the path is a path to a local directory.

The format of an Input can be explicitly set as described above.
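The derivation rules in this section map naturally onto a shell case statement. The following is a sketch that mirrors the extension table and the two special cases. derive_format is a hypothetical helper, not part of buf. It looks only at the path (with any #options already removed), and printing n/a as the compression for dir is our own placeholder.

```shell
# Sketch only: derive_format is a hypothetical helper, not part of buf.
# Prints "FORMAT COMPRESSION" for a path that has no explicit format option.
derive_format() {
  case $1 in
    -|/dev/null) echo 'bin none' ;;   # special cases: stdin/stdout and /dev/null
    *.bin)       echo 'bin none' ;;
    *.bin.gz)    echo 'bin gzip' ;;
    *.bin.zst)   echo 'bin zstd' ;;
    *.json)      echo 'json none' ;;
    *.json.gz)   echo 'json gzip' ;;
    *.json.zst)  echo 'json zstd' ;;
    *.tar)       echo 'tar none' ;;
    *.tar.gz|*.tgz) echo 'tar gzip' ;;
    *.tar.zst)   echo 'tar zstd' ;;
    *.zip)       echo 'zip n/a' ;;
    *.git)       echo 'git none' ;;
    *)           echo 'dir n/a' ;;    # fallback: a local directory
  esac
}

derive_format image.bin.zst       # prints bin zstd
derive_format foo.tgz             # prints tar gzip
derive_format path/to/file.data   # prints dir n/a
```

The last call shows why path/to/file.data needs an explicit format=bin option: without a recognized extension, the path falls through to dir.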

Deprecated Formats

The following Formats are deprecated. They will continue to work forever, but we recommend updating if you are explicitly specifying any of these.

Format    Replacement
bingz     Use the bin format with the compression=gzip option.
jsongz    Use the json format with the compression=gzip option.
targz     Use the tar format with the compression=gzip option.
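If you have scripts that pass these deprecated Formats, the replacement is mechanical. As a sketch, the mapping can be expressed as follows. modernize_format is a hypothetical helper, not part of buf, and it prints the options string to place after the # of an Input.

```shell
# Sketch only: modernize_format is a hypothetical helper, not part of buf.
# Usage: modernize_format FORMAT
modernize_format() {
  case $1 in
    bingz)  echo 'format=bin,compression=gzip' ;;
    jsongz) echo 'format=json,compression=gzip' ;;
    targz)  echo 'format=tar,compression=gzip' ;;
    *)      echo "format=$1" ;;   # not deprecated: keep as-is
  esac
}

echo "-#$(modernize_format jsongz)"   # prints -#format=json,compression=gzip
```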

Authentication

Archives, git repositories, and image files can be read from remote locations. For remote locations that need authentication, a couple of mechanisms exist.

HTTPS

Remote archives and image files use netrc files for authentication. Buf will look for a netrc file at $NETRC first, defaulting to ~/.netrc.
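To check which netrc file applies in a given environment (for example in CI), the lookup order can be reproduced in one line of shell. This is a sketch only: netrc_path is a hypothetical helper, not part of buf, and it assumes that an empty NETRC is treated the same as an unset one.

```shell
# Sketch only: netrc_path is a hypothetical helper, not part of buf.
# Assumption: an empty NETRC is treated the same as an unset one.
netrc_path() {
  printf '%s\n' "${NETRC:-$HOME/.netrc}"
}

netrc_path   # prints $NETRC if it is set, otherwise the path to ~/.netrc
```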

Git repositories are cloned using the git command, so any credential helpers you have configured will be automatically used.

Basic authentication can also be specified for remote archives, git repositories, and image files over https with the following environment variables:

  • BUF_INPUT_HTTPS_USERNAME - The username. For GitHub, this is your GitHub user.
  • BUF_INPUT_HTTPS_PASSWORD - The password. For GitHub, this is a personal access token for your GitHub user.

Assuming one of these mechanisms is present, you can call Buf as you normally would:

$ buf check lint https://github.com/org/private-repo.git#branch=master
$ buf check lint https://github.com/org/private-repo.git#tag=v1.0.0
$ buf check lint https://github.com/org/private-repo/archive/master.tar.gz#strip_components=1
$ buf check lint https://github.com/org/private-repo/archive/master.zip#strip_components=1
$ buf check breaking --against https://github.com/org/private-repo.git#branch=master
$ buf check breaking --against https://github.com/org/private-repo.git#tag=v1.0.0

SSH

Public key authentication can be used for remote git repositories over ssh.

Git repositories are cloned via the git command, so by default, Buf will use your existing Git SSH configuration, including any identities added to ssh-agent.

The following environment variables can also be used:

  • BUF_INPUT_SSH_KEY_FILE - The path to the private key file.
  • BUF_INPUT_SSH_KNOWN_HOSTS_FILES - A colon-separated list of known hosts file paths.

Assuming one of these mechanisms is present, you can call Buf as you normally would:

$ buf check lint ssh://git@github.com/org/private-repo.git#branch=master
$ buf check lint ssh://git@github.com/org/private-repo.git#tag=v1.0.0
$ buf check breaking --against ssh://git@github.com/org/private-repo.git#branch=master
$ buf check breaking --against ssh://git@github.com/org/private-repo.git#tag=v1.0.0

Note that CI services such as CircleCI have a private key and known hosts file pre-installed, so this should work out of the box.

Input configuration

By default, buf will look for a configuration file for an Input in the following manner:

  • For dir, bin, and json Inputs, Buf will look at your current directory for a buf.yaml file.
  • For tar and zip Inputs, Buf will look at the root of the archive for a buf.yaml file after strip_components is applied.
  • For git Inputs, Buf will look for a buf.yaml file at the root of the cloned repository, at the head of the cloned branch.

The configuration can be overridden by command line flags. See the configuration documentation for more details.