Images
Throughout the documentation, you may occasionally see references to Buf images. We'll go over what images are, how they are used, and the various options associated with them here.
#
How Protobuf plugins workFirst we need to provide a short overview of how Protobuf plugins work.
When you invoke this command...
$ protoc -I . --go_out=gen/go foo.proto
...here's (roughly) what happens:
protoc
compiles the filefoo.proto
(and any imports) and internally produces aFileDescriptorSet
, which is a list ofFileDescriptorProto
messages. These messages contain all information about your.proto
files, including optional source code information such as the start/end line/column of each element of your.proto
file, as well as associated comments.- The
FileDescriptorSet
is turned into aCodeGeneratorRequest
, which contains theFileDescriptorProto
s thatprotoc
produced forfoo.proto
and any imports, a list of the files specified (justfoo.proto
in this example), as well as any options provided after the=
sign of--go_out
or with--go_opt
. protoc
then looks for a binary namedprotoc-gen-go
, and invokes it, giving the serialized CodeGeneratorRequest as stdin.protoc-gen-go
runs, and either errors or produces aCodeGeneratorResponse
, which specifies what files are to be generated and their content. The serialized CodeGeneratorResponse is written to stdout ofprotoc-gen-go
.- On success of
protoc-gen-go
,protoc
reads stdout and then writes these generated files.
The built-in generators to protoc
, such as --java_out
, --cpp_out
, etc.,
work in roughly the same manner, although instead of executing an external
binary, this is done internally to protoc
.
FileDescriptorSet
s are the core primitive used throughout the Protobuf
ecosystem to represent a compiled Protobuf schema. They are also the primary
artifact that protoc produces.
That is to say that everything you do with protoc
, and any plugins you use,
talk in terms of FileDescriptorSet
s.
gRPC Reflection
uses them under the hood as well.
FileDescriptorSet
s with protoc#
Creating protoc
provides the --descriptor_set_out
flag, aliased as -o
, to allow
writing serialized FileDescriptorSet
s. For example, given a single file
foo.proto
, you can write a FileDescriptorSet
to stdout like this:
$ protoc -I . -o /dev/stdout foo.proto
The resulting FileDescriptorSet
contains a single FileDescriptorProto
with
name foo.proto
.
By default, FileDescriptorSet
s don't include any imports not specified on the
command line, and don't include source code information. Source code information
is useful for generating documentation inside your generated stubs, and for
things like linters and breaking change detectors. As an example, assume
foo.proto
imports bar.proto
. To produce a FileDescriptorSet
that includes
both foo.proto
and bar.proto
, as well as source code information:
$ protoc -I . --include_imports --include_source_info -o /dev/stdout foo.proto
#
What are Buf images?An image is Buf's custom extension to FileDescriptorSet
s. The actual
definition is currently stored in the
bufbuild/buf
repo as of this writing.
Buf images are FileDescriptorSet
s, and FileDescriptorSet
s are images.
Due to the forwards- and backwards-compatible nature of Protobuf, we add an
additional field to FileDescriptorSet
while maintaining compatibility in both
directions - existing Protobuf plugins drop this field, and buf
does not
require this field to be set to work with images.
Modules are the primitive of Buf, and Buf
images represent the compiled artifact of a module. In fact, images contain
information about the module used to create it, which powers a variety of
BSR features. For clarity, the Image
Protobuf definition is shown below (notice the ModuleName
in the
ImageFileExtension
):
// A Buf image is an extended FileDescriptorSet.//// See https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.protomessage Image { repeated ImageFile file = 1;}
// ImageFile is an extended FileDescriptorProto.//// Since FileDescriptorProto does not have extensions, we copy the fields from// FileDescriptorProto, and then add our own extensions via the// buf_image_file_extension field. This is compatible with a// FileDescriptorProto.//// See https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.protomessage ImageFile { optional string name = 1; optional string package = 2; repeated string dependency = 3; repeated int32 public_dependency = 10; repeated int32 weak_dependency = 11; repeated google.protobuf.DescriptorProto message_type = 4; repeated google.protobuf.EnumDescriptorProto enum_type = 5; repeated google.protobuf.ServiceDescriptorProto service = 6; repeated google.protobuf.FieldDescriptorProto extension = 7; optional google.protobuf.FileOptions options = 8; optional google.protobuf.SourceCodeInfo source_code_info = 9; optional string syntax = 12;
// buf_extension contains buf-specific extensions to FileDescriptorProtos. // // The prefixed name and high tag value is used to all but guarantee there // is never a conflict with Google's FileDescriptorProto definition. // The definition of a FileDescriptorProto has not changed in years, so // we're not too worried about a conflict here. optional ImageFileExtension buf_extension = 8042;}
// ImageFileExtension contains extensions to ImageFiles.//// The fields are not included directly on the ImageFile so that we can both// detect if extensions exist, which signifies this was created by buf and not// by protoc, and so that we can add fields in a freeform manner without// worrying about conflicts with FileDescriptorProto.message ImageFileExtension { // is_import denotes whether this file is considered an "import". // // An import is a file which was not derived from the local source files. // There are two cases where this could be true: // // 1. A Well-Known Type included from the compiler. // 2. A file that was included from a Buf module dependency. // // We use "import" as this matches with the protoc concept of // --include_imports, however import is a bit of an overloaded term. // // This is always set. optional bool is_import = 1; // ModuleInfo contains information about the Buf module this file belongs to. // // This field is optional and is not set if the module is not known. optional ModuleInfo module_info = 2; // is_syntax_unspecified denotes whether the file did not have a syntax // explicitly specified. // // Per the FileDescriptorProto spec, it would be fine in this case to just // leave the syntax field unset to denote this and to set the syntax field // to "proto2" if it is specified. But protoc does not set the syntax // field if it was "proto2", and plugins may (incorrectly) depend on this. // We also want to maintain consistency with protoc as much as possible. // So instead, we have this field which denotes whether syntax was not // specified. // // This is always be set. optional bool is_syntax_unspecified = 3; // unused_dependency are the indexes within the dependency field on // FileDescriptorProto for those dependencies that are not used. // // This matches the shape of the public_dependency and weak_dependency // fields. repeated int32 unused_dependency = 4;}
// ModuleInfo contains information about a Buf module that an ImageFile// belongs to.message ModuleInfo { // name is the name of the Buf module. // // This is always set. optional ModuleName name = 1; // commit is the repository commit. // // This field is optional and not set if the commit is not known. optional string commit = 2;}
// ModuleName is a module name.//// All fields are always set.message ModuleName { optional string remote = 1; optional string owner = 2; optional string repository = 3;}
#
Linting and breaking change detectionLinting and breaking change detection internally operate on Buf images that the
buf
CLI either produces on the fly or reads from an external location. They
represent a stable, widely used method to represent a compiled Protobuf schema.
For the breaking change detector, images are the storage format used if you want
to manually store the state of your Protobuf schema. See the
input documentation for more details.
#
Creating imagesYou can create Buf images using buf build
. If the current directory contains a
valid buf.yaml
, you can building an image
with this command:
$ buf build -o image.bin
The resulting Buf image is written to the image.bin
file. Of note, the
ordering of the FileDescriptorProto
s is carefully written to mimic the
ordering that protoc
would produce, for both the cases where imports are and
are not written.
By default, buf
produces a Buf image with both
imports and source code info. You can strip each of these:
$ buf build --exclude-imports --exclude-source-info -o image.bin
In general, we do not recommend stripping these, as this information can be useful for various operations. Source code info, however, takes up a lot of additional space (generally ~5x more space), so if you know you do not need this data, it can be useful to strip source code info.
Images can be outputted in one of two formats:
- Binary
- JSON
Either format can be compressed using Gzip or Zstandard.
Per the Buf input documentation, buf build
can deduce the format
from the file extension:
$ buf build -o image.bin$ buf build -o image.bin.gz$ buf build -o image.bin.zst$ buf build -o image.json$ buf build -o image.json.gz$ buf build -o image.json.zst
The special value -
is used to denote stdout. You can manually set the format.
For example:
$ buf build -o -#format=json
You can combine this with jq to introspect the built image. To see a list of all packages:
$ buf build -o -#format=json | jq '.file[] | .package' | sort | uniq | head
Output"google.actions.type""google.ads.admob.v1""google.ads.googleads.v1.common""google.ads.googleads.v1.enums""google.ads.googleads.v1.errors""google.ads.googleads.v1.resources""google.ads.googleads.v1.services""google.ads.googleads.v2.common""google.ads.googleads.v2.enums""google.ads.googleads.v2.errors"
Images always include the ImageFileExtension
field. If you want a pure
FileDescriptorSet
without this field set, to mimic protoc
entirely:
$ buf build -o image.bin --as-file-descriptor-set
The ImageFileExtension
field doesn't affect Protobuf plugins or any other
operations; they merely see this as an unknown field. But we provide this option
in case you need it.
buf
input#
Using protoc output as Since buf
speaks in terms of Buf images and
[FileDescriptorSet
][filescriptorset]s are images, we can useprotoc
output as
buf
input. Here's an example for buf lint
:
$ protoc -I . --include_source_info -o /dev/stdout foo.proto | buf lint -
#
Protoc lint and breaking change detection pluginsSince buf
"understands" FileDescriptorSet
s, we can
provide plugins protoc-gen-buf-lint
and
protoc-gen-buf-breaking
as standard
Protobuf plugins as well.