What are Images?
Throughout the documentation, you will see many references to Images. We'll go over what Images are, how they are used, and the various options associated with them here.
Protobuf plugins: how they work
First we need to provide a short overview of how plugins work.
When you invoke a command such as protoc -I . --go_out=gen/go foo.proto, the following is (roughly) what happens:
- protoc compiles the file foo.proto (and any imports) and internally produces a FileDescriptorSet, which is just a list of FileDescriptorProto messages. These messages contain all information about your .proto files, optionally including source code information such as the start/end line/column of each element of your .proto file, as well as associated comments.
- The FileDescriptorSet is turned into a CodeGeneratorRequest, which contains the FileDescriptorProtos that protoc produced for foo.proto and any imports, a list of the files specified (just foo.proto in this example), as well as any options provided after the = sign of --go_out or with --go_opt.
- protoc then looks for a binary named protoc-gen-go, and invokes it, giving the serialized CodeGeneratorRequest as stdin.
- protoc-gen-go runs, and either errors or produces a CodeGeneratorResponse, which specifies what files are to be generated and their content. The serialized CodeGeneratorResponse is written to stdout of protoc-gen-go.
- On success of protoc-gen-go, protoc reads stdout and then writes these generated files.
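The plugin lookup in the steps above can be sketched in plain shell. This is a simplified illustration, not protoc's actual implementation: plugin_binary_for_flag is a hypothetical helper, gen/go is a hypothetical output directory, and only the naming convention is modeled (protoc also accepts an explicit --plugin flag, which is ignored here).

```shell
# Hypothetical helper: map a protoc output flag to the plugin binary name
# that protoc would search for on PATH (e.g. --go_out=... -> protoc-gen-go).
plugin_binary_for_flag() {
  flag="${1#--}"          # strip the leading "--": go_out=gen/go
  name="${flag%%_out*}"   # drop "_out" and everything after it: go
  printf 'protoc-gen-%s\n' "$name"
}

# Usage: check whether the plugin for --go_out is installed.
plugin="$(plugin_binary_for_flag --go_out=gen/go)"
if command -v "$plugin" >/dev/null 2>&1; then
  echo "$plugin found on PATH"
else
  echo "$plugin not found on PATH"
fi
```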
The builtin generators to protoc, i.e. --java_out, --cpp_out, etc., work in roughly the same manner, although instead of executing an external binary, this is done internally to protoc.
FileDescriptorSets are the primitive used throughout the Protobuf ecosystem to represent a compiled Protobuf schema. They are also the primary artifact that protoc produces.
That is to say that everything you do with protoc, and any plugins you use, talks in terms of FileDescriptorSets. Of note, they are how gRPC Reflection works under the hood as well.
How do I create FileDescriptorSets with protoc?
protoc provides the --descriptor_set_out flag, aliased as -o, to allow writing serialized FileDescriptorSets. For example, given a single file foo.proto, you can write a FileDescriptorSet to stdout as follows:
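A minimal sketch of that invocation, assuming protoc is on PATH, foo.proto is in the current directory, and /dev/stdout is available (as on Linux and macOS). The wrapper name descriptor_set_to_stdout and the output file foo.desc are hypothetical, used only so the file list is a parameter:

```shell
# Hypothetical helper: compile the given .proto files and write the
# serialized FileDescriptorSet to stdout via protoc's -o flag.
descriptor_set_to_stdout() {
  protoc -I . -o /dev/stdout "$@"
}

# Usage (the output is binary, so redirect it rather than printing it):
#   descriptor_set_to_stdout foo.proto > foo.desc
```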
The resulting FileDescriptorSet will contain a single FileDescriptorProto with name foo.proto.
By default, FileDescriptorSets will not include any imports not specified on the command line,
and will not include source code information. Source code information is useful for generating
documentation inside your generated stubs, and for things like linters and breaking change
detectors. As an example, assume foo.proto imports bar.proto. To produce a FileDescriptorSet that includes both foo.proto and bar.proto, as well as source code information:
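One way to do this is with protoc's --include_imports and --include_source_info flags. The sketch below assumes protoc is on PATH, both files are in the current directory, and /dev/stdout is available; full_descriptor_set_to_stdout and the output file name are hypothetical.

```shell
# Hypothetical helper: write a FileDescriptorSet that also contains
# imported files and source code info (comments, line/column spans).
full_descriptor_set_to_stdout() {
  protoc -I . --include_imports --include_source_info -o /dev/stdout "$@"
}

# Usage: only foo.proto is named; bar.proto is pulled in as an import.
#   full_descriptor_set_to_stdout foo.proto > foo_with_imports.desc
```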
What are Images then?
An Image is Buf's custom extension to FileDescriptorSets. As of this writing, the actual definition is stored in bufbuild/buf.
Images are FileDescriptorSets, and FileDescriptorSets are Images. Due to the forwards and backwards compatible nature of Protobuf, we're able to add an additional field to FileDescriptorSet while maintaining compatibility in both directions - existing Protobuf plugins will just drop this field, and Buf does not require this field to be set to work with Images.
Images are the primitive of Buf. As a result, FileDescriptorSets are also the primitive of Buf.
Linting and breaking change detection internally operate on Images that Buf either produces on the fly, or reads from an external location. They represent a stable, widely-used method to represent a compiled Protobuf schema. For the breaking change detector, Images are the storage format used if you want to manually store the state of your Protobuf schema. See the breaking change documentation for more details.
We use the ImageExtension of an Image to store additional information that is useful to Buf to perform its operations. Currently, the only additional information stored is the indexes within the file array of the FileDescriptorProtos that are imports.
Right now, the only possible imports are the Well-Known Types. All other files are specified through your build configuration, but it is always possible to include the Well-Known Types in your .proto files with Buf, and is usually possible to include the Well-Known Types with protoc in a standard installation. It's widely accepted that a Protobuf compiler should always provide these.
Currently, we use this information in the linter and breaking change detector. For the linter, we do not want to lint imports - they are not part of your Protobuf schema that you care about for linting. The linter filters any imports before running the lint rules. If the ImageExtension field is not present, Buf cannot deduce what FileDescriptorProtos are imports, and lints everything.
For the breaking change detector, we check imports by default; however, you can exclude imports with the --exclude-imports flag. As with the linter, if the ImageExtension field is not present, Buf does not know what an import is, so --exclude-imports is a no-op.
Creating images
Images are created using buf build. Given that you are in the root of your repository, and you have a proper configuration:
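A minimal sketch of such a build, assuming buf is on PATH and the current directory holds a valid Buf configuration. The wrapper build_image is a hypothetical helper; the command it runs is buf build with the -o output flag:

```shell
# Hypothetical helper: build the current module into an Image.
# The output path defaults to image.bin.
build_image() {
  buf build -o "${1:-image.bin}"
}

# Usage:
#   build_image              # writes image.bin
```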
The resulting image is written to the file image.bin. Of note, the ordering of the FileDescriptorProtos is carefully written to mimic the ordering that protoc would produce, for both the cases where imports are and are not written.
By default, Buf produces an Image with both imports and source code info. You can strip each of these:
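A sketch of stripping both at once, assuming buf is on PATH and using the --exclude-imports and --exclude-source-info flags of buf build. Either flag can also be passed on its own; build_stripped_image is a hypothetical helper name.

```shell
# Hypothetical helper: build an Image without imports and without
# source code info. The output path defaults to image.bin.
build_stripped_image() {
  buf build --exclude-imports --exclude-source-info -o "${1:-image.bin}"
}

# Usage:
#   build_stripped_image small.bin
```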
In general, we do not recommend stripping these, as this information can be useful for various operations. However, source code info takes up a lot of additional space, generally in the region of 5x as much, so if you know you do not need this data, stripping it can be useful.
Images can be output in one of two formats:
- Binary
- JSON
Either format can be compressed using Gzip or Zstandard.
Per the Inputs documentation, buf build can deduce the format by the file extension:
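The mapping from extension to format can be pictured with a small shell function. This is an illustration only and not Buf's implementation: image_format_for is a hypothetical helper, and the extension list (.bin, .json, with optional .gz or .zst) is an assumption based on the formats and compression schemes named above. The Inputs documentation remains the authoritative list.

```shell
# Illustration only: map an output path to the format and compression
# that its extension suggests. Compound extensions are matched first.
image_format_for() {
  case "$1" in
    *.bin.gz)   echo "binary gzip" ;;
    *.bin.zst)  echo "binary zstd" ;;
    *.json.gz)  echo "json gzip" ;;
    *.json.zst) echo "json zstd" ;;
    *.bin)      echo "binary none" ;;
    *.json)     echo "json none" ;;
    *)          echo "unknown" ;;
  esac
}

# Usage: the same paths would be passed to buf build's -o flag, e.g.
#   buf build -o image.json.gz
image_format_for image.json.gz
```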
The special value - is used to denote stdout. You can manually set the format. For example:
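A sketch of writing a JSON Image to stdout, assuming buf is on PATH. The format is set by appending #format=... to the output value; image_to_stdout is a hypothetical helper that takes the format as its argument.

```shell
# Hypothetical helper: write the Image to stdout in the given format.
# "-" means stdout; "#format=..." sets the format explicitly, since
# there is no file extension to deduce it from.
image_to_stdout() {
  buf build -o "-#format=${1:-json}"
}

# Usage:
#   image_to_stdout json
```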
When combined with jq, this also allows for introspection. For example, to see a list of all packages:
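A sketch of that query, assuming buf and jq are on PATH. It relies on the JSON Image having a top-level file array whose entries carry a package field, as in FileDescriptorSet; list_packages is a hypothetical helper name.

```shell
# Hypothetical helper: read a JSON-encoded Image on stdin and print each
# distinct package name once. Files without a package are skipped.
list_packages() {
  jq -r '.file[] | .package // empty' | sort -u
}

# Usage:
#   buf build -o '-#format=json' | list_packages
```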
Images always include the ImageExtension field. However, if you want a pure FileDescriptorSet without this field set, to mimic protoc entirely:
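A sketch using the --as-file-descriptor-set flag of buf build, assuming buf is on PATH; build_file_descriptor_set is a hypothetical helper name.

```shell
# Hypothetical helper: build an Image but omit the ImageExtension field,
# so the output is a plain FileDescriptorSet. Defaults to image.bin.
build_file_descriptor_set() {
  buf build -o "${1:-image.bin}" --as-file-descriptor-set
}

# Usage:
#   build_file_descriptor_set
```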
The ImageExtension field will not affect Protobuf plugins or any other operations; they will merely see it as an unknown field. However, we provide the option in case you want it.
Using protoc output as Buf input
Since Buf's primitive is the Image, and FileDescriptorSets are Images, we're able to easily allow protoc output to be buf input. As an example for lint:
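A sketch of piping protoc output into buf lint, assuming protoc and buf are on PATH and /dev/stdout is available. Source code info is included because linters use it, as noted earlier; lint_protoc_output is a hypothetical helper name.

```shell
# Hypothetical helper: compile the given files with protoc, write the
# FileDescriptorSet to stdout, and have buf lint read it from stdin ("-").
lint_protoc_output() {
  protoc -I . --include_source_info -o /dev/stdout "$@" | buf lint -
}

# Usage:
#   lint_protoc_output foo.proto
```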
We discuss this further in the relevant sections of our documentation.
Protoc lint and breaking change detection plugins
Since Buf talks in terms of FileDescriptorSets, it's trivial for us to provide the Protobuf plugins protoc-gen-buf-lint and protoc-gen-buf-breaking as well.