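Protocol Buffers provide a language-neutral, platform-neutral, extensible mechanism for serializing structured data in a forward-compatible and backward-compatible way. They are similar to JSON, but smaller and faster.
Data structures are defined in .proto files, from which the compiler generates interface code. Protocol Buffers also comprise language-specific runtime libraries and a serialization format for the data.
The serialization format targets packets of structured data up to a few megabytes in size, and is suitable for both network transmission and long-term data storage. The format can absorb changes without requiring existing code to be modified.
The programmer only needs to write a .proto file: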
message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}
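From the .proto file, the compiler can generate code in various languages, including methods for field access, serialization, and deserialization.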
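// Requires: import java.io.FileOutputStream;
// Build a Person with the generated builder API.
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Serialize the message to the file named by the first command-line argument.
FileOutputStream output = new FileOutputStream(args[0]);
john.writeTo(output);
output.close();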
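Because Protocol Buffers can be used for persistent storage, backward compatibility is crucial. Protocol Buffers allow fields to be changed, added, and removed without affecting existing services; the details are covered later.
Protocol Buffers can be used for the following:
They are typically used to define communication protocols (together with gRPC) and for data storage.
Advantages:
Data can be serialized and deserialized from different languages.
A single .proto file can be shared by multiple projects, so it can serve as the interface definition between projects.
Because the definition is shared across projects, both forward and backward compatibility must be considered.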
optional int32 result_per_page = 3 [default = 10];
.proto Type | Notes | C++ Type | Java Type | Python Type[2] | Go Type |
---|---|---|---|---|---|
double | | double | double | float | *float64 |
float | | float | float | float | *float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | *int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] | *int64 |
uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] | *uint32 |
uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] | *uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | *int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] | *int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1] | int/long[3] | *uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1] | int/long[3] | *uint64 |
sfixed32 | Always four bytes. | int32 | int | int | *int32 |
sfixed64 | Always eight bytes. | int64 | long | int/long[3] | *int64 |
bool | | bool | boolean | bool | *bool |
string | A string must always contain UTF-8 encoded text. | string | String | unicode (Python 2) or str (Python 3) | *string |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | bytes | []byte |
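The notes in the table about variable-length encoding can be made concrete with a small sketch. The code below is a hand-written illustration of the varint and ZigZag schemes, not the protobuf library itself; the function names are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: a negative value is sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag maps small magnitudes of either sign to small unsigned values."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


if __name__ == "__main__":
    for v in (1, -1, -64):
        print(v, len(encode_int32(v)), len(encode_sint32(v)))
```

With this sketch, -1 takes ten bytes as an int32 but one byte as a sint32. Likewise, a uint32 value of 2^28 needs five varint bytes, which is why fixed32 (always four bytes) wins for large values.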
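syntax = "proto3";
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
The first non-empty, non-comment line of the file specifies the proto version; if it is missing, the file is parsed as proto2.
Each field is assigned a field number, which uniquely identifies the field in the binary format. Field numbers should not be changed or reused, because doing so breaks compatibility.
To avoid this problem when multiple systems interact, mark the number or name of a deleted field as reserved:
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
Field numbers 1 to 15 take one byte to encode (the byte also carries the field's wire type), while field numbers 16 to 2047 take two bytes. Frequently used fields should therefore be given numbers in the range 1 to 15.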
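The one-byte versus two-byte boundary follows from how a field's tag is built: the field number is shifted left by three bits, combined with the wire type, and written as a varint. The sketch below is a standalone illustration of that rule; `encode_tag` is a name chosen for this example, not a library function.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


if __name__ == "__main__":
    for number in (1, 15, 16, 2047, 2048):
        print(number, len(encode_tag(number, 0)))
```

Only seven payload bits fit in one varint byte, and three of them are taken by the wire type, which leaves four bits for the field number (1 to 15).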
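Multiple related messages can be defined in a single .proto file.
/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */
message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}
For Go, the compiler generates a .pb.go file.
The default value of an enum is its first defined value, which must be 0.
The default value of a repeated field is an empty list.
In practice, take care to distinguish a default value from an explicitly set value. For example, a boolean that is false may have been explicitly set to false, or it may be the default because the field was never provided. Wrapper types can be used in this case: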
import "google/protobuf/wrappers.proto";
google.protobuf.Int32Value status = 2;
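Why a wrapper helps can be seen on the wire. The following sketch hand-encodes a plain proto3 bool and a field of type google.protobuf.BoolValue (the boolean counterpart of the Int32Value shown above). It is an illustration under the assumption of proto3 fields without explicit presence; the helper names are invented for this example.

```python
from typing import Optional

WIRE_VARINT = 0
WIRE_LEN = 2


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_plain_bool(field_number: int, value: bool) -> bytes:
    """Plain proto3 bool: the default (False) is not written at all."""
    if not value:
        return b""
    return encode_varint((field_number << 3) | WIRE_VARINT) + b"\x01"


def encode_wrapped_bool(field_number: int, value: Optional[bool]) -> bytes:
    """BoolValue field: None means unset; a set wrapper is always written."""
    if value is None:
        return b""
    inner = encode_plain_bool(1, value)  # BoolValue declares `bool value = 1;`
    header = encode_varint((field_number << 3) | WIRE_LEN)
    return header + encode_varint(len(inner)) + inner


if __name__ == "__main__":
    print(encode_plain_bool(2, False))    # same bytes as an unset field
    print(encode_wrapped_bool(2, None))   # unset
    print(encode_wrapped_bool(2, False))  # set to False, still present
```

A plain bool set to false and an unset bool both serialize to nothing, so the reader cannot tell them apart. The wrapper is a nested message, and an empty nested message is still written, so "set to false" stays distinguishable from "not set".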
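message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}
The first enum value must be 0, so that it can serve as the default value.
Duplicate values (aliases) must be declared explicitly with allow_alias; otherwise compilation fails.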
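message MyMessage1 {
  enum EnumAllowingAlias {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
  }
}
Unrecognized enum values are preserved in the message when it is deserialized, and they are still written out when the message is serialized.
Removing an enum value also causes compatibility problems. As with fields, the value can be reserved to prevent it from being reused.
enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}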
import "myproject/other_protos.proto";
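message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}
To use the nested type outside its parent message:
message SomeOtherMessage {
  SearchResponse.Result result = 1;
}
When parsing, Protocol Buffers treat certain types as interchangeable on the wire, so some type changes remain compatible. For example, code that declares a field as string can read data written as bytes, as long as the bytes are valid UTF-8. Code that declares a field as int32 can read data written as int64; the value is truncated to 32 bits.
This mainly illustrates how compatibility works; relying on it deliberately is not recommended.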
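The two conversions mentioned above can be sketched in a few lines. This is an illustration of the effect, not the library's parsing code; both function names are hypothetical.

```python
def read_as_int32(wire_value: int) -> int:
    """Interpret a decoded 64-bit varint as int32: keep the low 32 bits, signed."""
    low = wire_value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def bytes_readable_as_string(payload: bytes) -> bool:
    """A bytes payload can be read as a string field only if it is valid UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(read_as_int32(2**32 + 5))              # high bits are dropped
    print(bytes_readable_as_string(b"\xff\xfe"))  # not valid UTF-8
```

The truncation is silent: a value such as 2^32 + 5 comes back as 5, which is one reason to avoid depending on this behavior.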
When old code parses a binary produced by new code, the new fields become unknown fields.
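A minimal sketch of how that can work: a parser walks the tag/value pairs, maps the field numbers it knows to names, and keeps the raw bytes of everything else. The code assumes only varint and length-delimited fields, and the extra field number 4 in the sample data is invented for illustration.

```python
def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(data: bytes, known_fields: dict):
    """known_fields maps field number -> name, as the old schema sees it."""
    known, unknown = {}, []
    pos = 0
    while pos < len(data):
        start = pos
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known_fields:
            known[known_fields[field_number]] = value
        else:
            unknown.append(data[start:pos])  # keep the raw bytes untouched
    return known, unknown


if __name__ == "__main__":
    # query = "hi", page_number = 3, plus a hypothetical new field 4 = 7
    new_binary = b"\x0a\x02hi\x10\x03\x20\x07"
    old_schema = {1: "query", 2: "page_number"}
    print(parse(new_binary, old_schema))
```

Because the unknown bytes are kept verbatim, an old program can append them when it re-serializes the message, so the newer fields survive a round trip through older code.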
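import "google/protobuf/any.proto";
message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}
message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}
If checking a oneof returns None/NOT_SET, there is no way to tell whether no value was set or whether a compatibility issue caused it (for example, a field set by a different version of the oneof).
map<string, Project> projects = 3;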
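package foo.bar;
message Open { ... }
message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}
service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}
When JSON is converted to Protocol Buffers, a missing or null field becomes the default value. When Protocol Buffers are converted to JSON, fields that hold the default value are omitted by default, though this is configurable.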
proto3 | JSON | JSON example | Notes |
---|---|---|---|
message | object | {"fooBar": v, "g": null, …} | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and treated as the default value of the corresponding field type. |
enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
map<K,V> | object | {"k": v, …} | All keys are converted to strings. |
repeated V | array | [v, …] | null is accepted as the empty list []. |
bool | true, false | true, false | |
string | string | "Hello World!" | |
bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted. |
int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. |
float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. -0 is considered equivalent to 0. |
Any | object | {"@type": "url", "f": v, … } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type. |
Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
Struct | object | { … } | Any JSON object. See struct.proto. |
Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, … | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer. |
FieldMask | string | "f.fooBar,h" | See field_mask.proto. |
ListValue | array | [foo, bar, …] | |
Value | value | | Any JSON value. Check google.protobuf.Value for details. |
NullValue | null | | JSON null |
Empty | object | {} | An empty JSON object |
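A few of the mapping rules in the table can be sketched with the standard library alone. The code below is a simplified illustration, not the protobuf JSON implementation: `to_json`, `from_json`, and the `int64_fields` and `emit_defaults` parameters are names assumed for this example, and only flat messages are handled.

```python
import base64
import json


def to_lower_camel(name: str) -> str:
    """result_per_page -> resultPerPage"""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def to_json(fields: dict, int64_fields=frozenset(), emit_defaults=False) -> str:
    """Serialize a flat field dict following a subset of the proto3 JSON rules."""
    out = {}
    for name, value in fields.items():
        if not emit_defaults and not value:
            continue  # default values are omitted unless configured otherwise
        if isinstance(value, bytes):
            value = base64.b64encode(value).decode("ascii")
        elif name in int64_fields:
            value = str(value)  # 64-bit integers are written as decimal strings
        out[to_lower_camel(name)] = value
    return json.dumps(out, sort_keys=True)


def from_json(text: str, defaults: dict) -> dict:
    """Parse JSON; missing or null fields fall back to the field's default."""
    parsed = json.loads(text)
    result = {}
    for name, default in defaults.items():
        # Parsers accept both the lowerCamelCase key and the original name.
        value = parsed.get(to_lower_camel(name), parsed.get(name))
        result[name] = default if value is None else value
    return result


if __name__ == "__main__":
    print(to_json({"query": "x", "page_number": 0, "result_per_page": 10}))
    print(from_json('{"query": null, "pageNumber": 2}',
                    {"query": "", "page_number": 0, "result_per_page": 0}))
```

In this sketch `page_number` disappears from the output because it holds the default 0; passing `emit_defaults=True` keeps it, which mirrors the configurable behavior described above.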
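Options exist at different levels: file-level, message-level, and field-level, as well as options on enum types, enum values, oneof fields, service types, and service methods.
The level of an option determines where it is written.
Custom options can also be defined.
go_package: specifies the import path of the generated Go package; the last path component is used as the package name.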
option go_package = "github.com/protocolbuffers/protobuf/examples/go/tutorialpb";