📋 Key Takeaways
- JSON: human-readable, universal, but verbose and slow to parse — use for APIs and config
- Protocol Buffers (protobuf): binary, fast, strongly typed, compact — use for gRPC and internal communication
- Avro: binary with strong schema evolution, used in Kafka and data pipelines — the writer's schema travels with the data (embedded in Avro files, or referenced via a schema registry in Kafka)
- Parquet/ORC: columnar formats for analytics — excellent compression and query performance for OLAP
How Data Crosses Service Boundaries
Every piece of data sent between services, stored in queues, or written to disk must be serialized into bytes. The format you choose affects performance, storage cost, schema evolution capability, and developer experience.
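A minimal sketch of that boundary crossing, using JSON since it needs no extra tooling (the `order` payload is a made-up example):

```python
import json

# A payload as it exists in one service's memory
order = {"order_id": 1001, "status": "shipped", "items": ["book", "pen"]}

# Serialize: structured data -> bytes that can cross a network or land on disk
wire_bytes = json.dumps(order).encode("utf-8")

# Deserialize on the receiving side: bytes -> structured data again
received = json.loads(wire_bytes.decode("utf-8"))
assert received == order
```

Every format in the comparison below performs these same two steps; they differ in how many bytes `wire_bytes` takes, how fast the conversion runs, and what happens when the two sides disagree about the schema.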
Serialization Formats Compared
| Format | Type | Human-Readable | Speed | Schema Evolution | Best For |
|---|---|---|---|---|---|
| JSON | Text | Yes | Slow | Weak | REST APIs, config files |
| Protobuf | Binary | No | Very fast | Good (field numbering) | gRPC, internal services |
| Avro | Binary | No | Fast | Excellent (backward & forward) | Kafka, data pipelines |
| MessagePack | Binary | No | Fast | None | Light binary JSON replacement |
| Parquet | Columnar | No | Fast for analytics | Good | Data lakes, Spark, analytics |
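The size gap between text and binary formats in the table can be seen with a hand-rolled comparison. The fixed binary layout below is illustrative only, not a real wire format; like protobuf or Avro, it omits field names from the payload because both sides agree on a schema ahead of time:

```python
import json
import struct

record = {"user_id": 42, "score": 3.5, "active": True}

# Text encoding: keys, quotes, and punctuation all ride along on the wire
json_bytes = json.dumps(record).encode("utf-8")

# Schema-driven binary encoding: 4-byte int + 8-byte double + 1-byte bool,
# no keys at all -- the layout itself is the schema
binary_bytes = struct.pack("<id?", record["user_id"], record["score"], record["active"])

print(len(json_bytes), len(binary_bytes))  # the binary form is a fraction of the JSON size
```

This is also why schema evolution matters more for binary formats: a JSON reader can simply ignore an unknown key, while a binary reader needs explicit rules (protobuf's field numbers, Avro's reader/writer schema resolution) to skip or default fields safely.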
Advantages
- JSON is universal and debuggable
- Protobuf is typically 3–10× faster than JSON to serialize and parse
- Parquet dramatically reduces analytics storage costs
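Part of protobuf's speed and compactness comes from its varint integer encoding: 7 data bits per byte, with the high bit marking continuation, so small numbers take a single byte. A minimal sketch of that scheme (the `encode_varint` helper is illustrative, not a library API):

```python
def encode_varint(n: int) -> bytes:
    """Protobuf-style varint: 7 data bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)

assert encode_varint(1) == b"\x01"        # one byte, vs. JSON's digits-plus-key overhead
assert encode_varint(300) == b"\xac\x02"  # two bytes for values up to 16383
```

Decoding reverses the process: read bytes until one has its high bit clear, accumulating 7 bits at a time.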
Disadvantages
- Binary formats are not human-debuggable
- Schema evolution requires careful planning
- Multiple formats in one system add complexity
🧪 Test Your Understanding
Knowledge Check
Which format is best for Kafka event streaming with evolving schemas?