Data Storage Formats
This page collects resources on storing complex, hierarchical data (often in binary form) to file.
Alternatively, taking an excerpt from bsdf.io:
[...] data specification for serializing (scientific) data, for the purpose of storage and (inter process) communication.
Issue
- Many file formats are designed to store complex hierarchical data in a "human-readable" format.
- The term is a misnomer: "human-readable" really only means that these files can be read and edited with a simple text editor.
- In reality, these formats are error-prone and difficult for humans to edit, or even to understand.
- As dataset complexity increases, editing with a simple text editor makes errors all but inevitable.
- What is really needed to ensure "human-readability" is a simple, flexible application to view/edit files.
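The fragility described above can be illustrated with a small sketch. It uses only Python's standard `json` module; the dataset and the `try_parse` helper are invented for illustration and are not taken from any of the tools listed on this page. A single stray comma, the kind of slip that is easy to make in a text editor, is enough to make the whole file unreadable to a parser.

```python
import json

# Hypothetical dataset, invented for this illustration.
valid_text = """{
  "experiment": {
    "name": "run-01",
    "channels": [
      {"id": 0, "gain": 1.5},
      {"id": 1, "gain": 2.0}
    ]
  }
}"""

# Simulate a typical hand edit: a trailing comma is left after the last
# list element, which the JSON grammar does not allow.
edited_text = valid_text.replace(
    '{"id": 1, "gain": 2.0}',
    '{"id": 1, "gain": 2.0},',
)


def try_parse(text):
    """Return (data, None) on success, or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"


data, error = try_parse(valid_text)
broken, broken_error = try_parse(edited_text)

print("original parses:", error is None)
print("edited file error:", broken_error)
```

A dedicated viewer/editor would typically prevent this class of error by never exposing the raw syntax to the user.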
Goal
Since a dedicated application is needed either way, a data storage format might as well provide the following components:
- A solid, flexible, hierarchical binary format.
- A simple-to-use editor/reader.
Basic info
- ✨: Recommended by MA
- 🤔: Looks promising to MA
- Comparison of data-serialization formats (Wikipedia) ↪
Text-based formats
- CSV: (Not typically hierarchical)
- JSON
- YAML
- XML
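As a rough illustration of why CSV is marked "not typically hierarchical", the sketch below stores the same nested record as JSON and as CSV, using only the Python standard library. The record and the `flatten` helper (with its dotted-key naming) are assumptions made for this example; they are one possible convention, not a standard.

```python
import csv
import io
import json

# Hypothetical nested record, invented for this illustration.
record = {
    "sample": "A1",
    "position": {"x": 1.0, "y": 2.5},
    "readings": [0.1, 0.2, 0.3],
}

# JSON represents the nesting directly and round-trips it unchanged.
json_text = json.dumps(record, indent=2)
restored = json.loads(json_text)


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a single {dotted_key: leaf} mapping."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


# CSV has only rows and columns, so the hierarchy must be encoded in the
# column names (here: "position.x", "readings.0", ...).
flat = flatten(record)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

Note that reading the CSV back yields strings in flat columns; rebuilding the original structure and types requires knowing the flattening convention in advance.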
Binary formats
- HDF5 ↪: ✨ Hierarchical Data Format.
  - Source: The HDF Group; originally developed at the National Center for Supercomputing Applications.
  - HDFView ↪: Editor.
- NetCDF ↪: ✨ Network Common Data Form.
  - Source: University Corporation for Atmospheric Research.
  - (MA) HDF5 appears to be a better alternative.
- ASDF ↪: Advanced Scientific Data Format, pronounced AZ-diff.
  - Source: Space Telescope Science Institute.
- BSON ↪: [bee · sahn], short for Binary JSON.
- BSDF ↪: Strives to be simple, compact and fast.
  - Comparison with other solutions.
  - Source: Almar Klein.
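To give a feel for why binary formats can be compact and exact, here is a minimal sketch that stores a short list of floats both as JSON text and in a toy binary layout. The layout (a little-endian 32-bit count followed by 64-bit floats) and the `pack_floats`/`unpack_floats` helpers are assumptions made up for this example; they are not the on-disk format of HDF5, NetCDF, ASDF, BSON or BSDF, all of which are considerably richer.

```python
import json
import struct

# Hypothetical measurement values, invented for this illustration.
values = [0.1, 1 / 3, 2.718281828459045, 1e-12]

# Text encoding: each number is written out as decimal digits.
text_bytes = json.dumps(values).encode("utf-8")


def pack_floats(vals):
    """Toy layout: little-endian uint32 count, then that many float64 values."""
    return struct.pack(f"<I{len(vals)}d", len(vals), *vals)


def unpack_floats(blob):
    """Inverse of pack_floats: read the count, then the float64 values."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}d", blob, 4))


binary_bytes = pack_floats(values)
round_trip = unpack_floats(binary_bytes)

print("text size  :", len(text_bytes), "bytes")
print("binary size:", len(binary_bytes), "bytes")
print("exact round trip:", round_trip == values)
```

The binary blob is not readable in a text editor, which is exactly why the formats above are best paired with a viewer/editor such as the one listed for HDF5.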