Delta Lake (Parquet) vs JSON Formats for Storage

A short summary of my observations on the Parquet and JSON file formats for storage on any platform.

Introduction

Fast storage and retrieval of data are vital for maintaining a competitive edge, enhancing user experience, and facilitating efficient decision-making, especially in a fast-paced digital environment where responsiveness and scalability are paramount.

These are some of my observations from working with these file formats.

Why JSON files?

JavaScript Object Notation (JSON) is a file format that uses human-readable text to store and transmit data objects.

  • Easy to use: JSON objects can be easily created and used.

  • Widely accepted: JSON is used by almost all services and supported by most programming languages.

  • Lower complexity: JSON files are plain text with no binary encoding to manage, so they are simpler to work with than Parquet files.
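As a small illustration of the points above, here is a minimal sketch using Python's built-in `json` module. The records are made up for the example; the point is only that the data round-trips through readable text.

```python
import json

# Hypothetical records, used only for illustration.
records = [
    {"id": 1, "name": "alice", "score": 91.5},
    {"id": 2, "name": "bob", "score": 78.0},
]

# Serialize to human-readable text; note that every record repeats its key names.
text = json.dumps(records, indent=2)
print(text)

# Parse the text back into Python objects.
restored = json.loads(text)
```

The repeated key names are part of what makes JSON easy to read, and also part of why it takes more space than a columnar format on large datasets.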

Why Delta Parquet files?

Parquet is an open-source columnar file format from the Apache Software Foundation, built for fast processing of large and complex data. Delta Lake stores table data as Parquet files and adds a transaction log on top of them.

  • Columnar format: It stores data column by column, so a query can read only the columns it needs.

  • Easy rollback: Delta Lake versions every change, so a table can be rolled back to an earlier version after a bad write.

  • Custom data partitioning: Data can be partitioned by the distinct values of a chosen column, which makes it easy to locate the data you need.

  • Compression: Values from the same column are stored together, so the data compresses well.

  • ACID properties: Delta Lake provides ACID transactions on top of the Parquet files.

  • Scalability: It handles large amounts of data efficiently thanks to its scalable metadata handling and data versioning capabilities.
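The columnar and compression points above can be illustrated with a toy comparison. This sketch does not use Parquet itself: it assumes made-up rows, stores both layouts as JSON text, and uses `zlib` as a stand-in compressor. Real Parquet files use binary encodings, so the numbers here are only indicative.

```python
import json
import zlib

# Hypothetical rows, used only for illustration.
rows = [
    {"id": i, "country": "IN" if i % 2 else "US", "amount": i * 10}
    for i in range(1000)
]

# Row-oriented layout: one JSON object per record, keys repeated every time.
row_bytes = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

# Column-oriented layout: one list per column, similar values stored together.
columns = {key: [r[key] for r in rows] for key in rows[0]}
col_bytes = json.dumps(columns).encode("utf-8")

# Reading one column touches only that column's values.
total_amount = sum(columns["amount"])

print("row layout, raw bytes:", len(row_bytes))
print("column layout, raw bytes:", len(col_bytes))
print("row layout, compressed:", len(zlib.compress(row_bytes)))
print("column layout, compressed:", len(zlib.compress(col_bytes)))
print("total amount:", total_amount)
```

The exact sizes depend on the data, so it is worth running this on a sample of your own records before drawing conclusions.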

The diagram below shows how Delta Lake files and folders are structured.

The JSON files, kept in the table's _delta_log folder, record when new data was added. Each insertion creates a new Parquet file and a new JSON log file, and a reader is pointed at the whole table folder (my_table) rather than at individual files.
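The folder layout described above can be mimicked with a simplified toy model. This is not the real Delta Lake protocol: the `append` and `read` helpers are hypothetical, the data files are written as JSON instead of Parquet to keep the sketch stdlib-only, and each log entry holds a single `add` action. Only the `_delta_log` folder name and the zero-padded log file names mirror what Delta Lake uses.

```python
import json
import tempfile
from pathlib import Path


def append(table: Path, rows: list) -> int:
    """Write one data file plus one log entry, and return the new version."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    # A real Delta table would write a Parquet file here (assumption: JSON stand-in).
    data_name = f"part-{version:05d}.json"
    (table / data_name).write_text(json.dumps(rows))
    # The log entry records which data file this commit added.
    entry = {"add": {"path": data_name}}
    (log_dir / f"{version:020d}.json").write_text(json.dumps(entry))
    return version


def read(table: Path, version=None) -> list:
    """Replay the log up to `version` (latest if None) and load the listed files."""
    rows = []
    for log_file in sorted((table / "_delta_log").glob("*.json")):
        if version is not None and int(log_file.stem) > version:
            break
        entry = json.loads(log_file.read_text())
        rows.extend(json.loads((table / entry["add"]["path"]).read_text()))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    my_table = Path(tmp) / "my_table"
    append(my_table, [{"id": 1}])
    append(my_table, [{"id": 2}, {"id": 3}])
    latest = read(my_table)                    # all committed rows
    first_version = read(my_table, version=0)  # table as of the first commit

print(latest)
print(first_version)
```

Reading an older version simply means replaying fewer log entries, which is the idea behind the easy rollback mentioned earlier.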

Conclusion

When it comes to handling large-scale data, it is generally best to go with a Delta Lake table or the Parquet format. For small data, on the other hand, JSON is the better option.

Note: for any corrections, do reach out to me through the social links in the navbar.