Perfect Indexed JSON (PI-JSON)
Reduce Size. Keep JSON. Zero Breakage for Modern Data Pipelines.
When we work with JavaScript Object Notation (JSON), we usually send an array of objects with the same structure repeated many times. That makes the data easy to understand, but it also means we repeat the same keys over and over again:
[
  {
    "first_name": "Sammy",
    "last_name": "Shark",
    "location": "Ocean",
    "online": true,
    "followers": 987
  },
  {
    "first_name": "Sammy",
    "last_name": "Shark",
    "location": "Ocean",
    "online": true,
    "followers": 987
  }
]
For large datasets with thousands or millions of rows, those repeated keys cost extra bytes, bandwidth and storage, without adding any new information.
Perfect Indexed JSON (PI-JSON) is a simple structuring convention that reduces this overhead while still staying 100% valid JSON.
What Is Perfect Indexed JSON (PI-JSON)?
The basic idea is:
- Move all field names into a header object.
- Let that header map field_name → index (for example, "first_name": 0).
- Represent every data row using those index keys instead of full names.
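These three steps are mechanical, so the header does not have to be written by hand. Here is a minimal Python sketch. It assumes flat rows and assigns indices in the order field names are first seen. The names build_header and encode_flat are hypothetical helpers, not part of any library.

```python
import json
from typing import Any, Dict, List


def build_header(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Give each field name an index, in the order names are first seen."""
    header: Dict[str, int] = {}
    for row in rows:
        for name in row:
            if name not in header:
                header[name] = len(header)
    return header


def encode_flat(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return [header, row0, row1, ...], with index keys (as strings) in every row."""
    header = build_header(rows)
    indexed = [{str(header[name]): value for name, value in row.items()} for row in rows]
    return [header] + indexed


rows = [
    {"first_name": "Sammy", "last_name": "Shark", "location": "Ocean",
     "online": True, "followers": 987},
    {"first_name": "Sammy", "last_name": "Shark", "location": "Ocean",
     "online": True, "followers": 987},
]
payload = encode_flat(rows)
print(json.dumps(payload, separators=(",", ":")))  # minified PI-JSON
```

The header is built from every row, so a field that appears only in later rows still gets an index.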
Starting from the original JSON:
[
  {
    "first_name": "Sammy",
    "last_name": "Shark",
    "location": "Ocean",
    "online": true,
    "followers": 987
  },
  {
    "first_name": "Sammy",
    "last_name": "Shark",
    "location": "Ocean",
    "online": true,
    "followers": 987
  }
]
We can create a PI-JSON version:
[
  {
    "first_name": 0,
    "last_name": 1,
    "location": 2,
    "online": 3,
    "followers": 4
  },
  {
    "0": "Sammy",
    "1": "Shark",
    "2": "Ocean",
    "3": true,
    "4": 987
  },
  {
    "0": "Sammy",
    "1": "Shark",
    "2": "Ocean",
    "3": true,
    "4": 987
  }
]
The first object is the header:
- "first_name" → 0
- "last_name" → 1
- "location" → 2
- "online" → 3
- "followers" → 4
All other entries are data rows that use those index keys:
"0", "1", "2", "3", "4".
PI-JSON Is Still “Just JSON”
A key point: PI-JSON is not a new file format, not a new parser, and not a new library. It is simply a structured way to organize your existing JSON data.
- The top level is still a normal JSON array.
- Elements are normal JSON objects.
- Keys are strings; values can be strings, numbers, booleans, null, objects or arrays.
That means:
- You still use JSON.parse / JSON.stringify in JavaScript.
- You still use json.loads / json.dumps in Python.
- You still use the regular JSON libraries in Java, Go, PHP and other languages.
The only extra logic is a small mapping layer in your application that:
- Converts named fields into index-based rows when encoding.
- Maps indices back to field names when decoding.
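The decoding half of that mapping layer can be equally small. This sketch assumes flat rows and a payload whose first element is the header. The standard json.loads does all of the parsing. decode_flat is a hypothetical name.

```python
import json
from typing import Any, Dict, List

# Minified PI-JSON text: a header followed by one data row
PI_JSON_TEXT = (
    '[{"first_name":0,"last_name":1,"location":2,"online":3,"followers":4},'
    '{"0":"Sammy","1":"Shark","2":"Ocean","3":true,"4":987}]'
)


def decode_flat(payload: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn [header, row, ...] back into objects with named fields.

    Raises KeyError if a row uses an index that the header does not define.
    """
    header, *rows = payload
    # Reverse map: "0" -> "first_name", "1" -> "last_name", ...
    index_to_name = {str(index): name for name, index in header.items()}
    return [{index_to_name[key]: value for key, value in row.items()} for row in rows]


payload = json.loads(PI_JSON_TEXT)  # ordinary JSON parsing, nothing changed
users = decode_flat(payload)
print(users[0]["first_name"])  # Sammy
```

Rejecting unknown indices with KeyError is a deliberate choice in this sketch. A more lenient decoder could keep unknown keys as they are.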
Why Use Perfect Indexed JSON?
- Index-style access, array-like feel
  Each row is essentially an ordered set of values referenced by index keys, just like columns in a table:
  // row = { "0": "Sammy", "1": "Shark", "2": "Ocean", "3": true, "4": 987 }
  const firstName = row["0"]; // "Sammy"
  const isOnline = row["3"];  // true
- Removes repeated keys
  Instead of sending "first_name", "last_name" and the rest for every row, you send them once in the header and refer to them by index afterward.
- Smaller JSON size
  Shorter keys and no repetition mean fewer characters and smaller payloads:
  - Less bandwidth usage.
  - Less storage space.
  - Faster network transfers.
- No change to JSON parsing logic
  You do not need to modify any JSON libraries. All standard tools still work: validators, linters, pretty-printers and so on.
- No new format to learn
  Developers already understand JSON. PI-JSON is just “JSON with a header and indexed rows”.
- Existing JSON ecosystem continues to work
  JSON Schema validators, logging, tracing, Application Programming Interface (API) gateways and HTTP tooling all continue to treat PI-JSON as valid JSON. However, schemas written for the named-field form have to be rewritten for the indexed form.
- Minimal change to existing code
  You can confine PI-JSON to the serialization and deserialization layer while keeping your domain models in normal object form:
  // internal model
  const user = {
    first_name: "Sammy",
    last_name: "Shark",
    location: "Ocean",
    online: true,
    followers: 987
  };
  // only the adapter/serializer knows about PI-JSON
- Better compression before compression
  Eliminating repeated keys at the source shrinks the raw payload. If you also use gzip or Brotli, the compressed size usually drops as well, but by a smaller margin, because those algorithms already encode repeated strings efficiently.
- Columnar / analytical-friendly
  When indices stay stable, each index always refers to the same field, much like a column in a table. The data is still stored row by row, but this fixed mapping makes PI-JSON easy to load into tabular and analytical tools.
- Header as a mini schema
  You can evolve the header into something richer. A decoder then reads each field's index from its "index" entry:
  {
    "first_name": { "index": 0, "type": "string", "nullable": false },
    "last_name": { "index": 1, "type": "string", "nullable": false },
    "location": { "index": 2, "type": "string", "nullable": true },
    "online": { "index": 3, "type": "boolean", "nullable": false },
    "followers": { "index": 4, "type": "integer", "nullable": false }
  }
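Once the header carries type information, it can drive lightweight validation of indexed rows. This is a minimal sketch that assumes the rich header shape shown above. It treats a missing field the same as null. validate_row and TYPE_CHECKS are hypothetical names, and the mapping from schema type names to Python checks is an assumption.

```python
from typing import Any, Callable, Dict, List

# Rich header in the mini-schema shape shown above
SCHEMA_HEADER = {
    "first_name": {"index": 0, "type": "string", "nullable": False},
    "last_name": {"index": 1, "type": "string", "nullable": False},
    "location": {"index": 2, "type": "string", "nullable": True},
    "online": {"index": 3, "type": "boolean", "nullable": False},
    "followers": {"index": 4, "type": "integer", "nullable": False},
}

# Hypothetical mapping from schema type names to Python checks
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate_row(row: Dict[str, Any], header: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the problems found in one indexed row (an empty list means valid)."""
    problems = []
    for name, spec in header.items():
        value = row.get(str(spec["index"]))  # missing and null are treated alike
        if value is None:
            if not spec["nullable"]:
                problems.append(f"{name}: missing or null")
            continue
        if not TYPE_CHECKS[spec["type"]](value):
            problems.append(f"{name}: expected {spec['type']}")
    return problems


print(validate_row({"0": "Sammy", "1": "Shark", "2": None, "3": True, "4": 987}, SCHEMA_HEADER))
print(validate_row({"0": "Sammy", "3": "yes", "4": 987}, SCHEMA_HEADER))
```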
Normal JSON vs PI-JSON: Real Size Numbers
To make the comparison fair, we can look at minified JSON (no spaces, no line breaks).
A typical row in normal JSON looks like this:
{"first_name":"Sammy","last_name":"Shark","location":"Ocean","online":true,"followers":987}
The PI-JSON header (sent once per payload) is:
{"first_name":0,"last_name":1,"location":2,"online":3,"followers":4}
And a PI-JSON data row is:
{"0":"Sammy","1":"Shark","2":"Ocean","3":true,"4":987}
Measured sizes for different row counts
| Number of rows | Normal JSON length (chars) | PI-JSON length (chars) | Saved (chars) |
|---|---|---|---|
| 2 | 185 | 180 | 5 |
| 10 | 921 | 620 | 301 |
| 100 | 9201 | 5570 | 3631 |
| 1000 | 92001 | 55070 | 36931 |
With just a few rows, the difference is small. But as the dataset grows, the key repetition becomes expensive and PI-JSON brings large savings.
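These lengths are easy to reproduce. The sketch below builds the sample row, its header and its indexed form. It then measures the minified text that Python's json.dumps produces with compact separators. minified_length is a hypothetical helper name.

```python
import json
from typing import Any

# The sample row from above, its header and its indexed form
ROW = {"first_name": "Sammy", "last_name": "Shark", "location": "Ocean",
       "online": True, "followers": 987}
HEADER = {name: index for index, name in enumerate(ROW)}
PI_ROW = {str(HEADER[name]): value for name, value in ROW.items()}


def minified_length(value: Any) -> int:
    """Length of the minified JSON text; for this ASCII data it equals the byte count."""
    return len(json.dumps(value, separators=(",", ":")))


for n in (2, 10, 100, 1000):
    normal = minified_length([ROW] * n)
    pi = minified_length([HEADER] + [PI_ROW] * n)
    print(n, normal, pi, normal - pi)
```

Both lengths grow linearly with the number of rows n. Normal JSON takes 92n + 1 characters, while PI-JSON takes 70 + 55n. Each additional row therefore saves 37 characters.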
Key “token” savings
If we only look at **key characters** for our example fields (quotes not counted):
- "first_name" → 10 characters
- "last_name" → 9 characters
- "location" → 8 characters
- "online" → 6 characters
- "followers" → 9 characters
Total = 42 key characters per row in normal JSON.
For 100 rows:
- Normal JSON keys: 42 × 100 = 4200 characters.
- PI-JSON keys: 42 (header, sent once) + 5 × 100 (one single-character index per field per row) = 542 characters.
That is a reduction from 4200 → 542 key characters, saving 3658 key characters just by restructuring the JSON.
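Here is the same count as a pair of small functions. This version uses len(str(i)) for each index. It therefore stays correct once a header has ten or more fields and indices need two digits. That is a small generalization of the calculation above, and the function names are hypothetical.

```python
FIELDS = ["first_name", "last_name", "location", "online", "followers"]


def key_chars_normal(rows: int) -> int:
    """Normal JSON: every row repeats every field name."""
    return sum(len(name) for name in FIELDS) * rows


def key_chars_pi(rows: int) -> int:
    """PI-JSON: names appear once in the header, then one index key per field per row."""
    header_chars = sum(len(name) for name in FIELDS)
    index_chars = sum(len(str(i)) for i in range(len(FIELDS)))
    return header_chars + index_chars * rows


print(key_chars_normal(100), key_chars_pi(100))  # 4200 542
```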
Comparison
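To visualize the impact of PI-JSON, here are the 100-row figures from above expressed relative to normal JSON:
📦 Payload size (100 rows): normal JSON 9201 characters (100%) vs PI-JSON 5570 characters (≈60.5%)
🔑 Key characters only (100 rows): normal JSON 4200 characters (100%) vs PI-JSON 542 characters (≈12.9%)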
Product List Example
Normal JSON
[
  {
    "id": 1,
    "name": "Laptop",
    "price": 899.99,
    "currency": "USD",
    "in_stock": true
  },
  {
    "id": 2,
    "name": "Mouse",
    "price": 19.99,
    "currency": "USD",
    "in_stock": false
  }
]
PI-JSON
[
  {
    "id": 0,
    "name": 1,
    "price": 2,
    "currency": 3,
    "in_stock": 4
  },
  {
    "0": 1,
    "1": "Laptop",
    "2": 899.99,
    "3": "USD",
    "4": true
  },
  {
    "0": 2,
    "1": "Mouse",
    "2": 19.99,
    "3": "USD",
    "4": false
  }
]
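Every product row uses the same stable indices, so the payload can be pivoted into per-field lists. This gives the table-like view described under "Columnar / analytical-friendly". The sketch below is minimal. to_columns is a hypothetical helper, and fields missing from a row become None.

```python
from typing import Any, Dict, List

products_pi = [
    {"id": 0, "name": 1, "price": 2, "currency": 3, "in_stock": 4},
    {"0": 1, "1": "Laptop", "2": 899.99, "3": "USD", "4": True},
    {"0": 2, "1": "Mouse", "2": 19.99, "3": "USD", "4": False},
]


def to_columns(payload: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collect each field's values into one list, using the header's indices."""
    header, *rows = payload
    return {
        name: [row.get(str(index)) for row in rows]  # None where a row lacks the field
        for name, index in header.items()
    }


columns = to_columns(products_pi)
print(columns["price"])  # [899.99, 19.99]
```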
Nested PI-JSON vs Normal JSON (Full Indexing at All Levels)
This example shows Perfect Indexed JSON with a flat header that assigns an index to every logical field name, and data rows that use only index keys – even for nested objects. The JSON structure (nesting, arrays, partial fields) stays exactly the same; only the keys are shortened.
Normal JSON
[
  {
    "user": {
      "first": "Sam",
      "last": "Shark"
    },
    "address": {
      "city": "Ocean",
      "zip": 44221
    },
    "status": true
  },
  {
    "user": {
      "first": "Alex"
    },
    "address": {
      "zip": 11111
    },
    "meta": {
      "device": "mobile",
      "version": 12
    }
  }
]
PI-JSON version (flat header + indexed keys)
[
  {
    "user": 0,
    "address": 1,
    "status": 2,
    "meta": 3,
    "first": 4,
    "last": 5,
    "city": 6,
    "zip": 7,
    "device": 8,
    "version": 9
  },
  {
    "0": {
      "4": "Sam",
      "5": "Shark"
    },
    "1": {
      "6": "Ocean",
      "7": 44221
    },
    "2": true
  },
  {
    "0": {
      "4": "Alex"
    },
    "1": {
      "7": 11111
    },
    "3": {
      "8": "mobile",
      "9": 12
    }
  }
]
The first object is the header and appears only once. It maps every field name to a numeric index:
"user" → 0,
"address" → 1,
"status" → 2,
"meta" → 3,
"first" → 4,
"last" → 5,
"city" → 6,
"zip" → 7,
"device" → 8,
"version" → 9.
All following rows are normal JSON objects, but they use index keys like "0", "1", "4", "7", and so on. A decoder uses the header to translate indices back to field names and reconstructs the original objects.
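A flat header like this can also be generated automatically. The sketch below assigns indices in first-seen order, level by level (breadth-first). That ordering is an assumption, and build_nested_header is a hypothetical name. With the two sample rows, it reproduces the header shown above. Any consistent ordering works, as long as the encoder and decoder share the same header.

```python
from collections import deque
from typing import Any, Dict, List


def build_nested_header(rows: List[Any]) -> Dict[str, int]:
    """Index every key at every depth, visiting values breadth-first,
    so top-level keys receive the lowest indices."""
    header: Dict[str, int] = {}
    queue = deque(rows)
    while queue:
        value = queue.popleft()
        if isinstance(value, dict):
            for key, child in value.items():
                header.setdefault(key, len(header))  # keep the first index assigned
                queue.append(child)
        elif isinstance(value, list):
            queue.extend(value)
        # primitives carry no keys and are simply dropped
    return header


rows = [
    {"user": {"first": "Sam", "last": "Shark"},
     "address": {"city": "Ocean", "zip": 44221},
     "status": True},
    {"user": {"first": "Alex"},
     "address": {"zip": 11111},
     "meta": {"device": "mobile", "version": 12}},
]
print(build_nested_header(rows))
```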
Key name token comparison (small example)
For this small two-row example, only the key name text (without quotes and punctuation) is counted, just to show the idea:
| Metric | Normal JSON | PI-JSON |
|---|---|---|
| Distinct key names | user, address, status, meta, first, last, city, zip, device, version | the same 10 names |
| Total key-name characters used in payload | 69 (names repeated in each row) | 50 (each name appears once in the header), plus 14 single-character index keys in the data rows |
Even in this tiny example, PI-JSON already uses fewer key-name characters by moving the names into a single header. As the number of rows grows, this effect becomes much stronger, because normal JSON keeps repeating field names while PI-JSON reuses compact indices.
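Counts like these can be checked mechanically with a recursive walk over the keys. In this minimal sketch, key_chars is a hypothetical name. It also reports the single-character index keys that the PI-JSON data rows still carry.

```python
from typing import Any

NORMAL_ROWS = [
    {"user": {"first": "Sam", "last": "Shark"},
     "address": {"city": "Ocean", "zip": 44221}, "status": True},
    {"user": {"first": "Alex"}, "address": {"zip": 11111},
     "meta": {"device": "mobile", "version": 12}},
]
PI_PAYLOAD = [
    {"user": 0, "address": 1, "status": 2, "meta": 3, "first": 4,
     "last": 5, "city": 6, "zip": 7, "device": 8, "version": 9},
    {"0": {"4": "Sam", "5": "Shark"}, "1": {"6": "Ocean", "7": 44221}, "2": True},
    {"0": {"4": "Alex"}, "1": {"7": 11111}, "3": {"8": "mobile", "9": 12}},
]


def key_chars(value: Any) -> int:
    """Sum the lengths of all object keys at every nesting level (quotes not counted)."""
    if isinstance(value, dict):
        return sum(len(key) + key_chars(child) for key, child in value.items())
    if isinstance(value, list):
        return sum(key_chars(item) for item in value)
    return 0


print(key_chars(NORMAL_ROWS))     # names repeated per row
print(key_chars(PI_PAYLOAD[0]))   # header: each name once
print(key_chars(PI_PAYLOAD[1:]))  # index keys in the data rows
```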
Estimated size comparison for repeated nested structure
If this kind of nested structure is repeated many times (for example, in logs, analytics events or user activity streams), the PI-JSON header is sent once, while normal JSON repeats the full field names in every row. The table below counts the minified form of the two sample rows above, repeated as a pair to reach each row count:
| Number of rows | Normal JSON length (chars) | PI-JSON length (chars) | Saved (chars) | Saved (%) |
|---|---|---|---|---|
| 2 | 184 | 231 | -47 | ≈-25.5% |
| 10 | 916 | 743 | 173 | ≈18.9% |
| 100 | 9151 | 6503 | 2648 | ≈28.9% |
| 1000 | 91501 | 64103 | 27398 | ≈29.9% |
With only two rows, PI-JSON is actually larger here. The ten-entry header (101 characters) costs more than the key names it removes from two rows. As the number of rows grows, repeated key overhead in normal JSON increases linearly. PI-JSON pays the cost of descriptive names once in the header and then reuses compact indices, so for this structure the savings level off at around 30%.
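Sizes for a nested payload can be measured the same way as for flat rows. This sketch repeats the two sample rows as a pair to reach each row count. That is a simplifying assumption, since real event streams mix shapes and values differently. minified_length is a hypothetical helper name.

```python
import json
from typing import Any

# The two nested sample rows, their PI-JSON header and their indexed rows
ROWS = [
    {"user": {"first": "Sam", "last": "Shark"},
     "address": {"city": "Ocean", "zip": 44221}, "status": True},
    {"user": {"first": "Alex"}, "address": {"zip": 11111},
     "meta": {"device": "mobile", "version": 12}},
]
HEADER = {"user": 0, "address": 1, "status": 2, "meta": 3, "first": 4,
          "last": 5, "city": 6, "zip": 7, "device": 8, "version": 9}
PI_ROWS = [
    {"0": {"4": "Sam", "5": "Shark"}, "1": {"6": "Ocean", "7": 44221}, "2": True},
    {"0": {"4": "Alex"}, "1": {"7": 11111}, "3": {"8": "mobile", "9": 12}},
]


def minified_length(value: Any) -> int:
    """Length of compact JSON text (equal to bytes for this ASCII data)."""
    return len(json.dumps(value, separators=(",", ":")))


for n in (2, 10, 100, 1000):
    pairs = n // 2  # repeat the two sample rows as a pair
    normal = minified_length(ROWS * pairs)
    pi = minified_length([HEADER] + PI_ROWS * pairs)
    saved = normal - pi
    print(n, normal, pi, saved, f"{saved / normal:.1%}")
```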
How to Encode and Decode PI-JSON in Python (Generic, Nested-Safe)
The following Python helpers work with any PI-JSON payload that uses a flat header mapping field_name → index and data rows that use index keys as strings. They support deeply nested objects and arrays.
Encoder: from normal JSON objects to PI-JSON
import json
from typing import Any, Dict, List

Header = Dict[str, int]


def encode_value(value: Any, header: Header) -> Any:
    """
    Recursively encode a normal JSON-compatible value using the header mapping.
    Any object key that exists in the header is replaced by its numeric index (as a string).
    """
    # If it's a dict, replace keys using the header map
    if isinstance(value, dict):
        encoded = {}
        for key, val in value.items():
            # If the key is in the header, use its index; otherwise keep it as-is
            if key in header:
                new_key = str(header[key])
            else:
                new_key = key
            encoded[new_key] = encode_value(val, header)
        return encoded

    # If it's a list/array, encode each element
    if isinstance(value, list):
        return [encode_value(item, header) for item in value]

    # Primitive (str, int, float, bool, None) – leave as-is
    return value


def encode_pi_json(header: Header, rows: List[Dict[str, Any]]) -> List[Any]:
    """
    Given a header (field_name → index) and a list of normal JSON objects (rows),
    return a PI-JSON payload where:
    - The first element is the header itself.
    - All subsequent elements are encoded rows with index-based keys.
    """
    encoded_rows = [encode_value(row, header) for row in rows]
    return [header] + encoded_rows


# Example usage: encode the nested user/address/meta structure
header = {
    "user": 0,
    "address": 1,
    "status": 2,
    "meta": 3,
    "first": 4,
    "last": 5,
    "city": 6,
    "zip": 7,
    "device": 8,
    "version": 9,
}

rows = [
    {
        "user": {
            "first": "Sam",
            "last": "Shark",
        },
        "address": {
            "city": "Ocean",
            "zip": 44221,
        },
        "status": True,
    },
    {
        "user": {
            "first": "Alex",
        },
        "address": {
            "zip": 11111,
        },
        "meta": {
            "device": "mobile",
            "version": 12,
        },
    },
]

payload = encode_pi_json(header, rows)
json_text = json.dumps(payload, separators=(",", ":"))  # minified PI-JSON
print(json_text)
Decoder: from PI-JSON back to normal JSON objects
import json
from typing import Any, Dict, List, Tuple

Header = Dict[str, int]


def decode_value(value: Any, index_to_name: Dict[str, str]) -> Any:
    """
    Recursively decode a PI-JSON value using the reverse header mapping.
    Any object key that matches a known index is replaced by its original field name.
    """
    # If it's a dict, translate keys using the reverse map
    if isinstance(value, dict):
        decoded = {}
        for key, val in value.items():
            new_key = index_to_name.get(key, key)  # fall back to original key if not in map
            decoded[new_key] = decode_value(val, index_to_name)
        return decoded

    # If it's a list/array, decode each element
    if isinstance(value, list):
        return [decode_value(item, index_to_name) for item in value]

    # Primitive – leave as-is
    return value


def decode_pi_json(payload: List[Any]) -> Tuple[Header, List[Dict[str, Any]]]:
    """
    Given a PI-JSON payload (first element is header, rest are rows),
    return the header and a list of decoded normal JSON objects.
    """
    if not payload:
        return {}, []

    raw_header = payload[0]
    if not isinstance(raw_header, dict):
        raise ValueError("PI-JSON payload must start with a header object")

    # Build reverse mapping: index (as string) → field name
    index_to_name = {str(idx): name for name, idx in raw_header.items()}

    # Decode each row
    decoded_rows: List[Dict[str, Any]] = []
    for row in payload[1:]:
        decoded_row = decode_value(row, index_to_name)
        # Ensure we always return dicts as rows where possible
        if isinstance(decoded_row, dict):
            decoded_rows.append(decoded_row)
        else:
            decoded_rows.append({"value": decoded_row})

    return raw_header, decoded_rows


# Example usage: decode the PI-JSON produced earlier
pi_json_text = json_text  # from the encoder example above
pi_payload = json.loads(pi_json_text)
decoded_header, decoded_rows = decode_pi_json(pi_payload)
print(decoded_header)
print(decoded_rows)
The encoder and decoder work for deeply nested objects and arrays, and they only change keys that appear in the header. This allows PI-JSON to be introduced as a lightweight adaptation layer, while keeping the internal application models in normal, human-readable JSON form.
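One caveat applies to these helpers. The decoder renames any key that matches an index, including keys that were never encoded. For example, a data key "3" that is not in the header would come back as "meta". A cheap guard is to round-trip rows before sending them. In this minimal sketch, rename_keys and round_trips are hypothetical names. rename_keys mirrors the recursive key replacement done by encode_value and decode_value.

```python
from typing import Any, Dict

HEADER = {"user": 0, "address": 1, "status": 2, "meta": 3, "first": 4,
          "last": 5, "city": 6, "zip": 7, "device": 8, "version": 9}


def rename_keys(value: Any, mapping: Dict[str, str]) -> Any:
    """Recursively rename object keys found in mapping; other keys and all values stay as-is."""
    if isinstance(value, dict):
        return {mapping.get(key, key): rename_keys(child, mapping) for key, child in value.items()}
    if isinstance(value, list):
        return [rename_keys(item, mapping) for item in value]
    return value


def round_trips(row: Any, header: Dict[str, int]) -> bool:
    """True if encoding and then decoding the row reproduces it exactly."""
    name_to_index = {name: str(index) for name, index in header.items()}
    index_to_name = {index: name for name, index in name_to_index.items()}
    encoded = rename_keys(row, name_to_index)
    return rename_keys(encoded, index_to_name) == row


print(round_trips({"user": {"first": "Sam"}, "status": True}, HEADER))  # True
print(round_trips({"tags": {"3": "vip"}}, HEADER))  # False: "3" comes back as "meta"
```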
When PI-JSON Shines (and When It Does Not)
Great use cases
- Large collections of similar objects (events, logs, analytics).
- Mobile or Internet of Things (IoT) clients with limited bandwidth.
- Backends that store or stream huge amounts of JSON.
- Internal services where you control both producer and consumer.
Trade-offs and limitations
- Less human-readable: index keys like "0" are not self-describing.
- You must maintain a small encoder/decoder mapping layer.
- Some tools that expect “nice” JSON objects may prefer full field names.
- Header changes (adding/removing fields) must be versioned carefully.
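One way to make header changes safe is for consumers to compare an incoming header with the one they were built against before decoding. This is a minimal sketch. header_problems and the compatibility policy described in its docstring are assumptions for illustration, not part of PI-JSON itself.

```python
from typing import Dict, List


def header_problems(expected: Dict[str, int], received: Dict[str, int]) -> List[str]:
    """Report header changes that would make an older consumer misread rows.

    Assumed policy: new fields with new indices are fine, and dropped fields
    simply show up as missing values. An existing field must keep its index,
    and an index must never be reused for a different field.
    """
    problems = []
    # A known field that now has a different index
    for name, index in expected.items():
        if name in received and received[name] != index:
            problems.append(f"{name} moved from index {index} to {received[name]}")
    # An old index that now means a different field
    expected_by_index = {index: name for name, index in expected.items()}
    for name, index in received.items():
        old_name = expected_by_index.get(index)
        if old_name is not None and old_name != name:
            problems.append(f"index {index} reused for {name} (was {old_name})")
    return problems


v1 = {"first_name": 0, "last_name": 1}
print(header_problems(v1, {"first_name": 0, "last_name": 1, "location": 2}))  # []
print(header_problems(v1, {"first_name": 0, "location": 1}))
```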
Reusing the Perfect Indexed pattern in XML, CSS, HTML and other structured formats
The same pattern behind Perfect Indexed JSON can be applied to other formats as well:
- Extensible Markup Language (XML)
  Use a schema-like section that declares element and attribute names once, then refer to them by index or alias inside the document for more compact representations.
- Cascading Style Sheets (CSS)
  Define a central map of property names and reuse numeric or short aliases in compact style declarations.
However, the big advantage of PI-JSON is that you get these compression-like benefits without leaving the JSON ecosystem at all.
Conclusion: Smaller JSON Without Breaking JSON
Perfect Indexed JSON (PI-JSON) is a simple idea:
- Extract keys into a single header.
- Index them with numbers.
- Use those indices for all data rows.
The result is:
- Smaller payloads.
- Fewer repeated key tokens.
- No changes to parsers or core libraries.
- Full compatibility with the JSON ecosystem.
If you are working with large JSON datasets or modern data pipelines and want to reduce size without changing the format, PI-JSON is a neat pattern to consider.