2021 Progress Update

We originally built all the SoC-side code in C#, but for the switch kernel this was a mistake given a switch's soft real-time requirements. We therefore ported the switch kernel to Zig, which significantly improved its performance.

We have made a lot of progress on understanding the ethical considerations of deploying the IsoGrid, and we now plan to keep the specification and source-code implementation private until ethical norms can be established.

There aren't likely to be any detailed updates to this site until a minimum viable product is ready.

Scalability of Isochronous Mesh Networking to 2^40 Switches

When discussing mesh networking, the common refrain is "mesh networking is not scalable". Here are data and code indicating that it can scale 'enough' to support a full-scale global isochronous mesh network of 2^40 nodes with mostly stable links. These data were generated by running a mesh routing algorithm on each node of simulated networks sized from 2^7 to 2^23 nodes. Note that this algorithm isn't the fully spec'd IsoGrid algorithm, just a framework to test scalability.

Assumptions

This simulation and analysis makes the following assumptions:

  1. The link layer is isochronous
  2. The network layer is source-routed
  3. Latency per hop is bounded by a constant time that is comparable to the time it takes for light to propagate across a hop
  4. The links (on average) stay up for longer than a day (no mobile nodes)
  5. Nodes join the network in a bootstrapping process inspired by S/Kademlia
  6. The simulation of networks sized from 2^7 to 2^23 is enough to extrapolate likely scalability to 2^40
  7. The global network topology is approximated by placing nodes in a 2D grid pattern and linking each node to the 4 nearest nodes (except at the edges and corners).
  8. Each node has 100 Gib/s (or more) of total switching capacity
  9. Each bootstrapping message is around 1 kB
  10. The cost of network transit is directly proportional to physical distance

It follows from #1 and #2 that latency per hop can be bounded by a constant. Specifically, #3 is reasonable at least up to 2^47 nodes (~1 node per square meter of the land surface of the Earth). Light takes ~5 ns to travel 1 m in optical fiber. With 128-bit words switched isochronously at an achievable 100 Gib/s, each hop in this full-scale scenario adds on the order of 1.2 ns of extra latency.
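
As a sanity check on those numbers, here is a minimal sketch of the arithmetic, using the same assumptions as above (128-bit words, 100 Gib/s links, ~1 m hops, ~5 ns/m in fiber):

    using System;

    // Per-hop delay budget implied by the assumptions above: the serialization delay of one
    // 128-bit word at 100 Gib/s, plus ~5 ns/m of fiber propagation over a ~1 m hop.
    double bitsPerWord = 128.0;
    double linkRateBitsPerSecond = 100.0 * Math.Pow(2, 30);  // 100 Gib/s
    double fiberDelayNsPerMeter = 5.0;                       // light in optical fiber
    double hopLengthMeters = 1.0;                            // ~1 node per square meter at 2^47 nodes

    double serializationNs = bitsPerWord / linkRateBitsPerSecond * 1e9;
    double propagationNs = fiberDelayNsPerMeter * hopLengthMeters;

    Console.WriteLine($"Serialization: {serializationNs:F2} ns");  // ~1.19 ns of extra latency per hop
    Console.WriteLine($"Propagation:   {propagationNs:F2} ns");    // ~5.00 ns per hop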

#4 reveals an acceptance that mesh networks with highly mobile nodes (actually nodes with highly transient links) aren’t likely to be scalable. But the switches that form the network don’t have to be mobile, just like the core routers of the Internet aren’t mobile.

The current TCP/IP Internet routing table is NOT scalable: every core router must know a route to every other network. #6 means that the following is likely true but isn't proven: isochronous, source-routed mesh networks can scale to 2^40 nodes with conservative assumptions about topology and current technology. If, instead, one assumes that a more 3D topology would develop naturally in the full-scale scenario (urban environments, satellites, and rooftop wireless laser hops), then it scales beyond 2^50 nodes.

#7 means that a simple bisection of the square grid identifies the nodes that have the heaviest bootstrapping data load to switch. For the network to be viable, this load must be a small portion of the available bandwidth of those nodes. Higher physical-bandwidth links and more stable links network-wide both improve this.
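
For intuition, here is a rough back-of-the-envelope model of that bisection load (not the simulation itself). It only combines assumption #7 (square grid), assumption #9 (~1 kB messages), and the centerline-crossing message counts measured in the Data section below (roughly 30-50 per node):

    using System;

    // Rough model: in an N-node square grid (assumption #7), sqrt(N) links cross the centerline.
    // If each node's bootstrapping sends crossingMessagesPerNode ~1 kB messages across the
    // centerline (assumption #9 and the measured data below), the load on one centerline link is:
    static double CenterlineLinkLoadKB(double nodes, double crossingMessagesPerNode)
    {
        double centerlineLinks = Math.Sqrt(nodes);              // links bisected by the centerline
        double crossingKB = nodes * crossingMessagesPerNode;    // total crossing traffic, ~1 kB per message
        return crossingKB / centerlineLinks;                    // = sqrt(N) * crossingMessagesPerNode kB
    }

    Console.WriteLine(CenterlineLinkLoadKB(Math.Pow(2, 23), 50.98)); // ~147,654 kB; the sim measured 147,642.81 kB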

Simulation

The simulation code is shared here for verification and reproducibility purposes. This code is NOT production quality and has never been code reviewed. It's more important to move to the next phase of development than to spend time cleaning up the simulation code.

The simulation was built to have the following scalability properties, and the data confirms it was achieved:

  • The “Search Rounds” taken by each node to find its best-matched nodes (XOR metric) grow as LOG2(N)
  • The “Highest Traffic on a Link” grows roughly as N^(1/2)

Data

Raw data was collected by simulating the bootstrapping of various networks from 2^7 nodes to 2^23 nodes. A seed is used to drive a pseudo-random number generator to produce the node IDs. With the same seed, the first three columns are produced deterministically, but with a different seed the numbers move around a bit. In order to determine the growth rates with more accuracy, the simulation was repeated with different seeds (up to 2^21 nodes) and the results averaged:
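
For illustration only (the post doesn't specify the simulation's ID width or PRNG, so 64-bit IDs and System.Random are stand-ins), seeded node-ID generation and the XOR match metric look roughly like this:

    using System;

    // Hypothetical sketch: reproducible node IDs from a seed, and the XOR distance
    // used to find each node's "best matched" nodes.
    int seed = 12345;              // same seed => same node IDs => same first three columns
    var rng = new Random(seed);

    ulong NextNodeId()
    {
        byte[] bytes = new byte[8];
        rng.NextBytes(bytes);
        return BitConverter.ToUInt64(bytes, 0);
    }

    ulong a = NextNodeId();
    ulong b = NextNodeId();
    Console.WriteLine($"XOR distance: {(a ^ b):X16}");  // smaller value = better match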

Nodes    Hops traversed   Centerline-crossing   Search rounds   Avg. mem.      Avg. CPU        Highest traffic
         per node         messages per node     per node        per node (B)   per node (ms)   on a link (kB)
2^7          354.63             28.70              3.647            4916           1.00             324.65
2^9          797.70             29.89              5.607            7471           1.22             676.36
2^11        1718.72             31.80              7.597           11362           1.75            1439.18
2^13        3673.90             34.64              9.596           13538           2.44            3134.81
2^15        7819.45             37.56             11.596           16310           3.17            6799.43
2^17       16684.50             40.66             13.596           19211           3.99           14720.58
2^19       35614.29             43.87             15.596           21949           5.06           31768.34
2^21*      76090.86             47.35             17.60                #              #           68570.64
2^23*     162771.73             50.98             19.60                #              #          147642.81

*The 2^21 sim required a large system and many hours of compute time, so it was only run 7 times. The 2^23 sim required a very large system and two days of compute time, so it was only run once.
# Memory and CPU time aren't meaningful when tested near the memory limits of the machine (due to additional garbage collection), which occurred in the 2^21 and 2^23 node sims.

Analyzing the effect of doubling the network size on each column, from left to right:

  1. If one normalizes the per-node cost (the number of hops traversed per node) by distance, then the per-node cost growth rate upon doubling the network size is low (<6%). This can be done by assuming the growing number of nodes is spread evenly over a non-growing space: since the network is a square, hop length shrinks with the square root of the network size, so the normalized growth rate was computed by dividing the hop count by the square root of the network size.
  2. The number of messages per node that have to cross to the other side of the network is growing at only 3.9%
  3. The search rounds per node are tracking LOG2(N) perfectly
  4. The memory is growing at a low rate of 9%
  5. The CPU time is growing at a low rate of 13%
  6. The rightmost column (Highest Traffic on a Link) is the scalability bottleneck, growing at a rate of 47% (see the sketch after this list)
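
These per-doubling rates can be recovered directly from the table above. A small sketch of the arithmetic (consecutive rows are two doublings apart, so the per-doubling factor is the square root of the row-to-row ratio):

    using System;

    // Consecutive rows of the table are two doublings apart, so the per-doubling growth
    // factor between two rows is the square root of their ratio.
    static double PerDoublingGrowth(double smallerSim, double largerSim) =>
        Math.Sqrt(largerSim / smallerSim);

    // Highest traffic on a link, 2^21 -> 2^23 rows:
    Console.WriteLine(PerDoublingGrowth(68570.64, 147642.81)); // ~1.467, i.e. ~47% per doubling

    // Centerline-crossing messages per node, 2^21 -> 2^23 rows:
    Console.WriteLine(PerDoublingGrowth(47.35, 50.98));        // ~1.038, i.e. close to the 3.9% above

    // Search rounds per node: (19.60 - 17.60) / 2 doublings = exactly 1 extra round per doubling, i.e. LOG2(N).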

Download the data

Conclusion

As expected, the main bottleneck for mesh networks is the load on the network attributable to link and node updates. The worst-case load would be a full re-bootstrap of the entire network, so this analysis focuses on that scenario. The bootstrap data load on the links that cross the centerline of the network goes up by 47% each time the network size doubles. Extrapolating the above simulations to bootstrapping a (truly epic) 2^40-node network:

  • Each node would take ~140 KB of memory
  • Each node would consume much less than one second of Core i5 CPU time
  • Each centerline node would only have to switch 50-100 GB

The last point seems like a lot of data for a simple bootstrap, but this would be 5-10 years in the future, when 10 Gb/s links would be the bare minimum, so this data could be transmitted in about a minute. Even if these extrapolations are low by a factor of 100, the network would still perform well under the assumptions made above.
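
These extrapolations can be reproduced from the largest simulated data points and the growth rates above (a sketch of the arithmetic; 2^40 is 17 doublings beyond 2^23 and 21 doublings beyond 2^19):

    using System;

    // Extrapolate the largest sims forward using the per-doubling growth rates above.
    static double Extrapolate(double measured, double perDoublingGrowth, int doublings) =>
        measured * Math.Pow(perDoublingGrowth, doublings);

    double centerlineTrafficKB = Extrapolate(147642.81, 1.47, 17);  // from the 2^23 row, +47%/doubling
    double memoryBytes = Extrapolate(21949, 1.09, 21);              // from the 2^19 row, +9%/doubling

    Console.WriteLine($"{centerlineTrafficKB / 1e6:F0} GB");  // ~103 GB, the high end of "50-100 GB"
    Console.WriteLine($"{memoryBytes / 1e3:F0} KB");          // ~134 KB, i.e. the "~140 KB" above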

A full-scale global isochronous mesh network can scale to at least 2^40 switches

Discussion on Hacker News!

IsoGrid Design Overview

This post outlines the design for a new network protocol with a mesh topology called the IsoGrid.

Layer 0: Physical Layer

  • Options include, but are not limited to:
    • Ethernet, ATM, USB, etc.
    • Tunnels through other networks (like the TCP/IP Internet, etc.)

Layer 1: Link Layer

  • Defines how nodes directly communicate with each other across links
  • Defines how a µPkt can be sent between two nodes
  • Provides for mesh-wide frequency synchronization
  • The IsoGrid does NOT mandate a globally-required protocol at this layer
  • The IsoGrid does impose generic requirements at this layer

Layer 2: Network (µPkt) Layer

  • Defines the extensibility model for µPkt types and µRoute types
  • Defines how the network routes µPkts across the IsoGrid
  • Defines how nodes send credits in exchange for forwarding µPkts

Layer 3: Transport (IsoStream) Layer

  • Defines how the network uses source routing for Isochronous Streams (IsoStream)
  • An IsoStream is switched at the word level, one word at a time
  • Defines how nodes send credits in exchange for switching IsoStreams

Layer 4: Session (EccFlow) Layer

  • Defines how remote nodes use IsoStreams to safely and reliably communicate with each other over an arbitrarily long time period
  • Forward Error Correction coding, safety, and multi-path segmentation
  • Defines how nodes use the network transport to distribute routing information

Layer 5: Application Layer

  • Globally-Scalable mesh mapping and routing using HashMatchLogMap (HMLM)
  • Distributed content addressable storage (CAS)
  • Distributed self-certified naming using GetNodeInfoFromLocatorHash
  • Higher Layers: All the normal protocols you would expect to run on networks

IsoGrid vs. TCP/IP Comparison

Layer              TCP/IP                                                              IsoGrid
Application        SSH, FTP, HTTP, etc.; DNS; BGP                                      CAS; GetNodeInfoFromLocatorHash; HMLM
Session/Transport  TCP, UDP, etc.                                                      EccFlow; IsoStream
Network            IP                                                                  µPkt
Link               Point-to-point, Ethernet, subnet broadcast, token ring, ATM, etc.   Point-to-point only: isochronous USB, ATM, point-to-point Ethernet
Physical           Any                                                                 Point-to-point only: communication mediums that have collisions aren't well suited to the IsoGrid

How it works

The IsoGrid is a mesh network that supports routing of one-way isochronous streams (IsoStreams) and small packets (µPkts). In order to provide isochronous streams, the IsoGrid runs with a synchronized frequency, very similar to the way an electrical grid runs on Utility Frequency (but at a much higher frequency). The source provides a series of route instructions to be used at each hop (Source Routing). Each switch along the way uses its route instruction to establish the IsoStream connection. The µPkt that starts a connection defines the length of the IsoStream and how many credits to send per word. The IsoGrid network and transport layers are optimized to support the IsoGrid session layer (EccFlow), which fragments the data, applies a forward error correction code, and sends the data over many paths across the network.
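
A hypothetical sketch of that source-routing idea is below. The actual µPkt/µRoute wire formats are part of the (currently private) IsoGrid spec, so the types and names here are stand-ins: the connection-opening µPkt carries one route instruction per hop plus the stream length and credits per word, and each switch consumes the head instruction and forwards the remainder.

    using System;
    using System.Collections.Generic;

    // Hypothetical sketch of source-routed IsoStream setup, NOT the actual spec'd format.
    // The connection request carries one route instruction per hop, the stream length,
    // and how many credits to send per word.
    record ConnectionRequest(Queue<int> RouteInstructions, long StreamLengthWords, int CreditsPerWord);

    class Switch
    {
        private readonly Dictionary<int, Switch?> outboundLinks = new();   // link id -> neighbor (null = local endpoint)

        public void AddLink(int linkId, Switch? neighbor) => outboundLinks[linkId] = neighbor;

        public void Open(ConnectionRequest request)
        {
            if (request.RouteInstructions.Count == 0)
            {
                Console.WriteLine("Reached destination; IsoStream slot reserved locally.");
                return;
            }

            int linkId = request.RouteInstructions.Dequeue();   // this hop's route instruction
            // ... reserve an isochronous slot on linkId for StreamLengthWords words,
            //     and account for CreditsPerWord from the upstream neighbor ...
            outboundLinks[linkId]?.Open(request);               // hand the rest of the route downstream
        }
    }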

Credits and Micro-Payments

In order to avoid a tragedy of the commons, the IsoGrid allows neighbor nodes to exchange credits to pay for the data sent or processed. This allows payments to be made among neighbors instead of requiring a centralized payment processor.

In the IsoGrid model, each node owns its outbound links (but not really its inbound links), its CPU hardware, its storage, etc. Each node SHOULD charge credits for use of those resources. Each pair of neighbor nodes SHOULD agree on the settlement instrument that will represent the value of a credit between themselves, and payment SHOULD be by simple agreement (there is no required third party).

Any settlement instrument is possible, but here are some examples:

  • electricity
  • Ethereum
  • cash
  • check
  • ACH transfer
  • propane
  • gasoline
  • gold
  • water

Nodes MAY have different costs for different outbound links.
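
As a purely illustrative sketch (the actual IsoGrid credit accounting is defined by the private spec), per-neighbor credit bookkeeping consistent with the rules above might look like this: per-outbound-link prices that MAY differ, a settlement instrument agreed per neighbor pair, and payment settled by simple agreement.

    using System;
    using System.Collections.Generic;

    // Hypothetical illustration of per-neighbor credit accounting, not the spec'd mechanism.
    class NeighborAccount
    {
        public string SettlementInstrument { get; }     // e.g. "electricity", "gold", "ACH transfer"
        public long CreditBalance { get; private set; } // credits this neighbor owes us (negative: we owe them)

        private readonly Dictionary<int, long> creditsPerWordByLink = new();

        public NeighborAccount(string settlementInstrument) => SettlementInstrument = settlementInstrument;

        public void SetLinkPrice(int outboundLinkId, long creditsPerWord) =>
            creditsPerWordByLink[outboundLinkId] = creditsPerWord;            // prices MAY differ per outbound link

        public void ChargeForSwitching(int outboundLinkId, long wordsSwitched) =>
            CreditBalance += creditsPerWordByLink[outboundLinkId] * wordsSwitched;

        public void RecordPayment(long credits) => CreditBalance -= credits;  // settled by simple agreement
    }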

IsoGrid Secondary Limitations

No network is without limits. The design of a protocol standard necessitates making tradeoffs in order to meet the requirements. Here are some of the known limitations due to the design of the IsoGrid Protocol:

  • Links that use a shared physical medium (e.g., those with collisions) aren't well suited to the network (too much latency).
    • The IsoGrid is best suited to running on top of physical layers that have exclusive access to the underlying communication medium, like fiber, copper, and point-to-point wireless such as 60 GHz
  • Highly mobile nodes that want to offer isochronous data transit services might have a harder time competing against stationary nodes (because route tracking across transient links might not be scalable)
  • A single IsoStream can use no more than the fastest available slot along a route
    • However, an endpoint can create multiple connections such that nearly 100% utilization with EccFlow is possible
  • Low-rate connections have higher latency
  • Half-duplex links are not supported (except with unacceptably high latency)
  • Streams through nodes that move will have non-trivial buffering/timing requirements
    • The faster it moves, the more demanding the requirements
  • New links (for example, link tunnels) could require a (longer-latency) three-way handshake (and a credit cost) before remote nodes can effectively use them

Discussion is at Hacker News!