I have a use case involving a bipartite graph. Call one type of node “Type A” and the other “Type B.” When a Type A node is added, it forms edges with Type B nodes based on some criterion (typically: how many edges does each Type B node already have?). Type A nodes are added constantly (say, with every HTTP request). The criterion has to be very simple, since the latency of the HTTP request cannot be large.
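To make the cheap criterion concrete, here is a minimal sketch (my assumption of what it might look like, not anything specified above): attach the new Type A node to the least-loaded Type B nodes that still have spare capacity under a per-node edge cap. The function name and parameters are hypothetical.

```python
def pick_type_b(edge_counts, cap, k=1):
    """Cheap attachment rule for a new Type A node.

    edge_counts: {type_b_id: current_edge_count}
    Returns up to k Type B ids that are under the cap, least-loaded first.
    """
    eligible = [b for b, n in edge_counts.items() if n < cap]
    eligible.sort(key=lambda b: edge_counts[b])  # fewest edges first
    return eligible[:k]

# Example: B1 is at the cap and is skipped; B2 is the least loaded.
print(pick_type_b({"B1": 5, "B2": 1, "B3": 3}, cap=5, k=2))  # ['B2', 'B3']
```

Since there “won't be too many” Type B nodes, the O(|B| log |B|) sort is cheap enough to run inline on every request.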
After some time interval X, a background process takes a snapshot of the graph and “reshuffles” the edges according to a better criterion, which takes much longer to run. Once it finishes reshuffling, it needs to push the new graph state to the processes serving HTTP requests.
All of this happens in a distributed environment: several boxes are serving HTTP requests.
- How can I store the distributed graph?
One option is to store the nodes and edges in Redis. When a request comes in with a Type A node, you first fetch all the Type B nodes (there won’t be too many), form the edges, and write them back to Redis. The problem is the constraint on how many edges a Type B node can have: if a Type B node has 4 edges, the limit is 5, and two concurrent requests each try to add an edge, one of them has to fail. The only ways I can see to handle this are
(1) Grabbing locks in Redis or
(2) Sharding the graph so that a subset of Type B nodes are always fetched by one box (which is suboptimal).
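A third possibility, offered here as a suggestion rather than something from the question, is to push the check-and-add into Redis itself as a Lua script. Redis executes a script atomically, so the cap check and the edge insert cannot interleave between two concurrent requests, and no client-side lock or sharding is needed. The key layout (`edges:<b_id>` as a set of Type A ids) and the cap value are my assumptions.

```python
# Lua script Redis runs atomically: add the edge only if the Type B
# node is still under its edge cap. Returns 1 on success, 0 if full.
ADD_EDGE_IF_UNDER_CAP = """
local count = redis.call('SCARD', KEYS[1])   -- current edge count of the Type B node
if count < tonumber(ARGV[2]) then
    redis.call('SADD', KEYS[1], ARGV[1])     -- record the Type A id in the edge set
    return 1
end
return 0
"""

# With redis-py this would be invoked as (not executed here):
#   r.eval(ADD_EDGE_IF_UNDER_CAP, 1, f"edges:{b_id}", a_id, cap)

# Pure-Python model of the same atomic step, for illustration:
def add_edge_if_under_cap(edge_sets, b_id, a_id, cap):
    """edge_sets: {b_id: set of a_ids}. Mirrors the Lua script's logic."""
    edges = edge_sets.setdefault(b_id, set())
    if len(edges) < cap:
        edges.add(a_id)
        return 1
    return 0

edge_sets = {"B1": {"A1", "A2", "A3", "A4"}}        # B1 has 4 edges, cap is 5
print(add_edge_if_under_cap(edge_sets, "B1", "A5", cap=5))  # 1: the 5th edge fits
print(add_edge_if_under_cap(edge_sets, "B1", "A6", cap=5))  # 0: B1 is now full
```

With this, the losing request gets a 0 back immediately and can retry against another Type B node instead of blocking on a lock.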
- How do I update the distributed graph after the optimal cycle?
The optimization cycle takes a snapshot of the graph and reshuffles its edges. This may take some time, and during that time the HTTP requests keep adding nodes to the live graph in Redis. What’s the best way to “merge” these two graphs?
One option is that, after the optimization cycle runs, it
(1) sets a flag that temporarily stops HTTP requests from adding new nodes (those requests will most likely block, or reply to the caller asynchronously later via Kafka/RabbitMQ),
(2) adds any nodes that arrived while it was running (if the cycle took 30 seconds and a new node was added every second, it needs to account for 30 nodes that weren’t in its snapshot),
(3) replaces the graph edges in Redis, and
(4) unsets the flag so HTTP requests are processed again.
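The four steps above can be sketched as a small in-process model, with plain Python state standing in for Redis. All names here (the pause flag, the delta list, the `place_node` hook) are hypothetical illustrations, not an actual implementation:

```python
import threading

class GraphStore:
    """Toy model of the flag-based swap: steps (1)-(4) from above."""

    def __init__(self):
        self.lock = threading.Lock()
        self.paused = False              # the flag from steps (1) and (4)
        self.edges = {}                  # b_id -> set of a_ids
        self.nodes_since_snapshot = []   # Type A nodes added mid-cycle

    def try_add_node(self, a_id):
        """Request path: refuse writes while the swap is in progress."""
        with self.lock:
            if self.paused:
                return False             # caller retries or goes async via a queue
            self.nodes_since_snapshot.append(a_id)
            return True

    def swap_in_optimized(self, optimized_edges, place_node):
        with self.lock:
            self.paused = True                    # (1) stop new writes
        for a_id in self.nodes_since_snapshot:    # (2) fold in the delta nodes
            place_node(optimized_edges, a_id)
        with self.lock:
            self.edges = optimized_edges          # (3) replace the edges
            self.nodes_since_snapshot = []
            self.paused = False                   # (4) resume request processing

store = GraphStore()
store.try_add_node("A1")
store.swap_in_optimized({"B1": set()}, lambda g, a: g["B1"].add(a))
print(store.edges)  # {'B1': {'A1'}}
```

The pause window only covers folding in the delta and swapping the edges, not the long optimization itself, but as noted below even that window can push request latency too high.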
However, this process drives HTTP request latency too high while requests are blocked from processing, which is not good.