I’m interested in modelling pool behavior to understand whether the actual found-blocks-per-day is statistically accurate compared to an expected baseline. For instance, this might give an indication that a pool is “hiding” blocks found if the expected number of found blocks diverges significantly from the actual blocks found daily by the pool.
I’ve created a simplistic test-bed that allows me to simulate running a pool, with randomly generated blocks found by the pool. The test-bed code below is written in T-SQL; you can get a free copy of Microsoft SQL Server Developer Edition from Microsoft that will allow you to easily run the code yourself.
Is this methodology sound? Am I missing something?
Here’s the code to create the required table and 10,000 “days” of sample data, with one block-per-minute:
IF OBJECT_ID(N'dbo.Runs', N'U') IS NOT NULL DROP TABLE dbo.Runs; GO CREATE TABLE dbo.Runs ( RunNum int NOT NULL --this is unique for each day (1 to 10,000) , RunSeq int NOT NULL --this is each block in the day (i.e. 1 to 1,440) , Win tinyint NOT NULL --This will be 1 if our pool found the block DEFAULT (0) --otherwise it will be 0 ); DECLARE @NetworkHash bigint; DECLARE @PoolHash bigint; DECLARE @Prob DECIMAL(38, 20); SET @NetworkHash = 301.2 * POWER(10, 6); --plug in the actual network hash rate in Million hashes per second SET @PoolHash = 1.5 * POWER(10,6); --plug in the actual pool hash rate in Million hashes per second SET @Prob = CONVERT(decimal(38, 9), @PoolHash) / CONVERT(decimal(38, 9), @NetworkHash); /* display the various parameters */ SELECT NetworkHash = @NetworkHash , PoolHash = @PoolHash , PoolProbability = @Prob , AvgBlocksPerDayFoundByPool = 1440 * @Prob; /* Create random data for each block found across 10,000 days, with 1,440 blocks per day */ ;WITH nums AS ( SELECT v.Num FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(Num) ) , Runs AS ( SELECT TOP(10000) n = n1.Num + (n2.Num * 10) + (n3.Num * 100) + (n4.Num * 1000) + (n5.Num * 10000) FROM nums n1 CROSS JOIN nums n2 CROSS JOIN nums n3 CROSS JOIN nums n4 CROSS JOIN nums n5 ) , Seq AS ( SELECT TOP(1440) n = n1.Num + (n2.Num * 10) + (n3.Num * 100) + (n4.Num * 1000) + (n5.Num * 10000) FROM nums n1 CROSS JOIN nums n2 CROSS JOIN nums n3 CROSS JOIN nums n4 CROSS JOIN nums n5 ) INSERT INTO dbo.Runs (RunNum, RunSeq, Win) SELECT Runs.n , Seq.n , CASE WHEN (CONVERT(decimal(38, 20), CONVERT(bigint, CRYPT_GEN_RANDOM(6), 0)) / POWER(CONVERT(bigint, 2), 48)) <= @Prob THEN 1 ELSE 0 END FROM Runs CROSS APPLY Seq;
The data above is setup to replicate a blockchain where blocks are found once-per-minute, 1440-per-day.
Here are several queries I ran to analyze the data:
/* Sanity check */ ;WITH t AS ( SELECT c = COUNT(1) , r = SUM(Runs.Win) , x = SUM(Runs.Win) / CONVERT(decimal(38, 20), COUNT(1)) FROM dbo.Runs GROUP BY Runs.RunNum ) SELECT TotalRuns = COUNT(1) , BlockFoundRate = AVG(x) FROM t /* Daily Break-down of Wins */ SELECT Runs.RunNum , TotalNetworkBlocks = COUNT(1) , TotalPoolBlocks = SUM(Runs.Win) , WinRatio = SUM(Runs.Win) / CONVERT(decimal(38, 20), COUNT(1)) FROM dbo.Runs GROUP BY Runs.RunNum ORDER BY Runs.RunNum; /* Zero-block days */ SELECT r.RunNum , SUM(r.Win) FROM dbo.Runs r GROUP BY r.RunNum HAVING SUM(R.Win) = 0 ORDER BY r.RunNum /* Bell-curve distribution. */ ;WITH s AS ( SELECT TotalWins = SUM(Runs.Win) FROM dbo.Runs GROUP BY Runs.RunNum ) SELECT s.TotalWins , COUNT(1) FROM s GROUP BY s.TotalWins