Conversation
```
# tmp folder
/tmp
```
What's the use of adding /tmp?
the logs are written to /tmp, and i would like to save the logs for debugging but not commit them to github
```go
TimeUpdateFrequency: 100 * time.Millisecond,
TimeUpdateAmount:    500 * time.Millisecond,
CrashInterval:       800 * time.Millisecond,
LogDirectory:        "tmp",
```
I think we need to create a temporary folder and if the test succeeds, delete the logs.
This way you don't need to delete the directory at the beginning of the test
added deleting the tmp directory if the test fails, but I still think we should check if the directory exists at the beginning of the test. This way if the test fails, and we want to re-run we don't append to an already existing log file
```go
func (n *Network) StartInstances() {
	panic("Call Run() Instead")
}
```
This method isn't used. Why is it needed?
because BasicNetwork is embedded in Network i don't want the caller accidentally calling StartInstances thinking its a valid method
testutil/random_network/mempool.go (outdated)

```go
func (m *Mempool) isTxInChain(txID txID, parentDigest simplex.Digest) bool {
	block, exists := m.verifiedButNotAcceptedTXs[parentDigest]
```
I'm not sure this works for a case where we have a leader failover that creates a sibling block.
Suppose block a is notarized, then a block b is built on top of it, broadcast, and verified by a node, but then the majority notarize the empty block.
In the next round, block b' is proposed on top of block a.
We need a way to roll back the verified-but-not-accepted map for the parent a, because both blocks b and b' are built on top of it.
One way of doing it is to move this book-keeping to the blocks themselves.
Specifically, have the block instances maintain pointers to the blocks they are built upon, and make the transaction list inside a block a map; when we encode the bytes of the block, we do it in some order that is fixed at the time we build the block.
> We need a way to rollback the verified but not accepted map for the parent a because both blocks b and b' are built on top of it.
Why do we need this? We aren't checking whether the block was verified before accepting it anyway. We can do so if we want.
Secondly, this should still succeed verification in your example. If we want to make the mempool more rigorous, we could check whether any two verified blocks have the same parent and delete the earlier block before verifying the later one. In your example, when block b initially gets verified we do nothing. Then if block b' gets verified, we delete block b from the verification map before marking b' as verified. Then we can adjust AcceptBlock to ensure the block has been verified beforehand.
Hmm, actually I think you are right; I've been trying to solve a bug related to this. I don't think we can have the blocks point to other blocks, however, since we need to be able to serialize/deserialize blocks and that would cause some recursive headaches. Maybe it's best to have the mempool keep a lookup table of built blocks and just fetch from this table using the parent digest when needed.
> why do we need this? we aren't checking if the block was verified before accepting it anyways. We can do so if we want.
If you have verified a block but not accepted it, and instead accepted a sibling, you need to put back into the mempool all the transactions you removed for the verified block (unless a transaction also ended up in the sibling block that was accepted).
Can we rebase on top of main?
76e550a to 1e1fdc1
This is still somewhat non-deterministic: when I create a digest of the crashed and recovered node IDs in the schedule, I get two possible outcomes. Can we make this fully deterministic? 🙏 It looks like we're very close to having it deterministic.
```go
func (m *Mempool) moveTxsToUnaccepted(block *Block) {
```
Why do we need to do this? We only delete from unacceptedTxs in AcceptBlock, so if a transaction of a child or grandchild block is in unacceptedTxs, it won't be removed from there anyway.
```go
func TestNetworkSimpleFuzz(t *testing.T) {
	config := random_network.DefaultFuzzConfig()
	config.RandomSeed = 1770220909588
```
can we initialize a random seed here?
```go
node.mempool.lock.Lock()
n.logger.Debug("Not all txs accepted yet by node", zap.Stringer("nodeID", node.E.ID), zap.Int("unaccepted txs in mempool", len(node.mempool.unacceptedTxs)))
node.mempool.lock.Unlock()
allAccepted = false
```
```go
	return minHeightNodeID
}

func (n *Network) recoverToHeight(height uint64) {
```
The way below is more deterministic (written by me, not thanks to CC):
```go
func (n *Network) recoverToHeight(height uint64) {
	// Decide which crashed nodes to recover and in what order.
	// This keeps all randomness calls deterministic (fixed count per invocation),
	// independent of the non-deterministic AdvanceTime loop below.
	var nodesToRecover []int
	for i, node := range n.nodes {
		if node.isCrashed.Load() {
			nodesToRecover = append(nodesToRecover, i)
		}
	}
	n.randomness.Shuffle(len(nodesToRecover), func(i, j int) {
		nodesToRecover[i], nodesToRecover[j] = nodesToRecover[j], nodesToRecover[i]
	})

	// Recover nodes one at a time in the chosen order, advancing time until
	// all nodes have caught up before recovering the next one.
	for _, idx := range nodesToRecover {
		n.logger.Debug("Recovering node", zap.Stringer("nodeID", n.nodes[idx].E.ID))
		n.startNode(idx)
		for n.nodes[idx].storage.NumBlocks() < height {
			n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
				zap.Uint64("min height", n.getMinHeight()),
				zap.Uint64("max height", n.getMaxHeight()),
				zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
				zap.Uint64("target height", height),
			)
			for _, node := range n.nodes {
				n.lock.Lock()
				node.AdvanceTime(n.config.AdvanceTimeTickAmount)
				n.lock.Unlock()
			}
		}
	}

	// Advance time until all remaining nodes reach the target height.
	for n.getMinHeight() < height {
		n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
			zap.Uint64("min height", n.getMinHeight()),
			zap.Uint64("max height", n.getMaxHeight()),
			zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
			zap.Uint64("target height", height),
		)
		n.lock.Lock()
		n.BasicInMemoryNetwork.AdvanceTime(n.config.AdvanceTimeTickAmount)
		n.lock.Unlock()
	}
}
```
Will continue the review tomorrow.
```
@@ -0,0 +1,15 @@
package simplex_test
```
can we do something like this and not encode a constant random seed?
```go
func TestNetworkSimpleFuzz(t *testing.T) {
	for i := 0; i < 10; i++ {
		t.Run("", func(t *testing.T) {
			config := random_network.DefaultFuzzConfig()
			config.RandomSeed = time.Now().UnixMilli()
			network := random_network.NewNetwork(config, t)
			network.Run()
			network.PrintStatus()
		})
	}
}
```
```
.DS_Store
.idea/

# tmp folder
```
I'm confused about this /tmp addition.
A .gitignore only applies to files in its folder and subfolders, but the tests use /tmp, so adding this to the .gitignore will have no effect. What am I missing?
Oh I see, we use "tmp", not "/tmp".
```go
MaxNodes int // Default is 10.

// The minimum and maximum number of transactions to be issued at a block. Default is between 5 and 20.
MinTxsPerIssue int
```
What is the benefit of having a random number of transactions issued? Doesn't this make the test even more non-deterministic? Why not just use 1 (issue a single transaction) all the time?
```go
type asn1TX struct {
	ID                     []byte
	ShouldFailVerification bool
```
Why do we also serialize ShouldFailVerification? In order to process it uniformly across all nodes?
```go
	return false
}

func (b *Block) ComputeAndSetDigest() {
```
Why can't we just use `b.digest = sha256.Sum256(tbBytes)`? I don't see how the `bb bytes.Buffer` helps here.
```go
RandomSeed int64

// Chance that a node will be randomly crashed. Default is .1 (10%).
NodeCrashPercentage float64
```
This and the field below are probabilities, not percentages. Can we call them that?
```go
	return &TX{ID: idArr, shouldFailVerification: aTX.ShouldFailVerification}
}

func CreateNewTX() *TX {
```
Does the randomness here have any influence on the outcome? It sure has influence on the block hashes.
Should we perhaps inject randomness from our PRG?
```go
defer m.lock.Unlock()

txs := make([]*TX, 0, maxTxs)
for _, tx := range m.unacceptedTxs {
```
Can we sort the transactions after we pack them, for reproducibility?
```go
logPath := filepath.Join(logDir, filename)

// Open file for appending (create if doesn't exist)
file, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
```
This leaks the file descriptor. Can we add a reference to a function in TestLogger that closes it upon the test ending?
```go
}

// go through any blocks that build off of this one and move their txs
func (m *Mempool) purgeChildren(block *Block) {
```
aren't we missing a call to m.moveTxsToUnaccepted(block) ?
Because what about the transactions of the sibling block itself?
```go
var asn1TXs []asn1TX
for _, tx := range b.txs {
	asn1TXs = append(asn1TXs, asn1TX{ID: tx.ID[:]})
```
shouldn't we also encode the ShouldFailVerification field?
```go
m.lock.Lock()
defer m.lock.Unlock()

// Ensure the block has not already been verified or accepted
```
I'm wondering if we should fail the test if we encounter this and the below... not just return an error. Thoughts?
```go
	return fmt.Errorf("%w: %s", errDoubleBlockVerification, b.digest)
}

// Ensure the parent block is accepted or verified
```
ditto, shouldn't we fail the entire test?
Will make another pass tomorrow, I think that's enough comments for now.
```go
	}
}

func GenerateNodeIDFromRand(r *rand.Rand) simplex.NodeID {
```
I don't understand why we would want random node IDs. Why can't we use ascending numbers instead? That would be much more user-friendly for us to debug.
```go
	return simplex.NodeID(b)
}

var emptyBlacklist = simplex.Blacklist{
```
should this really be here if we only use this in the mempool test?
```go
// Stdout core only if explicitly enabled
if writeStdout {
	stdoutEncoder := zapcore.NewConsoleEncoder(config)
	if strings.ToLower(os.Getenv("LOG_LEVEL")) == "info" {
```
Can we extract this LOG_LEVEL and "info" into constants?
```go
randomness: r,
logger:     logger,
t:          t,
lock:       sync.Mutex{},
```
No need to initialize the mutex; its zero value is ready to use.
```go
	return txs
}

func (n *Network) waitForTxAcceptance(txs []*TX) {
```
Correct me if I'm wrong, but doesn't the inner loop here perform a CPU busy-wait? Shouldn't we add some delay to give the nodes' goroutines some time to act?
```go
	return n.BasicNode.HandleMessage(&msgCopy, from)
}

// copyMessage creates a copy of the message and its relevant fields to avoid mutating shared state in the in-memory network
```
so basically we just need to assign our own mempool, is that it?
```go
switch {
case msgCopy.BlockMessage != nil:
	block := msgCopy.BlockMessage.Block.(*Block)
```
Can we just have a function:

```go
func (n *Node) copyBlock(b simplex.Block) simplex.Block {
	blockCopy := *b.(*Block)
	blockCopy.mempool = n.mempool
	return &blockCopy
}
```

and just call it everywhere?
I think that's all the comments I have for now.
No description provided.