Software Dependencies
Infrastructure Requirements to Run the Tests
- An account at Digital Ocean (DO), with a high droplet limit (>202)
- The machine to orchestrate the tests should have the following installed:
  - A clone of the testnet repository
    - This repository contains all the scripts mentioned in the remainder of this section
  - Digital Ocean CLI
  - Terraform CLI
  - Ansible CLI
Requirements for Result Extraction
- Matlab or Octave
- Prometheus server installed
- blockstore DB of one of the full nodes in the testnet
- Prometheus DB
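Before starting, it can help to confirm that the tools listed above are actually reachable from the shell. The following is a minimal sketch, not part of the testnet repository. The executable names (`doctl`, `terraform`, `ansible`, `octave`, `prometheus`) are assumptions about how these tools are usually installed, and Matlab is not covered.

```python
import shutil

# Assumed mapping from the tools listed above to their usual executable
# names; adjust it to match your environment.
REQUIRED_TOOLS = {
    "Digital Ocean CLI": "doctl",
    "Terraform CLI": "terraform",
    "Ansible CLI": "ansible",
    "Octave": "octave",
    "Prometheus server": "prometheus",
}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found on PATH")
```

The check only looks for executables on PATH; it does not verify versions, credentials, or the droplet limit of the Digital Ocean account.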
200 Node Testnet
Running the test
This section explains how the tests were carried out for reproducibility purposes.
- [If you haven’t done it before] Follow steps 1-4 of the README.md at the top of the testnet repository to configure Terraform and doctl.
- Copy file testnets/testnet200.toml onto testnet.toml (do NOT commit this change)
- Set the variable VERSION_TAG in the Makefile to the git hash that is to be tested.
  - If you are running the base test, which implies a homogeneous network (all nodes are running the same version), then make sure the makefile variable VERSION2_WEIGHT is set to 0
  - If you are running a mixed network, set the variable VERSION_TAG2 to the other version you want deployed in the network. Then, adjust the weight variables VERSION_WEIGHT and VERSION2_WEIGHT to configure the desired proportion of nodes running each of the two configured versions.
- Follow steps 5-10 of the README.md to configure and start the 200 node testnet
  - WARNING: Do NOT forget to run make terraform-destroy as soon as you are done with the tests (see step 9)
- As a sanity check, connect to the Prometheus node’s web interface and check the graph for the COMETBFT_CONSENSUS_HEIGHT metric. All nodes should be increasing their heights.
- You now need to start the load runner that will produce transaction load
  - If you don’t know the saturation load of the version you are testing, you need to discover it.
    - ssh into the testnet-load-runner, then copy script script/200-node-loadscript.sh and run it from the load runner node.
    - Before running it, you need to edit the script to provide the IP address of a full node. This node will receive all transactions from the load runner node.
    - This script will take about 40 mins to run.
    - It runs 90-second experiments in a loop with different loads.
  - If you already know the saturation load, you can simply run the test (several times) for 90 seconds with a load somewhat below saturation:
    - set makefile variables ROTATE_CONNECTIONS and ROTATE_TX_RATE to values that will produce the desired transaction load.
    - set ROTATE_TOTAL_TIME to 90 (seconds).
    - run “make runload” and wait for it to complete. You may want to run this several times so the data from different runs can be compared.
- Run make retrieve-data to gather all relevant data from the testnet into the orchestrating machine
  - Alternatively, you may want to run make retrieve-prometheus-data and make retrieve-blockstore separately. The end result will be the same.
    - make retrieve-blockstore accepts the following values in makefile variable RETRIEVE_TARGET_HOST
      - any: (which is the default) picks up a full node and retrieves the blockstore from that node only.
      - all: retrieves the blockstore from all full nodes; this is extremely slow, and consumes plenty of bandwidth, so use it with care.
      - the name of a particular full node (e.g., validator01): retrieves the blockstore from that node only.
- Verify that the data was collected without errors
  - at least one blockstore DB for a CometBFT validator
  - the Prometheus database from the Prometheus node
  - for extra care, you can run zip -T on the prometheus.zip file and (one of) the blockstore.db.zip file(s)
- Run make terraform-destroy
  - Don’t forget to type yes! Otherwise you’re in trouble.
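The archive check in the verification step above can also be scripted, which is handy when several blockstores were retrieved. This is a minimal sketch using Python's standard `zipfile` module as a stand-in for `zip -T`; it is not part of the testnet repository, and the archive paths are whatever you pass on the command line.

```python
import sys
import zipfile


def zip_is_intact(path):
    """Return True if every member of the archive passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # The file is not a readable zip archive at all.
        return False


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python check_zips.py prometheus.zip blockstore.db.zip
    for name in sys.argv[1:]:
        print(name, "OK" if zip_is_intact(name) else "CORRUPT")
```

Running this before `make terraform-destroy` matters because the data cannot be retrieved again once the droplets are gone.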
Result Extraction
The method for extracting the results described here is highly manual (and exploratory) at this stage. The CometBFT team should improve it at every iteration to increase the amount of automation.
Steps
1. Unzip the blockstore into a directory
2. Extract the latency report and the raw latencies for all the experiments. Run these commands from the directory containing the blockstore
3. File report.txt contains an unordered list of experiments with varying concurrent connections and transaction rate
   - If you are looking for the saturation point
     - Create files report01.txt, report02.txt, report04.txt and, for each experiment in file report.txt, copy its related lines to the filename that matches the number of connections, for example
     - Sort the experiments in report01.txt in ascending tx rate order. Likewise for report02.txt and report04.txt.
   - Otherwise just keep report.txt, and skip step 4.
4. Generate file report_tabbed.txt by showing the contents of report01.txt, report02.txt, report04.txt side by side
   - This effectively creates a table where rows are a particular tx rate and columns are a particular number of websocket connections.
5. Extract the raw latencies from file raw.csv using the following bash loop. This creates a .csv file and a .dat file per experiment. The format of the .dat files is amenable to loading them as matrices in Octave.
   - Adapt the values of the for loop variables according to the experiments that you ran (check report.txt).
   - Adapt report*.txt to the files you produced in step 3.
6. Enter Octave
7. Load all .dat files generated in step 5 into matrices using this Octave code snippet
8. Set variable release to the current release undergoing QA
9. Generate a plot with all (or some) experiments, where the x axis is the experiment time, and the y axis is the latency of transactions. The following snippet plots all experiments.
10. Consider adjusting the axis, in case you want to compare your results to the baseline, for instance
11. Use Octave’s GUI menu to save the plot (e.g. as .png)
12. Repeat steps 9 and 10 to obtain as many plots as deemed necessary.
13. To generate a latency vs throughput plot, using the raw CSV file generated in step 2, follow the instructions for the latency_throughput.py script. This plot is useful to visualize the saturation point.
14. Alternatively, follow the instructions for the latency_plotter.py script. This script generates a series of plots per experiment and configuration that may help with visualizing Latency vs Throughput variation.
Extracting Prometheus Metrics
- Stop the Prometheus server if it is running as a service (e.g. a systemd unit).
- Unzip the Prometheus database retrieved from the testnet, and move it to replace the local Prometheus database.
- Start the Prometheus server and make sure no error logs appear at start up.
- Identify the time window you want to plot in your graphs.
- Execute the prometheus_plotter.py script for the time window.
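To sanity-check a time window before plotting, you can query the local Prometheus server directly through its HTTP API. The sketch below only builds a `query_range` URL for a chosen window; it does not reproduce prometheus_plotter.py, whose arguments are not described here. The base URL assumes a local server on Prometheus's default port 9090, and the 15s step is an arbitrary choice.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def query_range_url(metric, start, end, step="15s",
                    base="http://localhost:9090"):
    """Build a Prometheus query_range URL for the window [start, end].

    start and end are timezone-aware datetimes; they are sent as Unix
    timestamps, which the Prometheus HTTP API accepts.
    """
    params = {
        "query": metric,
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": step,
    }
    return f"{base}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Example window; replace with the window of your own experiment.
    begin = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
    finish = datetime(2023, 1, 1, 12, 5, tzinfo=timezone.utc)
    print(query_range_url("tendermint_consensus_height", begin, finish))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns JSON; an empty result usually means the window does not overlap the data in the restored database.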
Rotating Node Testnet
Running the test
This section explains how the tests were carried out for reproducibility purposes.
- [If you haven’t done it before] Follow steps 1-4 of the README.md at the top of the testnet repository to configure Terraform and doctl.
- Copy file testnet_rotating.toml onto testnet.toml (do NOT commit this change)
- Set variable VERSION_TAG to the git hash that is to be tested.
- Run make terraform-apply EPHEMERAL_SIZE=25
  - WARNING: Do NOT forget to run make terraform-destroy as soon as you are done with the tests
- Follow steps 6-10 of the README.md to configure and start the “stable” part of the rotating node testnet
- As a sanity check, connect to the Prometheus node’s web interface and check the graph for the tendermint_consensus_height metric. All nodes should be increasing their heights.
- On a different shell,
  - run make runload ROTATE_CONNECTIONS=X ROTATE_TX_RATE=Y
  - X and Y should reflect a load below the saturation point (see, e.g., this paragraph for further info)
- Run make rotate to start the script that creates the ephemeral nodes, and kills them when they are caught up.
  - WARNING: If you run this command from your laptop, the laptop needs to be up and connected for the full length of the experiment.
- When the height of the chain reaches 3000, stop the make runload script
- When the rotate script has made two iterations (i.e., all ephemeral nodes have caught up twice) after height 3000 was reached, stop make rotate
- Run make retrieve-data to gather all relevant data from the testnet into the orchestrating machine
- Verify that the data was collected without errors
  - at least one blockstore DB for a CometBFT validator
  - the Prometheus database from the Prometheus node
  - for extra care, you can run zip -T on the prometheus.zip file and (one of) the blockstore.db.zip file(s)
- Run make terraform-destroy
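The sanity check in the steps above (all nodes increasing their heights) is normally done by eye on the Prometheus graph. If you prefer a scripted check, the idea can be sketched as a comparison of two height samples taken some time apart. How the samples are collected is left open; the function name and the node names in the example are hypothetical.

```python
def stalled_nodes(earlier, later):
    """Return the nodes whose height did not increase between two samples.

    Both arguments map node name -> consensus height. A node that is
    missing from the later sample is reported as stalled too.
    """
    return sorted(
        node
        for node, height in earlier.items()
        if later.get(node, height) <= height
    )


if __name__ == "__main__":
    first = {"validator01": 120, "validator02": 120}
    second = {"validator01": 131, "validator02": 120}
    print(stalled_nodes(first, second))
```

In the rotating testnet this check is only meaningful for the stable nodes, since ephemeral nodes are created and killed on purpose.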
Result Extraction
In order to obtain a latency plot, follow the instructions above for the 200 node experiment, but:
- The results.txt file contains only one experiment
- Therefore, no need for any for loops