LAVA supports running a single test across multiple devices (of any type), combining those devices into a group. Devices within this MultiNode group can communicate with each other using the MultiNode API.
The test definitions used in MultiNode tests typically do not have to differ much from single-node tests, unless the tests need to support communication between devices in the same group. In fact, the recommended way to develop MultiNode tests is to start simple and build up complexity one step at a time. That’s what the examples here will show.
Note
When viewing MultiNode log files, the original YAML submitted to start the job is available via the MultiNode Definition link. Internally, LAVA parses and splits up that MultiNode definition into multiple sub-definitions, one per node in the test. Each node will then see a separate logical test job (and therefore a separate log file) based on these sub-definitions. They can be viewed via the Definition link. It is unlikely to be useful to submit the definition of one node of a MultiNode job as a separate job, due to links between the jobs.
Our first example is the simplest possible MultiNode test job: the same job runs on two devices of the same type, without using any of the synchronisation calls.
Starting with an already-working simple single-device test job, the first changes to make are in device selection. Remove the device_type declaration in the job; that only works for single devices. Instead, the MultiNode protocol defines the new concept of roles. This example snippet creates a group of two qemu devices, one in the foo role and one in the bar role.
protocols:
  lava-multinode:
    roles:
      foo:
        device_type: qemu
        context:
          arch: amd64
        count: 1
      bar:
        device_type: qemu
        context:
          arch: amd64
        count: 1
    timeout:
      minutes: 6
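To make the effect of count concrete, here is a minimal Python sketch. It assumes the roles block has already been loaded into a plain dictionary, for example by a YAML loader. The helper expand_roles is hypothetical and is not part of LAVA. It lists one entry per device the group needs, which is how count multiplies each role.

```python
from typing import Dict, List, Tuple


def expand_roles(roles: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return one (role, device_type) pair per device requested by a roles block.

    Hypothetical helper for illustration only; LAVA's scheduler performs the
    real device allocation.
    """
    nodes: List[Tuple[str, str]] = []
    for role, spec in roles.items():
        count = int(spec["count"])
        if count < 1:
            raise ValueError(f"role {role!r} must request at least one device")
        nodes.extend((role, spec["device_type"]) for _ in range(count))
    return nodes


# The roles block from the snippet above, as a Python dictionary.
roles = {
    "foo": {"device_type": "qemu", "context": {"arch": "amd64"}, "count": 1},
    "bar": {"device_type": "qemu", "context": {"arch": "amd64"}, "count": 1},
}
print(expand_roles(roles))  # [('foo', 'qemu'), ('bar', 'qemu')]
```

A roles block with count: 2 on one role would produce two entries for that role. The group size is therefore the sum of all counts.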
Note
The role is an arbitrary label - you may use whatever descriptive names you like for the different roles in your test, so long as they are unique.
The role names defined here will be used later in the test job to determine which tests are run on which devices, and also inside the test shell definition to determine how the devices communicate with each other. After just these changes, your test job is enough to run a simple MultiNode test in LAVA. It will pick as many devices as each role's count requests (two in this example), then run exactly the same set of actions on each device independently.
The next thing to do is to modify the test job to use the roles that you have defined. This first example runs the same actions on both of the roles. Each action in the test definition should now include the role field and one or more label(s) to match those defined roles.
Here we deploy the same software to the foo and bar machines by specifying each role in a list:
actions:
- deploy:
    role:
    - foo
    - bar
    timeout:
      minutes: 5
    to: tmpfs
    images:
      rootfs:
        image_arg: -drive format=raw,file={rootfs}
        url: http://images.validation.linaro.org/kvm-debian-wheezy.img.gz
        compression: gz
    os: debian
    root_partition: 1
We also use the same boot actions for all the devices:
- boot:
    role:
    - foo
    - bar
    method: qemu
    media: tmpfs
    prompts:
    - "root@debian:"
By default, tests in MultiNode jobs will be run independently. If that is sufficient, the test action is very similar to that for a single-node job:
- test:
    role:
    - foo
    - bar
    timeout:
      minutes: 10
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/multi-node/multinode01.yaml
      name: multinode-basic
    - repository: git://git.linaro.org/qa/test-definitions.git
      from: git
      path: ubuntu/smoke-tests-basic.yaml
      name: smoke-tests
That’s your first MultiNode test job complete. It’s quite simple to follow, but it hasn’t really done much yet. To see this in action, you could try the complete example test job yourself: first-multinode-job.yaml
As well as simply running the same tasks on similar devices, MultiNode can also run different tests on the different devices in the test. To configure this, use the role support to allocate different deploy, boot and test actions to different roles.
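One way to picture how role labels select actions is as a filter over the action list. A node keeps an action only if its own role appears in that action's role list. Below is a hypothetical Python sketch of that idea; actions_for_role is not a LAVA function. It assumes the job's actions list has been loaded into Python dictionaries. LAVA performs the real split when it creates the per-node sub-definitions.

```python
from typing import Any, Dict, List

# Each action is a single-key mapping, e.g. {"deploy": {"role": [...], ...}}.
Action = Dict[str, Dict[str, Any]]


def actions_for_role(actions: List[Action], role: str) -> List[Action]:
    """Return only the actions whose `role` list includes the given role."""
    selected: List[Action] = []
    for action in actions:
        params = next(iter(action.values()))  # the block under deploy/boot/test
        if role in params.get("role", []):
            selected.append(action)
    return selected


# An action list shaped like a server/client job (details trimmed).
actions = [
    {"deploy": {"role": ["server"], "to": "tftp"}},
    {"deploy": {"role": ["client"], "to": "tftp"}},
    {"boot": {"role": ["client", "server"], "method": "u-boot"}},
    {"test": {"role": ["server"]}},
    {"test": {"role": ["client"]}},
]
client_actions = actions_for_role(actions, "client")
print([next(iter(a)) for a in client_actions])  # ['deploy', 'boot', 'test']
```

An action that lists several roles, like the shared boot block here, is kept by every node whose role it names.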
This second example will use two panda devices and one beaglebone-black device. These devices need different files to deploy and different commands to boot, and will most likely take different lengths of time to boot all the way to a login prompt. If you want to run this example test job yourself, you will need at least one beaglebone-black device and at least two panda devices.
The example includes details of how to deploy to devices using U-Boot, but don’t worry about those details. The important elements from a MultiNode perspective are the uses of role here.
This is a simple change from our first example, defining the two roles of server and client:
protocols:
  lava-multinode:
    roles:
      server:
        device_type: beaglebone-black
        count: 1
      client:
        device_type: panda
        count: 2
    timeout:
      minutes: 6
Now we’re using different files in the deployment for each role. To support that, we define two separate deploy action blocks, one for the server machines and one for the client machines.
actions:
- deploy:
    role:
    - server
    timeout:
      minutes: 2
    to: tftp
    kernel:
      url: http://images.validation.linaro.org/functional-test-images/panda/uImage
    ramdisk:
      url: http://images.validation.linaro.org/functional-test-images/common/linaro-image-minimal-initramfs-genericarmv7a.cpio.gz.u-boot
      compression: gz
      header: u-boot
      add-header: u-boot
    os: oe
    dtb:
      url: http://images.validation.linaro.org/functional-test-images/am335x-boneblack.dtb
- deploy:
    role:
    - client
    timeout:
      minutes: 2
    to: tftp
    kernel:
      url: http://snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/1/vmlinuz
    ramdisk:
      url: http://snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/1/initramfs.cpio.gz
      compression: gz
      header: u-boot
      # the bootloader needs a u-boot header on the modified ramdisk
      add-header: u-boot
    modules:
      url: http://snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/1/modules.tar.gz
      compression: gz
    # despite this being a Debian initramfs, it is not a complete Debian rootfs, so use oe compatibility
    os: oe
    dtb:
      url: http://snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/1/dtbs/omap4-panda.dtb
To cover different boot commands, we could use two different boot action blocks. In this case, however, our devices behave in the same way at boot, so we can use a single boot block and list both client and server.
- boot:
    role:
    - client
    - server
    method: u-boot
    commands: ramdisk
    prompts:
    # escape the brackets to ensure that the prompt does not match
    # kernel debug lines which may mention initramfs
    - '\(initramfs\)'
    type: bootz
    timeout:
      minutes: 2
A very common requirement in a MultiNode test is that a device (or devices) within the MultiNode group must wait until another device in the group reaches a particular stage. This can be used to ensure that a device running a server has had time to complete the boot and start the server before the device running the client tries to make a connection to the server, for example. The only way to be sure that the server is ready for client connections is to make every client in the group wait until the server confirms that it is ready.
Continuing with the same panda and beaglebone-black example, let’s look at synchronising devices within a MultiNode group.
Synchronisation is done using the MultiNode API, specifically the lava-send and lava-wait calls.
Continuing our example, we have two different versions of the test action block. In the version for the server role, the machine will do some work (in this case, install and start the Apache web server) and then tell the clients that the server is ready using lava-send:
- test:
    role:
    - server
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: apache-client
          description: "server installation"
          os:
          - debian
          scope:
          - functional
        run:
          steps:
          - apt-get update
          # -y answers the confirmation prompt; the test shell cannot do so interactively
          - apt-get install -y apache2
          - lava-send server_installed
      from: inline
      name: apache-client
      path: inline/apache-client.yaml
Note
It is recommended to use inline definitions for the calls to the synchronisation helpers. This makes it much easier to debug when a synchronisation call times out and will allow the flow of the MultiNode job to be summarised in the UI.
The test definition specified for the client role causes the client devices to wait until the test definition specified for the server role uses lava-send to signal that the server is ready.
- test:
    role:
    - client
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: client-wait
          description: "client waiting for server"
          os:
          - debian
          scope:
          - functional
        run:
          steps:
          - lava-wait server_installed
      from: inline
      name: client-wait
      path: inline/client-wait.yaml
This means that each device using the role client will wait until any one device in the group sends a signal with the messageID of server_installed. The assumption here is that the group only has one device with the label server.
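The blocking behaviour of lava-send and lava-wait can be modelled in-process with Python threads. This is an analogy only, under stated assumptions:

- The hypothetical MessageBoard class stands in for LAVA's coordinator.
- Each thread plays one device.
- A threading.Event per messageID records whether any device has sent it.

Every client waiting on server_installed unblocks as soon as the server sets that event.

```python
import threading
import time
from collections import defaultdict


class MessageBoard:
    """Hypothetical stand-in for the MultiNode coordinator (not LAVA code)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events = defaultdict(threading.Event)  # messageID -> Event

    def _event(self, message_id: str) -> threading.Event:
        with self._lock:
            return self._events[message_id]

    def send(self, message_id: str) -> None:
        # Like lava-send: record that this messageID has been broadcast.
        self._event(message_id).set()

    def wait(self, message_id: str, timeout: float) -> bool:
        # Like lava-wait: block until any device sends message_id.
        # Returns False if the timeout expires first.
        return self._event(message_id).wait(timeout)


board = MessageBoard()
log = []


def server() -> None:
    time.sleep(0.1)  # pretend to install and start Apache
    log.append("server ready")
    board.send("server_installed")


def client(name: str) -> None:
    if board.wait("server_installed", timeout=5):
        log.append(f"{name} connects")


threads = [threading.Thread(target=client, args=(f"client{i}",)) for i in range(2)]
threads.append(threading.Thread(target=server))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log[0])  # server ready
```

Because the clients only log after wait returns, the server's entry is always first, whatever order the threads start in.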
The second MultiNode example is now complete. To run it yourself, see the complete example test job: second-multinode-job.yaml. Remember, you’ll need specific hardware devices for this to work.
The MultiNode protocol also provides support for using the MultiNode API outside of the test shell definition; any action block can access the protocol from within specific actions. This even makes it possible to block deployment or boot on one group of machines until others are fully up and running, for example. This flexibility supports a wide range of test scenarios.
See also
Writing jobs using the MultiNode protocol for more information on how to call the MultiNode API outside the test shell.
As demonstrated earlier, tests can use lava-wait to make a device wait for a single message from any other device in the MultiNode group. To wait until all other devices in the MultiNode group have sent a signal, use lava-wait-all instead.
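The difference between the two calls can be modelled by tracking who has sent each messageID. In this hypothetical Python sketch, WaitAllBoard is not LAVA code and the device names are illustrative. A threading.Condition releases the waiter only once every other device in the group has sent the message.

```python
import threading
from typing import Dict, Iterable, Set


class WaitAllBoard:
    """Hypothetical model of lava-wait-all semantics (not LAVA code)."""

    def __init__(self, devices: Iterable[str]) -> None:
        self._devices: Set[str] = set(devices)
        self._senders: Dict[str, Set[str]] = {}  # messageID -> devices that sent it
        self._cond = threading.Condition()

    def send(self, device: str, message_id: str) -> None:
        """Like lava-send: record that `device` broadcast `message_id`."""
        with self._cond:
            self._senders.setdefault(message_id, set()).add(device)
            self._cond.notify_all()

    def wait_all(self, device: str, message_id: str, timeout: float) -> bool:
        """Like lava-wait-all: block until every other device has sent message_id.

        Returns False if the timeout expires first.
        """
        others = self._devices - {device}
        with self._cond:
            return self._cond.wait_for(
                lambda: others <= self._senders.get(message_id, set()),
                timeout=timeout,
            )


board = WaitAllBoard(["server", "client1", "client2"])
board.send("client1", "clients_ready")
print(board.wait_all("server", "clients_ready", timeout=0.05))  # False: client2 has not sent yet
board.send("client2", "clients_ready")
print(board.wait_all("server", "clients_ready", timeout=0.05))  # True
```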
Each message sent using the MultiNode API uses a messageID, which is a string that must be unique within the group. It is recommended to make these strings descriptive to help track job progress and debug problems. Be careful to use underscores instead of spaces in the name. The messageID will be included in the log files of the test.
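If messageIDs are generated by a script, it can help to derive them from a readable description and normalise them in one place. The to_message_id helper below is hypothetical. Its character set (lower-case letters, digits, underscores and hyphens) is a conservative choice made for this sketch, not a documented LAVA rule. It follows the advice above: spaces become underscores.

```python
import re


def to_message_id(description: str) -> str:
    """Turn a human-readable step description into an underscore-separated ID.

    Hypothetical helper: lower-cases the text, replaces each run of whitespace
    with a single underscore, then drops anything other than letters, digits,
    underscores and hyphens.
    """
    collapsed = re.sub(r"\s+", "_", description.strip().lower())
    return re.sub(r"[^a-z0-9_-]", "", collapsed)


print(to_message_id("Server installed"))    # server_installed
print(to_message_id("  client 2: ready "))  # client_2_ready
```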
Warning
When using lava-wait and lava-wait-all, the device will wait until the expected messageID is received. If that messageID never arrives, the job keeps waiting until the timeout expires. See Timeouts.
lava-send can be used to send data between devices. A device can send data at any time, and that data will be broadcast to all devices in the MultiNode group. Any device in the group can retrieve the data by calling lava-wait or lava-wait-all with the same messageID. Data is sent as key-value pairs.
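The data travels as key=value pairs on the lava-send command line, so a script that generates the call should quote each pair for the shell. Here is a hedged Python sketch using the standard shlex.quote. build_lava_send is a hypothetical helper, and the keys and values are illustrative. Whether LAVA accepts values containing spaces is not assumed here; the quoting only keeps the shell from splitting them.

```python
import shlex
from typing import Dict


def build_lava_send(message_id: str, data: Dict[str, str]) -> str:
    """Build a lava-send command line carrying data as key=value pairs."""
    if " " in message_id:
        raise ValueError("use underscores, not spaces, in the messageID")
    for key in data:
        # A key containing '=' or a space would make the pair ambiguous.
        if "=" in key or " " in key:
            raise ValueError(f"invalid key: {key!r}")
    pairs = [shlex.quote(f"{key}={value}") for key, value in data.items()]
    return " ".join(["lava-send", shlex.quote(message_id), *pairs])


print(build_lava_send("ipv4", {"ip": "192.168.0.10"}))
# lava-send ipv4 ip=192.168.0.10
```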
Note
The message data is stored in a cache file which will be overwritten when the next synchronisation call is made. Ensure that your scripts make use of (or copy aside) any MultiNode cache data before calling any other MultiNode API helpers that may clear the cache.
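Copying the cache aside is a single file copy. This is a minimal Python sketch. It assumes the cache lives at /tmp/lava_multi_node_cache.txt, the path used in the lava-wait example in this section. The preserve_cache helper and any destination path you pass it are hypothetical.

```python
import shutil
from pathlib import Path

# Cache path as used in the lava-wait example in this section.
CACHE = Path("/tmp/lava_multi_node_cache.txt")


def preserve_cache(dest: Path, cache: Path = CACHE) -> Path:
    """Copy the MultiNode cache file to `dest` before another helper overwrites it."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cache, dest)
    return dest
```

A script would call this right after lava-wait returns, for example preserve_cache(Path("/tmp/multinode/ipv4.txt")), and before issuing any further MultiNode call.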
For example, if a device raises a network interface and wants to make data about that network connection available to other devices in the group, the device can send the IP address using lava-send:
run:
  steps:
  - lava-send ipv4 ip=$(./get_ip.sh)
The contents of get_ip.sh are operating system specific.
On the receiving device, the test definition would include a call to lava-wait or lava-wait-all with the same messageID:
run:
  steps:
  - lava-wait ipv4
  - ipdata=$(cat /tmp/lava_multi_node_cache.txt | cut -d = -f 2)
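The cut pipeline works for a single pair. If a message carries several key=value pairs, a script may prefer to read them all into a mapping. This Python sketch assumes each line of the cache file holds one key=value pair, which is the format the cut command relies on. Lines without an = are skipped rather than guessed at. read_multinode_cache is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def read_multinode_cache(path: Path = Path("/tmp/lava_multi_node_cache.txt")) -> Dict[str, str]:
    """Parse key=value lines from the MultiNode cache file into a dictionary."""
    data: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines without '='
            data[key.strip()] = value.strip()
    return data
```

Splitting on the first = keeps any later = characters inside the value, which cut -f 2 would truncate. A typical call would be ip = read_multinode_cache()["ip"].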
Note
Although multiple key value pairs can be sent as a single message, the API is not intended for large amounts of data. There is a message size limit of 4KiB, including protocol overhead. Use other transfer methods like ssh or wget if you need to send larger amounts of data between devices.
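Before sending, a script could check that a message stays comfortably under the 4KiB limit. The exact protocol overhead is not stated here. This sketch therefore reserves an assumed margin, OVERHEAD_ALLOWANCE, an arbitrary illustrative value, and measures the UTF-8 encoded messageID plus key=value pairs. Treat it as a rough guard, not a reproduction of LAVA's own check.

```python
from typing import Dict

MESSAGE_LIMIT = 4 * 1024  # 4KiB, including protocol overhead
OVERHEAD_ALLOWANCE = 512  # assumed margin for protocol overhead (illustrative)


def payload_size(message_id: str, data: Dict[str, str]) -> int:
    """Approximate size in bytes of the messageID plus its key=value pairs."""
    parts = [message_id] + [f"{k}={v}" for k, v in data.items()]
    return len(" ".join(parts).encode("utf-8"))


def fits_in_message(message_id: str, data: Dict[str, str]) -> bool:
    """True if the payload fits under the limit once the margin is reserved."""
    return payload_size(message_id, data) <= MESSAGE_LIMIT - OVERHEAD_ALLOWANCE


print(fits_in_message("ipv4", {"ip": "192.168.0.10"}))  # True
print(fits_in_message("logs", {"dump": "x" * 5000}))    # False
```

When the check fails, the data belongs on a different channel such as ssh or wget, with only a short pointer (for example a URL or path) sent through lava-send.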
LAVA provides some helper routines for common data transfer tasks and more can be added where appropriate. The main MultiNode API calls are intended to work on all POSIX systems, but some of the helper tools like lava-network may be restricted to particular operating systems or compatible shells due to a reliance on operating system tools like ifconfig.
It is also possible for devices to retrieve data about the group itself, including the role or name of the current device as well as the names and roles of other devices in the group. See MultiNode API for more information.
The MultiNode protocol defines the MultiNode group and also allows actions within the job pipeline to make calls using the MultiNode API outside of a test definition.
The MultiNode protocol allows data to be shared between actions, including data generated in a test shell definition for one role being made available for use by a different role in its deploy or boot action.
The MultiNode protocol can underpin the use of other tools without necessarily needing a dedicated protocol class to be written for those tools. Using the MultiNode protocol is an extension of using the existing MultiNode API calls within a test definition. Using the protocol is an advanced feature of LAVA and relies on the test writer carefully planning how the job will work. See the delayed start MultiNode example (delayed_start_multinode) for an example of how to use this.