git.maquefel.me Git - qemu.git/log

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210208a' into staging

Migration pull 2021-02-08

v2
  Dropped vmstate: Fix memory leak in vmstate_handle_alloc
    Broke on Power
  Added migration: only check page size match if RAM postcopy is enabled

# gpg: Signature made Mon 08 Feb 2021 11:28:14 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20210208a: (27 commits)
  migration: only check page size match if RAM postcopy is enabled
  migration: introduce snapshot-{save, load, delete} QMP commands
  iotests: fix loading of common.config from tests/ subdir
  iotests: add support for capturing and matching QMP events
  migration: introduce a delete_snapshot wrapper
  migration: wire up support for snapshot device selection
  migration: control whether snapshots are ovewritten
  block: rename and alter bdrv_all_find_snapshot semantics
  block: allow specifying name of block device for vmstate storage
  block: add ability to specify list of blockdevs during snapshot
  migration: stop returning errno from load_snapshot()
  migration: Make save_snapshot() return bool, not 0/-1
  block: push error reporting into bdrv_all_*_snapshot functions
  migration: Display the migration blockers
  migration: Add blocker information
  migration: Fix a few absurdly defective error messages
  migration: Fix cache_init()'s "Failed to allocate" error messages
  migration: Clean up signed vs. unsigned XBZRLE cache-size
  migration: Fix migrate-set-parameters argument validation
  migration: introduce 'userfaultfd-wrlat.py' script
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-02-08' into staging

QAPI patches patches for 2021-02-08

# gpg: Signature made Mon 08 Feb 2021 13:54:26 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2021-02-08:
  qapi: enable strict-optional checks
  qapi: type 'info' as Optional[QAPISourceInfo]
  qapi/gen: Drop support for QAPIGen without a file name
  qapi/commands: Simplify command registry generation
  qapi/gen: Support switching to another module temporarily
  qapi/gen: write _genc/_genh access shims
  qapi: centralize the built-in module name definition
  qapi/gen: Combine ._add_[user|system]_module
  qapi: use './builtin' as the built-in module name
  qapi: use explicitly internal module names
  qapi/gen: Replace ._begin_system_module()
  qapi: centralize is_[user|system|builtin]_module methods
  qapi/gen: inline _wrap_ifcond into end_if()
  qapi/main: handle theoretical None-return from re.match()
  qapi/events: fix visit_event typing
  qapi/commands: assert arg_type is not None

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qapi: enable strict-optional checks

In the modules that we are checking so far, we can be stricter about the
difference between Optional[T] and T types. Enable that check.

Enabling it now will assist review on further typing and cleanup work.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-17-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: type 'info' as Optional[QAPISourceInfo]

For everything typed so far, type this parameter as
Optional[QAPISourceInfo].

In the most generic case, QAPISchemaEntity's info field may be None to
represent types that come from built-in definitions. Although some
Entity types may not currently have any built-in definitions, it is not
easily possible to constrain the type except on an ad-hoc basis using
assertions.

It's easier and simpler, then, to just say it's always an Optional type.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-16-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/gen: Drop support for QAPIGen without a file name

The previous commit removed the only user of QAPIGen(None). Tighten
the type hint.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-15-jsnow@redhat.com>

qapi/commands: Simplify command registry generation

QAPISchemaGenCommandVisitor.visit_command() needs to generate the
marshalling function into the current module, and also generate its
registration into the ./init system module. The latter is done
somewhat awkwardly: .__init__() creates a QAPIGenCCode that will not
be written out, each .visit_command() adds its registration to it, and
.visit_end() copies its contents into the ./init module it creates.

Instead provide the means to temporarily switch to another module.
Create the ./init module in .visit_begin(), and generate its initial
part. Add registrations to it in .visit_command(). Finish it in
.visit_end().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-14-jsnow@redhat.com>

qapi/gen: Support switching to another module temporarily

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-13-jsnow@redhat.com>
[Commit message tweaked]

qapi/gen: write _genc/_genh access shims

Many places assume they can access these fields without checking them
first to ensure they are defined. Eliminating the _genc and _genh fields
and replacing them with functional properties that check for correct
state can ease the typing overhead by eliminating the Optional[T] return
type.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-12-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: centralize the built-in module name definition

Use a constant to make it obvious we're referring to a very specific thing.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-11-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/gen: Combine ._add_[user|system]_module

With callers to _add_system_module now explicitly using the './' prefix
to indicate a system module, there is no longer any reason to have
separate interfaces for adding system vs user modules; use a unified
interface that differentiates based on the name.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-10-jsnow@redhat.com>

qapi: use './builtin' as the built-in module name

Use './builtin' as the built-in module name instead of
None. Clarify the typing that this is now always a string.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-9-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: use explicitly internal module names

QAPISchemaModularCVisitor._add_system_module() prefixes './' to its name
argument to make it a module name. Pass the module name instead. This
will allow us to coalesce the methods to add modules later on.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-8-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message reworded]
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/gen: Replace ._begin_system_module()

QAPISchemaModularCVisitor._begin_system_module() is actually just for
the builtin module. Rename it to ._begin_builtin_module() and drop
its useless @name parameter.

Clarify conditionals in visit_module to make this clear.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-7-jsnow@redhat.com>

qapi: centralize is_[user|system|builtin]_module methods

Define what a module is and define what kind of a module it is once and
for all, in one place.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-6-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/gen: inline _wrap_ifcond into end_if()

We assert _start_if is not None in end_if, but that's opaque to mypy.
By inlining _wrap_ifcond, that constraint becomes provable to mypy.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-5-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/main: handle theoretical None-return from re.match()

Mypy cannot understand that this match can never be None, so help it
along.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-4-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/events: fix visit_event typing

Actually, the arg_type can indeed be Optional.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-3-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi/commands: assert arg_type is not None

When boxed is True, expr.py asserts that we must have
arguments. Ultimately, this should mean that if boxed is True that
arg_type should be defined. Mypy cannot infer this, and does not support
'stateful' type inference, e.g.:

```
if x:
assert y is not None

...

if x:
y.etc()
```

does not work, because mypy does not statefully remember the conditional
assertion in the second block. Help mypy out by creating a new local
that it can track more easily.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210201193747.2169670-2-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-gdbstub-docs-080221-1' into staging

Testing, gdbstub and doc tweaks:

  - increase timeout on replay kernel acceptance test
  - fixes for binfmt_misc docker images
  - better gdb version detection
  - don't silently skip gdb tests
  - fix for gdbstub auxv handling
  - cleaner handling of check-tcg on tcg disabled builds
  - expand vexpress/versitile docs with examples

# gpg: Signature made Mon 08 Feb 2021 11:12:03 GMT
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* remotes/stsquad/tags/pull-testing-gdbstub-docs-080221-1:
  docs/system: document an example booting the versatilepb machine
  docs/system: document an example vexpress-a15 invocation
  tests/Makefile.include: don't use TARGET_DIRS for check-tcg
  scripts/mtest2make.py: export all-%s-targets variable and use it
  tests/tcg: Replace /bin/true by true (required on macOS)
  gdbstub: Fix handle_query_xfer_auxv
  tests/tcg: don't silently skip the gdb tests
  configure: bump the minimum gdb version for check-tcg to 9.1
  configure: make version_ge more tolerant of shady version input
  tests/docker: add a docker-exec-copy-test
  tests/docker: alias docker-help target for consistency
  tests/docker: preserve original name when copying libs
  tests/docker: make _copy_with_mkdir accept missing files
  tests/docker: Fix typo in help message
  tests/docker: Fix _get_so_libs() for docker-binfmt-image
  tests/acceptance: Increase the timeout in the replay tests

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

migration: only check page size match if RAM postcopy is enabled

Postcopy may also be advised for dirty-bitmap migration only, in which
case the remote page size will not be available and we'll instead read
bogus data, blocking migration with a mismatch error if the VM uses
hugepages.

Fixes: 58110f0acb ("migration: split common postcopy out of ram postcopy")
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20210204163522.13291-1-s.reiter@proxmox.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce snapshot-{save, load, delete} QMP commands

savevm, loadvm and delvm are some of the few HMP commands that have never
been converted to use QMP. The reasons for the lack of conversion are
that they blocked execution of the event thread, and the semantics
around choice of disks were ill-defined.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the problems, they are not a blocker to
all real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces new snapshot-{load,save,delete} commands to
QMP that are intended to replace the old HMP counterparts. The new
commands are given different names, because they will be using the new
QEMU job framework and thus will have diverging behaviour from the HMP
originals. It would thus be misleading to keep the same name.

While this design uses the generic job framework, the current impl is
still blocking. The intention that the blocking problem is fixed later.
None the less applications using these new commands should assume that
they are asynchronous and thus wait for the job status change event to
indicate completion.

In addition to using the job framework, the new commands require the
caller to be explicit about all the block device nodes used in the
snapshot operations, with no built-in default heuristics in use.

Note that the existing "query-named-block-nodes" can be used to query
what snapshots currently exist for block nodes.

Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-13-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: removed tests for now, the output ordering isn't
deterministic

iotests: fix loading of common.config from tests/ subdir

common.rc assumes it is being sourced from the same directory and
so also tries to source common.config from the current working
directory. With the ability to now have named tests in the tests/
subdir we need to check two locations for common.config.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-12-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

iotests: add support for capturing and matching QMP events

When using the _launch_qemu and _send_qemu_cmd functions from
common.qemu, any QMP events get mixed in with the output from
the commands and responses.

This makes it difficult to write a test case as the ordering
of events in the output is not stable.

This introduces a variable 'capture_events' which can be set
to a list of event names. Any events listed in this variable
will not be printed, instead collected in the $QEMU_EVENTS
environment variable.

A new '_wait_event' function can be invoked to collect events
at a fixed point in time. The function will first pull events
cached in $QEMU_EVENTS variable, and if none are found, will
then read more from QMP.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-11-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce a delete_snapshot wrapper

Make snapshot deletion consistent with the snapshot save
and load commands by using a wrapper around the blockdev
layer. The main difference is that we get upfront validation
of the passed in device list (if any).

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-10-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: wire up support for snapshot device selection

Modify load_snapshot/save_snapshot to accept the device list and vmstate
node name parameters previously added to the block layer.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-9-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: control whether snapshots are ovewritten

The traditional HMP "savevm" command will overwrite an existing snapshot
if it already exists with the requested name. This new flag allows this
to be controlled allowing for safer behaviour with a future QMP command.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-8-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

block: rename and alter bdrv_all_find_snapshot semantics

Currently bdrv_all_find_snapshot() will return 0 if it finds
a snapshot, -1 if an error occurs, or if it fails to find a
snapshot. New callers to be added want to distinguish between
the error scenario and failing to find a snapshot.

Rename it to bdrv_all_has_snapshot and make it return -1 on
error, 0 if no snapshot is found and 1 if snapshot is found.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

block: allow specifying name of block device for vmstate storage

Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-6-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

block: add ability to specify list of blockdevs during snapshot

When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.

Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-5-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: stop returning errno from load_snapshot()

None of the callers care about the errno value since there is a full
Error object populated. This gives consistency with save_snapshot()
which already just returns a boolean value.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
[PMD: Return false/true instead of -1/0, document function]
Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-4-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Make save_snapshot() return bool, not 0/-1

Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-3-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

block: push error reporting into bdrv_all_*_snapshot functions

The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
[PMD: Initialize variables]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-2-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Display the migration blockers

Update 'info migrate' to display migration blocking information.
If the outbound migration is not blocked, there is no change, however
if it is blocked a message is displayed with a list of reasons why,
e.g.

qemu-system-x86_64 -nographic  -smp 4 -m 4G -M pc,usb=on \
-chardev null,id=n -device usb-serial,chardev=n \
-virtfs local,path=/home,mount_tag=fs,security_model=none \
-drive if=virtio,file=myimage.qcow2

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Outgoing migration blocked:
  Migration is disabled when VirtFS export path '/home' is mounted in the guest using mount_tag 'fs'
  non-migratable device: 0000:00:01.2/1/usb-serial

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202135522.127380-3-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Add blocker information

Modify query-migrate so that it has a flag indicating if outbound
migration is blocked, and if it is a list of reasons.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202135522.127380-2-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Fix a few absurdly defective error messages

migrate_params_check() has a number of error messages of the form

Parameter 'NAME' expects is invalid, it should be ...

Fix them to something like

Parameter 'NAME' expects a ...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-5-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Fix cache_init()'s "Failed to allocate" error messages

cache_init() attempts to handle allocation failure.  The two error
messages are garbage, as untested error messages commonly are:

    Parameter 'cache size' expects Failed to allocate cache
    Parameter 'cache size' expects Failed to allocate page cache

Fix them to just

    Failed to allocate cache
    Failed to allocate page cache

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-4-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Clean up signed vs. unsigned XBZRLE cache-size

73af8dd8d7 "migration: Make xbzrle_cache_size a migration
parameter" (v2.11.0) made the new parameter unsigned (QAPI type
'size', uint64_t in C).  It neglected to update existing code, which
continues to use int64_t.

migrate_xbzrle_cache_size() returns the new parameter.  Adjust its
return type.

QMP query-migrate-cache-size returns migrate_xbzrle_cache_size().
Adjust its return type.

migrate-set-parameters passes the new parameter to
xbzrle_cache_resize().  Adjust its parameter type.

xbzrle_cache_resize() passes it on to cache_init().  Adjust its
parameter type.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-3-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Fix migrate-set-parameters argument validation

Commit 741d4086c8 "migration: Use proper types in json" (v2.12.0)
switched MigrationParameters to narrower integer types, and removed
the simplified qmp_migrate_set_parameters()'s argument checking
accordingly.

Good idea, except qmp_migrate_set_parameters() takes
MigrateSetParameters, not MigrationParameters.  Its job is updating
migrate_get_current()->parameters (which *is* of type
MigrationParameters) according to its argument.  The integers now get
truncated silently.  Reproducer:

    ---> {'execute': 'query-migrate-parameters'}
    <--- {"return": {[...] "compress-threads": 8, [...]}}
    ---> {"execute": "migrate-set-parameters", "arguments": {"compress-threads": 257}}
    <--- {"return": {}}
    ---> {'execute': 'query-migrate-parameters'}
    <--- {"return": {[...] "compress-threads": 1, [...]}}

Fix by resynchronizing MigrateSetParameters with MigrationParameters.

Fixes: 741d4086c856320807a2575389d7c0505578270b
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-2-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce 'userfaultfd-wrlat.py' script

Add BCC/eBPF script to analyze userfaultfd write fault latency distribution.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210129101407.103458-6-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: implementation of background snapshot thread

Introducing implementation of 'background' snapshot thread
which in overall follows the logic of precopy migration
while internally utilizes completely different mechanism
to 'freeze' vmstate at the start of snapshot creation.

This mechanism is based on userfault_fd with wr-protection
support and is Linux-specific.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: support UFFD write fault processing in ram_save_iterate()

In this particular implementation the same single migration
thread is responsible for both normal linear dirty page
migration and procesing UFFD page fault events.

Processing write faults includes reading UFFD file descriptor,
finding respective RAM block and saving faulting page to
the migration stream. After page has been saved, write protection
can be removed. Since asynchronous version of qemu_put_buffer()
is expected to be used to save pages, we also have to flush
migraion stream prior to un-protecting saved memory range.

Write protection is being removed for any previously protected
memory chunk that has hit the migration stream. That's valid
for pages from linear page scan along with write fault pages.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
fixup pagefault.address cast for 32bit

migration: introduce UFFD-WP low-level interface helpers

Glue code to the userfaultfd kernel implementation.
Querying feature support, createing file descriptor, feature control,
memory region registration, IOCTLs on registered registered regions.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210129101407.103458-3-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixed up range.start casting for 32bit

migration: introduce 'background-snapshot' migration capability

Add new capability to 'qapi/migration.json' schema.
Update migrate_caps_check() to validate enabled capability set
against introduced one. Perform checks for required kernel features
and compatibility with guest memory backends.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/qemu-file: Fix maybe uninitialized on qemu_get_buffer_in_place()

Fixed error when compiling migration/qemu-file.c with -Werror=maybe-uninitialized
as shown here:

../migration/qemu-file.c: In function 'qemu_get_buffer_in_place':
../migration/qemu-file.c:604:18: error: 'src' may be used uninitialized in this function [-Werror=maybe-uninitialized]
604 | *buf = src;
| ~~~~~^~~~~
cc1: all warnings being treated as errors

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-Id: <20210128130625.569900-1-wainersm@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

savevm: Fix memory leak of vmstate_configuration

When VM migrate VMState of configuration, the fields(name and capabilities)
of configuration having a flag of VMS_ALLOC need to allocate memory. If the
src doesn't free memory of capabilities in SaveState after save VMState of
configuration, or the dst doesn't free memory of name and capabilities in post
load of configuration, it may result in memory leak of name and capabilities.
We free memory in configuration_post_save and configuration_post_load func,
which prevents memory leak.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Jinhao Gao <gaojinhao@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20201231061020.828-3-gaojinhao@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

spapr_pci: Fix memory leak of vmstate_spapr_pci

When VM migrate VMState of spapr_pci, the field(msi_devs) of spapr_pci
having a flag of VMS_ALLOC need to allocate memory. If the src doesn't free
memory of msi_devs in SaveStateEntry of spapr_pci after QEMUFile save
VMState of spapr_pci, it may result in memory leak of msi_devs. We add the
post_save func to free memory, which prevents memory leak.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Jinhao Gao <gaojinhao@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20201231061020.828-2-gaojinhao@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Merge remote-tracking branch 'remotes/dg-gitlab/tags/cgs-pull-request' into staging

Generalize memory encryption models

A number of hardware platforms are implementing mechanisms whereby the
hypervisor does not have unfettered access to guest memory, in order
to mitigate the security impact of a compromised hypervisor.

AMD's SEV implements this with in-cpu memory encryption, and Intel has
its own memory encryption mechanism.  POWER has an upcoming mechanism
to accomplish this in a different way, using a new memory protection
level plus a small trusted ultravisor.  s390 also has a protected
execution environment.

The current code (committed or draft) for these features has each
platform's version configured entirely differently.  That doesn't seem
ideal for users, or particularly for management layers.

AMD SEV introduces a notionally generic machine option
"machine-encryption", but it doesn't actually cover any cases other
than SEV.

This series is a proposal to at least partially unify configuration
for these mechanisms, by renaming and generalizing AMD's
"memory-encryption" property.  It is replaced by a
"confidential-guest-support" property pointing to a platform specific
object which configures and manages the specific details.

Note to Ram Pai: the documentation I've included for PEF is very
minimal.  If you could send a patch expanding on that, it would be
very helpful.

Changes since v8:
* Rebase
* Fixed some cosmetic typos
Changes since v7:
* Tweaked and clarified meaning of the 'ready' flag
* Polished the interface to the PEF internals
* Shifted initialization for s390 PV later (I hope I've finally got
   this after apply_cpu_model() where it needs to be)
Changes since v6:
* Moved to using OBJECT_DECLARE_TYPE and OBJECT_DEFINE_TYPE macros
* Assorted minor fixes
Changes since v5:
* Renamed from "securable guest memory" to "confidential guest
   support"
* Simpler reworking of x86 boot time flash encryption
* Added a bunch of documentation
* Fixed some compile errors on POWER
Changes since v4:
* Renamed from "host trust limitation" to "securable guest memory",
   which I think is marginally more descriptive
* Re-organized initialization, because the previous model called at
   kvm_init didn't work for s390
* Assorted fixes to the s390 implementation; rudimentary testing
   (gitlab CI) only
Changes since v3:
* Rebased
* Added first cut at handling of s390 protected virtualization
Changes since RFCv2:
* Rebased
* Removed preliminary SEV cleanups (they've been merged)
* Changed name to "host trust limitation"
* Added migration blocker to the PEF code (based on SEV's version)
Changes since RFCv1:
* Rebased
* Fixed some errors pointed out by Dave Gilbert

# gpg: Signature made Mon 08 Feb 2021 06:07:27 GMT
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dg-gitlab/tags/cgs-pull-request:
  s390: Recognize confidential-guest-support option
  confidential guest support: Alter virtio default properties for protected guests
  spapr: PEF: prevent migration
  spapr: Add PEF based confidential guest support
  confidential guest support: Update documentation
  confidential guest support: Move SEV initialization into arch specific code
  confidential guest support: Introduce cgs "ready" flag
  sev: Add Error ** to sev_kvm_init()
  confidential guest support: Rework the "memory-encryption" property
  confidential guest support: Move side effect out of machine_set_memory_encryption()
  sev: Remove false abstraction of flash encryption
  confidential guest support: Introduce new confidential guest support class
  qom: Allow optional sugar props

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

docs/system: document an example booting the versatilepb machine

There is a bit more out there including Aurelien's excellent write up
and older Debian images here:

https://www.aurel32.net/info/debian_arm_qemu.php
https://people.debian.org/~aurel32/qemu/armel/

However the web is transitory and git is forever so lets add something
to the fine manual.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20210202134001.25738-16-alex.bennee@linaro.org>

docs/system: document an example vexpress-a15 invocation

The wiki and the web are curiously absent of the right runes to boot a
vexpress model so I had to work from first principles to work it out.
Use the more modern -drive notation so alternative backends can be
used (unlike the hardwired -sd mode).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Message-Id: <20210202134001.25738-15-alex.bennee@linaro.org>

tests/Makefile.include: don't use TARGET_DIRS for check-tcg

TARGET_DIRS reflects what we wanted to configure which in the normal
case is all our targets. However once meson has pared-down our target
list due to missing features we need to check the final list of
ninja-targets. This prevents check-tcg barfing on a --disable-tcg
build.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210202134001.25738-14-alex.bennee@linaro.org>

scripts/mtest2make.py: export all-%s-targets variable and use it

There are some places where the conditional makefile support is the
simplest solution. Now we don't expose CONFIG_TCG as a variable create
a new one that can be checked for the check-help output.

As check-tcg is a PHONY target we re-use check-softfloat to gate that
as well.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210202134001.25738-13-alex.bennee@linaro.org>

tests/tcg: Replace /bin/true by true (required on macOS)

/bin/true is missing on macOS, but simply "true" is available as a shell builtin.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210128135627.2067003-1-sw@weilnetz.de>
Message-Id: <20210202134001.25738-12-alex.bennee@linaro.org>

gdbstub: Fix handle_query_xfer_auxv

The main problem was that we were treating a guest address
as a host address with a mere cast.

Use the correct interface for accessing guest memory. Do not
allow offset == auxv_len, which would result in an empty packet.

Fixes: 51c623b0de1 ("gdbstub: add support to Xfer:auxv:read: packet")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210128201831.534033-1-richard.henderson@linaro.org>
Message-Id: <20210202134001.25738-11-alex.bennee@linaro.org>

tests/tcg: don't silently skip the gdb tests

Otherwise people won't know what they are missing.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210202134001.25738-10-alex.bennee@linaro.org>

configure: bump the minimum gdb version for check-tcg to 9.1

For SVE, currently the bulk of the GDB TCG tests, we need at least GDB
9.1 to support the "ieee_half" data type we report. This only affects
when GDB tests are run; users can still use lower versions of gdb as
long as they aren't talking to an SVE enabled model. The work around
is to either get a newer gdb or disable SVE for their CPU model.

Reported-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Luis Machado <luis.machado@linaro.org>
Message-Id: <20210202134001.25738-9-alex.bennee@linaro.org>

configure: make version_ge more tolerant of shady version input

When checking GDB versions we have to tolerate all sorts of random
distro extensions to the version string. While we already attempt to
do some of that before we call version_ge is makes sense to try and
regularise the first input by stripping extraneous -'s. While we at it
convert the old-style shell quoting into a cleaner form t shut up my
editors linter lest it confuse me by underlining the whole line.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210202134001.25738-8-alex.bennee@linaro.org>

tests/docker: add a docker-exec-copy-test

This provides test machinery for checking the QEMU copying logic works
properly. It takes considerably less time to run than starting a
debootstrap only for it to fail later. I considered adding a remove
command to docker.py but figured that might be gold plating given the
relative size of the containers compared to the ones with actual stuff
in them.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210202134001.25738-7-alex.bennee@linaro.org>

tests/docker: alias docker-help target for consistency

We have a bunch of -help targets so this will save some cognitive
dissonance. Keep the original for those with muscle memory.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210202134001.25738-6-alex.bennee@linaro.org>

tests/docker: preserve original name when copying libs

While it is important we chase down the symlinks to copy the correct
data we can confuse the kernel by renaming the interpreter to what is
in the binary. Extend _copy_with_mkdir to preserve the original name
of the file when asked.

Fixes: 5e33f7fead ("tests/docker: better handle symlinked libs")
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210202134001.25738-5-alex.bennee@linaro.org>

tests/docker: make _copy_with_mkdir accept missing files

Depending on the linker/ldd setup we might get a file with no path.
Typically this is the psuedo library linux-vdso.so which doesn't
actually exist on the disk. Rather than try and catch these distro
specific edge cases just shout about it and try and continue.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210202134001.25738-4-alex.bennee@linaro.org>

tests/docker: Fix typo in help message

To have the variable properly passed, we need to set it,
ie. NOUSER=1. Fix the message displayed by 'make docker'.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210119052120.522069-1-f4bug@amsat.org>
Message-Id: <20210202134001.25738-3-alex.bennee@linaro.org>

tests/docker: Fix _get_so_libs() for docker-binfmt-image

Fix a variable rename mistake from commit 5e33f7fead5:

  Traceback (most recent call last):
    File "./tests/docker/docker.py", line 710, in <module>
      sys.exit(main())
    File "./tests/docker/docker.py", line 706, in main
      return args.cmdobj.run(args, argv)
    File "./tests/docker/docker.py", line 489, in run
      _copy_binary_with_libs(args.include_executable,
    File "./tests/docker/docker.py", line 149, in _copy_binary_with_libs
      libs = _get_so_libs(src)
    File "./tests/docker/docker.py", line 123, in _get_so_libs
      libs.append(s.group(1))
  NameError: name 's' is not defined

Fixes: 5e33f7fead5 ("tests/docker: better handle symlinked libs")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210119050149.516910-1-f4bug@amsat.org>
Message-Id: <20210202134001.25738-2-alex.bennee@linaro.org>

tests/acceptance: Increase the timeout in the replay tests

Our gitlab-CI just showed a failed test_ppc_mac99 since it was apparently
killed some few seconds before the test finished. Allow it some more
time to complete.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210127065222.48650-1-thuth@redhat.com>

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-20210207' into staging

qemu-sparc queue

# gpg: Signature made Sun 07 Feb 2021 22:09:12 GMT
# gpg:                using RSA key CC621AB98E82200D915CC9C45BC2C56FAE0F321F
# gpg:                issuer "mark.cave-ayland@ilande.co.uk"
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>" [full]
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* remotes/mcayland/tags/qemu-sparc-20210207:
  utils/fifo8: add VMSTATE_FIFO8_TEST macro
  utils/fifo8: change fatal errors from abort() to assert()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

s390: Recognize confidential-guest-support option

At least some s390 cpu models support "Protected Virtualization" (PV),
a mechanism to protect guests from eavesdropping by a compromised
hypervisor.

This is similar in function to other mechanisms like AMD's SEV and
POWER's PEF, which are controlled by the "confidential-guest-support"
machine option.  s390 is a slightly special case, because we already
supported PV, simply by using a CPU model with the required feature
(S390_FEAT_UNPACK).

To integrate this with the option used by other platforms, we
implement the following compromise:

- When the confidential-guest-support option is set, s390 will
   recognize it, verify that the CPU can support PV (failing if not)
   and set virtio default options necessary for encrypted or protected
   guests, as on other platforms.  i.e. if confidential-guest-support
   is set, we will either create a guest capable of entering PV mode,
   or fail outright.

- If confidential-guest-support is not set, guests might still be
   able to enter PV mode, if the CPU has the right model.  This may be
   a little surprising, but shouldn't actually be harmful.

To start a guest supporting Protected Virtualization using the new
option use the command line arguments:
    -object s390-pv-guest,id=pv0 -machine confidential-guest-support=pv0

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

confidential guest support: Alter virtio default properties for protected guests

The default behaviour for virtio devices is not to use the platforms normal
DMA paths, but instead to use the fact that it's running in a hypervisor
to directly access guest memory. That doesn't work if the guest's memory
is protected from hypervisor access, such as with AMD's SEV or POWER's PEF.

So, if a confidential guest mechanism is enabled, then apply the
iommu_platform=on option so it will go through normal DMA mechanisms.
Those will presumably have some way of marking memory as shared with
the hypervisor or hardware so that DMA will work.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: PEF: prevent migration

We haven't yet implemented the fairly involved handshaking that will be
needed to migrate PEF protected guests. For now, just use a migration
blocker so we get a meaningful error if someone attempts this (this is the
same approach used by AMD SEV).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: Add PEF based confidential guest support

Some upcoming POWER machines have a system called PEF (Protected
Execution Facility) which uses a small ultravisor to allow guests to
run in a way that they can't be eavesdropped by the hypervisor.  The
effect is roughly similar to AMD SEV, although the mechanisms are
quite different.

Most of the work of this is done between the guest, KVM and the
ultravisor, with little need for involvement by qemu.  However qemu
does need to tell KVM to allow secure VMs.

Because the availability of secure mode is a guest visible difference
which depends on having the right hardware and firmware, we don't
enable this by default.  In order to run a secure guest you need to
create a "pef-guest" object and set the confidential-guest-support
property to point to it.

Note that this just *allows* secure guests, the architecture of PEF is
such that the guest still needs to talk to the ultravisor to enter
secure mode.  Qemu has no direct way of knowing if the guest is in
secure mode, and certainly can't know until well after machine
creation time.

To start a PEF-capable guest, use the command line options:
    -object pef-guest,id=pef0 -machine confidential-guest-support=pef0

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

confidential guest support: Update documentation

Now that we've implemented a generic machine option for configuring various
confidential guest support mechanisms:
  1. Update docs/amd-memory-encryption.txt to reference this rather than
     the earlier SEV specific option
  2. Add a docs/confidential-guest-support.txt to cover the generalities of
     the confidential guest support scheme

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

confidential guest support: Move SEV initialization into arch specific code

While we've abstracted some (potential) differences between mechanisms for
securing guest memory, the initialization is still specific to SEV. Given
that, move it into x86's kvm_arch_init() code, rather than the generic
kvm_init() code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

confidential guest support: Introduce cgs "ready" flag

The platform specific details of mechanisms for implementing
confidential guest support may require setup at various points during
initialization. Thus, it's not really feasible to have a single cgs
initialization hook, but instead each mechanism needs its own
initialization calls in arch or machine specific code.

However, to make it harder to have a bug where a mechanism isn't
properly initialized under some circumstances, we want to have a
common place, late in boot, where we verify that cgs has been
initialized if it was requested.

This patch introduces a ready flag to the ConfidentialGuestSupport
base type to accomplish this, which we verify in
qemu_machine_creation_done().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

sev: Add Error ** to sev_kvm_init()

This allows failures to be reported richly and idiomatically.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

confidential guest support: Rework the "memory-encryption" property

Currently the "memory-encryption" property is only looked at once we
get to kvm_init(). Although protection of guest memory from the
hypervisor isn't something that could really ever work with TCG, it's
not conceptually tied to the KVM accelerator.

In addition, the way the string property is resolved to an object is
almost identical to how a QOM link property is handled.

So, create a new "confidential-guest-support" link property which sets
this QOM interface link directly in the machine. For compatibility we
keep the "memory-encryption" property, but now implemented in terms of
the new property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

confidential guest support: Move side effect out of machine_set_memory_encryption()

When the "memory-encryption" property is set, we also disable KSM
merging for the guest, since it won't accomplish anything.

We want that, but doing it in the property set function itself is
thereoretically incorrect, in the unlikely event of some configuration
environment that set the property then cleared it again before
constructing the guest.

More importantly, it makes some other cleanups we want more difficult.
So, instead move this logic to machine_run_board_init() conditional on
the final value of the property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

sev: Remove false abstraction of flash encryption

When AMD's SEV memory encryption is in use, flash memory banks (which are
initialed by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.

That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state.. except, that it doesn't really abstract much at all.

For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with.  But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:

* it assumes that the flash memory is defined to be encrypted with the
   guest key, rather than being shared with hypervisor
* it assumes that that hypervisor has some mechanism to encrypt data into
   the guest, even though it can't decrypt it out, since that's the whole
   point
* the interface assumes that this encrypt can be done in place, which
   implies that the hypervisor can write into a confidential guests's
   memory, even if what it writes isn't meaningful

So really, this "abstraction" is actually pretty specific to the way SEV
works.  So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

confidential guest support: Introduce new confidential guest support class

Several architectures have mechanisms which are designed to protect
guest memory from interference or eavesdropping by a compromised
hypervisor.  AMD SEV does this with in-chip memory encryption and
Intel's TDX can do similar things.  POWER's Protected Execution
Framework (PEF) accomplishes a similar goal using an ultravisor and
new memory protection features, instead of encryption.

To (partially) unify handling for these, this introduces a new
ConfidentialGuestSupport QOM base class.  "Confidential" is kind of vague,
but "confidential computing" seems to be the buzzword about these schemes,
and "secure" or "protected" are often used in connection to unrelated
things (such as hypervisor-from-guest or guest-from-guest security).

The "support" in the name is significant because in at least some of the
cases it requires the guest to take specific actions in order to protect
itself from hypervisor eavesdropping.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

qom: Allow optional sugar props

Global properties have an @optional field, which allows to apply a given
property to a given type even if one of its subclasses doesn't support
it. This is especially used in the compat code when dealing with the
"disable-modern" and "disable-legacy" properties and the "virtio-pci"
type.

Allow object_register_sugar_prop() to set this field as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159738953558.377274.16617742952571083440.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

utils/fifo8: add VMSTATE_FIFO8_TEST macro

Rewrite the existing VMSTATE_FIFO8 macro to use VMSTATE_FIFO8_TEST as per the
standard pattern in include/migration/vmstate.h.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210128221728.14887-3-mark.cave-ayland@ilande.co.uk>

utils/fifo8: change fatal errors from abort() to assert()

Developer errors are better represented with assert() rather than abort(). Also
improve the strictness of the checks by using range checks within the assert()
rather than converting the existing equality checks to inequality checks.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210121102518.20112-1-mark.cave-ayland@ilande.co.uk>

Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210205' into staging

TCGCPUOps cleanups (claudio)
tcg/s390 compare fix (phil)
tcg/aarch64 rotli_vec fix
tcg/tci cleanups and fixes

# gpg: Signature made Fri 05 Feb 2021 22:55:10 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth-gitlab/tags/pull-tcg-20210205: (46 commits)
  accel: introduce AccelCPUClass extending CPUClass
  accel: replace struct CpusAccel with AccelOpsClass
  accel: extend AccelState and AccelClass to user-mode
  cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass
  cpu: move debug_check_watchpoint to tcg_ops
  cpu: move adjust_watchpoint_address to tcg_ops
  physmem: make watchpoint checking code TCG-only
  cpu: move do_unaligned_access to tcg_ops
  cpu: move cc->transaction_failed to tcg_ops
  cpu: move cc->do_interrupt to tcg_ops
  target/arm: do not use cc->do_interrupt for KVM directly
  cpu: Move debug_excp_handler to tcg_ops
  cpu: Move tlb_fill to tcg_ops
  cpu: Move cpu_exec_* to tcg_ops
  cpu: Move synchronize_from_tb() to tcg_ops
  accel/tcg: split TCG-only code from cpu_exec_realizefn
  target/riscv: remove CONFIG_TCG, as it is always TCG
  cpu: Introduce TCGCpuOperations struct
  tcg/tci: Remove TCG_CONST
  tcg/tci: Fix TCG_REG_R4 misusage
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

accel: introduce AccelCPUClass extending CPUClass

add a new optional interface to CPUClass, which allows accelerators
to extend the CPUClass with additional accelerator-specific
initializations.

This will allow to separate the target cpu code that is specific
to each accelerator, and register it automatically with object
hierarchy lookup depending on accelerator code availability,
as part of the accel_init_interfaces() initialization step.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-19-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel: replace struct CpusAccel with AccelOpsClass

This will allow us to centralize the registration of
the cpus.c module accelerator operations (in accel/accel-softmmu.c),
and trigger it automatically using object hierarchy lookup from the
new accel_init_interfaces() initialization step, depending just on
which accelerators are available in the code.

Rename all tcg-cpus.c, kvm-cpus.c, etc to tcg-accel-ops.c,
kvm-accel-ops.c, etc, matching the object type names.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-18-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel: extend AccelState and AccelClass to user-mode

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[claudio: rebased on Richard's splitwx work]

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-17-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass

we cannot in principle make the TCG Operations field definitions
conditional on CONFIG_TCG in code that is included by both common_ss
and specific_ss modules.

Therefore, what we can do safely to restrict the TCG fields to TCG-only
builds, is to move all tcg cpu operations into a separate header file,
which is only included by TCG, target-specific code.

This leaves just a NULL pointer in the cpu.h for the non-TCG builds.

This also tidies up the code in all targets a bit, having all TCG cpu
operations neatly contained by a dedicated data struct.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-16-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: move debug_check_watchpoint to tcg_ops

commit 568496c0c0f1 ("cpu: Add callback to check architectural") and
commit 3826121d9298 ("target-arm: Implement checking of fired")
introduced an ARM-specific hack for cpu_check_watchpoint.

Make debug_check_watchpoint optional, and move it to tcg_ops.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-15-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: move adjust_watchpoint_address to tcg_ops

commit 40612000599e ("arm: Correctly handle watchpoints for BE32 CPUs")

introduced this ARM-specific, TCG-specific hack to adjust the address,
before checking it with cpu_check_watchpoint.

Make adjust_watchpoint_address optional and move it to tcg_ops.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-14-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

physmem: make watchpoint checking code TCG-only

cpu_check_watchpoint, watchpoint_address_matches are TCG-only.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-13-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: move do_unaligned_access to tcg_ops

make it consistently SOFTMMU-only.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio: make the field presence in cpu.h unconditional, removing the ifdefs]
Message-Id: <20210204163931.7358-12-cfontana@suse.de>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: move cc->transaction_failed to tcg_ops

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio: wrap target code around CONFIG_TCG and !CONFIG_USER_ONLY]

avoiding its use in headers used by common_ss code (should be poisoned).

Note: need to be careful with the use of CONFIG_USER_ONLY,
Message-Id: <20210204163931.7358-11-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: move cc->do_interrupt to tcg_ops

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-10-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: do not use cc->do_interrupt for KVM directly

cc->do_interrupt is in theory a TCG callback used in accel/tcg only,
to prepare the emulated architecture to take an interrupt as defined
in the hardware specifications,

but in reality the _do_interrupt style of functions in targets are
also occasionally reused by KVM to prepare the architecture state in a
similar way where userspace code has identified that it needs to
deliver an exception to the guest.

In the case of ARM, that includes:

1) the vcpu thread got a SIGBUS indicating a memory error,
   and we need to deliver a Synchronous External Abort to the guest to
   let it know about the error.
2) the kernel told us about a debug exception (breakpoint, watchpoint)
   but it is not for one of QEMU's own gdbstub breakpoints/watchpoints
   so it must be a breakpoint the guest itself has set up, therefore
   we need to deliver it to the guest.

So in order to reuse code, the same arm_do_interrupt function is used.
This is all fine, but we need to avoid calling it using the callback
registered in CPUClass, since that one is now TCG-only.

Fortunately this is easily solved by replacing calls to
CPUClass::do_interrupt() with explicit calls to arm_do_interrupt().

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210204163931.7358-9-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: Move debug_excp_handler to tcg_ops

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-8-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: Move tlb_fill to tcg_ops

[claudio: wrapped target code in CONFIG_TCG]

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-7-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: Move cpu_exec_* to tcg_ops

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: wrapped target code in CONFIG_TCG]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-6-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: Move synchronize_from_tb() to tcg_ops

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: wrapped target code in CONFIG_TCG, reworded comments]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-5-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: split TCG-only code from cpu_exec_realizefn

move away TCG-only code, make it compile only on TCG.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio: moved the prototypes from hw/core/cpu.h to exec/cpu-all.h]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-4-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/riscv: remove CONFIG_TCG, as it is always TCG

for now only TCG is allowed as an accelerator for riscv,
so remove the CONFIG_TCG use.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-3-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

cpu: Introduce TCGCpuOperations struct

The TCG-specific CPU methods will be moved to a separate struct,
to make it easier to move accel-specific code outside generic CPU
code in the future. Start by moving tcg_initialize().

The new CPUClass.tcg_opts field may eventually become a pointer,
but keep it an embedded struct for now, to make code conversion
easier.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: move TCGCpuOperations inside include/hw/core/cpu.h]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-2-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/tci: Remove TCG_CONST

Restrict all operands to registers. All constants will be forced
into registers by the middle-end. Removing the difference in how
immediate integers were encoded will allow more code to be shared
between 32-bit and 64-bit operations.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/tci: Fix TCG_REG_R4 misusage

This was removed from tcg_target_reg_alloc_order and
tcg_target_call_iarg_regs on the assumption that it
was the stack. This was incorrectly copied from i386.
For tci, the stack is R15.

By adding R4 back to tcg_target_call_iarg_regs, adjust the other
entries so that 6 (or 12) entries are still present in the array,
and adjust the numbers in the interpreter.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>