Peter Xu [Wed, 2 May 2018 10:47:37 +0000 (18:47 +0800)]
hmp/migration: add migrate_recover command
Sister command to migrate-recover in QMP.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-22-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:36 +0000 (18:47 +0800)]
qmp/migration: new command migrate-recover
The first allow-oob=true command. It is used on the destination side
when the postcopy migration is paused and ready for a recovery. After
execution, a new migration channel will be established for postcopy to
continue.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-21-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.12/2.13/
Peter Xu [Wed, 2 May 2018 10:47:35 +0000 (18:47 +0800)]
migration: init dst in migration_object_init too
Though we may not need it, we now init both the src/dst migration
objects in migration_object_init() so that even the incoming migration
object is thread safe (it was not before).
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-20-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:34 +0000 (18:47 +0800)]
migration: final handshake for the resume
Finish the last step to do the final handshake for the recovery.
First, the source sends one MIG_CMD_POSTCOPY_RESUME to the destination,
telling it that the source is ready to resume.
Then, the destination replies with MIG_RP_MSG_RESUME_ACK to the source,
telling it that the destination is ready to resume (after switching to
the postcopy-active state).
When the source receives the RESUME_ACK, it switches its state to
postcopy-active, and the recovery is finally complete.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-19-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
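The handshake order described in "migration: final handshake for the resume" can be sketched as a tiny state model. This is only an illustration under stated assumptions: `Side` and `resume_handshake` are hypothetical names, the threads and channels of the real C implementation are not modeled, and only the state and message names are taken from the commit messages in this series.

```python
from dataclasses import dataclass, field


@dataclass
class Side:
    """Hypothetical stand-in for one end of the migration."""
    name: str
    state: str = "postcopy-recover"
    sent: list = field(default_factory=list)


def resume_handshake(src: Side, dst: Side) -> None:
    # Step 1: the source announces that it is ready to resume.
    src.sent.append("MIG_CMD_POSTCOPY_RESUME")
    # Step 2: the destination switches state first, then acknowledges.
    dst.state = "postcopy-active"
    dst.sent.append("MIG_RP_MSG_RESUME_ACK")
    # Step 3: the source switches only once the ACK has been received.
    if "MIG_RP_MSG_RESUME_ACK" in dst.sent:
        src.state = "postcopy-active"


src, dst = Side("src"), Side("dst")
resume_handshake(src, dst)
```

The point of the ordering is that the destination is already in postcopy-active by the time the source leaves the recover state.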
Peter Xu [Wed, 2 May 2018 10:47:33 +0000 (18:47 +0800)]
migration: setup ramstate for resume
After we updated the dirty bitmaps of ramblocks, we also need to update
the critical fields in RAMState to make sure it is ready for a resume.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-18-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:32 +0000 (18:47 +0800)]
migration: synchronize dirty bitmap for resume
This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.
When the migration is interrupted by a network failure, the dirty bitmap
on the source side becomes meaningless, because even if a dirty bit is
cleared, it is still possible that the sent page was lost along the way
to the destination. Here, instead of continuing the migration with the
old dirty bitmap on the source, we ask the destination side to send back
its received bitmap, then invert it to be our initial dirty bitmap.
The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
once per ramblock, to ask for the received bitmap. On destination side,
MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
Data will be received on the return-path thread of source, and the main
migration thread will be notified when all the ramblock bitmaps are
synchronized.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-17-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
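The inversion step from "migration: synchronize dirty bitmap for resume" is simple enough to show directly. A minimal sketch, assuming a bitmap is modeled as a Python integer with one bit per page; `initial_dirty_bitmap` is a hypothetical helper, not a QEMU function.

```python
def initial_dirty_bitmap(received: int, nr_pages: int) -> int:
    """Invert a received-page bitmap, keeping only the low nr_pages bits.

    A page the destination has not received must be treated as dirty,
    whatever the old source-side dirty bit said.
    """
    mask = (1 << nr_pages) - 1
    return ~received & mask


# Eight pages; the destination reports pages 0, 1 and 5 as received.
received = 0b00100011
dirty = initial_dirty_bitmap(received, 8)
```

Here `dirty` has exactly the five bits set that `received` has clear, so those are the pages the source would send again.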
Peter Xu [Wed, 2 May 2018 10:47:31 +0000 (18:47 +0800)]
migration: introduce SaveVMHandlers.resume_prepare
This is a hook function to be called when a postcopy migration wants to
resume from a failure. For each module, it should provide its own
recovery logic before we switch to the postcopy-active state.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-16-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:30 +0000 (18:47 +0800)]
migration: new message MIG_RP_MSG_RESUME_ACK
Create a new message to reply to MIG_CMD_POSTCOPY_RESUME. One uint32_t
is used as payload to let the source know whether destination is ready
to continue the migration.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-15-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:29 +0000 (18:47 +0800)]
migration: new cmd MIG_CMD_POSTCOPY_RESUME
Introduce this new command, to be sent when the source VM is ready to
resume the paused migration. What the destination does here is
basically release the fault thread so it continues servicing page faults.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-14-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:28 +0000 (18:47 +0800)]
migration: new message MIG_RP_MSG_RECV_BITMAP
Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send
received bitmap of ramblock back to source.
This is the reply message of MIG_CMD_RECV_BITMAP. It contains not only
the header (including the ramblock name) but also, appended to it, the
whole received bitmap of the ramblock on the destination side.
When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it and converts it to the dirty bitmap by inverting the bits.
One thing to mention is that, when we send the recv bitmap, we do two
extra things:
- convert the bitmap to little endian, to support hosts that use
  different endianness on src/dst.
- align the bitmap to 8 bytes, to support hosts that use different
  word sizes (32/64 bits) on src/dst.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-13-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
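The two portability measures listed in "migration: new message MIG_RP_MSG_RECV_BITMAP" (little-endian byte order and padding to 8 bytes) can be illustrated as follows. This is a sketch under assumptions, not the actual wire format: the header with the ramblock name and any size fields are left out, and `encode_recv_bitmap` and `decode_recv_bitmap` are hypothetical names.

```python
def encode_recv_bitmap(bitmap: int, nr_pages: int) -> bytes:
    """Serialize a bitmap as little-endian bytes, padded to 8 bytes."""
    nbytes = (nr_pages + 7) // 8        # bytes needed to hold nr_pages bits
    padded = (nbytes + 7) // 8 * 8      # round up to a multiple of 8 bytes
    return bitmap.to_bytes(padded, "little")


def decode_recv_bitmap(payload: bytes) -> int:
    """Parse the little-endian payload back into an integer bitmap."""
    return int.from_bytes(payload, "little")


# Three pages, pages 0 and 2 received: one byte of data, padded to eight.
payload = encode_recv_bitmap(0b101, 3)
```

Because the byte order and the padding are fixed, both ends read the same bits regardless of the host's native endianness or `long` size.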
Peter Xu [Wed, 2 May 2018 10:47:27 +0000 (18:47 +0800)]
migration: new cmd MIG_CMD_RECV_BITMAP
Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for
one ramblock.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-12-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:26 +0000 (18:47 +0800)]
migration: wakeup dst ram-load-thread for recover
On the destination side, we cannot wake up all the threads when we get
reconnected. The first thing to do is to wake up the main load thread,
so that we can continue to receive valid messages from source again and
reply when needed.
At this point, we switch the destination VM state from postcopy-paused
back to postcopy-recover.
Now we are finally ready to do the resume logic.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-11-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:25 +0000 (18:47 +0800)]
migration: new state "postcopy-recover"
Introducing new migration state "postcopy-recover". If a migration
procedure is paused and the connection is rebuilt afterward
successfully, we'll switch the source VM state from "postcopy-paused" to
the new state "postcopy-recover", then we'll do the resume logic in the
migration thread (along with the return path thread).
This patch only does the state switch on the source side. A follow-up
patch will handle the state switching on the destination side using the
same status bit.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-10-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.11/2.13/
Peter Xu [Wed, 2 May 2018 10:47:24 +0000 (18:47 +0800)]
migration: rebuild channel on source
This patch detects the "resume" flag of the migrate command and rebuilds
the channels only if the flag is set.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-9-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:23 +0000 (18:47 +0800)]
qmp: hmp: add migrate "resume" option
It will be used when we want to resume a paused migration.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.12/2.13/
Peter Xu [Wed, 2 May 2018 10:47:22 +0000 (18:47 +0800)]
migration: allow fault thread to pause
Allow the fault thread to stop handling page faults temporarily. When a
network failure happens (and if we expect a recovery afterwards), we
should not allow the fault thread to continue sending things to the
source. Instead, it should halt until the connection is rebuilt.
When the dest main thread notices the failure, it kicks the fault thread
to switch to the pause state.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:21 +0000 (18:47 +0800)]
migration: allow src return path to pause
Let the thread pause for network issues.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:20 +0000 (18:47 +0800)]
migration: allow dst vm pause on postcopy
When there is an IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSE state. Currently it is still simple: it waits on the
new semaphore until someone pokes it for another attempt.
One note is that here on ram loading thread we cannot detect the
POSTCOPY_ACTIVE state, but we need to detect the more specific
POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all
the device states.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-5-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:19 +0000 (18:47 +0800)]
migration: implement "postcopy-pause" src logic
Now when the network goes down during postcopy, the source side will not
fail the migration. Instead we convert the status into this new paused
state, and we will wait for a rescue in the future.
If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-4-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:18 +0000 (18:47 +0800)]
migration: new postcopy-pause state
Introducing a new state "postcopy-paused", which can be used when the
postcopy migration is paused. It is targeted for postcopy network
failure recovery.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-3-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Wed, 2 May 2018 10:47:17 +0000 (18:47 +0800)]
migration: let incoming side use thread context
The old incoming migration is running in main thread and default
gcontext. With the new qio_channel_add_watch_full() we can now let it
run in the thread's own gcontext (if there is one).
On its own, this patch changes nothing. But when an incoming migration
is run in another iothread (e.g., by the upcoming migrate-recover
command), this patch will bind the incoming logic to that iothread
instead of the main thread (which may already have page faulted and
hung).
RDMA is not considered for now since it's not even using the QIO watch
framework at all.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-2-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Juan Quintela [Sat, 7 Apr 2018 11:59:07 +0000 (13:59 +0200)]
migration: Define MultifdRecvParams sooner
Once there, we don't need the struct names anywhere, just the
typedefs. And now also document all fields.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Juan Quintela [Fri, 6 Apr 2018 17:32:12 +0000 (19:32 +0200)]
migration: Transmit initial package through the multifd channels
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
--
Be network agnostic.
Add error checking for all values.
Juan Quintela [Wed, 7 Mar 2018 07:40:52 +0000 (08:40 +0100)]
migration: Delay start of migration main routines
We need to make sure that we have started all the multifd threads.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Juan Quintela [Wed, 7 Mar 2018 06:56:15 +0000 (07:56 +0100)]
migration: Create multifd channels
In both sides. We still don't transmit anything through them.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Juan Quintela [Wed, 28 Feb 2018 11:05:15 +0000 (12:05 +0100)]
migration: Export functions to create send channels
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Juan Quintela [Mon, 19 Feb 2018 17:59:02 +0000 (18:59 +0100)]
migration: Be sure all recv channels are created
We need them before we start migration.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Juan Quintela [Mon, 19 Feb 2018 18:01:45 +0000 (19:01 +0100)]
migration: terminate_* can be called for other threads
Once there, make the count field always be accessed with atomic
operations. To perform blocking operations, we need to know that the
thread is running, so create a bool to indicate that.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
--
Once here, s/terminate_multifd_*-threads/multifd_*_terminate_threads/
This is consistent with every other function
Juan Quintela [Mon, 19 Feb 2018 18:01:03 +0000 (19:01 +0100)]
migration: Introduce multifd_recv_new_channel()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Juan Quintela [Mon, 19 Feb 2018 18:01:15 +0000 (19:01 +0100)]
migration: Set error state in case of error
Signed-off-by: Juan Quintela <quintela@redhat.com>
Juan Quintela [Thu, 18 Jan 2018 17:44:17 +0000 (18:44 +0100)]
tests: Migration ppc now inlines its program
No need to write it to a file. Just need a proper firmware O:-)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Juan Quintela [Fri, 5 Jan 2018 12:26:01 +0000 (13:26 +0100)]
tests: Add migration precopy test
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Xiao Guangrong [Sat, 28 Apr 2018 08:10:45 +0000 (16:10 +0800)]
migration: fix saving normal page even if it's been compressed
Fix the bug introduced by da3f56cb2e767016 (migration: remove
ram_save_compressed_page()). It should be 'return' rather than 'res'.
Sorry for this stupid mistake :(
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Maydell [Tue, 15 May 2018 16:02:00 +0000 (17:02 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- Switch AIO/callback based block drivers to a byte-based interface
- Block jobs: Expose error string via query-block-jobs
- Block job cleanups and fixes
- hmp: Allow using a qdev id in block_set_io_throttle
# gpg: Signature made Tue 15 May 2018 16:33:10 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (37 commits)
iotests: Add test for -U/force-share conflicts
qemu-img: Use only string options in img_open_opts
qemu-io: Use purely string blockdev options
block: Document BDRV_REQ_WRITE_UNCHANGED support
qemu-img: Check post-truncation size
iotests: Add test for COR across nodes
iotests: Copy 197 for COR filter driver
iotests: Clean up wrap image in 197
block: Support BDRV_REQ_WRITE_UNCHANGED in filters
block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
block: Add BDRV_REQ_WRITE_UNCHANGED flag
block: BLK_PERM_WRITE includes ..._UNCHANGED
block: Add COR filter driver
iotests: Skip 181 and 201 without userfaultfd
iotests: Add failure matching to common.qemu
docs: Document the new default sizes of the qcow2 caches
qcow2: Give the refcount cache the minimum possible size by default
specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
Fix error message about compressed clusters with OFLAG_COPIED
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Kevin Wolf [Tue, 15 May 2018 14:19:53 +0000 (16:19 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-2018-05-15' into queue-block
- Copy-on-read block driver
- The qcow2 default refcount cache size has been decreased
- Various bug fixes
# gpg: Signature made Tue May 15 16:18:25 2018 CEST
# gpg: using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2018-05-15: (21 commits)
iotests: Add test for -U/force-share conflicts
qemu-img: Use only string options in img_open_opts
qemu-io: Use purely string blockdev options
block: Document BDRV_REQ_WRITE_UNCHANGED support
qemu-img: Check post-truncation size
iotests: Add test for COR across nodes
iotests: Copy 197 for COR filter driver
iotests: Clean up wrap image in 197
block: Support BDRV_REQ_WRITE_UNCHANGED in filters
block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
block: Add BDRV_REQ_WRITE_UNCHANGED flag
block: BLK_PERM_WRITE includes ..._UNCHANGED
block: Add COR filter driver
iotests: Skip 181 and 201 without userfaultfd
iotests: Add failure matching to common.qemu
docs: Document the new default sizes of the qcow2 caches
qcow2: Give the refcount cache the minimum possible size by default
specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
Fix error message about compressed clusters with OFLAG_COPIED
...
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Wed, 2 May 2018 20:20:51 +0000 (22:20 +0200)]
iotests: Add test for -U/force-share conflicts
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Wed, 2 May 2018 20:20:50 +0000 (22:20 +0200)]
qemu-img: Use only string options in img_open_opts
img_open_opts() takes a QemuOpts and converts them to a QDict, so all
values therein are strings. Then it may try to call qdict_get_bool(),
however, which will fail with a segmentation fault every time:
    $ ./qemu-img info -U --image-opts \
        driver=file,filename=/dev/null,force-share=off
    [1] 27869 segmentation fault (core dumped) ./qemu-img info -U
        --image-opts driver=file,filename=/dev/null,force-share=off
Fix this by using qdict_get_str() and comparing the value as a string.
Also, when adding a force-share value to the QDict, add it as a string
so it fits the rest of the dict.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
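The idea behind "qemu-img: Use only string options in img_open_opts" is that every value in the options dict is a string, so the -U conflict check has to be a string comparison. The sketch below models that in Python under assumptions: `apply_force_share` is a hypothetical helper and a plain `dict` stands in for the QDict; it is not the C code from the patch.

```python
def apply_force_share(options: dict) -> bool:
    """Apply -U to string-valued options.

    Returns False if an explicit force-share value conflicts with -U,
    True otherwise. The value is compared and stored as a string,
    never as a boolean.
    """
    value = options.get("force-share")
    if value is not None and value != "on":
        return False                     # e.g. force-share=off plus -U
    options["force-share"] = "on"        # stored as a string, like the rest
    return True


conflicting = {"driver": "file", "filename": "/dev/null", "force-share": "off"}
plain = {"driver": "file", "filename": "/dev/null"}
ok_conflicting = apply_force_share(conflicting)
ok_plain = apply_force_share(plain)
```

With the conflicting options the helper reports the conflict instead of misreading the string as a boolean, which is the failure mode the commit describes.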
Max Reitz [Wed, 2 May 2018 20:20:49 +0000 (22:20 +0200)]
qemu-io: Use purely string blockdev options
Currently, qemu-io only uses string-valued blockdev options (as all are
converted directly from QemuOpts) -- with one exception: -U adds the
force-share option as a boolean. This in itself is already a bit
questionable, but a real issue is that it also assumes the value already
existing in the options QDict would be a boolean, which is wrong.
That has the following effect:
    $ ./qemu-io -r -U --image-opts \
        driver=file,filename=/dev/null,force-share=off
    [1] 15200 segmentation fault (core dumped) ./qemu-io -r -U
        --image-opts driver=file,filename=/dev/null,force-share=off
Since @opts is converted from QemuOpts, the value must be a string, and
we have to compare it as such. Consequently, it makes sense to also set
it as a string instead of a boolean.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Wed, 2 May 2018 14:03:59 +0000 (16:03 +0200)]
block: Document BDRV_REQ_WRITE_UNCHANGED support
Add BDRV_REQ_WRITE_UNCHANGED to the list of flags honored during pwrite
and pwrite_zeroes, and also add a note on when you absolutely need to
support it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502140359.18222-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 16:39:57 +0000 (18:39 +0200)]
qemu-img: Check post-truncation size
Some block drivers (iscsi and file-posix when dealing with device files)
do not actually support truncation, even though they provide a
.bdrv_truncate() method and will happily return success when providing a
new size that does not exceed the current size. This is because these
drivers expect the user to resize the image outside of qemu and then
provide qemu with that information through the block_resize command
(compare cb1b83e740384b4e0d950f3d7c81c02b8ce86c2e).
Of course, anyone using qemu-img resize will find that behavior useless.
So we should check the actual size of the image after the supposedly
successful truncation took place, emit an error if nothing changed and
emit a warning if the target size was not met.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1523065
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421163957.29872-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
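The decision described in "qemu-img: Check post-truncation size" (error if nothing changed, warning if the target was not met) can be written as a small classifier. This is a sketch under assumptions: `classify_resize` and its string results are hypothetical and only mirror the rule stated in the commit message, not the actual qemu-img code or its messages.

```python
def classify_resize(old_size: int, requested: int, new_size: int) -> str:
    """Classify the outcome of a truncation that reported success."""
    if new_size == requested:
        return "ok"        # the image really has the requested size
    if new_size == old_size:
        return "error"     # the driver claimed success but changed nothing
    return "warning"       # the size changed, but not to the target


# A device file that silently ignores a shrink request:
result = classify_resize(old_size=1 << 30, requested=1 << 29, new_size=1 << 30)
```

Checking for the requested size first means a no-op resize to the current size still counts as a success.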
Max Reitz [Sat, 21 Apr 2018 13:29:29 +0000 (15:29 +0200)]
iotests: Add test for COR across nodes
COR across nodes (that is, you have some filter node between the
actual COR target and the node that performs the COR) cannot reliably
work together with the permission system when there is no explicit COR
node that can request the WRITE_UNCHANGED permission for its child.
This is because COR (currently) sneaks its requests by the usual
permission checks, so it can work without a WRITE* permission; but if
there is a filter node in between, that will re-issue the request, which
then passes through the usual check -- and if nobody has requested a
WRITE_UNCHANGED permission, that check will fail.
There is no real direct fix apart from hoping that there is someone who
has requested that permission; in case of just the qemu-io HMP command
(and no guest device), however, that is not the case. The real real fix
is to implement the copy-on-read flag through an implicitly added COR
node. Such a node can request the necessary permissions as shown in
this test.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-10-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:28 +0000 (15:29 +0200)]
iotests: Copy 197 for COR filter driver
iotest 197 tests copy-on-read using the (now old) copy-on-read flag.
Copy it to 215 and modify it to use the COR filter driver instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-9-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:27 +0000 (15:29 +0200)]
iotests: Clean up wrap image in 197
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-8-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:26 +0000 (15:29 +0200)]
block: Support BDRV_REQ_WRITE_UNCHANGED in filters
Update the rest of the filter drivers to support
BDRV_REQ_WRITE_UNCHANGED. They already forward write request flags to
their children, so we just have to announce support for it.
This patch does not cover the replication driver because that currently
does not support flags at all, and because it just grabs the WRITE
permission for its children when it can, so we should be fine just
submitting the incoming WRITE_UNCHANGED requests as normal writes.
It also does not cover format drivers for similar reasons. They all use
bdrv_format_default_perms() as their .bdrv_child_perm() implementation
so they just always grab the WRITE permission for their file children
whenever possible. In addition, it often would be difficult to
ascertain whether incoming unchanging writes end up as unchanging writes
in their files. So we just leave them as normal potentially changing
writes.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-7-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:25 +0000 (15:29 +0200)]
block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
We just need to forward it to quorum's children (except in case of a
rewrite because of corruption), but for that we first have to support
flags in child requests at all.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-6-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:24 +0000 (15:29 +0200)]
block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-5-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:23 +0000 (15:29 +0200)]
block: Add BDRV_REQ_WRITE_UNCHANGED flag
This flag signifies that a write request will not change the visible
disk content. With this flag set, it is sufficient to have the
BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-4-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
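The permission rule stated in "block: Add BDRV_REQ_WRITE_UNCHANGED flag", combined with the note that WRITE includes WRITE_UNCHANGED, can be sketched as a bitmask check. The constant names come from the commit messages; the numeric values below are placeholders chosen for the example, and `write_permitted` is a hypothetical helper rather than a QEMU function.

```python
# Placeholder bit values for illustration only.
BLK_PERM_WRITE = 0x02
BLK_PERM_WRITE_UNCHANGED = 0x04
BDRV_REQ_WRITE_UNCHANGED = 0x40


def write_permitted(perm: int, req_flags: int) -> bool:
    """Return True if a write with req_flags is allowed under perm."""
    if req_flags & BDRV_REQ_WRITE_UNCHANGED:
        # An unchanging write needs only the weaker permission,
        # but the stronger WRITE permission covers it as well.
        return bool(perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED))
    # A normal write always needs the full WRITE permission.
    return bool(perm & BLK_PERM_WRITE)


cor_allowed = write_permitted(BLK_PERM_WRITE_UNCHANGED, BDRV_REQ_WRITE_UNCHANGED)
plain_allowed = write_permitted(BLK_PERM_WRITE_UNCHANGED, 0)
```

A copy-on-read write therefore passes with only WRITE_UNCHANGED held, while an ordinary write under the same permissions does not.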
Max Reitz [Sat, 21 Apr 2018 13:29:22 +0000 (15:29 +0200)]
block: BLK_PERM_WRITE includes ..._UNCHANGED
Currently we never actually check whether the WRITE_UNCHANGED
permission has been taken for unchanging writes. But the one check that
is commented out checks both WRITE and WRITE_UNCHANGED; and considering
that WRITE_UNCHANGED is already documented as being weaker than WRITE,
we should probably explicitly document WRITE to include WRITE_UNCHANGED.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Sat, 21 Apr 2018 13:29:21 +0000 (15:29 +0200)]
block: Add COR filter driver
This adds a simple copy-on-read filter driver. It relies on the already
existing COR functionality in the central block layer code, which may be
moved here once we no longer need it there.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Fri, 6 Apr 2018 15:17:31 +0000 (17:17 +0200)]
iotests: Skip 181 and 201 without userfaultfd
userfaultfd support depends on the host kernel, so it may not be
available. If so, 181 and 201 should be skipped.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180406151731.4285-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Fri, 6 Apr 2018 15:17:30 +0000 (17:17 +0200)]
iotests: Add failure matching to common.qemu
Currently, common.qemu only allows matching for results indicating
success. The only way to fail is by provoking a timeout. However,
sometimes we do have a defined failure output and can match for that,
which saves us from having to wait for the timeout in case of failure.
Because failure can sometimes just result in a _notrun in the test, it
is actually important to care about being able to fail quickly.
Also, sometimes we simply do not get any specific output in case of
success. The only way to handle this currently would be to define an
error message as the string to look for, which means that actual success
results in a timeout. This is really bad because it unnecessarily slows
down a succeeding test.
Therefore, this patch adds a new parameter $success_or_failure to
_timed_wait_for and _send_qemu_cmd. Setting this to a non-empty string
makes both commands expect two match parameters: If the first matches,
the function succeeds. If the second matches, the function fails.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180406151731.4285-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
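The two-pattern matching added in "iotests: Add failure matching to common.qemu" lives in a bash helper; the logic can be modeled in a few lines of Python. This is a sketch of the idea only: `wait_for` is a hypothetical function, substring matching stands in for the shell matching, and running out of output stands in for the timeout.

```python
def wait_for(lines, success, failure=None):
    """Scan output lines and stop at the first decisive match.

    Returns True when the success pattern is seen and False when the
    failure pattern is seen. Raises TimeoutError if neither appears,
    which models having to wait for the full timeout.
    """
    for line in lines:
        if success in line:
            return True
        if failure is not None and failure in line:
            return False
    raise TimeoutError("no matching output")


output = ["QMP greeting", '{"error": {"class": "GenericError"}}']
matched = wait_for(output, success='"return"', failure='"error"')
```

With a failure pattern supplied, the defined error output ends the wait immediately instead of running into the timeout.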
Alberto Garcia [Tue, 17 Apr 2018 12:37:05 +0000 (15:37 +0300)]
docs: Document the new default sizes of the qcow2 caches
We have just reduced the refcount cache size to the minimum unless
the user explicitly requests a larger one, so we have to update the
documentation to reflect this change.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c5f0bde23558dd9d33b21fffc76ac9953cc19c56.1523968389.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Alberto Garcia [Tue, 17 Apr 2018 12:37:04 +0000 (15:37 +0300)]
qcow2: Give the refcount cache the minimum possible size by default
The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size parameters (an
additional parameter named cache-size sets the combined size of both
caches).
Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.
This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:
a) The amount of disk covered by an L2 table depends solely on the
cluster size, but in the case of a refcount block it depends on
the cluster size *and* the width of each refcount entry.
The 4/1 ratio is only valid with 16-bit entries (the default).
b) When we talk about disk space and L2 tables we are talking about
guest space (L2 tables map guest clusters to host clusters),
whereas refcount blocks are used for host clusters (including
L1/L2 tables and the refcount blocks themselves). On a fully
populated (and uncompressed) qcow2 file, image size > virtual size
so there are more refcount entries than L2 entries.
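The arithmetic behind point (a) can be sketched as follows. This is an illustration only, not QEMU code: the helper names are made up, and the sketch assumes that an L2 table and a refcount block each occupy one cluster and that L2 entries are 8 bytes wide.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers, not QEMU functions. */

/* Bytes of guest disk mapped by one L2 table (one cluster of 8-byte entries). */
uint64_t l2_table_coverage(uint64_t cluster_size)
{
    assert(cluster_size >= 512 && (cluster_size & (cluster_size - 1)) == 0);
    return (cluster_size / 8) * cluster_size;
}

/* Bytes of host file covered by one refcount block
 * (one cluster of entries, each refcount_bits wide). */
uint64_t refcount_block_coverage(uint64_t cluster_size, unsigned refcount_bits)
{
    assert(refcount_bits >= 1 && refcount_bits <= 64);
    return (cluster_size * 8 / refcount_bits) * cluster_size;
}
```

With 64 KiB clusters an L2 table maps 512 MiB. A refcount block covers 2 GiB with 16-bit entries (the 4/1 ratio), but only 512 MiB with 64-bit entries.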
Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
slightly increasing the refcount cache size to account for the clusters
used for qcow2 metadata.
However this patch takes a completely different approach and instead
of keeping a ratio between both cache sizes it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.
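A minimal sketch of that split, assuming a combined budget and an upper bound on what the L2 cache can usefully hold. The type and function names are hypothetical, and enforcing the minimum cache sizes is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; the field names are illustrative. */
struct cache_sizes {
    uint64_t l2;
    uint64_t refcount;
};

/*
 * Split a combined cache budget: the L2 cache is served first, capped
 * at max_l2 (an assumed input: the most it can make use of), and the
 * refcount cache gets whatever is left.
 */
struct cache_sizes split_cache(uint64_t combined, uint64_t max_l2)
{
    struct cache_sizes s;

    s.l2 = combined < max_l2 ? combined : max_l2;
    s.refcount = combined - s.l2;
    return s;
}
```

Unlike the old fixed ratio, the refcount share here only grows once the L2 cache cannot use any more memory.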
The reason is that L2 tables are used for every single I/O request
from the guest and the effect of increasing the cache is significant
and clearly measurable. Refcount blocks are however only used for
cluster allocation and internal snapshots and in practice are accessed
sequentially in most cases, so the effect of increasing the cache is
negligible (even when doing random writes from the guest).
So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id:
9695182c2eb11b77cb319689a1ebaa4e7c9d6591.
1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Alberto Garcia [Tue, 10 Apr 2018 16:05:04 +0000 (18:05 +0200)]
specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
Compressed clusters are not supposed to have the COPIED bit set, but
this is not made explicit in the specs, so let's document it.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id:
74552e1d6e858d3159cb0c0e188e80bc9248e337.
1523376013.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Alberto Garcia [Tue, 10 Apr 2018 16:05:03 +0000 (18:05 +0200)]
Fix error message about compressed clusters with OFLAG_COPIED
Compressed clusters are not supposed to have the COPIED bit set.
"qemu-img check" detects that and prints an error message reporting
the number of the affected host cluster. This doesn't make much sense
because compressed clusters are not aligned to host clusters, so it
would be better to report the offset instead. Plus, the calculation is
wrong and it uses the raw L2 entry as if it was simply an offset.
This patch fixes the error message and reports the offset of the
compressed cluster.
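To illustrate why the raw L2 entry cannot be used as an offset: as I read the qcow2 specification, a compressed cluster descriptor keeps the host offset in its low x bits, with x = 62 - (cluster_bits - 8), and the sector count and flag bits above that. The helper below is a sketch under that assumption, not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper, not the QEMU implementation. */
uint64_t compressed_cluster_offset(uint64_t l2_entry, unsigned cluster_bits)
{
    unsigned x;

    /* Range of cluster sizes assumed for this sketch: 512 bytes to 2 MiB. */
    assert(cluster_bits >= 9 && cluster_bits <= 21);

    /* The offset occupies bits 0..x-1; mask off everything above. */
    x = 62 - (cluster_bits - 8);
    return l2_entry & ((UINT64_C(1) << x) - 1);
}
```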
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id:
0f687957feb72e80c740403191a47e607c2463fe.
1523376013.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Max Reitz [Fri, 6 Apr 2018 16:41:08 +0000 (18:41 +0200)]
iotests: Split 214 off of 122
Commit abd3622cc03cf41ed542126a540385f30a4c0175 added a case to 122
regarding how the qcow2 driver handles an incorrect compressed data
length value. This does not really fit into 122, as that file is
supposed to contain qemu-img convert test cases, which this case is not.
So this patch splits it off into its own file; maybe we will even get
more qcow2-only compression tests in the future.
Also, that test case does not work with refcount_bits=1, so mark that
option as unsupported.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id:
20180406164108.26118-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Kevin Wolf [Fri, 19 Jan 2018 14:54:40 +0000 (15:54 +0100)]
blockjob: Add block_job_driver()
The backup block job directly accesses the driver field in BlockJob. Add
a wrapper for getting it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf [Thu, 18 Jan 2018 20:19:38 +0000 (21:19 +0100)]
blockjob: Introduce block_job_ratelimit_get_delay()
This gets rid of more direct accesses to BlockJob fields from the
job drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf [Thu, 18 Jan 2018 19:25:40 +0000 (20:25 +0100)]
blockjob: Implement block_job_set_speed() centrally
All block job drivers support .set_speed and all of them duplicate the
same code to implement it. Move that code to blockjob.c and remove the
now useless callback.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf [Thu, 18 Jan 2018 19:20:24 +0000 (20:20 +0100)]
blockjob: Move RateLimit to BlockJob
Every block job has a RateLimit, and they all do the exact same thing
with it, so it should be common infrastructure. Move the struct field
for a start.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf [Thu, 18 Jan 2018 17:08:22 +0000 (18:08 +0100)]
blockjob: Wrappers for progress counter access
Block job drivers are not expected to mess with the internals of the
BlockJob object, so provide wrapper functions for one of the cases where
they still do it: Updating the progress counter.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf [Tue, 8 May 2018 09:55:30 +0000 (11:55 +0200)]
blockjob: Fix assertion in block_job_finalize()
Every job gets a non-NULL job->txn on creation, but it doesn't
necessarily keep it until it is decommissioned: Finalising a job removes
it from its transaction. Therefore, calling 'blockdev-job-finalize' a
second time on an already concluded job causes an assertion failure.
Remove job->txn from the assertion in block_job_finalize() to fix this.
block_job_do_finalize() still has the same assertion, but if a job is
already removed from its transaction, block_job_apply_verb() will
already error out before we run into that assertion.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
John Snow [Tue, 8 May 2018 23:36:59 +0000 (19:36 -0400)]
blockjob: expose error string via query
When we've reached the concluded state, we need to expose the error
state if applicable. Add the new field.
This should be sufficient for determining if a job completed
successfully or not after concluding; if we want to discriminate
based on how it failed more mechanically, we can always add an
explicit return code enumeration later.
I didn't bother to make it only show up if we are in the concluded
state; I don't think it's necessary.
Cc: qemu-stable@nongnu.org
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alberto Garcia [Fri, 9 Mar 2018 14:11:07 +0000 (16:11 +0200)]
hmp: Allow using a qdev id in block_set_io_throttle
The QMP version of this command can take a qdev ID since 7a9877a02635,
but the HMP version is still using the deprecated block device name so
there's no way to refer to a block device added like this:
-blockdev node-name=disk0,driver=qcow2,file.driver=file,file.filename=hd.qcow2
-device virtio-blk-pci,id=virtio-blk-pci0,drive=disk0
This patch works around this problem by using the specified name as a
qdev ID if the block device name is not found.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 22:01:57 +0000 (17:01 -0500)]
block: Merge .bdrv_co_writev{,_flags} in drivers
We have too many driver callback interfaces; simplify the mess
somewhat by merging the flags parameter of .bdrv_co_writev_flags()
into .bdrv_co_writev(). Note that as long as a driver doesn't set
.supported_write_flags, the flags argument will be 0 and behavior is
identical. Also note that the public function bdrv_co_writev() still
lacks a flags argument; so the driver signature is thus intentionally
slightly different. But that's not the end of the world, nor the first
time that the driver interface differs slightly from the public
interface.
Ideally, we should be rewriting all of these drivers to use modern
byte-based interfaces. But that's a more invasive patch to write
and audit, compared to the simplification done here.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:06 +0000 (14:25 -0500)]
block: Drop last of the sector-based aio callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based. Now that all drivers with aio callbacks are using the
byte-based interfaces, we can remove the sector-based versions.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:05 +0000 (14:25 -0500)]
vxhs: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based. Make the change for the last few sector-based callbacks
in the vxhs driver.
Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if VxHS is tolerant of
non-sector AIO operations, I went with the conservative approach
of adding .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:04 +0000 (14:25 -0500)]
rbd: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based. Make the change for the last few sector-based callbacks
in the rbd driver.
Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if RBD is tolerant of
non-sector AIO operations, I went with the conservative approach
of adding .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:03 +0000 (14:25 -0500)]
null: Switch to byte-based read/write
We are gradually moving away from sector-based interfaces, towards
byte-based. Make the change for the last few sector-based callbacks
in the null-co and null-aio drivers.
Note that since the null driver does nothing on writes, it trivially
supports the BDRV_REQ_FUA flag (all writes have already landed to
the same bit-bucket without needing an extra flush call). Also, since
the null driver does just as well with byte-based requests, we can
now avoid cycles wasted on read-modify-write by taking advantage of
the block layer now defaulting the alignment to 1 instead of 512.
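The cost of a larger alignment can be sketched as the number of extra bytes that have to be read so that an unaligned request covers whole aligned units. The helper name is hypothetical and this is not block layer code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes outside [offset, offset + bytes) that must
 * be read to widen the request to a multiple of align. */
uint64_t rmw_padding(uint64_t offset, uint64_t bytes, uint64_t align)
{
    uint64_t start, end;

    assert(align > 0);
    start = offset - offset % align;      /* round the start down */
    end = offset + bytes;
    if (end % align) {
        end += align - end % align;       /* round the end up */
    }
    return (end - start) - bytes;
}
```

A 10-byte write at offset 100 needs 502 bytes of padding with an alignment of 512 and none with an alignment of 1.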
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:02 +0000 (14:25 -0500)]
file-win32: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based. Make the change for the last few sector-based callbacks
in the file-win32 driver.
Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if Windows is tolerant of
non-sector AIO operations, I went with the conservative approach
of modifying .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Tue, 24 Apr 2018 19:25:01 +0000 (14:25 -0500)]
block: Support byte-based aio callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based. Add new byte-based aio callbacks for read and write,
to match the fact that bdrv_aio_pdiscard is already byte-based.
Ideally, drivers should be converted to use coroutine callbacks
rather than aio; but that is not quite as trivial (and if we were
to do that conversion, the null-aio driver would disappear), so for
the short term, converting the signature but keeping things with
aio is easier. However, we CAN declare that a driver that uses
the byte-based aio interfaces now defaults to byte-based
operations, and must explicitly provide a refresh_limits override
to stick with larger alignments (making the alignment issues more
obvious directly in the drivers touched in the next few patches).
Once all drivers are converted, the sector-based aio callbacks will
be removed; in the meantime, a FIXME comment is added due to a
slight inefficiency that will be touched up as part of that later
cleanup.
Simplify some instances of 'bs->drv' into 'drv' while touching this,
since the local variable already exists to reduce typing.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Daniel Henrique Barboza [Tue, 27 Mar 2018 13:08:46 +0000 (10:08 -0300)]
block-backend: simplify blk_get_aio_context
blk_get_aio_context checks whether BlockDriverState bs is not NULL,
returning bdrv_get_aio_context(bs) if so and qemu_get_aio_context()
otherwise. However, bdrv_get_aio_context from block.c already does
this check itself, also returning qemu_get_aio_context()
if bs is NULL:
AioContext *bdrv_get_aio_context(BlockDriverState *bs)
{
    return bs ? bs->aio_context : qemu_get_aio_context();
}
This patch simplifies blk_get_aio_context to simply call
bdrv_get_aio_context instead of replicating the same logic.
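The equivalence can be checked with a standalone sketch. The types below are stand-ins so that the snippet compiles on its own, and the _old/_new suffixes are added here only to show both variants side by side; neither is QEMU's real definition.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for the sketch; not QEMU's definitions. */
typedef struct AioContext { int id; } AioContext;
typedef struct BlockDriverState { AioContext *aio_context; } BlockDriverState;
typedef struct BlockBackend { BlockDriverState *bs; } BlockBackend;

static AioContext main_context;

AioContext *qemu_get_aio_context(void) { return &main_context; }

BlockDriverState *blk_bs(BlockBackend *blk) { return blk->bs; }

AioContext *bdrv_get_aio_context(BlockDriverState *bs)
{
    return bs ? bs->aio_context : qemu_get_aio_context();
}

/* Before: repeats the NULL check that the callee already performs. */
AioContext *blk_get_aio_context_old(BlockBackend *blk)
{
    BlockDriverState *bs = blk_bs(blk);

    if (bs) {
        return bdrv_get_aio_context(bs);
    }
    return qemu_get_aio_context();
}

/* After: delegates the NULL handling to bdrv_get_aio_context(). */
AioContext *blk_get_aio_context_new(BlockBackend *blk)
{
    return bdrv_get_aio_context(blk_bs(blk));
}
```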
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Maydell [Tue, 15 May 2018 14:07:34 +0000 (15:07 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180515' into staging
target-arm queue:
* Fix coverity nit in int_to_float code
* Don't set Invalid for float-to-int(MAXINT)
* Fix fp_status_f16 tininess before rounding
* Add various missing insns from the v8.2-FP16 extension
* Fix sqrt_f16 exception raising
* sdcard: Correct CRC16 offset in sd_function_switch()
* tcg: Optionally log FPU state in TCG -d cpu logging
# gpg: Signature made Tue 15 May 2018 15:06:09 BST
# gpg: using RSA key
3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20180515:
tcg: Optionally log FPU state in TCG -d cpu logging
sdcard: Correct CRC16 offset in sd_function_switch()
target/arm: Fix sqrt_f16 exception raising
target/arm: Implement FMOV (immediate) for fp16
target/arm: Implement FCSEL for fp16
target/arm: Implement FCMP for fp16
target/arm: Implement FP data-processing (3 source) for fp16
target/arm: Implement FP data-processing (2 source) for fp16
target/arm: Introduce and use read_fp_hreg
target/arm: Implement FCVT (scalar, fixed-point) for fp16
target/arm: Implement FCVT (scalar, integer) for fp16
target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
target/arm: Implement FMOV (general) for fp16
target/arm: Fix fp_status_f16 tininess before rounding
fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
fpu/softfloat: int_to_float ensure r fully initialised
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 15 May 2018 13:58:44 +0000 (14:58 +0100)]
tcg: Optionally log FPU state in TCG -d cpu logging
Usually the logging of the CPU state produced by -d cpu is sufficient
to diagnose problems, but sometimes you want to see the state of
the floating point registers as well. We don't want to enable that
by default as it adds a lot of extra data to the log; instead,
allow it to be optionally enabled via -d fpu.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id:
20180510130024.31678-1-peter.maydell@linaro.org
Philippe Mathieu-Daudé [Tue, 15 May 2018 13:58:44 +0000 (14:58 +0100)]
sdcard: Correct CRC16 offset in sd_function_switch()
Per the Physical Layer Simplified Spec. "4.3.10.4 Switch Function Status":
The block length is predefined to 512 bits
and "4.10.2 SD Status":
The SD Status contains status bits that are related to the SD Memory Card
proprietary features and may be used for future application-specific usage.
The size of the SD Status is one data block of 512 bit. The content of this
register is transmitted to the Host over the DAT bus along with a 16-bit CRC.
Thus the 16-bit CRC goes at offset 64.
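The offset follows from 512 bits / 8 = 64 bytes of data, so the CRC occupies bytes 64 and 65. A minimal sketch, assuming the CRC16-CCITT polynomial 0x1021 with an initial value of 0 and big-endian storage; the function names are illustrative and this is not the sd.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SD_STATUS_BYTES = 512 / 8 };   /* 512 bits -> 64 bytes */

/* Bitwise CRC16-CCITT (polynomial 0x1021, initial value 0). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* buf must hold SD_STATUS_BYTES + 2 bytes; the CRC goes right after
 * the data, high byte first (the byte order is an assumption here). */
void append_status_crc(uint8_t *buf)
{
    uint16_t crc = crc16_ccitt(buf, SD_STATUS_BYTES);

    buf[SD_STATUS_BYTES] = (uint8_t)(crc >> 8);
    buf[SD_STATUS_BYTES + 1] = (uint8_t)(crc & 0xff);
}
```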
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id:
20180509060104.4458-3-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Bennée [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Fix sqrt_f16 exception raising
We are meant to explicitly pass fpst, not cpu_env.
Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Bennée [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FMOV (immediate) for fp16
All the hard work is already done by vfp_expand_imm, we just need to
make sure we pick up the correct size.
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id:
20180512003217.9105-11-richard.henderson@linaro.org
[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Bennée [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FCSEL for fp16
These were missed out from the rest of the half-precision work.
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id:
20180512003217.9105-10-richard.henderson@linaro.org
[rth: Fix erroneous check vs type]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Bennée [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FCMP for fp16
These were missed out from the rest of the half-precision work.
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id:
20180512003217.9105-9-richard.henderson@linaro.org
[rth: Diagnose lack of FP16 before fp_access_check]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FP data-processing (3 source) for fp16
We missed all of the scalar fp16 fma operations.
Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FP data-processing (2 source) for fp16
We missed all of the scalar fp16 binary operations.
Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Introduce and use read_fp_hreg
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FCVT (scalar, fixed-point) for fp16
Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FCVT (scalar, integer) for fp16
Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
No sense in emitting code after the exception.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Richard Henderson [Tue, 15 May 2018 13:58:43 +0000 (14:58 +0100)]
target/arm: Implement FMOV (general) for fp16
Adding the fp16 moves to/from general registers.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id:
20180512003217.9105-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 15 May 2018 13:58:42 +0000 (14:58 +0100)]
target/arm: Fix fp_status_f16 tininess before rounding
In commit d81ce0ef2c4f105 we added an extra float_status field
fp_status_f16 for Arm, but forgot to initialize it correctly
by setting it to float_tininess_before_rounding. This currently
will only cause problems for the new V8_FP16 feature, since the
float-to-float conversion code doesn't use it yet. The effect
would be that we failed to set the Underflow IEEE exception flag
in all the cases where we should.
Add the missing initialization.
Fixes: d81ce0ef2c4f105
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id:
20180512004311.9299-16-richard.henderson@linaro.org
Peter Maydell [Tue, 15 May 2018 13:58:42 +0000 (14:58 +0100)]
fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
In float-to-integer conversion, if the floating point input
converts exactly to the largest or smallest integer that
fits into the result type, this is not an overflow.
In this situation we were producing the correct result value,
but were incorrectly setting the Invalid flag.
For example for Arm A64, "FCVTAS w0, d0" on an input of
0x41dfffffffc00000 should produce 0x7fffffff and set no flags.
Fix the boundary case to take the right half of the if()
statements.
This fixes a regression from 2.11 introduced by the softfloat
refactoring.
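The boundary case can be illustrated with a standalone sketch. It is not the softfloat code: it assumes IEEE 754 doubles, takes an input that has already been rounded to an integral value, and uses made-up function names. The point is that the range checks must be strict comparisons, so that a value equal to the limit is not flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only. x is assumed to be already rounded to an integer. */
int32_t integral_double_to_int32(double x, bool *invalid)
{
    *invalid = false;
    if (x != x) {                 /* NaN; returning 0 is a choice made here */
        *invalid = true;
        return 0;
    }
    if (x > 2147483647.0) {       /* strictly above INT32_MAX: saturate */
        *invalid = true;
        return INT32_MAX;
    }
    if (x < -2147483648.0) {      /* strictly below INT32_MIN: saturate */
        *invalid = true;
        return INT32_MIN;
    }
    return (int32_t)x;            /* exact fit, including both limits */
}

/* Reinterpret a 64-bit pattern as a double. */
double double_from_bits(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

The bit pattern 0x41dfffffffc00000 is exactly 2^31 - 1, so it converts to 0x7fffffff with the invalid flag clear.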
Cc: qemu-stable@nongnu.org
Fixes: ab52f973a50
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id:
20180510140141.12120-1-peter.maydell@linaro.org
Alex Bennée [Tue, 15 May 2018 13:58:42 +0000 (14:58 +0100)]
fpu/softfloat: int_to_float ensure r fully initialised
Reported by Coverity (CID1390635). We ensure this for uint_to_float
later on so we might as well mirror that.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 15 May 2018 11:50:06 +0000 (12:50 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/input-20180515-pull-request' into staging
input: ps2 fixes.
# gpg: Signature made Tue 15 May 2018 10:43:20 BST
# gpg: using RSA key
4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138
* remotes/kraxel/tags/input-20180515-pull-request:
ps2: Fix mouse stream corruption due to lost data
ps2: Clear the PS/2 queue and obey disable
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 15 May 2018 11:00:23 +0000 (12:00 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/ui-20180515-pull-request' into staging
ui: qapi parser for -display cmd line.
gtk: multiple fixes.
sdl: opts bugfix.
vnc: magic cookie.
# gpg: Signature made Tue 15 May 2018 10:18:51 BST
# gpg: using RSA key
4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138
* remotes/kraxel/tags/ui-20180515-pull-request:
gtk: disable the F10 menubar key
console: use linked list for QemuConsoles
ui: document non-qapi parser cases.
ui: switch gtk display to qapi parser
ui: switch trivial displays to qapi parser
ui: add qapi parser for -display
vnc: add magic cookie to VncState
ui/gtk: Only try to initialize EGL/X11 if GtkGlArea failed
gtk: make it possible to hide the menu bar
sdl2: move opts assignment into loop
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 15 May 2018 10:11:36 +0000 (11:11 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/tgt-openrisc-pull-request' into staging
Convert openrisc to decodetree.py
# gpg: Signature made Mon 14 May 2018 23:25:40 BST
# gpg: using RSA key
64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/tgt-openrisc-pull-request:
target/openrisc: Merge disas_openrisc_insn
target/openrisc: Convert dec_float
target/openrisc: Convert dec_compi
target/openrisc: Convert dec_comp
target/openrisc: Convert dec_M
target/openrisc: Convert dec_logic
target/openrisc: Convert dec_mac
target/openrisc: Convert dec_calc
target/openrisc: Convert remainder of dec_misc insns
target/openrisc: Convert memory insns
target/openrisc: Convert branch insns
target/openrisc: Start conversion to decodetree.py
target-openrisc: Write back result before FPE exception
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Geoffrey McRae [Mon, 7 May 2018 13:13:12 +0000 (23:13 +1000)]
ps2: Fix mouse stream corruption due to lost data
This fixes an issue where the PS/2 mouse data stream may become
corrupted due to data being discarded when the PS/2 ringbuffer is full,
by adding bounds checking to multi-byte packets.
Interrupts for multi-byte responses are postponed until the final byte
has been queued.
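The all-or-nothing queuing can be sketched like this. The queue type, its size and the function name are hypothetical and much simpler than the real PS/2 device model. Interrupt delivery is not modelled; in this scheme it would happen once, after the loop has queued the final byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 16   /* arbitrary size for the sketch */

typedef struct {
    uint8_t data[QUEUE_SIZE];
    size_t wptr;        /* next write position */
    size_t count;       /* bytes currently queued */
} ByteQueue;

/* Queue every byte of a packet, or none of them if it does not fit.
 * A partially queued packet would desynchronise the stream. */
bool queue_packet(ByteQueue *q, const uint8_t *pkt, size_t len)
{
    size_t i;

    if (QUEUE_SIZE - q->count < len) {
        return false;               /* bounds check before touching the queue */
    }
    for (i = 0; i < len; i++) {
        q->data[q->wptr] = pkt[i];
        q->wptr = (q->wptr + 1) % QUEUE_SIZE;
    }
    q->count += len;
    return true;
}
```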
These changes fix a bug where Windows guests drop the mouse device
entirely, requiring the guest to be restarted.
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <
20180507150310.
2FEA0381924@moya.office.hostfission.com>
[ kraxel: codestyle fixes ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Geoffrey McRae [Mon, 7 May 2018 13:01:46 +0000 (23:01 +1000)]
ps2: Clear the PS/2 queue and obey disable
This allows guests to correctly reinitialize and identify the mouse
should the guest decide to re-scan or reset during mouse input events.
When the guest sends the "Identify" command, due to the PC's hardware
architecture it is impossible to reliably determine the response from
the command amongst other streaming data, such as mouse or keyboard
events. Standard practice is for the guest to disable the device and
then issue the identify command, so this must be obeyed.
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <
20180507150303.
7486B381924@moya.office.hostfission.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Peter Maydell [Tue, 15 May 2018 09:04:22 +0000 (10:04 +0100)]
Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-2.13-pull-request' into staging
# gpg: Signature made Mon 14 May 2018 19:15:02 BST
# gpg: using RSA key
F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg: aka "Laurent Vivier <laurent@vivier.eu>"
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/linux-user-for-2.13-pull-request:
linux-user: correctly align types in thunking code
linux-user: fix UNAME_MACHINE for sparc/sparc64
linux-user: add sparc/sparc64 specific errno
linux-user: fix conversion of flock/flock64 l_type field
linux-user: update sparc/syscall_nr.h to linux header 4.16
linux-user: fix flock/flock64 padding
linux-user: define correct fcntl() values for sparc
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Wu [Thu, 10 May 2018 23:07:39 +0000 (01:07 +0200)]
gtk: disable the F10 menubar key
The F10 key is used in various applications, so disable it unconditionally
(do not limit it to grab mode). Note that this property is deprecated
and might be removed in the future (GTK+ commit b082fb598d).
Fixes: https://bugs.launchpad.net/qemu/+bug/1726910
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Message-id:
20180510230739.28459-2-peter@lekensteyn.nl
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Mon, 7 May 2018 09:54:24 +0000 (11:54 +0200)]
console: use linked list for QemuConsoles
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id:
20180507095424.16220-1-kraxel@redhat.com
Gerd Hoffmann [Mon, 7 May 2018 09:55:39 +0000 (11:55 +0200)]
ui: document non-qapi parser cases.
Add comments to the cases not (yet) switched
over to parse_display_qapi().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id:
20180507095539.19584-5-kraxel@redhat.com
Gerd Hoffmann [Mon, 7 May 2018 09:55:38 +0000 (11:55 +0200)]
ui: switch gtk display to qapi parser
Drop the gtk option parser from parse_display(), so parse_display_qapi()
will handle it instead.
With this change the parser will accept gl=core and gl=es too, so gtk
must now catch the unsupported gles variant.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id:
20180507095539.19584-4-kraxel@redhat.com
Gerd Hoffmann [Mon, 7 May 2018 09:55:37 +0000 (11:55 +0200)]
ui: switch trivial displays to qapi parser
Drop the option-less display types (egl-headless, curses, none) from
parse_display(), so they'll be handled by parse_display_qapi().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id:
20180507095539.19584-3-kraxel@redhat.com