--- /dev/null
+Using XSTATE features in user space applications
+================================================
+
+The x86 architecture supports floating-point extensions which are
+enumerated via CPUID. Applications consult CPUID and use XGETBV to
+evaluate which features have been enabled by the kernel in XCR0.
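
The CPUID and XGETBV check described above can be sketched roughly as
follows. This is a minimal illustration, assuming a GCC or Clang toolchain
on x86-64 that provides <cpuid.h>; read_xcr0() and xcr0_has_component()
are hypothetical helper names, not part of any kernel or libc interface.

```c
#include <cpuid.h>
#include <stdint.h>

/* CPUID.1:ECX bit 27 (OSXSAVE): the OS enabled XSAVE, so XGETBV is usable. */
#define CPUID_1_ECX_OSXSAVE	(1u << 27)

/*
 * Hypothetical helper: read XCR0 into *xcr0.
 * Returns 0 on success, -1 if XGETBV cannot be used.
 */
static int read_xcr0(uint64_t *xcr0)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	if (!(ecx & CPUID_1_ECX_OSXSAVE))
		return -1;

	/* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
	*xcr0 = ((uint64_t)edx << 32) | eax;
	return 0;
}

/*
 * Hypothetical helper: 1 if XSTATE component 'nr' is enabled in XCR0,
 * 0 if it is not, -1 if XCR0 could not be read.
 */
static int xcr0_has_component(unsigned int nr)
{
	uint64_t xcr0;

	if (nr > 63 || read_xcr0(&xcr0))
		return -1;
	return !!(xcr0 & (1ULL << nr));
}
```

For example, xcr0_has_component(18) reports whether AMX TILE_DATA is
enabled in XCR0. As the following paragraph explains, that alone does not
mean the process may use it.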
+
+Features up to and including the AVX-512 and PKRU states are automatically
+enabled by the kernel if available. Features like AMX TILE_DATA (XSTATE
+component 18) are enabled in XCR0 as well, but the first use of a related
+instruction is trapped by the kernel because the required large XSTATE
+buffers are not allocated by default.
+
++The purpose for dynamic features
++--------------------------------
++
++Legacy userspace libraries often have hard-coded, static sizes for
++alternate signal stacks, typically MINSIGSTKSZ, which is usually 2KB.
++That stack must be able to store at *least* the signal frame that the
++kernel sets up before jumping into the signal handler. That signal frame
++must include an XSAVE buffer defined by the CPU.
++
++However, that means that the size of signal stacks is dynamic, not static,
++because different CPUs have differently-sized XSAVE buffers. A compiled-in
++size of 2KB in existing applications is too small for new CPU features
++like AMX. Instead of universally requiring a larger stack, dynamic
++enabling lets the kernel enforce that userspace applications have
++properly-sized altstacks.
++
+Using dynamically enabled XSTATE features in user space applications
+--------------------------------------------------------------------
+
+The kernel provides an arch_prctl(2) based mechanism for applications to
+request the usage of such features. The arch_prctl(2) options related to
+this are:
+
+-ARCH_GET_XCOMP_SUPP
+
+ arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
+
+ ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
+ type uint64_t. The second argument is a pointer to that storage.
+
+-ARCH_GET_XCOMP_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_PERM, &features);
+
+ ARCH_GET_XCOMP_PERM stores the features for which the userspace process
+ has permission in userspace storage of type uint64_t. The second argument
+ is a pointer to that storage.
+
+-ARCH_REQ_XCOMP_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_PERM allows requesting permission for a dynamically enabled
+ feature or a feature set. A feature set can be mapped to a facility, e.g.
+ AMX, and can require one or more XSTATE components to be enabled.
+
+ The feature argument is the number of the highest XSTATE component which
+ is required for a facility to work.
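
As an illustration of the query options listed above, the following sketch
checks whether the calling process already holds permission for a given
XSTATE component. It assumes an x86-64 Linux system; the 0x1022 fallback
value for ARCH_GET_XCOMP_PERM is an assumption used when the installed
headers do not define it, and xcomp_permitted() is a hypothetical helper
name.

```c
#include <asm/prctl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM	0x1022	/* assumed value for older headers */
#endif

/*
 * Hypothetical helper: 1 if the process has permission for XSTATE
 * component 'feature_nr', 0 if it does not, -1 if the query failed
 * (for example on a kernel without this arch_prctl(2) option).
 */
static int xcomp_permitted(unsigned int feature_nr)
{
	uint64_t features = 0;

	if (feature_nr > 63)
		return -1;
	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &features))
		return -1;
	return !!(features & (1ULL << feature_nr));
}
```

Calling xcomp_permitted(18) before and after ARCH_REQ_XCOMP_PERM is one
way to confirm that a request for TILE_DATA took effect.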
+
+When permission for a feature is requested, the kernel checks the feature's
+availability. The kernel ensures that sigaltstacks in the process's tasks
+are large enough to accommodate the resulting large signal frame. It
+enforces this both during ARCH_REQ_XCOMP_PERM and during any subsequent
+sigaltstack(2) calls. If an installed sigaltstack is smaller than the
+resulting sigframe size, ARCH_REQ_XCOMP_PERM results in -ENOSUPP. Also,
+sigaltstack(2) results in -ENOMEM if the requested altstack is too small
+for the permitted features.
+
+Permission, when granted, is valid per process. Permissions are inherited
+on fork(2) and cleared on exec(3).
+
+The first use of an instruction related to a dynamically enabled feature is
+trapped by the kernel. The trap handler checks whether the process has
+permission to use the feature. If the process has no permission then the
+kernel sends SIGILL to the application. If the process has permission then
+the handler allocates a larger xstate buffer for the task so the large
+state can be context switched. In the unlikely case that the allocation
+fails, the kernel sends SIGSEGV.
+
++AMX TILE_DATA enabling example
++^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
++
++Below is an example of how a userspace application enables
++TILE_DATA dynamically:
++
++ 1. The application first needs to query the kernel for AMX
++ support::
++
++ #include <asm/prctl.h>
++ #include <sys/syscall.h>
++ #include <stdio.h>
++ #include <unistd.h>
++
++ #ifndef ARCH_GET_XCOMP_SUPP
++ #define ARCH_GET_XCOMP_SUPP 0x1021
++ #endif
++
++ #ifndef ARCH_XCOMP_TILECFG
++ #define ARCH_XCOMP_TILECFG 17
++ #endif
++
++ #ifndef ARCH_XCOMP_TILEDATA
++ #define ARCH_XCOMP_TILEDATA 18
++ #endif
++
++ #define MASK_XCOMP_TILE ((1 << ARCH_XCOMP_TILECFG) | \
++ (1 << ARCH_XCOMP_TILEDATA))
++
++ unsigned long features;
++ long rc;
++
++ ...
++
++ rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features);
++
++ if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE)
++ printf("AMX is available.\n");
++
++ 2. After determining support for AMX, an application must
++ explicitly ask permission to use it::
++
++ #ifndef ARCH_REQ_XCOMP_PERM
++ #define ARCH_REQ_XCOMP_PERM 0x1023
++ #endif
++
++ ...
++
++ rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA);
++
++ if (!rc)
++ printf("AMX is ready for use.\n");
++
++Note this example does not include the sigaltstack preparation.
++
+Dynamic features in signal frames
+---------------------------------
+
+Dynamically enabled features are not written to the signal frame upon signal
+entry if the feature is in its initial configuration. This differs from
+non-dynamic features which are always written regardless of their
+configuration. Signal handlers can examine the XSAVE buffer's XSTATE_BV
+field to determine if a feature was written.
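
A handler-side check of XSTATE_BV might be sketched as follows. This
assumes x86-64 Linux with glibc, where uc_mcontext.fpregs points at the
FXSAVE-format legacy area of the frame; the offsets used (software
reserved bytes at 464, XSAVE header at 512) and the FP_XSTATE_MAGIC1 value
are stated here as assumptions about that layout. The helper and handler
names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#define SW_RESERVED_OFFSET	464		/* assumed: software bytes in the legacy area */
#define XSAVE_HDR_OFFSET	512		/* assumed: XSAVE header follows the legacy area */
#define XSTATE_MAGIC1		0x46505853U	/* assumed: FP_XSTATE_MAGIC1 */
#define XCOMP_TILEDATA		18

/*
 * Hypothetical helper: copy XSTATE_BV out of a signal frame's FP state.
 * Returns 0 on success, -1 if the frame does not look like an XSAVE frame.
 */
static int sigframe_xstate_bv(const void *fpstate, uint64_t *xstate_bv)
{
	const unsigned char *p = fpstate;
	uint32_t magic;

	if (!p)
		return -1;
	memcpy(&magic, p + SW_RESERVED_OFFSET, sizeof(magic));
	if (magic != XSTATE_MAGIC1)
		return -1;
	/* XSTATE_BV is the first 64-bit field of the XSAVE header. */
	memcpy(xstate_bv, p + XSAVE_HDR_OFFSET, sizeof(*xstate_bv));
	return 0;
}

/* 1: TILE_DATA was written to the frame, 0: it was not, -1: unknown. */
static volatile sig_atomic_t tiledata_in_frame = -1;

/* Hypothetical SA_SIGINFO handler that records the TILE_DATA bit. */
static void xstate_probe_handler(int sig, siginfo_t *info, void *ctx)
{
	ucontext_t *uc = ctx;
	uint64_t bv;

	(void)sig;
	(void)info;
	if (!sigframe_xstate_bv(uc->uc_mcontext.fpregs, &bv))
		tiledata_in_frame = !!(bv & (1ULL << XCOMP_TILEDATA));
}
```

The handler has to be installed with sigaction(2) and SA_SIGINFO so that
it receives the ucontext pointer.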
++
++Dynamic features for virtual machines
++-------------------------------------
++
++The permission for the guest state component needs to be managed separately
++from the host, as they are exclusive to each other. A couple of options
++are extended to control the guest permission:
++
++-ARCH_GET_XCOMP_GUEST_PERM
++
++ arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features);
++
++ ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. It
++ provides the same semantics and functionality, but for the guest
++ components.
++
++-ARCH_REQ_XCOMP_GUEST_PERM
++
++ arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr);
++
++ ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the
++ same semantics, but for the guest permission. It provides similar
++ functionality with one constraint: permission is frozen when the first
++ VCPU is created. Any attempt to change permission after that point is
++ rejected, so the permission has to be requested before the first VCPU
++ is created.
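
A VMM-side request might be sketched as below. This is only an
illustration: the 0x1025 fallback value for ARCH_REQ_XCOMP_GUEST_PERM is an
assumption used when the installed headers do not define it, and
request_guest_tiledata() is a hypothetical helper name. Per the constraint
above, it would have to run before the first VCPU is created.

```c
#include <asm/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_REQ_XCOMP_GUEST_PERM
#define ARCH_REQ_XCOMP_GUEST_PERM	0x1025	/* assumed value for older headers */
#endif

#ifndef ARCH_XCOMP_TILEDATA
#define ARCH_XCOMP_TILEDATA		18
#endif

/*
 * Hypothetical helper: request guest permission for AMX TILE_DATA.
 * Call it before the first VCPU is created.
 * Returns 0 on success, -1 on failure (errno is set by the syscall).
 */
static int request_guest_tiledata(void)
{
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
		    ARCH_XCOMP_TILEDATA))
		return -1;
	return 0;
}
```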
++
++Note that some VMMs may have already established a set of supported state
++components. These options are not presumed to support any particular VMM.