mirror of https://github.com/F-Stack/f-stack.git
161 lines
8.7 KiB
ReStructuredText
161 lines
8.7 KiB
ReStructuredText
.. BSD LICENSE
|
|
Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
|
|
All rights reserved.
|
|
|
|
Redistribution and use in source and binary forms, with or without
|
|
modification, are permitted provided that the following conditions
|
|
are met:
|
|
|
|
* Redistributions of source code must retain the above copyright
|
|
notice, this list of conditions and the following disclaimer.
|
|
* Redistributions in binary form must reproduce the above copyright
|
|
notice, this list of conditions and the following disclaimer in
|
|
the documentation and/or other materials provided with the
|
|
distribution.
|
|
* Neither the name of Intel Corporation nor the names of its
|
|
contributors may be used to endorse or promote products derived
|
|
from this software without specific prior written permission.
|
|
|
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
|
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
|
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
|
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
|
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
|
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
|
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
|
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
|
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
|
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
|
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
IVSHMEM Library
|
|
===============
|
|
|
|
The DPDK IVSHMEM library facilitates fast zero-copy data sharing among virtual machines
|
|
(host-to-guest or guest-to-guest) by means of QEMU's IVSHMEM mechanism.
|
|
|
|
The library works by providing a command line for QEMU to map several hugepages into a single IVSHMEM device.
|
|
For the guest to know what is inside any given IVSHMEM device
|
|
(and to distinguish between DPDK and non-DPDK IVSHMEM devices),
|
|
a metadata file is also mapped into the IVSHMEM segment.
|
|
No work needs to be done by the guest application to map IVSHMEM devices into memory;
|
|
they are automatically recognized by the DPDK Environment Abstraction Layer (EAL).
|
|
|
|
A typical DPDK IVSHMEM use case looks like the following.
|
|
|
|
|
|
.. figure:: img/ivshmem.*
|
|
|
|
Typical Ivshmem use case
|
|
|
|
|
|
The same could work with several virtual machines, providing host-to-VM or VM-to-VM communication.
|
|
The maximum number of metadata files is 32 (by default) and each metadata file can contain different (or even the same) hugepages.
|
|
The only constraint is that each VM has to have access to the memory it is sharing with other entities (be it host or another VM).
|
|
For example, if the user wants to share the same memzone across two VMs, each VM must have that memzone in its metadata file.
|
|
|
|
IVHSHMEM Library API Overview
|
|
-----------------------------
|
|
|
|
The following is a simple guide to using the IVSHMEM Library API:
|
|
|
|
* Call rte_ivshmem_metadata_create() to create a new metadata file.
|
|
The metadata name is used to distinguish between multiple metadata files.
|
|
|
|
* Populate each metadata file with DPDK data structures.
|
|
This can be done using the following API calls:
|
|
|
|
* rte_ivhshmem_metadata_add_memzone() to add rte_memzone to metadata file
|
|
|
|
* rte_ivshmem_metadata_add_ring() to add rte_ring to metadata file
|
|
|
|
* rte_ivshmem_metadata_add_mempool() to add rte_mempool to metadata file
|
|
|
|
* Finally, call rte_ivshmem_metadata_cmdline_generate() to generate the command line for QEMU.
|
|
Multiple metadata files (and thus multiple command lines) can be supplied to a single VM.
|
|
|
|
.. note::
|
|
|
|
Only data structures fully residing in DPDK hugepage memory work correctly.
|
|
Supported data structures created by malloc(), mmap()
|
|
or otherwise using non-DPDK memory cause undefined behavior and even a segmentation fault.
|
|
Specifically, because the memzone field in an rte_ring refers to a memzone structure residing in local memory,
|
|
accessing the memzone field in a shared rte_ring will cause an immediate segmentation fault.
|
|
|
|
IVSHMEM Environment Configuration
|
|
---------------------------------
|
|
|
|
The steps needed to successfully run IVSHMEM applications are the following:
|
|
|
|
* Compile a special version of QEMU from sources.
|
|
|
|
The source code can be found on the QEMU website (currently, version 1.4.x is supported, but version 1.5.x is known to work also),
|
|
however, the source code will need to be patched to support using regular files as the IVSHMEM memory backend.
|
|
The patch is not included in the DPDK package,
|
|
but is available on the `Intel®DPDK-vswitch project webpage <https://01.org/packet-processing/intel%C2%AE-ovdk>`_
|
|
(either separately or in a DPDK vSwitch package).
|
|
|
|
* Enable IVSHMEM library in the DPDK build configuration.
|
|
|
|
In the default configuration, IVSHMEM library is not compiled. To compile the IVSHMEM library,
|
|
one has to either use one of the provided IVSHMEM targets
|
|
(for example, x86_64-ivshmem-linuxapp-gcc),
|
|
or set CONFIG_RTE_LIBRTE_IVSHMEM to "y" in the build configuration.
|
|
|
|
* Set up hugepage memory on the virtual machine.
|
|
|
|
The guest applications run as regular DPDK (primary) processes and thus need their own hugepage memory set up inside the VM.
|
|
The process is identical to the one described in the *DPDK Getting Started Guide*.
|
|
|
|
Best Practices for Writing IVSHMEM Applications
|
|
-----------------------------------------------
|
|
|
|
When considering the use of IVSHMEM for sharing memory, security implications need to be carefully evaluated.
|
|
IVSHMEM is not suitable for untrusted guests, as IVSHMEM is essentially a window into the host process memory.
|
|
This also has implications for the multiple VM scenarios.
|
|
While the IVSHMEM library tries to share as little memory as possible,
|
|
it is quite probable that data designated for one VM might also be present in an IVSMHMEM device designated for another VM.
|
|
Consequently, any shared memory corruption will affect both host and all VMs sharing that particular memory.
|
|
|
|
IVSHMEM applications essentially behave like multi-process applications,
|
|
so it is important to implement access serialization to data and thread safety.
|
|
DPDK ring structures are already thread-safe, however,
|
|
any custom data structures that the user might need would have to be thread-safe also.
|
|
|
|
Similar to regular DPDK multi-process applications,
|
|
it is not recommended to use function pointers as functions might have different memory addresses in different processes.
|
|
|
|
It is best to avoid freeing the rte_mbuf structure on a different machine from where it was allocated,
|
|
that is, if the mbuf was allocated on the host, the host should free it.
|
|
Consequently, any packet transmission and reception should also happen on the same machine (whether virtual or physical).
|
|
Failing to do so may lead to data corruption in the mempool cache.
|
|
|
|
Despite the IVSHMEM mechanism being zero-copy and having good performance,
|
|
it is still desirable to do processing in batches and follow other procedures described in
|
|
:ref:`Performance Optimization <Performance_Optimization>`.
|
|
|
|
Best Practices for Running IVSHMEM Applications
|
|
-----------------------------------------------
|
|
|
|
For performance reasons,
|
|
it is best to pin host processes and QEMU processes to different cores so that they do not interfere with each other.
|
|
If NUMA support is enabled, it is also desirable to keep host process' hugepage memory and QEMU process on the same NUMA node.
|
|
|
|
For the best performance across all NUMA nodes, each QEMU core should be pinned to host CPU core on the appropriate NUMA node.
|
|
QEMU's virtual NUMA nodes should also be set up to correspond to physical NUMA nodes.
|
|
More on how to set up DPDK and QEMU NUMA support can be found in *DPDK Getting Started Guide* and
|
|
`QEMU documentation <http://qemu.weilnetz.de/qemu-doc.html>`_ respectively.
|
|
A script called cpu_layout.py is provided with the DPDK package (in the tools directory)
|
|
that can be used to identify which CPU cores correspond to which NUMA node.
|
|
|
|
The QEMU IVSHMEM command line creation should be considered the last step before starting the virtual machine.
|
|
Currently, there is no hot plug support for QEMU IVSHMEM devices,
|
|
so one cannot add additional memory to an IVSHMEM device once it has been created.
|
|
Therefore, the correct sequence to run an IVSHMEM application is to run host application first,
|
|
obtain the command lines for each IVSHMEM device and then run all QEMU instances with guest applications afterwards.
|
|
|
|
It is important to note that once QEMU is started, it holds on to the hugepages it uses for IVSHMEM devices.
|
|
As a result, if the user wishes to shut down or restart the IVSHMEM host application,
|
|
it is not enough to simply shut the application down.
|
|
The virtual machine must also be shut down (if not, it will hold onto outdated host data).
|