mirror of https://github.com/F-Stack/f-stack.git
551 lines
19 KiB
ReStructuredText
551 lines
19 KiB
ReStructuredText
.. SPDX-License-Identifier: BSD-3-Clause
|
|
Copyright(C) 2021 Marvell.
|
|
|
|
CNXK Poll Mode driver
|
|
=====================
|
|
|
|
The CNXK ETHDEV PMD (**librte_net_cnxk**) provides poll mode ethdev driver
|
|
support for the inbuilt network device found in **Marvell OCTEON CN9K/CN10K**
|
|
SoC family as well as for their virtual functions (VF) in SR-IOV context.
|
|
|
|
More information can be found at `Marvell Official Website
|
|
<https://www.marvell.com/embedded-processors/infrastructure-processors>`_.
|
|
|
|
Features
|
|
--------
|
|
|
|
Features of the CNXK Ethdev PMD are:
|
|
|
|
- Packet type information
|
|
- Promiscuous mode
|
|
- Jumbo frames
|
|
- SR-IOV VF
|
|
- Lock-free Tx queue
|
|
- Multiple queues for TX and RX
|
|
- Receiver Side Scaling (RSS)
|
|
- MAC filtering
|
|
- Generic flow API
|
|
- Inner and Outer Checksum offload
|
|
- Port hardware statistics
|
|
- Link state information
|
|
- Link flow control
|
|
- MTU update
|
|
- Scatter-Gather IO support
|
|
- Vector Poll mode driver
|
|
- Debug utilities - Context dump and error interrupt support
|
|
- Support Rx interrupt
|
|
- Inline IPsec processing support
|
|
- Ingress meter support
|
|
- Queue based priority flow control support
|
|
|
|
Prerequisites
|
|
-------------
|
|
|
|
See :doc:`../platform/cnxk` for setup information.
|
|
|
|
|
|
Driver compilation and testing
|
|
------------------------------
|
|
|
|
Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
|
|
for details.
|
|
|
|
#. Running testpmd:
|
|
|
|
Follow instructions available in the document
|
|
:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
|
|
to run testpmd.
|
|
|
|
Example output:
|
|
|
|
.. code-block:: console
|
|
|
|
./<build_dir>/app/dpdk-testpmd -c 0xc -a 0002:02:00.0 -- --portmask=0x1 --nb-cores=1 --port-topology=loop --rxq=1 --txq=1
|
|
EAL: Detected 4 lcore(s)
|
|
EAL: Detected 1 NUMA nodes
|
|
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
|
|
EAL: Selected IOVA mode 'VA'
|
|
EAL: No available hugepages reported in hugepages-16777216kB
|
|
EAL: No available hugepages reported in hugepages-2048kB
|
|
EAL: Probing VFIO support...
|
|
EAL: VFIO support initialized
|
|
EAL: using IOMMU type 1 (Type 1)
|
|
[ 2003.202721] vfio-pci 0002:02:00.0: vfio_cap_init: hiding cap 0x14@0x98
|
|
EAL: Probe PCI driver: net_cn10k (177d:a063) device: 0002:02:00.0 (socket 0)
|
|
PMD: RoC Model: cn10k
|
|
EAL: No legacy callbacks, legacy socket not created
|
|
testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
|
|
testpmd: preferred mempool ops selected: cn10k_mempool_ops
|
|
Configuring Port 0 (socket 0)
|
|
PMD: Port 0: Link Up - speed 25000 Mbps - full-duplex
|
|
|
|
Port 0: link state change event
|
|
Port 0: 96:D4:99:72:A5:BF
|
|
Checking link statuses...
|
|
Done
|
|
No commandline core given, start packet forwarding
|
|
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
|
|
Logical Core 3 (socket 0) forwards packets on 1 streams:
|
|
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
|
|
|
|
io packet forwarding packets/burst=32
|
|
nb forwarding cores=1 - nb forwarding ports=1
|
|
port 0: RX queue number: 1 Tx queue number: 1
|
|
Rx offloads=0x0 Tx offloads=0x10000
|
|
RX queue: 0
|
|
RX desc=4096 - RX free threshold=0
|
|
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
|
|
RX Offloads=0x0
|
|
TX queue: 0
|
|
TX desc=512 - TX free threshold=0
|
|
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
|
|
TX offloads=0x0 - TX RS bit threshold=0
|
|
Press enter to exit
|
|
|
|
Runtime Config Options
|
|
----------------------
|
|
|
|
- ``Rx&Tx scalar mode enable`` (default ``0``)
|
|
|
|
PMD supports both scalar and vector mode, it may be selected at runtime
|
|
using ``scalar_enable`` ``devargs`` parameter.
|
|
|
|
- ``RSS reta size`` (default ``64``)
|
|
|
|
RSS redirection table size may be configured during runtime using ``reta_size``
|
|
``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,reta_size=256
|
|
|
|
With the above configuration, reta table of size 256 is populated.
|
|
|
|
- ``Flow priority levels`` (default ``3``)
|
|
|
|
RTE Flow priority levels can be configured during runtime using
|
|
``flow_max_priority`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,flow_max_priority=10
|
|
|
|
With the above configuration, priority level was set to 10 (0-9). Max
|
|
priority level supported is 32.
|
|
|
|
- ``Reserve Flow entries`` (default ``8``)
|
|
|
|
RTE flow entries can be pre allocated and the size of pre allocation can be
|
|
selected runtime using ``flow_prealloc_size`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,flow_prealloc_size=4
|
|
|
|
With the above configuration, pre alloc size was set to 4. Max pre alloc
|
|
size supported is 32.
|
|
|
|
- ``Max SQB buffer count`` (default ``512``)
|
|
|
|
Send queue descriptor buffer count may be limited during runtime using
|
|
``max_sqb_count`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,max_sqb_count=64
|
|
|
|
With the above configuration, each send queue's descriptor buffer count is
|
|
limited to a maximum of 64 buffers.
|
|
|
|
- ``SQB slack count`` (default ``12``)
|
|
|
|
Send queue descriptor slack count added to SQB count when a Tx queue is
|
|
created, can be set using ``sqb_slack`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,sqb_slack=32
|
|
|
|
With the above configuration, each send queue's descriptor buffer count will
|
|
be increased by 32, while keeping the queue limit to default configuration.
|
|
|
|
- ``Switch header enable`` (default ``none``)
|
|
|
|
A port can be configured to a specific switch header type by using
|
|
``switch_header`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,switch_header="higig2"
|
|
|
|
With the above configuration, higig2 will be enabled on that port and the
|
|
traffic on this port should be higig2 traffic only. Supported switch header
|
|
types are "chlen24b", "chlen90b", "dsa", "exdsa", "higig2", "vlan_exdsa" and
|
|
"pre_l2".
|
|
|
|
- ``Flow pre_l2 info`` (default ``0x0/0x0/0x0``)
|
|
|
|
pre_l2 headers are custom headers placed before the ethernet header. For
|
|
parsing custom pre_l2 headers, an offset, mask within the offset and shift
|
|
direction has to be provided within the custom header that holds the size of
|
|
the custom header. This is valid only with switch header pre_l2. Maximum
|
|
supported offset range is 0 to 255 and mask range is 1 to 255 and
|
|
shift direction, 0: left shift, 1: right shift.
|
|
Info format will be "offset/mask/shift direction". All parameters has to be
|
|
in hexadecimal format and mask should be contiguous. Info can be configured
|
|
using ``flow_pre_l2_info`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,switch_header="pre_l2",flow_pre_l2_info=0x2/0x7e/0x1
|
|
|
|
With the above configuration, custom pre_l2 header will be enabled on that
|
|
port and size of the header is placed at byte offset 0x2 in the packet with
|
|
mask 0x7e and right shift will be used to get the size. That is, size will be
|
|
(pkt[0x2] & 0x7e) >> shift count. Shift count will be calculated based on
|
|
mask and shift direction. For example, if mask is 0x7c and shift direction is
|
|
1 (i.e., right shift) then the shift count will be 2, that is, absolute
|
|
position of the rightmost set bit. If the mask is 0x7c and shift direction
|
|
is 0 (i.e., left shift) then the shift count will be 1, that is, (8 - n),
|
|
where n is the absolute position of leftmost set bit.
|
|
|
|
- ``RSS tag as XOR`` (default ``0``)
|
|
|
|
The HW gives two options to configure the RSS adder i.e
|
|
|
|
* ``rss_adder<7:0> = flow_tag<7:0> ^ flow_tag<15:8> ^ flow_tag<23:16> ^ flow_tag<31:24>``
|
|
|
|
* ``rss_adder<7:0> = flow_tag<7:0>``
|
|
|
|
Latter one aligns with standard NIC behavior vs former one is a legacy
|
|
RSS adder scheme used in OCTEON 9 products.
|
|
|
|
By default, the driver runs in the latter mode.
|
|
Setting this flag to 1 to select the legacy mode.
|
|
|
|
For example to select the legacy mode(RSS tag adder as XOR)::
|
|
|
|
-a 0002:02:00.0,tag_as_xor=1
|
|
|
|
- ``Min SPI for inbound inline IPsec`` (default ``0``)
|
|
|
|
Min SPI supported for inbound inline IPsec processing can be specified by
|
|
``ipsec_in_min_spi`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,ipsec_in_min_spi=6
|
|
|
|
With the above configuration, application can enable inline IPsec processing
|
|
for inbound SA with min SPI of 6.
|
|
|
|
- ``Max SPI for inbound inline IPsec`` (default ``255``)
|
|
|
|
Max SPI supported for inbound inline IPsec processing can be specified by
|
|
``ipsec_in_max_spi`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,ipsec_in_max_spi=128
|
|
|
|
With the above configuration, application can enable inline IPsec processing
|
|
with max SPI of 128.
|
|
|
|
- ``Max SA's for outbound inline IPsec`` (default ``4096``)
|
|
|
|
Max number of SA's supported for outbound inline IPsec processing can be
|
|
specified by ``ipsec_out_max_sa`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,ipsec_out_max_sa=128
|
|
|
|
With the above configuration, application can enable inline IPsec processing
|
|
for 128 outbound SAs.
|
|
|
|
- ``Enable custom SA action`` (default ``0``)
|
|
|
|
Custom SA action can be enabled by specifying ``custom_sa_act`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,custom_sa_act=1
|
|
|
|
With the above configuration, application can enable custom SA action. This
|
|
configuration allows the potential for a MCAM entry to match many SAs,
|
|
rather than only match a single SA.
|
|
For cnxk device sa_index will be calculated based on SPI value. So, it will
|
|
be 1 to 1 mapping. By enabling this devargs and setting a MCAM rule, will
|
|
allow application to configure the sa_index as part of session create. And
|
|
later original SPI value can be updated using session update.
|
|
For example, application can set sa_index as 0 using session create as SPI value
|
|
and later can update the original SPI value (for example 0x10000001) using
|
|
session update. And create a flow rule with security action and algorithm as
|
|
RTE_PMD_CNXK_SEC_ACTION_ALG0 and sa_hi as 0x1000 and sa_lo as 0x0001.
|
|
|
|
- ``Outbound CPT LF queue size`` (default ``8200``)
|
|
|
|
Size of Outbound CPT LF queue in number of descriptors can be specified by
|
|
``outb_nb_desc`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,outb_nb_desc=16384
|
|
|
|
With the above configuration, Outbound CPT LF will be created to accommodate
|
|
at max 16384 descriptors at any given time.
|
|
|
|
- ``Outbound CPT LF count`` (default ``1``)
|
|
|
|
Number of CPT LF's to attach for Outbound processing can be specified by
|
|
``outb_nb_crypto_qs`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,outb_nb_crypto_qs=2
|
|
|
|
With the above configuration, two CPT LF's are setup and distributed among
|
|
all the Tx queues for outbound processing.
|
|
|
|
- ``Disable using inline ipsec device for inbound`` (default ``0``)
|
|
|
|
In CN10K, in event mode, driver can work in two modes,
|
|
|
|
1. Inbound encrypted traffic received by probed ipsec inline device while
|
|
plain traffic post decryption is received by ethdev.
|
|
|
|
2. Both Inbound encrypted traffic and plain traffic post decryption are
|
|
received by ethdev.
|
|
|
|
By default event mode works using inline device i.e mode ``1``.
|
|
This behaviour can be changed to pick mode ``2`` by using
|
|
``no_inl_dev`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,no_inl_dev=1 -a 0002:03:00.0,no_inl_dev=1
|
|
|
|
With the above configuration, inbound encrypted traffic from both the ports
|
|
is received by ipsec inline device.
|
|
|
|
- ``Inline IPsec device channel and mask`` (default ``none``)
|
|
|
|
Set channel and channel mask configuration for the inline IPSec device. This
|
|
will be used when creating flow rules with RTE_FLOW_ACTION_TYPE_SECURITY
|
|
action.
|
|
|
|
By default, RTE Flow API sets the channel number of the port on which the
|
|
rule is created in the MCAM entry and matches it exactly. This behaviour can
|
|
be modified using the ``inl_cpt_channel`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,inl_cpt_channel=0x100/0xf00
|
|
|
|
With the above configuration, RTE Flow rules API will set the channel
|
|
and channel mask as 0x100 and 0xF00 in the MCAM entries of the flow rules
|
|
created with RTE_FLOW_ACTION_TYPE_SECURITY action. Since channel number is
|
|
set with this custom mask, inbound encrypted traffic from all ports with
|
|
matching channel number pattern will be directed to the inline IPSec device.
|
|
|
|
- ``SDP device channel and mask`` (default ``none``)
|
|
Set channel and channel mask configuration for the SDP device. This
|
|
will be used when creating flow rules on the SDP device.
|
|
|
|
By default, for rules created on the SDP device, the RTE Flow API sets the
|
|
channel number and mask to cover the entire SDP channel range in the channel
|
|
field of the MCAM entry. This behaviour can be modified using the
|
|
``sdp_channel_mask`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,sdp_channel_mask=0x700/0xf00
|
|
|
|
With the above configuration, RTE Flow rules API will set the channel
|
|
and channel mask as 0x700 and 0xF00 in the MCAM entries of the flow rules
|
|
created on the SDP device. This option needs to be used when more than one
|
|
SDP interface is in use and RTE Flow rules created need to distinguish
|
|
between traffic from each SDP interface. The channel and mask combination
|
|
specified should match all the channels(or rings) configured on the SDP
|
|
interface.
|
|
|
|
.. note::
|
|
|
|
Above devarg parameters are configurable per device, user needs to pass the
|
|
parameters to all the PCIe devices if application requires to configure on
|
|
all the ethdev ports.
|
|
|
|
Limitations
|
|
-----------
|
|
|
|
``mempool_cnxk`` external mempool handler dependency
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The OCTEON CN9K/CN10K SoC family NIC has inbuilt HW assisted external mempool manager.
|
|
``net_cnxk`` PMD only works with ``mempool_cnxk`` mempool handler
|
|
as it is performance wise most effective way for packet allocation and Tx buffer
|
|
recycling on OCTEON 9 SoC platform.
|
|
|
|
CRC stripping
|
|
~~~~~~~~~~~~~
|
|
|
|
The OCTEON CN9K/CN10K SoC family NICs strip the CRC for every packet being received by
|
|
the host interface irrespective of the offload configuration.
|
|
|
|
RTE flow GRE support
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``RTE_FLOW_ITEM_TYPE_GRE_KEY`` works only when checksum and routing
|
|
bits in the GRE header are equal to 0.
|
|
|
|
RTE flow action represented_port support
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT`` only works between a PF and its VFs.
|
|
|
|
RTE flow action port_id support
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``RTE_FLOW_ACTION_TYPE_PORT_ID`` is only supported between PF and its VFs.
|
|
|
|
Custom protocols supported in RTE Flow
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The ``RTE_FLOW_ITEM_TYPE_RAW`` can be used to parse the below custom protocols.
|
|
|
|
* ``vlan_exdsa`` and ``exdsa`` can be parsed at L2 level.
|
|
* ``NGIO`` can be parsed at L3 level.
|
|
|
|
For ``vlan_exdsa`` and ``exdsa``, the port has to be configured with the
|
|
respective switch header.
|
|
|
|
For example::
|
|
|
|
-a 0002:02:00.0,switch_header="vlan_exdsa"
|
|
|
|
The below fields of ``struct rte_flow_item_raw`` shall be used to specify the
|
|
pattern.
|
|
|
|
- ``relative`` Selects the layer at which parsing is done.
|
|
|
|
- 0 for ``exdsa`` and ``vlan_exdsa``.
|
|
|
|
- 1 for ``NGIO``.
|
|
|
|
- ``offset`` The offset in the header where the pattern should be matched.
|
|
- ``length`` Length of the pattern.
|
|
- ``pattern`` Pattern as a byte string.
|
|
|
|
Example usage in testpmd::
|
|
|
|
./dpdk-testpmd -c 3 -w 0002:02:00.0,switch_header=exdsa -- -i \
|
|
--rx-offloads=0x00080000 --rxq 8 --txq 8
|
|
testpmd> flow create 0 ingress pattern eth / raw relative is 0 pattern \
|
|
spec ab pattern mask ab offset is 4 / end actions queue index 1 / end
|
|
|
|
RTE Flow mark item support
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``RTE_FLOW_ITEM_TYPE_MARK`` can be used to create ingress flow rules to match
|
|
packets from CPT(second pass packets). When mark item type is used, it should
|
|
be the first item in the patterns specification.
|
|
|
|
Inline device support for CN10K
|
|
-------------------------------
|
|
|
|
CN10K HW provides a misc device Inline device that supports ethernet devices in
|
|
providing following features.
|
|
|
|
- Aggregate all the inline IPsec inbound traffic from all the CN10K ethernet
|
|
devices to be processed by the single inline IPSec device. This allows
|
|
single rte security session to accept traffic from multiple ports.
|
|
|
|
- Support for event generation on outbound inline IPsec processing errors.
|
|
|
|
- Support CN106xx poll mode of operation for inline IPSec inbound processing.
|
|
|
|
Inline IPsec device is identified by PCI PF vendid:devid ``177D:A0F0`` or
|
|
VF ``177D:A0F1``.
|
|
|
|
Runtime Config Options for inline device
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``Min SPI for inbound inline IPsec`` (default ``0``)
|
|
|
|
Min SPI supported for inbound inline IPsec processing can be specified by
|
|
``ipsec_in_min_spi`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,ipsec_in_min_spi=6
|
|
|
|
With the above configuration, application can enable inline IPsec processing
|
|
for inbound SA with min SPI of 6 for traffic aggregated on inline device.
|
|
|
|
- ``Max SPI for inbound inline IPsec`` (default ``255``)
|
|
|
|
Max SPI supported for inbound inline IPsec processing can be specified by
|
|
``ipsec_in_max_spi`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,ipsec_in_max_spi=128
|
|
|
|
With the above configuration, application can enable inline IPsec processing
|
|
for inbound SA with max SPI of 128 for traffic aggregated on inline device.
|
|
|
|
- ``Count of meta buffers for inline inbound IPsec second pass``
|
|
|
|
Number of meta buffers allocated for inline inbound IPsec second pass can
|
|
be specified by ``nb_meta_bufs`` ``devargs`` parameter. Default value is
|
|
computed runtime based on pkt mbuf pools created and in use. Number of meta
|
|
buffers should be at least equal to aggregated number of packet buffers of all
|
|
packet mbuf pools in use by Inline IPsec enabled ethernet devices.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,nb_meta_bufs=1024
|
|
|
|
With the above configuration, PMD would enable inline IPsec processing
|
|
for inbound with 1024 meta buffers available for second pass.
|
|
|
|
- ``Meta buffer size for inline inbound IPsec second pass``
|
|
|
|
Size of meta buffer allocated for inline inbound IPsec second pass can
|
|
be specified by ``meta_buf_sz`` ``devargs`` parameter. Default value is
|
|
computed runtime based on pkt mbuf pools created and in use.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,meta_buf_sz=512
|
|
|
|
With the above configuration, PMD would allocate meta buffers of size 512 for
|
|
inline inbound IPsec processing second pass.
|
|
|
|
- ``Inline Outbound soft expiry poll frequency in usec`` (default ``100``)
|
|
|
|
Soft expiry poll frequency for Inline Outbound sessions can be specified by
|
|
``soft_exp_poll_freq`` ``devargs`` parameter.
|
|
|
|
For example::
|
|
|
|
-a 0002:1d:00.0,soft_exp_poll_freq=1000
|
|
|
|
With the above configuration, driver would poll for soft expiry events every
|
|
1000 usec.
|
|
|
|
Debugging Options
|
|
-----------------
|
|
|
|
.. _table_cnxk_ethdev_debug_options:
|
|
|
|
.. table:: cnxk ethdev debug options
|
|
|
|
+---+------------+-------------------------------------------------------+
|
|
| # | Component | EAL log command |
|
|
+===+============+=======================================================+
|
|
| 1 | NIX | --log-level='pmd\.net.cnxk,8' |
|
|
+---+------------+-------------------------------------------------------+
|
|
| 2 | NPC | --log-level='pmd\.net.cnxk\.flow,8' |
|
|
+---+------------+-------------------------------------------------------+
|