Linux* Driver for Intel(R) Ethernet Adaptive Virtual Function
=============================================================

April 2, 2021


Contents
========
- Overview
- Identifying Your Adapter
- Building and Installation
- Command Line Parameters
- Additional Features and Configurations
- Performance Optimization
- Known Issues/Troubleshooting
- Support
- License


Overview
========
The virtual function (VF) driver supports virtual functions generated by the
physical function (PF) driver, with one or more VFs enabled through sysfs.

The associated PF drivers for this VF driver are:
- ice
- i40e

SR-IOV requires the correct platform and OS support.

The guest OS loading this driver must support MSI-X interrupts.
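
As a minimal sketch of enabling VFs through sysfs, the function below writes
the requested VF count to the sriov_numvfs attribute that SR-IOV capable
kernels expose for a PF. The function name set_num_vfs and its optional
sysfs-root argument are hypothetical helpers for illustration, not part of
the driver. Run it as root on the host, against a PF interface.

```shell
# Hypothetical helper: request <count> VFs on PF interface <pf>.
# The third argument overrides the sysfs root and exists only for dry runs.
set_num_vfs() (
    pf=$1
    count=$2
    root=${3:-/sys}
    attr="$root/class/net/$pf/device/sriov_numvfs"
    if [ ! -e "$attr" ]; then
        echo "no sriov_numvfs attribute for $pf" >&2
        exit 1
    fi
    # The kernel rejects a change from one non-zero VF count to another,
    # so reset the count to 0 first.
    echo 0 > "$attr" || exit 1
    echo "$count" > "$attr"
)
```

For example, "set_num_vfs <PF> 4" asks the PF driver to create four VFs.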


For questions related to hardware requirements, refer to the documentation
supplied with your Intel adapter. All hardware requirements listed apply to use
with Linux.

Driver information can be obtained using ethtool, lspci, and ip. Instructions
on updating ethtool can be found in the section Additional Features and
Configurations later in this document.
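
A minimal sketch of pulling the driver name and version out of "ethtool -i"
output follows. The function name driver_and_version is hypothetical, and the
sketch assumes the usual "driver:" and "version:" lines are present in the
output.

```shell
# Hypothetical helper: read "ethtool -i <ethX>" output on stdin and print
# "<driver> <version>" taken from the "driver:" and "version:" lines.
driver_and_version() {
    awk -F': ' '
        $1 == "driver"  { drv = $2 }
        $1 == "version" { ver = $2 }
        END { print drv, ver }
    '
}
```

Typical use:

# ethtool -i <ethX> | driver_and_version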


Adaptive Virtual Function
-------------------------
Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
adapt to changing feature sets of the physical function driver (PF) with which
it is associated. This allows system administrators to update a PF without
having to update all the VFs associated with it. All AVFs have a single common
device ID and branding string.

AVFs have a minimum set of features known as "base mode," but may provide
additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:

- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
  for Tx/Rx
- iavf descriptors and ring format
- Descriptor write-back completion
- 1 control queue, with iavf descriptors, CSRs and ring format
- 5 MSI-X interrupt vectors and corresponding iavf CSRs
- 1 Interrupt Throttle Rate (ITR) index
- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
  configured through the PF
- 1 unicast MAC address reserved per VF
- 16 MAC address filters for each VF
- Stateless offloads - non-tunneled checksums
- AVF device ID
- HW mailbox is used for VF to PF communications (including on Windows)


Identifying Your Adapter
========================
This driver is compatible with virtual functions bound to devices based on the
following:
  * Intel(R) Ethernet Controller E810-C
  * Intel(R) Ethernet Controller E810-XXV
  * Intel(R) Ethernet Controller X710
  * Intel(R) Ethernet Controller XL710
  * Intel(R) Ethernet Network Connection X722
  * Intel(R) Ethernet Controller XXV710
  * Intel(R) Ethernet Controller V710

For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
http://www.intel.com/support


Building and Installation
=========================

To build a binary RPM package of this driver
--------------------------------------------
Note: RPM functionality has only been tested in Red Hat distributions.

1. Run the following command, where <x.x.x> is the version number for the
   driver tar file.

   # rpmbuild -tb iavf-<x.x.x>.tar.gz

   NOTE: For the build to work properly, the currently running kernel MUST
   match the version and configuration of the installed kernel sources. If
   you have just recompiled the kernel, reboot the system before building.

2. After building the RPM, the last few lines of the tool output contain the
   location of the RPM file that was built. Install the RPM with one of the
   following commands, where <RPM> is the location of the RPM file:

   # rpm -Uvh <RPM>
       or
   # dnf/yum localinstall <RPM>

NOTES:
- To compile the driver on some kernel/arch combinations, you may need to
  install a package with the development version of libelf (e.g. libelf-dev,
  libelf-devel, elfutils-libelf-devel).
- When compiling an out-of-tree driver, details will vary by distribution.
  However, you will usually need a kernel-devel RPM or some RPM that provides
  the kernel headers at a minimum. The RPM kernel-devel will usually fill in
  the link at /lib/modules/$(uname -r)/build.
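
Before building, it can help to confirm that the build link for the running
kernel is in place. The sketch below is a hypothetical helper, not part of
the driver package. It assumes a usable kernel build directory contains a
top-level Makefile.

```shell
# Hypothetical helper: succeed if kernel build files for <release> exist.
# The second argument overrides the filesystem root for dry runs.
have_kernel_build_dir() (
    release=$1
    root=${2:-}
    build="$root/lib/modules/$release/build"
    # The "build" link must resolve to a directory containing a Makefile.
    [ -d "$build" ] && [ -f "$build/Makefile" ]
)
```

Typical use:

# have_kernel_build_dir "$(uname -r)" || echo "kernel headers are missing"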


To manually build the driver
----------------------------
1. Move the virtual function driver tar file to the directory of your choice.
   For example, use '/home/username/iavf' or '/usr/local/src/iavf'.

2. Untar/unzip the archive, where <x.x.x> is the version number for the
   driver tar file:

   # tar zxf iavf-<x.x.x>.tar.gz

3. Change to the driver src directory, where <x.x.x> is the version number
   for the driver tar:

   # cd iavf-<x.x.x>/src/

4. Compile and install the driver module:

   # make install

   The binary will be installed as:
   /lib/modules/<KERNEL VER>/updates/drivers/net/ethernet/intel/iavf/iavf.ko

   The install location listed above is the default location. This may differ
   for various Linux distributions.

5. Load the module using the modprobe command.

   To check the version of the driver and then load it:

   # modinfo iavf
   # modprobe iavf

   Alternatively, make sure that any older iavf drivers are removed from the
   kernel before loading the new module:

   # rmmod iavf; modprobe iavf

6. Assign an IP address to the interface by entering the following,
   where <ethX> is the interface name that was shown in dmesg after modprobe:

   # ip address add <IP_address>/<netmask bits> dev <ethX>

7. Verify that the interface works. Enter the following, where <IP_address>
   is the IP address for another machine on the same subnet as the interface
   that is being tested:

   # ping <IP_address>
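
As an alternative to reading dmesg for the interface name in step 6, the
sketch below lists the interfaces that sysfs reports as bound to the iavf
driver. The function name list_iavf_interfaces is hypothetical, and the
sketch assumes the standard /sys/class/net/<ethX>/device/driver symlink
layout.

```shell
# Hypothetical helper: print network interfaces whose bound driver is "iavf".
# The optional argument overrides the sysfs root for dry runs.
list_iavf_interfaces() (
    root=${1:-/sys}
    for dev in "$root"/class/net/*; do
        link="$dev/device/driver"
        [ -L "$link" ] || continue
        # The driver symlink points at .../bus/pci/drivers/<name>.
        if [ "$(basename "$(readlink "$link")")" = "iavf" ]; then
            basename "$dev"
        fi
    done
)
```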


Command Line Parameters
=======================

The iavf driver does not support any command line parameters.


Additional Features and Configurations
======================================

Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set the console log level to eight by entering the following:

# dmesg -n 8

NOTE: This setting is not saved across reboots.
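
Because the level is lost on reboot, it can be useful to check it after a
restart. The sketch below is a hypothetical helper. It assumes the console
log level is the first field of /proc/sys/kernel/printk, which is the file
behind the kernel.printk sysctl.

```shell
# Hypothetical helper: print the current console log level, read as the
# first field of the kernel.printk sysctl file. The optional argument
# overrides the file path for dry runs.
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}
```

If the function prints a value lower than 8 after a reboot, run the dmesg
command above again.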


ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://kernel.org/pub/software/network/ethtool/


Setting VLAN Tag Stripping
--------------------------
If you have applications that require Virtual Functions (VFs) to receive
packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
Physical Function (PF) processes requests issued from the VF to enable or
disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
then requests from that VF to set VLAN tag stripping will be ignored.

To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF:

# ethtool -K <ethX> rxvlan on/off

    or alternatively:

# ethtool --offload <ethX> rxvlan on/off
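
To confirm whether a request took effect, the current state can be read back
from "ethtool -k". The sketch below is a hypothetical helper that assumes the
usual "<feature>: on|off" output lines. In that output, the rxvlan setting is
listed under the name rx-vlan-offload.

```shell
# Hypothetical helper: read "ethtool -k <ethX>" output on stdin and print the
# state ("on" or "off") of the named feature, for example rx-vlan-offload.
feature_state() {
    awk -v name="$1" '
        $1 == (name ":") { print $2; exit }
    '
}
```

Typical use:

# ethtool -k <ethX> | feature_state rx-vlan-offload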


IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to separate traffic within a particular VLAN
ID, among other uses.

The following are examples of how to configure 802.1ad (QinQ):

# ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
# ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
  Where "24" and "371" are example VLAN IDs.

NOTES:
- 802.1ad (QinQ) is supported in 3.19 and later kernels.
- VLAN protocols use the following EtherTypes:
    802.1Q = EtherType 0x8100
    802.1ad = EtherType 0x88A8
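
The two commands above follow a pattern that can be generated for any parent
interface and pair of VLAN IDs. The sketch below is a hypothetical helper
that only prints the commands. It assumes numeric IDs and rejects values
outside 1-4094, the range of usable VLAN IDs.

```shell
# Hypothetical helper: print the two "ip link" commands that stack an inner
# 802.1Q VLAN on top of an outer 802.1ad VLAN. Nothing is executed.
# Usage: qinq_commands <parent> <outer_id> <inner_id>
qinq_commands() (
    parent=$1
    outer=$2
    inner=$3
    for id in "$outer" "$inner"; do
        # VLAN IDs 0 and 4095 are reserved.
        if [ "$id" -lt 1 ] || [ "$id" -gt 4094 ]; then
            echo "invalid VLAN ID: $id" >&2
            exit 1
        fi
    done
    echo "ip link add link $parent $parent.$outer type vlan proto 802.1ad id $outer"
    echo "ip link add link $parent.$outer $parent.$outer.$inner type vlan proto 802.1Q id $inner"
)
```

Running "qinq_commands eth0 24 371" prints the same two commands shown in the
example above.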


Double VLANs
------------
Devices based on the Intel(R) Ethernet 800 Series can process up to two VLANs
in a packet when all the following are installed:
- ice driver version 1.4.0 or later
- NVM version 2.4 or later
- ice DDP package version 1.3.21 or later
If you don't use the versions above, the only supported VLAN configuration is
single 802.1Q VLAN traffic.
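
The minimum versions listed above can be compared against installed versions
with a small version check. The sketch below is a hypothetical helper. It
assumes "sort -V" (version sort, as provided by GNU coreutils) is available.

```shell
# Hypothetical helper: succeed if version <have> is at least <need>.
# Usage: version_at_least <have> <need>
version_at_least() {
    # With version sort, the smaller of the two versions comes first. The
    # check passes when that smaller version is <need>.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, "version_at_least 1.3.4 1.3.21" fails, because DDP package
version 1.3.4 is older than the required 1.3.21.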

When two VLAN tags are present in a packet, the outer VLAN tag can be either
802.1Q or 802.1ad. The inner VLAN tag must always be 802.1Q.

Note the following limitations:
- For each VF, the PF can only allow VLAN hardware offloads (insertion and
  stripping) of one type, either 802.1Q or 802.1ad.

To enable outer or single 802.1Q VLAN insertion and stripping and disable
802.1ad VLAN insertion and stripping:
# ethtool -K <ethX> rxvlan on txvlan on rx-vlan-stag-hw-parse off
  tx-vlan-stag-hw-insert off

To enable outer or single 802.1ad VLAN insertion and stripping and disable
802.1Q VLAN insertion and stripping:
# ethtool -K <ethX> rxvlan off txvlan off rx-vlan-stag-hw-parse on
  tx-vlan-stag-hw-insert on

To enable outer or single VLAN filtering if the VF supports modifying VLAN
filtering:
# ethtool -K <ethX> rx-vlan-filter on rx-vlan-stag-filter on

To disable outer or single VLAN filtering if the VF supports modifying VLAN
filtering:
# ethtool -K <ethX> rx-vlan-filter off rx-vlan-stag-filter off


Combining QinQ with SR-IOV VFs
------------------------------
We recommend you always configure a port VLAN for the VF from the PF. If a port
VLAN is not configured, the VF driver may only offload VLANs via software. The
PF allows all VLAN traffic to reach the VF, and the VF manages all VLAN traffic.

When the device is configured for double VLANs and the PF has configured a port
VLAN:
- The VF can only offload guest VLANs for 802.1Q traffic.
- The VF can only configure VLAN filtering rules for guest VLANs using 802.1Q
  traffic.

However, when the device is configured for double VLANs and the PF has NOT
configured a port VLAN:
- You must use iavf driver version 4.1.0 or later to offload and filter VLANs.
- The PF turns on VLAN pruning and antispoof in the VF's VSI by default. The VF
  will not transmit or receive any tagged traffic until the VF requests a VLAN
  filter.
- The VF can offload (insert and strip) the outer VLAN tag of 802.1Q or 802.1ad
  traffic.
- The VF can create filter rules for the outer VLAN tag of both 802.1Q and
  802.1ad traffic.

If the PF does not support double VLANs, the VF can hardware offload single
802.1Q VLANs without a port VLAN.

When the PF is enabled for double VLANs, for iavf drivers before version 4.1.x:
- VLAN hardware offloads and filtering are supported only when the PF has
  configured a port VLAN.
- VLAN filtering, insertion, and stripping will be software offloaded when no
  port VLAN is configured.

To see VLAN filtering and offload capabilities, use the following command:

# ethtool -k <ethX> | grep vlan


Application Device Queues (ADQ)
-------------------------------
Application Device Queues (ADQ) allow you to dedicate one or more queues to a
specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application.

Requirements:
- Kernel version 4.19.58 or later
- Depending on the underlying PF device, ADQ cannot be enabled when the
  following features are enabled: Data Center Bridging (DCB), Multiple
  Functions per Port (MFP), or Sideband Filters.
- If another driver (for example, DPDK) has set cloud filters, you cannot
  enable ADQ.

When ADQ is enabled:
- You cannot change RSS parameters, the number of queues, or the MAC address in
  the PF or VF. Delete the ADQ configuration before changing these settings.
- The driver supports subnet masks for IP addresses in the PF and VF. When you
  add a subnet mask filter, the driver forwards packets to the ADQ VSI instead
  of the main VSI.

To create traffic classes (TCs) on the interface:
NOTE: Run all TC commands from the ../iproute2/tc/ directory.
1. Use the tc command to create traffic classes. You can create a maximum of
   16 TCs per interface on Intel(R) Ethernet 800 Series devices and
   8 TCs per interface on Intel(R) Ethernet 700 Series devices.

   # tc qdisc add dev <ethX> root mqprio num_tc <tcs> map <priorities>
     queues <count1@offset1 ...> hw 1 mode channel shaper bw_rlimit
     min_rate <min_rate1 ...> max_rate <max_rate1 ...>
   Where:
      num_tc <tcs>: The number of TCs to use.
      map <priorities>: The map of priorities to TCs. You can map up to
          16 priorities to TCs.
      queues <count1@offset1 ...>: For each TC, <num queues>@<offset>. The max
          total number of queues for all TCs is the number of cores.
      hw 1 mode channel: 'channel' with 'hw' set to 1 is a new hardware offload
          mode in mqprio that makes full use of the mqprio options, the TCs,
          the queue configurations, and the QoS parameters.
      shaper bw_rlimit: For each TC, sets the minimum and maximum bandwidth
          rates. The totals must be equal to or less than the port speed. This
          parameter is optional and is required only to set up the Tx rates.
      min_rate <min_rate1 ...>: Sets the minimum bandwidth rate limit for each TC.
      max_rate <max_rate1 ...>: Sets the maximum bandwidth rate limit for each
          TC. You can set a min and max rate together.

NOTE: See the mqprio man page and the examples below for more information.

2. Verify the bandwidth limit using network monitoring tools such as ifstat or
   sar -n DEV [interval] [number of samples].

NOTE: Setting up channels via ethtool (ethtool -L) is not supported when the
TCs are configured using mqprio.

3. Enable hardware TC offload on the interface:

   # ethtool -K <ethX> hw-tc-offload on

4. Apply TCs to ingress (Rx) flow of the interface:

   # tc qdisc add dev <ethX> ingress

EXAMPLES:
See the tc and tc-flower man pages for more information on traffic control and
TC flower filters.

- To set up two TCs (tc0 and tc1), with 16 queues each, priorities 0-3 for
  tc0 and 4-7 for tc1, and max Tx rate set to 1Gbit for tc0 and 3Gbit for tc1:

  # tc qdisc add dev ens4f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 queues
    16@0 16@16 hw 1 mode channel shaper bw_rlimit max_rate 1Gbit 3Gbit
  Where:
      map 0 0 0 0 1 1 1 1: Sets priorities 0-3 to use tc0 and 4-7 to use tc1
      queues 16@0 16@16: Assigns 16 queues to tc0 at offset 0 and 16 queues
          to tc1 at offset 16
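
A queue layout such as the one in this example can be sanity-checked before
it is passed to tc. The sketch below is a hypothetical helper. It assumes
the queue ranges should be contiguous starting at offset 0, as they are in
the example; this is a convention of the sketch, not a documented driver
rule. The queue limit is passed in by the caller, for example the number of
cores.

```shell
# Hypothetical helper: check a list of mqprio "count@offset" pairs.
# Usage: check_mqprio_queues <max_queues> <count@offset>...
# Succeeds if the ranges are contiguous from offset 0 and the total number
# of queues does not exceed <max_queues>.
check_mqprio_queues() (
    max=$1
    shift
    next=0
    for pair in "$@"; do
        count=${pair%@*}    # text before the "@"
        offset=${pair#*@}   # text after the "@"
        if [ "$offset" -ne "$next" ]; then
            echo "TC at offset $offset should start at $next" >&2
            exit 1
        fi
        next=$((next + count))
    done
    if [ "$next" -gt "$max" ]; then
        echo "total queues $next exceeds limit $max" >&2
        exit 1
    fi
)
```

Typical use:

  # check_mqprio_queues "$(nproc)" 16@0 16@16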

Multiple filters can be added to the device using the same recipe, which
requires no additional recipe resources, either on the same interface or on
different interfaces. Each filter uses the same fields for matching, but can
have different match values.
  # tc filter add dev <ethX> protocol ip ingress prio 1 flower ip_proto
    tcp dst_port $app_port skip_sw hw_tc 1

For example:

  # tc filter add dev <ethX> protocol ip ingress prio 1 flower ip_proto
     tcp dst_port 5555 skip_sw hw_tc 1
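
When several application ports share one TC, the filter command above is
repeated once per port. The sketch below is a hypothetical helper that only
prints the commands, reusing the same match fields and priority as the
example.

```shell
# Hypothetical helper: print one flower filter command per TCP port, all
# steering to the same hardware TC. Nothing is executed.
# Usage: adq_filter_commands <ethX> <hw_tc> <port>...
adq_filter_commands() (
    dev=$1
    tc=$2
    shift 2
    for port in "$@"; do
        echo "tc filter add dev $dev protocol ip ingress prio 1 flower ip_proto tcp dst_port $port skip_sw hw_tc $tc"
    done
)
```

Review the printed commands, then run them as root.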


Performance Optimization
========================
Driver defaults are meant to fit a wide variety of workloads, but if further
optimization is required, we recommend experimenting with the following
settings.


Rx Descriptor Ring Size
-----------------------
To reduce the number of Rx packet discards, increase the number of Rx
descriptors for each Rx ring using ethtool.

 - Check if the interface is dropping Rx packets due to buffers being full
   (rx_dropped.nic can mean that there is no PCIe bandwidth):

   # ethtool -S <ethX> | grep "rx_dropped"

 - If the previous command shows drops on queues, it may help to increase
   the number of descriptors using 'ethtool -G':

   # ethtool -G <ethX> rx <N>
   Where <N> is the desired number of ring entries/descriptors

   This can provide temporary buffering for issues that create latency while
   the CPUs process descriptors.

NOTE: When you are handling a large number of connections in a VF, we recommend
setting the number of Rx descriptors to 1024 or above. For example:

# ethtool -G <ethX> rx 2048
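
To judge whether a larger ring helps, compare the drop counters before and
after the change. The sketch below is a hypothetical helper that totals every
"ethtool -S" counter whose name contains rx_dropped. Counter names vary by
driver version, so treat the match pattern as an assumption.

```shell
# Hypothetical helper: read "ethtool -S <ethX>" output on stdin and print the
# sum of every counter whose name contains "rx_dropped".
sum_rx_dropped() {
    awk -F': *' '
        $1 ~ /rx_dropped/ { total += $2 }
        END { print total + 0 }
    '
}
```

Typical use:

# ethtool -S <ethX> | sum_rx_dropped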


Known Issues/Troubleshooting
============================

Software Issues
---------------
NOTE: After installing the driver, if your Intel Ethernet Network Connection
is not working, verify that you have installed the correct driver.


Linux bonding fails with VFs bound to an Intel(R) Ethernet 700 Series device
----------------------------------------------------------------------------
If you bind Virtual Functions (VFs) to an Intel(R) Ethernet 700 Series device,
the VF targets may fail when they become the active target. If the MAC address
of the VF is set by the PF (Physical Function) of the device, when you add a
target, or change the active-backup target, Linux bonding tries to sync the
backup target's MAC address to the same MAC address as the active target. Linux
bonding will fail at this point. This issue will not occur if the VF's MAC
address is not set by the PF.


Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a Virtual
Machine (VM) running on a separate host if the Virtual Function (VF, or Virtual
NIC) is not in trusted mode and spoof checking is enabled on the VF. Note that
this situation can occur in any combination of client, host, and guest
operating system. See the readme for the PF driver for information on spoof
checking and how to set the VF to trusted mode.


Using four traffic classes fails
--------------------------------
Do not try to reserve more than three traffic classes in the iavf driver. Doing
so will fail to set any traffic classes and will cause the driver to write
errors to stdout. Use a maximum of three traffic classes to avoid this issue.


Unexpected errors in dmesg when adding TCP filters on the VF
------------------------------------------------------------
When ADQ is configured and the VF is not in trusted mode, you may see
unexpected error messages in dmesg on the host when you try to add TCP filters
on the VF. This is due to the asynchronous design of the iavf driver. The VF
does not know whether it is trusted and appears to set the filter, while the PF
blocks the request and reports an error. See the dmesg log in the host OS for
details about the error.


Multiple log error messages on iavf driver removal
----------------------------------------------------
If you have several VFs and you remove the iavf driver, several instances of
the following errors are written to the log:
  Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
  Unable to send the message to VF 2 aq_err 12
  ARQ Overflow Error detected


MAC address of Virtual Function changes unexpectedly
----------------------------------------------------
If a Virtual Function's MAC address is not assigned in the host, then the VF
(virtual function) driver will use a random MAC address. This random MAC
address may change each time the VF driver is reloaded. You can assign a static
MAC address in the host machine. This static MAC address will survive a VF
driver reload.
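
A static MAC address is assigned on the host through the PF with the
"ip link set ... vf ... mac" command. The sketch below is a hypothetical
helper that validates the address format and prints the command without
running it.

```shell
# Hypothetical helper: print the host-side command that assigns a static MAC
# address to VF <n> of PF <pf>. Nothing is executed.
# Usage: vf_mac_command <pf> <n> <mac>
vf_mac_command() (
    pf=$1
    vf=$2
    mac=$3
    # Accept only six colon-separated pairs of hex digits.
    if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC address: $mac" >&2
        exit 1
    fi
    echo "ip link set dev $pf vf $vf mac $mac"
)
```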


Driver Buffer Overflow Fix
--------------------------
The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
<https://security-center.intel.com/advisory.aspx?intelid=INTEL-SA-00069&languageid=en-fr>,
is included in this and future versions of the driver.


Compiling the Driver
--------------------
When trying to compile the driver by running make install, the following error
may occur: "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source
tree and entering:

# make include/linux/version.h


Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have one system
on two IP networks in the same Ethernet broadcast domain (non-partitioned
switch) behave as expected. All Ethernet interfaces will respond to IP traffic
for any IP address assigned to the system. This results in unbalanced receive
traffic.

If you have multiple interfaces in a server, turn on ARP filtering by
entering the following:

# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

This works only on kernel versions later than 2.4.5.

NOTE: This setting is not saved across reboots. The configuration change can be
made permanent by adding the following line to the file /etc/sysctl.conf:

  net.ipv4.conf.all.arp_filter = 1

Another alternative is to install the interfaces in separate broadcast domains
(either in different switches or in a switch partitioned to VLANs).
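
The sysctl.conf change described in this section can be scripted so that
running it twice does not add a duplicate line. The sketch below is a
hypothetical helper. The file path argument exists so the function can be
tried on a scratch file first.

```shell
# Hypothetical helper: append the arp_filter setting to a sysctl file unless
# an equivalent line is already present.
# Usage: persist_arp_filter [file]
persist_arp_filter() (
    conf=${1:-/etc/sysctl.conf}
    pattern='^[[:space:]]*net\.ipv4\.conf\.all\.arp_filter[[:space:]]*=[[:space:]]*1[[:space:]]*$'
    if grep -Eq "$pattern" "$conf" 2>/dev/null; then
        exit 0
    fi
    echo "net.ipv4.conf.all.arp_filter = 1" >> "$conf"
)
```

After the line is in place, "sysctl -p" loads the file without a reboot.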


Rx Page Allocation Errors
-------------------------
'Page allocation failure. order:0' errors may occur under stress with kernels
2.6.25 and newer. This is caused by the way the Linux kernel reports this
stressed condition.


Host May Reboot after Removing PF when VF is Active in Guest
------------------------------------------------------------
On kernel versions earlier than 3.2, do not unload the PF driver with
active VFs. Doing this will cause your VFs to stop working until you reload
the PF driver and may cause a spontaneous reboot of your system.

Prior to unloading the PF driver, you must first ensure that all VFs are
no longer active. Do this by shutting down all VMs and unloading the VF driver.


Older VF drivers on Intel Ethernet 800 Series adapters
------------------------------------------------------
Some Windows* VF drivers from Release 22.9 or older may encounter errors when
loaded on a PF based on the Intel Ethernet 800 Series on Linux KVM. You may see
errors and the VF may not load. This issue does not occur starting with the
following Windows VF drivers:
- v40e64, v40e65: Version 1.5.65.0 and newer

To resolve this issue, download and install the latest iavf driver.


SR-IOV virtual functions have identical MAC addresses
-----------------------------------------------------
When you create multiple SR-IOV virtual functions, the VFs may have identical
MAC addresses. Only one VF will pass traffic, and all traffic on other VFs with
identical MAC addresses will fail. This is related to the
"MACAddressPolicy=persistent" setting in
/usr/lib/systemd/network/99-default.link.

To resolve this issue, edit the /usr/lib/systemd/network/99-default.link file
and change the MACAddressPolicy line to "MACAddressPolicy=none". For more
information, see the systemd.link man page.
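
The edit described above can be sketched as a small function. The function
name is hypothetical, and the sketch assumes the file contains a line that
starts with "MACAddressPolicy=". It keeps a ".bak" copy of the original file
next to it.

```shell
# Hypothetical helper: rewrite the MACAddressPolicy line in a .link file to
# "none", keeping a ".bak" copy of the original.
# Usage: set_mac_policy_none [file]
set_mac_policy_none() (
    file=${1:-/usr/lib/systemd/network/99-default.link}
    [ -f "$file" ] || exit 1
    cp "$file" "$file.bak" || exit 1
    sed 's/^MACAddressPolicy=.*/MACAddressPolicy=none/' "$file.bak" > "$file"
)
```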


Support
=======
For general information, go to the Intel support website at:
http://www.intel.com/support/

or the Intel Wired Networking project hosted by SourceForge at:
http://sourceforge.net/projects/e1000

If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.


License
=======
This program is free software; you can redistribute it and/or modify it under
the terms and conditions of the GNU General Public License, version 2, as
published by the Free Software Foundation.

This program is distributed in the hope it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
St - Fifth Floor, Boston, MA 02110-1301 USA.

The full GNU General Public License is included in this distribution in the
file called "COPYING".

Copyright(c) 2018 - 2021 Intel Corporation.


Trademarks
==========
Intel is a trademark or registered trademark of Intel Corporation or its
subsidiaries in the United States and/or other countries.

* Other names and brands may be claimed as the property of others.


