* An extra flag to describe additional settings, for example the multithreading mode of operation and extendable bucket functionality (as will be described later)
The example hash tables in the L2/L3 Forwarding sample applications define which port to forward a packet to based on a packet flow identified by the five-tuple lookup.
However, this table could also be used for more sophisticated features and provide many other functions and actions that could be performed on the packets and flows.
The hash library supports multithreading, and the user specifies the required mode of operation when the hash table is created
by setting the appropriate flag. In all modes of operation, lookups are thread-safe, meaning that lookups can be called from multiple
threads concurrently.
For concurrent writes, and for concurrent reads and writes, the following flag values define the corresponding modes of operation:
* If the multi-writer flag (RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) is set, multiple threads are allowed to write to the table.
Key add, delete, and table reset are protected from other writer threads. With only this flag set, readers are not protected from ongoing writes.
* If the read/write concurrency flag (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) is set, multithreaded read/write operation is safe
(i.e., the application does not need to stop the readers from accessing the hash table until writers finish their updates; readers and writers can operate on the table concurrently).
The library uses a reader-writer lock to provide the concurrency.
* In addition to these two flag values, if the transactional memory flag (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT) is also set,
the reader-writer lock will use hardware transactional memory (e.g., Intel® TSX) if supported to guarantee thread safety.
If the platform supports Intel® TSX, it is advised to set the transactional memory flag, as this will speed up concurrent table operations.
Otherwise concurrent operations will be slower because of the overhead associated with the software locking mechanisms.
* If the lock-free read/write concurrency flag (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) is set, read/write concurrency is provided without using a reader-writer lock.
For platforms (e.g., current ARM based platforms) that do not support transactional memory, it is advised to set this flag to achieve greater scalability in performance.
If this flag is set, the RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL flag is set by default.
* If the 'do not free on delete' (RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL) flag is set, the position of the entry in the hash table is not freed upon calling delete(). This flag is enabled
by default when the lock-free read/write concurrency flag is set. The application should free the position once all readers have stopped referencing it.
Where required, the application can use RCU mechanisms to determine when the readers have stopped referencing the position.
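
The relationships between these flags can be summarized in a few lines of C. The sketch below is a toy model only: the ``FLAGDEMO_*`` names and their bit values are hypothetical stand-ins chosen for illustration and are not taken from the DPDK headers, and the helper functions are not part of the library API. It encodes just two rules stated above: lock-free read/write concurrency implies 'do not free on delete', and readers are protected from ongoing writes only in one of the two read/write concurrency modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTE_HASH_EXTRA_FLAGS_* bits.
 * The values are illustrative, not the ones defined by DPDK. */
#define FLAGDEMO_TRANS_MEM_SUPPORT (1u << 0)
#define FLAGDEMO_MULTI_WRITER_ADD  (1u << 1)
#define FLAGDEMO_RW_CONCURRENCY    (1u << 2)
#define FLAGDEMO_NO_FREE_ON_DEL    (1u << 3)
#define FLAGDEMO_RW_CONCURRENCY_LF (1u << 4)
#define FLAGDEMO_ALL               ((1u << 5) - 1u)

/* Return the flags that are effectively in force for a requested set:
 * the lock-free mode turns on 'do not free on delete' by default. */
uint32_t flagdemo_effective(uint32_t requested)
{
    assert((requested & ~FLAGDEMO_ALL) == 0); /* reject unknown bits */
    if (requested & FLAGDEMO_RW_CONCURRENCY_LF)
        requested |= FLAGDEMO_NO_FREE_ON_DEL;
    return requested;
}

/* Readers are protected from ongoing writes only when one of the two
 * read/write concurrency modes is selected. */
bool flagdemo_readers_safe_during_writes(uint32_t flags)
{
    return (flags & (FLAGDEMO_RW_CONCURRENCY |
                     FLAGDEMO_RW_CONCURRENCY_LF)) != 0;
}

/* The application has to free deleted positions itself whenever
 * 'do not free on delete' is in force. */
bool flagdemo_app_must_free_positions(uint32_t requested)
{
    return (flagdemo_effective(requested) & FLAGDEMO_NO_FREE_ON_DEL) != 0;
}
```

In a real application the corresponding ``RTE_HASH_EXTRA_FLAGS_*`` values would be combined with a bitwise OR and passed as the extra flag when the table is created.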
Extendable Bucket Functionality support
----------------------------------------
This functionality is enabled with an extra flag, which is not set by default. When the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag is set and
a key fails to be inserted because of excessive hash collisions (a very unlikely case), the hash table bucket is extended with a linked
list that holds the keys that could not be placed. This feature is important for workloads (e.g. telco workloads) that need to insert up to 100% of the
hash table size and cannot tolerate any key insertion failure, however rare. Currently, the extendable bucket is not supported
with the lock-free concurrency implementation (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF).
In the very unlikely event that an empty entry cannot be found after a certain number of displacements,
the key is considered impossible to add (unless the extendable bucket flag is set, in which case the bucket is extended to insert the key, as will be explained later).
With random keys, this method allows the user to get more than 90% table utilization, without
having to drop any stored entry (e.g. using a LRU replacement policy) or allocate more memory (extendable buckets or rehashing).
Example of deletion:
Similar to lookup, the key is searched in its primary and secondary buckets. If the key is found, the
entry is marked as empty. If the hash table was configured with 'no free on delete' or 'lock free read/write concurrency',
the position of the key is not freed. It is the responsibility of the user to free the position once
readers no longer reference it.
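
The difference between deleting a key and freeing its position can be illustrated with a small model. The sketch below is not the library's implementation: it replaces the hash table with a plain slot array, and all ``slotdemo_*`` names are hypothetical. It only shows that, with 'no free on delete', a deleted position stays unavailable for new keys until the application explicitly frees it.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTDEMO_SLOTS 4 /* toy capacity */

enum slotdemo_state {
    SLOTDEMO_FREE = 0,          /* position can be reused */
    SLOTDEMO_IN_USE,            /* position holds a live key */
    SLOTDEMO_DELETED_NOT_FREED  /* key deleted, position still reserved */
};

struct slotdemo_table {
    enum slotdemo_state state[SLOTDEMO_SLOTS];
    int key[SLOTDEMO_SLOTS];
    bool no_free_on_del; /* models the 'no free on delete' setting */
};

/* Store a key in the first free position; return it, or -1 if none. */
int slotdemo_add(struct slotdemo_table *t, int key)
{
    for (int i = 0; i < SLOTDEMO_SLOTS; i++) {
        if (t->state[i] == SLOTDEMO_FREE) {
            t->state[i] = SLOTDEMO_IN_USE;
            t->key[i] = key;
            return i;
        }
    }
    return -1;
}

/* Delete a key; return its position, or -1 if the key is not present.
 * With no_free_on_del the position is not made available again. */
int slotdemo_del(struct slotdemo_table *t, int key)
{
    for (int i = 0; i < SLOTDEMO_SLOTS; i++) {
        if (t->state[i] == SLOTDEMO_IN_USE && t->key[i] == key) {
            t->state[i] = t->no_free_on_del ? SLOTDEMO_DELETED_NOT_FREED
                                            : SLOTDEMO_FREE;
            return i;
        }
    }
    return -1;
}

/* Called by the application once no reader references the position. */
void slotdemo_free_position(struct slotdemo_table *t, int pos)
{
    assert(pos >= 0 && pos < SLOTDEMO_SLOTS);
    if (t->state[pos] == SLOTDEMO_DELETED_NOT_FREED)
        t->state[pos] = SLOTDEMO_FREE;
}
```

How the application decides that readers are done (for example with an RCU mechanism) is outside the scope of this model.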
Implementation Details (with Extendable Bucket)
-------------------------------------------------
When the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag is set, the hash table implementation still uses the same Cuckoo Hash algorithm to store the keys into
the first and second tables. However, in the very unlikely event that a key cannot be inserted after a certain number of Cuckoo displacements,
the secondary bucket of this key is extended
with a linked list of extra buckets and the key is stored in this linked list.
In case of lookup for a certain key, as before, the primary bucket is searched for a match and then the secondary bucket is looked up.
If there is no match there either, the extendable buckets (linked list of extra buckets) are searched one by one for a possible match and if there is no match
the key is considered not to be in the table.
Deletion works the same way as when the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag is not set, with one exception: if a key is deleted from any bucket
and an empty location is created, the last entry from the extendable buckets associated with this bucket is displaced into
this empty location to possibly shorten the linked list.
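
The insert and lookup paths described above can be sketched with a small self-contained model. This is a simplified illustration under stated assumptions, not the DPDK implementation: the ``ext_*`` names, the bucket sizes and the two toy hash functions are hypothetical, Cuckoo displacement is omitted (a key goes straight to the linked list once both candidate buckets are full), duplicate keys are not checked, and deletion with compaction of the list is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define EXT_ENTRIES 2 /* entries per bucket (toy value) */
#define EXT_BUCKETS 4 /* number of main buckets (toy value) */

struct ext_bucket {
    uint32_t key[EXT_ENTRIES];
    bool used[EXT_ENTRIES];
    struct ext_bucket *next; /* linked list of extra buckets */
};

struct ext_table {
    struct ext_bucket b[EXT_BUCKETS];
};

/* Hypothetical toy hashes selecting the primary and secondary bucket. */
static uint32_t ext_prim(uint32_t k) { return k % EXT_BUCKETS; }
static uint32_t ext_sec(uint32_t k) { return (k / EXT_BUCKETS) % EXT_BUCKETS; }

/* Store k in the first empty entry of one bucket, if there is one. */
static bool ext_put(struct ext_bucket *b, uint32_t k)
{
    for (int i = 0; i < EXT_ENTRIES; i++) {
        if (!b->used[i]) {
            b->used[i] = true;
            b->key[i] = k;
            return true;
        }
    }
    return false;
}

/* Check whether one bucket holds k. */
static bool ext_has(const struct ext_bucket *b, uint32_t k)
{
    for (int i = 0; i < EXT_ENTRIES; i++)
        if (b->used[i] && b->key[i] == k)
            return true;
    return false;
}

/* Insert: primary bucket, then secondary bucket, then the linked list
 * of extra buckets hanging off the secondary bucket. */
bool ext_insert(struct ext_table *t, uint32_t k)
{
    struct ext_bucket *cur = &t->b[ext_sec(k)];

    if (ext_put(&t->b[ext_prim(k)], k) || ext_put(cur, k))
        return true;
    while (cur->next != NULL) {
        cur = cur->next;
        if (ext_put(cur, k))
            return true;
    }
    cur->next = calloc(1, sizeof(*cur->next)); /* extend the bucket */
    return cur->next != NULL && ext_put(cur->next, k);
}

/* Lookup: primary bucket, secondary bucket, then each extra bucket. */
bool ext_lookup(const struct ext_table *t, uint32_t k)
{
    if (ext_has(&t->b[ext_prim(k)], k))
        return true;
    for (const struct ext_bucket *b = &t->b[ext_sec(k)]; b != NULL; b = b->next)
        if (ext_has(b, k))
            return true;
    return false;
}
```

With these toy hashes, the keys 0, 16, 32, 48 and 64 all map to bucket 0 as both primary and secondary bucket, so the third key onwards ends up in the linked list while every key remains reachable through ``ext_lookup()``.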
The last values in the tables above are the average maximum table
utilization with random keys, using the Jenkins hash function.
Use Case: Flow Classification
-----------------------------
Flow classification is used to map each input packet to the connection/flow it belongs to.
This operation is necessary as the processing of each input packet is usually done in the context of its connection,
so the same set of operations is applied to all the packets from the same flow.
Applications using flow classification typically have a flow table to manage, with each separate flow having an entry associated with it in this table.
The size of the flow table entry is application specific, with typical values of 4, 16, 32 or 64 bytes.
Each application using flow classification typically has a mechanism defined to uniquely identify a flow based on
a number of fields read from the input packet that make up the flow key.
One example is to use the DiffServ 5-tuple made up of the following fields of the IP and transport layer packet headers:
Source IP Address, Destination IP Address, Protocol, Source Port, Destination Port.
The DPDK hash provides a generic method to implement an application specific flow classification mechanism.
Given a flow table implemented as an array, the application should create a hash object with the same number of entries as the flow table and
with the hash key size set to the number of bytes in the selected flow key.
The flow table operations on the application side are described below:
* Add flow: Add the flow key to hash.
If the returned position is valid, use it to access the flow entry in the flow table for adding a new flow or
updating the information associated with an existing flow.
Otherwise, the flow addition failed, for example due to lack of free entries for storing new flows.
* Delete flow: Delete the flow key from the hash. If the returned position is valid,
use it to access the flow entry in the flow table to invalidate the information associated with the flow.
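
The add and delete operations above can be sketched as follows. This is a minimal illustration under stated assumptions rather than DPDK code: the hash object is replaced by a hypothetical linear-probing stand-in (``flow_hash_add()`` and ``flow_hash_del()``) that, like the hash described here, returns the position of the key or a negative value on failure. The ``flow_*`` names, the ``packets`` counter and the table size are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_ENTRIES 8 /* same number of entries in hash and flow table */

struct flow_key { /* 5-tuple flow key */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

struct flow_entry { /* application-specific flow table entry */
    uint64_t packets;
    int valid;
};

struct flow_table {
    struct flow_key key[FLOW_ENTRIES]; /* toy stand-in for the hash */
    uint8_t used[FLOW_ENTRIES];
    struct flow_entry entry[FLOW_ENTRIES]; /* flow table as an array */
};

/* Compare field by field so that struct padding is never inspected. */
static int flow_key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

static uint32_t flow_key_hash(const struct flow_key *k)
{
    return k->src_ip ^ k->dst_ip ^ ((uint32_t)k->src_port << 16) ^
           k->dst_port ^ k->proto;
}

/* Return the position of the key (existing or newly added), or -1. */
static int flow_hash_add(struct flow_table *t, const struct flow_key *k)
{
    uint32_t start = flow_key_hash(k) % FLOW_ENTRIES;
    int free_pos = -1;

    for (uint32_t i = 0; i < FLOW_ENTRIES; i++) {
        uint32_t pos = (start + i) % FLOW_ENTRIES;
        if (t->used[pos] && flow_key_eq(&t->key[pos], k))
            return (int)pos;
        if (!t->used[pos] && free_pos < 0)
            free_pos = (int)pos;
    }
    if (free_pos >= 0) {
        t->key[free_pos] = *k;
        t->used[free_pos] = 1;
    }
    return free_pos;
}

/* Remove the key; return its former position, or -1 if not found. */
static int flow_hash_del(struct flow_table *t, const struct flow_key *k)
{
    for (int pos = 0; pos < FLOW_ENTRIES; pos++) {
        if (t->used[pos] && flow_key_eq(&t->key[pos], k)) {
            t->used[pos] = 0;
            return pos;
        }
    }
    return -1;
}

/* Add flow: a valid position indexes the flow entry to create or update. */
int flow_add(struct flow_table *t, const struct flow_key *k)
{
    assert(t != NULL && k != NULL);
    int pos = flow_hash_add(t, k);
    if (pos < 0)
        return -1; /* e.g. no free entries for new flows */
    if (!t->entry[pos].valid) {
        t->entry[pos].valid = 1;
        t->entry[pos].packets = 0;
    }
    t->entry[pos].packets++;
    return pos;
}

/* Delete flow: a valid position indexes the flow entry to invalidate. */
int flow_delete(struct flow_table *t, const struct flow_key *k)
{
    assert(t != NULL && k != NULL);
    int pos = flow_hash_del(t, k);
    if (pos >= 0)
        t->entry[pos].valid = 0;
    return pos;
}
```

The point of the design is that the hash stores only keys, while the position it returns is reused as the index into the application's own flow table array.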
* [partial-key] Bin Fan, David G. Andersen, and Michael Kaminsky, MemC3: compact and concurrent MemCache with dumber caching and smarter hashing, 2013, NSDI