According to the FreeBSD Manual Page:
- When kevent() returns and `filter` is EVFILT_READ, sockets which have previously been passed to listen() return when there is an incoming connection pending. `data` contains the size of the listen backlog.
So when an EVFILT_READ event arrives for the listen socket, we must call accept `event->data` times. For the `ff_epoll` interface, we should keep accepting until the call fails.
The previous version accepted only once per event, which could overflow the listen queue.
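The two accept strategies described above can be sketched as follows. This is a minimal, self-contained illustration rather than the actual patch: `accept_fn`, `accept_pending`, `accept_until_failure` and `fake_accept` are hypothetical names, and the callback stands in for whichever accept call the event loop really uses.

```c
/* Hypothetical callback standing in for the real accept call.
 * Returns a new fd (>= 0) on success, or -1 when nothing is pending. */
typedef int (*accept_fn)(int listen_fd, void *ctx);

/* kevent-style: `pending` is the value reported in the event's `data`
 * field, so accept at most that many connections. Returns the number
 * of connections actually accepted. */
static int accept_pending(int listen_fd, long pending,
                          accept_fn do_accept, void *ctx)
{
    int accepted = 0;

    while (pending-- > 0) {
        if (do_accept(listen_fd, ctx) < 0)
            break;              /* queue drained earlier than reported */
        accepted++;             /* real code would set up the connection here */
    }
    return accepted;
}

/* epoll-style: no count is used, so keep accepting until the call fails. */
static int accept_until_failure(int listen_fd, accept_fn do_accept, void *ctx)
{
    int accepted = 0;

    while (do_accept(listen_fd, ctx) >= 0)
        accepted++;
    return accepted;
}

/* Demo stub: ctx points to an int holding the number of queued connections. */
static int fake_accept(int listen_fd, void *ctx)
{
    int *queued = ctx;

    (void)listen_fd;
    if (*queued <= 0)
        return -1;
    (*queued)--;
    return 100 + *queued;       /* arbitrary fake descriptor number */
}
```

Either loop empties the queue in one event notification, whereas a single accept per event leaves the remaining connections waiting.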
e.g. unix sockets, IPC (with an app on the kernel network stack), packets from the kernel network stack.
1. Add a new directive `kernel_network_stack`:
Syntax: kernel_network_stack on | off;
Default: kernel_network_stack off;
Context: http, server
This directive is available only when NGX_HAVE_FF_STACK is defined.
It determines whether the server runs on the kernel network stack or on fstack.
2. Use a simpler and more effective way to distinguish fstack fds (file descriptors; only sockets for now) from kernel fds.
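As an illustration of the directive described in item 1, a configuration might look like the fragment below. The ports and the two-server layout are assumptions made for the example; only the directive name, its values, its default and its contexts come from the description above.

```
http {
    # Default is "kernel_network_stack off", so this server runs on fstack.
    server {
        listen 80;
    }

    # This server opts in to the kernel network stack.
    server {
        listen 8080;
        kernel_network_stack on;
    }
}
```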
Running under valgrind reported the following:
==2228== Invalid write of size 8
==2228== at 0x4E05DA: AliasSctpInit (alias_sctp.c:641)
==2228== by 0x4DE565: LibAliasInit (alias_db.c:2503)
==2228== by 0x4E9B3B: nat44_config (ip_fw_nat.c:505)
==2228== by 0x4E9E91: nat44_cfg (ip_fw_nat.c:599)
==2228== by 0x4F1719: ipfw_ctl3 (ip_fw_sockopt.c:3666)
==2228== by 0x4B9954: rip_ctloutput (raw_ip.c:659)
==2228== by 0x447E11: sosetopt (uipc_socket.c:2505)
==2228== by 0x44BF4D: kern_setsockopt (uipc_syscalls.c:1407)
==2228== by 0x409F08: ff_setsockopt (ff_syscall_wrapper.c:412)
==2228== by 0x5277AA: handle_ipfw_msg (ff_dpdk_if.c:1146)
==2228== by 0x52788C: handle_msg (ff_dpdk_if.c:1196)
==2228== by 0x5289B8: process_msg_ring (ff_dpdk_if.c:1213)
==2228== Address 0x60779b0 is 4,800 bytes inside a block of size 4,802 alloc'd
==2228== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==2228== by 0x509F15: ff_malloc (ff_host_interface.c:89)
==2228== by 0x4053BE: malloc (ff_glue.c:1021)
==2228== by 0x4E054E: AliasSctpInit (alias_sctp.c:632)
==2228== by 0x4DE565: LibAliasInit (alias_db.c:2503)
==2228== by 0x4E9B3B: nat44_config (ip_fw_nat.c:505)
==2228== by 0x4E9E91: nat44_cfg (ip_fw_nat.c:599)
==2228== by 0x4F1719: ipfw_ctl3 (ip_fw_sockopt.c:3666)
==2228== by 0x4B9954: rip_ctloutput (raw_ip.c:659)
==2228== by 0x447E11: sosetopt (uipc_socket.c:2505)
==2228== by 0x44BF4D: kern_setsockopt (uipc_syscalls.c:1407)
==2228== by 0x409F08: ff_setsockopt (ff_syscall_wrapper.c:412)
==2228==
The offending line is:
`la->sctpNatTimer.TimerQ = sn_calloc(SN_TIMER_QUEUE_SIZE, sizeof(struct sctpTimerQ));`
SN_TIMER_QUEUE_SIZE is defined as SN_MAX_TIMER+2, without parentheses, and
sn_calloc is defined as sn_malloc(x * n) if _SYS_MALLOC_H_ is defined. The
macro therefore expands to sizeof(struct sctpTimerQ)*SN_MAX_TIMER+2, so the
allocated block is smaller than intended.
The following loop then writes out of bounds:
```
/* Initialise circular timer Q */
for (i = 0; i < SN_TIMER_QUEUE_SIZE; i++)
    LIST_INIT(&la->sctpNatTimer.TimerQ[i]);
```
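The precedence problem can be reproduced in isolation with the sketch below. All macro and function names are hypothetical stand-ins, not the libalias definitions. The value 600 for the timer count is an assumption, inferred from the valgrind report (4,802 = 8 * 600 + 2 with 8-byte list heads).

```c
#include <stddef.h>

#define MAX_TIMER          600            /* assumed value, see lead-in */
#define QUEUE_SIZE_UNSAFE  MAX_TIMER+2    /* no parentheses, as in the bug */
#define QUEUE_SIZE_SAFE    (MAX_TIMER+2)

/* Mirrors the shape "sn_malloc(x * n)": arguments are not parenthesised. */
#define BYTES_UNSAFE(n, x) (x * n)
/* Parenthesising each argument makes the macro robust on its own. */
#define BYTES_SAFE(n, x)   ((x) * (n))

/* Expands to sizeof(void *) * 600+2: only two extra bytes, not two slots. */
static size_t unsafe_bytes(void)
{
    return BYTES_UNSAFE(QUEUE_SIZE_UNSAFE, sizeof(void *));
}

/* Expands to (sizeof(void *)) * (600+2): room for all 602 slots. */
static size_t safe_bytes(void)
{
    return BYTES_SAFE(QUEUE_SIZE_UNSAFE, sizeof(void *));
}

/* Parenthesising the constant also fixes it, even with the unsafe macro. */
static size_t safe_constant_bytes(void)
{
    return BYTES_UNSAFE(QUEUE_SIZE_SAFE, sizeof(void *));
}
```

With 8-byte pointers, `unsafe_bytes()` yields 4,802 while the two safe variants yield 4,816, so the last two list heads fall outside the undersized block. Parenthesising either the constant or the macro arguments is enough to correct the size.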
f-stack uses `rte_pktmbuf_clone` to copy the mbuf to other lcores when dispatching ARP packets, but this does not actually copy the packet data: the buf_addr of each cloned pktmbuf points to the same address.
The ARP response packet is built in the same mbuf as the request
packet by swapping the src and dst addresses, so the cloned mbufs are changed as well.
What we need is a deep copy function. ARP packets are small, so a deep copy will not hurt performance much.
Fixes #53, #111, #112.
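The difference between a clone that shares packet data and a deep copy can be shown with a simplified model. This sketch does not use the DPDK API: `struct pkt`, `pkt_clone_shallow`, `pkt_copy_deep` and `swap_addrs` are hypothetical names, and the 12-byte frame is only a stand-in for the address fields that the reply path rewrites.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an mbuf: a pointer to packet data plus a length. */
struct pkt {
    unsigned char *data;
    size_t         len;
};

/* Shallow clone: the copy points at the same data buffer as the source. */
static struct pkt pkt_clone_shallow(const struct pkt *src)
{
    return *src;
}

/* Deep copy: allocate a new buffer and duplicate the bytes.
 * Returns 0 on success, -1 if the allocation fails. */
static int pkt_copy_deep(const struct pkt *src, struct pkt *dst)
{
    unsigned char *buf = malloc(src->len ? src->len : 1);

    if (buf == NULL)
        return -1;
    memcpy(buf, src->data, src->len);
    dst->data = buf;
    dst->len = src->len;
    return 0;
}

/* Swap two adjacent 6-byte addresses in place, modelling how the
 * response is produced by rewriting the request buffer. */
static void swap_addrs(unsigned char *frame)
{
    unsigned char tmp[6];

    memcpy(tmp, frame, 6);
    memcpy(frame, frame + 6, 6);
    memcpy(frame + 6, tmp, 6);
}
```

After `swap_addrs` runs on the original, a shallow clone sees the swapped addresses while a deep copy still holds the request as received, which is the behaviour the other lcores need.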