1. Nginx based on fstack delays setting up server on fstack until ngx_worker_process_init. ngx_configure_listening_sockets should as well.
2. FStack does not support IP_PKTINFO
When nginx is configured like this:
```
server {
listen 8000;
kernel_network_stack on;
location / {
proxy_pass http://127.0.0.1:8080/;
}
}
```
nginx will crash, becasue kernel network stack is handled in a single thread, but we have hijacked all the socket apis, it causes that all apis enter to f-stack's path which is in main thread.
Note that this only fix the error, `divert` is still not usable, refer to #136.
If you want to use NAT, you can just use the built-in `ipfw nat`
instead.
According to the FreeBSD Manual Page:
- When kevent() returns and if `flags` is EVFILT_READ, sockets which have previously been passed to listen() return when there is an incoming connection pending. `data` contains the size of the listen backlog.
So if an EVFILT_READ event reaches and it is the listen socket, we must accept `event->data` times. And for `ff_epoll` interface, we should continue to accept until it fails.
In the previous version, we only accept once when event reaches, it will cause listen queue overflow.
e.g. unix socket, ipc (with APP on kernel network stack), packet from kernel network stack.
1. Add a new directive kernel_network_stack :
Syntax: kernel_network_stack on | off;
Default: kernel_network_stack off;
Context: http, server
This directive is available only when NGX_HAVE_FF_STACK is defined.
Determines whether server should run on kernel network stack or fstack.
2. Use a simpler and more effective solution to discriminate fstack fd(file descriptor, only socket for now) from kernel fd.
Run with valgrind, and found this:
==2228== Invalid write of size 8
==2228== at 0x4E05DA: AliasSctpInit (alias_sctp.c:641)
==2228== by 0x4DE565: LibAliasInit (alias_db.c:2503)
==2228== by 0x4E9B3B: nat44_config (ip_fw_nat.c:505)
==2228== by 0x4E9E91: nat44_cfg (ip_fw_nat.c:599)
==2228== by 0x4F1719: ipfw_ctl3 (ip_fw_sockopt.c:3666)
==2228== by 0x4B9954: rip_ctloutput (raw_ip.c:659)
==2228== by 0x447E11: sosetopt (uipc_socket.c:2505)
==2228== by 0x44BF4D: kern_setsockopt (uipc_syscalls.c:1407)
==2228== by 0x409F08: ff_setsockopt (ff_syscall_wrapper.c:412)
==2228== by 0x5277AA: handle_ipfw_msg (ff_dpdk_if.c:1146)
==2228== by 0x52788C: handle_msg (ff_dpdk_if.c:1196)
==2228== by 0x5289B8: process_msg_ring (ff_dpdk_if.c:1213)
==2228== Address 0x60779b0 is 4,800 bytes inside a block of size 4,802
alloc'd
==2228== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==2228== by 0x509F15: ff_malloc (ff_host_interface.c:89)
==2228== by 0x4053BE: malloc (ff_glue.c:1021)
==2228== by 0x4E054E: AliasSctpInit (alias_sctp.c:632)
==2228== by 0x4DE565: LibAliasInit (alias_db.c:2503)
==2228== by 0x4E9B3B: nat44_config (ip_fw_nat.c:505)
==2228== by 0x4E9E91: nat44_cfg (ip_fw_nat.c:599)
==2228== by 0x4F1719: ipfw_ctl3 (ip_fw_sockopt.c:3666)
==2228== by 0x4B9954: rip_ctloutput (raw_ip.c:659)
==2228== by 0x447E11: sosetopt (uipc_socket.c:2505)
==2228== by 0x44BF4D: kern_setsockopt (uipc_syscalls.c:1407)
==2228== by 0x409F08: ff_setsockopt (ff_syscall_wrapper.c:412)
==2228==
The error line is:
`la->sctpNatTimer.TimerQ = sn_calloc(SN_TIMER_QUEUE_SIZE, sizeof(struct
sctpTimerQ));`
Since SN_TIMER_QUEUE_SIZE is defined as SN_MAX_TIMER+2, and sn_calloc is
defined as sn_malloc(x * n) if _SYS_MALLOC_H_ is defined, the size of
calloced memory will be wrong, because the macro will be expanded to
sizeof(struct sctpTimerQ)*SN_MAX_TIMER+2.
And the memory will be out of bounds here.
```
/* Initialise circular timer Q*/
for (i = 0; i < SN_TIMER_QUEUE_SIZE; i++)
LIST_INIT(&la->sctpNatTimer.TimerQ[i]);
```
Since f-stack uses `rte_pktmbuf_clone` to copy mbuf to other lcores when dispatching arp packets, but it doesn't real copy the packet data. The buf_addr of pktmbuf is pointed to the same address.
The arp response packet is generated with the same mbuf from the request
packet, it just swaps the src and dst address, so the copied mbufs will also be changed.
What we need is a deep copy function, and the arp packets are really small, so deep copy will not harm performance too much.
Fix#53#111#112.
1.Both EVFILT_READ and EVFILT_WRITE are values but not flags. It needs to check whether it is equal but not to do logic and.
2.If the read direction of the socket has shutdown, then the filter also sets EV_EOF in `flags`, and returns the socket error (if any) in `fflags`.