path: root/sys/netinet/tcp_input.c
Age | Commit message | Author
2011-10-15 | Respect the ToS setting in tcp syn+ack for IPv4, still need to fix for IPv6. ok claudio@ | Christiano F. Haesbaert
2011-05-13 | Revert the pf->socket linking diff. At least krw@, pirofti@ and todd@ have been seeing panics (todd and krw with xxxterm, not sure about pirofti) involving pool corruption while using this commit. krw and todd confirm that this backout fixes the problem. ok blambert@ krw@, todd@ henning@ and kettenis@ Double link between pf states and sockets. Henning has already implemented half of it. The additional part is: - The pf state lookup for outgoing packets is optimized by using mbuf->inp->state. - For incoming tcp, udp, raw, raw6 packets the socket lookup always is optimized by using mbuf->state->inp. - All protocols establish the link for incoming packets. - All protocols set the inp in the mbuf for outgoing packets. This allows the linkage beginning with the first packet for outgoing connections. - In case of divert states, delete the state when the socket closes. Otherwise new connections could match on old states instead of being diverted to the listen socket. ok henning@ | Owain Ainsworth
2011-05-04 | Clean up gotos for listening sockets to make it obvious when packets are dropped and when normal program flow occurs. Change error return value of syn_cache_add() from 0 to -1 in order to clearly communicate intent. ok claudio@ | Bret Lambert
2011-04-29 | In certain failure cases, a RST would be sent out on rdomain 0, regardless of the rdomain the packet was received on. Explicitly pass the rdomain to the tcp_respond() monstrosity to compensate for said monstricism which led to this behavior. ok claudio@ | Bret Lambert
2011-04-28 | Make in_broadcast() rdomain aware. Mostly mechanical change. This fixes the problem of binding sockets to broadcast IPs in other rdomains. OK henning@ | Claudio Jeker
2011-04-24 | Double link between pf states and sockets. Henning has already implemented half of it. The additional part is: - The pf state lookup for outgoing packets is optimized by using mbuf->inp->state. - For incoming tcp, udp, raw, raw6 packets the socket lookup always is optimized by using mbuf->state->inp. - All protocols establish the link for incoming packets. - All protocols set the inp in the mbuf for outgoing packets. This allows the linkage beginning with the first packet for outgoing connections. - In case of divert states, delete the state when the socket closes. Otherwise new connections could match on old states instead of being diverted to the listen socket. ok henning@ | Alexander Bluhm
2011-04-12 | put the accepted socket of a diverted connection into the routing domain of a connection originator. this allows one to query the source rdomain with a SO_RTABLE socket option. figured out with reyk, ok claudio. | Mike Belopuhov
2011-04-05 | Replace if/else ladder with much more legible switch statement for testing tcp flags. ok henning@ claudio@ | Bret Lambert
2011-04-04 | turn some macros into functions; saves 1400+ bytes from the kernel on amd64 ok claudio@ | Bret Lambert
2011-04-04 | Instead of calling tcp_reass (tcp reassembly) with magic arguments in order to skip most of the reassembly logic and try to flush available tcp segments to the socket, just split it off into its own function and use it where appropriate. ok claudio@ henning@ | Bret Lambert
2011-04-04 | change an if statement to a switch to reduce eye bleedage; no change in .o md5 "ok gcc" claudio@ | Bret Lambert
2011-01-07 | Add socket option SO_SPLICE to splice together two TCP sockets. The data received on the source socket will automatically be sent on the drain socket. This allows writing relay daemons with zero data copy. ok markus@ | Alexander Bluhm
2010-09-29 | Initialize the ts_recent (received timestamp) field in the newly created socket from the information we have in the syncache. Also bzero() the tcpcb that is passed to tcp_dooptions() just to be sure. | Claudio Jeker
2010-09-29 | It is not allowed to recalculate the window scale after the initial SYN. A session must stick to the rscale factor sent out in the SYN packet. Remove the bogus tcp_rscale() call which is done after a fully established session is returned from the syncache. | Claudio Jeker
2010-09-29 | Do not delay ACKs on connections using loopback interfaces. There is no reason to reduce the amount of ACKs sent and delayed ACKs have a very bad interaction with the large MTU of lo(4) and the fairly small socketbuffer size. In collaboration with andre@freebsd. OK deraadt@ | Claudio Jeker
2010-09-24 | TCP send and recv buffer scaling. Send buffer is scaled by not accounting unacknowledged on the wire data against the buffer limit. Receive buffer scaling is done similar to FreeBSD -- measure the delay * bandwidth product and base the buffer on that. The problem is that our RTT measurement is coarse so it overshoots on low delay links. This does not matter that much since the recvbuffer is almost always empty. Add a back pressure mechanism to control the amount of memory assigned to socketbuffers that kicks in when 80% of the cluster pool is used. Increases the download speed from 300kB/s to 4.4MB/s on ftp.eu.openbsd.org. Based on work by markus@ and djm@. OK dlg@, henning@, put it in deraadt@ | Claudio Jeker
2010-07-20 | Switch some obvious network stack MAC comparisons from bcmp() to timingsafe_bcmp(). ok deraadt@; committed over WPA. | Matthew Dempsky
2010-07-09 | Add support for using IPsec in multiple rdomains. This allows running isakmpd/iked/ipsecctl in multiple rdomains independently (with "route exec"); the kernel will pick up the rdomain from the process context of the pfkey socket and load the flows and SAs into the matching rdomain encap routing table. The network stack also needs to pass the rdomain to the ipsec stack to look up the correct rdomain that belongs to an interface/mbuf/... You can now run individual IPsec configs per rdomain or create IPsec VPNs between multiple rdomains on the same machine ;). Note that a primary enc(4) in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1. Tested by some people, mostly on existing "rdomain 0" setups. Was in snaps for some days and people didn't complain. ok claudio@ naddy@ | Reyk Floeter
2010-07-03 | Fix the naming of interfaces and variables for rdomains and rtables and make it possible to bind sockets (including listening sockets!) to rtables and not just rdomains. This changes the name of the system calls, socket option, and ioctl. After building with this you should remove the files /usr/share/man/cat2/[gs]etrdomain.0. Since this removes the existing [gs]etrdomain() system calls, the libc major is bumped. Written by claudio@, criticized^Wcritiqued by me | Philip Guenthe
2010-03-11 | unbreak the build with a custom kernel config including "pseudo-device faith 1", noticed by Andris Kadar. ok kettenis@ beck@ | Stuart Henderson
2010-01-15 | Replace pool_get() + bzero() with pool_get(..., PR_ZERO). With input from oga@ and krw@ ok oga@ krw@ thib@ markus@ mk@ | Charles Longeau
2009-11-13 | Extend the protosw pr_ctlinput function to include the rdomain. This is needed so that the route and inp lookups done in TCP and UDP know where to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain argument as well for similar reasons. With this tcp seems to be now fully rdomain safe and no longer leaks single packets into the main domain. Looks good markus@, henning@ | Claudio Jeker
2009-11-03 | rtables are stacked on rdomains (it is possible to have multiple routing tables on top of a rdomain) but until now our code was a crazy mix so that it was impossible to correctly use rtables in that case. Additionally pf(4) only knows about rtables and not about rdomains. This is especially bad when tracking (possibly conflicting) states in various domains. This diff fixes all or most of these issues. It adds a lookup function to get the rdomain id based on a rtable id. Makes pf understand rdomains and allows pf to move packets between rdomains (it is similar to NAT). Because pf states now track the rdomain id as well it is necessary to modify the pfsync wire format. So old and new systems will not sync up. A lot of help by dlg@, tested by sthen@, jsg@ and probably more OK dlg@, mpf@, deraadt@ | Claudio Jeker
2009-08-20 | fix indentation; no binary change; ok grunk@ | Alexander Bluhm
2009-08-10 | sockets created via a listening socket lose the rdomain and fail to work therefore. Inherit the rdomain through the syncache. There are some interactions that need some more work (ctlinput) so this can be improved but is good enough for now. OK markus@ | Claudio Jeker
2009-06-05 | Initial support for routing domains. This allows binding interfaces to an alternate routing table and separating them from other interfaces in distinct routing tables. The same network can now be used in any domain at the same time without causing conflicts. This diff is mostly mechanical and adds the necessary rdomain checks across net and netinet. L2 and IPv4 are mostly covered, still missing pf and IPv6. input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@ | Claudio Jeker
2009-06-03 | add the basic infrastructure to take advantage of TCP and UDP receive checksum offload over IPv6; ok deraadt@ | Christian Weisgerber
2008-11-02 | Remove the M_ANYCAST6 mbuf flag by doing the detection all in ip6_input(). M_ANYCAST6 was only used to signal tcp6_input() that it should drop the packet and send back icmp error. This can be done in ip6_input() without the need for a mbuf flag. Gives us back one slot in m_flags for possible future need. Looked at and some input by naddy@ and henning@. OK dlg@ | Claudio Jeker
2008-10-10 | back out previous change. Another panic, not as frequent, and definitely not at will. | David Hill
2008-10-10 | Comment out statekey code to stop 'panic: soreceive 3', which happens with IPv6 TCP traffic, until a better fix is found. patch from henning@ prodded by deraadt@ | David Hill
2008-09-09 | The pf state to pcb linking code change didn't account for the TIME_WAIT socket recycling code to redo the pcb lookup w/out resetting the inp pointer. Therefore we used the stale pcb, which leads us to reply with a RST to SYNs received on TIME_WAIT sockets. Also move the findpcb label below the pf pcb cache lookup, to avoid using a stale pcb when the caching code gets activated. OK markus@, henning@ | Marco Pfatschbacher
2008-07-03 | link pf state keys to tcp pcbs and vice versa. when we first do a pcb lookup and we have a pointer to a pf state key in the mbuf header, store the state key pointer in the pcb and a pointer to the pcb we just found in the state key. when either the state key or the pcb is removed, clear the pointers. on subsequent packets inbound we can skip the pcb lookup and just use the pointer from the state key. on subsequent packets outbound we can skip the state key lookup and use the pointer from the pcb. about 8% speedup with 100 concurrent tcp sessions, should help much more with more tcp sessions. ok markus ryan | Henning Brauer
2008-06-14 | Include "faith.h" in order to get NFAITH. Also clean up NFAITH conditionals whilst we're here. ok henning@ deraadt@ | Joel Sing
2008-06-12 | Remove some crazy #if mess. ok markus@ henning@ | Joel Sing
2008-06-12 | ANSIfy function definitions. ok markus@ mcbride@ henning@ deraadt@ | Joel Sing
2008-06-12 | Fix type difference between function prototype and implementation. According to millert@ this would have been promoted from a short to an int anyway, since K&R C cannot pass variables that are smaller than an int. ok deraadt@ millert@ | Joel Sing
2008-05-15 | divert for ipv6; ok henning, pyr | Markus Friedl
2008-05-09 | divert packets to local socket without modifying the ip header; makes transparent proxies much easier; ok beck@, feedback claudio@ | Markus Friedl
2008-05-06 | remove tcp_drain code since it's no longer used; ok henning, feedback thib | Markus Friedl
2008-02-20 | when creating a response, use the correct TCP header instead of relying on the mbuf chain layout; with claudio@ and krw@; ok henning@ | Markus Friedl
2008-02-11 | The TCP server has to recalculate the client's window size taken from the first ACK packet. Otherwise the server would use the unscaled window size for the first data it is sending. ok markus@ dhartmei@ | Alexander Bluhm
2007-11-27 | TCP_COMPAT_42 was last used in 1997. Kill it. ok millert | Theo de Raadt
2007-11-27 | typos; ok jmc@ sys/dev/pci/pciide.c from naddy@ | Martynas Venckus
2007-09-01 | since the MGET* macros were changed to function calls, there wasn't any need for the pool declarations and the inclusion of pool.h From: tbert <bret.lambert@gmail.com> | Henning Brauer
2007-06-15 | Drop the current random timestamps and the current ISN generation code and replace both with a RFC1948 based method, so TCP clients now have monotonic ISN/timestamps. The server side uses completely random ISN/timestamps and does time-wait recycling (on port reuse). ok djm@, mcbride@; thanks to lots of testers | Markus Friedl
2007-06-11 | there was code inside #if NPF > 0, but pf.h was not included, so it did not get built. the code looks at flags that used to be in mbuf tags, now they are in the mbuf header, so we can check them unconditionally. problem spotted by Daniel Roethlisberger <daniel@roe.ch>, ok ryan markus | Henning Brauer
2007-06-01 | apply the "skip ipsec if there are no flows" speedup diff to IPv6 too. we need a pointer to the inpcb to decide, which was not previously passed to ip6_output, so this diff is a little bigger. from itojun, ok ryan | Henning Brauer
2007-05-27 | diffs are better if compilers see them first | Theo de Raadt
2007-05-27 | take static off tcp_mss_adv. ok reyk@ | David Gwynne
2007-05-22 | When a partial ack is received, check if congestion window is larger than acked bytes and update the window accordingly. fix PR4278. OK henning@ markus@ claudio@ | Michele Marchetto
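
Note on the 2007-05-22 entry: the check it describes can be illustrated with a small userland sketch in C. This is not the code from tcp_input.c. The struct cc_state type and the partial_ack_deflate() function are hypothetical names invented for this illustration, and the window update shown (subtract the acked bytes, then add back one segment) is an assumption modelled on NewReno-style partial ACK handling. The point is only that subtracting the acked byte count from an unsigned congestion window must be guarded, otherwise the window can wrap around to a very large value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the congestion-related fields of a
 * TCP control block. The field names are illustrative only.
 */
struct cc_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_cwnd;	/* congestion window, in bytes */
	uint32_t maxseg;	/* maximum segment size, in bytes */
};

/*
 * Deflate the congestion window after a partial ACK (assumed NewReno-style
 * behaviour, not taken from the source tree).
 */
void
partial_ack_deflate(struct cc_state *cc, uint32_t th_ack)
{
	/* Unsigned subtraction stays correct across sequence wraparound. */
	uint32_t acked = th_ack - cc->snd_una;

	assert(cc->maxseg > 0);

	/* Only subtract when the window is larger than the acked bytes. */
	if (cc->snd_cwnd > acked)
		cc->snd_cwnd -= acked;
	else
		cc->snd_cwnd = 0;	/* an unguarded subtraction would wrap */

	/* Allow one more segment so the retransmission can go out. */
	cc->snd_cwnd += cc->maxseg;
	cc->snd_una = th_ack;
}
```

With snd_una = 1000, snd_cwnd = 2000 and maxseg = 1460, an ACK for sequence number 6000 acknowledges 5000 bytes, which exceeds the window; the guard clamps the window to zero before the one-segment credit is added, instead of letting the subtraction underflow.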