author | Alexander Bluhm <bluhm@cvs.openbsd.org> | 2011-03-08 00:13:42 +0000 |
---|---|---|
committer | Alexander Bluhm <bluhm@cvs.openbsd.org> | 2011-03-08 00:13:42 +0000 |
commit | 488c14dc7c744c2390691f754bdf25040049f583 (patch) | |
tree | 4b93ef89148a6fb3c773880b93c94f5a6afe32e9 /share | |
parent | 7b0bcaf525ebc1cc0f406f8320da0faa6c55c1a6 (diff) |
Add a kernel man page sosplice(9) for the socket splicing implementation.
ok jmc@
Diffstat (limited to 'share')
-rw-r--r-- | share/man/man4/options.4 | 9 |
-rw-r--r-- | share/man/man9/Makefile | 5 |
-rw-r--r-- | share/man/man9/sosplice.9 | 211 |
3 files changed, 220 insertions, 5 deletions
diff --git a/share/man/man4/options.4 b/share/man/man4/options.4
index 229fcdf735a..f8e4c4a0e57 100644
--- a/share/man/man4/options.4
+++ b/share/man/man4/options.4
@@ -1,4 +1,4 @@
-.\" $OpenBSD: options.4,v 1.206 2011/01/31 13:27:05 bluhm Exp $
+.\" $OpenBSD: options.4,v 1.207 2011/03/08 00:13:41 bluhm Exp $
 .\" $NetBSD: options.4,v 1.21 1997/06/25 03:13:00 thorpej Exp $
 .\"
 .\" Copyright (c) 1998 Theo de Raadt
@@ -34,7 +34,7 @@
 .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 .\"
 .\"
-.Dd $Mdocdate: January 31 2011 $
+.Dd $Mdocdate: March 8 2011 $
 .Dt OPTIONS 4
 .Os
 .Sh NAME
@@ -655,7 +655,10 @@ Enables zero-copy socket splicing in the kernel.
 See
 .Dv SO_SPLICE
 in
-.Xr setsockopt 2 .
+.Xr setsockopt 2
+and
+.Xr sosplice 9
+for details.
 .It Cd option TCP_ECN
 Turns on Explicit Congestion Notification (RFC 3168).
 .Em ECN
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index fb387aec33a..de771d0b6ff 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -1,4 +1,4 @@
-# $OpenBSD: Makefile,v 1.162 2011/01/09 02:26:31 deraadt Exp $
+# $OpenBSD: Makefile,v 1.163 2011/03/08 00:13:41 bluhm Exp $
 # $NetBSD: Makefile,v 1.4 1996/01/09 03:23:01 thorpej Exp $
 
 # Makefile for section 9 (kernel function and variable) manual pages.
@@ -23,7 +23,7 @@ MAN= altq.9 aml_evalnode.9 atomic.9 audio.9 autoconf.9 bio_register.9 \
  radio.9 arc4random.9 rasops.9 ratecheck.9 resettodr.9 rssadapt.9 rwlock.9 \
  sensor_attach.9 \
  shutdownhook_establish.9 tsleep.9 spl.9 startuphook_establish.9 \
- socreate.9 style.9 syscall.9 systrace.9 sysctl_int.9 \
+ socreate.9 sosplice.9 style.9 syscall.9 systrace.9 sysctl_int.9 \
  tc_init.9 time.9 timeout.9 tvtohz.9 uiomove.9 uvm.9 vfs.9 vfs_busy.9 \
  vfs_cache.9 vaccess.9 vclean.9 vcount.9 vdevgone.9 vfinddev.9 vflush.9 \
  vflushbuf.9 vget.9 vgone.9 vhold.9 vinvalbuf.9 vnode.9 vnsubr.9 \
@@ -279,6 +279,7 @@ MLINKS+=shutdownhook_establish.9 shutdownhook_disestablish.9
 MLINKS+=socreate.9 sobind.9 socreate.9 soclose.9 socreate.9 soconnect.9 \
  socreate.9 sogetopt.9 socreate.9 soreceive.9 socreate.9 sosetopt.9 \
  socreate.9 sosend.9 socreate.9 soshutdown.9
+MLINKS+=sosplice.9 somove.9
 MLINKS+=spl.9 spl0.9 spl.9 splassert.9 spl.9 splbio.9 spl.9 splclock.9 \
  spl.9 splhigh.9 spl.9 spllowersoftclock.9 \
  spl.9 splnet.9 spl.9 splsched.9 spl.9 splserial.9 spl.9 splsoftclock.9 \
diff --git a/share/man/man9/sosplice.9 b/share/man/man9/sosplice.9
new file mode 100644
index 00000000000..e21ba301c31
--- /dev/null
+++ b/share/man/man9/sosplice.9
@@ -0,0 +1,211 @@
+.\" $OpenBSD: sosplice.9,v 1.1 2011/03/08 00:13:41 bluhm Exp $
+.\"
+.\" Copyright (c) 2011 Alexander Bluhm <bluhm@openbsd.org>
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate: March 8 2011 $
+.Dt SOSPLICE 9
+.Os
+.Sh NAME
+.Nm sosplice ,
+.Nm somove
+.Nd splice two sockets for zero-copy data transfer
+.Sh SYNOPSIS
+.Ft int
+.Fn sosplice "struct socket *so" "int fd" "off_t max"
+.Ft int
+.Fn somove "struct socket *so" "int wait"
+.Sh DESCRIPTION
+The function
+.Fn sosplice
+is used to splice together a source and a drain socket.
+The source socket is passed as the
+.Fa so
+argument;
+the file descriptor of the drain is passed in
+.Fa fd .
+If
+.Fa fd
+is negative, an existing splicing gets dissolved.
+If
+.Fa max
+is positive, at most that many bytes will get transferred.
+Socket splicing can be invoked from user-land via the
+.Xr setsockopt 2
+system-call at the
+.Dv SOL_SOCKET
+level with the socket option
+.Dv SO_SPLICE .
+.Pp
+Before connecting both sockets, several checks are executed.
+See the
+.Sx ERRORS
+section for possible failures.
+The connection between both sockets is implemented by setting these
+additional fields in
+.Vt struct socket :
+.Pp
+.Bl -dash -compact -offset indent
+.It
+.Vt struct socket Fa *so_splice
+links from the source to the drain socket.
+.It
+.Vt struct socket Fa *so_spliceback
+links back from the drain to the source socket.
+.It
+.Vt off_t Fa so_splicelen
+counts the number of bytes spliced so far from this socket.
+.It
+.Vt off_t Fa so_splicemax
+specifies the maximum number of bytes to splice from this socket if
+non-zero.
+.El
+.Pp
+After connecting both sockets,
+.Fn sosplice
+calls
+.Fn somove
+to transfer the mbufs already in the source receive buffer to the
+drain send buffer.
+Finally the socket buffer flag
+.Dv SB_SPLICE
+is set on both socket buffers, to indicate that the protocol layer
+has to call
+.Fn somove
+whenever data or space is available.
+.Pp
+The function
+.Fn somove
+transfers data from the source's receive buffer to the drain's send
+buffer.
+It must be called at
+.Xr splsoftnet 9
+and
+.Fa so
+must be a spliced drain socket.
+It may be necessary to split an mbuf to handle out-of-band data
+inline or when the maximum splice length has been reached.
+If
+.Fa wait
+is
+.Dv M_WAIT ,
+splitting mbufs will always succeed.
+For
+.Dv M_DONTWAIT
+the out-of-band property might get lost or a short splice might
+happen.
+In the latter case, less than the given maximum number of bytes are
+transferred and user-land has to cope with this.
+Note that a short splice cannot happen if
+.Fn somove
+was called by
+.Fn sosplice .
+So a second
+.Xr setsockopt 2
+after a short splice pointing to the same maximum will always
+succeed.
+.Pp
+Before transferring data,
+.Fn somove
+checks both sockets for errors and that the drain socket is connected.
+If the drain cannot send anymore, an
+.Er EPIPE
+error is set on the source socket.
+The data length to move is limited by the optional maximum splice
+length and the space in the drain's send socket buffer.
+Up to this amount of data is taken out of the source's receive
+socket buffer.
+.Pp
+If the maximum splice length has been reached, an mbuf may get
+split.
+Otherwise an mbuf is either moved completely to the send buffer or
+left in the receive buffer for later processing.
+If SO_OOBINLINE is set, out-of-band data will get moved as such
+although this might not be reliable.
+The data is sent out to the drain socket via the protocol function.
+If that fails and the drain socket cannot send anymore, an
+.Er EPIPE
+error is set on the source socket.
+.Pp
+Finally the socket splicing gets dissolved if the source socket
+cannot receive anymore and its receive buffer is empty; or if the
+drain socket cannot send anymore; or if the maximum has been reached;
+or if an error occurred.
+.Pp
+If the socket buffer flag
+.Dv SB_SPLICE
+is set, the functions
+.Fn sorwakeup
+and
+.Fn sowwakeup
+will call
+.Fn somove
+to trigger the transfer when new data or buffer space is available.
+While socket splicing is active, the read wakeup will not be delivered
+to the source file descriptor.
+A read event is signaled to user-land after dissolving.
+.Sh RETURN VALUES
+.Fn sosplice
+returns 0 on success and otherwise the error number.
+.Fn somove
+returns 0 if socket splicing has been finished and 1 if it continues.
+.Sh ERRORS
+.Fn sosplice
+will succeed unless:
+.Bl -tag -width Er
+.It Bq Er EBADF
+The given file descriptor
+.Fa fd
+is not an active descriptor.
+.It Bq Er EBUSY
+The source or the drain socket is already spliced.
+.It Bq Er EINVAL
+The given maximum value
+.Fa max
+is negative.
+.It Bq Er ENOTCONN
+The source or the drain socket is neither connected nor in the
+process of connecting to a peer.
+.It Bq Er ENOTSOCK
+The given file descriptor
+.Fa fd
+is not a socket.
+.It Bq Er EOPNOTSUPP
+The source or the drain socket is a listen socket.
+.It Bq Er EPROTONOSUPPORT
+The source socket's protocol layer does not have the
+.Dv PR_SPLICE
+flag set.
+At the moment only TCP supports socket splicing.
+.It Bq Er EPROTONOSUPPORT
+The drain socket's protocol does not have the same
+.Fa pr_usrreq
+function as the source.
+.It Bq Er EWOULDBLOCK
+The source socket is non-blocking and the receive buffer is already
+locked.
+.El
+.Sh SEE ALSO
+.Xr setsockopt 2 ,
+.Xr options 4
+.Sh HISTORY
+Socket splicing first appeared in
+.Ox 4.9 .
+.Sh AUTHORS
+.An -nosplit
+The idea for socket splicing originally came from
+.An Markus Friedl Aq markus@openbsd.org ,
+and
+.An Alexander Bluhm Aq bluhm@openbsd.org
+implemented it.