author Alexander Bluhm <bluhm@cvs.openbsd.org> 2011-03-08 00:13:42 +0000
committer Alexander Bluhm <bluhm@cvs.openbsd.org> 2011-03-08 00:13:42 +0000
commit 488c14dc7c744c2390691f754bdf25040049f583
tree 4b93ef89148a6fb3c773880b93c94f5a6afe32e9 /share
parent 7b0bcaf525ebc1cc0f406f8320da0faa6c55c1a6
Add a kernel man page sosplice(9) for the socket splicing implementation.
ok jmc@
Diffstat (limited to 'share')
-rw-r--r-- share/man/man4/options.4 9
-rw-r--r-- share/man/man9/Makefile 5
-rw-r--r-- share/man/man9/sosplice.9 211
3 files changed, 220 insertions, 5 deletions
diff --git a/share/man/man4/options.4 b/share/man/man4/options.4
index 229fcdf735a..f8e4c4a0e57 100644
--- a/share/man/man4/options.4
+++ b/share/man/man4/options.4
@@ -1,4 +1,4 @@
-.\" $OpenBSD: options.4,v 1.206 2011/01/31 13:27:05 bluhm Exp $
+.\" $OpenBSD: options.4,v 1.207 2011/03/08 00:13:41 bluhm Exp $
.\" $NetBSD: options.4,v 1.21 1997/06/25 03:13:00 thorpej Exp $
.\"
.\" Copyright (c) 1998 Theo de Raadt
@@ -34,7 +34,7 @@
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
-.Dd $Mdocdate: January 31 2011 $
+.Dd $Mdocdate: March 8 2011 $
.Dt OPTIONS 4
.Os
.Sh NAME
@@ -655,7 +655,10 @@ Enables zero-copy socket splicing in the kernel.
See
.Dv SO_SPLICE
in
-.Xr setsockopt 2 .
+.Xr setsockopt 2
+and
+.Xr sosplice 9
+for details.
.It Cd option TCP_ECN
Turns on Explicit Congestion Notification (RFC 3168).
.Em ECN
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index fb387aec33a..de771d0b6ff 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -1,4 +1,4 @@
-# $OpenBSD: Makefile,v 1.162 2011/01/09 02:26:31 deraadt Exp $
+# $OpenBSD: Makefile,v 1.163 2011/03/08 00:13:41 bluhm Exp $
# $NetBSD: Makefile,v 1.4 1996/01/09 03:23:01 thorpej Exp $
# Makefile for section 9 (kernel function and variable) manual pages.
@@ -23,7 +23,7 @@ MAN= altq.9 aml_evalnode.9 atomic.9 audio.9 autoconf.9 bio_register.9 \
radio.9 arc4random.9 rasops.9 ratecheck.9 resettodr.9 rssadapt.9 rwlock.9 \
sensor_attach.9 \
shutdownhook_establish.9 tsleep.9 spl.9 startuphook_establish.9 \
- socreate.9 style.9 syscall.9 systrace.9 sysctl_int.9 \
+ socreate.9 sosplice.9 style.9 syscall.9 systrace.9 sysctl_int.9 \
tc_init.9 time.9 timeout.9 tvtohz.9 uiomove.9 uvm.9 vfs.9 vfs_busy.9 \
vfs_cache.9 vaccess.9 vclean.9 vcount.9 vdevgone.9 vfinddev.9 vflush.9 \
vflushbuf.9 vget.9 vgone.9 vhold.9 vinvalbuf.9 vnode.9 vnsubr.9 \
@@ -279,6 +279,7 @@ MLINKS+=shutdownhook_establish.9 shutdownhook_disestablish.9
MLINKS+=socreate.9 sobind.9 socreate.9 soclose.9 socreate.9 soconnect.9 \
socreate.9 sogetopt.9 socreate.9 soreceive.9 socreate.9 sosetopt.9 \
socreate.9 sosend.9 socreate.9 soshutdown.9
+MLINKS+=sosplice.9 somove.9
MLINKS+=spl.9 spl0.9 spl.9 splassert.9 spl.9 splbio.9 spl.9 splclock.9 \
spl.9 splhigh.9 spl.9 spllowersoftclock.9 \
spl.9 splnet.9 spl.9 splsched.9 spl.9 splserial.9 spl.9 splsoftclock.9 \
diff --git a/share/man/man9/sosplice.9 b/share/man/man9/sosplice.9
new file mode 100644
index 00000000000..e21ba301c31
--- /dev/null
+++ b/share/man/man9/sosplice.9
@@ -0,0 +1,211 @@
+.\" $OpenBSD: sosplice.9,v 1.1 2011/03/08 00:13:41 bluhm Exp $
+.\"
+.\" Copyright (c) 2011 Alexander Bluhm <bluhm@openbsd.org>
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate: March 8 2011 $
+.Dt SOSPLICE 9
+.Os
+.Sh NAME
+.Nm sosplice ,
+.Nm somove
+.Nd splice two sockets for zero-copy data transfer
+.Sh SYNOPSIS
+.Ft int
+.Fn sosplice "struct socket *so" "int fd" "off_t max"
+.Ft int
+.Fn somove "struct socket *so" "int wait"
+.Sh DESCRIPTION
+The function
+.Fn sosplice
+is used to splice together a source and a drain socket.
+The source socket is passed as the
+.Fa so
+argument;
+the file descriptor of the drain is passed in
+.Fa fd .
+If
+.Fa fd
+is negative, an existing splicing gets dissolved.
+If
+.Fa max
+is positive, at most that many bytes will get transferred.
+Socket splicing can be invoked from user-land via the
+.Xr setsockopt 2
+system-call at the
+.Dv SOL_SOCKET
+level with the socket option
+.Dv SO_SPLICE .
+.Pp
+Before connecting both sockets, several checks are executed.
+See the
+.Sx ERRORS
+section for possible failures.
+The connection between both sockets is implemented by setting these
+additional fields in
+.Vt struct socket :
+.Pp
+.Bl -dash -compact -offset indent
+.It
+.Vt struct socket Fa *so_splice
+links from the source to the drain socket.
+.It
+.Vt struct socket Fa *so_spliceback
+links back from the drain to the source socket.
+.It
+.Vt off_t Fa so_splicelen
+counts the number of bytes spliced so far from this socket.
+.It
+.Vt off_t Fa so_splicemax
+specifies the maximum number of bytes to splice from this socket if
+non-zero.
+.El
+.Pp
+After connecting both sockets,
+.Fn sosplice
+calls
+.Fn somove
+to transfer the mbufs already in the source receive buffer to the
+drain send buffer.
+Finally the socket buffer flag
+.Dv SB_SPLICE
+is set on both socket buffers, to indicate that the protocol layer
+has to call
+.Fn somove
+whenever data or space is available.
+.Pp
+The function
+.Fn somove
+transfers data from the source's receive buffer to the drain's send
+buffer.
+It must be called at
+.Xr splsoftnet 9
+and
+.Fa so
+must be a spliced drain socket.
+It may be necessary to split an mbuf to handle out-of-band data
+inline or when the maximum splice length has been reached.
+If
+.Fa wait
+is
+.Dv M_WAIT ,
+splitting mbufs will always succeed.
+For
+.Dv M_DONTWAIT
+the out-of-band property might get lost or a short splice might
+happen.
+In the latter case, less than the given maximum number of bytes are
+transferred and user-land has to cope with this.
+Note that a short splice cannot happen if
+.Fn somove
+was called by
+.Fn sosplice .
+So a second
+.Xr setsockopt 2
+after a short splice pointing to the same maximum will always
+succeed.
+.Pp
+Before transferring data,
+.Fn somove
+checks both sockets for errors and that the drain socket is connected.
+If the drain cannot send anymore, an
+.Er EPIPE
+error is set on the source socket.
+The data length to move is limited by the optional maximum splice
+length and the space in the drain's send socket buffer.
+Up to this amount of data is taken out of the source's receive
+socket buffer.
+.Pp
+If the maximum splice length has been reached, an mbuf may get
+split.
+Otherwise an mbuf is either moved completely to the send buffer or
+left in the receive buffer for later processing.
+If SO_OOBINLINE is set, out-of-band data will get moved as such
+although this might not be reliable.
+The data is sent out to the drain socket via the protocol function.
+If that fails and the drain socket cannot send anymore, an
+.Er EPIPE
+error is set on the source socket.
+.Pp
+Finally the socket splicing gets dissolved if the source socket
+cannot receive anymore and its receive buffer is empty; or if the
+drain socket cannot send anymore; or if the maximum has been reached;
+or if an error occurred.
+.Pp
+If the socket buffer flag
+.Dv SB_SPLICE
+is set, the functions
+.Fn sorwakeup
+and
+.Fn sowwakeup
+will call
+.Fn somove
+to trigger the transfer when new data or buffer space is available.
+While socket splicing is active, the read wakeup will not be delivered
+to the source file descriptor.
+A read event is signaled to user-land after dissolving.
+.Sh RETURN VALUES
+.Fn sosplice
+returns 0 on success and otherwise the error number.
+.Fn somove
+returns 0 if socket splicing has been finished and 1 if it continues.
+.Sh ERRORS
+.Fn sosplice
+will succeed unless:
+.Bl -tag -width Er
+.It Bq Er EBADF
+The given file descriptor
+.Fa fd
+is not an active descriptor.
+.It Bq Er EBUSY
+The source or the drain socket is already spliced.
+.It Bq Er EINVAL
+The given maximum value
+.Fa max
+is negative.
+.It Bq Er ENOTCONN
+The source or the drain socket is neither connected nor in the
+process of connecting to a peer.
+.It Bq Er ENOTSOCK
+The given file descriptor
+.Fa fd
+is not a socket.
+.It Bq Er EOPNOTSUPP
+The source or the drain socket is a listen socket.
+.It Bq Er EPROTONOSUPPORT
+The source socket's protocol layer does not have the
+.Dv PR_SPLICE
+flag set.
+At the moment only TCP supports socket splicing.
+.It Bq Er EPROTONOSUPPORT
+The drain socket's protocol does not have the same
+.Fa pr_usrreq
+function as the source.
+.It Bq Er EWOULDBLOCK
+The source socket is non-blocking and the receive buffer is already
+locked.
+.El
+.Sh SEE ALSO
+.Xr setsockopt 2 ,
+.Xr options 4
+.Sh HISTORY
+Socket splicing first appeared in
+.Ox 4.9 .
+.Sh AUTHORS
+.An -nosplit
+The idea for socket splicing originally came from
+.An Markus Friedl Aq markus@openbsd.org ,
+and
+.An Alexander Bluhm Aq bluhm@openbsd.org
+implemented it.
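
The new page notes that socket splicing is reachable from user-land through setsockopt(2) at the SOL_SOCKET level with the SO_SPLICE option, and that a negative descriptor dissolves an existing splice. A minimal sketch of that call path follows. The helper names splice_sockets() and unsplice_socket() are hypothetical and are not part of this commit. The sketch assumes that passing a plain int as the option value selects the drain without a byte limit, and the ENOPROTOOPT fallback for systems that do not define SO_SPLICE is likewise an assumption made only so the example builds elsewhere.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of the commit): ask the kernel to move
 * data arriving on `source` directly to the socket `drain`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int source, int drain)
{
#ifdef SO_SPLICE
	/* Assumption: a plain int option value names the drain, no limit. */
	return setsockopt(source, SOL_SOCKET, SO_SPLICE,
	    &drain, sizeof(drain));
#else
	/* Assumption: without SO_SPLICE, report the option as unsupported. */
	(void)source;
	(void)drain;
	errno = ENOPROTOOPT;
	return -1;
#endif
}

/*
 * Hypothetical helper: dissolve an existing splice on `source` by
 * passing a negative descriptor, as the DESCRIPTION section states.
 */
int
unsplice_socket(int source)
{
	return splice_sockets(source, -1);
}
```

A relay would typically call splice_sockets(client, server) and splice_sockets(server, client) once both TCP connections are established, then wait for a read event on each source, which the page says is signaled after the splice dissolves. The errno values to expect on failure are the ones listed in the ERRORS section.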