Diffstat (limited to 'share/man/man9')
-rw-r--r-- share/man/man9/vnode.9 304
1 files changed, 248 insertions, 56 deletions
diff --git a/share/man/man9/vnode.9 b/share/man/man9/vnode.9
index 06d5b15237f..b81d97fe03d 100644
--- a/share/man/man9/vnode.9
+++ b/share/man/man9/vnode.9
@@ -1,4 +1,4 @@
-.\" $OpenBSD: vnode.9,v 1.18 2003/06/06 20:56:32 jmc Exp $
+.\" $OpenBSD: vnode.9,v 1.19 2004/09/22 21:54:17 jaredy Exp $
.\"
.\" Copyright (c) 2001 Constantine Sapuntzakis
.\" All rights reserved.
@@ -23,67 +23,133 @@
.\" OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
.\" ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
-.Dd February 22, 2001
+.Dd September 16, 2004
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
-A vnode is an object in kernel memory that speaks the UNIX file
-interface (open, read, write, close, readdir, etc.).
+A
+.Em vnode
+is an object in kernel memory that speaks the
+.Ux
+file interface (open, read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets, block devices,
character devices.
.Pp
-Each vnode has a set of methods which start with string 'VOP_'.
-These methods include VOP_OPEN, VOP_READ, VOP_WRITE, VOP_RENAME, VOP_CLOSE,
-VOP_MKDIR.
+Each vnode has a set of methods which start with the string
+.Dq VOP_ .
+These methods include
+.Fn VOP_OPEN ,
+.Fn VOP_READ ,
+.Fn VOP_WRITE ,
+.Fn VOP_RENAME ,
+.Fn VOP_CLOSE ,
+and
+.Fn VOP_MKDIR .
Many of these methods correspond closely to the equivalent
-file system call - open, read, write, rename, etc.
+file system call \-
+.Xr open 2 ,
+.Xr read 2 ,
+.Xr write 2 ,
+.Xr rename 2 ,
+etc.
Each file system (FFS, NFS, etc.) provides implementations for these methods.
.Pp
-The Virtual File System (VFS) library maintains a pool of vnodes.
+The Virtual File System library (see
+.Xr vfs 9 )
+maintains a pool of vnodes.
File systems cannot allocate their own vnodes; they must use the functions
provided by the VFS to create and manage vnodes.
+.Pp
+The definition of a vnode is as follows:
+.Bd -literal
+struct vnode {
+ struct uvm_vnode v_uvm; /* uvm(9) data */
+ int (**v_op)(void *); /* vnode operations vector */
+ enum vtype v_type; /* vnode type */
+ u_int v_flag; /* vnode flags (see below) */
+ u_int v_usecount; /* reference count of users */
+ u_int v_writecount; /* reference count of writers */
+ /* Flags that can be read/written in interrupts */
+ u_int v_bioflag; /* flags used by intr handlers */
+ u_int v_holdcnt; /* buffer references */
+ u_int v_id; /* capability identifier */
+ struct mount *v_mount; /* ptr to vfs we are in */
+ TAILQ_ENTRY(vnode) v_freelist; /* vnode freelist */
+ LIST_ENTRY(vnode) v_mntvnodes; /* vnodes for mount point */
+ struct buflists v_cleanblkhd; /* clean blocklist head */
+ struct buflists v_dirtyblkhd; /* dirty blocklist head */
+ u_int v_numoutput; /* num of writes in progress */
+ LIST_ENTRY(vnode) v_synclist; /* vnode with dirty buffers */
+ union {
+ struct mount *vu_mountedhere;/* ptr to mounted vfs (VDIR) */
+ struct socket *vu_socket; /* UNIX IPC (VSOCK) */
+ struct specinfo *vu_specinfo; /* device (VCHR, VBLK) */
+ struct fifoinfo *vu_fifoinfo; /* fifo (VFIFO) */
+ } v_un;
+
+ struct simplelock v_interlock; /* lock on usecount and flag */
+ struct lock v_lock; /* used for non-locking fs's */
+ struct lock *v_vnlock; /* pointer to vnode lock */
+ enum vtagtype v_tag; /* type of underlying data */
+ void *v_data; /* private data for fs */
+ struct {
+ struct simplelock vsi_lock; /* lock to protect below */
+ struct selinfo vsi_selinfo; /* identity of poller(s) */
+ } v_selectinfo;
+};
+#define v_mountedhere v_un.vu_mountedhere
+#define v_socket v_un.vu_socket
+#define v_specinfo v_un.vu_specinfo
+#define v_fifoinfo v_un.vu_fifoinfo
+.Ed
.Ss Vnode life cycle
When a client of the VFS requests a new vnode, the vnode allocation
code can reuse an old vnode object that is no longer in use.
Whether a vnode is in use is tracked by the vnode reference count
-(v_usecount).
+.Pq Va v_usecount .
By convention, each open file handle holds a reference
as do VM objects backed by files.
-A vnode with a reference count of 1 or more will not be de-allocated or
-re-used to point to a different file.
+A vnode with a reference count of 1 or more will not be deallocated or
+reused to point to a different file.
So, if you want to ensure that your vnode doesn't become a different
file under you, you better be sure you have a reference to it.
A vnode that points to a valid file and has a reference count of 1 or more
-is called "active".
+is called
+.Em active .
.Pp
-When a vnode's reference count drops to zero, it becomes "inactive",
+When a vnode's reference count drops to zero, it becomes
+.Em inactive ,
that is, a candidate for reuse.
-An "inactive" vnode still refers to a valid file and one can try to
+An inactive vnode still refers to a valid file and one can try to
reactivate it using
.Xr vget 9
(this is used a lot by caches).
.Pp
Before the VFS can reuse an inactive vnode to refer to another file,
it must clean all information pertaining to the old file.
-A cleaned out vnode is called a "reclaimed" vnode.
+A cleaned out vnode is called a
+.Em reclaimed
+vnode.
.Pp
To support forceable unmounts and the
.Xr revoke 2
-system call, the VFS may "reclaim" a vnode with a positive reference
+system call, the VFS may reclaim a vnode with a positive reference
count.
-The "reclaimed" vnode is given to the dead file system, which
+The reclaimed vnode is given to the dead file system, which
returns errors for most operations.
The reclaimed vnode will not be
-re-used for another file until its reference count hits zero.
+reused for another file until its reference count hits zero.
.Ss Vnode pool
The
.Xr getnewvnode 9
system call allocates a vnode from the pool, possibly reusing an
-"inactive" vnode, and returns it to the caller.
-The vnode returned has a reference count (v_usecount) of 1.
+inactive vnode, and returns it to the caller.
+The vnode returned has a reference count
+.Pq Va v_usecount
+of 1.
.Pp
The
.Xr vref 9
@@ -100,27 +166,32 @@ call also releases the vnode lock.
.Pp
The
.Xr vget 9
-call, when used on an inactive vnode, will make the vnode "active"
+call, when used on an inactive vnode, will make the vnode active
by bumping the reference count to one.
-When called on an active vnode, vget increases the reference count by one.
-However, if the vnode is being reclaimed concurrently, then vget will fail
-and return an error.
+When called on an active vnode,
+.Fn vget
+increases the reference count by one.
+However, if the vnode is being reclaimed concurrently, then
+.Fn vget
+will fail and return an error.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
+calls
orchestrate the reclamation of a vnode.
They can be called on both active and inactive vnodes.
.Pp
-When transitioning a vnode to the "reclaimed" state, the VFS will call
+When transitioning a vnode to the reclaimed state, the VFS will call
.Xr VOP_RECLAIM 9
method.
File systems use this method to free any file-system specific data
they attached to the vnode.
.Ss Vnode locks
The vnode actually has three different types of lock: the vnode lock,
-the vnode interlock, and the vnode reclamation lock (VXLOCK).
+the vnode interlock, and the vnode reclamation lock
+.Pq Dv VXLOCK .
.Ss The vnode lock
The vnode lock and its consistent use accomplishes the following:
.Bl -bullet
@@ -128,13 +199,22 @@ The vnode lock and its consistent use accomplishes the following:
It keeps a locked vnode from changing across certain pairs of VOP_ calls,
thus preserving cached data.
For example, it keeps the directory from
-changing between a VOP_LOOKUP call and a VOP_CREATE.
-The VOP_LOOKUP call makes sure the name doesn't already exist in the
+changing between a
+.Xr VOP_LOOKUP 9
+call and a
+.Xr VOP_CREATE 9 .
+The
+.Fn VOP_LOOKUP
+call makes sure the name doesn't already exist in the
directory and finds free room in the directory for the new entry.
-The VOP_CREATE can then go ahead and create the file without checking if
+The
+.Fn VOP_CREATE
+call can then go ahead and create the file without checking if
it already exists or looking for free space.
.It
-Some file systems rely on it to ensure that only one "thread" at a time
+Some file systems rely on it to ensure that only one
+.Dq thread
+at a time
is calling VOP_ vnode operations on a given file or directory.
Otherwise, the file system's behavior is undefined.
.It
@@ -165,18 +245,34 @@ Not all file systems implement it.
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
of parent directory must be acquired before the lock on the child directory.
.Ss Vnode interlock
-The vnode interlock (vp->v_interlock) is a spinlock.
+The vnode interlock
+.Pq Va v_interlock
+is a simplelock (see
+.Xr simple_lock 9 ) .
It is useful on multi-processor systems for acquiring a quick exclusive
lock on the contents of the vnode.
It MUST NOT be held while sleeping.
-(What fields does it cover? What about splbio/interrupt issues?)
+.Pp
+This lock protects the
+.Va v_flag , v_writecount , v_usecount ,
+and
+.Va v_holdcnt
+fields from concurrent access.
+See
+.Xr lock 9
+for more details on lock synchronization in interrupt context.
+.\" Other splbio/interrupt issues?
.Pp
Operations on this lock are a no-op on uniprocessor systems.
-.Ss Other Vnode synchronization
-The vnode reclamation lock (VXLOCK) is used to prevent multiple
+.Ss Other vnode synchronization
+The vnode reclamation lock
+.Pq Dv VXLOCK
+is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
-The VXWANT flag is set by processes that wish to be woken up when reclamation
+The
+.Dv VXWANT
+flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
@@ -184,16 +280,20 @@ The
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
-The vnode capability, v_id, is a 32-bit version number on the vnode.
+The vnode capability,
+.Va v_id ,
+is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but doesn't want
to hold a reference (e.g., caches).
-The code keeps both a vnode * and a copy of the capability.
+The code keeps both a vnode pointer and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
.Pp
-Note: for this to work, memory assigned to hold a struct vnode can
+Note: for this to work, memory assigned to hold a
+.Vt struct vnode
+can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
@@ -202,45 +302,137 @@ Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
-The v_flag attribute contains random flags related to various functions.
-They are summarized in table ...
+The
+.Va v_flag
+attribute contains random flags related to various functions.
+They are summarized in the following table:
.Pp
-The v_tag attribute indicates what file system the vnode belongs to.
+.Bl -tag -width 10n -compact -offset indent
+.It Dv VROOT
+This vnode is the root of its file system.
+.It Dv VTEXT
+This vnode is a pure text prototype.
+.It Dv VSYSTEM
+This vnode is being used by the kernel.
+.It Dv VISTTY
+This vnode represents a
+.Xr tty 4 .
+.It Dv VXLOCK
+This vnode is locked to change its underlying type.
+.It Dv VXWANT
+A process is waiting for this vnode.
+.It Dv VALIASED
+This vnode has an alias.
+.It Dv VLAYER
+This vnode is on a layered file system.
+.It Dv VLOCKSWORK
+This vnode's underlying file system supports locking discipline.
+.El
+.Pp
+The
+.Va v_tag
+attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
-There is no safe way of defining new v_tags for loadable file systems.
-The v_tag attribute is read-only.
+There is no safe way of defining new
+.Va v_tag Ns 's
+for loadable file systems.
+The
+.Va v_tag
+attribute is read-only.
.Pp
-The v_type attribute indicates what type of file (e.g. directory,
+The
+.Va v_type
+attribute indicates what type of file (e.g. directory,
regular, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call returns an error when a read is attempted on a directory.
.Pp
-The v_data attribute allows a file system to attach a piece of file
+Possible types are:
+.Pp
+.Bl -tag -width 10n -offset indent -compact
+.It Dv VNON
+This vnode has no type.
+.It Dv VREG
+This vnode represents a regular file.
+.It Dv VDIR
+This vnode represents a directory.
+.It Dv VBLK
+This vnode represents a block device.
+.It Dv VCHR
+This vnode represents a character device.
+.It Dv VLNK
+This vnode represents a symbolic link.
+.It Dv VSOCK
+This vnode represents a socket.
+.It Dv VFIFO
+This vnode represents a named pipe.
+.It Dv VBAD
+This vnode represents a bad or dead file.
+.El
+.Pp
+The
+.Va v_data
+attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
-the file system.
+the file system (such as an inode pointer in the case of FFS).
.Pp
-The v_numoutput attribute indicates the number of pending synchronous
+The
+.Va v_numoutput
+attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
-The attribute is used by code like fsync to wait for all writes
+The attribute is used by code like
+.Xr fsync 2
+to wait for all writes
to complete before returning to the user.
-This attribute must be manipulated at splbio().
+This attribute must be manipulated at
+.Xr splbio 9 .
.Pp
-The v_writecount attribute tracks the number of write calls pending
+The
+.Va v_writecount
+attribute tracks the number of write calls pending
on the vnode.
-.Ss RULES
+.Ss Rules
The vast majority of vnode functions may not be called from interrupt
context.
-The exceptions are bgetvp and brelvp.
+The exceptions are
+.Fn bgetvp
+and
+.Fn brelvp .
The following fields of the vnode are manipulated at interrupt level:
-v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist,
-and v_synclist.
-Any access to these fields should be protected by splbio.
+.Va v_numoutput , v_holdcnt , v_dirtyblkhd ,
+.Va v_cleanblkhd , v_bioflag , v_freelist ,
+and
+.Va v_synclist .
+Any access to these fields should be protected by
+.Xr splbio 9 .
+.Sh SEE ALSO
+.Xr uvm 9 ,
+.Xr vaccess 9 ,
+.Xr vclean 9 ,
+.Xr vcount 9 ,
+.Xr vdevgone 9 ,
+.Xr vfinddev 9 ,
+.Xr vflush 9 ,
+.Xr vflushbuf 9 ,
+.Xr vfs 9 ,
+.Xr vget 9 ,
+.Xr vgone 9 ,
+.Xr vhold 9 ,
+.Xr vinvalbuf 9 ,
+.Xr vn_lock 9 ,
+.Xr VOP_LOOKUP 9 ,
+.Xr vput 9 ,
+.Xr vrecycle 9 ,
+.Xr vref 9 ,
+.Xr vrele 9 ,
+.Xr vwaitforio 9 ,
+.Xr vwakeup 9
.Sh HISTORY
This document first appeared in
.Ox 2.9 .