.Dd February 22, 2001
.Dt VNODE 9
.Os OpenBSD 2.9
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
A vnode is an object that speaks the UNIX file interface (open,
read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets,
block devices, and character devices.
.Pp
Each vnode has a set of methods whose names start with the string 'VOP_'.
These methods include VOP_OPEN, VOP_READ, VOP_WRITE, VOP_RENAME, VOP_CLOSE,
and VOP_MKDIR.
Many of these methods correspond closely to the equivalent
system calls (open, read, write, rename, etc.).
Each file system (FFS, NFS, etc.) provides implementations for these methods.
.Pp
The Virtual File System (VFS) library maintains a pool of vnodes. File systems
cannot allocate their own vnodes; they must use the functions provided
by the VFS to create and manage vnodes.
.Ss Vnode state
Vnodes have a reference count which corresponds to the number of kernel
objects that hold references to the vnode. A positive reference count keeps
the vnode off of the free list, which prevents the vnode from being recycled
to refer to a different file.
.Pp
Vnodes that refer to a valid file and have a reference count of 1 or
greater are "active".
When a vnode's reference count drops to zero, it
is "inactivated" and becomes "inactive".
Inactive vnodes are placed on the
free list, to be re-used to represent other files.
.Pp
Before a struct vnode can be re-used to refer to another file, it must
be cleaned out of all information pertaining to the old file. A vnode that
doesn't refer to any file is called a "reclaimed" vnode.
.Pp
The VFS may "reclaim" a vnode with a positive reference count.
This is done when the underlying file is revoked, as happens with the
revoke system call or through a forced unmount. Such a vnode is given
to the dead file system, which returns errors for most operations.
The vnode will not be re-used for another file until its reference count
hits zero.
.Pp
A vnode is thus in one of three states: active, inactive, or reclaimed.
All transitions between these states are meaningful except reclaimed to inactive.
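.Pp
The state rules above can be summarised in a small user-space model.
This is only a sketch, not kernel code: the names
.Va model_vstate
and
.Va model_transition_ok
and the enum are hypothetical and were invented for this illustration.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum vstate { VS_ACTIVE, VS_INACTIVE, VS_RECLAIMED };

/*
 * Derive the state from whether the vnode refers to a file and
 * from its reference count.  A reclaimed vnode may still have a
 * positive reference count, so the file check comes first.
 */
static enum vstate
model_vstate(int has_file, int usecount)
{
	if (!has_file)
		return VS_RECLAIMED;
	return usecount >= 1 ? VS_ACTIVE : VS_INACTIVE;
}

/* Return 1 if moving from one state to another is meaningful. */
static int
model_transition_ok(enum vstate from, enum vstate to)
{
	if (from == to)
		return 0;	/* not a transition at all */
	/* The only excluded pair is reclaimed to inactive. */
	if (from == VS_RECLAIMED && to == VS_INACTIVE)
		return 0;
	return 1;
}
```
.Ed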
.Ss Vnode pool
The
.Xr getnewvnode 9
call returns a fresh active vnode from the vnode pool,
assigned to the file system specified in its arguments.
The vnode returned has a reference count (v_usecount) of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
It may only be called on a vnode with a reference count of 1 or greater.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call releases the vnode lock.
.Pp
The
.Xr vget 9
call, when used on an inactive vnode, will make the vnode "active"
by bumping the reference count to one. When called on an active vnode,
vget increases the reference count by one. However, if the vnode
is being reclaimed concurrently, then vget will fail and return an error.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls orchestrate the reclamation of a vnode.
They can be called on both
active and inactive vnodes.
.Pp
While transitioning a vnode to the "reclaimed" state, the VFS will call the
.Xr vop_reclaim 9
method.
File systems use this method to free any file-system specific data
they attached to the vnode.
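.Pp
The reference counting rules in this section can be illustrated with a
user-space model.
It is a sketch under stated assumptions, not the kernel implementation:
every
.Va model_
name is hypothetical, the real calls take additional arguments and
handle locking, and the value -1 merely stands in for whatever error
vget actually returns.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical model of the fields discussed above. */
struct model_vnode {
	int usecount;	/* models v_usecount */
	int has_file;	/* nonzero while the vnode refers to a file */
	int reclaiming;	/* nonzero while reclamation is in progress */
};

/* Models getnewvnode: a fresh active vnode with a count of 1. */
static void
model_getnewvnode(struct model_vnode *vp)
{
	vp->usecount = 1;
	vp->has_file = 1;
	vp->reclaiming = 0;
}

/* Models vref: only legal on a vnode that is already referenced. */
static void
model_vref(struct model_vnode *vp)
{
	assert(vp->usecount >= 1);
	vp->usecount++;
}

/* Models vrele: drop one reference; zero means inactive. */
static void
model_vrele(struct model_vnode *vp)
{
	assert(vp->usecount >= 1);
	vp->usecount--;
}

/* Models vget: fails while the vnode is being reclaimed. */
static int
model_vget(struct model_vnode *vp)
{
	if (vp->reclaiming)
		return -1;	/* placeholder error value (assumption) */
	vp->usecount++;
	return 0;
}

/* Model the start and end of reclamation as driven by vgone. */
static void
model_reclaim_begin(struct model_vnode *vp)
{
	vp->reclaiming = 1;
}

static void
model_reclaim_end(struct model_vnode *vp)
{
	vp->has_file = 0;	/* the reclaim method would run here */
	vp->reclaiming = 0;
}
```
.Ed
.Pp
In this model, a vnode released down to a count of zero can be
revived with model_vget, while a call made between model_reclaim_begin
and model_reclaim_end fails and leaves the count unchanged.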
.Ss Vnode locks 
The vnode actually has three different types of lock: the vnode lock,
the vnode interlock, and the vnode reclamation lock (VXLOCK).
.Ss The vnode lock
The most general lock is the vnode lock.
This lock is acquired by calling 
.Xr vn_lock 9 
and released by calling 
.Xr vn_unlock 9 .
The vnode lock is used to serialize operations through the file system for
a given file when there are multiple concurrent requests on the same file. 
Many file system functions require that you hold the vnode lock on entry.
The vnode lock may be held when sleeping.
.Pp
A vnode will not be reclaimed as long as the vnode lock is held by some 
other process.
.Pp
The vnode lock is a multiple-reader or single-writer lock.
An exclusive vnode lock may be acquired multiple times by the same
process.
.Pp
The vnode lock is somewhat messy because it is used for many purposes.
Some clients of the vnode interface use it to try to bundle a series
of VOP_ method calls into an atomic group.
Many file systems rely on it to prevent race conditions in updating file
system specific data structures (as opposed to having their own locks). 
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems. Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child.
.Ss Vnode interlock
The vnode interlock (vp->v_interlock) is a spinlock.
It is useful on multi-processor systems for acquiring a quick exclusive
lock on the contents of the vnode.
It MUST NOT be held while sleeping.
(What fields does it cover? What about splbio/interrupt issues?)
.Pp
Operations on this lock are a no-op on uniprocessor systems.
.Ss Other Vnode synchronization
The vnode reclamation lock (VXLOCK) is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The VXWANT flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability, v_id, is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but does not want
to hold a reference (e.g., caches).
The code keeps both a vnode * and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
.Pp
Note: for this to work, memory assigned to hold a struct vnode can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
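.Pp
The capability check described above might look as follows in a
user-space model.
This is a sketch only: the
.Va cap_
names are hypothetical, and modelling a change of capability as an
increment is an assumption made for the illustration.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vnode; only the capability is kept. */
struct cap_vnode {
	uint32_t v_id;		/* models the 32-bit capability */
};

/* A cache entry keeps a pointer and a copy of the capability. */
struct cap_entry {
	struct cap_vnode *vp;	/* pointer held without a reference */
	uint32_t vpid;		/* capability seen when the entry was made */
};

/* Record the vnode and its current capability in the entry. */
static void
cap_remember(struct cap_entry *e, struct cap_vnode *vp)
{
	e->vp = vp;
	e->vpid = vp->v_id;
}

/* Model the vnode being reassigned to a new file. */
static void
cap_reassign(struct cap_vnode *vp)
{
	vp->v_id++;		/* assumption: any change would do */
}

/* Return the vnode if it still names the same file, else NULL. */
static struct cap_vnode *
cap_lookup(const struct cap_entry *e)
{
	if (e->vp != NULL && e->vp->v_id == e->vpid)
		return e->vp;
	return NULL;
}
```
.Ed
.Pp
The comparison in cap_lookup is only safe because the memory behind the
pointer is never released, as noted above.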
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The v_flag attribute contains miscellaneous flags related to various functions.
They are summarized in table ...
.Pp
The v_tag attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new v_tags for loadable file systems.
The v_tag attribute is read-only.
.Pp
The v_type attribute indicates what type of file (e.g. directory,
regular, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2 
system call returns an error when a read is attempted on a directory.
.Pp
The v_data attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system.
.Pp
The v_numoutput attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like fsync to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at splbio().
.Pp
The v_writecount attribute tracks the number of write calls pending
on the vnode.
.Ss RULES
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are bgetvp and brelvp.
The following fields of the vnode are manipulated at interrupt level:
v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist,
and v_synclist.
Any access to these fields should be protected by splbio,
unless you are certain that there is no chance an interrupt handler
will modify them.
.Sh HISTORY
This document first appeared in
.Ox 2.9 .