summaryrefslogtreecommitdiff
path: root/share/man/man4/raid.4
blob: 7445c6fb384fed75f4539700a2a342c3eb977c7e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
.\"	$OpenBSD: raid.4,v 1.31 2010/04/01 17:06:55 jmc Exp $
.\"     $NetBSD: raid.4,v 1.20 2001/09/22 16:03:58 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd $Mdocdate: April 1 2010 $
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Ox .
This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.
The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
There are a wide variety of other RAID levels supported by RAIDframe,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in "degraded mode".
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a failure
need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g., the component label claims the component should be
the 3rd one of a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as auto-configurable, in which case it will be
configured automatically during the kernel boot process.
RAID filesystems which are
automatically configured are also eligible to be the root filesystem.
There is currently no support for booting a kernel directly from a RAID
set.
To use a RAID set as the root filesystem, a kernel is usually
obtained from a small non-RAID partition, after which any
auto-configuring RAID set can be used for the root filesystem.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
filesystem.
Should a disk fail, the driver is capable of reconstructing
the failed disk onto a hot spare or back onto a replacement drive.
If the components are hot swapable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will be simply marked as 'failed'.
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity data.
This rebuilding of parity data is also required when either a) a new RAID
device is brought up for the first time or b) after an un-clean shutdown of a
RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.
It bears repeating again that parity recomputation is
.Ar required
before any filesystems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly recovered.
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of RAID 5
devices (which, in turn, may be constructed out of the physical disks,
or of other RAID devices).
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e., not left free-floating, where a drive with SCSI ID of
4 can end up as
.Pa /dev/sd0c )
for well-behaved functioning of the RAID device.
This is true for all types of drives, including IDE, HP-IB, etc.
For normal SCSI drives, for example, the following can be used
to fix the device addresses:
.Bd -unfilled -offset indent
sd0     at scsibus0 target 0       # SCSI disk drives
sd1     at scsibus0 target 1       # SCSI disk drives
sd2     at scsibus0 target 2       # SCSI disk drives
sd3     at scsibus0 target 3       # SCSI disk drives
sd4     at scsibus0 target 4       # SCSI disk drives
sd5     at scsibus0 target 5       # SCSI disk drives
sd6     at scsibus0 target 6       # SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.
The rationale for fixing the device addresses is as follows:
Consider a system with three SCSI drives at SCSI ID's 4, 5, and 6,
and which map to components
.Pa /dev/sd0e , /dev/sd1e ,
and
.Pa /dev/sd2e
of a RAID 5 set.
If the drive with SCSI ID 5 fails, and the system reboots, the old
.Pa /dev/sd2e
will show up as
.Pa /dev/sd1e .
The RAID driver is able to detect that component positions have changed, and
will not allow normal configuration.
If the device addresses are hard
coded, however, the RAID driver would detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.
Note that the auto-detection and auto-configuration code does not care
about where the components live.
The auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument (
.Sq 4 ,
in this case), specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
option	RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g., 4.2BSD) or
.Dv FS_RAID
(e.g., RAID).
The use of the latter is strongly encouraged, and is
required if auto-configuration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can be simply
raw disks, or partitions which use an entire disk.
Note that some platforms (such as SUN) do not allow using the FS_RAID
partition type.
On these platforms, the
.Nm
driver can still auto-configure from FS_BSDFFS partitions.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Ar before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
.Pp
Additional debug information can be sent to the console by specifying:
.Bd -unfilled -offset indent
option	RAIDDEBUG
.Ed
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr sd 4 ,
.Xr wd 4 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr MAKEDEV 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Ox
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital UNIX.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4
from where it was ported to
.Ox 2.5 .
.Sh CAVEATS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However the loss of two components of a RAID 4 or 5 system, or the loss
of a single component of a RAID 0 system, will result in the entire
filesystems on that RAID device being lost.
RAID is
.Ar NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Ar MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail -- it is better to use RAID 0 and get the additional space and
speed, than it is to use parity, but not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.