summaryrefslogtreecommitdiff
path: root/usr.sbin/pkg_add/pod/OpenBSD::Intro.pod
blob: 9e4649825c359f904ee74f2c68626632f2f4519e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
$OpenBSD: OpenBSD::Intro.pod,v 1.5 2008/02/25 16:58:46 espie Exp $

=head1 NAME

OpenBSD::Intro - Introduction to the pkg tools internals

=head1 SYNOPSIS

   use OpenBSD::PackingList;
   ...

=head1 DESCRIPTION

Note that the C<OpenBSD::> namespace of perl modules is not limited to
package tools, but also includes L<pkg-config(1)> and L<makewhatis(8)>
support modules.
This document only covers package tools material.

The design of the package tools revolves around a few central ideas:

Design modules that manipulate some notions in a consistent way, so
that they can be used by the package tools proper, but also with a
high-level API that's useful for anything that needs to manipulate
packages.  This was validated by the ease with which we can now update
packing-lists, check for conflicts, and check various properties of
our packages.

Try to be as safe as possible where installation and update operations
are concerned.  Cut up operations into small subsets which yields frequent
safe intermediate points where the machine is completely functional.

Traditional package tools often rely on the following model: take
a snapshot of the system, try to perform an operation, and roll back to
a stable state if anything goes wrong.

Instead, OpenBSD package tools take a computational approach: record
semantic information in a useful format, pre-compute as much as can be
about an operation, and only perform the operation when we
have proved that (almost) nothing can go wrong.  As far as possible,
the actual operation happens on the side, as a temporary scaffolding, and
we only commit to the operation once most of the work is over.

Keep high-level semantic information instead of recomputing it all the
time, but try to organize as much as possible as plain text files.
Originally, it was a bit of a challenge: trying to see how much we could
get away with, before having to define an actual database format.
Turns out we do not need a database format, or even any cache on the
ftp server.

Avoid copying files all over the place. Hence the L<OpenBSD::UStar(3p)>
module that allows package tools to manipulate tarballs directly without
having to extract them first in a staging area.

All the package tools use the same internal perl modules, which gives them
some consistency about fundamental notions.

It is highly recommended to try to understand packing-lists and packing
elements first, since they are the core that unlocks most of the package
tools.


=head1 COMMON NOTIONS

=over 3

=item packing-lists and elements

Each package consists of a list of objects (mostly files, but there are some
other abstract structures, like new user accounts, or stuff to do when
the package gets installed).
They are recorded in a L<OpenBSD::PackingList(3p)>, the module offers
everything needed to manipulate packing-lists.
The packing-list format has a text representation, which is documented
in L<pkg_create(3p)>.
Internally, packing-lists are heavily structured. Objects are reordered
by the internals of L<OpenBSD::PackingList(3p)>, and there are some standard
filters defined to gain access to some commonly used information (dependencies
and conflicts mostly) without having to read and parse the whole packing-list.
Each object is an L<OpenBSD::PackingElement(3p)>, which is an abstract class
with lots of children classes.
The use of packing-lists most often combines two classic design patterns:
one uses Visitor to traverse a packing-list and perform an operation on
all its elements (this is where the order is important, and why some
stuff like user creation will `bubble up' to the beginning of the list), allied
to Template Method: the operation is often not determined for a
basic L<OpenBSD::PackingElement(3p)>, but will make more sense to an
L<OpenBSD::PackingElement::FileObject(3p)> or similar.
Packing-list objects have an "automatic visitor" property: if a method is not
defined for the packing-list proper, but exists for packing elements, then
invoking the method on the packing-list will traverse it and apply the method
to each element.
For instance, package installation happens through the following snippet:

    $plist->install_and_progress(...)

where C<install_and_progress> is defined at the packing element level,
and invokes C<install> and shows a progress bar if needed.

=item package names and specs

Package names and specifications for package names have a specific format,
which is described in L<packages-specs(7)>.   Package names are handled
within L<OpenBSD::PackageName(3p)>.  There is also a framework to organize
searches based on L<OpenBSD::Search(3p)> objects.  Specifications are
structured in a specific way, which yields a shorthand for conflict handling
through L<OpenBSD::PkgCfl(3p)>, allows the package system to resolve
dependencies in L<OpenBSD::Dependencies(3p)> and to figure out package
updates in L<OpenBSD::Update(3p)>.

=item sources of packages

Historically, L<OpenBSD::PackageInfo(3p)> was used to get to the list of
installed packages and grab information.  This is now part of a more
generic framework L<OpenBSD::PackageRepository(3p)>, which interacts with
the search objects to allow you to access packages, be they installed,
on the local machines, or distant.  Once a package is located, the repository
yields a proxy object called L<OpenBSD::PackageLocation(3p)> that can be used
to gain further info.  (There are still shortcuts for installed packages
for performance and simplicity reasons.)

=item update sets

This is a new notion (introduced around OpenBSD 4.1) which is not yet fully
used throughout the package tools.  Each operation (installation, removal,
or replacement of packages) is cut up into small atomic operations, in order
to guarantee maximal stability of the installed system. The package tools
will try really hard to only deal with one or two packages at a time,
in order to minimize combinatorial complexity, and to have a maximal number
of safe points, where an update operation can stop without hosing the
whole system. An update set is simply a minimal bag of packages, with old
packages that are going to be installed, new packages that are going
to replace them, and an area to record related ongoing computations.
The old set may be empty, the new set may be empty, and in all cases,
the update set shall be small (as small as possible).   Currently,
the package tools do not deal with all the cases they should. This is still
under construction.  We have already met with update situations where
dependencies between packages invert (A-1.0 depends on B-1.0, but B-0.0
depends on A-0.0), or where files move between packages, which in
theory will require update-sets with two new packages that replace two
old packages.  Currently, we cheat.

=item dependency information

Dependency information exists at three levels: first, there are source
specifications within ports. Then, those specifications turn into binary
specifications with more constraints when the package is built by
L<pkg_create(1)>, and finally, they're matched against lists of installed
objects when the package is installed, and recorded as lists of
inter-dependencies in the package system.

At the package level, there are currently two types of dependencies:
package specifications, that establish direct dependencies between
packages, and shared libraries, that are described below.

Normal dependencies are shallow: it is up to the package tools to
figure out a whole dependency tree throughout top-level dependencies.
None of this is hard-coded: this a prerequisite for flavored packages to
work, as we do not want to depend on a specific package if something
more generic will do.

At the same time, shared libraries have harsher constraints: a package
won't work without the exact same shared libraries it needs (same major
number, at least), so shared libraries are handled through a want/provide
mechanism that walks the whole dependency tree to find the required shared
libraries.

Dependencies are just a subclass of the packing-elements, rooted at
the C<OpenBSD::PackingElement::Depend> class.

A specific C<OpenBSD::Dependencies::Solver> object is used for the resolution
of dependencies (see L<OpenBSD::Dependencies(3p)>, the solver is mostly
a tree-walker, but there are performance considerations, so it may need
to cache a lot of information).
Specificities of shared libraries are handled by L<OpenBSD::SharedLibs(3p)>.
In particular, the base system also provides some shared libraries which are
not recorded within the dependency tree.

Lists of inter-dependencies are recorded in both directions
(RequiredBy/Requiring). The L<OpenBSD::RequiredBy(3p)> module handles the
subtleties (removing duplicates, keeping things ordered, and handling
pretend operations).


=item shared items

Some items may be recorded multiple times within several packages (mostly
directories, users and groups). There is a specific L<OpenBSD::SharedItems(3p)>
module which handles these. Mostly, removal operations will scan
all packing-lists at high speed to figure out shared items, and remove
stuff that's no longer in use.

=item virtual file system

Most package operations will lead to the installation and removal of some
files.   Everything is checked beforehand: the package systeme must verify
that no new file will erase an existing file, or that the file system
won't overflow during the package installation.
The package tools also have a "pretend" mode where the user can check what
will happen before doing an operation.  All the computations and caching
are handled through the L<OpenBSD::Vstat(3p)> module, which is designed
to hide file system oddities, and to perform addition/deletion operations
virtually before doing them for real.

=back

=head1 BASIC ALGORITHMS

There are three basic operations: package addition (installation),
package removal (deinstallation), and package replacement (update).

These operations are achieved through repeating the correct
operations on all elements of a packing-list.

=head2 PACKAGE ADDITION

For package addition, L<pkg_add(1)> first checks that everything is correct, 
then runs through the packing-list, and extracts element from the archive.

=head2 PACKAGE DELETION

For package deletion, L<pkg_delete(1)> removes elements from the packing-list,
and marks `common' stuff that may need to be unregistered, then walks quickly
through all installed packages and removes stuff that's no longer used
(directories, users, groups...)

=head2 PACKAGE REPLACEMENT

Package replacement is more complicated. It relies on package names 
and conflict markers.  

In normal usage, pkg_add installs only new stuff, and checks that all files 
in the new package don't already exist in the file system.
By convention, packages with the same stem are assumed to be different
versions of the same package, e.g., screen-1.0 and screen-1.1 correspond
to the same software, and users are not expected to be able to install
both at the same time. 

This is a conflict. 

One can also mark extra conflicts (if two software distributions install 
the same file, generally a bad idea), or remove default conflict markers 
(for instance, so that the user can install several versions of autoconf at 
the same time).

If pkg_add is invoked in replacement mode (-r), it will use conflict 
information to figure out which package(s) it should replace. It will then
operate in a specific mode, where it replaces old package(s) with a new one.

=over

=item *

determine which package to replace through conflict information

=item *

extract the new package 'alongside' the existing package(s) using
temporary filenames.

=item *

remove the old package

=item *

finish installing the new package by renaming the temporary files.

=back

Thus replacements will work without needing any extra information besides
conflict markers. pkg_add -r will happily replace any package with a
conflicting package.  Due to missing information (one can't predict the
future), conflict markers work both way: packages a and b conflict as
soon as a conflicts with b, or b conflicts with a.

=head2 PACKAGE UPDATES

Package replacement is the basic operation behind package updates.
In your average update, each individual package will be replaced
by a more recent one, starting with dependencies, so that the installation
stays functional the whole time.  Shared libraries enjoy a special status: 
old shared libraries are kept around in a stub .lib-* package, so that 
software that depends on them keeps running. (Thus, it is vital that porters
pay attention to shared library version numbers during an update.)

An update operation starts by figuring out which packages to update (walking
the selected packages dependency trees). 

Then it looks for new packages to replace the old ones.
This search relies on stem names first (e.g., to update package
foo-1.0, pkg_add -u will look for foo-* in the PKG_PATH), then it trims
the search results by looking more closely inside the package candidates. 
More specifically, their pkgpath (the directory in the ports tree from which 
they were compiled). Thus, a package
that comes from category/someport/snapshot will never replace a package
that comes from category/someport/stable. Likewise for flavors.

Finally, pkg_add -u decides whether the update is needed by comparing
the package signatures: a package signature is information is composed of
the name of a package, together with relevant dependency information: 
all wantlib versions, and all run dependencies versions. 
If anything is different, it means the new package must replace the old one.

Currently, pkg_add -u stops at the first entry in the PKG_PATH from which
suitable candidates are found.

=head1 BUGS AND LIMITATIONS

There are a few desireable changes that will happen in the future:
* pkg_add -u should be able to find updates with a stem change.
This requires an extra mechanism that would say `oh, this package changed
name', as we cannot afford to look into every single package.
* more complicated update scenarios could happen, once UpdateSets are fully
functional.
* pkg_add has no real notion of version number. Thus, updates work only
in so far as you direct your PKG_PATH to a more recent repository.
* there should be some carefully designed mechanisms to register more
`global' processing, to avoid exec/unexec.
* some packages should be able to refuse automatic update, and to interact
with the user. Again, to be carefully designed, so that one can still
understand what's going on.

=head1 LIST OF MODULES

=over 3

=item OpenBSD::Add

common operations related to a package addition.

=item OpenBSD::ArcCheck

additional layer on top of C<OpenBSD::Ustar> that matches extra
information that the archive format cannot record with a packing-list.


=item OpenBSD::CollisionReport

checks a collision list obtained through C<OpenBSD::Vstat> against the
full list of installed files, and reports origin of existing files.

=item OpenBSD::Delete

common operations related to a package deletion

=item OpenBSD::Error

handles external calls through C<system(3p)>, exception mechanism,
fatal errors and warnings.

=item OpenBSD::Getopt

L<Getopt::Std(3p)>-like with extra hooks for special options.

=item OpenBSD::IdCache

caches uid and gid vs. user names and group names correspondences.

=item OpenBSD::Interactive

handles user questions (to be redesigned)

=item OpenBSD::Mtree

simple parser for L<mtree(8)> specifications.

=item OpenBSD::PackageInfo

handles package meta-information (all the +CONTENTS, +DESCR, etc files)

=item OpenBSD::PackageLocation

proxy for a package, either as a tarball, or an installed package.
Obtained through C<OpenBSD::PackageRepository>.

=item OpenBSD::PackageLocator

central non-OO hub for the normal repository list
(should use a singleton pattern instead).

=item OpenBSD::PackageName

common operations on package names.

=item OpenBSD::PackageRepository

base class for all package sources. Actual packages instantiate as
C<OpenBSD::PackageLocation>.

=item OpenBSD::PackageRepositoryList

list of package repository, provided as a front to search objects,
because searching through a repository list has L<ld(1)>-like semantics
(stops at the first repository that matches).

=item OpenBSD::PackingElement

all the packing-list elements class hierarchy, together with common
methods that do not belong elsewhere.

=item OpenBSD::Paths

hardcoded paths to external programs and locations.

=item OpenBSD::PkgCfl

conflict lists handling in an efficient way.

=item OpenBSD::PkgSpec

ad-hoc search for package specifications. External API is stable, but it
needs to be updated to use C<OpenBSD::PackageName> objects now that they
exist.

=item OpenBSD::ProgressMeter

handles display of a progress meter when a terminal is available.
Provided as a singleton object so that a graphical interface could
be used instead. In practice, most input-output of the package tools
will need to be abstracted first.

=item OpenBSD::Replace

common operations related to package replacement.

=item OpenBSD::RequiredBy

handles requiredby and requiring lists.

=item OpenBSD::Search

search object for package repositories: specs, stems, and pkgpaths.

=item OpenBSD::SharedItems

handles items that may be shared by several packages.

=item OpenBSD::SharedLibs

shared library specificities when handled as dependencies.

=item OpenBSD::Temp

safe creation of temporary files as a light-weight module that also
deals with signal issues.

=item OpenBSD::Update

computes lists of package replacements required by an update.

=item OpenBSD::UpdateSet

common operations to all package tools that manipulate update sets.
Lots of things to do.


=item OpenBSD::Ustar

simple API that allows for Ustar (new tar) archive manipulation,
allowing for extraction and copies on the fly.

=item OpenBSD::Vstat

virtual file system (pretend) operations.

=item OpenBSD::md5

simple interface to the L<Digest::MD5(3p)> module.

=back