author | Antoine Jacoutot <ajacoutot@cvs.openbsd.org> | 2009-10-26 21:03:04 +0000 |
---|---|---|
committer | Antoine Jacoutot <ajacoutot@cvs.openbsd.org> | 2009-10-26 21:03:04 +0000 |
commit | c98ffe3707557963746335ab3f55ff2346630b9d (patch) | |
tree | 3300f3ebf077268edc9925994e33007ae6bdf484 /usr.bin/file/file.1 | |
parent | 3e1f8e92de0ce74e440d65b3a1ad2d6015f736d1 (diff) |
Bring man pages on par with our file(1) version (merge from upstream with
several tweaks).
As usual, several enhancements and inputs from jmc@
Input from ian@
ok jmc@ ian@
Diffstat (limited to 'usr.bin/file/file.1')
-rw-r--r-- | usr.bin/file/file.1 | 499 |
1 files changed, 270 insertions, 229 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1 index 711562deb81..ef80608377c 100644 --- a/usr.bin/file/file.1 +++ b/usr.bin/file/file.1 @@ -1,4 +1,4 @@ -.\" $OpenBSD: file.1,v 1.29 2009/08/16 09:41:08 sobrado Exp $ +.\" $OpenBSD: file.1,v 1.30 2009/10/26 21:03:03 ajacoutot Exp $ .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $ .\" .\" Copyright (c) Ian F. Darwin 1986-1995. @@ -27,61 +27,57 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd $Mdocdate: August 16 2009 $ +.Dd $Mdocdate: October 26 2009 $ .Dt FILE 1 .Os .Sh NAME .Nm file .Nd determine file type .Sh SYNOPSIS -.Nm file -.Op Fl bckLNnrsvz +.Nm +.Bk -words +.Op Fl 0bCcehikLNnprsvz +.Op Fl -help +.Op Fl -mime-encoding +.Op Fl -mime-type .Op Fl F Ar separator .Op Fl f Ar namefile .Op Fl m Ar magicfiles -.Bk -words -.Ar +.Ar file .Ek -.Nm file -.Op Fl m Ar magicfiles -.Fl C .Sh DESCRIPTION The .Nm -utility -tests each argument in an attempt to classify it. +utility tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: -filesystem tests, magic number tests, and language tests. +filesystem tests, magic tests, and language tests. The first test that succeeds causes the file type to be printed. .Pp The type printed will usually contain one of the words -.Dq text +.Em text (the file contains only -.Tn ASCII +printing characters and a few common control characters and is probably safe to read on an -.Tn ASCII -terminal), -.Dq executable +ASCII terminal), +.Em executable (the file contains the result of compiling a program in a form understandable to some .Ux kernel or another), or -.Dq data -meaning anything else (data is usually binary or non-printable). -.Pp +.Em data +meaning anything else (data is usually +.Dq binary +or non-printable). Exceptions are well-known file formats (core files, tar archives) that are known to contain binary data. 
-When modifying the file -.Pa /etc/magic -or the program itself, -.Em "preserve these keywords" . -.Pp -People depend on knowing that all the readable files in a directory +When modifying magic files or the program itself, make sure to +.Em preserve these keywords . +Users depend on knowing that all the readable files in a directory have the word .Dq text printed. -Don't do as Berkeley did; change +Don't do as Berkeley did and change .Dq shell commands text to .Dq shell script . @@ -91,23 +87,21 @@ The filesystem tests are based on examining the return from a system call. The program checks to see if the file is empty, or if it's some sort of special file. -Any known file types appropriate to the system you are running on -(sockets, symbolic links, or named pipes (FIFOs) on those systems that -implement them) +Any known file types, +such as sockets, symbolic links, and named pipes (FIFOs), are intuited if they are defined in the system header file .Aq Pa sys/stat.h . .Pp -The magic number tests are used to check for files with data in +The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable (compiled program) -.Pa a.out -file, whose format is defined in -.Aq Pa a.out.h +a.out file, whose format is defined in +.Aq Pa elf.h , +.Aq Pa a.out.h , and possibly .Aq Pa exec.h -in the standard include directory and is explained in -.Xr a.out 5 . +in the standard include directory. These files have a .Dq magic number stored in a particular place @@ -115,58 +109,124 @@ near the beginning of the file that tells the .Ux operating system that the file is a binary executable, and which of several types thereof. -.Pp -The concept of magic number has been applied by extension to data files. +The concept of a +.Dq magic +has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. 
-The information in these files is read from the magic file +The information identifying these files is read from the magic file .Pa /etc/magic . +In addition, if +.Pa $HOME/.magic.mgc +or +.Pa $HOME/.magic +exists, it will be used in preference to the system magic files. +.Pp +If a file does not match any of the entries in the magic file, +it is examined to see if it seems to be a text file. +ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets +(such as those used on Macintosh and IBM PC systems), +UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC +character sets can be distinguished by the different +ranges and sequences of bytes that constitute printable text +in each set. +If a file passes any of these tests, its character set is reported. +ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified +as +.Dq text +because they will be mostly readable on nearly any terminal; +UTF-16 and EBCDIC are only +.Dq character data +because, while +they contain text, it is text that will require translation +before it can be read. +In addition, +.Nm +will attempt to determine other characteristics of text-type files. +If the lines of a file are terminated by CR, CRLF, or NEL, instead +of the Unix-standard LF, this will be reported. +Files that contain embedded escape sequences or overstriking +will also be identified. .Pp -If an argument appears to be an -.Tn ASCII -file, +Once .Nm -attempts to guess its language. -The language tests look for particular strings (cf -.Pa names.h ) +has determined the character set used in a text-type file, +it will +attempt to determine in what language the file is written. +The language tests look for particular strings (cf.\& +.Aq Pa names.h ) that can appear anywhere in the first few blocks of a file. For example, the keyword .Em .br indicates that the file is most likely a .Xr troff 1 input file, just as the keyword -.Li struct +.Em struct indicates a C program. 
These tests are less reliable than the previous two groups, so they are performed last. The language test routines also test for some miscellany (such as .Xr tar 1 -archives) and determine whether an unknown file should be -labelled as -.Dq ASCII text -or -.Dq data . +archives). .Pp -The options are as follows: -.Bl -tag -width Ds -.It Fl b +Any file that cannot be identified as having been written +in any of the character sets listed above is simply said to be +.Dq data . +.Sh OPTIONS +.Bl -tag -width indent +.It Fl 0 , -print0 +Output a null character +.Sq \e0 +after the end of the filename. +Nice to +.Xr cut 1 +the output. +This does not affect the separator which is still printed. +.It Fl b , -brief Do not prepend filenames to output lines (brief mode). -.It Fl C -For each magic number file, write a +.It Fl C , -compile +Write a .Pa magic.mgc -output file that contains a preparsed (compiled) version of it. -.It Fl c +output file that contains a pre-parsed version of the magic file or directory. +.It Fl c , -checking-printout Cause a checking printout of the parsed form of the magic file. -This is usually used in conjunction with +This is usually used in conjunction with the .Fl m -to debug a new magic file before installing it. -.It Fl F Ar separator -Use the specified string as the separator between the filename and -the file result returned. +flag to debug a new magic file before installing it. +.It Fl e , -exclude Ar testname +Exclude the test named in +.Ar testname +from the list of tests made to determine the file type. +Valid test names are: +.Bl -tag -width +.It apptype +Check for +.Dv EMX +application type (only on EMX). +.It ascii +Check for various types of ASCII files. +.It compress +Don't look for, or inside, compressed files. +.It elf +Don't print elf details. +.It fortran +Don't look for fortran sequences inside ASCII files. +.It soft +Don't consult magic files. +.It tar +Don't examine tar files. 
+.It token +Don't look for known tokens inside ASCII files. +.It troff +Don't look for troff sequences inside ASCII files. +.El +.It Fl F , -separator Ar separator +Use the specified string as the separator between the filename and the +file result returned. Defaults to .Sq \&: . -.It Fl f Ar namefile +.It Fl f , -files-from Ar namefile Read the names of the files to be examined from .Ar namefile (one per line) @@ -177,35 +237,74 @@ or at least one filename argument must be present; to test the standard input, use .Sq - as a filename argument. -.It Fl k +.It Fl h , -no-dereference +Causes symlinks not to be followed. +This is the default if the environment variable +.Dv POSIXLY_CORRECT +is not defined. +.It Fl -help +Print a help message and exit. +.It Fl i , -mime +Causes the file command to output mime type strings rather than the more +traditional human readable ones. +Thus it may say +.Dq text/plain charset=us-ascii +rather than +.Dq ASCII text . +In order for this option to work, +.Nm +changes the way it handles files recognized by the command itself +(such as many of the text file types, directories etc.), +and makes use of an alternative +.Dq magic +file. +See also +.Sx FILES , +below. +.It Fl -mime-encoding , -mime-type +Like +.Fl i , +but print only the specified element(s). +.It Fl k , -keep-going Don't stop at the first match, keep going. -.It Fl L -Cause symlinks to be followed, as the like-named option in -.Xr ls 1 -(on systems that support symbolic links). -.It Fl m Ar magicfiles -Specify an alternate list, -.Ar magicfiles , -of files containing magic numbers. -This can be a single file or a colon-separated list of files. -If a compiled magic file is found alongside, it will be used instead. -.It Fl N +Subsequent matches will have the string +.Dq "\[rs]012\- " +prepended. +(If a newline is required, see the +.Fl r +option.) +.It Fl L , -dereference +Causes symlinks to be followed; +analogous to the option of the same name in +.Xr ls 1 . 
+This is the default if the environment variable +.Dv POSIXLY_CORRECT +is defined. +.It Fl m , -magic-file Ar magicfiles +Specify an alternate list of files and directories containing magic. +This can be a single item, or a colon-separated list. +If a compiled magic file is found alongside a file or directory, +it will be used instead. +.It Fl N , -no-pad Don't pad filenames so that they align in the output. -.It Fl n -Force -.Em stdout -to be flushed after checking each file. +.It Fl n , -no-buffer +Force stdout to be flushed after checking each file. This is only useful if checking a list of files. -It is intended to be used by programs that want filetype output from a -pipe. -.It Fl r -Don't translate unprintable characters to -.Sq \e Ns Em ooo . +It is intended to be used by programs that want filetype output from a pipe. +.It Fl p , -preserve-date +On systems that support +.Xr utime 3 +or +.Xr utimes 2 , +attempt to preserve the access time of files analyzed, to pretend that +.Nm +never read them. +.It Fl r , -raw +Don't translate unprintable characters to \eooo. Normally .Nm -translates unprintable characters to their octal representation -(raw mode). -.It Fl s +translates unprintable characters to their octal representation. +.It Fl s , -special-files Normally, .Nm only attempts to read and determine the type of argument files which @@ -223,93 +322,96 @@ disk partitions, which are block special files. This option also causes .Nm to disregard the file size as reported by -.Xr stat 2 , +.Xr stat 2 since on some systems it reports a zero size for raw disk partitions. -.It Fl v +.It Fl v , -version Print the version of the program and exit. -.It Fl z -Try to look inside files that have been run through -.Xr compress 1 . +.It Fl z , -uncompress +Try to look inside compressed files. .El +.Pp +.Ex -std file .Sh ENVIRONMENT -.Bl -tag -width indent -.It Ev MAGIC -Default magic number files, separated by colon characters. 
+The environment variable +.Dv MAGIC +can be used to set the default magic file name. +If that variable is set, then +.Nm +will not attempt to open +.Pa $HOME/.magic . .Nm adds .Dq .mgc to the value of this variable as appropriate. -.El +The environment variable +.Dv POSIXLY_CORRECT +controls whether +.Nm +will attempt to follow symlinks or not. +If set, then +.Nm +follows symlinks; otherwise it does not. +This is also controlled by the +.Fl L +and +.Fl h +options. .Sh FILES .Bl -tag -width /etc/magic -compact .It Pa /etc/magic default list of magic numbers .El .Sh SEE ALSO -.Xr compress 1 , .Xr hexdump 1 , -.Xr ls 1 , .Xr od 1 , .Xr strings 1 , -.Xr a.out 5 , .Xr magic 5 .Sh STANDARDS CONFORMANCE This program is believed to exceed the System V Interface Definition of FILE(CMD), as near as one can determine from the vague language contained therein. -Its behaviour is mostly compatible with the System V program of the same name. +Its behavior is mostly compatible with the System V program of the same name. This version knows more magic, however, so it will produce different (albeit more accurate) output in many cases. +.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html .Pp The one significant difference between this version and System V -is that this version treats any white space +is that this version treats any whitespace as a delimiter, so that spaces in pattern strings must be escaped. For example, -.Pp ->10 string language impress\ (imPRESS data) +.Bd -literal -offset indent +\*(Gt10 string language impress\ (imPRESS data) +.Ed .Pp in an existing magic file would have to be changed to -.Pp ->10 string language\e impress (imPRESS data) +.Bd -literal -offset indent +\*(Gt10 string language\e impress (imPRESS data) +.Ed .Pp In addition, in this version, if a pattern string contains a backslash, it must be escaped. 
For example -.Pp -0 string \ebegindata Andrew Toolkit document +.Bd -literal -offset indent +0 string \ebegindata Andrew Toolkit document +.Ed .Pp in an existing magic file would have to be changed to -.Pp -0 string \e\ebegindata Andrew Toolkit document +.Bd -literal -offset indent +0 string \e\ebegindata Andrew Toolkit document +.Ed .Pp SunOS releases 3.2 and later from Sun Microsystems include a -.Nm file +.Nm command derived from the System V one, but with some extensions. -My version differs from Sun's only in minor ways. +This version differs from Sun's only in minor ways. It includes the extension of the -.Ql & +.Sq & operator, used as, for example, -.Pp ->16 long&0x7fffffff >0 not stripped -.Sh MAGIC DIRECTORY -The magic file entries have been collected from various sources, -mainly USENET, and contributed by various authors. -.An Christos Zoulas -(address below) will collect additional -or corrected magic file entries. -A consolidation of magic file entries -will be distributed periodically. -The order of entries in the magic file is significant. -Depending on what system you are using, the order that -they are put together may be incorrect. -If your old -.Nm -command uses a magic file, -keep the old magic file around for comparison purposes -(rename it to -.Pa /etc/magic.orig ) . +.Bd -literal -offset indent +\*(Gt16 long&0x7fffffff \*(Gt0 not stripped +.Ed .Sh HISTORY There has been a .Nm @@ -318,117 +420,66 @@ command in every since at least Research Version 4 (man page dated November, 1973). The System V version introduced one significant major change: -the external list of magic number types. +the external list of magic types. This slowed the program down slightly but made it a lot more flexible. .Pp -This program, based on the System V version, was written by -.An Ian F. Darwin +This program, based on the System V version, +was written by Ian Darwin without looking at anybody else's source code. 
.Pp -.An John Gilmore -revised the code extensively, making it better than +John Gilmore revised the code extensively, making it better than the first version. -.An Geoff Collyer -found several inadequacies +Geoff Collyer found several inadequacies and provided some magic file entries. -Contributions to the -.Ql & -operator by -.An Rob McMahon , -1989. +Contributions by the `&' operator by Rob McMahon, 1989. .Pp -.An Guy Harris -made many changes from 1993 to the present. +Guy Harris, made many changes from 1993 to the present. .Pp Primary development and maintenance from 1990 to the present by -.An Christos Zoulas Aq christos@zoulas.com . +Christos Zoulas. +.Pp +Altered by Chris Lowth, 2000: +Handle the +.Fl i +option to output mime type strings, using an alternative +magic file and internal logic. .Pp -Altered by -.An Chris Lowth , -2000, to optionally report MIME types. -This required an alternative magic file, and is not available in -.Ox . +Altered by Eric Fischer, July, 2000, +to identify character codes and attempt to identify the languages +of non-ASCII files. .Pp -Altered by -.An Eric Fischer , -July, 2000, to identify character codes and attempt to identify the -languages of non-ASCII files. +Altered by Reuben Thomas, 2007 to 2008, to improve MIME +support and merge MIME and non-MIME magic, support directories as well +as files of magic, apply many bug fixes and improve the build system. .Pp The list of contributors to the -.Dq magdir -directory (source for the -.Pa /etc/magic -file) is too long to include here. +.Dq magic +directory (magic files) +is too long to include here. You know who you are; thank you. -.Sh LEGAL NOTICE -Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999. -Covered by the standard Berkeley Software Distribution copyright; see the file -LEGAL.NOTICE in the distribution. 
-.Pp -The files -.Pa tar.h -and -.Pa is_tar.c -were written by -.An John Gilmore -from his public-domain -.Nm tar -program, and are not covered by the above license. +Many contributors are listed in the source files. .Sh BUGS +.Pp There must be a better way to automate the construction of the Magic file from all the glop in Magdir. What is it? -Better yet, the magic file should be compiled into binary (say, -.Xr ndbm 3 -or, better yet, fixed-length -.Tn ASCII -strings for use in heterogenous network environments) for faster startup. -Then the program would run as fast as the Version 7 program of the same name, -with the flexibility of the System V version. .Pp .Nm -uses several algorithms that favor speed over accuracy; +uses several algorithms that favor speed over accuracy, thus it can be misled about the contents of -.Tn ASCII +text files. .Pp -The support for -.Tn ASCII -files (primarily for programming languages) +The support for text files (primarily for programming languages) is simplistic, inefficient and requires recompilation to update. .Pp -There should be an -.Dq else -clause to follow a series of continuation lines. -.Pp -The magic file and keywords should have regular expression support. -Their use of -.Tn ASCII TAB -as a field delimiter is ugly and makes -it hard to edit the files, but is entrenched. -.Pp -It might be advisable to allow upper-case letters in keywords -for e.g., -.Xr troff 1 -commands vs man page macros. -Regular expression support would make this easy. -.Pp -The program doesn't grok \s-2FORTRAN\s0. -It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which -appear indented at the start of line. -Regular expression support would make this easy. -.Pp The list of keywords in -.Em ascmagic +.Pa ascmagic probably belongs in the Magic file. This could be done by using some keyword like -.Ql * +.Sq * for the offset value. 
.Pp -Another optimization would be to sort -the magic file so that we can just run down all the -tests for the first byte, first word, first long, etc, once we -have fetched it. Complain about conflicts in the magic file entries. Make a rule that the magic entries sort based on file offset rather than position within the magic file? @@ -437,24 +488,14 @@ The program should provide a way to give an estimate of .Dq how good a guess is. -We end up removing guesses (e.g., -.Dq From\ \& +We end up removing guesses (e.g. +.Dq From\ as first 5 chars of file) because -they are not as good as other guesses (e.g., +they are not as good as other guesses (e.g.\& .Dq Newsgroups: versus -.Qq Return-Path: ) . -Still, if the others don't pan out, it should be -possible to use the first guess. -.Pp -This program is slower than some vendors' -.Nm -commands. +.Dq Return-Path: ) . +Still, if the others don't pan out, it should be possible to use the +first guess. .Pp This manual page, and particularly this section, is too long. -.Sh AVAILABILITY -You can obtain the original author's latest version by anonymous FTP -on -.Em ftp.astron.com -in the directory -.Pa /pub/file/file-X.YY.tar.gz . |
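
Note on the test order documented in the patched DESCRIPTION: the manual says `file` runs filesystem tests, then magic tests, then text (character set) tests, and reports `data` when nothing matches. The following is a minimal Python sketch of that ordering only. It is not the code of file(1): the function name `classify`, the three-entry `MAGIC_TABLE` (a hypothetical stand-in for `/etc/magic`), the 4096-byte read size, and the exact output labels are all assumptions made for illustration.

```python
import codecs
import os
import stat

# Hypothetical stand-in for a magic file: (offset, bytes, description).
# Real magic files have far more entries and a richer syntax; see magic(5).
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]

# Control characters tolerated in "text" (an assumed, simplified set).
TEXT_CONTROLS = "\t\n\r\f\b\x1b"


def _looks_like_text(chars):
    """True if every character is printable or a common control character."""
    return all(ch.isprintable() or ch in TEXT_CONTROLS for ch in chars)


def classify(path):
    """Classify a file in the documented order: filesystem, magic, text."""
    # 1. Filesystem tests: based only on what lstat(2) returns.
    st = os.lstat(path)
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if not stat.S_ISREG(mode):
        return "special file"
    if st.st_size == 0:
        return "empty"

    with open(path, "rb") as handle:
        head = handle.read(4096)

    # 2. Magic tests: fixed byte strings at small fixed offsets.
    for offset, magic, description in MAGIC_TABLE:
        if head[offset:offset + len(magic)] == magic:
            return description

    # 3. Text tests: try character sets from most to least restrictive.
    for encoding, label in (("ascii", "ASCII text"),
                            ("utf-8", "UTF-8 Unicode text")):
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multi-byte character cut off by the read.
            chars = decoder.decode(head, final=False)
        except UnicodeDecodeError:
            continue
        if _looks_like_text(chars):
            return label

    # Nothing matched: the manual calls this "data".
    return "data"
```

Because the magic tests run before the text tests, a file that begins with a known signature is reported by its format even if the rest of its bytes happen to be printable.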