path: root/usr.bin/file/file.1
author Antoine Jacoutot <ajacoutot@cvs.openbsd.org> 2009-10-26 21:03:04 +0000
committer Antoine Jacoutot <ajacoutot@cvs.openbsd.org> 2009-10-26 21:03:04 +0000
commit c98ffe3707557963746335ab3f55ff2346630b9d
tree 3300f3ebf077268edc9925994e33007ae6bdf484 /usr.bin/file/file.1
parent 3e1f8e92de0ce74e440d65b3a1ad2d6015f736d1
Bring man pages on par with our file(1) version (merge from upstream with
several tweaks).
As usual, several enhancements and inputs from jmc@
Input from ian@
ok jmc@ ian@
Diffstat (limited to 'usr.bin/file/file.1')
-rw-r--r-- usr.bin/file/file.1 499
1 files changed, 270 insertions, 229 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
index 711562deb81..ef80608377c 100644
--- a/usr.bin/file/file.1
+++ b/usr.bin/file/file.1
@@ -1,4 +1,4 @@
-.\" $OpenBSD: file.1,v 1.29 2009/08/16 09:41:08 sobrado Exp $
+.\" $OpenBSD: file.1,v 1.30 2009/10/26 21:03:03 ajacoutot Exp $
.\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
.\"
.\" Copyright (c) Ian F. Darwin 1986-1995.
@@ -27,61 +27,57 @@
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
-.Dd $Mdocdate: August 16 2009 $
+.Dd $Mdocdate: October 26 2009 $
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
-.Nm file
-.Op Fl bckLNnrsvz
+.Nm
+.Bk -words
+.Op Fl 0bCcehikLNnprsvz
+.Op Fl -help
+.Op Fl -mime-encoding
+.Op Fl -mime-type
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
-.Bk -words
-.Ar
+.Ar file
.Ek
-.Nm file
-.Op Fl m Ar magicfiles
-.Fl C
.Sh DESCRIPTION
The
.Nm
-utility
-tests each argument in an attempt to classify it.
+utility tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
-filesystem tests, magic number tests, and language tests.
+filesystem tests, magic tests, and language tests.
The first test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
-.Dq text
+.Em text
(the file contains only
-.Tn ASCII
+printing characters and a few common control
characters and is probably safe to read on an
-.Tn ASCII
-terminal),
-.Dq executable
+ASCII terminal),
+.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
kernel or another),
or
-.Dq data
-meaning anything else (data is usually binary or non-printable).
-.Pp
+.Em data
+meaning anything else (data is usually
+.Dq binary
+or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
-When modifying the file
-.Pa /etc/magic
-or the program itself,
-.Em "preserve these keywords" .
-.Pp
-People depend on knowing that all the readable files in a directory
+When modifying magic files or the program itself, make sure to
+.Em preserve these keywords .
+Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
-Don't do as Berkeley did; change
+Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
@@ -91,23 +87,21 @@ The filesystem tests are based on examining the return from a
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
-Any known file types appropriate to the system you are running on
-(sockets, symbolic links, or named pipes (FIFOs) on those systems that
-implement them)
+Any known file types,
+such as sockets, symbolic links, and named pipes (FIFOs),
are intuited if they are defined in
the system header file
.Aq Pa sys/stat.h .
.Pp
-The magic number tests are used to check for files with data in
+The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
-.Pa a.out
-file, whose format is defined in
-.Aq Pa a.out.h
+a.out file, whose format is defined in
+.Aq Pa elf.h ,
+.Aq Pa a.out.h ,
and possibly
.Aq Pa exec.h
-in the standard include directory and is explained in
-.Xr a.out 5 .
+in the standard include directory.
These files have a
.Dq magic number
stored in a particular place
@@ -115,58 +109,124 @@ near the beginning of the file that tells the
.Ux
operating system
that the file is a binary executable, and which of several types thereof.
-.Pp
-The concept of magic number has been applied by extension to data files.
+The concept of a
+.Dq magic
+has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
-The information in these files is read from the magic file
+The information identifying these files is read from the magic file
.Pa /etc/magic .
+In addition, if
+.Pa $HOME/.magic.mgc
+or
+.Pa $HOME/.magic
+exists, it will be used in preference to the system magic files.
+.Pp
+If a file does not match any of the entries in the magic file,
+it is examined to see if it seems to be a text file.
+ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
+(such as those used on Macintosh and IBM PC systems),
+UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
+character sets can be distinguished by the different
+ranges and sequences of bytes that constitute printable text
+in each set.
+If a file passes any of these tests, its character set is reported.
+ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
+as
+.Dq text
+because they will be mostly readable on nearly any terminal;
+UTF-16 and EBCDIC are only
+.Dq character data
+because, while
+they contain text, it is text that will require translation
+before it can be read.
+In addition,
+.Nm
+will attempt to determine other characteristics of text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL, instead
+of the Unix-standard LF, this will be reported.
+Files that contain embedded escape sequences or overstriking
+will also be identified.
.Pp
-If an argument appears to be an
-.Tn ASCII
-file,
+Once
.Nm
-attempts to guess its language.
-The language tests look for particular strings (cf
-.Pa names.h )
+has determined the character set used in a text-type file,
+it will
+attempt to determine in what language the file is written.
+The language tests look for particular strings (cf.\&
+.Aq Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
-.Li struct
+.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
-archives) and determine whether an unknown file should be
-labelled as
-.Dq ASCII text
-or
-.Dq data .
+archives).
.Pp
-The options are as follows:
-.Bl -tag -width Ds
-.It Fl b
+Any file that cannot be identified as having been written
+in any of the character sets listed above is simply said to be
+.Dq data .
+.Sh OPTIONS
+.Bl -tag -width indent
+.It Fl 0 , -print0
+Output a null character
+.Sq \e0
+after the end of the filename.
+This makes it easier to
+.Xr cut 1
+the output.
+This does not affect the separator which is still printed.
+.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
-.It Fl C
-For each magic number file, write a
+.It Fl C , -compile
+Write a
.Pa magic.mgc
-output file that contains a preparsed (compiled) version of it.
-.It Fl c
+output file that contains a pre-parsed version of the magic file or directory.
+.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
-This is usually used in conjunction with
+This is usually used in conjunction with the
.Fl m
-to debug a new magic file before installing it.
-.It Fl F Ar separator
-Use the specified string as the separator between the filename and
-the file result returned.
+flag to debug a new magic file before installing it.
+.It Fl e , -exclude Ar testname
+Exclude the test named in
+.Ar testname
+from the list of tests made to determine the file type.
+Valid test names are:
+.Bl -tag -width
+.It apptype
+Check for
+.Dv EMX
+application type (only on EMX).
+.It ascii
+Check for various types of ASCII files.
+.It compress
+Don't look for, or inside, compressed files.
+.It elf
+Don't print elf details.
+.It fortran
+Don't look for fortran sequences inside ASCII files.
+.It soft
+Don't consult magic files.
+.It tar
+Don't examine tar files.
+.It token
+Don't look for known tokens inside ASCII files.
+.It troff
+Don't look for troff sequences inside ASCII files.
+.El
+.It Fl F , -separator Ar separator
+Use the specified string as the separator between the filename and the
+file result returned.
Defaults to
.Sq \&: .
-.It Fl f Ar namefile
+.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
@@ -177,35 +237,74 @@ or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
-.It Fl k
+.It Fl h , -no-dereference
+Causes symlinks not to be followed.
+This is the default if the environment variable
+.Dv POSIXLY_CORRECT
+is not defined.
+.It Fl -help
+Print a help message and exit.
+.It Fl i , -mime
+Causes the file command to output mime type strings rather than the more
+traditional human readable ones.
+Thus it may say
+.Dq text/plain charset=us-ascii
+rather than
+.Dq ASCII text .
+In order for this option to work,
+.Nm
+changes the way it handles files recognized by the command itself
+(such as many of the text file types, directories etc.),
+and makes use of an alternative
+.Dq magic
+file.
+See also
+.Sx FILES ,
+below.
+.It Fl -mime-encoding , -mime-type
+Like
+.Fl i ,
+but print only the specified element(s).
+.It Fl k , -keep-going
Don't stop at the first match, keep going.
-.It Fl L
-Cause symlinks to be followed, as the like-named option in
-.Xr ls 1
-(on systems that support symbolic links).
-.It Fl m Ar magicfiles
-Specify an alternate list,
-.Ar magicfiles ,
-of files containing magic numbers.
-This can be a single file or a colon-separated list of files.
-If a compiled magic file is found alongside, it will be used instead.
-.It Fl N
+Subsequent matches will have the string
+.Dq "\[rs]012\- "
+prepended.
+(If a newline is required, see the
+.Fl r
+option.)
+.It Fl L , -dereference
+Causes symlinks to be followed;
+analogous to the option of the same name in
+.Xr ls 1 .
+This is the default if the environment variable
+.Dv POSIXLY_CORRECT
+is defined.
+.It Fl m , -magic-file Ar magicfiles
+Specify an alternate list of files and directories containing magic.
+This can be a single item, or a colon-separated list.
+If a compiled magic file is found alongside a file or directory,
+it will be used instead.
+.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
-.It Fl n
-Force
-.Em stdout
-to be flushed after checking each file.
+.It Fl n , -no-buffer
+Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
-It is intended to be used by programs that want filetype output from a
-pipe.
-.It Fl r
-Don't translate unprintable characters to
-.Sq \e Ns Em ooo .
+It is intended to be used by programs that want filetype output from a pipe.
+.It Fl p , -preserve-date
+On systems that support
+.Xr utime 3
+or
+.Xr utimes 2 ,
+attempt to preserve the access time of files analyzed, to pretend that
+.Nm
+never read them.
+.It Fl r , -raw
+Don't translate unprintable characters to \eooo.
Normally
.Nm
-translates unprintable characters to their octal representation
-(raw mode).
-.It Fl s
+translates unprintable characters to their octal representation.
+.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
@@ -223,93 +322,96 @@ disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
-.Xr stat 2 ,
+.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
-.It Fl v
+.It Fl v , -version
Print the version of the program and exit.
-.It Fl z
-Try to look inside files that have been run through
-.Xr compress 1 .
+.It Fl z , -uncompress
+Try to look inside compressed files.
.El
+.Pp
+.Ex -std file
.Sh ENVIRONMENT
-.Bl -tag -width indent
-.It Ev MAGIC
-Default magic number files, separated by colon characters.
+The environment variable
+.Dv MAGIC
+can be used to set the default magic file name.
+If that variable is set, then
+.Nm
+will not attempt to open
+.Pa $HOME/.magic .
.Nm
adds
.Dq .mgc
to the value of this variable as appropriate.
-.El
+The environment variable
+.Dv POSIXLY_CORRECT
+controls whether
+.Nm
+will attempt to follow symlinks or not.
+If set, then
+.Nm
+follows symlinks; otherwise it does not.
+This is also controlled by the
+.Fl L
+and
+.Fl h
+options.
.Sh FILES
.Bl -tag -width /etc/magic -compact
.It Pa /etc/magic
default list of magic numbers
.El
.Sh SEE ALSO
-.Xr compress 1 ,
.Xr hexdump 1 ,
-.Xr ls 1 ,
.Xr od 1 ,
.Xr strings 1 ,
-.Xr a.out 5 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
-Its behaviour is mostly compatible with the System V program of the same name.
+Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
+.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
-is that this version treats any white space
+is that this version treats any whitespace
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
-.Pp
->10 string language impress\ (imPRESS data)
+.Bd -literal -offset indent
+\*(Gt10 string language impress\ (imPRESS data)
+.Ed
.Pp
in an existing magic file would have to be changed to
-.Pp
->10 string language\e impress (imPRESS data)
+.Bd -literal -offset indent
+\*(Gt10 string language\e impress (imPRESS data)
+.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
-.Pp
-0 string \ebegindata Andrew Toolkit document
+.Bd -literal -offset indent
+0 string \ebegindata Andrew Toolkit document
+.Ed
.Pp
in an existing magic file would have to be changed to
-.Pp
-0 string \e\ebegindata Andrew Toolkit document
+.Bd -literal -offset indent
+0 string \e\ebegindata Andrew Toolkit document
+.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
-.Nm file
+.Nm
command derived from the System V one, but with some extensions.
-My version differs from Sun's only in minor ways.
+This version differs from Sun's only in minor ways.
It includes the extension of the
-.Ql &
+.Sq &
operator, used as,
for example,
-.Pp
->16 long&0x7fffffff >0 not stripped
-.Sh MAGIC DIRECTORY
-The magic file entries have been collected from various sources,
-mainly USENET, and contributed by various authors.
-.An Christos Zoulas
-(address below) will collect additional
-or corrected magic file entries.
-A consolidation of magic file entries
-will be distributed periodically.
-The order of entries in the magic file is significant.
-Depending on what system you are using, the order that
-they are put together may be incorrect.
-If your old
-.Nm
-command uses a magic file,
-keep the old magic file around for comparison purposes
-(rename it to
-.Pa /etc/magic.orig ) .
+.Bd -literal -offset indent
+\*(Gt16 long&0x7fffffff \*(Gt0 not stripped
+.Ed
.Sh HISTORY
There has been a
.Nm
@@ -318,117 +420,66 @@ command in every
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant major change:
-the external list of magic number types.
+the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
-This program, based on the System V version, was written by
-.An Ian F. Darwin
+This program, based on the System V version,
+was written by Ian Darwin
without looking at anybody else's source code.
.Pp
-.An John Gilmore
-revised the code extensively, making it better than
+John Gilmore revised the code extensively, making it better than
the first version.
-.An Geoff Collyer
-found several inadequacies
+Geoff Collyer found several inadequacies
and provided some magic file entries.
-Contributions to the
-.Ql &
-operator by
-.An Rob McMahon ,
-1989.
+Contributions to the `&' operator by Rob McMahon, 1989.
.Pp
-.An Guy Harris
-made many changes from 1993 to the present.
+Guy Harris made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
-.An Christos Zoulas Aq christos@zoulas.com .
+Christos Zoulas.
+.Pp
+Altered by Chris Lowth, 2000:
+Handle the
+.Fl i
+option to output mime type strings, using an alternative
+magic file and internal logic.
.Pp
-Altered by
-.An Chris Lowth ,
-2000, to optionally report MIME types.
-This required an alternative magic file, and is not available in
-.Ox .
+Altered by Eric Fischer, July, 2000,
+to identify character codes and attempt to identify the languages
+of non-ASCII files.
.Pp
-Altered by
-.An Eric Fischer ,
-July, 2000, to identify character codes and attempt to identify the
-languages of non-ASCII files.
+Altered by Reuben Thomas, 2007 to 2008, to improve MIME
+support and merge MIME and non-MIME magic, support directories as well
+as files of magic, apply many bug fixes and improve the build system.
.Pp
The list of contributors to the
-.Dq magdir
-directory (source for the
-.Pa /etc/magic
-file) is too long to include here.
+.Dq magic
+directory (magic files)
+is too long to include here.
You know who you are; thank you.
-.Sh LEGAL NOTICE
-Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
-Covered by the standard Berkeley Software Distribution copyright; see the file
-LEGAL.NOTICE in the distribution.
-.Pp
-The files
-.Pa tar.h
-and
-.Pa is_tar.c
-were written by
-.An John Gilmore
-from his public-domain
-.Nm tar
-program, and are not covered by the above license.
+Many contributors are listed in the source files.
.Sh BUGS
+.Pp
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
-Better yet, the magic file should be compiled into binary (say,
-.Xr ndbm 3
-or, better yet, fixed-length
-.Tn ASCII
-strings for use in heterogenous network environments) for faster startup.
-Then the program would run as fast as the Version 7 program of the same name,
-with the flexibility of the System V version.
.Pp
.Nm
-uses several algorithms that favor speed over accuracy;
+uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
-.Tn ASCII
+text
files.
.Pp
-The support for
-.Tn ASCII
-files (primarily for programming languages)
+The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
-There should be an
-.Dq else
-clause to follow a series of continuation lines.
-.Pp
-The magic file and keywords should have regular expression support.
-Their use of
-.Tn ASCII TAB
-as a field delimiter is ugly and makes
-it hard to edit the files, but is entrenched.
-.Pp
-It might be advisable to allow upper-case letters in keywords
-for e.g.,
-.Xr troff 1
-commands vs man page macros.
-Regular expression support would make this easy.
-.Pp
-The program doesn't grok \s-2FORTRAN\s0.
-It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
-appear indented at the start of line.
-Regular expression support would make this easy.
-.Pp
The list of keywords in
-.Em ascmagic
+.Pa ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
-.Ql *
+.Sq *
for the offset value.
.Pp
-Another optimization would be to sort
-the magic file so that we can just run down all the
-tests for the first byte, first word, first long, etc, once we
-have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
@@ -437,24 +488,14 @@ The program should provide a way to give an estimate
of
.Dq how good
a guess is.
-We end up removing guesses (e.g.,
-.Dq From\ \&
+We end up removing guesses (e.g.
+.Dq From\
as first 5 chars of file) because
-they are not as good as other guesses (e.g.,
+they are not as good as other guesses (e.g.\&
.Dq Newsgroups:
versus
-.Qq Return-Path: ) .
-Still, if the others don't pan out, it should be
-possible to use the first guess.
-.Pp
-This program is slower than some vendors'
-.Nm
-commands.
+.Dq Return-Path: ) .
+Still, if the others don't pan out, it should be possible to use the
+first guess.
.Pp
This manual page, and particularly this section, is too long.
-.Sh AVAILABILITY
-You can obtain the original author's latest version by anonymous FTP
-on
-.Em ftp.astron.com
-in the directory
-.Pa /pub/file/file-X.YY.tar.gz .
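
Note on the test order described in DESCRIPTION: the page says file(1) runs
filesystem tests, then magic tests, then text/language tests, and prints the
first result that succeeds. The following is only a rough sketch of that
ordering, not the actual implementation. The MAGIC table, the classify()
function and the returned strings are hypothetical stand-ins; the real program
reads its patterns from /etc/magic and reports far more detail.

```python
import os
import stat

# Hypothetical, tiny stand-in for a magic file: (offset, bytes, description).
MAGIC = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x1f\x8b", "gzip compressed data"),
]

# Printing ASCII plus a few common control characters (BEL, BS, TAB, LF, FF, CR, ESC).
TEXT_BYTES = set(range(0x20, 0x7F)) | {7, 8, 9, 10, 12, 13, 27}


def classify(path):
    """Return a rough type string, trying the three test groups in order."""
    # 1. Filesystem tests: decided from lstat(2) alone, no contents are read.
    st = os.lstat(path)
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISLNK(st.st_mode):
        return "symbolic link"
    if stat.S_ISFIFO(st.st_mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(st.st_mode):
        return "socket"
    if not stat.S_ISREG(st.st_mode):
        return "special file"
    if st.st_size == 0:
        return "empty"

    with open(path, "rb") as f:
        head = f.read(4096)

    # 2. Magic tests: an invariant byte string at a small fixed offset.
    for offset, pattern, description in MAGIC:
        if head[offset:offset + len(pattern)] == pattern:
            return description

    # 3. Text test: only printing characters and common control characters.
    if all(byte in TEXT_BYTES for byte in head):
        return "ASCII text"

    # Anything else is simply "data".
    return "data"
```

The sketch keeps the keywords the page asks maintainers to preserve ("text",
"executable", "data") and omits the character-set and language detection that
follows the text test in the real program.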