diff options
author | Bob Beck <beck@cvs.openbsd.org> | 2002-02-12 07:56:50 +0000 |
---|---|---|
committer | Bob Beck <beck@cvs.openbsd.org> | 2002-02-12 07:56:50 +0000 |
commit | d1e2288e10518d676f396f8c04919387902012fc (patch) | |
tree | bcad9caf217eb2d709b022988de4f217820d9510 /usr.sbin/httpd/htdocs/manual/logs.html | |
parent | fa34e581e9ecb76479288163c942d8ad4e550948 (diff) |
import apache 1.3.26 + mod_ssl 2.8.10
Diffstat (limited to 'usr.sbin/httpd/htdocs/manual/logs.html')
-rw-r--r-- | usr.sbin/httpd/htdocs/manual/logs.html | 677 |
1 files changed, 677 insertions, 0 deletions
diff --git a/usr.sbin/httpd/htdocs/manual/logs.html b/usr.sbin/httpd/htdocs/manual/logs.html new file mode 100644 index 00000000000..f84d81e60a5 --- /dev/null +++ b/usr.sbin/httpd/htdocs/manual/logs.html @@ -0,0 +1,677 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + +<html xmlns="http://www.w3.org/1999/xhtml"> + <head> + <meta name="generator" content="HTML Tidy, see www.w3.org" /> + + <title>Log Files - Apache HTTP Server</title> + </head> + <!-- Background white, links blue (unvisited), navy (visited), red (active) --> + + <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" + vlink="#000080" alink="#FF0000"> + <div align="CENTER"> + <img src="images/sub.gif" alt="[APACHE DOCUMENTATION]" /> + + <h3>Apache HTTP Server</h3> + </div> + + + + <h1 align="center">Log Files</h1> + + <p>In order to effectively manage a web server, it is necessary + to get feedback about the activity and performance of the + server as well as any problems that may be occuring. The Apache + HTTP Server provides very comprehensive and flexible logging + capabilities. This document describes how to configure its + logging capabilities, and how to understand what the logs + contain.</p> + + <ul> + <li><a href="#security">Security Warning</a></li> + + <li><a href="#errorlog">Error Log</a></li> + + <li> + <a href="#accesslog">Access Log</a> + + <ul> + <li><a href="#common">Common Log Format</a></li> + + <li><a href="#combined">Combined Log Format</a></li> + + <li><a href="#multiple">Multiple Access Logs</a></li> + + <li><a href="#conditional">Conditional Logging</a></li> + </ul> + </li> + + <li><a href="#rotation">Log Rotation</a></li> + + <li><a href="#piped">Piped Logs</a></li> + + <li><a href="#virtualhosts">VirtualHosts</a></li> + + <li> + <a href="#other">Other Log Files</a> + + <ul> + <li><a href="#pidfile">PID File</a></li> + + <li><a href="#scriptlog">Script Log</a></li> + + <li><a href="#rewritelog">Rewrite Log</a></li> + </ul> + </li> + </ul> + <hr /> + + <h2><a id="security" name="security">Security Warning</a></h2> + + <p>Anyone who can write to the directory where Apache is + writing a log file can almost certainly gain access to the uid + that the server is started as, which is normally root. Do + <em>NOT</em> give people write access to the directory the logs + are stored in without being aware of the consequences; see the + <a href="misc/security_tips.html">security tips</a> document + for details.</p> + + <p>In addition, log files may contain information supplied + directly by the client, without escaping. Therefore, it is + possible for malicious clients to insert control-characters in + the log files, so care must be taken in dealing with raw + logs.</p> + <hr /> + + <h2><a id="errorlog" name="errorlog">Error Log</a></h2> + + <table border="1"> + <tr> + <td valign="top"><strong>Related Directives</strong><br /> + <br /> + <a href="mod/core.html#errorlog">ErrorLog</a><br /> + <a href="mod/core.html#loglevel">LogLevel</a> </td> + </tr> + </table> + + <p>The server error log, whose name and location is set by the + <a href="mod/core.html#errorlog">ErrorLog</a> directive, is the + most important log file. This is the place where Apache httpd + will send diagnostic information and record any errors that it + encounters in processing requests. It is the first place to + look when a problem occurs with starting the server or with the + operation of the server, since it will often contain details of + what went wrong and how to fix it.</p> + + <p>The error log is usually written to a file (typically + <code>error_log</code> on unix systems and + <code>error.log</code> on Windows and OS/2). On unix systems it + is also possible to have the server send errors to + <code>syslog</code> or <a href="#pipe">pipe them to a + program</a>.</p> + + <p>The format of the error log is relatively free-form and + descriptive. But there is certain information that is contained + in most error log entries. For example, here is a typical + message.</p> + + <blockquote> + <code>[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] + client denied by server configuration: + /export/home/live/ap/htdocs/test</code> + </blockquote> + + <p>The first item in the log entry is the date and time of the + message. The second entry lists the severity of the error being + reported. The <a href="mod/core.html#loglevel">LogLevel</a> + directive is used to control the types of errors that are sent + to the error log by restricting the severity level. The third + entry gives the IP address of the client that generated the + error. Beyond that is the message itself, which in this case + indicates that the server has been configured to deny the + client access. The server reports the file-system path (as + opposed to the web path) of the requested document.</p> + + <p>A very wide variety of different messages can appear in the + error log. Most look similar to the example above. The error + log will also contain debugging output from CGI scripts. Any + information written to <code>stderr</code> by a CGI script will + be copied directly to the error log.</p> + + <p>It is not possible to customize the error log by adding or + removing information. However, error log entries dealing with + particular requests have corresponding entries in the <a + href="#accesslog">access log</a>. For example, the above example + entry corresponds to an access log entry with status code 403. + Since it is possible to customize the access log, you can + obtain more information about error conditions using that log + file.</p> + + <p>During testing, it is often useful to continuously monitor + the error log for any problems. On unix systems, you can + accomplish this using:</p> + + <blockquote> + <code>tail -f error_log</code> + </blockquote> + <hr /> + + <h2><a id="accesslog" name="accesslog">Access Log</a></h2> + + <table border="1"> + <tr> + <td valign="top"><strong>Related Modules</strong><br /> + <br /> + <a href="mod/mod_log_config.html">mod_log_config</a><br /> + </td> + + <td valign="top"><strong>Related Directives</strong><br /> + <br /> + <a + href="mod/mod_log_config.html#customlog">CustomLog</a><br /> + <a + href="mod/mod_log_config.html#logformat">LogFormat</a><br /> + <a href="mod/mod_setenvif.html#setenvif">SetEnvIf</a> + </td> + </tr> + </table> + + <p>The server access log records all requests processed by the + server. The location and content of the access log are + controlled by the <a + href="mod/mod_log_config.html#customlog">CustomLog</a> + directive. The <a + href="mod/mod_log_config.html#logformat">LogFormat</a> + directive can be used to simplify the selection of the contents + of the logs. This section describes how to configure the server + to record information in the access log.</p> + + <p>Of course, storing the information in the access log is only + the start of log management. The next step is to analyze this + information to produce useful statistics. Log analysis in + general is beyond the scope of this document, and not really + part of the job of the web server itself. For more information + about this topic, and for applications which perform log + analysis, check the <a + href="http://dmoz.org/Computers/Software/Internet/Site_Management/Log_analysis/"> + Open Directory</a> or <a + href="http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/"> + Yahoo</a>.</p> + + <p>Various versions of Apache httpd have used other modules and + directives to control access logging, including + mod_log_referer, mod_log_agent, and the + <code>TransferLog</code> directive. The <code>CustomLog</code> + directive now subsumes the functionality of all the older + directives.</p> + + <p>The format of the access log is highly configurable. The + format is specified using a <a + href="mod/mod_log_config.html#format">format string</a> that + looks much like a C-style printf(1) format string. Some + examples are presented in the next sections. For a complete + list of the possible contents of the format string, see the <a + href="mod/mod_log_config.html">mod_log_config + documentation</a>.</p> + + <h3><a id="common" name="common">Common Log Format</a></h3> + + <p>A typical configuration for the access log might look as + follows.</p> + + <blockquote> + <code>LogFormat "%h %l %u %t \"%r\" %>s %b" common<br /> + CustomLog logs/access_log common</code> + </blockquote> + + <p>This defines the <em>nickname</em> <code>common</code> and + associates it with a particular log format string. The format + string consists of percent directives, each of which tell the + server to log a particular piece of information. Literal + characters may also be placed in the format string and will be + copied directly into the log output. The quote character + (<code>"</code>) must be escaped by placing a back-slash before + it to prevent it from being interpreted as the end of the + format string. The format string may also contain the special + control characters "<code>\n</code>" for new-line and + "<code>\t</code>" for tab.</p> + + <p>The <code>CustomLog</code> directive sets up a new log file + using the defined <em>nickname</em>. The filename for the + access log is relative to the <a + href="mod/core.html#serverroot">ServerRoot</a> unless it begins + with a slash.</p> + + <p>The above configuration will write log entries in a format + known as the Common Log Format (CLF). This standard format can + be produced by many different web servers and read by many log + analysis programs. The log file entries produced in CLF will + look something like this:</p> + + <blockquote> + <code>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET + /apache_pb.gif HTTP/1.0" 200 2326</code> + </blockquote> + + <p>Each part of this log entry is described below.</p> + + <dl> + <dt><code>127.0.0.1</code> (<code>%h</code>)</dt> + + <dd>This is the IP address of the client (remote host) which + made the request to the server. If <a + href="mod/core.html#hostnamelookups">HostnameLookups</a> is + set to <code>On</code>, then the server will try to determine + the hostname and log it in place of the IP address. However, + this configuration is not recommended since it can + significantly slow the server. Instead, it is best to use a + log post-processor such as <a + href="programs/logresolve.html">logresolve</a> to determine + the hostnames. The IP address reported here is not + necessarily the address of the machine at which the user is + sitting. If a proxy server exists between the user and the + server, this address will be the address of the proxy, rather + than the originating machine.</dd> + + <dt><code>-</code> (<code>%l</code>)</dt> + + <dd>The "hyphen" in the output indicates that the requested + piece of information is not available. In this case, the + information that is not available is the RFC 1413 identity of + the client determined by <code>identd</code> on the clients + machine. This information is highly unreliable and should + almost never be used except on tightly controlled internal + networks. Apache httpd will not even attempt to determine + this information unless <a + href="mod/core.html#identitycheck">IdentityCheck</a> is set + to <code>On</code>.</dd> + + <dt><code>frank</code> (<code>%u</code>)</dt> + + <dd>This is the userid of the person requesting the document + as determined by HTTP authentication. The same value is + typically provided to CGI scripts in the + <code>REMOTE_USER</code> environment variable. If the status + code for the request (see below) is 401, then this value + should not be trusted because the user is not yet + authenticated. If the document is not password protected, + this entry will be "<code>-</code>" just like the previous + one.</dd> + + <dt><code>[10/Oct/2000:13:55:36 -0700]</code> + (<code>%t</code>)</dt> + + <dd> + The time that the server finished processing the request. + The format is: + + <blockquote> + <code>[day/month/year:hour:minute:second zone]<br /> + day = 2*digit<br /> + month = 3*letter<br /> + year = 4*digit<br /> + hour = 2*digit<br /> + minute = 2*digit<br /> + second = 2*digit<br /> + zone = (`+' | `-') 4*digit</code> + </blockquote> + It is possible to have the time displayed in another format + by specifying <code>%{format}t</code> in the log format + string, where <code>format</code> is as in + <code>strftime(3)</code> from the C standard library. + </dd> + + <dt><code>"GET /apache_pb.gif HTTP/1.0"</code> + (<code>\"%r\"</code>)</dt> + + <dd>The request line from the client is given in double + quotes. The request line contains a great deal of useful + information. First, the method used by the client is + <code>GET</code>. Second, the client requested the resource + <code>/apache_pb.gif</code>, and third, the client used the + protocol <code>HTTP/1.0</code>. It is also possible to log + one or more parts of the request line independently. For + example, the format string "<code>%m %U%q %H</code>" will log + the method, path, query-string, and protocol, resulting in + exactly the same output as "<code>%r</code>".</dd> + + <dt><code>200</code> (<code>%>s</code>)</dt> + + <dd>This is the status code that the server sends back to the + client. This information is very valuable, because it reveals + whether the request resulted in a successful response (codes + beginning in 2), a redirection (codes beginning in 3), an + error caused by the client (codes beginning in 4), or an + error in the server (codes beginning in 5). The full list of + possible status codes can be found in the <a + href="http://www.w3.org/Protocols/rfc2616/rfc2616.txt">HTTP + specification</a> (RFC2616 section 10).</dd> + + <dt><code>2326</code> (<code>%b</code>)</dt> + + <dd>The last entry indicates the size of the object returned + to the client, not including the response headers. If no + content was returned to the client, this value will be + "<code>-</code>". To log "<code>0</code>" for no content, use + <code>%B</code> instead.</dd> + </dl> + + <h4><a id="combined" name="combined">Combined Log + Format</a></h4> + + <p>Another commonly used format string is called the Combined + Log Format. It can be used as follows.</p> + + <blockquote> + <code>LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" + \"%{User-agent}i\"" combined<br /> + CustomLog log/acces_log combined</code> + </blockquote> + + <p>This format is exactly the same as the Common Log Format, + with the addition of two more fields. Each of the additional + fields uses the percent-directive + <code>%{<em>header</em>}i</code>, where <em>header</em> can be + any HTTP request header. The access log under this format will + look like:</p> + + <blockquote> + <code>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET + /apache_pb.gif HTTP/1.0" 200 2326 + "http://www.example.com/start.html" "Mozilla/4.08 [en] + (Win98; I ;Nav)"</code> + </blockquote> + + <p>The additional fields are:</p> + + <dl> + <dt><code>"http://www.example.com/start.html"</code> + (<code>\"%{Referer}i\"</code>)</dt> + + <dd>The "Referer" (sic) HTTP request header. This gives the + site that the client reports having been referred from. (This + should be the page that links to or includes + <code>/apache_pb.gif</code>).</dd> + + <dt><code>"Mozilla/4.08 [en] (Win98; I ;Nav)"</code> + (<code>\"%{User-agent}i\"</code>)</dt> + + <dd>The User-Agent HTTP request header. This is the + identifying information that the client browser reports about + itself.</dd> + </dl> + + <h3><a id="multiple" name="multiple">Multiple Access + Logs</a></h3> + + <p>Multiple access logs can be created simply by specifying + multiple <code>CustomLog</code> directives in the configuration + file. For example, the following directives will create three + access logs. The first contains the basic CLF information, + while the second and third contain referer and browser + information. The last two <code>CustomLog</code> lines show how + to mimic the effects of the <code>ReferLog</code> and + <code>AgentLog</code> directives.</p> + + <blockquote> + <code>LogFormat "%h %l %u %t \"%r\" %>s %b" common<br /> + CustomLog logs/access_log common<br /> + CustomLog logs/referer_log "%{Referer}i -> %U"<br /> + CustomLog logs/agent_log "%{User-agent}i"</code> + </blockquote> + + <p>This example also shows that it is not necessary to define a + nickname with the <code>LogFormat</code> directive. Instead, + the log format can be specified directly in the + <code>CustomLog</code> directive.</p> + + <h3><a id="conditional" name="conditional">Conditional + Logging</a></h3> + + <p>There are times when it is convenient to exclude certain + entries from the access logs based on characteristics of the + client request. This is easily accomplished with the help of <a + href="env.html">environment variables</a>. First, an + environment variable must be set to indicate that the request + meets certain conditions. This is usually accomplished with <a + href="mod/mod_setenvif.html#setenvif">SetEnvIf</a>. Then the + <code>env=</code> clause of the <code>CustomLog</code> + directive is used to include or exclude requests where the + environment variable is set. Some examples:</p> + + <blockquote> + <code># Mark requests from the loop-back interface<br /> + SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog<br /> + # Mark requests for the robots.txt file<br /> + SetEnvIf Request_URI "^/robots\.txt$" dontlog<br /> + # Log what remains<br /> + CustomLog logs/access_log common env=!dontlog</code> + </blockquote> + + <p>As another example, consider logging requests from + english-speakers to one log file, and non-english speakers to a + different log file.</p> + + <blockquote> + <code>SetEnvIf Accept-Language "en" english<br /> + CustomLog logs/english_log common env=english<br /> + CustomLog logs/non_english_log common env=!english</code> + </blockquote> + + <p>Although we have just shown that conditional logging is very + powerful and flexibly, it is not the only way to control the + contents of the logs. Log files are more useful when they + contain a complete record of server activity. It is often + easier to simply post-process the log files to remove requests + that you do not want to consider.</p> + <hr /> + + <h2><a id="rotation" name="rotation">Log Rotation</a></h2> + + <p>On even a moderately busy server, the quantity of + information stored in the log files is very large. The access + log file typically grows 1 MB or more per 10,000 requests. It + will consequently be necessary to periodically rotate the log + files by moving or deleting the existing logs. This cannot be + done while the server is running, because Apache will continue + writing to the old log file as long as it holds the file open. + Instead, the server must be <a + href="stopping.html">restarted</a> after the log files are + moved or deleted so that it will open new log files.</p> + + <p>By using a <em>graceful</em> restart, the server can be + instructed to open new log files without losing any existing or + pending connections from clients. However, in order to + accomplish this, the server must continue to write to the old + log files while it finishes serving old requests. It is + therefore necessary to wait for some time after the restart + before doing any processing on the log files. A typical + scenario that simply rotates the logs and compresses the old + logs to save space is:</p> + + <blockquote> + <code>mv access_log access_log.old<br /> + mv error_log error_log.old<br /> + apachectl graceful<br /> + sleep 600<br /> + gzip access_log.old error_log.old</code> + </blockquote> + + <p>Another way to perform log rotation is using <a + href="#piped">piped logs</a> as discussed in the next + section.</p> + <hr /> + + <h2><a id="piped" name="piped">Piped Logs</a></h2> + + <p>Apache httpd is capable of writing error and access log + files through a pipe to another process, rather than directly + to a file. This capability dramatically increases the + flexibility of logging, without adding code to the main server. + In order to write logs to a pipe, simply replace the filename + with the pipe character "<code>|</code>", followed by the name + of the executable which should accept log entries on its + standard input. Apache will start the piped-log process when + the server starts, and will restart it if it crashes while the + server is running. (This last feature is why we can refer to + this technique as "reliable piped logging".)</p> + + <p>Piped log processes are spawned by the parent Apache httpd + process, and inherit the userid of that process. This means + that piped log programs usually run as root. It is therefore + very important to keep the programs simple and secure.</p> + + <p>Some simple examples using piped logs:</p> + + <blockquote> + <code># compressed logs<br /> + CustomLog "|/usr/bin/gzip -c >> + /var/log/access_log.gz" common<br /> + # almost-real-time name resolution<br /> + CustomLog "|/usr/local/apache/bin/logresolve >> + /var/log/access_log" common</code> + </blockquote> + + <p>Notice that quotes are used to enclose the entire command + that will be called for the pipe. Although these examples are + for the access log, the same technique can be used for the + error log.</p> + + <p>One important use of piped logs is to allow log rotation + without having to restart the server. The Apache HTTP Server + includes a simple program called <a + href="programs/rotatelogs.html">rotatelogs</a> for this + purpose. For example, to rotate the logs every 24 hours, you + can use:</p> + + <blockquote> + <code>CustomLog "|/usr/local/apache/bin/rotatelogs + /var/log/access_log 86400" common</code> + </blockquote> + + <p>A similar, but much more flexible log rotation program + called <a + href="http://www.ford-mason.co.uk/resources/cronolog/">cronolog</a> + is available at an external site.</p> + + <p>As with conditional logging, piped logs are a very powerful + tool, but they should not be used where a simpler solution like + off-line post-processing is available.</p> + <hr /> + + <h2><a id="virtualhosts" name="virtualhosts">Virtual + Hosts</a></h2> + + <p>When running a server with many <a href="vhosts/">virtual + hosts</a>, there are several options for dealing with log + files. First, it is possible to use logs exactly as in a + single-host server. Simply by placing the logging directives + outside the <code><VirtualHost></code> sections in the + main server context, it is possible to log all requests in the + same access log and error log. This technique does not allow + for easy collection of statistics on individual virtual + hosts.</p> + + <p>If <code>CustomLog</code> or <code>ErrorLog</code> + directives are placed inside a <code><VirtualHost></code> + section, all requests or errors for that virtual host will be + logged only to the specified file. Any virtual host which does + not have logging directives will still have its requests sent + to the main server logs. This technique is very useful for a + small number of virtual hosts, but if the number of hosts is + very large, it can be complicated to manage. In addition, it + can often create problems with <a + href="vhosts/fd-limits.html">insufficient file + descriptors</a>.</p> + + <p>For the access log, there is a very good compromise. By + adding information on the virtual host to the log format + string, it is possible to log all hosts to the same log, and + later split the log into individual files. For example, + consider the following directives.</p> + + <blockquote> + <code>LogFormat "%v %l %u %t \"%r\" %>s %b" + comonvhost<br /> + CustomLog logs/access_log comonvhost</code> + </blockquote> + + <p>The <code>%v</code> is used to log the name of the virtual + host that is serving the request. Then a program like <a + href="programs/other.html">split-logfile</a> can be used to + post-process the access log in order to split it into one file + per virtual host.</p> + + <p>Unfortunately, no similar technique is available for the + error log, so you must choose between mixing all virtual hosts + in the same error log and using one error log per virtual + host.</p> + <hr /> + + <h2><a id="other" name="other">Other Log Files</a></h2> + + <table border="1"> + <tr> + <td valign="top"><strong>Related Modules</strong><br /> + <br /> + <a href="mod/mod_cgi.html">mod_cgi</a><br /> + <a href="mod/mod_rewrite.html">mod_rewrite</a> </td> + + <td valign="top"><strong>Related Directives</strong><br /> + <br /> + <a href="mod/core.html#pidfile">PidFile</a><br /> + <a + href="mod/mod_rewrite.html#RewriteLog">RewriteLog</a><br /> + <a + href="mod/mod_rewrite.html#RewriteLogLevel">RewriteLogLevel</a><br /> + <a href="mod/mod_cgi.html#scriptlog">ScriptLog</a><br /> + <a + href="mod/mod_cgi.html#scriptloglength">ScriptLogLength</a><br /> + <a + href="mod/mod_cgi.html#scriptlogbuffer">ScriptLogBuffer</a> + </td> + </tr> + </table> + + <h3><a id="pidfile" name="pidfile">PID File</a></h3> + + <p>On startup, Apache httpd saves the process id of the parent + httpd process to the file <code>logs/httpd.pid</code>. This + filename can be changed with the <a + href="mod/core.html#pidfile">PidFile</a> directive. The + process-id is for use by the administrator in restarting and + terminating the daemon by sending signals to the parent + process; on Windows, use the -k command line option instead. + For more information see the <a href="stopping.html">Stopping + and Restarting</a> page.</p> + + <h3><a id="scriptlog" name="scriptlog">Script Log</a></h3> + + <p>In order to aid in debugging, the <a + href="mod/mod_cgi.html#scriptlog">ScriptLog</a> directive + allows you to record the input to and output from CGI scripts. + This should only be used in testing - not for live servers. + More information is available in the <a + href="mod/mod_cgi.html">mod_cgi documentation</a>.</p> + + <h3><a id="rewritelog" name="rewritelog">Rewrite Log</a></h3> + + <p>When using the powerful and complex features of <a + href="mod/mod_rewrite.html">mod_rewrite</a>, it is almost + always necessary to use the <a + href="mod/mod_rewrite.html#RewriteLog">RewriteLog</a> to help + in debugging. This log file produces a detailed analysis of how + the rewriting engine transforms requests. The level of detail + is controlled by the <a + href="mod/mod_rewrite.html#RewriteLogLevel">RewriteLogLevel</a> + directive.</p> + <hr /> + + <h3 align="CENTER">Apache HTTP Server</h3> + <a href="./"><img src="images/index.gif" alt="Index" /></a> + + </body> +</html> + |