diff options
Diffstat (limited to 'usr.sbin/httpd/htdocs/manual/vhosts/details_1_2.html')
-rw-r--r-- | usr.sbin/httpd/htdocs/manual/vhosts/details_1_2.html | 410 |
1 files changed, 410 insertions, 0 deletions
diff --git a/usr.sbin/httpd/htdocs/manual/vhosts/details_1_2.html b/usr.sbin/httpd/htdocs/manual/vhosts/details_1_2.html new file mode 100644 index 00000000000..3560ee48cab --- /dev/null +++ b/usr.sbin/httpd/htdocs/manual/vhosts/details_1_2.html @@ -0,0 +1,410 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> +<HTML><HEAD> +<TITLE>An In-Depth Discussion of VirtualHost Matching</TITLE> +</HEAD> + +<!-- Background white, links blue (unvisited), navy (visited), red (active) --> +<BODY + BGCOLOR="#FFFFFF" + TEXT="#000000" + LINK="#0000FF" + VLINK="#000080" + ALINK="#FF0000" +> +<DIV ALIGN="CENTER"> + <IMG SRC="../images/sub.gif" ALT="[APACHE DOCUMENTATION]"> + <H3> + Apache HTTP Server Version 1.3 + </H3> +</DIV> + +<H1 ALIGN="CENTER">An In-Depth Discussion of VirtualHost Matching</H1> + +<P>This is a very rough document that was probably out of date the moment +it was written. It attempts to explain exactly what the code does when +deciding what virtual host to serve a hit from. It's provided on the +assumption that something is better than nothing. The server version +under discussion is Apache 1.2. + +<P>If you just want to "make it work" without understanding +how, there's a <A HREF="#whatworks">What Works</A> section at the bottom. + +<H3>Config File Parsing</H3> + +<P>There is a main_server which consists of all the definitions appearing +outside of <CODE>VirtualHost</CODE> sections. There are virtual servers, +called <EM>vhosts</EM>, which are defined by +<A + HREF="../mod/core.html#virtualhost" +><SAMP>VirtualHost</SAMP></A> +sections. + +<P>The directives +<A + HREF="../mod/core.html#port" +><SAMP>Port</SAMP></A>, +<A + HREF="../mod/core.html#servername" +><SAMP>ServerName</SAMP></A>, +<A + HREF="../mod/core.html#serverpath" +><SAMP>ServerPath</SAMP></A>, +and +<A + HREF="../mod/core.html#serveralias" +><SAMP>ServerAlias</SAMP></A> +can appear anywhere within the definition of +a server. However, each appearance overrides the previous appearance +(within that server). + +<P>The default value of the <CODE>Port</CODE> field for main_server +is 80. The main_server has no default <CODE>ServerName</CODE>, +<CODE>ServerPath</CODE>, or <CODE>ServerAlias</CODE>. + +<P>In the absence of any +<A + HREF="../mod/core.html#listen" +><SAMP>Listen</SAMP></A> +directives, the (final if there +are multiple) <CODE>Port</CODE> directive in the main_server indicates +which port httpd will listen on. + +<P> The <CODE>Port</CODE> and <CODE>ServerName</CODE> directives for +any server main or virtual are used when generating URLs such as during +redirects. + +<P> Each address appearing in the <CODE>VirtualHost</CODE> directive +can have an optional port. If the port is unspecified it defaults to +the value of the main_server's most recent <CODE>Port</CODE> statement. +The special port <SAMP>*</SAMP> indicates a wildcard that matches any port. +Collectively the entire set of addresses (including multiple +<SAMP>A</SAMP> record +results from DNS lookups) are called the vhost's <EM>address set</EM>. + +<P> The magic <CODE>_default_</CODE> address has significance during +the matching algorithm. It essentially matches any unspecified address. + +<P> After parsing the <CODE>VirtualHost</CODE> directive, the vhost server +is given a default <CODE>Port</CODE> equal to the port assigned to the +first name in its <CODE>VirtualHost</CODE> directive. The complete +list of names in the <CODE>VirtualHost</CODE> directive are treated +just like a <CODE>ServerAlias</CODE> (but are not overridden by any +<CODE>ServerAlias</CODE> statement). Note that subsequent <CODE>Port</CODE> +statements for this vhost will not affect the ports assigned in the +address set. + +<P> +All vhosts are stored in a list which is in the reverse order that +they appeared in the config file. For example, if the config file is: + +<BLOCKQUOTE><PRE> + <VirtualHost A> + ... + </VirtualHost> + + <VirtualHost B> + ... + </VirtualHost> + + <VirtualHost C> + ... + </VirtualHost> +</PRE></BLOCKQUOTE> + +Then the list will be ordered: main_server, C, B, A. Keep this in mind. + +<P> +After parsing has completed, the list of servers is scanned, and various +merges and default values are set. In particular: + +<OL> +<LI>If a vhost has no + <A + HREF="../mod/core.html#serveradmin" + ><CODE>ServerAdmin</CODE></A>, + <A + HREF="../mod/core.html#resourceconfig" + ><CODE>ResourceConfig</CODE></A>, + <A + HREF="../mod/core.html#accessconfig" + ><CODE>AccessConfig</CODE></A>, + <A + HREF="../mod/core.html#timeout" + ><CODE>Timeout</CODE></A>, + <A + HREF="../mod/core.html#keepalivetimeout" + ><CODE>KeepAliveTimeout</CODE></A>, + <A + HREF="../mod/core.html#keepalive" + ><CODE>KeepAlive</CODE></A>, + <A + HREF="../mod/core.html#maxkeepaliverequests" + ><CODE>MaxKeepAliveRequests</CODE></A>, + or + <A + HREF="../mod/core.html#sendbuffersize" + ><CODE>SendBufferSize</CODE></A> + directive then the respective value is + inherited from the main_server. (That is, inherited from whatever + the final setting of that value is in the main_server.) + +<LI>The "lookup defaults" that define the default directory + permissions + for a vhost are merged with those of the main server. This includes + any per-directory configuration information for any module. + +<LI>The per-server configs for each module from the main_server are + merged into the vhost server. +</OL> + +Essentially, the main_server is treated as "defaults" or a +"base" on +which to build each vhost. But the positioning of these main_server +definitions in the config file is largely irrelevant -- the entire +config of the main_server has been parsed when this final merging occurs. +So even if a main_server definition appears after a vhost definition +it might affect the vhost definition. + +<P> If the main_server has no <CODE>ServerName</CODE> at this point, +then the hostname of the machine that httpd is running on is used +instead. We will call the <EM>main_server address set</EM> those IP +addresses returned by a DNS lookup on the <CODE>ServerName</CODE> of +the main_server. + +<P> Now a pass is made through the vhosts to fill in any missing +<CODE>ServerName</CODE> fields and to classify the vhost as either +an <EM>IP-based</EM> vhost or a <EM>name-based</EM> vhost. A vhost is +considered a name-based vhost if any of its address set overlaps the +main_server (the port associated with each address must match the +main_server's <CODE>Port</CODE>). Otherwise it is considered an IP-based +vhost. + +<P> For any undefined <CODE>ServerName</CODE> fields, a name-based vhost +defaults to the address given first in the <CODE>VirtualHost</CODE> +statement defining the vhost. Any vhost that includes the magic +<SAMP>_default_</SAMP> wildcard is given the same <CODE>ServerName</CODE> as +the main_server. Otherwise the vhost (which is necessarily an IP-based +vhost) is given a <CODE>ServerName</CODE> based on the result of a reverse +DNS lookup on the first address given in the <CODE>VirtualHost</CODE> +statement. + +<P> + +<H3>Vhost Matching</H3> + + +<P><STRONG>Apache 1.3 differs from what is documented +here, and documentation still has to be written.</STRONG> + +<P> +The server determines which vhost to use for a request as follows: + +<P> <CODE>find_virtual_server</CODE>: When the connection is first made +by the client, the local IP address (the IP address to which the client +connected) is looked up in the server list. A vhost is matched if it +is an IP-based vhost, the IP address matches and the port matches +(taking into account wildcards). + +<P> If no vhosts are matched then the last occurrence, if it appears, +of a <SAMP>_default_</SAMP> address (which if you recall the ordering of the +server list mentioned above means that this would be the first occurrence +of <SAMP>_default_</SAMP> in the config file) is matched. + +<P> In any event, if nothing above has matched, then the main_server is +matched. + +<P> The vhost resulting from the above search is stored with data +about the connection. We'll call this the <EM>connection vhost</EM>. +The connection vhost is constant over all requests in a particular TCP/IP +session -- that is, over all requests in a KeepAlive/persistent session. + +<P> For each request made on the connection the following sequence of +events further determines the actual vhost that will be used to serve +the request. + +<P> <CODE>check_fulluri</CODE>: If the requestURI is an absoluteURI, that +is it includes <CODE>http://hostname/</CODE>, then an attempt is made to +determine if the hostname's address (and optional port) match that of +the connection vhost. If it does then the hostname portion of the URI +is saved as the <EM>request_hostname</EM>. If it does not match, then the +URI remains untouched. <STRONG>Note</STRONG>: to achieve this address +comparison, +the hostname supplied goes through a DNS lookup unless it matches the +<CODE>ServerName</CODE> or the local IP address of the client's socket. + +<P> <CODE>parse_uri</CODE>: If the URI begins with a protocol +(<EM>i.e.</EM>, <CODE>http:</CODE>, <CODE>ftp:</CODE>) then the request is +considered a proxy request. Note that even though we may have stripped +an <CODE>http://hostname/</CODE> in the previous step, this could still +be a proxy request. + +<P> <CODE>read_request</CODE>: If the request does not have a hostname +from the earlier step, then any <CODE>Host:</CODE> header sent by the +client is used as the request hostname. + +<P> <CODE>check_hostalias</CODE>: If the request now has a hostname, +then an attempt is made to match for this hostname. The first step +of this match is to compare any port, if one was given in the request, +against the <CODE>Port</CODE> field of the connection vhost. If there's +a mismatch then the vhost used for the request is the connection vhost. +(This is a bug, see observations.) + +<P> +If the port matches, then httpd scans the list of vhosts starting with +the next server <STRONG>after</STRONG> the connection vhost. This scan does not +stop if there are any matches, it goes through all possible vhosts, +and in the end uses the last match it found. The comparisons performed +are as follows: + +<UL> +<LI>Compare the request hostname:port with the vhost + <CODE>ServerName</CODE> and <CODE>Port</CODE>. + +<LI>Compare the request hostname against any and all addresses given in + the <CODE>VirtualHost</CODE> directive for this vhost. + +<LI>Compare the request hostname against the <CODE>ServerAlias</CODE> + given for the vhost. +</UL> + +<P> +<CODE>check_serverpath</CODE>: If the request has no hostname +(back up a few paragraphs) then a scan similar to the one +in <CODE>check_hostalias</CODE> is performed to match any +<CODE>ServerPath</CODE> directives given in the vhosts. Note that the +<STRONG>last match</STRONG> is used regardless (again consider the ordering of +the virtual hosts). + +<H3>Observations</H3> + +<UL> + +<LI>It is difficult to define an IP-based vhost for the machine's + "main IP address". You essentially have to create a bogus + <CODE>ServerName</CODE> for the main_server that does not match the + machine's IPs. + <P> + +<LI>During the scans in both <CODE>check_hostalias</CODE> and + <CODE>check_serverpath</CODE> no check is made that the vhost being + scanned is actually a name-based vhost. This means, for example, that + it's possible to match an IP-based vhost through another address. But + because the scan starts in the vhost list at the first vhost that + matched the local IP address of the connection, not all IP-based vhosts + can be matched. + <P> + Consider the config file above with three vhosts A, B, C. Suppose + that B is a named-based vhost, and A and C are IP-based vhosts. If + a request comes in on B or C's address containing a header + "<SAMP>Host: A</SAMP>" then + it will be served from A's config. If a request comes in on A's + address then it will always be served from A's config regardless of + any Host: header. + </P> + +<LI>Unless you have a <SAMP>_default_</SAMP> vhost, + it doesn't matter if you mix name-based vhosts in amongst IP-based + vhosts. During the <CODE>find_virtual_server</CODE> phase above no + named-based vhost will be matched, so the main_server will remain the + connection vhost. Then scans will cover all vhosts in the vhost list. + <P> + If you do have a <SAMP>_default_</SAMP> vhost, then you cannot place + named-based vhosts after it in the config. This is because on any + connection to the main server IPs the connection vhost will always be + the <SAMP>_default_</SAMP> vhost since none of the name-based are + considered during <CODE>find_virtual_server</CODE>. + </P> + +<LI>You should never specify DNS names in <CODE>VirtualHost</CODE> + directives because it will force your server to rely on DNS to boot. + Furthermore it poses a security threat if you do not control the + DNS for all the domains listed. + <A HREF="dns-caveats.html">There's more information + available on this and the next two topics</A>. + <P> + +<LI><CODE>ServerName</CODE> should always be set for each vhost. Otherwise + A DNS lookup is required for each vhost. + <P> + +<LI>A DNS lookup is always required for the main_server's + <CODE>ServerName</CODE> (or to generate that if it isn't specified + in the config). + <P> + +<LI>If a <CODE>ServerPath</CODE> directive exists which is a prefix of + another <CODE>ServerPath</CODE> directive that appears later in + the configuration file, then the former will always be matched + and the latter will never be matched. (That is assuming that no + Host header was available to disambiguate the two.) + <P> + +<LI>If a vhost that would otherwise be a name-vhost includes a + <CODE>Port</CODE> statement that doesn't match the main_server + <CODE>Port</CODE> then it will be considered an IP-based vhost. + Then <CODE>find_virtual_server</CODE> will match it (because + the ports associated with each address in the address set default + to the port of the main_server) as the connection vhost. Then + <CODE>check_hostalias</CODE> will refuse to check any other name-based + vhost because of the port mismatch. The result is that the vhost + will steal all hits going to the main_server address. + <P> + +<LI>If two IP-based vhosts have an address in common, the vhost appearing + later in the file is always matched. Such a thing might happen + inadvertently. If the config has name-based vhosts and for some reason + the main_server <CODE>ServerName</CODE> resolves to the wrong address + then all the name-based vhosts will be parsed as ip-based vhosts. + Then the last of them will steal all the hits. + <P> + +<LI>The last name-based vhost in the config is always matched for any hit + which doesn't match one of the other name-based vhosts. + +</UL> + +<H3><A NAME="whatworks">What Works</A></H3> + +<P>In addition to the tips on the <A HREF="dns-caveats.html#tips">DNS +Issues</A> page, here are some further tips: + +<UL> + +<LI>Place all main_server definitions before any VirtualHost definitions. +(This is to aid the readability of the configuration -- the post-config +merging process makes it non-obvious that definitions mixed in around +virtualhosts might affect all virtualhosts.) +<P> + +<LI>Arrange your VirtualHosts such +that all name-based virtual hosts come first, followed by IP-based +virtual hosts, followed by any <SAMP>_default_</SAMP> virtual host +<P> + +<LI>Avoid <CODE>ServerPaths</CODE> which are prefixes of other +<CODE>ServerPaths</CODE>. If you cannot avoid this then you have to +ensure that the longer (more specific) prefix vhost appears earlier in +the configuration file than the shorter (less specific) prefix +(<EM>i.e.</EM>, "ServerPath /abc" should appear after +"ServerPath /abcdef"). +<P> + +<LI>Do not use <EM>port-based</EM> vhosts in the same server as +name-based vhosts. A loose definition for port-based is a vhost which +is determined by the port on the server (<EM>i.e.</EM>, one server with +ports 8000, 8080, and 80 - all of which have different configurations). +<P> + +</UL> + +<HR> + +<H3 ALIGN="CENTER"> + Apache HTTP Server Version 1.3 +</H3> + +<A HREF="./"><IMG SRC="../images/index.gif" ALT="Index"></A> +<A HREF="../"><IMG SRC="../images/home.gif" ALT="Home"></A> + +</BODY> +</HTML> |