NOTE: The original URL for this file is:

    http://www.isthe.com/chongo/src/webalizer-patch/README-FIRST

=-=-=

To Use and Install this patch:

The webalizer-2.01-10-chongo-rollup.patch should be applied to the
base webalizer-2.01-10 (Webalizer v2.01-10) tree as follows:

    # place a copy of the original webalizer-2.01-10 source under:
    #
    #     ./webalizer-2.01-10

    patch -p0 < webalizer-2.01-10-chongo-rollup.patch

IMPORTANT NOTE: The track_hist tool should be downloaded:

    http://www.isthe.com/chongo/src/webalizer-patch/track_hist

and run on a monthly basis.  This will allow you to extend the
webalizer history beyond the initial 12 months.  For more details
see the summary of the webalizer-2.01-10-chongo-rollup.patch
comments below.

=-=-=

Optional GeoIP patch:

If AND ONLY IF you use one of the MaxMind (http://www.maxmind.com/) GeoIP
databases, then apply the webalizer-2.01-10-GeoIP-rollup.patch
AFTER you have applied the webalizer-2.01-10-chongo-rollup.patch file,
as follows:

    patch -p0 < webalizer-2.01-10-GeoIP-rollup.patch

=-=-=

Solaris users:

Solaris uses u_longlong_t instead of u_int64_t.  So when you
build and install, try:

    cd somepath/webalizer-2.01-10
    rm -f config.cache
    CFLAGS="$CFLAGS -Du_int64_t=u_longlong_t" ./configure [--optional_flags ...]
    make all
    make install

See:

    ./configure --help

for a list of optional flags.

=-=-=

Here is a summary of the webalizer-2.01-10-chongo-rollup.patch:

##################
# Allow webalizer on systems such as Linux to process very large log
# files (such as >2GB in size).
##################
# Some of the entries on the list are not countries.  In some cases the
# nation state status is contested.  In other cases the entry is related
# to a territory that does not claim to be a country.  In some cases
# what some claim is a country is in dispute by another country.  And
# things like .arpa are not a country.
#
# I recommend that one use the term 'location' instead of 'Nation' or
# 'Country' to avoid the whole mess.
# ;-)
#
# Added are some missing locations (from the ISO UN codes and
# from GeoIP's list).  Some location names have been corrected
# or changed to their official name.  Added some more TLDs.
##################
# Spammers and other low-life forms have been stuffing the
# "top N referrer" table in order to get webalizer to generate
# links to their sites ... (perhaps because they think this
# will improve their search engine placement or perhaps because
# they wish to direct people to a poisoned web page in an effort
# to exploit some browser bug?).  Whatever the reason, we don't
# need to give them their links.
#
# This patch turns the "top N referrer" table into just values
# instead of A tags.
##################
# A number of Linux distributions need this change (because
# they do not ship with the DB compatibility headers and libs?).
#
# The POSIX time spec says that during a leap second, the seconds value
# may be 60.  And yes ... we had an entry that came in during a leap
# second that was rejected.  :-)
##################
# The URL, referrer and search strings are frequently a bit longer than
# what the original code wanted to deal with.  A 256 char limit
# seems to catch most of the longer values.
##################
# For very busy sites, 32 bit signed counters can overflow.  This
# is particularly likely when using webalizer to cover a long span of time.
# This patch converts a few values to be u_int64_t to avoid these
# numeric overflow problems.
##################
# By default, webalizer only keeps the last 12 months of data.  And at the
# start of a month, the oldest month is discarded, resulting in only 11+
# months of data.
#
# This code gets around the 12 month limit by maintaining a history of
# older months in a parallel directory ../history.  The summary file:
#
#     ../history/summary
#
# contains lines, each of which describes a row in the 'Summary by Month'
# table.
# Each line has the following fields separated by a single space:
#
#     field 0:  yyyy              4 digit year (e.g., 2003)
#     field 1:  MM                2 digit month, with leading 0 (e.g., 08)
#     field 2:  Mon               3 char month name (e.g., Aug)
#     field 3:  ../history/usage_yyyyMM.html
#                                 URL of the monthly stats.  In that same
#                                 directory are the ctry_usage_yyyyMM.png,
#                                 daily_usage_yyyyMM.png, and the
#                                 hourly_usage_yyyyMM.png files.
#     field 4:  summary_dhits     Average daily hits
#     field 5:  summary_dfiles    Average daily files
#     field 6:  summary_dpages    Average daily pages
#     field 7:  summary_dvisits   Average daily visits
#     field 8:  summary_tsites    Monthly total sites
#     field 9:  summary_tkbytes   Monthly total KBytes
#     field 10: summary_tvisits   Monthly total visits
#     field 11: summary_tpages    Monthly total pages
#     field 12: summary_tfiles    Monthly total files
#     field 13: summary_thits     Monthly total hits
#
# Example of a ../history/summary line:
#
#     2003 08 Aug ../history/usage_200308.html 13364 3634 10954 567 11138 3623829 17587 339601 112666 414304
#
# The ../history/summary file contains lines that refer to completed webalizer
# month summaries.  Webalizer will process its 12 months as it did before.
# However, with this patch it will look over into the ../history/summary file
# for even older months to append to the end of the 'Summary by Month' table.
#
# The file ../history/prehistory should contain a single line in the same
# form as ../history/summary except that field 3 is just 'index.html'.
# Fields 0, 1 and 2 refer to the oldest month in the ../history/summary file.
# This is the total of the stats that have been lost / rolled off the
# end and are no longer available.  If you have no prehistory just use:
#
#     2004 07 Jul index.html 0 0 0 0 0 0 0 0 0 0
#
# where '2004 07 (Jul)' is the current month.
#
# Webalizer will produce a grand total that includes the 12 recent months,
# all of the history months mentioned in the ../history/summary file and
# the stats from the ../history/prehistory file.
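
The 14 field layout described above is simple enough to read with a few
lines of code.  What follows is only a sketch in Python, not code from
webalizer or track_hist: the names SummaryRow and parse_summary_line are
made up for this illustration, and it assumes that fields 4 through 13
are always non-negative integers, as they are in the example lines.

```python
from typing import NamedTuple


class SummaryRow(NamedTuple):
    """One ../history/summary line (hypothetical helper, not part of webalizer)."""
    year: int       # field 0:  yyyy
    month: int      # field 1:  MM
    mon: str        # field 2:  Mon
    url: str        # field 3:  URL of the monthly stats (or index.html)
    dhits: int      # field 4:  average daily hits
    dfiles: int     # field 5:  average daily files
    dpages: int     # field 6:  average daily pages
    dvisits: int    # field 7:  average daily visits
    tsites: int     # field 8:  monthly total sites
    tkbytes: int    # field 9:  monthly total KBytes
    tvisits: int    # field 10: monthly total visits
    tpages: int     # field 11: monthly total pages
    tfiles: int     # field 12: monthly total files
    thits: int      # field 13: monthly total hits


def parse_summary_line(line: str) -> SummaryRow:
    """Split one summary (or prehistory) line into its 14 fields."""
    fields = line.split()
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, found {len(fields)}")
    counters = [int(f) for f in fields[4:]]
    return SummaryRow(int(fields[0]), int(fields[1]), fields[2], fields[3], *counters)


row = parse_summary_line(
    "2003 08 Aug ../history/usage_200308.html "
    "13364 3634 10954 567 11138 3623829 17587 339601 112666 414304"
)
print(row.year, row.mon, row.thits)
```

The same function accepts a ../history/prehistory line, since that file
uses the same form with 'index.html' in field 3.
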
#
# If ../history/summary or ../history/prehistory are not found, Webalizer
# will ignore them and just output the standard last 12 months.
#
#     #######################
#     # IMPORTANT NOTE!!!!! #
#     #######################
#
# The track_hist tool will generate and update your ../history directory.
# All you need to do is, once a month, run the track_hist
# script, giving it the director(ies) under which webalizer works.
# So if your webalizer data is found in:
#
#     /var/www/html/webalizer/usage/index.html
#
# at the 1st of the month, after you run webalizer, run:
#
#     track_hist /var/www/html/webalizer
#
# See:
#
#     http://www.isthe.com/site/isthe/webalizer/usage/index.html
#
# for an example of what an extended history looks like.
##################
# Added -z option so that webalizer will strip off the absolute part of URLs
# found in logs.  For example, if a log shows the access of the URL
#
#     http://www.example.com/some/path.html
#
# the -z flag will cause webalizer to read it as:
#
#     /some/path.html
#
# The reason why you want to use -z is so that webalizer will not
# produce web pages with links that refer to external URLs.  Link
# spammers might attempt to inject bogus URLs into the logs in order
# to try to get webalizer to create links to their site (say because
# they think their site will go up in search page rank if they create
# links to their site).  Many web servers, such as apache, will by
# default convert an HTTP request of:
#
#     GET http://example.net/index.html HTTP/1.1
#     Host: www.example.com
#
# into a request for a local URL.  Even though the apache web server is
# set up for serving web pages for www.example.com only, the URL for
# http://example.net/index.html is treated as a local URL and is
# resolved as:
#
#     /index.html
#
# And thus apache will give a 200 code if /index.html is accessible
# (which it usually is).  However, apache logs the URL access as the
# absolute http://example.net/index.html URL even though the local
# URL /index.html is returned!
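
To make the effect of -z on such log entries concrete, here is a small
Python sketch of the idea.  It is not the patch's C code: the name
strip_absolute and the regular expression are assumptions made for this
illustration, and mapping a bare method://host.name with no path to "/"
is a choice made here, not documented webalizer behavior.

```python
import re

# A leading method://host.name: a scheme, "://", then everything up to the
# next "/".  (Assumed pattern for illustration only.)
_ABSOLUTE_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/]*")


def strip_absolute(url: str) -> str:
    """Return the local part of a logged URL, dropping method://host.name."""
    local = _ABSOLUTE_PREFIX.sub("", url, count=1)
    # Assumption: an absolute URL with no path refers to the top page.
    return local if local else "/"


for logged in (
    "http://www.example.com/some/path.html",
    "http://example.net/index.html",
    "/index.html",
):
    print(logged, "->", strip_absolute(logged))
```

URLs that are already local, such as /index.html, pass through unchanged.
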
#
# A URL spammer can fetch these absolute URLs often enough to push
# their absolute URL site into the top N URLs of a webalizer report.
#
# In addition, system cracker tools that probe for open web proxies and
# exploit URLs will issue HTTP requests with absolute URLs.  So they are
# another way that strange absolute URLs can wind up in access logs.
#
# By using the -z flag, webalizer will strip out the method://host.name
# and effectively convert the absolute URL into the relative local URL that
# the web server actually processed.  Thus with -z, webalizer will only
# produce URLs that are relative to the local web server.
##################
# Added -Z option so that webalizer will ignore any URL in an access log
# that is NOT an absolute URL.
#
# The reason why one might want to use this option is when one is
# combining logs from multiple web sites where each site's log is
# converted into an absolute URL for that site.  By ignoring all
# non-absolute URLs, only those converted log entries will be processed.
##################
# The proto:// is now stripped off of referrers before the HideReferrer
# and GroupReferrer rules are applied to referrer strings.  Now instead
# of doing:
#
#     GroupReferrer http://www.example.com/curds/whey.html Curds_n_whey
#     HideReferrer  http://www.example.com/curds/whey.html
#
# one must do:
#
#     GroupReferrer www.example.com/curds/whey.html Curds_n_whey
#     HideReferrer  www.example.com/curds/whey.html
#
# without the leading http:// or https:// or proto:// for that matter.
#
#     #######################
#     # IMPORTANT NOTE!!!!! #
#     #######################
#
# In fact, the proto:// or proto:/ or just proto: is removed from all
# IncludeReferrer, GroupReferrer and HideReferrer 2nd field strings.
# This means that if you wrote:
#
#     GroupReferrer ftp:* ftp
#
# then webalizer would strip off the ftp: and you would have in effect:
#
#     GroupReferrer * ftp
#
# which would cause every referrer string to fall into the ftp group.
# Since this is likely not what you want, it is best to avoid using
# any match of the form:
#
#     something:
#     or something:/
#     or something://
#
# with any IncludeReferrer, GroupReferrer or HideReferrer directive.
#
# A number of people fell into this type of problem before in certain
# cases.  Now, webalizer is at least consistent in stripping any
# leading proto: or proto:/ or proto:// from such strings, both in
# referrer strings from log files and from the 2nd arg of all
# IncludeReferrer, GroupReferrer and HideReferrer directives.
##################
# Now only transactions that return a 2xx or 304 code are counted as
# hits.  Before, any HTTP / HTTPS operation, even those that resulted
# in errors such as 403 forbidden or 404 not found, was counted as
# a page hit.  Now only 2xx success codes and 304 (not modified since
# last proxy check) transactions count as hits.
#
# In addition, only those URLs that return 2xx or 304 are listed
# in the URL hits and bytes.  Only those sites that access URLs returning
# 2xx or 304 are counted as sites.  Before, a system cracker trolling for web
# exploits by looking for exploitable URLs would not only generate
# hit counts, their URLs could be registered in the top N list.
# Now such exploit attempts do not alter the URL, hit count, bytes and
# site stats.
##################
# Improved the Monthly Stats table.  Added 2 digits to the numbers
# reported to help with low volume sites.
#
# Distinguished between normal hits and abnormal hits.  The Hits by
# Response Code table notes which codes are normal hits.  Normal
# hits are transactions that return a 2xx or 304 (see above).
# The Monthly Stats table reports the average hit rate for both
# normal and abnormal hits.
##################
# Added report of usage by directory.  The stats of all web pages
# directly under a directory are accumulated into a single directory
# statistic.  Note that the accumulation is not recursive.
# For example, the stats for the top of the web tree only reflect the
# accumulated stats of those web pages immediately under the
# top level directory (not all files everywhere under it).
#
# The -j flag disables reporting on the top N directories.
# The -J N changes the default top number of directories reported.
##################

=-=-=

Here is a summary of the optional webalizer-2.01-10-GeoIP-rollup.patch:

##################
# If AND ONLY IF you use one of the MaxMind (http://www.maxmind.com/) GeoIP
# databases, then apply this patch.
#
# This patch, on Un*x / Linux / GNU-Linux systems, performs the same as the
# geolizer.patch file in geolizer_2.01-10-patch.20021107.tar.gz,
# line adjusted to deal with the previous 0.basic.patch, 1.64bit.patch,
# and 2.hist.patch patches.
#
# When I apply this patch, I configure and build as follows:
#
#     cd webalizer-2.01-10
#     ./configure --with-gdlib --enable-geoip
#     make all
#
# The important part is to add --enable-geoip to the configure line.
##################

=-=-=

Here are the SHA-1 hashes of the patch files:

    ee3a2380c871b2366c1172fd2093ff7611cbeb7c  webalizer-2.01-10-GeoIP-rollup.patch
    67d1f2d5ba25736dd68826da1285b7eb74843115  webalizer-2.01-10-chongo-rollup.patch

Here are the MD5 hashes of the patch files:

    9d4fde43a1ecd21f63726d689e2d539b  webalizer-2.01-10-GeoIP-rollup.patch
    6c556cd71ed8627fd8b2c2916a35c33e  webalizer-2.01-10-chongo-rollup.patch

=-=-=

A copy of the original webalizer v2.01-10 source may be found at:

    http://www.isthe.com/chongo/src/webalizer-patch/webalizer-2.01-10-origsrc.tgz

Here is a gzipped tar file of what that source tree looks
like after the webalizer-2.01-10-chongo-rollup.patch has been applied:

    http://www.isthe.com/chongo/src/webalizer-patch/webalizer-2.01-10-after-rollup.tgz

Here is a gzipped tar file of what that source tree looks
like after the webalizer-2.01-10-chongo-rollup.patch, followed by
the webalizer-2.01-10-GeoIP-rollup.patch, has been applied:

    http://www.isthe.com/chongo/src/webalizer-patch/webalizer-2.01-10-after-rollup-with-GeoIP.tgz

Here are the SHA-1 hashes of these gzipped tar files:

    d1ddfa297981a8aa6e1e4195777bfd47c14c6641  webalizer-2.01-10-after-rollup-with-GeoIP.tgz
    bc4e082d52301b432a9705e9c0c3f18fdfe663e0  webalizer-2.01-10-after-rollup.tgz
    b13bceac94b221b5435d45b142d30663d7399f40  webalizer-2.01-10-origsrc.tgz

Here are the MD5 hashes of these gzipped tar files:

    d951a309922bcb4959fd4af64786964e  webalizer-2.01-10-after-rollup-with-GeoIP.tgz
    3b96c212b9db864ab3aeea5f6b30163e  webalizer-2.01-10-after-rollup.tgz
    9217595005aec46a505e1fb349052a8e  webalizer-2.01-10-origsrc.tgz

Here is the SHA-1 hash of the track_hist tool:

    488b7f5f550599d55e4d50d01088ab130eabbb5f  track_hist

Here is the MD5 hash of the track_hist tool:

    d59c21aca64e2672ce7cd09e3b25e4db  track_hist

=-=-=

# NOTE: This README-FIRST file:
#
# Copyright (c) 2007 by Landon Curt Noll.  All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby granted,
# provided that the above copyright, this permission notice and text
# this comment, and the disclaimer below appear in all of the following:
#
#       supporting documentation
#       source copies of this file
#
# LANDON CURT NOLL DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO
# EVENT SHALL LANDON CURT NOLL BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
# USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
# OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
# PERFORMANCE OF THIS SOFTWARE.
#
# chongo (Landon Curt Noll, http://www.isthe.com/chongo/index.html) /\oo/\
#
# Share and enjoy! :-)

# PLEASE NOTE:
#
# I do not support nor did I write webalizer, I just use it.
# These patches are offered to you in the hopes that you
# will find them useful and in the hopes that they might
# make it into a webalizer release someday.

chongo (Landon Curt Noll, http://www.isthe.com/chongo/index.html) /\oo/\

Share and enjoy! :-)