Name:     gmodstat
Version:  0.2.1, November 6th 2001
Author:   gj_armitage@yahoo.com

Copyright (c) 2001, Grenville Armitage

1. Summary:

gmodstat supports post-analysis of QuakeIII Arena server logfiles to extract things like playing time trends, histograms of ping times as perceived by clients, the domains from which different clients connect, and the percentage of clients who may be playing from home and using NAT boxes. Servers must be running the gmod1.0 server mod (or something equivalent) to generate the additional logfile entries required by gmodstat.

gmodstat was initially developed in a Win32 environment under MS Visual C++, but most subsequent development has occurred under FreeBSD using KDevelop 1.4. Installation instructions are included in section 2 for compiling gmodstat under *nix-like environments and Win32/MS Visual C++.

gmodstat is released under the GNU General Public License, Version 2, 1991.

gmodstat compiles 'out of the box' under FreeBSD4.3 and Win32/MS Visual C++ environments. It requires no special libraries, and should compile under other *nix environments.

The rest of this file contains:

    Section 2:  Installation
    Section 3:  Initial startup
    Section 4:  The basic definition of a client
    Section 5:  IP address to domain name mappings
    Section 6:  NAT and Home User estimations
    Section 7:  Ping histograms
    Section 8:  Played time histograms and charts
    Section 9:  General game statistics
    Section 10: Other stuff
    Section 11: Conclusion
    Appendix A: New logfile tokens
    Appendix B: Release summary

2. Installation under *nix or Win32

2.1 Under FreeBSD/*nix environments

The current development environment for gmodstat is FreeBSD4.3 with KDevelop 1.4, a free, X11-based C/C++ development tool. KDevelop automagically adds tools to create an appropriate makefile, with which you can generate a running executable. (I have not verified whether gmodstat can or cannot be compiled under anything other than FreeBSD4.3 or Win32, but I'd be interested in hearing experiences.)
The following installation steps apply to *nix environments.

The basic distribution is a gzipped tarfile named gmodstat-0.2.1.tar.gz, which creates a subdirectory ./gmodstat-0.2.1 when gunzipped/untar'ed. Once the tarfile is unpacked, perform the following steps:

    > cd ./gmodstat-0.2.1
    > ./configure
    > cd gmodstat
    > make

"./configure" will spend a minute or so inspecting your system, compiler settings, etc. and generating appropriate makefiles. Once this has completed successfully, you move into the source subdirectory and run "make" to actually compile gmodstat.

You can then either copy gmodstat to somewhere more convenient in your path, or use "make install" to automatically copy gmodstat into /usr/local/bin.

(An alternate installation location can be specified during the configuration stage. If you wish to install into //bin then execute "./configure --prefix=//" instead of "./configure" before compiling. "make install" will then copy gmodstat to //bin/gmodstat.)

Executing "make clean" in ./gmodstat-0.2.1/gmodstat will subsequently remove all intermediate object files.

KDevelop 1.4's gmodstat.kdevprj file is also supplied, in case it helps you do further development of gmodstat.

2.2 Under Win32/MS Visual C++

I've supplied sample Visual C++ 6.0 project/workspace files ./gmodstat-0.2.1/gmodstat.dsp and ./gmodstat-0.2.1/gmodstat.dsw

If you have Visual C++, you should be able to use WinZip (or similar) to unpack/untar the gmodstat distribution, then go into the ./gmodstat-0.2.1 folder and double-click on gmodstat.dsw to start Visual C++. Tell Visual C++ to "build" and a Win32 version of gmodstat should be built. (At least, it worked for me on a Windows 2000 system. No promises it'll work on every Win32 platform, although I imagine it should.)

The gmodstat executable (in ./gmodstat-0.2.1/Debug) must be run from a console window (or from within Visual C++).
Where I discovered differences between Visual C++ 6.0 in a Win32 environment and KDevelop 1.4/gcc in a FreeBSD4.3 environment, I've used conditional compilation directives. The flag WIN32 should be set for Win32-compatible code, and unset for FreeBSD4.3 (or equivalent) environments.

For Win32 you will need to specifically link against ws2_32.lib (add it under Project->Settings->Linker if you're using MS Visual C++) to bring in functions such as inet_ntoa().

3. Starting gmodstat

gmodstat is primarily controlled by options specified in a configuration file. By default, the file is ./gmodstatconf.txt. The file ./conf-example.txt contains brief commentary on a range of configuration options.

Start gmodstat with:

    gmodstat

(to use ./gmodstatconf.txt as config file) or

    gmodstat -c

(with '-c' followed by the filename of your specific configuration file.)

A certain amount of run-time status information is printed to stdout, with the main record of gmodstat's activities logged to the file "logout.txt". A number of auxiliary output files may also be generated, depending on the set of options specified in the configuration file.

gmodstat can analyse a single QuakeIII server logfile, or a series of logfiles representing a server that has been running over a long period of time. (Ideally the server has been stopped and restarted every six days, resulting in the sequence of logfiles. After 6.9 days the server's timestamping loses some accuracy, and in my experience the server itself often gets flakey.)

The next section briefly defines what gmodstat considers to be a 'client', section 5 discusses how to create an initial ipaddress to domain name mapping file, section 6 covers the generation of home user and NAT penetration analysis, while section 7 discusses how to generate ping histograms. Section 8 mentions how to generate played time histograms and cumulative played time plots.

4. Definition of a 'client'

QuakeIII players are uniquely identified by their playername (an arbitrary ASCII string) and their IP address. However, because of the wide deployment of dynamic address assignment techniques by many ISPs, the same player may appear with many different (but related) IP addresses over many different appearances. In order to more accurately associate these instances as the same person, gmodstat uses the following algorithm to determine a client:

    Take the player's IP address, in dotted-quad form "w.x.y.z"
    Resolve this address into a domain name
    Take the non-host part of the domain name and call it the domain suffix
    A Client is defined by the tuple of playername and domain suffix

For example, consider playername GUEST playing twice, with a different IP address each time that resolved to random123.dsl.myisp.com and otherpop.dsl.myisp.com - gmodstat would consider this to be the same client, since the domain suffix of ".dsl.myisp.com" is common in both cases.

The logic here is that ISPs often dynamically assign addresses from related pools of IP addresses associated with the access points through which customers connect. Commonality of the domain suffix is a reasonable guesstimate that this represents the same human player simply being assigned a different dynamic IP address.

My experience while developing gmodstat is that, while not perfectly accurate, the above algorithm is far better than counting each unique tuple of playername and IP address as a distinct client.

As a further optimization, un-resolved IP addresses are mapped to 'fake' domain suffixes inside gmodstat. When you specify 'faked_suffix_range n' in the config file:

    n = 4, fake suffixes are ".w-x-y-z.unresolved".
    n = 3, fake suffixes are ".x-y-z.unresolved".
    n = 2, fake suffixes are ".y-z.unresolved".
    n = 1, fake suffixes are ".z.unresolved".

Where a client is being dynamically assigned IP addresses from a common address pool, these faked suffixes increase the chances we'll correctly recognize multiple unresolvable IP addresses as representing the same client.
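The client definition above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this section, not code from gmodstat itself: the function names are made up for the example, and the handling of a hostname with no dot in it is my own assumption.

```python
def domain_suffix(hostname):
    """Strip the host label: 'random123.dsl.myisp.com' -> '.dsl.myisp.com'."""
    host, dot, rest = hostname.partition(".")
    # Assumption: a name with no dot has no separate host part, so keep it whole.
    return dot + rest if dot else hostname


def faked_suffix(ip, n):
    """Build the fake suffix for an unresolved address, keeping the last n octets."""
    octets = ip.split(".")
    if len(octets) != 4 or not 1 <= n <= 4:
        raise ValueError("expected a dotted quad and 1 <= n <= 4")
    return "." + "-".join(octets[-n:]) + ".unresolved"


def client_key(playername, ip, hostname=None, faked_suffix_range=4):
    """Return the (playername, domain suffix) tuple that identifies a client."""
    if hostname:
        suffix = domain_suffix(hostname)
    else:
        # No DNS name known for this address, so fall back to a fake suffix.
        suffix = faked_suffix(ip, faked_suffix_range)
    return (playername, suffix)
```

With this sketch, GUEST connecting from random123.dsl.myisp.com and later from otherpop.dsl.myisp.com yields the same key both times, which is the behaviour the example in this section describes.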
In order to avoid DNS lookups every time gmodstat is run, gmodstat can create a local file of ipaddr->domainname mappings for later re-use. This is discussed further in the next section.

The configuration file option 'clientnames' causes gmodstat to dump the list of seen clients in descending order of total playing time. This list is dumped to file ./clients-logout.txt

5. Creating an initial ipaddress to domain name mapping file

The first thing I recommend is generating a local copy of all the ipaddr->domainname mappings relevant to your logfile(s). Create a config file with, at minimum, the following entries:

    sourcefile
    resolve_missing
    dump_dns_maps

(where 'sourcefile' is followed by the name of the logfile you are analysing)

Start gmodstat, and it will begin walking through the supplied server logfile performing DNS lookups on every IP address found. Once this is complete, it will dump all the discovered ipaddr->domainname mappings to the file ./ipnames-logout.txt. Note that this initial process may take many minutes, as not all IP addresses have registered domain names. gmodstat currently sits idle when a DNS lookup stalls waiting to time out.

Now, on all subsequent runs of gmodstat add the following to the config file:

    hosts_file

(where 'hosts_file' is followed by the name of a local copy of ./ipnames-logout.txt)

With the 'hosts_file' option, gmodstat pre-loads its internal ipaddr->domainname mapping cache from the named file. This then avoids the lengthy process of performing DNS lookups each time you re-run gmodstat on the same logfile(s).

You can have all of 'hosts_file', 'resolve_missing' and 'dump_dns_maps' specified concurrently in the config file - gmodstat will then use mappings from the local file when available, look up the DNS for any new IP addresses it might find, and then dump the newly updated total list of seen ipaddr->domainname mappings to ./ipnames-logout.txt at the end of its run.
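The combined behaviour of 'hosts_file' and 'resolve_missing' amounts to a cache-first lookup. The Python sketch below illustrates that idea only: it is not gmodstat's code, the function and parameter names are invented for the example, and because this README does not describe the layout of ./ipnames-logout.txt the cache is simply an in-memory dictionary.

```python
import socket


def lookup(ip, cache, resolve_missing=True, resolver=socket.gethostbyaddr):
    """Return the domain name for ip, preferring the pre-loaded cache."""
    if ip in cache:
        # Mapping already known (e.g. pre-loaded from a hosts file).
        return cache[ip]
    if not resolve_missing:
        # DNS lookups are disabled, so the address stays unresolved.
        return None
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        name = resolver(ip)[0]
    except OSError:
        # No registered name, or the lookup failed.
        return None
    cache[ip] = name  # remember it so it can be dumped at the end of the run
    return name
```

The resolver is passed in as a parameter so the logic can be exercised without touching the network; by default it uses the standard library's reverse lookup.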
If you have multiple logfiles they can be handled in one run by replacing 'sourcefile' with 'filelist nnn' where nnn is a text file containing each logfile's name, one per line.

6. Analysing Home users and NAT penetration

gmodstat can be used to estimate the use of NAT (network address translation) functionality across the Internet by looking for evidence of NAT in the client traffic. NAT is typically embedded in home routers and gateways, and sometimes in gateway routers of small ISPs. The tell-tale sign of NAT is where the UDP or TCP port numbers get modified from 'expected' values to unusual values in transit.

QuakeIII uses well-known UDP port numbers - un-modified QuakeIII clients almost invariably use UDP port 27960 as the source port in the packets they send to the server. Detecting NAT is as simple as detecting clients who connect from a source UDP port other than 27960.

gmod1.0 causes a server to log each player's source IP address and source UDP port number when they connect to the server. This information is used by gmodstat to estimate the % of NAT penetration in the player community.

To do a simple NAT estimation, build a configuration file like this:

    sourcefile
    hosts_file
    clients_NAT_range 64
    range_increment 8
    humanreadable

(where 'sourcefile' is followed by the name of the server log we're analysing and 'hosts_file' is followed by the name of the cache of resolved ipaddr->domainname mappings created per discussion in section 5.)

'clients_NAT_range 64' specifies that NAT estimation should be performed for a range of sets of clients, where each set is made up from the clients who played for more than N minutes, where 0 <= N <= 64. 'range_increment' says increase N in steps of 8.

The output is dumped to ./client_stats.txt, and will be in verbose ASCII form because 'humanreadable' was set.

gmodstat also tries to calculate how many clients are playing from "home" by counting how many clients' domain names fall under domains believed to represent "home users".
You can set this list to be whatever you want, and select it with the 'homedomain_file' config option.

Finally, gmodstat can also calculate the number of home users as a percentage of a specific subset of domains, which you can specify with the 'valid_domains' config option.

See ./conf-example.txt for more details on these options.

7. Analysing client ping distributions

Servers running gmod1.0 or later will generate ping sample histograms every few tens of seconds, reflecting hundreds or thousands of server-estimated ping samples. (By default gmod1.0 logs a new histogram every 2000 client frames, sampling the server's internal ping estimate each frame.)

The configuration file option 'clientnames' causes gmodstat to dump the list of seen clients in descending order of total playing time. This list is dumped to file ./clients-logout.txt

7.1 Specific Clients

To dump aggregate histograms of a specific client's ping samples, use the following config file options:

    sourcefile
    hosts_file
    single_client_phisto

This will cause every sampled histogram to be dumped in ASCII format to disk for the client specified with 'single_client_phisto'. The ASCII file is "SCPH-.txt" and will have a format suitable for passing to xgraph, with each histo's timestamp as each title line.

To create a series of per-game histograms, add the 'do_phisto_pergame' config option. The xgraph title line for each histo will be the game's start time (in seconds since 1/1/1970).

To create an aggregate histogram over all games played by the specified client, add the 'do_phisto_total' config option.

Use only one of 'do_phisto_pergame' or 'do_phisto_total' at a time.
As an alternative to specifying a particular client, you can have histograms generated for every client who played for more than a certain number of minutes (measured over all games) with:

    topN_client_phistos nnn

Clients who played more than nnn minutes will have their ping histos dumped to disk in individual files named "SCPH-.txt"

Although gmod1.0 uses buckets one millisecond wide, the aggregate histos generated under 'do_phisto_pergame' and 'do_phisto_total' modes can have larger buckets. Use:

    ping_histo_range nnn zzz

to set the bucket width to nnn milliseconds, and a maximum ping value of zzz milliseconds.

See ./conf-example.txt for more details on these options.

7.2 Overall client ping distributions

gmodstat can also create aggregate histograms of median ping times seen by players in every game, sorted and filtered by source domain (rather than time played or player name). Use:

    graph_ping_histo

This generates a single histogram in "./PH0-pinghisto.txt" (with bucket size set by 'ping_histo_range' as described earlier). A cumulative distribution of the median ping times is stored in "./CPH0-pinghisto.txt".

Add the following option to restrict the histogram to only those clients who fall under certain domains:

    graph_include_regions

(the allowed domains are specified by the 'homedomain_file' configuration option.)

If the 'graph_many' option is also specified, the histograms are generated for each specified domain rather than for the union of specified domains. In this case, the output files are named "./PHxxxx-pinghisto.txt" where "xxxx" is a domain suffix. The cumulative distributions will be in "./CPHxxxx-pinghisto.txt".

See ./conf-example.txt for more details on these options.

8. Analysing played time

The configuration option 'graph_ptime_histo' will cause gmodstat to create a histogram of played time versus hour of the week, breaking the week up into 168 hours. The output will be dumped to "./PT0-ptimehisto.txt".
This can be useful in seeing playing trends that have weekly cycles (although you really need to have logfiles covering many weeks before this histogram starts to show clear trends). Day 0 is Sunday local time, 0.999 is midnight on Sunday, 1.999 is midnight Monday, etc.

If the config option 'hour_of_day' is also specified, the histogram becomes a time-of-day histogram of total played time during any given 30 minute period over a 24 hour period (where hour 0 to 0.99 is the first hour after midnight local time).

If 'graph_include_regions' is specified, only the playing time of clients who fall under the specified domains will be counted. If 'graph_many' is specified, separate histograms will be created for clients falling under each domain, with the outputs dumped to "./PTxxxx-ptimehisto.txt" (where "xxxx" is a domain suffix).

Note that both graph_ptime_histo and graph_ping_histo are modified by the same graph_include_regions and graph_many options.

gmodstat can also dump the cumulative played time, which can reveal long term playing trends, popular days/weeks, or server downtimes:

    cumulative_gametime

The total played time across all games seen in the logfile(s) is dumped to "cumulativetime.txt" as a list of XY pairs (X is calendar time, Y is cumulative time in days). By default, X is hours of the week (0 is 12am Sunday morning, 6.99 is midnight Saturday night, etc). Adding the 'day_of_year' option causes X axis to become day of the year (0 is Jan 1st).

9. General game statistics

gmodstat can also provide a summary of the games seen in the logfile(s), the players present during each game, and the kills/deaths of each player.

Use the option 'gamestats' to start dumping per-game information. Use the option 'playerstats' to list each player's stats per game.

See ./conf-example.txt for more details on these options.

10. Other stuff

A variety of other configuration options are listed in ./conf-example.txt that haven't been covered in this README.
Specifying 'minimum_itemratio 1.0' is a good idea, so that gmodstat ignores players who appeared in a game and didn't manage to pick up more than one item per minute of played time. Such players are basically idle, and don't deserve to skew our ping and NAT estimations.

In addition, "UnnamedPlayer" is QuakeIII's default playername for clients who haven't properly configured their client software. By default gmodstat ignores them. Use the 'include_unnamedplayer' config option to include UnnamedPlayer statistics.

gmodstat assumes it can extract the start time of a given logfile from the logfile itself (gmod1.0 adds a "BaseTime:" token to logfiles it generates). However, the timestamp is relative to the server's local time. Thus, when comparing played time histograms, etc, from servers in different timezones you need to inform gmodstat of an appropriate offset relative to your local timezone.

    base_time_offset nn

adjusts the logfile's own notion of its start time by nn hours (forward if positive, backward if negative). For example, use 'base_time_offset -8' to adjust a UK-based server's timestamps to Californian time.

11. Bugs, things TODO, Conclusions

Naturally, this README file is not complete. Indeed, it is woefully inadequate in describing the output file formats of the various configuration options described here.

The ultimate source of information is, of course, the source code. Unfortunately gmodstat has developed organically over the past year, so the code itself isn't always as clean and logical as I'd like. It is also still evolving, so you'll probably find routines and structures in there that have no apparent current or future use. Hopefully things will be cleaner in later releases.

Enjoy!

Appendix A: New logfile tokens

gmodstat assumes there are a number of new tokens in the QuakeIII server's logfiles, and one modified token. The new ones are "ModVersion:", "BaseTime:", and "CPhisto2:". The modified token is "ClientConnect:".
These tokens are supplied as part of the gmod1.0 (or later) server mod.

A.1 ClientConnect

ClientConnect is an existing token issued by the server when a new client has been detected (is in the Connecting state) but hasn't yet started playing. The new syntax is ClientConnect with two parameters: the small integer used by the server to uniquely identify clients during a game, and one of:

    "w.x.y.z:pp"   Client is from IP addr w.x.y.z, UDP port pp
    "seen"         Client was seen in previous game, same ipaddr:port
    "bot"          This is a bot, no network identity

A.2 ModVersion

Should be the first entry in the logfile, appears only once per logfile. The second parameter is a unique string identifying the version of gmod (in this case "gja1.0" identifies gmod 1.0)

A.3 BaseTime

Should be the first or second entry in the logfile, appears only once per logfile. The second parameter is a unique string identifying the local time at which the server was started. Format of the string is "ddmmyy-hhmm-0" to represent the date dd/mm/yy at time hhmm hours.

A.4 CPhisto2

This token is the primary method for collecting ping data. Each line is of the form:

    CPhisto2: ID Low Hi lowerrs hierrs tdelta string

where:

    ID        clientID
    Low       the lowest bucket in this interval (ms)
    Hi        the highest bucket in this interval (ms)
    lowerrs   number of ping samples = 0ms (weird but possible)
    hierrs    number of ping samples > 998ms (mostly 999ms)
    tdelta    number of milliseconds since last histogram
    string    the histogram, encoded in printable ASCII

gmod1.0 and 1.1 default to generating a new CPhisto2 line for each client every 2000 packets from the client to the server. CPhisto2 lines are also generated at the end of each game for every client, or when a client disconnects, if the client has sent at least 50 packets since the last CPhisto2 issued for that client.

gmod1.0 and 1.1 use slightly different encodings for the histogram string, but in either case it is always less than 1024 characters long.
Under gmod1.0 the histogram string is:

Repeated "XY" pairs of ASCII characters, or "+nn%" indicating the previous bucket's value is repeated in the next nn buckets (mostly used for suppressing adjacent buckets with value of zero when there's bi/multi-modal distribution of ping values). The "XY" pairs use base64, with X being the 64s column and Y being the 1s column. The ASCII encoding adds 32 (code for " ") to the base64 digit. This way each bucket can count up to 4095 using just two ASCII characters.

Under gmod1.1 the histogram string is:

Repeated "XY" pairs of ASCII characters, or "znn%" indicating the previous bucket's value is repeated in the next nn buckets (mostly used for suppressing adjacent buckets with value of zero when there's bi/multi-modal distribution of ping values). The "XY" pairs use base64, with X being the 64s column and Y being the 1s column. The ASCII encoding adds 33 (code for "!") to the base64 digit. This way each bucket can count up to 4095 using just two ASCII characters.

The total number of samples represented by a CPhisto2 line can be calculated simply by summing the values in every histogram bucket. The total number of client frames that were seen since the previous CPhisto2 line can be calculated from the total samples in the histo + lowerrs + hierrs. Given knowledge of the total number of frames since the previous CPhisto2, and the time since the previous CPhisto2 (given by the tdelta field) you can calculate the average client frame rate.

A.5 CPhistoErr

This is a variant on CPhisto2, and only occurs when the server could not compress the histogram to under 1024 characters for some reason. There's not much gmodstat can do about such entries, and they mean that the ping samples of the last 2000 frames must have been fairly evenly and widely spread out. Such lines are of the form:

    CPhistoErr: ID Low Hi lowerrs hierrs tdelta histo-too-long

where the parameters are as for CPhisto2, and the text "histo-too-long" replaces the compressed ASCII histogram.
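To make the A.4 encoding concrete, here is a rough Python sketch of decoding a gmod1.1 histogram string and deriving the average client frame rate. It follows the description in this appendix only and is not taken from the gmodstat source. The function names are invented, and reading nn as a decimal count terminated by "%" is my assumption. The gmod1.0 variant is left out on purpose: with an offset of 32 the "+" marker is also a legal digit character, so this simple approach could not tell the two apart.

```python
def decode_histo(s, offset=33, marker="z"):
    """Decode a gmod1.1-style histogram string into a list of bucket counts."""
    buckets = []
    i = 0
    while i < len(s):
        if s[i] == marker:
            # "znn%": repeat the previous bucket's value in the next nn buckets.
            # Assumption: nn is a decimal number ending at the "%" character.
            end = s.index("%", i)
            if not buckets:
                raise ValueError("repeat marker with no previous bucket")
            buckets.extend([buckets[-1]] * int(s[i + 1:end]))
            i = end + 1
        else:
            if i + 1 >= len(s):
                raise ValueError("dangling character at end of histogram")
            # "XY" pair: X is the 64s column, Y is the 1s column.
            x = ord(s[i]) - offset
            y = ord(s[i + 1]) - offset
            if not (0 <= x < 64 and 0 <= y < 64):
                raise ValueError("character outside the base64 digit range")
            buckets.append(x * 64 + y)
            i += 2
    return buckets


def frames_per_second(buckets, lowerrs, hierrs, tdelta_ms):
    """Average client frame rate since the previous CPhisto2 line."""
    total_frames = sum(buckets) + lowerrs + hierrs
    return total_frames * 1000.0 / tdelta_ms
```

For instance, 1998 in-range samples plus one low and one high error over a tdelta of 40000 ms works out to 2000 frames in 40 seconds, or 50 frames per second.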
Appendix B: Release Summary

Releases to date:

    0.2.1  11/6/01
        - Fixed malloc() bug in NAT estimation routines.
        - Noted that cumulative_time Y-axis represents days rather than hours

    0.2    10/28/01
        - Fixed bug in the 'graph_ping_histo' routine (median ping values would be erroneously scaled by 1/N where N is the ping histogram bucket size set by ping_histo_range).
        - Clarified documentation for graph_ping_histo: Per-game median pings are only calculated for games in which the client generated three or more "CPhisto2" log entries.

    0.1    9/28/01
        (First release)

gj_armitage@yahoo.com