What are Log File Analysis and Page Tagging?
Posted on September 27, 2006
We usually talk about web stats tools as if they were a single thing, but most of them rely on one of two very different technologies: Log File Analysis and Page Tagging.

Log File Analysis: Web log analysis software (also called a web log analyzer) parses a log file from a web server (like Apache) and, based on the values contained in the log file, derives indicators about who visits a web server, when, and how. The indicators reported by most web log analyzers are:
o Number of visits and number of unique visitors
o Visit duration and last visits
o Authenticated users, and last authenticated visits
o Days of week and rush hours
o Domains/countries of visitors’ hosts
o Hosts list
o Most viewed, entry and exit pages
o File types
o OS used
o Browsers used
o Robots
o Search engines, keyphrases and keywords used to find the analyzed web site
o HTTP errors
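As a rough illustration of how such indicators are derived, here is a minimal Python sketch. It assumes the server writes the Apache "combined" log format; the names LOG_PATTERN, parse_line, summarize and the SAMPLE lines are made up for this example and do not come from any of the tools listed in this post.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" format
# host identity user [time] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def summarize(lines):
    """Derive a few simple indicators from an iterable of log lines."""
    hosts, pages, errors = set(), Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not follow the assumed format
        hosts.add(entry["host"])
        status = int(entry["status"])
        if status >= 400:
            errors[status] += 1           # HTTP errors
        else:
            pages[entry["path"]] += 1     # successfully served pages
    return {
        "unique_hosts": len(hosts),
        "top_pages": pages.most_common(3),
        "http_errors": dict(errors),
    }

# Hypothetical sample lines, for illustration only
SAMPLE = [
    '192.0.2.1 - - [27/Sep/2006:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [27/Sep/2006:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0"',
    '192.0.2.1 - - [27/Sep/2006:10:01:00 +0000] "GET /missing.html HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Counting distinct hosts is only a crude stand-in for unique visitors, since several people can share one address. Real analyzers apply more elaborate rules.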
Some popular web log analyzers are:
o Analog
o AWStats
o NetTracker
o SmarterStats
o Visitors
o Webalizer
o WebTrends
o W3Perl
o Urchin

Page Tagging: Concerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging.
In the mid 1990s, web counters were commonly seen. These were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to use a small invisible image instead of a visible one and, by using JavaScript, to pass certain information about the page and the visitor along with the image request. The web analytics company can then process this information remotely and generate extensive statistics.
The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits.
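To make the mechanism more concrete, here is a minimal Python sketch of what the analytics side might do with one request for the invisible image. The function name handle_beacon, the cookie name visitor_id, the one-year cookie lifetime and the example URL are all assumptions made for illustration, not the behaviour of any particular vendor.

```python
import uuid
from urllib.parse import urlsplit, parse_qs

def handle_beacon(request_url, cookies):
    """Process one tracking-image request on the analytics server.

    request_url: the URL the visitor's browser requested for the invisible image;
                 the page's JavaScript is assumed to have appended the data as
                 query parameters.
    cookies:     dict of cookies the browser sent with the request.
    Returns (hit, set_cookie); set_cookie is None for returning visitors.
    """
    query = parse_qs(urlsplit(request_url).query)
    # parse_qs returns a list per parameter; keep the first value of each
    hit = {name: values[0] for name, values in query.items()}

    visitor_id = cookies.get("visitor_id")
    set_cookie = None
    if visitor_id is None:
        # First visit: assign a new identifier and ask the browser to keep it
        visitor_id = uuid.uuid4().hex
        set_cookie = f"visitor_id={visitor_id}; Path=/; Max-Age=31536000"
    hit["visitor_id"] = visitor_id
    return hit, set_cookie

if __name__ == "__main__":
    # Hypothetical request, as the tagging script might have built it
    url = "http://stats.example.com/pixel.gif?page=%2Fhome&screen=1024x768"
    print(handle_beacon(url, {}))
```

On a later request the browser sends the cookie back, so the same visitor_id shows up again and the two visits can be linked.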
Pros and Cons of Log File Analysis:
Advantages:
o The web server normally already produces logfiles, so the raw data is already available. To collect data via page tagging requires changes to the website.
o The web server reliably records every transaction it makes. Page tagging relies on the visitors’ browsers co-operating, which a certain proportion may not do.
o The data is on the company’s own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program. Page tagging solutions involve vendor lock-in.
o Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, it is important data for performing search engine optimization.
o Logfiles contain information on failed requests; page tagging only records an event if the page is successfully viewed.
Disadvantages:
o Requires Experts: Log file analysis needs experts to handle the technical tasks; it is somewhat complex for non-technical people.
o Storage Management: For high-traffic sites the log files can become very large, so storing and managing them in-house requires some up-front investment.
o Complexity: The method is somewhat complex, especially at the beginning, when configuring the tool and filtering out unwanted visits are tedious technical tasks.
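Two of the points above, spider visits and failed requests, as well as the filtering of unwanted visits, can be sketched in a few lines of Python. The substrings in ROBOT_HINTS are a naive assumption for this example (real tools maintain much longer robot lists), and the entry dicts with "agent" and "status" keys are a hypothetical shape for already-parsed log lines.

```python
# Naive, assumed heuristic: substrings that often appear in crawler user agents
ROBOT_HINTS = ("bot", "spider", "crawl")

def classify(entries):
    """Split parsed log entries into human visits, robot visits and failed requests."""
    report = {"human": [], "robot": [], "failed": []}
    for entry in entries:
        if entry["status"] >= 400:
            report["failed"].append(entry)   # failed requests, visible only in logs
        elif any(hint in entry["agent"].lower() for hint in ROBOT_HINTS):
            report["robot"].append(entry)    # useful for SEO, not for visitor counts
        else:
            report["human"].append(entry)
    return report

if __name__ == "__main__":
    # Hypothetical entries, for illustration only
    entries = [
        {"path": "/", "status": 200, "agent": "Mozilla/5.0"},
        {"path": "/", "status": 200, "agent": "Googlebot/2.1"},
        {"path": "/old", "status": 404, "agent": "Mozilla/5.0"},
    ]
    report = classify(entries)
    print({kind: len(items) for kind, items in report.items()})
```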
Pros and Cons of Page Tagging:
Advantages:
o The JavaScript is automatically run every time the page is loaded. Thus there are fewer worries about caching.
o It is easier to add additional information to the JavaScript, which can then be collected by the remote server. For example, information about the visitors’ screen sizes, or the price of the goods they purchased, can be added in this way. With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL.
o The page tagging service manages the process of assigning cookies to visitors; with logfile analysis, the server has to be configured to do this.
o Page tagging is available to companies who do not run their own web servers.
Disadvantages:
o Non-JavaScript Browsers: Only a very small percentage of Internet users fall into this category, but some have disabled JavaScript in their browser. Their visits won’t be tracked by page tagging.
o Not Secure: You usually depend on a third-party tool to give you the details, so your site’s data is available to that third party as well.
o Complexity of Inserting JS Code: If your site is large and uses several different templates, putting the JS code into every page you want to track can be time-consuming.
Economic factors
Logfile analysis is almost always performed in-house. Page tagging can be performed in-house, but it is more often provided as a third-party service. The economic difference between these two models is often the most important difference for a company deciding which to purchase.
o Logfile analysis typically involves a one-off software purchase; however, some vendors are introducing maximum annual page views with additional costs to process additional information.
o Page tagging most often involves a monthly fee, although some vendors offer installable page tagging solutions with no additional page view costs.
Which solution is cheaper often depends on the amount of technical expertise within the company, the vendor chosen, the amount of activity seen on the web sites, the depth and type of information sought, and the number of distinct web sites needing statistics.
Hybrid methods
Some companies are now producing programs which collect data through both logfiles and page tagging. By using a hybrid method, they aim to produce more accurate statistics than either method on its own.
Other methods
Other methods of data collection have been used, but are not currently widely deployed. These include integrating the web analytics program into the web server, and collecting data by sniffing the network traffic passing between the web server and the outside world.
There is also a server-side variant of page tagging. Instead of collecting the information in the visitor’s browser when the page opens, a script runs on the server and sends the data right before the page is sent to the user (Thanks Wikipedia!).