A .htaccess file is a configuration file used by Apache and other NCSA-compliant web servers. The name is something of a misnomer, because hypertext access control is only a small part of what it can do.
The .htaccess file affects the directory it is located in and every directory below it in the directory tree. If a directory further down contains its own .htaccess file, that file takes priority for that directory and everything below it. Thus a .htaccess file in the root directory affects every directory on the website unless a lower-level file overrides it.
The basics are as follows. The .htaccess file is a plain ASCII (American Standard Code for Information Interchange) text file, most easily created in Notepad or any other editor that can save plain text. One of the most common questions about .htaccess files is what to name them. The answer is that they have no name at all: the whole filename is the extension, which (although this is uncommon) really is 8 characters long.
Creating the file can be somewhat tricky because some programs, Windows among them, will not let you create a file with no name and only an extension. To get around this, name the file whatever you like and rename it to .htaccess after it has been uploaded to the server. At this point the file may become invisible in some FTP clients and file listings (although it can still be navigated to and its contents viewed), because any file whose name begins with a period is treated as a hidden file.
When uploading the .htaccess file, it is very important to upload it as ASCII and not as binary. Once it has been uploaded, there are a few precautions you can take to prevent it from being read by a browser. One is to CHMOD its permissions to 644 (rw-r--r--). The others are covered later in more detail. Because of the nature of the information stored in the .htaccess file, it is usually of the utmost importance to keep it secure.
When creating a .htaccess file for the first time, keep in mind that most commands are meant to be placed on a single line. If your text editor has a word wrap feature, it may be in your best interest to turn it off, as wrapping can insert line breaks that Apache does not understand and cause your directives to fail. Also note that .htaccess is an Apache feature and will not work on Windows servers running Microsoft IIS. There are various other methods of accomplishing the same tasks there, but none that are bundled together in such a nice little package.
.htaccess files are not universally supported. Because they can be used for security, mistakes in them can become very serious security holes. For this reason some web hosting companies have either limited the use of .htaccess or removed it altogether. Before you take the time to create a .htaccess file or a series of them, you should always find out what your host will and will not let you do.
Custom Error Pages / Request Pages
There are various client request and error codes that can come up when someone is navigating a website. A brief list of them is as follows:
200 - OK
201 - Created
202 - Accepted
203 - Non-Authoritative Information
204 - No Content
205 - Reset Content
206 - Partial Content
400 - Bad Request
401 - Authorization Required
402 - Payment Required
403 - Forbidden
404 - Not Found
405 - Method Not Allowed
406 - Not Acceptable
407 - Proxy Authentication Required
408 - Request Timed Out
409 - Conflicting Request
410 - Gone
411 - Content Length Required
412 - Precondition Failed
413 - Request Entity Too Large
414 - Request URI Too Long
415 - Unsupported Media Type
On this list I have included both good and bad candidates for custom pages in a .htaccess file. For instance, if you set up a custom page for the 200 response, every successful request would be sent to the page named in the .htaccess file; loading that page would itself be a successful request, which would be sent to the same page again, and so on indefinitely. This is an example of a bad way to use the feature. However, if you set one up for error 404, then whenever someone types an incorrect URL or follows an outdated link, they are shown a professional-looking page that can provide links back to your main page or to a help section within your website.
The directive used within a .htaccess file to show a custom page for a given request or error code is as follows (and goes on a single line):
ErrorDocument code /directory/filename.ext
For instance, this could look like:
ErrorDocument 404 /errors/404.html
This would send anyone who got a 404 error on my website to the file named 404.html in a folder called errors.
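As a rough sketch, several error codes can each be given their own page by repeating the directive, one per line. The /errors/ folder and the file names below are assumptions carried over from the example above, not required names.

```apache
# One ErrorDocument directive per status code, each on a single line.
# The /errors/ directory and the file names are illustrative only.
ErrorDocument 400 /errors/400.html
ErrorDocument 401 /errors/401.html
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /errors/404.html
```

Each path is relative to the root of the website, so the same lines work from a .htaccess file in any directory.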
You can also put the message text directly in the .htaccess file, for instance:
ErrorDocument 404 "The page you are requesting is not here, please use your back button to return.
Notice that there is a quotation mark before the text but not at the end of it. This is how older (1.3) versions of Apache expect it; Apache 2.0 and later document the message as enclosed in quotation marks at both ends. Also make sure that it is all on one line, so turn off word wrap when entering it.
Password Protecting Folders
In order to password protect a directory you will need two files, a .htaccess file and a .htpasswd file. The .htpasswd file is named the same way as the .htaccess file: no name, only an extension.
Within the .htpasswd file you put the username and password you would like to use (the password must be encrypted). For instance, with the username username and the password password it would look like this:
username:66yGQHg8KA7jw
In order to encrypt a password you can go to http://www.earthlink.net/cgi-bin/pwgenerator.pl or search Google for a password encryptor.
For security purposes it is recommended that you do not place your .htpasswd file in a directory that is web accessible; rather, try to place it above your root www directory. Also make sure that you upload the .htpasswd file as ASCII instead of binary.
Now you must add the following code to the .htaccess file located within the directory you would like to password protect:
AuthUserFile /home/users/web/b2278/ph.dprouse/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
require user username
The AuthUserFile line gives the absolute location on the server (not the web location) of the .htpasswd file. There is no set standard for this path, so always double check with your web hosting provider.
The AuthName line is arbitrary. It is the name shown in the login prompt and can say whatever you like, within reason; if it contains spaces, enclose it in quotation marks.
The AuthType is Basic because we are using a basic HTTP login.
The final line is require user followed by the customer’s username. This is set up as though each user has their own separate directory. If you have multiple users who need access to the same directory, change the last line so that any user listed in the .htpasswd file is accepted:
require valid-user
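Put together, a shared directory might be protected with something like the following sketch. The AuthUserFile path is a placeholder (your host's absolute path will differ), and the quoted AuthName text is only an example.

```apache
# Hypothetical path: ask your host for the real absolute path,
# ideally somewhere above the web-accessible root.
AuthUserFile /home/example/.htpasswd
AuthGroupFile /dev/null
# Quotation marks let the name shown in the login prompt contain spaces.
AuthName "Members Area"
AuthType Basic
# Any user listed in the .htpasswd file may log in.
require valid-user
```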
Enabling SSI Through .htaccess
Many web hosts do not allow SSI (server side includes), because there are many SSI hacks out there and it can be a large vulnerability. There is a way to enable it yourself, although you should always contact your host first and make sure that this is permitted, as it can be a breach of your terms of service.
The following lines must be added to your .htaccess file:
AddType text/html .shtml
AddHandler server-parsed .shtml
The AddType line adds a MIME type in the text category for the .shtml extension. This lets the server recognize the files. Even though most hosts already have this set, it is better to add it to the code to make sure.
The AddHandler line makes sure that all .shtml files are server-parsed for server side commands.
If you do not feel like renaming all of your .html files to .shtml, you can add this line between the first and second lines above:
AddHandler server-parsed .html
This line is not particularly recommended, as it causes the server to parse every file with the .html extension. This adds load time to every page you have as well as extra strain on the server. If you are worried about load time, it is better to use SSI only in .shtml files.
If you plan to use the .shtml extension and would like to use SSI on your index page, you must add another line to your .htaccess file:
DirectoryIndex index.shtml index.html
This line allows your index file to be index.shtml; if the server does not find one, it will automatically check for an index.html.
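Taken together, the SSI-related lines discussed in this section might look like the sketch below. The Options line is an addition not shown above: Apache only processes includes in directories where the Includes option is turned on, and whether a host lets a .htaccess file set it is something to confirm with them.

```apache
# Serve .shtml files as HTML and have the server parse them for SSI commands.
AddType text/html .shtml
AddHandler server-parsed .shtml
# Assumption: the host permits overriding Options from .htaccess.
Options +Includes
# Prefer index.shtml as the index page, fall back to index.html.
DirectoryIndex index.shtml index.html
```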
Blocking Users By IP Address
If you need to block a person or a group of people from accessing your website, it is as simple as adding the following lines to your .htaccess file:
order allow,deny
deny from xxx.xxx.xxx.xxx
deny from xxx.xxx.xxx
allow from all
The first line sets the order of evaluation: allow rules are processed first, then deny rules, so a visitor who matches both is denied.
The second line is the first of the denials; there can be as many as you require. This line prevents anyone at IP address xxx.xxx.xxx.xxx from entering this directory (or website).
The third line blocks an entire IP range: anyone at xxx.xxx.xxx.??? is blocked, such as xxx.xxx.xxx.1, xxx.xxx.xxx.2 … xxx.xxx.xxx.255.
The last line allows everyone else to enter. If you chose instead to keep everyone out, you could set this line to read:
deny from all
You may also allow or deny by domain name, such as:
deny from .purehost.com
This will block all users from this domain, including all of its sub-domains (such as username.purehost.com).
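For a concrete picture, here is a sketch using addresses from the ranges reserved for documentation (192.0.2.x and 198.51.100.x) and the placeholder domain example.com; none of these come from a real blocklist. On Apache 2.4 and later these directives still work through a compatibility module, though the newer Require syntax is preferred there.

```apache
# Allow rules are evaluated first, then deny rules override them.
order allow,deny
allow from all
# Block a single address.
deny from 192.0.2.15
# Block every address beginning with 198.51.100.
deny from 198.51.100
# Block a domain and its sub-domains (the server has to do a reverse DNS lookup).
deny from .example.com
```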
Changing Your Default Directory
If you have a problem setting your homepage to index.html, you may want to look into using this piece of code in your .htaccess file:
DirectoryIndex filename.ext
With this in place, someone who accesses your website is served the file listed instead of the typical index.html. You can also set up priorities: if you list multiple files, the server checks for the first one and, if unable to find it, moves on to the second one, and so forth.
For example:
DirectoryIndex danny.html index.pl home.php index.html
This would first check for the danny.html file; if unable to find it, check for the index.pl file; if unable to find that, check for the home.php file; and finally check for the index.html file. Once it has exhausted all of these, the server falls back to a directory listing if listings are enabled, and otherwise returns an error (typically 403 Forbidden rather than 404, so a custom page for that code is worth setting up in your .htaccess file as well).
.htaccess Redirects
Although redirects can be coded through many different means, such as http-equiv, JavaScript, or any type of dynamic scripting, it is typically more efficient to do it through a .htaccess file. The reason is that all your redirects can be handled in a single file instead of adding code to multiple files. This saves time, which can ultimately mean the difference between visitors finding working links and current information or finding broken links.
.htaccess uses Redirect to look for any request for a specific page (or a non-specific location, though this can cause infinite loops) and, if it finds that request, forwards it to a new page you have specified:
Redirect /folder1/file1.html http://site.com/folder2/file2.html
Notice there are three separate yet required parts to this line. The first part is the Redirect command, which tells the server that when a specific file or folder is requested, the browser is to be sent to a new location. The second part is the address of the file or folder you want to redirect from, relative to your root directory. The third and final part is the file or folder you want to redirect to, given as a complete URL.
As with most .htaccess commands, the three parts are separated by single spaces and placed on one line. This command is often used when there are massive changes to a website, for instance when you have created an entirely new site in a separate folder. You would use the Redirect command and specify the old folder and then the new folder.
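The folder-to-folder case just described could look like this sketch; /oldsite, /newsite and site.com are placeholders. The optional status word is an addition here: without it Apache sends a temporary (302) redirect, while permanent sends a 301.

```apache
# Send every request under /oldsite to the same path under /newsite,
# e.g. /oldsite/about.html -> http://site.com/newsite/about.html
Redirect permanent /oldsite http://site.com/newsite
```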
Hiding Your .htaccess
Because your .htaccess file can contain information that is sensitive or a potential security risk, it is always better to limit access to it as much as possible. If you have set incorrect permissions, or if your server is not as secure as it could be, a browser may be able to view the .htaccess file through a standard web request and thus compromise your site or server. This, of course, would be a bad thing. However, it is possible to prevent a .htaccess file from being viewed in this manner:
<Files .htaccess>
order allow,deny
deny from all
</Files>
The first line specifies that the rule applies to the file named .htaccess. You could use this for other purposes as well if you get creative enough. With this in your .htaccess file, a person trying to view that file is returned (under most server configurations) a 403 error code. As an added measure of security, you can also set the permissions of your .htaccess file via CHMOD to 644 (rw-r--r--).
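As one example of applying the same kind of rule to other files, the sketch below hides every file whose name starts with .ht, which covers .htpasswd as well as .htaccess. The pattern is an illustration; many Apache installations already ship a similar rule in the main server configuration.

```apache
# FilesMatch takes a regular expression: ^\.ht matches .htaccess, .htpasswd, etc.
<FilesMatch "^\.ht">
order allow,deny
deny from all
</FilesMatch>
```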
Adding MIME Types
If you are using a file extension that is not configured on the server, which can be a common occurrence with MP3 or even SWF files, you can specify what type of file it is by adding this line to your .htaccess file:
AddType application/x-shockwave-flash swf
AddType specifies that you are adding a MIME type. The application string is the MIME type itself, and the final part is the file extension to associate with it, in our example swf for Shockwave Flash files.
If you need to find the application string for the file type you are adding, most of them are listed at filext.com. Also, if the server is set to open a particular extension in the browser and you would rather have it downloaded (for instance .xml), you can specify the application string as:
application/octet-stream
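As a sketch, the two cases mentioned in this section might be written like this. The .mp3 line assumes the standard audio/mpeg type; the .xml line is the forced-download case from the text.

```apache
# Tell browsers what an .mp3 file is.
AddType audio/mpeg .mp3
# Offer .xml files as downloads rather than displaying them.
AddType application/octet-stream .xml
```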
Preventing Hot Linking
Hot linking refers to someone outside of your website linking directly to one of the images on your website. This is considered very rude for two major reasons: the first is that you may have spent many hours working on a particular image and do not want it used by someone else, and the second is that every time someone loads that other person’s page it uses your bandwidth. If their site has many visitors, your website could actually go down due to bandwidth overuse.
Using .htaccess, you can disallow hot linking on your server, so that anyone attempting to link to an image or CSS file on your site, for example, is either blocked (a failed request, such as a broken image) or served different content (for example, a different picture).
Here’s how to disable hot linking of certain file types on your site. The case below covers images, JavaScript (js) and CSS (css) files. Simply add the code below to your .htaccess file and upload it either to your root directory or to a particular subdirectory to limit the effect to just one section of your site:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|js|css)$ - [F]
Be sure to replace “domain.com” with your own. The above code creates a failed request when hot linking of the specified file types occurs. In the case of images, a broken image is shown instead.
You can set up your .htaccess file to actually serve up different content when hot linking occurs. This is more commonly done with images, such as serving up an alternate image in place of the hot linked one. The REQUEST_URI condition keeps the alternate image itself from being redirected, which could otherwise send the browser around in a loop. The code for this is:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/.*$ [NC]
RewriteCond %{REQUEST_URI} !/alternatepicture\.gif$
RewriteRule \.(gif|jpg)$ http://www.domain.com/alternatepicture.gif [R,L]
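If another site is allowed to use your files, one more condition can let it through. The sketch below is a variation on the blocking rules in this section; partner-example.com is a hypothetical second domain, and domain.com is still the placeholder for your own.

```apache
RewriteEngine on
# Let through requests with no referer at all (direct visits, some proxies).
RewriteCond %{HTTP_REFERER} !^$
# Let through your own site (placeholder: domain.com)...
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
# ...and a hypothetical partner site that may embed your files.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?partner-example\.com/ [NC]
# Every other site asking for these file types gets 403 Forbidden.
RewriteRule \.(gif|jpg|js|css)$ - [F]
```

The conditions are combined with AND, so the rule only fires when the referer is present and matches none of the allowed sites.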
Jesse S. Somer
Websites: there are literally billions of them out there in cyberspace. How many of them do you go to and just think, this is boring, bland, or hard to use? It seems like too many to mention. So what makes a good website?
I reckon it’s about interaction. You’ve got to make the visitor interested. You’ve got to grab their attention. Many sites use plenty of bright and shiny gimmicks to attract you, but once you make it through to the content of the site it’s just not worthy. A good site uses easy navigation, relevant content, and interactive media like comments and message boards. If you’re fortunate, whoever builds your site may even have a few tricks up their sleeves to make it really fun with sound, video, and other interactive fun stuff.
Do you want people to come to your site and then tell their friends and family about it? Do you want to have huge amounts of visitors? Do you want to succeed in making your dreams come to fruition on the Web? Make your website exciting! It might be easier said than done, but there are people around whose job it is to construct and design sites for a living. If you can afford it, go for the best. How great is it when you come across a site that has some special feature that you’ve never seen elsewhere? Isn’t it great when you find a site that relates to one of your interests and makes it simple and easy to get to the information you want?
If you want people to come back again and again, you’ve got to keep updating the content to keep it fresh and interesting. Have a way for people to communicate with you and with others who are into the same things, e.g. forums, message boards and comments. The aim is to catch the viewer’s interest.
A lot of sites just look like giant advertisements and you have to search for the needle in the haystack to find out what the actual site is for. I know advertising is a way of making money, but if you want your site to have an authentic, respectable atmosphere that exudes a feeling of integrity, you better be careful. People are becoming wary of this consumer driven, mindless attack at the average civilian’s wallet. Some people will automatically leave a site if a bunch of commercials pop-up on the screen. Pop-ups, don’t even make me go there…
So, the aim of the game is to make a site that offers the public a chance to be part of the action as well as being a source of knowledge or information that is in demand. A simple to navigate, good ‘feel’ and, if possible, innovative site is the means to becoming the popular Internet magnate you’ve always dreamed of becoming. Another important fact is the idea of ‘you’. Your website is a chance to put your identity out there in the world. Be yourself. If you try to appeal to an audience in a way that doesn’t reflect your true self, you’re destined to fail. Be honest and speak from your real perspective on life. Give it to us from the heart.
S. Housley
General web statistics give pertinent information about website visitors. Webmasters analyzing these statistics have a better understanding of who their website visitors are and how they perceive the website. A lot can be learned by evaluating navigation patterns, most-viewed pages and exit pages. Deciphering web logs could easily become a full-time job. The information that can be gleaned from close log scrutiny is extremely valuable. When a visitor comes to a website, the site has just a few seconds to grab the visitor’s interest. Slow-loading pages or broken graphics will send visitors and potential customers looking elsewhere. In order to make sense of web statistics, consider using a log analysis program. These programs tend to format the information in an easy-to-understand way, often providing graphs or visual representations that make understanding and seeing patterns that much easier. The downside to using software for web log analysis is that webmasters can easily be confused about what the actual results mean and which results matter the most. The information contained in the log file should be analyzed in conjunction with other information.
Let’s take a look at some of the critical areas. How many unique visitors visit the site each day? This statistic, by itself, is not terribly important, but when compared to a previous week’s or month’s logs, patterns will generally emerge. Sudden declines in site visitors might be indicative of downtime or dropped links, while sudden increases might be indicative of a successful ad campaign or improved search engine ranking. This assumption can only be made if sales for the corresponding time period have increased as well. Traffic alone is not the goal; qualified website traffic that converts a visitor into a buyer is generally the goal of most webmasters. Web statistics on their own do not always paint a true picture. Webmasters need to use logs to validate advertising campaigns and track where traffic is coming from. While details in a log file alone are not conclusive proof of an ad campaign’s success or failure, general assumptions can be made based on the patterns. General statistics will help determine who your visitors are and what habits they have.
Specific areas to take a close look at:
How long are users staying on the website or a specific page? This question addresses a website’s “stickiness”. Stickiness gives webmasters an indication of how important their content is. If users return on a regular basis or remain on a specific page for an extended period of time, generally the content is considered valuable.
Site entry pages?
What pages in a website are visitors coming into? Is a specific page on the site drawing an unusually high amount of traffic? Do users come back to the website? Is there a reason for a visitor to come back to the website? Generally, content that is refreshed often will attract return visitors. What specific areas on the site are of interest to web visitors, and can those content sections be expanded to increase the overall value of the website?
Site exit pages?
What pages in a website are visitors leaving from? If a specific page has a large number of visitors leaving the site, perhaps the content needs updating. It is critical that you consider the source of the traffic. Are visitors coming to the website through a pay-per-click campaign with a landing page that does not relate to the initial search terms? Directing visitors to content-specific landing pages will help reduce quick site exits.
Who is making the referral?
What kind of website is sending traffic to your website? Assumptions can be made based on the quality of the referral source. Let’s face it, if a crack site is the leading referral generator to a software site, it is unlikely that the bulk of visitors will be interested in purchasing.
Bad requests?
Are visitors attempting to access pages on your website that are no longer active? Be sure to check logs for any pages or graphics that are generating errors for visitors.
Number of unique visitors?
Don’t get too hung up on the number of “hits” a website has, as this can be interpreted differently. Sometimes logs interpret graphic access as a hit. A more accurate reflection of traffic can be seen by tracking unique visitors.
There are a number of inexpensive yet quality log analysis applications available for download from: http://www.monitoring-software.net/ and http://www.monitoring-tools.net
By evaluating web logs webmasters can continuously improve their site and measure their success. Online or off, tracking results is critical to achieving success. If you don’t track, you don’t know what works. How can you improve what you don’t measure?
Alan Murray
What is the Robot Text File?
The robots text file is used to deny specific search engine spiders, or all of them, access to folders or pages that you don’t want indexed.
Why would you want to do this?
You may have created a personnel page for company employees that you don’t want listed. Some webmasters use it to exclude their guest book pages so as to avoid attracting spam. There are many different reasons to use the robots text file.
How do I use it?
You need to upload it to the root of your web site or it will not work - if you don’t have access to the root then you will need to use a Meta tag to disallow access. You need to include both the user agent and a file or folder to disallow.
What does it look like?
It’s really nothing more than a “Notepad” type .txt file named “robots.txt”
The basic syntax is
User-agent: spiders name here
Disallow:/ filename here
If you use
User-agent: *
The * acts as a wildcard, so the Disallow lines that follow apply to all spiders. You may want to use this to stop search engines listing unfinished pages.
To disallow an entire directory use
Disallow:/mydirectory/
To disallow an individual file use
Disallow:/file.htm
You have to use a separate line for each disallow. You cannot, for example, use
Disallow:/file1.htm,file2.html
You should use
User-agent: *
Disallow:/file1.htm
Disallow:/file2.htm
For a list of spider names visit http://www.robotstxt.org/wc/active/html/
Make sure you use the right syntax; if you don’t, it will not work. You can check your syntax here: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
For help on creating robot text files there is a program called robogen.
There is a free version and an advanced version, which costs $12.99 http://www.rietta.com/robogen/