Warrick: Restoring a website from internet caches

Warrick is a command-line utility for reconstructing or recovering a website when no backup is available. It searches the Internet Archive, Google, MSN, and Yahoo for stored pages and images and saves them to your filesystem. Warrick is most effective at finding cached content in search engines during the first several days after the website is lost, because cached pages tend to disappear once the search engine re-crawls the site and can no longer find them. Running Warrick several times over a period of days or weeks can increase the number of recovered files, because the caches fluctuate daily (especially Yahoo's). The Internet Archive's repository is at least 6-12 months out of date, so you will only find content there if your website has existed at least that long. If the Internet Archive does not have your website archived, you might want to run Warrick again in 6-12 months.

Warrick is available here

Dynamic virtual hosting (for subdomains) with Apache

Apache makes it easy to host dynamically created subdomains with the VirtualDocumentRoot and VirtualScriptAlias directives (both provided by mod_vhost_alias).

Let's say we have mydomain.com and need test.mydomain.com, monkey.mydomain.com, etc. to work as soon as they are created, without manually reconfiguring or restarting Apache each time. Then:

Step 1: Configure wildcard DNS so that *.mydomain.com is a CNAME for mydomain.com.

Step 2: Configure Apache. Create a new VirtualHost section for *.mydomain.com, like:

<VirtualHost *>
    # Match every subdomain of mydomain.com
    ServerAlias *.mydomain.com
    CustomLog /www/www.logs/virtual.mydomain.com-access_log combined
    ErrorLog /www/www.logs/virtual.mydomain.com-error_log
    # %0 expands to the full requested host name (requires mod_vhost_alias)
    VirtualDocumentRoot /www/www.mydomain.com/virtualdomains/%0/docs
    VirtualScriptAlias /www/www.mydomain.com/virtualdomains/%0/cgi-bin/
</VirtualHost>

Step 3: Restart Apache to activate the new configuration.

Step 4: Now, say you need test.mydomain.com. All that is required is to create the directories /www/www.mydomain.com/virtualdomains/test.mydomain.com/docs and /www/www.mydomain.com/virtualdomains/test.mydomain.com/cgi-bin/. No further Apache configuration or restart is needed.
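
If you create subdomains often, Step 4 can be scripted. The following is only a minimal sketch in Python, not part of the original setup: the helper name create_vhost_dirs is hypothetical, and it assumes the directory layout used in the VirtualHost section above, where %0 is the full host name.

```python
from pathlib import Path
from typing import Tuple


def create_vhost_dirs(base: Path, hostname: str) -> Tuple[Path, Path]:
    """Create the docs and cgi-bin directories for one dynamic subdomain.

    `base` is the directory that holds one sub-directory per host name,
    e.g. /www/www.mydomain.com/virtualdomains in the example above.
    """
    # %0 in the Apache config is the whole requested host name, so the
    # directory must be named after the complete name (test.mydomain.com),
    # not just the first label (test).
    site_root = base / hostname
    docs = site_root / "docs"
    cgi_bin = site_root / "cgi-bin"

    # exist_ok=True makes the helper safe to run more than once.
    docs.mkdir(parents=True, exist_ok=True)
    cgi_bin.mkdir(parents=True, exist_ok=True)
    return docs, cgi_bin
```

Calling create_vhost_dirs(Path("/www/www.mydomain.com/virtualdomains"), "test.mydomain.com") would create the two directories from Step 4, provided the user running it may write under /www.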

 

Tomcat UTF-8 Characters

If Tomcat is not able to parse or display UTF-8 characters, try adding URIEncoding="UTF-8" to the Connector element in your Tomcat settings (server.xml).

Example:

<Connector port="8080" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" debug="0" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>
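
To see why this setting matters, it helps to look at what happens when percent-encoded bytes in a URI are decoded with the wrong character set. The sketch below uses Python's standard library rather than Tomcat itself, so it only imitates the effect. It assumes the connector would otherwise decode URIs as ISO-8859-1, which was the default in older Tomcat releases. The sample value caf%C3%A9 and the helper name decode_uri_component are made up for illustration.

```python
from urllib.parse import unquote


def decode_uri_component(raw: str, encoding: str) -> str:
    """Percent-decode a URI component using the given character encoding."""
    return unquote(raw, encoding=encoding)


# A browser sending "é" as UTF-8 transmits the two bytes C3 A9.
raw = "caf%C3%A9"

# Bytes interpreted as UTF-8: the two bytes become one character.
as_utf8 = decode_uri_component(raw, "utf-8")

# Bytes interpreted as ISO-8859-1: each byte becomes its own character.
as_latin1 = decode_uri_component(raw, "iso-8859-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

The second result is the typical garbled output you would see in request parameters when the decoding charset does not match what the client sent.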