Thursday, September 20, 2007

September Topics of Interest

1) JSON
2) RSS, Atom
3) GoF design patterns
4) (Web) Application accelerators

Package Wise Java API Examples

A good site that holds Java examples for every package in the JDK.

http://www.exampledepot.com/egs/index.html

Small utility to extract a website based on a pattern of URLs

I had to pull down the web pages matching a pattern of URLs from one of the online GRE sites. I wrote a small Java program and backed up the contents so that I could browse them offline.

Performance statistics:
It took approximately 5 seconds to pull down each file of about 80 KB of HTML text.
In total, it took 25 minutes to download 300 HTML pages of around 80 KB each (300 pages x 5 seconds = 1,500 seconds = 25 minutes).

=============================================================


import java.io.BufferedReader;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class SiteExtractor {

    public static void main(String[] args) {
        try {
            // This file collects the performance metric for each page extract
            File metFile = new File("C:\\phoenix school\\Exams\\GRE Wordlist\\english-test\\Metrics.txt");
            FileWriter metFileWr = new FileWriter(metFile);

            // Repeat for each page in the site
            for (int i = 1; i <= 300; i++) {
                try {
                    // Note the start time in milliseconds
                    long start = System.currentTimeMillis();

                    // The fileID varies for the 300 pages and is dynamic in the URL
                    // pattern; Integer.toString(1000 + i).substring(1) zero-pads i
                    // to three digits
                    String fileID = Integer.toString(1000 + i).substring(1);
                    URL url = new URL("URL is not provided here for confidentiality");
                    System.out.println("Downloading URL : " + url);

                    File f = new File("C:\\Wordlist\\" + fileID + ".htm");
                    FileWriter fw = new FileWriter(f);

                    // Read all the text returned by the server; readLine() strips
                    // line terminators, so write one back after each line
                    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
                    String str;
                    while ((str = in.readLine()) != null) {
                        fw.write(str);
                        fw.write('\n');
                    }
                    in.close();
                    fw.flush();
                    fw.close();

                    // Note the end time and log the elapsed seconds for this page
                    long end = System.currentTimeMillis();
                    System.out.println("Start : " + start + " end : " + end);
                    metFileWr.write(url + " -> " + ((end - start) / 1000.0) + " secs\n\n");
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } // End of for
            metFileWr.flush();
            metFileWr.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
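The `fileID` expression in the loop zero-pads the page index to three digits so the file names and URLs line up. As a minimal sketch, the same result can also be had from `String.format` (the class and method names below are mine, for illustration):

```java
public class ZeroPad {
    // Zero-pad i to three digits using the substring trick from the program above:
    // 1000 + 7 = 1007, and dropping the leading character leaves "007"
    static String padSubstring(int i) {
        return Integer.toString(1000 + i).substring(1);
    }

    // The same padding expressed with a format string (Java 5+)
    static String padFormat(int i) {
        return String.format("%03d", i);
    }

    public static void main(String[] args) {
        System.out.println(padSubstring(7));   // 007
        System.out.println(padFormat(7));      // 007
        System.out.println(padSubstring(300)); // 300
    }
}
```

The substring trick only works while the index stays below 1000; `String.format` generalizes to any width.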

Saturday, September 15, 2007

web services and WAN

In many cases, the roll-out of web services must be accompanied by a WAN bandwidth increase to cope with the additional overhead. Unfortunately, there are further challenges, as applications are now subject to the chatty nature of the web (HTTP, DNS, and even FTP). Chatty applications are particularly susceptible to high-latency WAN links, and this translates into dismal user performance. Although the flexibility and innovation web services promise are attractive, the implementation costs and poor user performance can make realizing those promises difficult.

Isn't there any other solution than upgrading the WAN bandwidth?

Although web services are based on XML, XML rides on HTTP. Application acceleration of HTTP improves the user experience for web applications, ensuring their adoption and continued use. Without acceleration, the chatty nature of HTTP can make web applications quite frustrating, as each click means a long pause while new pages are fetched. In addition, DNS and FTP can be accelerated.
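Commercial accelerators combine several techniques (caching, protocol optimization, compression), but compression alone is easy to illustrate: gzip, which HTTP supports natively via the `Accept-Encoding`/`Content-Encoding` headers, shrinks repetitive markup such as HTML and XML dramatically, reducing the bytes that cross the WAN. A minimal sketch using `java.util.zip` (class and method names are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a byte array with gzip, as an HTTP server does when the
    // client sends "Accept-Encoding: gzip"
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close(); // finishes the gzip stream and flushes trailing bytes
        return bos.toByteArray();
    }

    // Decompress, as the browser does transparently on receipt
    static byte[] decompress(byte[] gzipped) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive markup, like most HTML, compresses very well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("<tr><td>word</td><td>meaning</td></tr>\n");
        }
        byte[] original = sb.toString().getBytes("UTF-8");
        byte[] gzipped = compress(original);
        System.out.println("Original : " + original.length + " bytes");
        System.out.println("Gzipped  : " + gzipped.length + " bytes");
    }
}
```

Note that compression saves bytes but not round trips; the latency problem the post describes still needs caching or protocol optimization.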

Ref: http://www.expand.com/Solutions/Index.aspx?URL=/Solutions/Web-Applications.aspx

CAPTCHA

Did you ever wonder what that scrambled, distorted text image is that websites ask you to read and key in before allowing you to edit text or post to a blog (e.g., Google Blogspot)?

This is called a CAPTCHA.

More on it from wiki...

A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. "CAPTCHA" is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart", trademarked by Carnegie Mellon University. A CAPTCHA involves one computer (a server) which asks a user to complete a test. While the computer is able to generate and grade the test, it is not able to solve the test on its own. Because computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human.

The term CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (then of IBM). A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test.
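The "distorted image" half of the scheme is straightforward to sketch: pick a random string, render each glyph with a small random rotation, store the string server-side, and serve only the image to the client. A minimal Java illustration using `BufferedImage` (not a production CAPTCHA — there is no noise or background clutter, and the class and method names are mine):

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Random;

public class SimpleCaptcha {
    // Skip easily confused glyphs such as 0/O and 1/I
    static final String ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    // Pick a random challenge string of the given length
    static String randomText(int length, Random rnd) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Render the challenge, rotating each glyph a little; the distortion
    // is what makes the image hard for simple OCR to read
    static BufferedImage render(String text, Random rnd) {
        BufferedImage img = new BufferedImage(40 * text.length(), 60, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, img.getWidth(), img.getHeight());
        g.setColor(Color.BLACK);
        g.setFont(new Font("SansSerif", Font.BOLD, 32));
        for (int i = 0; i < text.length(); i++) {
            double angle = (rnd.nextDouble() - 0.5) * 0.6; // roughly +/- 17 degrees
            g.rotate(angle, 20 + 40 * i, 30);
            g.drawString(String.valueOf(text.charAt(i)), 10 + 40 * i, 42);
            g.rotate(-angle, 20 + 40 * i, 30); // undo before the next glyph
        }
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        String text = randomText(5, rnd);
        BufferedImage img = render(text, rnd);
        // A real server would store `text` in the session and serve `img` as a PNG
        System.out.println("Challenge: " + text + " (" + img.getWidth() + "x" + img.getHeight() + ")");
    }
}
```

The server keeps the plain text in the session and compares it against whatever the user types back; the image itself is the only thing that crosses the wire.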

ref : http://en.wikipedia.org/wiki/Captcha