Suppose you want to fetch a webpage. The following code does it:
# -*- coding: utf-8 -*-
# python

from urllib import urlopen
print urlopen("http://xahlee.org/flatland/").read()
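The snippet above uses Python 2 syntax. For readers on Python 3, a rough equivalent might look like the sketch below. The helper name fetch_page, the 30-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original code.

```python
# python 3
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Return the body at `url` as text. (Hypothetical helper name.)"""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if one was sent;
        # falling back to UTF-8 is an assumption, not something HTTP guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs network access):
# print(fetch_page("http://xahlee.org/flatland/"))
```

In Python 3, read() returns bytes, which is why the sketch decodes the result before returning it.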
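Sometimes when working with HTML pages, you need to create links. In a URL, certain chars need to be encoded. For example, 〔http://example.com/~xah〕 becomes 〔http://example.com/%7Exah〕. Basically, any of the reserved chars
! * ' ( ) ; : @ & = + $ , / ? # [ ]
when not used for its special purpose (such as separating CGI parameters), needs to be encoded as a percent sign followed by its hexadecimal code. The same goes for chars outside the safe set, such as space or non-ASCII letters. For example, ~ has hexadecimal 7e, so it is encoded as %7E. (Older URL rules treated ~ as unsafe. The current standard, RFC 3986, lists it as unreserved, so encoding it is optional today.)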
In Python, the “quote” function does it. “unquote” reverses it.
# -*- coding: utf-8 -*-
# python

from urllib import quote
print quote("~joe's home page")
print 'http://www.google.com/search?q=' + quote("ménage à trois")
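To make the mechanism concrete, here is a small sketch in Python 3 (the snippet above is Python 2). The hand-written percent_encode function is hypothetical and only illustrates the rule: each byte outside a safe set becomes % plus two hex digits. The safe set chosen here (ASCII letters, digits, and - . _ ~) is an assumption that follows RFC 3986's unreserved chars. In Python 3 the library functions live in urllib.parse.

```python
# python 3
import string
from urllib.parse import quote, unquote

# Chars left as-is. This set is an assumption (RFC 3986 "unreserved" chars).
SAFE_CHARS = string.ascii_letters + string.digits + "-._~"


def percent_encode(text):
    """Percent-encode `text` byte by byte. (Hypothetical, for illustration.)"""
    pieces = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in SAFE_CHARS:
            pieces.append(char)
        else:
            # "%" followed by the byte's value as two uppercase hex digits.
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)


encoded = quote("ménage à trois")
print(encoded)                            # m%C3%A9nage%20%C3%A0%20trois
print(unquote(encoded))                   # ménage à trois
print(percent_encode("ménage à trois"))   # m%C3%A9nage%20%C3%A0%20trois
```

Note that a non-ASCII char such as é is first turned into its UTF-8 bytes (C3 A9), and each byte is encoded separately.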
See also: URL Percent Encoding and Unicode ◇ URL Percent Encoding and Ampersand Char.
In Perl, there are several ways to get a webpage's content. Long story short, the easiest way is to use the Perl program HEAD or GET in 〔/usr/bin〕 or 〔/usr/local/bin〕. For example, in a shell, type GET followed by the URL.
HEAD and GET are Perl scripts. They are often already there when perl is installed, but they do not come from perl itself: when the networking module libwww-perl (LWP) is installed, it contaminates your bin dirs with these programs.
HEAD returns a summary of the page info (the HTTP response headers), such as file size. GET returns the full HTML file. (HEAD and GET are two request methods of the HTTP protocol. The Perl scripts are named after them.)
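The difference between the two methods can also be seen from Python 3's standard library. This is only a sketch for comparison, not a replacement for the Perl scripts. The helper names make_request and fetch_headers are hypothetical.

```python
# python 3
from urllib.request import Request, urlopen


def make_request(url, method="GET"):
    """Build a request object for `url` using the given HTTP method."""
    return Request(url, method=method)


def fetch_headers(url, timeout=30):
    """Send a HEAD request and return the response headers as a dict."""
    with urlopen(make_request(url, "HEAD"), timeout=timeout) as response:
        # No body comes back for HEAD; only the status line and headers.
        # A dict keeps one value per header name, enough for a summary.
        return dict(response.headers.items())


# Example calls (need network access):
# print(fetch_headers("http://xahlee.org/flatland/").get("Content-Length"))
# print(urlopen(make_request("http://xahlee.org/flatland/")).read())
```

A server is not required to send Content-Length, so the first example call may print None.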
If you need more complexity, Perl has “LWP::Simple” and “LWP::UserAgent” (there are many others). Both of these are extra installations.
# -*- coding: utf-8 -*-
# perl

use strict;
# use LWP::Simple;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->timeout(120);
my $url='http://yahoo.com/';
my $request = new HTTP::Request('GET', $url);
my $response = $ua->request($request);
my $content = $response->content();
print $content;
In the above, the “$ua->timeout(120);” is an object-oriented syntax. Many Perl modules offer both an object-oriented interface and a plain functional one.
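Python's standard library has a similar split, for comparison: urlopen is a plain function, while an opener object carries settings such as headers across requests, much like the $ua object above. Below is a sketch in Python 3. The helper name fetch_with_opener and the User-Agent string are made up for illustration.

```python
# python 3
from urllib.request import Request, build_opener


def fetch_with_opener(url, timeout=120):
    """Fetch `url` through an opener object and return the raw bytes."""
    opener = build_opener()  # roughly the role of the LWP::UserAgent object
    # Headers set on the opener are sent with every HTTP request it makes.
    opener.addheaders = [("User-Agent", "example-fetcher/0.1")]
    request = Request(url)   # roughly the role of HTTP::Request
    with opener.open(request, timeout=timeout) as response:
        return response.read()


# Example call (needs network access):
# print(fetch_with_opener("http://yahoo.com/"))
```

Unlike the Perl code, the timeout here is passed per call to open() rather than stored on the object.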