The right library makes life easier, and the LWP modules are the right ones for this task. The get function from LWP::Simple returns undef on error, so check for…

Example: Basic Perl script to fetch a page: #!/usr/bin/perl use LWP::UserAgent; use HTTP::Request::Common qw(GET); $UA = LWP::UserAgent->new(); $req…

LWP modules (continued). Module name, Purpose: LWP::Authen::Basic, Handle and responses; LWP::MediaTypes, MIME types configuration (text/html…

Author: Zulugami Maugor
Country: Pacific Islands
Language: English (Spanish)
Genre: Art
Published (Last): 18 January 2005
Pages: 95
ISBN: 753-8-73090-399-8

Chapter 6. Simple HTML Processing with Regular Expressions

Also in this chapter: Extracting Links from a Bookmark File; Example: Extracting Temperatures from Weather Underground.

The preceding chapters have been about getting things from the Web. But once you get a file, you have to process it. Most of the interesting processable information on the Web is in HTML, so much of the rest of this book focuses on getting information out of HTML specifically.



In this chapter, we will use a rudimentary approach to processing HTML source: regular expressions. This technique is powerful, and most web sites can be mined in this fashion. We present the techniques of using regular expressions to extract data and show you how to debug those regular expressions.

Suppose we want to extract information from an Amazon book page. The first problem is getting the HTML.
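As a minimal sketch of that first step, the function below wraps LWP::Simple's get and checks its result for undef, which is how get signals any failure. The name fetch_html and the optional $fetcher argument are hypothetical additions for illustration (they are not part of LWP); the hook only exists so the function can be exercised without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fetch a page and return its content, or die with a message.
# $fetcher is a hypothetical hook (not part of LWP); when it is omitted,
# the real LWP::Simple::get is loaded and used.
sub fetch_html {
    my ($url, $fetcher) = @_;
    $fetcher ||= do {
        require LWP::Simple;
        \&LWP::Simple::get;
    };
    my $html = $fetcher->($url);
    # get() returns undef on failure, so test definedness rather than truth:
    # an empty page is still a successful fetch.
    die "Couldn't fetch $url\n" unless defined $html;
    return $html;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $html = fetch_html($ARGV[0]);
    print length($html), " characters fetched\n";
}
```

Testing with defined rather than plain truth is a deliberate choice here, since a page whose content is the empty string or "0" would otherwise be reported as an error.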

Browsing Amazon shows that the URL for a book page is http:… So to fetch the Perl Cookbook's page, for example: … This regular expression describes the information we want (a string of digits and commas), as well as the text around the text we're after (Amazon…).
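The idea of matching a string of digits and commas together with its surrounding text can be sketched as follows. The sample markup, the "Sales Rank:" label, and the function name extract_rank are all assumptions made for illustration; the actual pattern and the real markup on Amazon's page are not reproduced in this text.

```perl
use strict;
use warnings;

# Return the number that follows a "Sales Rank:" label, with commas removed,
# or nothing if the label and number are not found.
# The label text is an assumed anchor, not taken from a real page.
sub extract_rank {
    my ($html) = @_;
    # Anchor on the label, skip any tags and whitespace after it,
    # then capture a run of digits and commas.
    if ($html =~ m{Sales Rank:\s*(?:</?\w+[^>]*>\s*)*([\d,]+)}) {
        my $rank = $1;
        $rank =~ tr/,//d;    # "4,070" becomes "4070"
        return $rank;
    }
    return;
}

# Hypothetical fragment standing in for fetched HTML.
my $sample = '<b>Sales Rank:</b> 4,070 <br>';
my $rank   = extract_rank($sample);
print defined $rank ? "Rank: $rank\n" : "No rank found\n";
```

Anchoring on nearby literal text keeps the pattern from matching some other number on the page, at the cost of breaking whenever the site changes that text.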

It's then straightforward to generalize the program by allowing the user to provide the ISBN on the command line (see Example …). We could take this program in any direction we wanted. It would be trickier, but more useful, to have the program accept book titles instead of just ISBNs.
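Taking the ISBN from the command line might look roughly like the sketch below. The helper clean_isbn is hypothetical, and the validation rule (ten characters, the last of which may be X, with hyphens and spaces ignored) is an assumption rather than something stated in the text. Building the lookup URL is left out because the URL form is not given here.

```perl
use strict;
use warnings;

# Normalize an ISBN: strip hyphens and spaces, uppercase a trailing "x".
# Returns the cleaned ISBN, or nothing if it is not 10 characters of the
# assumed form (nine digits followed by a digit or X).
sub clean_isbn {
    my ($raw) = @_;
    return unless defined $raw;
    (my $isbn = uc $raw) =~ tr/- //d;
    return unless $isbn =~ /\A\d{9}[\dX]\z/;
    return $isbn;
}

# Run the lookup only when an argument is supplied.
if (@ARGV) {
    my $isbn = clean_isbn($ARGV[0])
        or die "Usage: $0 ISBN\n";
    print "Would look up ISBN $isbn\n";
}
```

Validating the argument before using it means a typo produces a usage message instead of a confusing failed fetch.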



A more elaborate version of this basic program is one of O'Reilly's actual market research tools.

The final program appears in Example …