Crawling web sites with Perl WWW::Mechanize (and Moose)

I’m very pleased with my bank overall, but its online banking system is not so hot. My biggest (and simplest) gripe is that I can’t get a daily email with the balance of my checking account. I tried several approaches with limited success until I did it in Perl.

First, I outlined the steps required to log in, then used them as comments in my script. Next, I used WWW::Mechanize to walk through the pages and simple regexes to grab the data I was looking for.
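
The regex step can be tried without touching the bank at all. Here is a minimal sketch, assuming the page text has already been flattened to a single line. The sample string, the landmark words, and the helper name fields_between are all made up for illustration; a real page will look different.

```perl
use strict;
use warnings;

# Hypothetical flattened page text; the real page layout will differ.
my $text = "Overdraft Protection 1.23 1,234.56 1,200.00 Checking Savings 50.00";

# Capture everything between two landmark words, then split it on whitespace.
# Returns an empty list when the landmarks are not found.
sub fields_between {
    my ($page, $start, $end) = @_;
    return () unless $page =~ m/\Q$start\E\s*(.*?)\s*\Q$end\E/s;
    return split ' ', $1;
}

my @fields = fields_between($text, 'Protection', 'Checking');
print join('|', @fields), "\n";
```

The \Q...\E wrappers keep any punctuation in the landmark words from being treated as regex syntax.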

Here is my Perl module (Banking.pm) to do the entire process:

package Banking;
use Moose;
use WWW::Mechanize;

has 'Url'             => (is=>'rw', isa=>'Str', required=>1);
has 'UserLogin'       => (is=>'rw', isa=>'Str', required=>1);
has 'Password'        => (is=>'rw', isa=>'Str', required=>1);

sub GetChecking {
    my $self = shift;

    # set some variables
    my $url  = $self->Url;
    my $user = $self->UserLogin;
    my $pass = $self->Password;

    # The security question can be one of three things.
    # These should really be moved to a property...
    # A regex below matches each key against the page to see which question we are being asked.
    # These are not my real questions and answers, by the way.
    my $challengeFound = "Please answer the challenge question to continue logging in";
    my %challenge;
       $challenge{'color'} = "Answer to question one";
       $challenge{'vehicle'} = "Answer to question two";
       $challenge{'pet'} = "Answer to question three";

    # time to do the work...

    print "Initializing Mechanize...\n";
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    print "Got $url\n";

    $mech->form_number(1);
    $mech->field("j_username", $user);
    $mech->click();
    my $response = $mech->content();
    print "Clicked \"login\"\n";

    ## We are being challenged for our security answer.
    ## Use a simple regex to see what question they are asking
    ## and send the appropriate response.
    if ($response =~ m/\Q$challengeFound\E/) {
        # Loop over the keys rather than using each(), so an early exit
        # never leaves the hash iterator half-way through.
        for my $question (keys %challenge) {
            if ($response =~ m/\Q$question\E/) {
               $mech->form_number(1);
               $mech->field("answer", $challenge{$question});
               $mech->click();
               print "Answered challenge question: $question\n";
               last;
            }
        }
    }
    $mech->form_number(1);
    $mech->field("password", $pass);
    $mech->click();
    print "Entered password\n";

    ##format=>'text' is supposed to remove all of the html tags, but it
    ##sometimes misbehaves, so the leftovers are cleaned up below.
    $response = $mech->content(format=>'text');

    ##strip any remaining html
    $response =~ s/<[^>]+>//g;

    ##replace multiple spaces with one
    $response =~ s/\s+/ /g;

    ##remove leading whitespace
    $response =~ s/^ //;

    ##remove trailing whitespace
    $response =~ s/ $//;

    ##remove html entity quotes
    $response =~ s/&#34;/"/g;

    ## grab our checking account line
    my $checkString = "";
    if ($response =~ m/Protection(.*?)Checking/) {
       $checkString = $1;
       print "Got \$checkString $1\n";
    }

    my @checkBal = split(' ', $checkString);

    ## log off
    $mech->follow_link(text => 'Exit');

    #return balance, available, and ytd-interest.
    return ($checkBal[1], $checkBal[2], $checkBal[0]);
}

1;
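
The comment in the module says the security answers should really be moved to a property. One possible shape for that, sketched in plain Perl so it runs without Moose: keep the keyword/answer pairs in a hash reference (which could back a Moose attribute such as isa => 'HashRef[Str]') and do the lookup in a small helper. The name find_challenge_answer and the sample question are assumptions for illustration, not part of the module above.

```perl
use strict;
use warnings;

# Placeholder keyword => answer pairs, same as in the module above.
my %answers = (
    color   => 'Answer to question one',
    vehicle => 'Answer to question two',
    pet     => 'Answer to question three',
);

# Return the (keyword, answer) pair whose keyword appears in the page text,
# or an empty list when no keyword matches. Keys are sorted so the result
# does not depend on hash ordering.
sub find_challenge_answer {
    my ($page, $answers) = @_;
    for my $keyword (sort keys %$answers) {
        return ($keyword, $answers->{$keyword})
            if $page =~ m/\Q$keyword\E/i;
    }
    return;
}

my ($keyword, $answer) = find_challenge_answer(
    'What was the name of your first pet?', \%answers);
print "Matched '$keyword'\n" if defined $keyword;
```

Keeping the lookup separate from the Mechanize calls also makes it easy to test against saved page text.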

Now all I have to do is call it from a script (daily.pm) and add that to my crontab to run daily at, say, 7:00 AM.

#!/usr/bin/perl

use strict;
use warnings;

use lib '/home/gkurts/scripts';
use Mail::Sendmail;
use Banking;

my $fortune = `fortune`;
my $msg = "";

## get our checking balance from the online banking website
my $banking = Banking->new(
    Url => "https://www.mybankingurl.com/login",
    UserLogin => "mylogin",
    Password => "mypassword",
    );
my ($balance, $available, $ytdinterest) = $banking->GetChecking();

$msg .= "Our Bank Balance:\n";
$msg .= "\tBalance: $balance\n";
$msg .= "\tAvailable: $available\n";
$msg .= "\tYTD Interest: $ytdinterest\n";

## add a fortune for cuteness.
$msg .= "\n\n$fortune";

## send the email using Mail::Sendmail
my %mail = (
    body    => $msg,
    subject => "Daily Summary",
    from    => "my\@email.com",
    to      => "my\@email.com",
);

## sendmail() returns false on failure and leaves the reason in $Mail::Sendmail::error
sendmail(%mail) or die $Mail::Sendmail::error;
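
If the bank changes its page and the regex in GetChecking stops matching, the three returned values come back undefined. A small sketch of one way to build the message body so the email still makes sense in that case; the function name format_summary and the 'unavailable' placeholder are illustrative choices, not something from the script above.

```perl
use strict;
use warnings;

# Build the message body, substituting a placeholder for any value the
# scrape failed to find.
sub format_summary {
    my ($balance, $available, $ytd_interest) = @_;
    my @rows = (
        [ 'Balance',      $balance      ],
        [ 'Available',    $available    ],
        [ 'YTD Interest', $ytd_interest ],
    );
    my $msg = "Our Bank Balance:\n";
    for my $row (@rows) {
        my ($label, $value) = @$row;
        $value = 'unavailable' unless defined $value && length $value;
        $msg .= "\t$label: $value\n";
    }
    return $msg;
}

print format_summary('1,234.56', undef, '1.23');
```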
