Sunday, January 25, 2009

RankSearch - Part II: parsing the Perl command line

RankSearch: The design
The design of our little script is laid out in the comment header from last post:

1. Get the parameters from the user
  • In a first version, ensure that all parameters are filled
  • Later, we can provide a default value for the search engine (Google)
  • Or even display all the results for a list of supported search engines
2. Launch an http request with the parameters given by the user
  • Spawn a process able to communicate back to our script
  • It will probably be in the form of "http://$engine-blabla-search_criteria-moreblabla"
  • Need to investigate different urls for different search engines
3. Parse and display the results transmitted by the http process
  • Search for the target URL
  • Keep track of rank count
  • Launch new http request with updated page number if target not found
  • Display result to user
Today, I'll strike off the first item of the list. Time to get interactive!
While looking for a way to ease the handling of user input, I discovered that Perl includes the Getopt::Long module by default. Its page on CPAN shows all the possible uses of the module.
Be careful not to omit the "\" character before the variable name (like I did at first): it passes a reference to the variable, which is what lets GetOptions store the value in it.
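To see why the backslash matters, here is a minimal sketch of a function that writes through a reference, which is roughly what GetOptions does with the targets it is given. The store_value helper is hypothetical and only exists for this illustration; it is not part of Getopt::Long.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Getopt::Long): it receives a reference
# to a scalar and writes a value into the caller's variable through it.
sub store_value {
    my ($target_ref, $value) = @_;
    $$target_ref = $value;    # dereference, then assign
}

my $SearchEngine = "";
store_value(\$SearchEngine, "www.google.com");    # "\" passes a reference, not a copy
print "Engine: $SearchEngine\n";
```

Without the backslash, the function would only receive the current (empty) value of $SearchEngine and would have no way to change the variable itself.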
We'll only use string inputs, which is what the "=s" suffix asks for:
    use Getopt::Long;
    GetOptions ("engine=s" => \$SearchEngine);
This stores in $SearchEngine the value given to the engine option on any of the following Perl command lines:
perl "$(FULL_CURRENT_PATH)" --engine www.google.com --target damienlearnsperl.blogspot.com --keyword "learn perl"
or
perl "$(FULL_CURRENT_PATH)" -engine www.google.com -target damienlearnsperl.blogspot.com -keyword "learn perl"
or even
perl "$(FULL_CURRENT_PATH)" -e www.google.com -t damienlearnsperl.blogspot.com -k "learn perl"
(provided only one entry in GetOptions starts with "e", so that the abbreviation is unambiguous)
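The three spellings can be checked without touching the real command line. The sketch below assumes Getopt::Long's default configuration and uses its GetOptionsFromArray function to parse a simulated argument list; the parse_engine helper is a hypothetical name used only here.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse one simulated command line and return
# whatever ended up in the "engine" option.
sub parse_engine {
    my @args = @_;
    my ($engine, $target, $keyword) = ("", "", "");
    GetOptionsFromArray(\@args,
        "engine=s"  => \$engine,
        "target=s"  => \$target,
        "keyword=s" => \$keyword);
    return $engine;
}

# Each spelling should resolve to the same option.
for my $spelling ("--engine", "-engine", "-e") {
    print "$spelling => ", parse_engine($spelling, "www.google.com"), "\n";
}
```

If a second option starting with "e" were added to the list, "-e" would become ambiguous and Getopt::Long would report an error instead of guessing.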

Here's the script:
    #!/usr/bin/perl -w
    # --------------------------------------------------
    # File   : RankSearch.pl
    # Author : DLP
    # Date   : January 24th 2009
    # Object : Looks in a search engine what is the rank
    #          for a given website and a given keyword
    # Input  : - Search engine URL, eg. "www.google.com"
    #          - URL of website to monitor
    #          - Search expression to investigate
    # Bugs   : None
    # To do  : - Launch http request
    #          - Read html result
    #          - Analyse result and display website rank
    # --------------------------------------------------
    use strict;
    use Getopt::Long;   #Load module

    # Global variable
    my $PROG_NAME = "RankSearch";
    my $VERSION   = "v0.0.1";
    my $PROG_DATE = "January 24th 2009";

    # --------------------------------------------------
    # Main program
    # --------------------------------------------------
    # More global variables
    my $SearchEngine = "";
    my $TargetURL = "";
    my $Keyword = "";

    #Parse command line arguments
    GetOptions ("engine=s"  => \$SearchEngine,  #string
                "target=s"  => \$TargetURL,
                "keyword=s" => \$Keyword);

    # Check user input
    if ($SearchEngine eq "" ||
        $TargetURL eq "" || $Keyword eq "")
    {
        print "
    You must enter a valid string for:
    --engine  = search engine URL
    --target  = the target of the search
    --keyword = the search criteria
    ";
        exit;
    }

    print "
    $TargetURL is ranked nth on the $SearchEngine
    search engine for the \"$Keyword\" criteria.";

    __END__
    Jan 24 2009 (0.0.1): first version of RankSearch
    Jan 25 2009 (0.0.2): get params from command line

After getting the parameters, we check whether anything was entered at all.
If $SearchEngine, $TargetURL or $Keyword is still an empty string, we print an error message and exit the program. The logical OR operator can be written "||" or "or"; the two behave the same in this test but differ in precedence, "or" binding much more loosely than "||".
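The two spellings of the logical OR differ only in precedence: "||" binds tighter than the assignment operator, while "or" binds looser. A small sketch of the effect, using hypothetical variable names ($with_pipes, $loose, @log) that are not part of RankSearch:

```perl
use strict;
use warnings;

my $Keyword = "";    # empty, as when the user forgot --keyword

# "||" is evaluated before "=": the fallback value is what gets assigned.
my $with_pipes = $Keyword || "default";

# "or" is evaluated after "=": the empty string is assigned first, and only
# then does the right-hand side run because the assignment returned false.
my @log;
my $loose = $Keyword or push @log, "right-hand side of 'or' ran";

print "with ||: '$with_pipes'\n";
print "with or: '$loose' (log entries: ", scalar(@log), ")\n";
```

Inside an if condition such as the one in the script, both forms give the same result, because no assignment competes with them for precedence.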

In Notepad++ you can modify the execute command line (via the F6 shortcut) to:
perl "$(FULL_CURRENT_PATH)" -e www.google.com --target damienlearnsperl.blogspot.com --keyword learn perl

The result appears as:

[Screenshot: Notepad++ execution console]
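Note that the criterion entered by the user was "learn perl", yet the script displayed it as "learn". Without quotes, the command line splits the input at the blank space, so --keyword only receives "learn" and "perl" is left over as a separate argument. We'll just have to make sure that double quotes (") are used when the string input contains a blank space.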
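The split can be reproduced with a simulated argument list. This is a sketch under the assumption of Getopt::Long's default configuration, where arguments that are not options are left behind in the array; the @args list stands in for what the command line hands to the script when the quotes are missing.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simulates the unquoted command line: "learn perl" has already been
# split into two separate arguments before the script sees it.
my @args = ("--keyword", "learn", "perl");

my $Keyword = "";
GetOptionsFromArray(\@args, "keyword=s" => \$Keyword);

print "Keyword  : $Keyword\n";    # only the first word
print "Leftover : @args\n";       # what GetOptions did not consume
```

In the real script the leftover words end up in @ARGV, so checking whether @ARGV is empty after GetOptions is one possible way to warn the user about missing quotes.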

French expression of the day:
"Ce que femme veut, Dieu le veut": A woman's will is God's will
As you can see, God and strong-minded women are universal.

Next posts:
  • More about CPAN
  • Our first Perl program - Part III: Launch a HTTP request
  • How to install Google Analytics on your Blogger blog
  • Our first Perl program - Part IV: Read results from a HTML page
  • Perl help resources
  • Our first Perl program - Part V: Result analysis
  • POD
  • Our first Perl program - Part VI: Add a GUI interface

1 comment :

  1. Thank you for both posts! It was really helpful and useful... I think that everyone which offers help to the "slower" users should be appreciated. keep up with your good work!
