Get Google results in a list of clean URLs

I wrote a Perl script that runs a given search on Google, parses the results and saves all the URLs it finds to a text file.

This is extremely useful for a lot of things. For example, I made a Google search string to find sites that might have a certain security vulnerability, then I ran an exploit against all of those sites and found the ones that were really vulnerable. Just for research…

This is only one example… with a little imagination you will see a lot of other things you can do…

It’s basically a parser of the Google results, so I can get Google results in any format.
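As a rough illustration of the "save the URLs in any format" step, here is a minimal sketch in Python. It is not the original Perl script: the function names (`dedupe`, `render`, `save`) and the two output formats are assumptions made for the example, and it starts from a list of URLs that has already been extracted.

```python
import json


def dedupe(urls):
    """Return the URLs in their original order with repeats removed."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def render(urls, fmt="text"):
    """Render a list of URLs as plain text (one per line) or as JSON."""
    if fmt == "text":
        return "".join(url + "\n" for url in urls)
    if fmt == "json":
        return json.dumps(urls, indent=2) + "\n"
    raise ValueError(f"unknown format: {fmt}")


def save(urls, path, fmt="text"):
    """Write the de-duplicated URLs to a file in the requested format."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render(dedupe(urls), fmt))
```

Calling `save(found_urls, "urls.txt")` would write one URL per line; passing `fmt="json"` writes the same list as a JSON array instead.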

I have now rewritten the algorithm (a Google parser) in PHP and published it online. You can use it under the Tools section, where you will find other interesting tools too, or go directly to the online webmaster tool: GooParser (Google Parser).


27 Responses to “Get Google results in a list of clean URLs”


  1. 1 Chii Sep 23rd, 2007 at 11:57 pm

    would you be able to post the source code too? I’m interested in how it was done =)

    ps. please ignore this if you already did (i hope i didn’t miss it).

  2. 2 Mr.Kyle Sho Sep 24th, 2007 at 4:19 am

    hi

    that’s a really great google parser tool, it makes it easy to find links related to any topic

    thanks a lot

  3. 3 goohackle Oct 10th, 2007 at 6:33 pm

    Thanks for your comments! You’re welcome.

    Chii, I’m now extending the tool to parse more data and return the results in different formats… when I finish I’m gonna write a post and upload the upgraded online tool.

    I’m not planning to publish the source code right now… but it isn’t difficult or complex, it just parses the HTML.
    The only tricky part is keeping Google from banning my server IP, but with a basic knowledge of the HTTP protocol, HTTP headers, browsers, web pages and how people behave it’s an easy task.

  4. 4 Waleed GadElKareem Nov 5th, 2007 at 2:29 am

    Good idea, any news about the source code or the regex used?
    Check out another way to do it with JavaScript using Google’s API:
    http://gadelkareem.com/2007/01/28/using-google-ajax-api-as-an-array/

  5. 5 goohackle Nov 6th, 2007 at 11:55 pm

    At first I parsed the Google results with a regex, but now I get them using an XML parser.

    I tried the Google AJAX API too, but with it I can get only the first results, not all of them.
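For readers wondering what the parser-based approach mentioned in the comment above might look like, here is a minimal sketch. It uses Python's standard `html.parser` rather than the XML parser the author mentions, and the sample page is made up for the example; it is not Google's markup and not the author's code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points to an absolute http(s) URL."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(("http://", "https://")):
            self.urls.append(href)


def extract_urls(html):
    """Feed an HTML string to the collector and return the URLs it found."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <div class="result"><a href="http://example.com/one">First</a></div>
  <div class="result"><a href="/internal/page">Navigation link</a></div>
  <div class="result"><a href="https://example.org/two?x=1&amp;y=2">Second</a></div>
</body></html>
"""

print(extract_urls(SAMPLE_PAGE))
```

Compared with a regex, a parser of this kind does not depend on attribute order or whitespace inside the tag, and it decodes entities such as `&amp;` in the attribute values.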

  6. 6 The big one Jan 14th, 2008 at 11:15 pm

    Hey man,
    you are talking too much, just paste the code or shut up! Don’t waste our time.

  7. 7 goohackle Jan 15th, 2008 at 11:23 pm

    Hey “The big one”, I’m not gonna paste the code right now just because you want it.
    It’s very easy to write if you have a basic knowledge of the HTTP protocol and basic programming skills. Read a basic tutorial on Perl, or C, or whatever language you want… READ, LEARN and write it.

    If you have any questions or want to talk about the google parser, the method used, the language, etc… we can talk here but if you want someone else to do your job go somewhere else… and don’t waste YOUR time. Nobody here is working for you.

  8. 8 John Jan 17th, 2008 at 7:03 pm

    hallo goohackle

    tnx for the post

    i’m writing my own google parser in php
    i tried to use a list of data centers, but i got banned

    could you pleeez explain about

    >> HTTP protocol, HTTP headers, browsers, web pages and people behavior

    i mean … how to use it ? how to make usable code with it ?
    … explain please

    tnx a lot

    ps please update your online parser, so it could give full results on one page

    for example 1000 urls for word “web”

    tnx

  9. 9 Google ban Feb 1st, 2008 at 2:41 am

    Yes. The Google ban is the only tricky part.

  10. 10 ValleyGeek Feb 8th, 2008 at 11:21 pm

    Nice tool.

    I did not understand your example of finding sites with vulnerabilities. Is this search string you came up with something that appears in html or some script?

    Have you thought about extending your tool to include the sponsored links as well? That would probably be interesting to everyone advertising on Google.

  11. 11 goohackle Feb 11th, 2008 at 8:07 pm

    That search string is for finding servers running Webmin using Google. You can refine it further to avoid false positives.

    It works because Webmin generally listens on port 10000, which appears in the URL, and the other words are in the HTML body of the login page.

  12. 12 Sly Stone Feb 22nd, 2008 at 2:19 pm

    Hello, i managed easily enough to do the same thing as gooParser.
    I know that i can avoid the google ban when running in a “stealthy” mode.
    i usually wait for a random period between queries so my script looks human, but the problem is that i want to parse many things and unfortunately this will take a lot of time.

    I would like you to tell me whether there is a way to speed up the whole process. You don’t have to tell me the way, just whether there is one.

  13. 13 John11 Feb 23rd, 2008 at 4:38 pm

    I myself need the URLs of the Google search results. I was reading through the AJAX API and didn’t find it to be of use. Is such parsing in any way illegal? Have you considered using some other approach or API? And how are you avoiding the ban, using proxies?

    Would appreciate a reply.

  14. 14 goohackle Feb 23rd, 2008 at 10:32 pm

    Hi Sly Stone, in the beginning I used the Google parser to get a lot of data without being banned and without doing anything strange, just sending the requests the same way your browser does, with 1 or 2 seconds between requests. That way I could get thousands of search results without being banned.

    A very useful tip: use tcpdump, Wireshark or something similar to check whether your script is really sending the requests just like your browser does. I solved several errors that way.

    After that you can use several IPs, randomize a lot of things so the requests look human, etc., but with a really large number of requests per minute these methods only delay the Google ban, depending on how many requests you make.

    So, if you want a lot more results, a lot more, I still don’t have a totally automatic method. Right now the online Google parser shows the Google captcha and lets the user solve it when Google bans it, and then it can continue once the captcha has lifted the ban.

    You catch me inspired to write ;)
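The "1 or 2 seconds between requests" pacing mentioned in the comment above can be sketched roughly as follows. This is only an illustration in Python: `fetch_paced` and its parameters are hypothetical names, and the actual download function is passed in by the caller so the sketch makes no network requests itself.

```python
import time


def fetch_paced(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` is whatever function performs the real download; `sleep` can be
    replaced in tests so they do not actually wait.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a fixed pause before every later one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Usage with a stand-in fetch function that only echoes the URL.
pages = fetch_paced(["page-1", "page-2"], fetch=lambda url: f"<html>{url}</html>", delay_seconds=0.0)
print(pages)
```

Keeping the delay in one parameter makes it easy to slow the script down further if the server starts refusing requests.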

  15. 15 goohackle Feb 23rd, 2008 at 10:56 pm

    John11, after reading the Google terms of use I think that just parsing their results is permitted, but they also say that making automated requests is not permitted… you will probably need to read them yourself to figure out whether what you are going to do is “illegal” or not…

    I don’t think I make automated requests with this online Google parser, the users make the requests… you can be more or less strict in how you interpret “automated requests”…

    I didn’t find the Google APIs useful for this, and this was the best approach I found.

    And regarding the ban… I wrote a lot in my previous comment.

  16. 16 Sly Stone Feb 26th, 2008 at 4:55 am

    Thank you, you are very helpful, i will check my requests with wireshark :).
    I appreciate that you answered.

    Best Regards,
    Sly ;)

  17. 17 mobyhunr Mar 7th, 2008 at 10:27 pm

    How can I make it country specific? I want to use it for marketing research. I need this tool. SEO Elite doesn’t search by country. Thanks. Nice tool.

  18. 18 goohackle Mar 8th, 2008 at 12:05 am

    mobyhunr,

    If you need some particular development or tool, I can help you. It isn’t difficult to modify my Google parser to make the searches country specific, just contact me at the mail here: http://goohackle.com/contact/

    If you can wait (I don’t have much spare time for this these days) I can upgrade my tool to make it country specific, it’s a good idea. Thanks.
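One possible way to make a search country specific is to change the host and the interface language in the search URL, following the URL shape that appears later in this thread (a country domain plus the `hl` and `q` parameters). The sketch below only builds the URL; the function name is hypothetical, and whether a country domain alone gives the results needed for market research is an assumption, not something the author confirms.

```python
from urllib.parse import urlencode


def build_search_url(query, host="www.google.com", language=None):
    """Build a search URL for the given host, query and optional interface language."""
    params = {"q": query}
    if language:
        params["hl"] = language  # interface language code, e.g. "tr"
    return f"http://{host}/search?{urlencode(params)}"


# Example: the same query against the default host and a country-specific host.
print(build_search_url("web hosting"))
print(build_search_url("web hosting", host="www.google.com.tr", language="tr"))
```

`urlencode` takes care of escaping spaces and special characters in the query, which is easy to get wrong when concatenating strings by hand.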

  19. 19 toxy Jul 14th, 2008 at 5:50 pm

    hi i can not parse google any more but as i see you still can

    i am using the code

    <?php
    header('Content-type:text/html; charset=utf-8');
    $s = file_get_contents("http://www.google.com.tr/search?hl=tr&q=test&meta=");
    preg_match("/(.*?)/Us",$s,$d);
    echo "";print_r($d);
    ?>

    what is wrong here ? can you please help me

  20. 20 Ryan Sep 3rd, 2008 at 9:59 pm

    Definitely a cool idea, but I agree it’s pointless without source code. What is it, a government secret or something??

  21. 21 goohackle Sep 6th, 2008 at 1:49 am

    toxy, a few weeks ago Google changed some details of its HTML structure, so check your regular expressions ;)

  22. 22 goohackle Sep 6th, 2008 at 2:05 am

    haha… thanks for the first part of your comment Ryan… and for the second too ;) … I think the point is that I explain how I did it, and when I have time I answer all the questions here.

    The point is to “teach” how to write this parser, or basically any parser of web pages… not just to paste the source code here.

    It’s simple enough, and all the functions you could need are very well documented at php.net.

  23. 23 Merimac Oct 14th, 2008 at 7:15 am

    Hello goohackle,
    Nice job you have done.

    I’m interested in doing something like this for my website, but I don’t see the difference between your method and using the Google API.

    What are the advantages and disadvantages of each method ?

    Thanks

  24. 24 goohackle Oct 15th, 2008 at 12:17 am

    Hi Merimac,

    When I wrote this tool the Google API had important limitations: for example, you could only get the first 30 results for any search, and the number of searches per day was limited too.
    I think it still has these limitations, but you can check anyway.

    The Google API is easy to use and lets you quickly build small things that look good, but if you want more freedom to parse anything without limitations it isn’t very useful.

    On the other hand, if you write your own Google parser you can get any number of Google search results without any limitation and parse all the URLs or information that you need.
    This method requires more development at the beginning, and if Google changes its HTML some day you need to modify your parser code too. But if you write good code, that will be very easy.

    Thanks for your comment.
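One way to read "if you write good code, this will be very easy" is to keep everything that depends on the page markup in a single place, and to fail loudly when it stops matching. The sketch below illustrates that idea in Python; the pattern, the class name in the markup and the `LayoutChanged` exception are all hypothetical and do not describe Google's real HTML or the author's parser.

```python
import re

# Hypothetical rule: the only place that knows what a result link looks like.
# If the page markup changes, this is the one line that needs updating.
RESULT_LINK = re.compile(r'<h3 class="result"><a href="([^"]+)"')


class LayoutChanged(Exception):
    """Raised when a non-empty page yields no result links."""


def parse_results(html, pattern=RESULT_LINK):
    """Return the result URLs found in the page, or raise if nothing matches."""
    urls = pattern.findall(html)
    if html.strip() and not urls:
        # A page with genuinely zero hits would also end up here; a real
        # parser would need some way to tell the two cases apart.
        raise LayoutChanged("no result links matched; the page markup may have changed")
    return urls
```

With this shape, a markup change shows up as an explicit error instead of a silently empty list, which is the symptom described earlier in the thread.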

  25. 25 Viren Nov 20th, 2008 at 10:25 am

    Hello,

    I think this parser is a good way to get a list of clean URLs. But I want to know how much time it takes to crawl those URLs, and I would like to see the parser’s source code if you agree and it is free.

    Thanks,
    Viren

  26. 26 Effy Dec 1st, 2008 at 11:44 pm

    I’m trying to use Perl to browse Google, but Google’s robots.txt forbids me from doing it. How do I bypass that?

  1. 1 Google Parser Online Tool Upgraded at GooHackle Pingback on Oct 13th, 2007 at 4:31 am
