Break Google captcha

Here I’ll describe how I managed to break, or rather “automatically bypass”, the Google captcha so that one of my online tools (Google Parser) can run a large number of requests without my intervention.

First: What’s the problem to solve?

I have an online tool that sends requests to Google and retrieves the search results. When it sends too many requests, Google bans it, and I have to type the letters from the Google captcha before it can continue making requests.

Google Error


Second: How does the Google captcha ban work?

In a few words, when Google receives a lot of requests from the same IP (many other variables are involved too), it assumes that the requests are coming from an automated script or spyware. Google then bans that IP until you type the letters of the captcha. If you type the correct letters, Google returns a cookie that means “I’m a human, give me the search results”, and then you can continue making Google requests.
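The client side of that cookie handshake can be pictured with a small sketch. This is an illustration only, not the tool's code: the `CookieSession` class and the `human_token` cookie name are made up for the example, and the real cookie Google sends is not documented here.

```python
from http.cookies import SimpleCookie


class CookieSession:
    """Remembers the cookies one logical session has received (hypothetical helper)."""

    def __init__(self):
        self.cookies = {}

    def store(self, set_cookie_header):
        # Parse one Set-Cookie header value and remember name -> value.
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        for name, morsel in parsed.items():
            self.cookies[name] = morsel.value

    def header(self):
        # Value to send back in the Cookie header of later requests.
        return "; ".join(f"{name}={value}" for name, value in self.cookies.items())

    def reset(self):
        # Forgetting the cookies means the next request starts a new session.
        self.cookies.clear()


session = CookieSession()
# "human_token" is an invented name standing in for whatever cookie is returned.
session.store("human_token=abc123; Path=/; HttpOnly")
print(session.header())
```

Every later request in the same session would send the stored value back in its `Cookie` header; calling `reset()` drops it and starts over.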


Third: Programming the solution…

The solution to “break Google captcha” is neither difficult nor brilliant: just show the captcha to the user who is using the tool, let them type the letters, send the answer to Google, and save the cookie to continue with the requests.

Google captcha defeated

This is the final solution, and it is very simple, but getting there wasn’t. I had to be very careful with the details of the HTTP requests and beat some Google tricks.
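The overall loop (request, detect the block, show the captcha to a person, send the answer, keep the cookie, retry) might be sketched like this. It is a rough outline under stated assumptions, not the tool's actual code: `fetch`, `ask_user` and `submit_answer` are hypothetical callables, and the fake versions below exist only so the example runs without touching the network. The HTTP details mentioned above are deliberately left out.

```python
def fetch_with_human_help(fetch, ask_user, submit_answer, cookies, max_captchas=3):
    """Fetch one page; whenever a captcha comes back, relay it to a person.

    All collaborators are hypothetical callables supplied by the caller:
      fetch(cookies)               -> dict with "blocked" (bool) and either
                                      "body" or "captcha_image"
      ask_user(captcha_image)      -> the letters typed by the user
      submit_answer(text, cookies) -> dict of new cookies, or None if rejected
    """
    for attempt in range(max_captchas + 1):
        response = fetch(cookies)
        if not response["blocked"]:
            return response["body"]
        if attempt == max_captchas:
            break  # no captcha attempts left
        answer = ask_user(response["captcha_image"])
        new_cookies = submit_answer(answer, cookies)
        if new_cookies:
            # Save the cookie so the next request is treated as human.
            cookies.update(new_cookies)
    raise RuntimeError("still blocked after %d captcha attempts" % max_captchas)


# Stand-ins so the sketch runs on its own.
def fake_fetch(cookies):
    if cookies.get("human") == "yes":
        return {"blocked": False, "body": "search results"}
    return {"blocked": True, "captcha_image": b"<image bytes>"}


def fake_ask_user(captcha_image):
    return "xkcd"  # pretend the user typed these letters


def fake_submit(answer, cookies):
    return {"human": "yes"} if answer == "xkcd" else None


jar = {}
print(fetch_with_human_help(fake_fetch, fake_ask_user, fake_submit, jar))
```

Passing the collaborators in as functions keeps the loop independent of any particular HTTP library or user interface.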


Fourth: The results…

Now the script is running and can handle any number of requests; there’s no time or number limit, and the Google captcha isn’t a problem. :D


The phrase “break Google captcha” isn’t the most accurate for this, but I used it because this post is part of my SEO research too…


10 Responses to “Break Google captcha”


  1. 1 Kamil Przeorski Feb 26th, 2008 at 4:02 pm

    Hello,

    With the approach you describe, Google sometimes blocks your searches without any possibility of typing a captcha to unblock them. Do you have any idea how to solve this problem? Does Google have many datacenters, each holding different data about blocked IPs?

    Best Regards,
    Kamil

  2. 2 goohackle Feb 28th, 2008 at 1:17 am

    Hi Kamil, I did find some Google error responses without the possibility of typing a captcha, but I realized that this happens only for some combinations of country and language; with other combinations it always returns the error page with the captcha.

    As for the other question, Google has many datacenters, but they are synchronized; the synchronization may take more or less time… In the case of the blocked IPs, I think Google keeps all of them centralized. They may be spread over several servers or kept on only one, but I don’t think it is a large enough amount of data to need to live in many datacenters. This is just my opinion; I don’t know for sure, since I have never been inside Google’s internal network… heh…

  3. 3 Kamil Przeorski Feb 28th, 2008 at 11:25 am

    Did you try using the Google Search API? Are there any limits?

    Something interesting about this topic can be found here: http://www.fiftyfoureleven.com/weblog/web-development/programming-and-scripts/apis/google-search-api

    Yahoo’s : http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=53x%20all%20day&results=
    :P

    Kamil

  4. 4 goohackle Feb 29th, 2008 at 12:10 am

    Kamil, I tried the Google AJAX search API; it has too many limitations.
    I didn’t use the previous SOAP API because by the time I wanted to use it, Google was no longer giving out new API keys.

    Thanks for the links, I will check them.

    BTW: I modified your second link a little to prevent problems with my advertisers…

  5. 5 Kamil Przeorski Feb 29th, 2008 at 5:28 pm

    “I tried the Google AJAX search API, it has too many limitations”

    What limitations do you mean?

    Kamil

  6. 6 goohackle Mar 1st, 2008 at 12:47 pm

    Basically, a limit on the number of results and no pagination over the rest of the results.
    A year ago, when I used this API, you could only get the first 8 results for a specific search and you couldn’t paginate over the rest.

    So if you don’t need the rest of the search results, this API could be good for you.

    I just read that the Google AJAX search API now gives you the first 32 results, with pagination only over them. Still the same limits, but with a few more results.

  7. 7 Milan Jul 7th, 2008 at 12:03 pm

    How do you ensure that the saved cookie is for the same session? What exactly do you have to save in the cookies in order to continue searching? I guess you use the curl library?

    Regards,
    Milan

  8. 8 goohackle Jul 7th, 2008 at 11:52 pm

    Milan, you are right.
    I use the Curl library for the HTTP requests; with it I can use the same cookie for several GETs or POSTs.

    To represent a session I just use the same cookie; when I want to start a new session I delete the old cookie, and that’s it.

    Cheers

  9. 9 Milan Jul 8th, 2008 at 4:29 am

    Thanks for the quick response!

    Is this tool (Google Parser) going to be available for download? I see a download section but no source code yet :( I think it would be a great PHP/curl, and maybe regex, learning lesson.

    I would like to develop something similar. It should run multiple queries in batch mode. It doesn’t have to be fast, but it should be stable. In your experience, if the queries run with a delay of about 2 or 3 seconds between them, would I still face the captcha? I would like to develop a similar way for the user to fill in the captcha input and continue with the queries. The problem is that the processing script should run in the background, not in the same script that generates the user interface.

    Regards

  10. 10 Milan Jul 8th, 2008 at 4:57 am

    One more question ;)
    In order not to get the captcha so often, is it better to keep the same session (cookie) or to open a new one as often as possible?

    Thanks again
