Google Adwords Scraper – Expose your competition

We recently developed an improved version of our online Google scraper that parses not only the organic results but also the AdWords ads, the paid or sponsored results that Google shows for different keywords, languages and countries.

The reach and value of the information we can parse with our Google scraping technology is huge. Imagine the online marketing research you can do to discover and expose your competition. You can find out which competitors are strongest in the organic results for your target keywords and, sometimes more important, which competitors are spending the most money on AdWords campaigns in your niche or industry. This kind of marketing intelligence puts you a step ahead of most of your competitors.

Another practical use of our Google AdWords scraper is industry analysis: collecting statistics from thousands of keywords to show how markets are moving and which industries are going up or down. This information is essential, even critical, for any online project. Nowadays almost everything has an online dimension, so almost any new or existing project can benefit from this kind of e-marketing research.

It is going to be an important part of today’s SWOT analyses, which are an essential step in almost every new business venture. Competition analysis and industry research are fundamental to every well-done SWOT analysis. It becomes even more powerful when you cross that information with Google AdWords price estimates for the keywords: you can even get an idea of your competitors’ budgets. Not exactly, of course, but you can get a good idea of how much particular projects and websites, or a whole industry, are spending.

Of course, developing a system that can run a huge number of queries for millions of keywords isn’t easy. Our real-time parsers can process a lot of information, and our system was designed with scalability as one of the first concerns. That’s one of the reasons we now have a Google scraper capable of scraping AdWords too, without limitations.

Google SOAP API key end, XML PHP solution

The Google search SOAP API key service is about to end, in August 2009.

A couple of years ago I developed a PHP XML parser that turns Google search results into a list of URLs, a kind of Google API in PHP.

First, a short introduction to what this Google search API is and what you can do with it…

This SOAP API returns Google search results in XML format, so you can use them in a lot of applications, such as online SEO tools and other SEO or online marketing applications that collect information.

Anyway, this API, like the new AJAX search API, is very good for small applications but has a lot of restrictions. For big applications that collect a lot of useful and valuable SEO information, the old SOAP API was even better than the new AJAX one.

So, some years ago, thinking about the Google SOAP API restrictions and the restrictions it could get in the future, I started looking for a way to have no restrictions at all… and that was the beginning of the Google Parser idea.

Then time passed and a lot of improvements were made to the online version of the parser: improvements to the XML PHP parser, parsing Google results for different countries, and obtaining not only the URLs but also the titles, descriptions, positions and even the sponsored links.

Now, with the end of the Google SOAP search API, there’s a need either to change the systems that use it or to develop another Google search API, for example a Google API in PHP.

This could be useful for developers whose applications depend on the Google SOAP API. They have two choices: develop a system that emulates the Google API and use it in their applications without any changes, or change the applications in some other way to do what they did with the original API.

And that’s it. The first option is the best for me, and I’m probably going to develop the Google API in PHP to do some tests.

Nowadays in my research I use the results in plain text, but since the API uses XML I will probably develop and test a way to get Google results in XML. It’s very easy if you already have the core system developed: just add the XML tags to the output and you’re done.
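
As a rough illustration of “adding the XML tags to the output”, here is a minimal sketch in Python (not the PHP used for the real parser). The `SearchResult` class, the `results_to_xml` function and the element names `results`, `result`, `url`, `title` and `description` are all invented for this example; they are not the schema of the Google SOAP API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Fields the post says the parser extracts; the class itself is hypothetical.
    position: int
    url: str
    title: str
    description: str


def results_to_xml(keyword, results):
    """Wrap already-parsed results in XML tags (element names are invented)."""
    root = ET.Element("results", {"keyword": keyword})
    for r in results:
        item = ET.SubElement(root, "result", {"position": str(r.position)})
        ET.SubElement(item, "url").text = r.url
        ET.SubElement(item, "title").text = r.title
        ET.SubElement(item, "description").text = r.description
    # ElementTree escapes special characters such as "&" for us.
    return ET.tostring(root, encoding="unicode")


sample = [SearchResult(1, "http://example.com/", "Example & Co", "A placeholder result.")]
xml_text = results_to_xml("example", sample)
print(xml_text)
```

Building the document with an XML library instead of concatenating strings keeps the output well-formed even when a title or description contains characters like `&` or `<`.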


Here are some of my best webseo online tools. You can use them absolutely free, without limitations.

These tools are useful for webmasters doing search engine optimization on their sites, and for a lot of other things.

I wrote these webseo tools several months ago, when I was researching how to parse Google search results and obtain clean, structured data from them.

I wrote them in a period when I had some extra free time, but these days I’m very busy, so my ideas for extending these tools and writing new ones are frozen. So the webseo tools are frozen too.

The next time I have some free time I will probably write some new SEO tools, because I already have the “hard” (it was easy 😉) work done: doing the HTTP requests, parsing the results, creating objects with the data, etc.

If I have enough time I could even create another site just for SEO research and online tools… maybe… who knows…
Again, this post is also part of my SEO investigation…

Break Google captcha

Here I’m going to write about how I managed to break the Google captcha, or “automatically bypass” it, so that one of my online tools (Google Parser) can run a lot of requests without my intervention.

You probably know the Google Sorry error page (503). Next I’m going to write about how to solve and bypass it.

First: What’s the problem to solve?

I have an online tool that sends requests to Google and gets the search results. When it sends too many requests, Google bans it and I need to type the letters of the Google captcha to continue making requests.

Google Error


Second: How does the Google captcha ban work?

In a few words, when Google receives a lot of requests from the same IP (there are a lot of other variables), it assumes the requests are being made by an automatic script or spyware. Then Google bans that IP until you type the letters of the captcha. If you type the correct letters, Google returns a cookie that means “I’m a human, give me the search results”, and then you can continue making Google requests.


Third: Programming the solution…

The solution to “break Google captcha” is nothing difficult or brilliant: just show the captcha to the user who’s using the tool, let them type the letters, send the answer to Google and save the cookie to continue with the requests.

Google captcha defeated

This is the final solution, and it is very simple, but the process of getting there wasn’t. To do this I had to be very careful with the details of the HTTP requests and beat some Google tricks.
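
The control flow described above (request, get blocked, let a human answer, keep the cookie) can be sketched without any network code. Everything here is an assumption made for illustration: `fetch` and `ask_user` are hypothetical callables supplied by the caller, `fake_fetch` is a stand-in for the real HTTP layer, and the cookie name `human` is invented. It is not the actual tool and says nothing about Google’s real request details.

```python
def fetch_results(url, session, fetch, ask_user):
    """Fetch url; if blocked with a 503, let a human solve the captcha and retry once.

    fetch(url, cookies, captcha_answer) -> (status, body, new_cookies) and
    ask_user(captcha_body) -> str are hypothetical callables.
    session is a dict holding the cookies kept between calls.
    """
    status, body, new_cookies = fetch(url, session["cookies"], None)
    if status == 503:
        # Blocked: show the captcha to the person using the tool.
        answer = ask_user(body)
        status, body, new_cookies = fetch(url, session["cookies"], answer)
    # Save whatever cookie came back so later requests reuse it.
    session["cookies"].update(new_cookies)
    return status, body


def fake_fetch(url, cookies, answer):
    """Stand-in transport: blocks until a cookie or the right answer is present."""
    if cookies.get("human") == "yes":
        return 200, "results for " + url, {}
    if answer == "abcd":
        return 200, "results for " + url, {"human": "yes"}
    return 503, "captcha image", {}


asked = []


def ask(captcha_body):
    asked.append(captcha_body)  # record how often the human was bothered
    return "abcd"


session = {"cookies": {}}
first = fetch_results("q=one", session, fake_fetch, ask)
second = fetch_results("q=two", session, fake_fetch, ask)
print(first, second, len(asked))
```

In this toy run the human is asked only once; the second request goes through on the saved cookie.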


Fourth: The results…

Now the script is running. It can manage any number of requests, there’s no time or number limit, and the Google captcha isn’t a problem. 😀


The phrase “break Google captcha” isn’t the most accurate for this, but I used it because this post is part of my SEO research too…

Keyword popularity script

I wrote a Perl script that takes a keyword list and gets the popularity of the word or phrase on each line of the list.

To get the popularity of a keyword, I search Google for that keyword and then parse the Google results.

I repeat this for all the keywords in the list, saving the data to a database.

The result is a list with a lot of words, keywords or phrases and the popularity of each of them according to Google.
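
A minimal sketch of that loop, written in Python rather than the original Perl. Several things are assumptions: the result count is taken to appear in the page text as a phrase like “About 1,840,000,000 results”, the `lookup` callable stands in for the real search request, and SQLite is used here only so the example is self-contained. The names `parse_result_count` and `store_popularity` are invented.

```python
import re
import sqlite3

# Assumed format: a number with optional separators followed by the word "results".
COUNT_RE = re.compile(r"(\d[\d.,]*)\s+results", re.IGNORECASE)


def parse_result_count(text):
    """Return the integer in a phrase like 'About 1,230 results', or None."""
    match = COUNT_RE.search(text)
    if match is None:
        return None
    return int(re.sub(r"\D", "", match.group(1)))


def store_popularity(conn, keywords, lookup):
    """For each non-empty keyword line, look up its count and save it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS popularity (keyword TEXT PRIMARY KEY, hits INTEGER)"
    )
    for line in keywords:
        keyword = line.strip()
        if not keyword:
            continue
        hits = parse_result_count(lookup(keyword))
        conn.execute("INSERT OR REPLACE INTO popularity VALUES (?, ?)", (keyword, hits))
    conn.commit()


# Canned responses replace the real search request in this sketch.
canned = {
    "free": "About 1,840,000,000 results",
    "online": "About 1,820,000,000 results",
}
conn = sqlite3.connect(":memory:")
store_popularity(conn, ["free\n", "", "online\n"], canned.__getitem__)
rows = conn.execute("SELECT keyword, hits FROM popularity ORDER BY hits DESC").fetchall()
print(rows)
```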

There are a lot of things you can do by parsing Google results; the amount of information you can get is huge. You only need to understand the results and know what you are looking for.

And this isn’t only for Google. By parsing the results of different search engines, news sites, social networks, etc., the information you can get is endless…

Top 100 most popular English words on the web

Which are the most popular words on the World Wide Web?

I used my keyword popularity script with a complete keyword list of English words.

Now I have all the words in the English dictionary ordered by their popularity on the web according to Google.
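
The ordering step itself is simple. Here is a small Python sketch that assumes the data is available as “keyword count” lines, which is only an assumption about the intermediate format; the function name `rank_by_popularity` is invented. The sample values are taken from the list in this post.

```python
def rank_by_popularity(lines):
    """Parse 'keyword count' lines and sort them by count, highest first."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue
        # Split on the last whitespace so multi-word keywords stay intact.
        keyword, count = line.rsplit(None, 1)
        pairs.append((keyword, int(count)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


sample = ["to 5680000000", "the 5040000000", "in 6620000000", "copyright 4600000000"]
ranking = rank_by_popularity(sample)
for keyword, count in ranking:
    print(keyword, count)
```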

The results are very interesting, and information like this is useful for webmasters, SEO, keyword selection, etc…

Word Popularity
in 6620000000
to 5680000000
the 5040000000
all 4810000000
and 4620000000
by 4620000000
of 4610000000
copyright 4600000000
reserved 4400000000
for 4230000000
on 4220000000
is 4090000000
this 3750000000
or 3660000000
with 3550000000
are 3520000000
an 3500000000
home 3500000000
no 3410000000
as 3280000000
us 3230000000
be 3210000000
that 3140000000
you 3080000000
from 3040000000
rights 3020000000
it 3010000000
about 2910000000
at 2900000000
not 2870000000
have 2840000000
page 2780000000
search 2780000000
was 2750000000
contact 2730000000
if 2710000000
new 2710000000
also 2690000000
en 2650000000
will 2630000000
your 2530000000
more 2520000000
one 2480000000
site 2460000000
so 2460000000
any 2440000000
can 2430000000
time 2410000000
top 2390000000
may 2370000000
other 2340000000
privacy 2330000000
up 2300000000
help 2280000000
has 2270000000
mail 2270000000
which 2240000000
do 2220000000
only 2180000000
see 2160000000
view 2160000000
use 2130000000
web 2120000000
our 2120000000
these 2110000000
but 2110000000
terms 2080000000
my 2060000000
we 2030000000
me 2020000000
out 2020000000
been 2000000000
when 1970000000
information 1950000000
la 1940000000
they 1940000000
there 1910000000
email 1900000000
free 1840000000
like 1840000000
next 1830000000
online 1820000000
date 1820000000
name 1810000000
index 1790000000
links 1790000000
over 1790000000
their 1790000000
first 1760000000
am 1750000000
id 1750000000
policy 1740000000
powered 1740000000
news 1730000000
please 1730000000
last 1690000000
service 1690000000
here 1660000000
add 1650000000
back 1640000000

Most common letters on the web

Which are the most common letters on the web?

I used my Perl script, which gets the popularity (according to Google) of all the words in a keyword list, to find the answer.

So I have a list of the letters ordered with the most popular (or most common) letters on the web first.

The results are very interesting and reflect the predominant language on the web, English.

Letter Popularity
A 7150000000
I 4070000000
E 3950000000
S 3660000000
O 2840000000
T 2820000000
C 2720000000
D 2700000000
M 2560000000
N 2460000000
L 2190000000
P 2040000000
B 2020000000
F 1880000000
Y 1780000000
X 1720000000
V 1680000000
G 1670000000
W 1670000000
R 1610000000
K 1520000000
H 1490000000
J 1470000000
Z 1470000000
U 1320000000
Q 1130000000

Most popular countries on the web

This is a list of the most popular countries on the World Wide Web according to Google.
I made a Perl script to do this. You can obtain really interesting results with it.

How does it work?

First I write or download a keyword list, in this example a list with the names of all the countries in the world.

Then I run the script with some parameters that set how I want the Google results, the database, the table, the Google search parameters, etc…

The script gets the Google results, parses them and saves the data I want to a MySQL database, in this case the keyword and the keyword’s popularity in Google.
Then… I wrote this article.
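
To make the “run the script with some parameters” step concrete, here is a hedged sketch of what such a command line could look like, in Python rather than the original Perl. Every option name and default below (`--database`, `--table`, `--language`, `--results`, and the file name `countries.txt`) is hypothetical; the post does not list the real script’s parameters.

```python
import argparse


def build_parser():
    """Command-line options for an illustrative keyword popularity run."""
    parser = argparse.ArgumentParser(description="Collect keyword popularity (illustrative).")
    parser.add_argument("keyword_file", help="text file with one keyword per line")
    parser.add_argument("--database", default="seo", help="database name (hypothetical default)")
    parser.add_argument("--table", default="popularity", help="table that receives the rows")
    parser.add_argument("--language", default="en", help="search language parameter")
    parser.add_argument("--results", type=int, default=10, help="results requested per query")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["countries.txt", "--table", "countries"])
print(args.keyword_file, args.database, args.table, args.language, args.results)
```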

Obviously this isn’t an exact measure of the popularity of the countries on the internet, because some of them can get popularity from other sources, for example the country named Jordan and the basketball player with the same surname. There are also a lot of other factors that influence the popularity of any keyword.

Here are the results:

Country Popularity
France 1090000000
USA 754000000
China 727000000
Japan 651000000
Canada 633000000
Germany 593000000
Mexico 567000000
Australia 411000000
United Kingdom 399000000
Spain 394000000
Argentina 391000000
Italy 389000000
India 370000000
United States of America 363000000
Portugal 360000000
Ireland 347000000
South Georgia 312000000

Google Parser Online Tool Upgraded

Today I had a couple of minutes and I improved my Google Parser online tool. Now you can get a clean list of hyperlinks, so you can quickly go to the returned URLs in your browser.

Of course there is still the option to get a clean, text-only list of the URLs in the Google search results.

Error in Google Webmaster Tools

The error is in the “pages that link to yours” option. It isn’t accurate: it doesn’t show all the sites that really link to yours.

I figured this out with this new site I’m writing. Google Webmaster Tools shows only 3 links to my site, but there are really more. I wrote other web pages with links to my site (they were already indexed by Google), so they should appear as sites with links to mine, but they don’t appear in the “pages that link to yours” option of Google Webmaster Tools.

So… here comes the idea: I wrote a tool to find all the web pages that link to my site.

With this tool I found the real total number of web pages on the internet with links to my site, and I can look at these pages to verify that they really link to my site. For my sites the results are 100% accurate.

Now my online tool shows you how many sites link to your site and the first 10 of these sites.

You should know the difference between this and the Google “link:” operator. That operator returns only the links to an exact URL, not to all the web pages of a site.
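
The difference between matching an exact URL and matching a whole site can be shown with a small Python sketch. It assumes the links are already collected as (source page, target URL) pairs; how the real tool gathers them is not shown here. The function names and the `.example` addresses are invented for illustration.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the host of a URL, ignoring a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def pages_linking_to_url(links, exact_url):
    """Source pages whose link target is exactly this URL."""
    return sorted({src for src, dst in links if dst == exact_url})


def pages_linking_to_site(links, site_url):
    """Source pages on other hosts that link to any page of the site."""
    site = host_of(site_url)
    return sorted(
        {src for src, dst in links if host_of(dst) == site and host_of(src) != site}
    )


links = [
    ("http://a.example/post", "http://www.mysite.example/tools/parser"),
    ("http://b.example/", "http://mysite.example/"),
    ("http://mysite.example/about", "http://mysite.example/tools/parser"),
    ("http://c.example/", "http://other.example/"),
]
exact = pages_linking_to_url(links, "http://mysite.example/")
whole_site = pages_linking_to_site(links, "http://mysite.example/")
print(exact)
print(whole_site)
```

With this sample data the exact-URL match finds one linking page, while the site-wide match finds two and leaves out the site’s own internal link.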

For example, you can test this tool with my URL, “”. If you use the “link:” operator in Google, it returns nothing; if you try my tool, it returns web pages with links to my site, the expected result.

