Archive for the ‘SEO & Web 2.0’ Category

Google Adwords Scraper – Expose your competition

We recently developed an improved version of our online Google scraper that lets us parse not only the organic results but also the AdWords ads: the paid or sponsored results that Google shows for different keywords, languages and countries.

The reach and value of the information we can parse with our Google scraping technology is huge. Imagine the online marketing research you can do to discover and expose your competition. You can find out which competitors are strongest in the organic results for your target keywords and, sometimes more importantly, which competitors are spending the most money on AdWords campaigns in your niche or industry. This kind of marketing intelligence will put you a step ahead of most of your competitors.

Another practical use of our Google AdWords scraper is industry analysis: collecting statistics for thousands of keywords shows how markets are moving and which industries are growing or shrinking. This information is essential, even critical, for any online project. And since nowadays almost everything has an online dimension, almost any new or existing project can benefit from this kind of e-marketing research.

It is going to be an important factor in today’s SWOT analyses, which are an essential step in almost every new business venture. Competition analysis and industry research are fundamental to any well-done SWOT analysis. And it gets even more powerful: by crossing that information with Google AdWords price estimates for the keywords, you can even estimate your competitors’ budgets. Not exactly, of course, but you can get a good idea of how much particular projects and websites, or a whole industry, are spending.
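To make the budget idea concrete: a rough monthly spend for a competitor is the sum, over the keywords it advertises on, of estimated clicks multiplied by the estimated cost per click. The sketch below assumes you already have both estimates for each keyword. The click figures in particular are an assumption, since the scraper only tells you who is advertising, and the numbers in the example are invented.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class KeywordEstimate:
    keyword: str
    monthly_clicks: float  # assumed estimate of ad clicks per month (not scraped data)
    cpc: float             # assumed AdWords cost-per-click estimate, in dollars


def estimate_monthly_budget(estimates: Iterable[KeywordEstimate]) -> float:
    """Rough spend estimate: sum of clicks x CPC over a competitor's keywords."""
    return sum(e.monthly_clicks * e.cpc for e in estimates)


if __name__ == "__main__":
    # Invented figures, for illustration only.
    competitor = [
        KeywordEstimate("cheap flights", 1200, 1.50),
        KeywordEstimate("last minute hotels", 800, 0.75),
    ]
    print(f"Estimated monthly spend: ${estimate_monthly_budget(competitor):,.2f}")
```

With the two invented keywords above, the estimate comes out at $2,400.00 a month. Treat any real figure produced this way as an order of magnitude, not an exact budget.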

Of course, developing a system that can run a huge number of queries for millions of keywords isn’t easy. Our real-time parsers can process a lot of information, and scalability was one of our first concerns when designing the system. That is one of the reasons we now have a Google scraper that can also scrape AdWords, without limitations.

End of the Google SOAP API key, an XML PHP solution

The Google search SOAP API key service is about to end in August 2009.

A couple of years ago I developed a PHP XML parser that returns Google search results as a list of URLs, a kind of Google API written in PHP.

Here is a short introduction to what this Google search API is and what you can do with it…

This SOAP API returns Google search results in XML format, so you can use it in a lot of applications, such as online SEO tools or other SEO and online marketing applications that collect information.

Anyway, both this API and the new AJAX Search API are very good for small applications, but both have a lot of restrictions. For big applications that need to collect a lot of useful and valuable SEO information, the old SOAP API was even better than the new AJAX one.

So, some years ago, thinking about the restrictions of the Google SOAP API and the restrictions it could get in the future, I started looking for a way to have no restrictions at all… and that was the beginning of the Google Parser idea.

Then time passed and I made a lot of improvements to the online version of the parser: for example, improvements to the XML PHP parser, parsing Google results for different countries, and extracting not only the URLs but also the titles, descriptions, positions and even the sponsored links.
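One way to hold the fields listed above is a small record type with a flag for sponsored links, so organic results and ads can be separated after parsing. This is a minimal Python sketch; the class and field names are hypothetical and the sample data is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchResult:
    position: int            # rank within its own block on the results page (1-based)
    url: str
    title: str
    description: str
    sponsored: bool = False  # True for AdWords / sponsored links


def split_results(results: List[SearchResult]) -> Tuple[List[SearchResult], List[SearchResult]]:
    """Return (organic, sponsored), each sorted by position."""
    by_position = sorted(results, key=lambda r: r.position)
    organic = [r for r in by_position if not r.sponsored]
    sponsored = [r for r in by_position if r.sponsored]
    return organic, sponsored


if __name__ == "__main__":
    parsed = [
        SearchResult(2, "http://example.org/", "Example Org", "Second organic hit"),
        SearchResult(1, "http://ads.example.com/", "Example Ad", "An ad", sponsored=True),
        SearchResult(1, "http://example.com/", "Example", "First organic hit"),
    ]
    organic, ads = split_results(parsed)
    print([r.url for r in organic])  # ['http://example.com/', 'http://example.org/']
    print([r.url for r in ads])      # ['http://ads.example.com/']
```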

Now, with the end of the Google SOAP Search API, anyone who depends on it needs to either change the systems that use it or develop another Google search API, for example a Google API in PHP.

This could be useful for developers whose applications depend on the Google SOAP API. They have two choices: develop a system that emulates the Google API and keep using it in their applications without any changes, or change their applications in some other way to do what they did with the original API.

For me, the first option is the best, and I’m probably going to develop the Google API in PHP to run some tests.
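The first option is essentially the adapter pattern: the application keeps making the same search call, and only the code behind it changes. The sketch below is a minimal Python illustration, not the PHP code described here. The method name echoes the old API’s doGoogleSearch, but the parameters are cut down to a query, a start offset and a result count, the response keys are hypothetical, and the backend is a stand-in for whatever parser produces the results.

```python
from typing import Callable, Dict, List

# Hypothetical backend contract: (query, start, count) -> list of result dicts.
Backend = Callable[[str, int, int], List[Dict[str, str]]]


class SearchApiEmulator:
    """Keeps an API-style search method while the data source is swapped out."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def do_google_search(self, q: str, start: int = 0, max_results: int = 10) -> Dict:
        if start < 0 or max_results < 1:
            raise ValueError("start must be >= 0 and max_results >= 1")
        items = self._backend(q, start, max_results)[:max_results]
        # Response shape is illustrative; match it to what your callers expect.
        return {"query": q, "start_index": start, "results": items}


def fake_backend(query: str, start: int, count: int) -> List[Dict[str, str]]:
    """Stand-in for a real parser: builds numbered placeholder results."""
    return [{"url": f"http://example.com/{i}", "title": f"{query} result {i}"}
            for i in range(start + 1, start + count + 1)]


if __name__ == "__main__":
    api = SearchApiEmulator(fake_backend)
    response = api.do_google_search("seo tools", start=10, max_results=3)
    print([r["url"] for r in response["results"]])
    # ['http://example.com/11', 'http://example.com/12', 'http://example.com/13']
```

Because callers only see do_google_search, replacing fake_backend with a real parser leaves the application code untouched, which is the point of the first option.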

In my research I currently use plain-text results, but since the API used XML, I will probably develop and test a way to get Google results in XML. It’s very easy once the core system is built: just add the XML tags to the output and you’re done.
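Adding the XML tags really is the small part once the results are structured. Here is a minimal sketch with Python’s standard xml.etree.ElementTree. It assumes each result is a dict with url, title and snippet keys. The element names are hypothetical and do not reproduce the old SOAP API’s schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def results_to_xml(query: str, results: List[Dict[str, str]]) -> str:
    """Wrap plain search results in XML; ElementTree escapes &, < and > for us."""
    root = ET.Element("search", {"query": query})
    for position, item in enumerate(results, start=1):
        node = ET.SubElement(root, "result", {"position": str(position)})
        for field in ("url", "title", "snippet"):
            ET.SubElement(node, field).text = item.get(field, "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(results_to_xml("seo", [
        {"url": "http://example.com/?a=1&b=2", "title": "Example", "snippet": "A <demo> page"},
    ]))
```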

WebSEO

Here are some of my best WebSEO online tools. You can use them completely free, without limitations.

Tools like these are useful for webmasters doing search engine optimization on their sites, and for a lot of other things.

I wrote these WebSEO tools several months ago, while I was researching how to parse Google search results and obtain clean, structured data from them.

I wrote them during a period when I had some extra free time, but these days I’m very busy and my plans to extend these tools and write new ones are on hold, so the WebSEO tools are frozen too.

The next time I have some free time I’ll probably write some new SEO tools, because the “hard” work (it was easy ;) ) is already done: making the HTTP requests, parsing the results, creating objects with the data, etc.

If I have enough time I might even create another site just for SEO research and online tools… maybe… who knows…
Again, this post is also part of my SEO research…

Break Google captcha

Here I’m going to explain how I broke the Google captcha, or rather “automatically bypassed” it, so that one of my online tools (Google Parser) can run a lot of requests without my intervention.

You probably know the Google “Sorry” error page (a 503 error); next I’m going to explain how I solved and bypassed it.

First: What’s the problem to solve?

I have an online tool that sends requests to Google and gets the search results. When it sends too many requests, Google bans it and I have to type the letters of the Google captcha before it can continue making requests.

Google Error


Second: How does the Google captcha ban work?

In a few words, when Google receives a lot of requests from the same IP (many other variables are involved too), it assumes they are being made by an automated script or spyware. Google then bans that IP until you type the letters of the captcha. If you type the correct letters, Google returns a cookie that means “I’m a human, give me the search results”, and then you can continue making Google requests.


Third: Programming the solution…

The solution to “break Google captcha” is neither difficult nor brilliant: show the captcha to the person using the tool, let them type the letters, send the answer to Google and save the cookie so the requests can continue.

Google captcha defeated

This is the final solution, and it is very simple, but getting there wasn’t. I had to be very careful with the details of the HTTP requests and get past some Google tricks.


Fourth: The results…

Now the script is running: it can handle any number of requests, there’s no time or request limit, and the Google captcha isn’t a problem anymore. :D


The phrase “break Google captcha” isn’t the most accurate description of this, but I used it because this post is also part of my SEO research…

Keyword popularity script

I wrote a Perl script that takes a keyword list and gets the popularity of every word or phrase on each line of the list.

To get the popularity of a keyword, I search Google for that keyword and then parse the Google results page.

I repeat this for every keyword in the list, saving the data to a database.

The result is a list of words, keywords or phrases with the popularity of each one according to Google.
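Here is a minimal sketch of the parse-and-save step, written in Python rather than the original Perl. It assumes that a keyword’s “popularity” is the estimated result count printed on the results page and that the page text has already been fetched. The regular expression, the table layout and the in-memory SQLite database (standing in for the real database) are assumptions, and the pattern would need updating whenever the wording of the results page changes.

```python
import re
import sqlite3
from typing import Optional

# Assumed wording: "... of about 6,620,000,000 ..." or "About 4,600,000,000 results".
COUNT_RE = re.compile(r"\b(?:about|of)\s+(\d[\d,]*)", re.IGNORECASE)


def parse_result_count(page_text: str) -> Optional[int]:
    """Return the first estimated result count found in the page text, or None."""
    match = COUNT_RE.search(page_text)
    return int(match.group(1).replace(",", "")) if match else None


def save_popularity(conn: sqlite3.Connection, keyword: str, count: int) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO popularity (keyword, hits) VALUES (?, ?)",
        (keyword, count),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE popularity (keyword TEXT PRIMARY KEY, hits INTEGER)")
    # Hypothetical, already-fetched page snippets keyed by keyword.
    pages = {
        "in": "Results 1 - 10 of about 6,620,000,000 for in",
        "copyright": "About 4,600,000,000 results",
    }
    for keyword, text in pages.items():
        count = parse_result_count(text)
        if count is not None:
            save_popularity(conn, keyword, count)
    conn.commit()
    for row in conn.execute("SELECT keyword, hits FROM popularity ORDER BY hits DESC"):
        print(row)
```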

There are a lot of things you can do by parsing Google results; the amount of information you can get is huge. You only need to understand the results and know what you are looking for.

And this isn’t limited to Google: by parsing the results of different search engines, news sites, social networks, etc., the information you can get is practically infinite…

You can use the online version of this tool here: Keyword popularity tool

Or read my first post about this webmaster online tool here: Online keyword popularity tool gives interesting results

Top 100 most popular English words on the web

Which are the most popular words on the World Wide Web?

I ran my keyword popularity script on a complete list of English words.

Now I have all the words in the English dictionary ordered by their popularity on the web according to Google.

The results are very interesting, and information like this is useful for webmasters, SEO, choosing keywords, etc…

Word Popularity
in 6620000000
to 5680000000
the 5040000000
all 4810000000
and 4620000000
by 4620000000
of 4610000000
copyright 4600000000
reserved 4400000000
for 4230000000
on 4220000000
is 4090000000
this 3750000000
or 3660000000
with 3550000000
are 3520000000
an 3500000000
home 3500000000
no 3410000000
as 3280000000
us 3230000000
be 3210000000
that 3140000000
you 3080000000
from 3040000000
rights 3020000000
it 3010000000
about 2910000000
at 2900000000
not 2870000000
have 2840000000
page 2780000000
search 2780000000
was 2750000000
contact 2730000000
if 2710000000
new 2710000000
also 2690000000
en 2650000000
will 2630000000
your 2530000000
more 2520000000
one 2480000000
site 2460000000
so 2460000000
any 2440000000
can 2430000000
time 2410000000
top 2390000000
may 2370000000
other 2340000000
privacy 2330000000
up 2300000000
help 2280000000
has 2270000000
mail 2270000000
which 2240000000
do 2220000000
only 2180000000
see 2160000000
view 2160000000
use 2130000000
web 2120000000
our 2120000000
these 2110000000
but 2110000000
terms 2080000000
my 2060000000
we 2030000000
me 2020000000
out 2020000000
been 2000000000
when 1970000000
information 1950000000
la 1940000000
they 1940000000
there 1910000000
email 1900000000
free 1840000000
like 1840000000
next 1830000000
online 1820000000
date 1820000000
name 1810000000
index 1790000000
links 1790000000
over 1790000000
their 1790000000
first 1760000000
am 1750000000
id 1750000000
policy 1740000000
powered 1740000000
news 1730000000
please 1730000000
last 1690000000
service 1690000000
here 1660000000
add 1650000000
back 1640000000

You can use the online version of my keyword popularity script here:

Keyword popularity tool

Most common letters on the web

Which are the most common letters on the web?

To find the answer, I used my Perl script, which gets the popularity (according to Google) of every word in a keyword list.

So I have a list of the letters ordered from the most popular (or most common) on the web to the least.

The results are very interesting and reflect the predominant language of the web: English.

Letter Popularity
A 7150000000
I 4070000000
E 3950000000
S 3660000000
O 2840000000
T 2820000000
C 2720000000
D 2700000000
M 2560000000
N 2460000000
L 2190000000
P 2040000000
B 2020000000
F 1880000000
Y 1780000000
X 1720000000
V 1680000000
G 1670000000
W 1670000000
R 1610000000
K 1520000000
H 1490000000
J 1470000000
Z 1470000000
U 1320000000
Q 1130000000
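Ordering the tables in these posts is just a descending sort on the count. The sketch below also expresses each count as a percentage of the top entry, which makes the gaps easier to read. The input uses four counts taken from the letter table above. The percentage column is derived arithmetic, not something Google reports.

```python
from typing import Dict, List, Tuple


def rank(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Sort (term, count) pairs by count, highest first, adding % of the top count."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ordered:
        return []
    top = ordered[0][1]
    return [(term, n, round(100 * n / top, 1)) for term, n in ordered]


if __name__ == "__main__":
    letters = {"E": 3950000000, "A": 7150000000, "I": 4070000000, "Q": 1130000000}
    for term, n, pct in rank(letters):
        print(f"{term:<3}{n:>15,}{pct:>7}%")
```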

You can use the online version of this tool here: Keyword popularity tool

Most popular countries on the web

This is a list of the most popular countries on the World Wide Web according to Google.
I made a Perl script to do this, and you can get really interesting results with it.

How does it work?

First I write or download a keyword list, in this case a list with the names of all the countries in the world.

Then I run the script with some parameters that set how I want the Google results, the database, the table, the Google search parameters, etc…
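The script’s actual options aren’t listed here, so the following is only an illustration of that kind of parameterisation using Python’s argparse. Every flag name and default is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All option names and defaults are hypothetical, for illustration only.
    parser = argparse.ArgumentParser(description="Keyword popularity batch run (illustrative)")
    parser.add_argument("keyword_file", help="text file with one keyword or phrase per line")
    parser.add_argument("--database", default="seo_research", help="target database name")
    parser.add_argument("--table", default="keyword_popularity", help="table for the results")
    parser.add_argument("--lang", default="en", help="search language parameter")
    parser.add_argument("--country", default=None, help="optional country restriction")
    return parser


if __name__ == "__main__":
    # Parse an explicit list so the example runs without real command-line arguments.
    args = build_parser().parse_args(["countries.txt", "--table", "countries"])
    print(args.keyword_file, args.database, args.table, args.lang, args.country)
```

Keeping the database, table and search settings as arguments is what lets the same script produce the word, letter and country lists without code changes.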

The script gets the Google results, parses them and saves the data I want to a MySQL database, in this case the keyword and its popularity in Google.
Then… I wrote this article.

Obviously this isn’t an exact measure of each country’s popularity on the internet, because some names get popularity from other sources: for example, Jordan is both a country and the surname of the basketball player. There are also a lot of other factors that influence the popularity of any keyword.

You can use the online version of this tool to get the popularity of any keyword at: Keyword popularity tool

Here are the results:

Country Popularity
France 1090000000
USA 754000000
China 727000000
Japan 651000000
Canada 633000000
Germany 593000000
Mexico 567000000
Australia 411000000
United Kingdom 399000000
Spain 394000000
Argentina 391000000
Italy 389000000
India 370000000
United States of America 363000000
Portugal 360000000
Ireland 347000000
South Georgia 312000000
Austria 303000000
Korea, South 298000000
Korea 292000000
Venezuela 288000000
Taiwan 284000000
Chile 280000000
South Africa 272000000
Brazil 263000000
Indonesia 261000000
Hong Kong 260000000
Georgia 258000000
Israel 255000000
Vietnam 245000000
Singapore 235000000
Jersey 228000000
Peru 227000000
Colombia 226000000
Thailand 225000000
Netherlands 219000000
Poland 214000000
Ecuador 206000000
Sweden 205000000
Russia 204000000
Jordan 204000000
Uruguay 203000000
Romania 202000000
Iraq 202000000
Malaysia 200000000
Ukraine 197000000
Iran 192000000
Turkey 190000000
Panama 186000000
Greece 185000000
Czech Republic 184000000
Belgium 182000000
Switzerland 182000000
Bulgaria 176000000
Finland 171000000
Costa Rica 170000000
Cuba 170000000
Egypt 169000000
Philippines 168000000
Malta 168000000
Hungary 168000000
Denmark 167000000
Pakistan 166000000
New Zealand 166000000
Saudia Arabia 163000000
Luxembourg 163000000
Guatemala 163000000
Korea, North 162000000
Paraguay 159000000
Estonia 158000000
Monaco 156000000
Kuwait 155000000
Mali 154000000
Reunion 153000000
Norway 152000000
Puerto Rico 151000000
Bolivia 150000000
Honduras 150000000
Nicaragua 148000000
Guadeloupe 147000000
Liechtenstein 143000000
Martinique 142000000
Afghanistan 139000000
Kenya 139000000
Nepal 137000000
Senegal 137000000
Latvia 136000000
Andorra 133000000
Oman 132000000
Nigeria 129000000
Slovakia 129000000
Lithuania 129000000
Bahamas 128000000
Bangladesh 126000000
El Salvador 123000000
Cyprus 120000000
Barbados 119000000
Palau 119000000
Jamaica 118000000
Ghana 118000000
Croatia 118000000
Sri Lanka 118000000
Guinea 117000000
Slovenia 117000000
Gibraltar 117000000
Samoa 116000000
Belize 115000000
Guam 115000000
Zimbabwe 115000000
Togo 114000000
Angola 114000000
Qatar 113000000
Laos 112000000
Bermuda 111000000
Guyana 111000000
Lebanon 109000000
Burundi 108000000
Niger 108000000
Aruba 108000000
Bosnia 107000000
Botswana 106000000
Montserrat 106000000
Malawi 105000000
Mauritius 105000000
Tanzania 104000000
Syria 104000000
Lesotho 103000000
Kazakhstan 103000000
Uganda 103000000
Sudan 102000000
Vanuatu 102000000
Grenada 102000000
Namibia 102000000
Fiji 101000000
Bahrain 101000000
Iceland 101000000
Macedonia 101000000
Liberia 101000000
Haiti 99700000
Madagascar 99600000
Belarus 99200000
Turkmenistan 98500000
Dominica 98100000
Anguilla 96700000
Bhutan 96300000
Chad 96300000
Micronesia 96200000
Tonga 96200000
Turks 96100000
Suriname 96000000
United Arab Emirates 95900000
Albania 95500000
Yemen 95400000
Congo 94900000
Gambia 94900000
Mongolia 94500000
Wallis 94200000
Rwanda 93400000
Kiribati 93400000
Moldova 93400000
Uzbekistan 93300000
Tuvalu 93100000
Benin 92700000
Nauru 92400000
Gabon 92300000
Mozambique 92100000
Seychelles 91800000
Armenia 91500000
Morocco 91300000
Zambia 90800000
Somalia 90300000
Cambodia 89100000
Tunisia 88500000
Herzegovina 87100000
Mayotte 86900000
Eritrea 86700000
Djibouti 86000000
Swaziland 85700000
Azerbaijan 85600000
Maldives 84500000
Trinidad and Tobago 80900000
Niue 80700000
Central African Republic 80600000
Tajikistan 80000000
Macau 79800000
Algeria 79100000
Kyrgyzstan 76600000
Mauritania 76600000
Burkina Faso 76300000
Cameroon 74800000
Greenland 71200000
Malvinas 71100000
Ethiopia 71000000
Sierra Leone 70400000
Comoros 70000000
Libya 69600000
Congo, Republic of 68300000
San Marino 66800000
Myanmar 66300000
Antigua and Barbuda 64400000
Montenegro 64000000
Serbia 63800000
Dominican Republic 63300000
Norfolk Island 61700000
Bosnia and Herzegovina 59400000
Christmas Island 57600000
Marshall Islands 56400000
Cook Islands 55700000
Guinea-Bissau 55100000
Papua New Guinea 54600000
American Samoa 53300000
Cayman Islands 53200000
Saint Kitts and Nevis 53200000
Cape Verde 53000000
United States Virgin Islands 51400000
Netherlands Antilles 51200000
Equatorial Guinea 50200000
Solomon Islands 49100000
Cote d’Ivoire 48400000
Saint Lucia 46100000
New Caledonia 45700000
British Virgin Islands 45000000
Yugoslavia 44700000
Falkland Islands 42200000
Antarctica 41500000
Caicos Islands 40000000
Turks and Caicos Islands 39200000
Tokelau 36100000
French Polynesia 35200000
Sao Tome and Principe 33200000
Saint Helena 32600000
Western Sahara 31400000
Saint Vincent and the Grenadines 30900000
Congo, Democratic Republic of 29600000
Svalbard 29500000
Northern Mariana Islands 29100000
French Guiana 28900000
EEUU 25100000
USSR 24000000
Wallis and Futuna Islands 21400000
Futuna Islands 19200000
Faroe Islands 19100000
Guernsey 18300000
Saint Pierre and Miquelon 13400000
French Southern Territories 13200000
Pitcairn Island 12700000
Brunei Darussalam 12000000
Falkland Islands (Malvinas) 5320000
Isle of Man 4730000
Åland 4630000
Heard and McDonald Islands 3290000
British Indian Ocean Territory 3110000
Bouvet Island 2840000
Vatican City State (Holy See) 2660000
Jan Mayen Islands 2590000
South Sandwich Islands 2550000
South Georgia South Sandwich Islands 2480000
Cocos (Keeling) Island 2480000
Timor-Leste 2470000
Svalbard and Jan Mayen Islands 2450000
Ascension Island 2390000
US Minor Outlying Islands 2360000
Palestinian Territory, Occupied 2110000

Google Parser Online Tool Upgraded

Today I had a couple of minutes and improved my Google Parser online tool. Now you can get a clean list of hyperlinks, so you can quickly open the returned URLs in your browser.

Of course, there is still the option to get a clean list of the Google search result URLs as plain text.
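The two output modes differ only in formatting. Here is a minimal sketch, assuming the tool already has the URLs as a list of strings (the function names are hypothetical). Using html.escape keeps characters such as & in query strings from breaking the markup.

```python
from html import escape
from typing import List


def as_text_list(urls: List[str]) -> str:
    """Plain-text mode: one URL per line."""
    return "\n".join(urls)


def as_link_list(urls: List[str]) -> str:
    """Hyperlink mode: each URL becomes a clickable HTML anchor on its own line."""
    return "\n".join(f'<a href="{escape(u)}">{escape(u)}</a><br>' for u in urls)


if __name__ == "__main__":
    urls = ["http://example.com/", "http://example.org/?q=seo&page=2"]
    print(as_text_list(urls))
    print(as_link_list(urls))
```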

You can read the original post of this tool at: Get Google results in a list of clean URLs

Or you can use the online tool at: Google Parser

Any comments or suggestions are welcome.

Error in Google Webmaster Tools

The error is in the “pages that link to yours” option. It isn’t accurate: it doesn’t show all the sites that really link to yours, so the list of external links to your site is incomplete.

I noticed this with the new site I’m writing. Google Webmaster Tools shows only 3 links to my site, but there are actually more. I wrote other web pages with links to my site (and they were already indexed by Google), so they should appear as sites linking to mine, but they don’t appear in the “pages that link to yours” option of Google Webmaster Tools.

So… here comes the idea: I wrote a tool to find all the web pages that link to my site.

With this tool I found the real total number of web pages on the internet with links to my site, and I can visit those pages to verify that they really link to my site. For my sites, the results are 100% accurate.

Now my online tool shows you how many sites link to your site, along with the first 10 of those sites.

You can try it here: GooLinks

You should know the difference between this and the Google “link:” operator. That operator returns only the links to an exact URL, not to all the web pages of a site.

For example, you can test this tool with my URL, “goohackle.com”. If you use the “link:” operator in Google, it returns nothing; if you try my tool, it returns web pages with links to my site, which is the correct result.
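The difference comes down to what counts as a match: one exact URL, or any page on the site. The post doesn’t say how candidate pages are found, so the sketch below assumes you already have a page’s HTML and URL. It only checks whether that page links to any URL whose host is the target domain or one of its subdomains. This is a Python illustration with hypothetical function names, not the code behind GooLinks.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_matches(host: str, domain: str) -> bool:
    """True for the domain itself or any subdomain (www.example.com matches example.com)."""
    host, domain = host.lower().rstrip("."), domain.lower()
    return host == domain or host.endswith("." + domain)


def links_to_site(page_html: str, page_url: str, domain: str) -> bool:
    """True if an external page links to any URL on `domain`, whatever the path."""
    page_host = urlparse(page_url).hostname or ""
    if host_matches(page_host, domain):
        return False  # a page on the site itself is not an external backlink
    collector = _LinkCollector()
    collector.feed(page_html)
    collector.close()
    for href in collector.hrefs:
        host = urlparse(urljoin(page_url, href)).hostname  # resolves relative links
        if host and host_matches(host, domain):
            return True
    return False


if __name__ == "__main__":
    page = '<p>See <a href="http://www.goohackle.com/tools/">these tools</a></p>'
    print(links_to_site(page, "http://example.org/post", "goohackle.com"))
```

An exact-URL check would only accept links to one specific address. Matching on the host instead is what lets a link to any post or subpage count toward the site.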

You can read a little more about my tool here: Who links to me