Archive for the ‘ Google Results ’ Category

Google Adwords Scraper – Expose your competition

We recently developed an improved version of our online Google scraper that parses not only the organic results but also the AdWords listings, the paid or sponsored results that Google shows for different keywords, languages and countries.

The reach and value of the information we can gather with our Google scraping technology is huge. Imagine the online marketing research you can do to discover and expose your competition. Knowing which competitors are strongest in the organic results for your target keywords and, sometimes more importantly, which competitors are spending the most money on AdWords campaigns in your niche or industry, puts you a step ahead of most of your competition.

Another practical use of our Google AdWords scraper is industry analysis: collecting statistical information from thousands of keywords shows how markets are moving and which industries are trending up or down. This information is essential, even critical, for any online project, and nowadays almost everything has an online dimension, so it’s valuable for almost any new or existing project that can benefit from e-marketing research.

It is also becoming an important input for the SWOT analysis that is an essential step in almost every new business venture. Competition analysis and industry research are fundamental to every well-done SWOT analysis. It gets even more powerful when you cross that information with Google AdWords price estimates for the keywords: you can even estimate your competitors’ budgets. Not exactly, of course, but you can get a good idea of how much particular projects and websites, or a whole industry, are spending.

Of course, developing a system able to run a huge number of queries for millions of keywords isn’t easy. Our real-time parsers can process a lot of information, and our system was designed with scalability as one of the main concerns. That’s one of the reasons we now have a Google scraper capable of AdWords scraping too, without limitations.
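As a rough illustration of the parsing step only (this is not our actual parser; the HTML and the "ad"/"g" class names below are made-up stand-ins, since Google’s real markup changes constantly), separating sponsored results from organic ones comes down to matching the two kinds of result blocks:

```python
import re

# Hypothetical SERP fragment: the class names are placeholders,
# not Google's real markup.
SAMPLE_SERP = """
<div class="ad"><a href="http://sponsor.example">Sponsor</a></div>
<div class="g"><a href="http://organic.example">Organic</a></div>
<div class="ad"><a href="http://sponsor2.example">Sponsor 2</a></div>
"""

def split_results(html):
    """Separate sponsored ('ad') links from organic ('g') result links."""
    ads = re.findall(r'<div class="ad"><a href="([^"]+)"', html)
    organic = re.findall(r'<div class="g"><a href="([^"]+)"', html)
    return ads, organic

ads, organic = split_results(SAMPLE_SERP)
print(ads)      # sponsored URLs
print(organic)  # organic URLs
```

In a live scraper the HTML comes from a real request and the selectors have to track whatever markup Google currently serves.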

Google Pacman Bug

Today Google is doing an interesting thing with its logo: an interactive Pacman game. Google’s creativity is still awesome!

After playing for a few seconds, I had to leave my Iceweasel window in the background and continue with some work in the console. When I came back to my browser, I realized that my pacman was in a spot the ghost monsters can’t reach, a bug in the Google Pacman game!

Google Pacman Bug

Google Pacman Bug Screenshot

Anyway, this Google Pacman bug isn’t a serious problem, just a funny detail. You can’t do anything with your pacman in the game by using this bug to stay still in that spot.

The Google game has now been sitting in that spot for more than an hour. I think eventually some ghost will reach the pacman.

Google Pacman Game Bug after some minutes

Google Pacman Game Bug

This is just a quick post, feel free to comment and share with us if you can reproduce the bug!

Google SOAP API key end, XML PHP solution

The Google Search SOAP API key service is about to end, in August 2009.

A couple of years ago I developed a PHP XML parser to get Google search results as a list of URLs, like a Google API for PHP (developed in that language).

Next, a small introduction to what this Google Search API is and what you can do with it…

This SOAP API returns a Google search result in XML format, so you can use it in a lot of applications: online SEO tools, other SEO/online-marketing applications that collect information, etc.

Anyway, this API, like the new AJAX Search API, is very good for small applications but also has a lot of restrictions. Even so, the old SOAP API was better than the new AJAX one for big applications that need to gather a lot of useful and valuable SEO information.

So, some years ago, thinking about the Google SOAP API’s restrictions and the further restrictions it might acquire, I started looking for a way to have no restrictions at all… and that was the beginning of the Google Parser idea.

Then, as time passed, a lot of improvements were made to the online version of the parser. For example: improvements to the XML PHP parser, parsing Google results for different countries, and obtaining not only the URLs but also the titles, descriptions, positions and even the sponsored links.

Now, with the end of the Google SOAP Search API, there’s a need to either change the systems that use it or develop another Google search API, for example, a Google API in PHP.

This could be useful for developers whose applications depend on the Google SOAP API. They have two choices: develop a system that emulates the Google API and use it in their applications without any changes, or modify their applications some other way to do what they did with the original API.

The first option is the best one for me, and I’ll probably develop the Google API in PHP to run some tests.

Nowadays in my research I use the results in plain text, but since the API uses XML, I will probably develop and test a way to get Google results in XML. It’s very easy if you already have the core system developed: just add the XML tags to the output and you’re done.
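As a small sketch of that last step (in Python here for brevity, and with element names that are purely illustrative, not the real SOAP response schema), wrapping the parsed results in XML is just string assembly plus escaping:

```python
from xml.sax.saxutils import escape

def results_to_xml(results):
    """Wrap (url, title) pairs in minimal XML. The <results>/<item>
    element names are invented for this sketch; a real drop-in
    replacement would mimic the old SOAP response structure."""
    items = "".join(
        "<item><url>%s</url><title>%s</title></item>"
        % (escape(url), escape(title))
        for url, title in results
    )
    return "<results>%s</results>" % items

print(results_to_xml([("http://example.com", "Example & Co")]))
```

The `escape` call matters: titles scraped from result pages routinely contain `&` and `<`, which would otherwise produce invalid XML.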

Break Google captcha

Here I’m gonna write about how I broke the Google captcha, or “automatically bypassed” it, so that one of my online tools (Google Parser) can run with a lot of requests and without my intervention.

You probably know the Google “Sorry” error page (503); next I’m going to explain how to solve and bypass it.

First: What’s the problem to solve?

I have an online tool that makes requests to Google and gets the search results. When it makes too many requests, Google bans it, and I need to type the letters in the Google captcha to be able to continue making requests.

Google Error

Second: How does the Google captcha ban work?

In a few words: when Google receives a lot of requests from the same IP (among a lot of other variables), it assumes the requests are being made by an automatic script or spyware. Google then bans that IP until you type the letters of the captcha. If you type the correct letters, Google returns a cookie that means “I’m a human, give me the search results”, and then you can continue making Google requests.

Third: Programming the solution…

The solution to “break the Google captcha” is nothing difficult or brilliant: just show the captcha to the user of the tool, let them type in the letters, send that to Google, and save the cookie to continue with the requests.
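A minimal sketch of that hand-off (the network calls are stubbed out, and the cookie name below is an invented placeholder, not the cookie Google actually sets):

```python
# Sketch of the captcha hand-off: on a 503 "sorry" response, forward the
# captcha to the human user, submit their answer, and keep the returned
# cookie so later requests pass through unchallenged. In the real tool
# the stubs are HTTP requests to Google.

class CaptchaGate:
    def __init__(self):
        self.cookie = None  # the "I'm a human" cookie, once obtained

    def handle(self, status, solve_captcha):
        """React to an HTTP status. `solve_captcha` stands in for
        showing the captcha image to the user and reading their answer."""
        if status == 503 and self.cookie is None:
            answer = solve_captcha()
            # Placeholder: in reality the cookie comes back from Google
            # after submitting the captcha answer.
            self.cookie = "GOOGLE_ABUSE=" + answer
        return self.cookie

gate = CaptchaGate()
gate.handle(503, lambda: "xkcd")  # user types the letters once
print(gate.cookie)                # reused for every later request
```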

Google captcha defeated

This is the final solution, very simple, but getting there wasn’t. To make it work I had to be very careful with the details of the HTTP requests and beat a few Google tricks.

Fourth: The results…

Now the script is running; it can manage any number of requests, there’s no time or volume limit, and the Google captcha isn’t a problem. :D

The phrase “break Google captcha” isn’t the most accurate description of this, but I used it because this post is part of my SEO research too…

Keyword popularity script

I wrote a Perl script that, from a keyword list, gets the popularity of the words or phrases on each line of the list.

To get the popularity of a keyword, I fetch the Google results page for that keyword and then parse it.

I repeat this for all the keywords in the list, saving the data to a database.

The result is a list of words, keywords or phrases with the popularity of each of them according to Google.
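The original script is in Perl; here is a Python sketch of the same loop. The “About N results” phrasing is an assumption about the results-page format, and the dict stands in for both the HTTP fetch and the database:

```python
import re

def result_count(html):
    """Pull the 'About 1,234 results' figure out of a results page.
    The exact phrasing is assumed, not guaranteed by Google."""
    m = re.search(r"About ([\d,]+) results", html)
    return int(m.group(1).replace(",", "")) if m else 0

def popularity(keywords, fetch):
    """For each keyword, fetch its results page and record the count.
    `fetch` stands in for the HTTP request the real script makes;
    the returned dict stands in for the database rows."""
    return {kw: result_count(fetch(kw)) for kw in keywords}

pages = {"seo": "About 12,300 results", "foo": "About 900 results"}
print(popularity(["seo", "foo"], pages.get))
```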

There are a lot of things you can do by parsing Google results; the amount of information you can get is huge. You only need to understand the results and know what you are searching for.

And this isn’t only for Google: parsing the results of different search engines, news sites, social networks, etc., the amount of information you can get is practically infinite…

You can use the online version of this tool here: Keyword popularity tool

Or read my first post about this webmaster online tool here: Online keyword popularity tool gives interesting results

Google Parser Online Tool Upgraded

Today I had a couple of minutes, so I improved my Google Parser online tool. Now you can get a clean list of hyperlinks, so you can quickly visit the returned URLs in your browser.

Of course, there’s still the option to get a clean list of plain-text URLs from the Google search results.

You can read the original post of this tool at: Get Google results in a list of clean URLs

Or you can use the online tool at: Google Parser

Any comments or suggestions are welcome.

JBoss Security vulnerability JMX Management Console

Awesome! A lot of servers have their JBoss Management Console open to the world: no authentication, no password, no security! A huge and silly vulnerability!

Any remote user can completely control the server: full control over a lot of the server configuration, plus disclosure of internal network and infrastructure information. You can change the web service’s listening port (I tested this on one of them, then put the original port back), view internal IPs, start connections to a client, see a lot of the server’s absolute paths, change security configuration… too much power with almost no knowledge needed.

These vulnerable JBoss servers leave open access for anybody to jmx-console and web-console, the online administration tools of JBoss.

There are still a lot of these silly vulnerabilities on the Internet… it’s not a JBoss vulnerability, it’s a people vulnerability!

Oh, I almost forgot… you can find all the vulnerable servers using my online Google Parser tool, which I wrote a couple of weeks ago. With it you can get a clean list of all the vulnerable sites by searching for:

intitle:"jboss management console" "application server" version inurl:"web-console"

or

intitle:"JBoss Management Console – Server Information" "application server" inurl:"web-console" OR inurl:"jmx-console"

You can try different Google search strings and get a clean list of URLs from the Google search results with my Google Parser online tool.

It’s amazing how developers and network administrators still don’t pay real attention to security!

Error in Google Webmaster Tools

The error is in the “pages that link to yours” option. It isn’t accurate: it doesn’t show all the sites that really link to yours. The external links to your site aren’t reported accurately.

I noticed this with this new site I’m writing. Google Webmaster Tools shows only 3 links to my site, but there are really more. I wrote other web pages with links to my site (they were already indexed by Google), so they should appear as sites with links to mine, but in the “pages that link to yours” option of Google Webmaster Tools they don’t appear.

So… here comes the idea: I wrote a tool to find all the web pages that link to my site.

With this tool I found the real total number of web pages on the Internet with links to my site, and I can visit those pages to verify that they really link to it. For my sites, the results are 100% accurate.

Now my online tool shows you how many sites link to yours and the first 10 of those sites.

You can try it here: GooLinks

You should know the difference between this and the Google “link:” operator. That operator returns only the links to an exact URL, not to all the web pages of a site.

For example, you can test this tool with my URL, “goohackle.com”. If you use the “link:” operator in Google, it returns nothing; if you try my tool, it returns web pages with links to my site, the correct, expected result.

You can read a little more of my tool here: Who links to me

Who links to me

I wrote this simple online tool to find the external links to any website: who, and how many websites on the Internet, have links to any page of your website or blog.

It uses a simple Google query, applying basic knowledge of Google operators, to find sites with links to you (backlinks or external links).

It doesn’t use the Google “link:” operator because, for example, “link:goohackle.com” returns only the links to the goohackle.com home page; if a link points to another page of goohackle.com, you won’t find it with that operator.

It then returns the number of websites with links to yours and the list of URLs that link to your site.
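The post doesn’t publish the exact query, so as a guess at the “basic Google operators” approach, it could be something like searching for the domain as a phrase while excluding the domain’s own pages:

```python
def backlink_query(domain):
    """Build a search string for pages mentioning a domain, excluding
    the domain itself. This is an illustrative guess at the kind of
    query the tool uses, not its published implementation."""
    return '"%s" -site:%s' % (domain, domain)

print(backlink_query("goohackle.com"))
```

Unlike `link:`, a query like this matches mentions of any page of the site, at the cost of also matching plain-text mentions that aren’t actual hyperlinks.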

I find it useful for webmasters to get a measure of their website’s popularity on the Internet and to know who links to their website or blog, the sites that link to theirs.

You can use the online SEO tool here: Who links to me (GooLinks)

or go to SEO Tools section and take a look at my other web tools.

You can read how this tool was born here: Error in Google Webmaster Tools

Online keyword popularity tool gives interesting results

I was testing Web 2.0 technologies and wrote a tool using Ajax and web services to get the popularity of any keyword on the web using Google search engine results.

Google search amount of results

This can be very useful for webmasters to see how much competition exists on the web for some keyword or phrase, and to perform better SEO.

Also, the results obtained for some words compared with others are very interesting.

You can use it here: Keyword Popularity Tool

I’m working on a couple of other interesting tools related to Google, SEO, the web and analysis around that… when I have time I’ll publish them here…