
Yahoo Domain Result Grabber



I released my PHP Google Grabber script about a month ago and it was a big hit, even spawning Python and Groovy versions. Getting the number of pages Google has indexed for a site just by supplying its domain name (or several domains, if you call the function in a loop) can save you a lot of time. I run the script monthly to keep track of my customers' websites. Many of them use CMSs we've built, so I get to take a peek at how they're doing SEO-wise.

Although Yahoo! isn't nearly as relevant as Google in the search department, Yahoo! is still the most visited website on the internet. Since I already had the basic framework of the code built (from my Google Grabber), I thought it might be beneficial to take a few moments to Yahoo!ize it.

The PHP Code

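<?php
/* return result numbers (pages and inlinks) for a domain */
function get_yahoo_results($domain = 'davidwalsh.name')
{
	// get the result content
	$content = file_get_contents('http://siteexplorer.search.yahoo.com/search?p=site:http://'.$domain);
	// parse to get results; the pipe is escaped so it matches the literal " | " separator
	// (an unescaped | would let the pattern match an empty string and capture nothing)
	$pages = str_replace(array(' ',')','('),'',get_match('/Pages (.*) \|/isU',$content));
	$inlinks = str_replace(array(' ',')','('),'',get_match('/Inlinks (.*) /isU',$content));
	$return = array();
	$return['pages'] = $pages ? $pages : 0;
	$return['inlinks'] = $inlinks ? $inlinks : 0;
	// return result
	return $return;
}
/* helper: does the regex */
function get_match($regex,$content)
{
	preg_match($regex,$content,$matches);
	// first captured group, or an empty string when nothing matched
	return isset($matches[1]) ? $matches[1] : '';
}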
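To see what the two patterns pull out, the sketch below runs them offline against a sample string instead of a live response. It mirrors the patterns in get_yahoo_results(), with the pipe in the Pages pattern escaped so it matches a literal separator. The "Pages (204) | Inlinks (518)" layout is an assumption about how the Site Explorer page text read, not captured markup. parse_yahoo_counts() is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: applies the same patterns and cleanup used in
// get_yahoo_results() to a string rather than a fetched page.
function parse_yahoo_counts($content)
{
	$counts = array();
	$patterns = array(
		'pages'   => '/Pages (.*) \|/isU', // a literal " | " ends the pages count (assumed layout)
		'inlinks' => '/Inlinks (.*) /isU', // the next space ends the inlinks count
	);
	foreach ($patterns as $key => $regex) {
		$value = '';
		if (preg_match($regex, $content, $matches)) {
			// strip spaces and parentheses, e.g. "(204)" becomes "204"
			$value = str_replace(array(' ', ')', '('), '', $matches[1]);
		}
		// same fallback as the original: 0 when nothing was found
		$counts[$key] = $value ? $value : 0;
	}
	return $counts;
}

// Assumed page text for illustration only
$sample = 'Results for davidwalsh.name Pages (204) | Inlinks (518) | ...';
print_r(parse_yahoo_counts($sample));
```

Because the patterns are ungreedy (the U modifier), each capture stops at the first delimiter that follows it, not the last one on the page.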

The Usage

$domains = array('davidwalsh.name','digg.com','yahoo.com','cnn.com','dzone.com','some-domain-that-doesnt-exist.com');
foreach($domains as $domain)
{
	$result = get_yahoo_results($domain);
	echo $domain,': ',$result['pages'],' pages, ',$result['inlinks'],' inlinks',"\n";
}
//davidwalsh.name: 204 pages, 518 inlinks
//digg.com: 20,700,000 pages, 14,300,000 inlinks
//yahoo.com: 1,290,000,000 pages, 4,650,000 inlinks
//cnn.com: 7,510,000 pages, 1,090,000 inlinks
//dzone.com: 776,000 pages, 15,000 inlinks
//some-domain-that-doesnt-exist.com: 0 pages, 0 inlinks
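The counts come back as formatted strings such as "20,700,000". Comparing them directly, for example against last month's run, compares text instead of numbers. Here is a minimal sketch that assumes you keep the raw values get_yahoo_results() returns. yahoo_count_to_int() is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: turns a formatted count such as "1,290,000,000"
// (or the integer 0 returned for a missing result) into an integer.
function yahoo_count_to_int($count)
{
	// drop thousands separators and anything else that isn't a digit
	$digits = preg_replace('/[^0-9]/', '', (string) $count);
	return $digits === '' ? 0 : (int) $digits;
}

// Values taken from the sample output above
$yahoo = yahoo_count_to_int('1,290,000,000');
$dzone = yahoo_count_to_int('776,000');

// As strings, "1,290,000,000" sorts before "776,000" because the comparison is textual
var_dump('1,290,000,000' < '776,000'); // bool(true)
// As integers, the order is correct
var_dump($yahoo > $dzone);             // bool(true)
```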

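As with my Google Grabber, you may need to change how the script connects to Yahoo! depending on your hosting environment. If file_get_contents() can't open remote URLs on your host (for example, because allow_url_fopen is disabled), cURL may be the better option.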
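A cURL-based fetch could look like the following. This is a minimal sketch that assumes the PHP cURL extension may or may not be installed, so it falls back to file_get_contents() when it isn't. fetch_page() and its $timeout parameter are hypothetical names, meant to replace the file_get_contents() call inside get_yahoo_results().

```php
<?php
// Hypothetical helper: fetch a URL with cURL when the extension is loaded,
// otherwise fall back to file_get_contents(). Returns the body, or false on failure.
function fetch_page($url, $timeout = 10)
{
	if (!function_exists('curl_init')) {
		return @file_get_contents($url);
	}
	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed to connect
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request
	$body = curl_exec($ch);
	// the handle is released when $ch goes out of scope
	return $body;
}

// Inside get_yahoo_results(), the fetch line would become:
// $content = fetch_page('http://siteexplorer.search.yahoo.com/search?p=site:http://'.$domain);
```

Keeping the fallback means the same script runs on hosts with and without cURL. The timeouts keep a slow response from hanging a loop over many domains.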



Opinions expressed by DZone contributors are their own.
