How to Collect the Images and Meta Tags from a Webpage with PHP

By Stoimen Popov · Feb. 25, 11 · News

Meta Tags and the Facebook Example

You’ve probably seen the “share a link” screen on Facebook. When you paste a link into the box (fig. 1) and press the “Attach” button, the linked site is parsed and shown with a title, a description and possibly a thumbnail (fig. 2). This functionality is best known from Facebook, but other social services offer it as well; LinkedIn and Reddit, for example, use it.

fig. 1 - Facebook Attach a Link Prompt Screen

The first thing to notice is that the information shown by Facebook is the same as the page's meta tag information. However, there is a slight difference.

fig. 2 - Facebook Attached Link Screen

For the thumbnail, Facebook prefers the image set in the <meta property="og:image" … /> tag. In the case above, this tag is:

<meta property="og:image" content="http://b.vimeocdn.com/ts/572/975/57297584_200.jpg" />

The image referenced in its content attribute is exactly the one shown by Facebook (fig. 3).

fig. 3 - Vimeo Thumb

The first thing to note is that the real thumbnail is bigger than the one shown in Facebook, so Facebook resizes it. The second is that the page has more meta tags in the og:… format.


Meta Tags and The Open Graph Protocol

Meta tags contain various information about a web page. They are not visible on the page itself. The most common ones are the title, description and keywords tags. They contain, respectively, the title of the page (note that this can differ from the <title> tag), a short description of the page, and some keywords describing its content. They are also well known because search engines use them when collecting information about a page, which makes them part of the SEO process.

However, the default HTML meta tags cannot express everything. For example, you cannot specify the preferred thumbnail for a webpage. The solution is the Open Graph Protocol, which adds meta tags that carry more, and more valuable, information. One such tag is og:image. Note that all Open Graph (og) meta tags are identified by the og: prefix before the property name: og:image is for images, while og:longitude is for geo positioning.

That’s really useful, but how can you read them?

PHP, Meta Tags and Regexps

When you try to read information from a webpage's source, the first approach that comes to mind is regular expressions. However, PHP offers some useful built-in functions, such as get_meta_tags(). As you may guess, this function reads the meta tags from a given URL.

$a = get_meta_tags('https://vimeo.com/10758212');
var_dump($a);
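
To see what get_meta_tags() returns without depending on a live site, here is a minimal sketch that writes a small page to a temporary file and parses it. The sample markup, the example.com URL and the variable names are assumptions made up for this illustration, not taken from the Vimeo page.

```php
<?php
// Hypothetical sample page; the markup is invented for this illustration.
$html = <<<HTML
<html>
<head>
<title>Sample video</title>
<meta name="description" content="A short sample description">
<meta name="keywords" content="php, meta, tags">
<meta property="og:image" content="http://example.com/thumb.jpg">
</head>
<body></body>
</html>
HTML;

// get_meta_tags() accepts a local file path as well as a URL.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

$tags = get_meta_tags($path);
unlink($path);

// The result is an array keyed by the value of each tag's name attribute.
var_dump($tags);
```

The dumped array should contain the description and keywords entries, but nothing for the og:image tag.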


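However, this function can’t read Open Graph tags, because it only picks up meta tags that have a name attribute, while Open Graph tags use property. So in the end you have to use some regexps.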

$source = file_get_contents('https://vimeo.com/10758212'); // fetch the page HTML
preg_match('/<meta property="og:image" content="(.*?)" \/>/', $source, $matches);
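
The regexp above only matches when the attributes appear in exactly that order, with double quotes and a closing " />". As an alternative to regexps, here is a rough sketch that reads every og: tag with PHP's DOM extension, assuming that extension is available. The helper name collect_og_tags() and the sample markup are hypothetical.

```php
<?php
/**
 * Collect all Open Graph meta tags from an HTML string.
 * Hypothetical helper; returns an array such as ['og:image' => '...'].
 */
function collect_og_tags(string $source): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Keep only the tags whose property starts with the og: prefix.
        if (strpos($property, 'og:') === 0) {
            $tags[$property] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Made-up sample markup with mixed quoting and attribute order.
$source = '<html><head>'
    . "<meta content='http://example.com/thumb.jpg' property='og:image'>"
    . '<meta property="og:title" content="Sample video" />'
    . '</head><body></body></html>';

print_r(collect_og_tags($source));
```

Because the parser works on attributes rather than on the raw text, the order and quoting of the attributes no longer matter.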


Now you can grab the og:image tag. You can go even further and grab every image (<img>) from that page.

// Match src even when it is not the first attribute; the URLs end up in $m[1].
preg_match_all('/<img[^>]*\ssrc="(.*?)"/i', $source, $m);
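
The same image collection can also be sketched with the DOM extension, which copes with single quotes and skips images that have no src. The helper name collect_image_sources() and the sample markup are assumptions for illustration only.

```php
<?php
/**
 * Return the unique, non-empty src values of all <img> tags in an HTML string.
 * Hypothetical helper; requires PHP's DOM extension.
 */
function collect_image_sources(string $source): array
{
    $doc = new DOMDocument();
    // Suppress warnings caused by imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    // Drop repeated images while keeping the order of first appearance.
    return array_values(array_unique($sources));
}

// Made-up sample markup for illustration.
$page = '<html><body>'
    . '<img alt="Logo" class="logo" src="/logo.png">'
    . "<img src='http://example.com/a.jpg'>"
    . '<img src="/logo.png">'
    . '<img alt="no source">'
    . '</body></html>';

print_r(collect_image_sources($page));
```

Note that relative paths such as /logo.png come back as written, so they still have to be resolved against the page URL before the images can be downloaded.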

Published at DZone with permission of Stoimen Popov. See the original article here.
