Getting Data From the Web

In this article, we'll look at a way to copy a table from a Wikipedia web page using a few lines of JavaScript in the browser's console.

by Mustapha Mekhatria · Feb. 21, 18 · Tutorial

In this article, I will show you a quick way to copy a table from a Wikipedia web page using a few lines of JavaScript. If you are not familiar with JavaScript, don’t worry; you can still follow along.

By the way, this process is often described as scraping data with a browser.

Let’s start!

Go to the following Wikipedia web page. Scroll down to the economy section. Then open the browser's developer tools:

  • On Internet Explorer: press F12.

  • On Google Chrome: click on the menu, then More Tools, and select the Developer Tools (see picture below).

Now select the table whose data you want to copy. Click on the arrow of the Developer Tools, then click on the first element of the first table, and finally click on the <tbody> tag to select the table (see below):

Notice the $0 shown after the <tbody> tag. In the console, $0 refers to the element currently selected in the Developer Tools; in other words, $0 is now the table.

Click on the Console tab.

Then type $0 and press Enter.

The table selected earlier now appears in the console.

Cool, right?

To save the rows of the table in a variable, type this code and press Enter:
var wholeTable = $$("tr", $0)

Warning:
Be sure to use straight quotation marks (") rather than curly ones (“ ”) around the tr; curly quotation marks cause a syntax error.
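
Console utilities such as $$ and $0 are available in the console but not in ordinary page scripts. As a rough sketch of the same step with the standard DOM API, the helper below collects the rows with querySelectorAll. The name getRows is hypothetical and not part of the original article.

```javascript
// Hypothetical helper: returns the <tr> elements under `root` as a real array.
// `root` is any element-like object that exposes querySelectorAll,
// for example $0 when the code is typed in the DevTools console.
function getRows(root) {
    return Array.from(root.querySelectorAll("tr"));
}

// In the DevTools console you could then write:
//   var wholeTable = getRows($0);
```

Because the result is a real array, methods such as map and filter can be used on it directly.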

To access any cell in the table, e.g., the first cell, use this code and press enter:
wholeTable[0].cells[0].innerText

The table is a set of rows (lines) and columns, and this is how each cell is accessed: wholeTable[Line].cells[Column]. Both indexes start at 0. The innerText property returns the text displayed in the cell.
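
Not every row necessarily has the same number of cells, so reading a missing cell directly throws an error. The sketch below wraps the wholeTable[Line].cells[Column] access in a small guard. The name cellText is hypothetical, and returning an empty string for a missing cell is an assumption made for this example.

```javascript
// Hypothetical helper: reads the text of one cell, or returns "" when the
// row or the column does not exist (for example, in a short header row).
function cellText(rows, line, column) {
    var row = rows[line];
    if (!row || !row.cells || !row.cells[column]) {
        return "";
    }
    return row.cells[column].innerText.trim();
}

// In the DevTools console you could then write:
//   cellText(wholeTable, 0, 0)
```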

Now, let’s get the data by making the following loop:

var tempObj = [];
// Declare i with var so the loop does not create an accidental global
for (var i = 0; i < wholeTable.length; i++) {
    tempObj[i] = {
        Country: "",
        GDP: ""
    };

    //Copy the country name (cell index 1)
    tempObj[i].Country = wholeTable[i].cells[1].innerText;

    //Copy the GDP (cell index 2), keeping only digits, dots, and hyphens
    tempObj[i].GDP = parseFloat(wholeTable[i].cells[2].innerText.replace(/[^\d\.\-]/g, ""));
}

I create an empty array, tempObj, and for each row of the table I store an object whose Country and GDP properties hold the data copied from the cells.
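
The loop above also processes the header row, whose GDP cell does not parse as a number. As one possible variation (not the author's code), the sketch below builds the same Country/GDP objects with Array.from, map, and filter, and drops rows that have too few cells or a non-numeric GDP. The name tableToRecords is hypothetical.

```javascript
// Sketch: same Country/GDP extraction as the loop, written as a function.
// `rows` is an array-like of row objects that expose `cells`.
function tableToRecords(rows) {
    return Array.from(rows)
        // Skip rows that lack a third cell (index 2).
        .filter(function (row) {
            return row.cells && row.cells.length > 2;
        })
        .map(function (row) {
            return {
                Country: row.cells[1].innerText.trim(),
                GDP: parseFloat(row.cells[2].innerText.replace(/[^\d.\-]/g, ""))
            };
        })
        // Drop rows whose GDP is not a number, such as a header row.
        .filter(function (record) {
            return !Number.isNaN(record.GDP);
        });
}

// In the DevTools console you could then write:
//   var tempObj = tableToRecords(wholeTable);
```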

Remark

If you are using another table, feel free to write the titles that correspond to the table you would like to copy.

You can copy any column by adding this line of code with the right column number: wholeTable[i].cells[NumberOfTheColumn].innerText; 

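The expression parseFloat(wholeTable[i].cells[2].innerText.replace(/[^\d\.\-]/g, "")) is just a trick to convert the text to a number: replace removes every character that is not a digit, a dot, or a hyphen, and parseFloat converts what is left. Without it, I would get a string instead of a number.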
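
One caveat with this trick: any digit left in the cell becomes part of the number, so a footnote marker such as [1] after the value would be merged into it. The sketch below is a slightly more defensive variant. The name parseNumber is hypothetical, and the presence of bracketed footnotes or a Unicode minus sign in the cells is an assumption, not something the article states.

```javascript
// Hypothetical helper: converts cell text such as "1,234.5[n 1]" to a number.
function parseNumber(text) {
    var cleaned = text
        .replace(/\[[^\]]*\]/g, "")  // drop bracketed footnote markers such as [1]
        .replace(/\u2212/g, "-")     // normalize the Unicode minus sign to a hyphen
        .replace(/[^\d.\-]/g, "");   // keep digits, dots, and hyphens only
    return parseFloat(cleaned);      // NaN when no number is left
}

// In the DevTools console you could then write:
//   parseNumber(wholeTable[1].cells[2].innerText)
```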

Well, guess what, the table is ready. The last step is to copy tempObj and paste it into any environment you want. Type copy(tempObj) and press Enter to put the data on the clipboard. Check here.
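
If the target environment is a spreadsheet, CSV text may be easier to paste than an array of objects. The sketch below is one way to convert the result. The name toCsv is hypothetical, and it assumes every object has the same properties as the first one.

```javascript
// Hypothetical helper: turns an array of flat objects into CSV text.
function toCsv(records) {
    if (records.length === 0) {
        return "";
    }
    var headers = Object.keys(records[0]);

    // Quote a field when it contains a comma, a double quote, or a line break.
    function escapeField(value) {
        var text = String(value);
        return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    }

    var lines = records.map(function (record) {
        return headers.map(function (h) {
            return escapeField(record[h]);
        }).join(",");
    });
    return [headers.join(",")].concat(lines).join("\n");
}

// In the DevTools console you could then write:
//   copy(toCsv(tempObj));
```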

Feel free to share your experience using this method or another method. You are also welcome to ask any questions about this topic.

Have a nice day!

Published at DZone with permission of Mustapha Mekhatria. See the original article here.

Opinions expressed by DZone contributors are their own.
