
SerpApi YouTube Data Extraction Tool

YouTube data has become a major part of machine learning and data analytics. Here's how to use SerpApi to extract YouTube data and query it for analysis.

By 
Charles Mabwa user avatar
Charles Mabwa
·
Jan. 03, 22 · Tutorial

As more people shift to online video, YouTube has grown rapidly, and its data has become an important source for machine learning and data analytics. In this tutorial, we use SerpApi to extract YouTube data that can then be queried for analysis.

Prerequisites

The tool is easy to use and can be tailored to different kinds of YouTube content, depending on your field of interest. You will, however, need a working knowledge of:

1. Python

2. VS Code and/or Jupyter Notebook

Tool

The general-purpose tool is SerpApi's YouTube Search API. The script below has been tailored to extract video results for searches on tools used in data analytics, data science, and data engineering:

import pandas as pd
from serpapi import GoogleSearch

api_key = "serp_api_key"  # replace with your own SerpApi key
engine_search = "youtube"

# Data analysis tools to search for:
code_langs = [
    {"name": "engine", "query": "sql"},
    {"name": "engine", "query": "excel"},
    {"name": "engine", "query": "tableau"},
    {"name": "engine", "query": "microsoft azure"},
    {"name": "engine", "query": "amazon web services"},
    {"name": "engine", "query": "r programming"},
    # {"name": "engine", "query": "name_of_product"}
]

frames = []  # one DataFrame of video results per query

for lang in code_langs:
    params = {
        lang["name"]: engine_search,
        "search_query": lang["query"],
        "api_key": api_key,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    # Fall back to an empty list if a query returns no video results
    video_results = results.get("video_results", [])
    frames.append(pd.json_normalize(video_results))

# DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat
data_vids = pd.concat(frames, ignore_index=True)

# Write the CSV once, after all queries have been collected
data_vids.to_csv("code_languages.csv")

The tool uses the pandas and SerpApi libraries, and a basic familiarity with both is assumed. If you are new to Python, you may first need to install them with pip install or conda install. The serpapi module imported above is provided by the google-search-results package.

code_langs is a list of search entries. The tool loops over it and collects the video results for each entry in the data frame data_vids. For example:

{"name":"engine", "query":"microsoft azure"},

For this entry, the for loop returns Microsoft Azure content from YouTube. The same happens for every entry in code_langs, until the videos for all queries have been collected.
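As a rough sketch of what the loop does on each pass, the snippet below builds the request parameters for every entry without calling the API. The helper name build_params is hypothetical and not part of SerpApi. The parameter keys (engine, search_query, api_key) are the ones used in the tool above.

```python
def build_params(entries, engine, api_key):
    """Return one parameter dict per search entry (hypothetical helper)."""
    all_params = []
    for entry in entries:
        all_params.append({
            entry["name"]: engine,                    # e.g. "engine": "youtube"
            "search_query": entry["query"].strip(),   # drop stray whitespace
            "api_key": api_key,
        })
    return all_params


code_langs = [
    {"name": "engine", "query": "sql"},
    {"name": "engine", "query": "microsoft azure"},
]

# Print the parameters that would be sent for each query
for params in build_params(code_langs, "youtube", "serp_api_key"):
    print(params)
```

Each printed dictionary corresponds to one search request. Keeping this step separate makes it easy to check the queries before spending API credits on them.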

The data is then stored in a CSV file, code_languages.csv.

Data

The extracted data has 14 columns, listed below. The first 10 are the most useful ones for analysis.

position_on_page	title	link	published_date	views	length	description	extensions	channel.name	channel.link	channel.verified	channel.thumbnail	thumbnail.static	thumbnail.rich
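
The dotted names such as channel.name appear because pd.json_normalize flattens nested fields into columns joined by a dot. A minimal illustration follows. It uses a made-up record shaped like one search result, not real SerpApi output, and all values are invented.

```python
import pandas as pd

# Hypothetical record, loosely shaped like one entry of results["video_results"].
sample_results = [
    {
        "position_on_page": 1,
        "title": "Example SQL tutorial",
        "channel": {"name": "Example Channel", "verified": True},
        "thumbnail": {"static": "https://example.com/thumb.jpg"},
    }
]

# Nested dictionaries become columns named "<parent>.<child>"
flat = pd.json_normalize(sample_results)
print(list(flat.columns))
```

The nested channel and thumbnail dictionaries end up as channel.name, channel.verified, and thumbnail.static, which matches the naming pattern in the column list above.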

These variables include:

a. position_on_page: The position of the video in the results for a search on a data analytics, data science, or data engineering tool.

b. title: The title of the video, which indicates what the video is about.

c. published_date: The date the video was uploaded to YouTube.

d. views: The number of views the video has received.

e. length: The duration of the video.

f. channel.name: The name of the channel that published the video.
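
Once the data is in a data frame, these variables can be queried directly. The sketch below is only an illustration: the rows are invented, and it assumes that views is stored as an integer and length as a "MM:SS" or "H:MM:SS" string, which may not match the real output exactly. The helper length_to_seconds is hypothetical.

```python
import pandas as pd


def length_to_seconds(length):
    """Convert a "MM:SS" or "H:MM:SS" string to seconds (assumed format)."""
    seconds = 0
    for part in str(length).split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Invented sample rows standing in for the extracted data
videos = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C"],
    "views": [1500, 32000, 870],
    "length": ["12:34", "1:02:03", "4:05"],
    "channel.name": ["Channel X", "Channel Y", "Channel X"],
})

# Add a numeric duration column so videos can be compared by length
videos["length_seconds"] = videos["length"].map(length_to_seconds)

# The two most-viewed videos, and total views per channel
most_viewed = videos.sort_values("views", ascending=False).head(2)
views_per_channel = videos.groupby("channel.name")["views"].sum()

print(most_viewed[["title", "views", "length_seconds"]])
print(views_per_channel)
```

With real data, the same pattern would start from pd.read_csv("code_languages.csv") instead of the hand-built data frame.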

To learn how to extract channel details, see Channel Results.

The complete code for the tool can be found in this GitHub repo.

YouTube Search API documentation: https://serpapi.com/youtube-search-api

You can also contribute to our user forum here: https://forum.serpapi.com/

Published at DZone with permission of Charles Mabwa. See the original article here.

Opinions expressed by DZone contributors are their own.
