A Look at Python Efficiency and Scalability through YouTube

by Chris Smith · Mar. 27, 12 · Interview

At PyCon 2012, Mike Solomon gave a great talk about scalability and its importance for a site like YouTube, which is growing essentially every day.  The key to scalability, as Mike describes it, is using simple tools to do really cool stuff, and YouTube has assembled an impressive stack for achieving it that includes Python, Apache, Linux, and MySQL.

Video: Mike Solomon - Scalability at YouTube

(Skip to the 10-minute mark for the beginning of Mike's presentation.)

Most of YouTube's infrastructure is written in Python: there are currently over a million lines of Python code, and many YouTube systems started out as a single Python file that grew into a large ecosystem over the years.  As Mike put it:

A scalable system is one that's not in your way, that you're sort of unaware of.  It's not buzzwords or anything like that, it's just about a general problem solving ethos… You need flexibility to solve problems and the minute you over-specify something, you paint yourself into a corner.



YouTube is a prime example of this flexibility.  For those who weren't aware, YouTube began as a dating site, and had it remained a dating site, Mike would have been giving a much different presentation.

Mike equates distributed applications to weather systems and says that debugging them is about as deterministic as predicting the weather.  YouTube uses jitter, which adds random variance to things like cache expirations so that entries don't all expire at once and create a "thundering herd" of simultaneous refreshes.  For example, if your general expiration time is 24 hours, jitter lets you vary that time from 18 to 30 hours for each machine.
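
The 18-to-30-hour example can be sketched in a few lines of Python. This is only an illustration of the idea, not YouTube's code: the function name `jittered_ttl_seconds` and the constants are assumptions made for this sketch, and the 25% spread is chosen simply because it turns a 24-hour base into the 18-to-30-hour window mentioned above.

```python
import random

BASE_TTL_HOURS = 24      # the "general expiration time" from the example
JITTER_FRACTION = 0.25   # +/-25% of 24 hours gives an 18 to 30 hour window


def jittered_ttl_seconds(base_hours=BASE_TTL_HOURS,
                         fraction=JITTER_FRACTION,
                         rng=random):
    """Return a cache TTL in seconds, drawn uniformly around base_hours.

    rng is anything with a uniform(a, b) method: the random module itself
    or a seeded random.Random instance for repeatable tests.
    """
    low = base_hours * (1 - fraction)
    high = base_hours * (1 + fraction)
    return rng.uniform(low, high) * 3600
```

Each machine (or each cache write) would call this once and pass the result as the expiry to whatever cache client it uses, so that entries written at the same moment do not all expire at the same moment.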

Mike goes on to cover a number of scalability techniques, including:

Divide and Conquer - Work is partitioned out, so simple and loose connections between the pieces are extremely valuable.

Approximate Correctness - The system is what it appears to be, so if a user doesn't know that something is missing, then technically, it isn't.

Expert Knob Twiddling - Adjust your system's consistency models based on the data you're processing.

Cheating - Or rather, "knowing how to fake data." The fastest function call is the one that doesn't happen, so sometimes faking data is good enough.
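
One way to read "the fastest function call is the one that doesn't happen" is an approximate counter that skips most writes and makes up for it on the writes it does perform. The sketch below is an assumption for illustration only; the article does not say how YouTube fakes data, and the names `SampledCounter`, `sample_every`, and the dict-like `store` are hypothetical.

```python
import random


class SampledCounter:
    """Approximate event counter that writes to its store only occasionally."""

    def __init__(self, store, sample_every=10, rng=None):
        self.store = store                # hypothetical dict-like backing store
        self.sample_every = sample_every  # on average, one write per N events
        self.rng = rng or random.Random()

    def record(self, key):
        # Most calls return without touching the store at all.
        if self.rng.randrange(self.sample_every) == 0:
            # Each write stands in for sample_every events, so the stored
            # total is correct on average even though it is never exact.
            self.store[key] = self.store.get(key, 0) + self.sample_every
```

With `sample_every=10`, roughly nine out of ten events cost nothing beyond a random number, and the stored count still tracks the true count closely once the numbers get large. Whether that trade is acceptable depends on the data, which is the "knob twiddling" point above.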

Finally, Mike touched on the efficiency of Python:

While C is more efficient, Python provides for greater scalability.  Mike explains that a lot of costs in Python are counterintuitive, such as the cost of garbage collection, so efficiency in Python is more about knowing what not to do.  To offset Python's overhead, YouTube uses efficient libraries like wiseguy, pycurl, and spitfire.

There's a lot of really powerful metaprogramming things [in Python] and how they interact and how dynamic you make things has a pretty direct correlation to how expensive it is to run your Python app.

In this case dumb = fast, meaning simple code is easier to grep for and easier to maintain.  The more complex a codebase is, the harder it is to decode.
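
A small, hedged illustration of the link between dynamism and cost: the sketch below reads the same value through a plain instance attribute and through `__getattr__`, then times both with `timeit`. The classes `Plain` and `Dynamic` and the helper `compare` are made up for this example and are not from the talk; the actual numbers depend on the interpreter and the machine, so none are shown.

```python
import timeit


class Plain:
    """Attribute stored directly on the instance."""

    def __init__(self):
        self.title = "video"


class Dynamic:
    """Attribute resolved through __getattr__ on every access."""

    def __init__(self):
        self._fields = {"title": "video"}

    def __getattr__(self, name):
        # Python calls this only after normal lookup fails, which here
        # happens on every read of .title.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None


def compare(number=200_000):
    """Return (plain_seconds, dynamic_seconds) for repeated attribute reads."""
    plain, dynamic = Plain(), Dynamic()
    t_plain = timeit.timeit(lambda: plain.title, number=number)
    t_dynamic = timeit.timeit(lambda: dynamic.title, number=number)
    return t_plain, t_dynamic
```

Calling `compare()` and printing the two timings typically shows the `__getattr__` version taking noticeably longer per read, and the plain version is also the one where a search for `title` leads straight to where the value is set.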

So overall, when working with scalability in mind, keep things simple.  Find the simplest solution to your problem that has the loosest, most practical guarantees.