
Blob Storage in Innodb

by peter · Mar. 02, 10 · News · 5.32K Views


I'm running into this misconception for the second time in a week or so, so it is time to blog about it.
How are BLOBs stored in InnoDB? This depends on three factors: the BLOB size, the full row size, and the InnoDB row format.

But before we look into how BLOBs are really stored, let's see what the misconception is about. A lot of people seem to think that with the standard ("Antelope") format, the first 768 bytes are stored in the row itself while the rest is stored in external pages, which would make such BLOBs really bad. I have even seen a solution that stores several smaller BLOB or VARCHAR fields which are concatenated to get the real data. This is not exactly what happens. :)

With the COMPACT and REDUNDANT row formats (used before the InnoDB plugin, and named "Antelope" in the InnoDB plugin and XtraDB), InnoDB tries to fit the whole row onto an InnoDB page. At least two rows have to fit on each page, plus some page metadata, which puts the limit at about 8000 bytes. If the row fits completely, InnoDB stores it on the page and does not use external BLOB storage pages. For example, a 7KB BLOB can be stored on the page. However, if the row does not fit on the page, for example because it contains two 7KB BLOBs, InnoDB has to pick some of them and store them in external BLOB pages. It will, however, keep at least 768 bytes from each such BLOB on the row page itself. With two 7KB BLOBs, one BLOB is stored on the page completely while the other has 768 bytes stored on the row page and the remainder on an external page.
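The fit-or-overflow rule above can be sketched with some simple arithmetic. This is a rough model, not InnoDB internals: the ~8000-byte limit and 768-byte prefix come from the article, while the function name and the largest-first eviction order are my own assumptions (the article only says InnoDB picks "some" BLOBs to move off-page).

```python
# Rough model of Antelope (COMPACT/REDUNDANT) inline-vs-external storage.
PAGE_LIMIT = 8000    # roughly half a 16 KB page: at least 2 rows must fit
BLOB_PREFIX = 768    # bytes of each externally stored BLOB kept in the row

def inline_row_size(blob_sizes, fixed_cols=0):
    """Return (inline row size, number of BLOBs moved off-page) after
    moving BLOBs external (largest first, an assumption) until the row
    fits, keeping a 768-byte prefix inline for each moved BLOB."""
    inline = sorted(blob_sizes, reverse=True)
    size = fixed_cols + sum(inline)
    moved = 0
    while size > PAGE_LIMIT and inline:
        b = inline.pop(0)              # move the largest BLOB off-page...
        size = size - b + BLOB_PREFIX  # ...but its 768-byte prefix stays
        moved += 1
    return size, moved

print(inline_row_size([7000]))        # (7000, 0): one 7 KB BLOB fits inline
print(inline_row_size([7000, 7000])) # (7768, 1): second BLOB goes external
```

With two 7KB BLOBs, the row only fits once one BLOB is externalized, and the remaining 7000 + 768 inline bytes squeeze under the limit, matching the scenario described above.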

The decision to store the first 768 bytes of the BLOB may look strange, especially as MySQL internally has no optimization for reading portions of a BLOB: it is either read completely or not at all, so the 768 bytes on the row page are of little use, because if the BLOB is accessed the external page always has to be read anyway. This decision seems rooted in the desire to keep the code simple when initial BLOB support was implemented in InnoDB: a BLOB can have a prefix index, and it was easier to implement indexing of BLOBs if their prefix is always stored on the row page.

This decision also causes strange data storage "bugs": you can store a 200KB BLOB easily, yet you cannot store twenty 10KB BLOBs. Why? Because each of them tries to keep 768 bytes on the row page itself, and together they do not fit.
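The arithmetic behind this "bug" is worth spelling out, using the approximate numbers from the article:

```python
# Why twenty 10 KB BLOBs fail where one 200 KB BLOB succeeds (Antelope):
# every externally stored BLOB still pins a 768-byte prefix to the row page.
BLOB_PREFIX = 768
PAGE_LIMIT = 8000            # approximate per-row limit from the article

one_big = 1 * BLOB_PREFIX    # one 200 KB BLOB -> 768 inline bytes
many_small = 20 * BLOB_PREFIX  # twenty 10 KB BLOBs -> 15360 inline bytes

print(one_big <= PAGE_LIMIT)     # True: fits with room to spare
print(many_small <= PAGE_LIMIT)  # False: the row can never fit
```

Even with every BLOB pushed off-page, the twenty mandatory prefixes alone (15360 bytes) exceed the ~8000-byte row limit, so the INSERT fails.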

Another thing to beware of with InnoDB BLOB storage is that external BLOB pages are not shared among BLOBs. Each BLOB, even one with a single byte that does not fit on the page, gets its own 16KB allocation. This can be pretty inefficient, so I'd recommend avoiding multiple large BLOBs per row when possible. A much better choice in many cases is to combine the data into a single large BLOB (and potentially compress it).

If all columns do not fit on the page completely, InnoDB automatically chooses some of them to stay on the page and some to be stored externally. This is not clearly documented, nor can it be hinted at or observed; furthermore, depending on column sizes, it may vary between rows. I wish InnoDB had some way to tune this, allowing me to force actively read columns to be stored inline while pushing others to external storage. Maybe one day we'll get to implementing this in XtraDB. :)

So BLOB storage was not very efficient in the REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) formats, and the fix comes with the InnoDB plugin's "Barracuda" file format and ROW_FORMAT=DYNAMIC. In this format, InnoDB stores either the whole BLOB on the row page or only a 20-byte BLOB pointer, giving preference to smaller columns for on-page storage, which is reasonable as more of them can be stored. BLOBs can still have prefix indexes, but this no longer requires the column prefix to be stored on the row page: you can build prefix indexes on BLOBs that are often stored outside the page.
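Comparing the inline cost per externally stored BLOB shows why DYNAMIC fixes the twenty-BLOB case from earlier (constants are the 768-byte prefix and 20-byte pointer named in the article):

```python
# Inline cost per externally stored BLOB, per row format:
ANTELOPE_PREFIX = 768   # COMPACT/REDUNDANT keep a 768-byte prefix inline
DYNAMIC_POINTER = 20    # ROW_FORMAT=DYNAMIC keeps only a 20-byte pointer
PAGE_LIMIT = 8000       # approximate per-row limit from the article

# The 20-BLOB row that Antelope rejects fits easily under DYNAMIC:
print(20 * ANTELOPE_PREFIX)  # 15360 -> over the ~8000-byte limit
print(20 * DYNAMIC_POINTER)  # 400   -> fits comfortably
```

With only 20 bytes pinned to the row page per off-page BLOB, even hundreds of fully external BLOBs per row stay within the inline limit.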

The COMPRESSED row format is similar to DYNAMIC when it comes to handling BLOBs and uses the same strategy of storing overflowing BLOBs completely off-page. It will, however, always compress BLOBs that do not fit on the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.
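To get a feel for what compressing off-page BLOBs can save, here is a standalone sketch using Python's zlib as a stand-in for InnoDB's compression; the data and page arithmetic are illustrative, not measurements of InnoDB itself:

```python
import math
import zlib

PAGE = 16 * 1024   # InnoDB external BLOB page size

blob = b"repetitive log line\n" * 4000   # ~80 KB of compressible data
compressed = zlib.compress(blob)

print(math.ceil(len(blob) / PAGE))        # 5 pages uncompressed
print(math.ceil(len(compressed) / PAGE))  # far fewer pages compressed
```

For repetitive data like logs, compression can collapse multiple 16KB external pages into one, which is the payoff of COMPRESSED always compressing off-page BLOBs.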

If you're interested in learning more about InnoDB row formats, check out this page in the InnoDB docs:

It is worth noting that I use "BLOB" here in a very general sense. From a storage perspective, BLOB, TEXT, and long VARCHAR columns are all handled the same way by InnoDB. This is why the InnoDB manual calls them "long columns" rather than BLOBs.


Published at DZone with permission of peter. See the original article here.

Opinions expressed by DZone contributors are their own.
