I’ve run into this misconception for the second time in a week or so, so it is time to blog about it.
How are BLOBs stored in Innodb? It depends on three factors: BLOB size, full row size, and Innodb row format.
But before we look at how BLOBs are really stored, let’s see what the misconception is. A lot of people seem to think that with the standard (“Antelope”) format the first 768 bytes are stored in the row itself while the rest is stored in external pages, which would make such BLOBs really bad. I have even seen a solution that stores several smaller BLOBs or VARCHAR fields which are then concatenated to get the real data. This is not exactly what happens.
With the COMPACT and REDUNDANT row formats (used before the Innodb Plugin and named “Antelope” in the Innodb Plugin and XtraDB), Innodb tries to fit the whole row onto the Innodb page. At least 2 rows have to fit on each 16KB page, plus some page data, which makes the limit about 8000 bytes per row. If the row fits completely, Innodb stores it on the page and does not use external BLOB storage pages; for example, a 7KB BLOB can be stored on the page. However, if the row does not fit on the page, for example because it contains two 7KB BLOBs, Innodb has to pick some of them and store them in external BLOB pages. It will, however, keep the first 768 bytes of each such BLOB on the row page itself. With two 7KB BLOBs, one BLOB is stored on the page completely, while the other has 768 bytes stored on the row page and the remainder on an external page.
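To make the placement rule concrete, here is a toy Python model of Antelope behavior. It is a sketch under stated assumptions, not Innodb’s actual code: it uses a flat ~8000-byte row budget with no record header overhead, assumes a 20-byte external pointer next to the 768-byte local prefix, and assumes the longest still-inline column is the one moved off-page first. The function name `place_antelope` is hypothetical.

```python
# Toy model of Antelope (COMPACT/REDUNDANT) long-column placement.
# Assumptions, not taken from the Innodb source: a flat ~8000-byte row
# budget, no record header overhead, a 20-byte external pointer, and the
# longest still-inline column is moved off-page first.

ROW_BUDGET = 8000    # approximate per-row limit on a 16KB page
LOCAL_PREFIX = 768   # bytes an off-page Antelope column keeps on the row page
POINTER = 20         # assumed size of the external-page pointer


def place_antelope(sizes: list[int]) -> list[int]:
    """Return the number of on-page bytes used by each column."""
    on_page = list(sizes)
    external = [False] * len(sizes)
    while sum(on_page) > ROW_BUDGET:
        # Only columns that are still inline and longer than prefix + pointer
        # actually shrink the row when moved off-page.
        candidates = [i for i, s in enumerate(sizes)
                      if not external[i] and s > LOCAL_PREFIX + POINTER]
        if not candidates:
            raise ValueError("row does not fit even with long columns off-page")
        victim = max(candidates, key=lambda i: sizes[i])
        external[victim] = True
        on_page[victim] = LOCAL_PREFIX + POINTER
    return on_page


# Two 7KB BLOBs: one stays inline, the other keeps 768 bytes + pointer on the page.
print(place_antelope([7 * 1024, 7 * 1024]))  # [788, 7168]
```

A single 7KB BLOB comes back unchanged (fully inline), which matches the point above that a row which fits never touches external pages.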
Such a decision to store the first 768 bytes of the BLOB on the row page may look strange, especially as MySQL internally has no optimizations to read portions of a BLOB: it is either read completely or not at all, so the 768 bytes on the row page are of little use. If the BLOB is accessed, the external page will always have to be read anyway. This decision seems to be rooted in a desire to keep the code simple while implementing initial BLOB support for Innodb: BLOBs can have prefix indexes, and it was easier to implement indexing of BLOBs if their prefix is always stored on the row page.
This decision also causes strange data storage “bugs”: you can store a 200K BLOB easily, but you can’t store 20 BLOBs of 10K each. Why? Because each of them will try to store 768 bytes on the row page itself, and 20 × 768 = 15,360 bytes of prefixes alone will not fit in the roughly 8000 bytes available.
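The arithmetic behind this “bug” can be checked with a few lines of Python. This is a simplified sketch: it assumes the ~8000-byte row budget from above and that every off-page column costs its 768-byte prefix plus a 20-byte pointer on the row page, while shorter columns stay inline at full size. The helper names are hypothetical.

```python
ROW_BUDGET = 8000          # approximate per-row limit on a 16KB page
MIN_OFFPAGE = 768 + 20     # Antelope local prefix + assumed 20-byte pointer


def antelope_min_row_bytes(sizes):
    """Smallest on-page footprint Antelope can reach for these columns."""
    # A column never costs less than its full size or prefix + pointer.
    return sum(min(s, MIN_OFFPAGE) for s in sizes)


def fits_antelope(sizes):
    return antelope_min_row_bytes(sizes) <= ROW_BUDGET


one_big = [200 * 1024]
twenty_small = [10 * 1024] * 20
print(antelope_min_row_bytes(one_big), fits_antelope(one_big))            # 788 True
print(antelope_min_row_bytes(twenty_small), fits_antelope(twenty_small))  # 15760 False
print(ROW_BUDGET // MIN_OFFPAGE)  # 10 off-page BLOBs is the most a row can hold
```

The total data volume is the same order of magnitude in both cases; what breaks the 20-BLOB row is purely the per-column prefix cost.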
Another thing to beware of with Innodb BLOB storage is that external BLOB pages are not shared among BLOBs. Each BLOB, even if only 1 byte of it does not fit on the page, will have its own 16K page allocated. This can be pretty inefficient, so I’d recommend avoiding multiple large BLOBs per row when possible. In many cases a much better decision is to combine the data into a single large BLOB (and potentially compress it).
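Below is a minimal sketch of both points, assuming a 16KB page and ignoring page header overhead and prefix details. `external_pages` estimates how many external pages one BLOB’s overflow costs; `pack_fields`/`unpack_fields` are hypothetical application-side helpers that combine several fields into one BLOB with a 4-byte length prefix per field and optional zlib compression, which is one way to follow the advice above (not an Innodb feature).

```python
import math
import struct
import zlib

PAGE_SIZE = 16 * 1024  # default Innodb page size; page header overhead ignored


def external_pages(overflow_bytes):
    """External pages one BLOB needs for the part that does not fit on the row page."""
    return math.ceil(overflow_bytes / PAGE_SIZE) if overflow_bytes > 0 else 0


def pack_fields(fields, compress=True):
    """Combine byte strings into one BLOB: big-endian 4-byte length, then data."""
    body = b"".join(struct.pack(">I", len(f)) + f for f in fields)
    return zlib.compress(body) if compress else body


def unpack_fields(blob, compressed=True):
    """Inverse of pack_fields; caller must know whether the BLOB was compressed."""
    body = zlib.decompress(blob) if compressed else blob
    fields, pos = [], 0
    while pos < len(body):
        (n,) = struct.unpack_from(">I", body, pos)
        pos += 4
        fields.append(body[pos:pos + n])
        pos += n
    return fields


# Three BLOBs that each spill 100 bytes off-page take one 16K page each...
print(3 * external_pages(100))  # 3
# ...while the same 300 bytes of overflow in one combined BLOB take one page.
print(external_pages(3 * 100))  # 1
```

The application then reads one column and splits it with `unpack_fields`, trading a little CPU for fewer external pages.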
If not all columns fit on the page completely, Innodb will automatically choose some of them to be stored on the page and some to be stored externally. This is not clearly documented, and it can neither be hinted nor observed. Furthermore, depending on column sizes, the choice may vary from row to row. I wish Innodb had some way to tune this, allowing me to force actively read columns to be stored inline while pushing others to external storage. Maybe one day we’ll get to implementing this in XtraDB.
So BLOB storage was not very efficient in the REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) formats, and the fix comes with the Innodb Plugin in the “Barracuda” format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either the whole BLOB on the row page or only a 20-byte BLOB pointer, giving preference to smaller columns for on-page storage, which is reasonable as you can store more of them. BLOBs can still have prefix indexes, but this no longer requires the column prefix to be stored on the page: you can build prefix indexes on BLOBs that are often stored outside the page.
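For contrast with the Antelope model, here is a similar toy sketch of DYNAMIC placement. The assumptions are the same ~8000-byte budget and no header overhead; the longest-first order is my reading of “giving preference to smaller columns”, and real Innodb has extra rules (for example for very short columns) that this ignores. `place_dynamic` is a hypothetical name.

```python
ROW_BUDGET = 8000  # approximate per-row limit on a 16KB page
POINTER = 20       # DYNAMIC keeps only a 20-byte pointer for an off-page column


def place_dynamic(sizes):
    """Return on-page bytes per column: full size if inline, else POINTER."""
    on_page = list(sizes)
    # Move the longest columns off-page first, so smaller columns stay inline.
    # sorted() is stable, so equal-size columns are handled in original order.
    for i in sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True):
        if sum(on_page) <= ROW_BUDGET:
            break
        if sizes[i] > POINTER:
            on_page[i] = POINTER
    if sum(on_page) > ROW_BUDGET:
        raise ValueError("row does not fit even with long columns off-page")
    return on_page


print(place_dynamic([7 * 1024, 7 * 1024]))  # [20, 7168]
print(place_dynamic([100, 5000, 9000]))     # [100, 5000, 20]
```

Under this model the 20 × 10K row that Antelope rejects needs only 20 × 20 = 400 bytes on the row page.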
The COMPRESSED row format is similar to DYNAMIC when it comes to handling BLOBs and uses the same strategy of storing BLOBs completely off-page. However, it will always compress BLOBs that do not fit on the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.
If you’re interested in learning more about Innodb row formats, check out this page in the Innodb docs:
It is worth noting that I use the term BLOB here in a very general sense. From a storage perspective, BLOB, TEXT, and long VARCHAR columns are handled the same way by Innodb. This is why the Innodb manual calls them “long columns” rather than BLOBs.