How Hyperscale Storage Is Becoming More Accessible
See an interview about how hyperscale storage is becoming more accessible.
As part of IT Press Tour #35, we had the opportunity to hear from Eran Kirzner, Founder and CEO of Lightbits Labs, as well as Avigdor Willenz, Chairman and Lead Investor at Habana Labs, founder of Galileo Technology, and angel investor in Lightbits.
We asked Eran, “How is Lightbits Labs helping companies with hyperscale storage?”
Lightbits is building a composable, disaggregated NVMe over TCP storage solution. We provide very high-performance software-defined storage with the capability of a local flash device, even when it is physically remote within the data center.
It is a scale-out solution that enables you to scale compute and storage independently through software-defined storage. You can pick any client, any server, any network: we can run on a Quanta, HPE, or Dell server, with an Intel, AMD, or even an ARM CPU.
There are two main components I want to touch on. The first is NVMe over TCP. This is a standard that we invented together with Facebook, Dell, Intel, and a few others, and it is now fully ratified. What we have is a highly optimized user-space TCP stack that, combined with the NVMe stack, gives us the ability to support thousands of connections and thousands of containers in a very large data center, whether bare metal or virtualized.
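To make the transport concrete: an NVMe/TCP session begins with the initiator sending an Initialize Connection Request (ICReq) PDU over an ordinary TCP socket. The sketch below packs that PDU following the field layout in the ratified NVMe/TCP transport specification; it is an illustrative toy, not Lightbits' implementation, and the parameter values are assumptions.

```python
import struct

def build_icreq(maxr2t: int = 0, header_digest: bool = False) -> bytes:
    """Build an NVMe/TCP Initialize Connection Request (ICReq) PDU.

    Layout (per the NVMe/TCP transport spec):
      common header: type(1) flags(1) hlen(1) pdo(1) plen(4, little-endian)
      ICReq-specific: pfv(2) hpda(1) dgst(1) maxr2t(4), padded to 128 bytes
    """
    PDU_TYPE_ICREQ = 0x00
    HLEN = 128                      # ICReq is a fixed 128-byte PDU
    dgst = 0x01 if header_digest else 0x00
    ch = struct.pack("<BBBBI", PDU_TYPE_ICREQ, 0, HLEN, 0, HLEN)
    psh = struct.pack("<HBBI", 0, 0, dgst, maxr2t)  # pfv=0, hpda=0
    pdu = ch + psh
    return pdu + b"\x00" * (HLEN - len(pdu))        # pad to 128 bytes

# The initiator would send this immediately after the TCP handshake,
# then wait for the target's ICResp before issuing NVMe commands.
pdu = build_icreq(maxr2t=3)
assert len(pdu) == 128 and pdu[0] == 0x00
```

Because the handshake and every subsequent PDU ride on plain TCP, any NIC and any switch that carries TCP/IP can carry NVMe/TCP, which is what makes the thousands-of-connections scale practical without special fabrics.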
The second very important layer is the Global FTL. FTL is the flash translation layer, the layer you can find in every SSD. It handles the translation between the logical transactions issued to the storage system and the physical transactions on the flash.
We have many years of expertise building SSD solutions and custom solutions for hyperscalers, and we built a Global Flash Translation Layer that manages and orchestrates all the SSDs you have in one box, divided into different pods, so you can preserve and create quality of service.
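The core FTL idea, remapping logical block addresses to fresh physical flash locations on every write, can be sketched in a few lines. This is a hypothetical toy for illustration only, not Lightbits' design; all names are invented.

```python
class ToyFTL:
    """Toy flash translation layer: logical block address -> (ssd, block).

    Flash pages cannot be overwritten in place, so every write goes to a
    fresh physical location and the old one is marked stale for later
    garbage collection. A "global" FTL applies the same mapping across
    many SSDs at once instead of inside a single drive.
    """
    def __init__(self, num_ssds: int, blocks_per_ssd: int):
        self.free = [(s, b) for s in range(num_ssds) for b in range(blocks_per_ssd)]
        self.map = {}            # lba -> (ssd, block)
        self.stale = set()       # physical locations awaiting garbage collection

    def write(self, lba: int) -> tuple:
        if lba in self.map:                 # overwrite: old copy becomes stale
            self.stale.add(self.map[lba])
        loc = self.free.pop(0)              # always write to a fresh location
        self.map[lba] = loc
        return loc

    def read(self, lba: int) -> tuple:
        return self.map[lba]

ftl = ToyFTL(num_ssds=2, blocks_per_ssd=4)
ftl.write(7)
ftl.write(7)                                # second write relocates LBA 7
assert ftl.read(7) != (0, 0) and (0, 0) in ftl.stale
```

Because the mapping lives above the drives, a global FTL can choose where each write lands across the whole box, which is what makes per-pod quality of service possible.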
We have a read and write strategy that lets us take thousands of transactions arriving at the box simultaneously and manage them in a way that creates quality of service. We can significantly reduce write amplification to the SSDs, and we can reduce latency and, more importantly, tail latency.
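Write amplification has a simple definition worth making explicit: the ratio of bytes physically written to flash (including garbage-collection copies) to bytes the host asked to write. The numbers below are illustrative only.

```python
def write_amplification(host_bytes_written: float, flash_bytes_written: float) -> float:
    """Write amplification factor (WAF): bytes the flash actually writes,
    including garbage-collection relocations, per byte the host writes.
    A WAF close to 1.0 means less wear and better, more predictable latency."""
    return flash_bytes_written / host_bytes_written

# Illustrative numbers: if garbage collection relocates 0.5 GB for every
# 1 GB of host writes, the flash absorbs 1.5 GB of writes in total.
waf = write_amplification(host_bytes_written=1.0, flash_bytes_written=1.5)
assert waf == 1.5
```

A write strategy that groups incoming transactions into large sequential flash writes keeps garbage collection cheap, which is how a lower WAF translates into both longer SSD endurance and lower tail latency.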
The outcome of combining the front end and the back end of the solution is a significant cost reduction for customers, sometimes even a 50% cost reduction, achieved by increasing the utilization of the flash.
Some of the utilization improvement comes from the disaggregation itself. Some of it comes from the fact that better management of the flash reduces wear; by reducing wear, we get better latency and increase the endurance of the solution.
The overall result is better operational efficiency, which comes from better availability of the solution with erasure coding, handling momentary, transient SSD failures, reducing the number of SKUs, and so on.
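The availability claim rests on erasure coding: data is spread across drives with redundant parity so a failed SSD can be rebuilt from the survivors. The simplest instance is single XOR parity, sketched below; this is a generic illustration, not Lightbits' actual code scheme.

```python
def xor_parity(shards: list[bytes]) -> bytes:
    """Compute a single XOR parity shard over equal-length data shards.

    This is the simplest erasure code (RAID-5 style): any one lost shard
    can be rebuilt by XOR-ing the parity with the surviving shards.
    Production systems typically use stronger codes (e.g. Reed-Solomon)
    to survive multiple failures, but the recovery principle is the same.
    """
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

data = [b"ab", b"cd", b"ef"]          # three data shards on three drives
parity = xor_parity(data)             # parity shard on a fourth drive
# Simulate losing shard 1 (b"cd") and rebuilding it from the rest:
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == b"cd"
```

Because a transient SSD failure can be served from the reconstructed data, reads keep flowing while the drive is replaced, which is where the availability and serviceability gains come from.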
On top of the software solution, we also provide an acceleration card. It comes in multiple form factors that we can mount on different AMD-based and Intel-based servers, and it significantly accelerates performance with data services like data reduction, compression, and so on.
Does storage have to be in one data center, or can it be distributed among several data centers, or even between on-premises and the cloud?
NVMe over TCP gives you the ability to run the protocol across more than one location. We have some customers that run it between data centers. We recently started trialing it with data streaming between Chicago and Europe, bonding multiple 100-gigabit interfaces running NVMe over TCP, and we managed to get very close to saturating the line. We didn't invent NVMe over TCP for such use cases, but we found it is very efficient in them. Also within the data center, some data centers are huge, so we can have multiple zones; creating a cluster that replicates across zones is also something that can be done.
We asked Avigdor to share the history that led to the founding of Lightbits Labs:
In 1993, I started a company named Galileo Technology. We developed the first Ethernet switch as a system-level component, and it was the first Ethernet merchant silicon, used by many companies.
When we did that, we had Sun SPARCstations and Sun workstations, and each one of them had a disk. These were hard disks, and they were failing all the time. The network at that time was shared with repeaters, a 10-megabit network, and it was a complete nightmare to service and to run the long simulations needed to build a semiconductor, a VLSI chip. We couldn't finish long simulations because the disks were failing inside the different servers.
Then one day the answer came to us. We were one of the first alpha sites worldwide for NetApp, and they came to us with a storage box that sits on the network. It really changed things: we disaggregated the disk storage from the compute, and our lives were very different. We could sleep at night, we could run long simulations, and so on.
Now, when flash came out, it was such a fantastic storage medium that it could really enhance the performance of computers in a major way. But the networks were not fast enough, so the obvious path was different: whereas hard disks had moved into SANs, flash went into the servers.
There are a lot of similarities between what happened to hard disk drives in servers and flash in modern servers. Now network speed has increased dramatically, from 100 megabits to 100 gigabits today, moving to 400 gigabits. Latency is not a major issue, and bandwidth is fantastic. It makes a lot of sense to separate compute and storage again.
So when I started Annapurna, a company I founded in 2010-11, one of the things we did was the architecture they now call the Nitro architecture at Amazon. I have started about 20 companies in the meantime, all in data center infrastructure: compute, network, and storage. With the Nitro architecture at Annapurna, which we later sold to Amazon, one of the things we did was separate the storage from the compute, not on a standard like NVMe over Fabrics with TCP/IP, but with a modification.
Amazon is big enough that they don't need to follow the standard, but they are the proof of concept, because it enabled Amazon to give its customers better performance, better tail latency, and much better economics and serviceability in their huge data centers.
After I sold Annapurna to Amazon, I started a few other companies, one of them in the storage space, which is Lightbits. We took the approach of disaggregating storage from compute, with all its benefits.
But when we looked at the standards, we realized they were based on RDMA, and none of the hyperscalers or the broader data center community wanted to adopt it. Apple adopted it for their maps for a while and then abandoned it, but that was with InfiniBand.
So no one really wanted to adopt RDMA. We said, okay, let's look at something that is standard in the industry. We looked at TCP/IP, tried it, and found we could deliver great performance, like 5 million IOPS. So we decided to go with something everybody can use while still delivering phenomenal performance. That was one of the main things we wanted to do. Now, to implement it in the right way, and to get more benefits than you can ever get with direct-attached storage (DAS), the team developed the Global FTL.
So this is the second component: a Global FTL with all the benefits you cannot get in DAS, because in the compute node you need the CPU power for the application; you cannot spend it all on maintenance, garbage collection, and everything else an FTL does.
The third component is making it software-defined, so that every one of our customers can use the storage solution with the commodity servers they're using anyhow, rather than an appliance that is redesigned and comes to market every three or four years.
With software-defined storage, anybody can choose the servers they like. Every time there is a faster network, they can use the faster network and network interface controller. Every time there is a new flash drive with higher capacity, they can change the flash. Any time there is a faster CPU, they can use that CPU, and so on. This open architecture, software-defined on any hardware, is the third leg.
Opinions expressed by DZone contributors are their own.