Ceph Gets Fit And Finish For Enterprise Storage
18 Mar
Ceph, the open source object storage system born from a doctoral dissertation in 2005, has been aimed principally at the highly scalable workloads found in HPC environments and, later, at hyperscalers that no longer wanted to build their own storage.
For years now, Ceph has given organizations object, block, and file-based storage in distributed, unified cluster systems scaling well into the tens of petabytes and even to exabyte levels, storage that takes high levels of expertise to deploy, run, and manage. Building and managing these massive object storage clusters takes the kind of skills that HPC centers, hyperscalers, cloud builders, and other service providers tend to have. But large enterprises and many Tier 2 and Tier 3 service providers do not have such skills. And the workloads they need to run, either themselves or on behalf of clients, are driving demand for object storage among more mainstream enterprises, which want to leverage artificial intelligence, analytics, containers, and similar advanced technologies but do not have the expertise to manage complex Ceph environments.
Red Hat is looking to fix that. The company, now a unit within IBM, recently rolled out Red Hat Ceph Storage 4 with the goal of bringing petabyte-scale object storage to the cloud-native development and data analytics workloads that are becoming more commonplace among enterprises and can take advantage of cloud-level economics. The release will also help Red Hat broaden the market for Ceph.
“Ceph has been used pretty much in the realm of the rocket scientist and PhD,” Pete Brey, marketing manager of hybrid cloud object storage at Red Hat, tells The Next Platform. “This will bring that into the realm of more junior administrators and more like everyday use, opening up the market addressability. In the past it’s been notorious that you had to be very, very careful how you set it up. Even if you’re experienced, you had to be very careful and you had to choose the right hardware in order to get the right performance and resiliency. There’s several different things that we’re doing in this launch that enable us to make both the installation experience much simpler but also the ongoing operational management experience.”
The rise of hybrid clouds has helped drive the development of object stores, not only from Red Hat but from other vendors like Cloudian, Nutanix, and Dell EMC, as well as open source stores from the likes of Minio and SwiftStack. Brey says some estimates indicate that 70 percent of object store workloads can go to the public cloud. With Ceph Storage 4, users will be able to deploy petabyte-scale object storage compatible with the Amazon Web Services S3 API, the touchstone for object storage.
The message around Ceph Storage 4, which is based on last year's Nautilus release of the open source Ceph project, is that automation and other new features will make the product easier to run without hindering performance or scalability, and that the data within the object store remains secure, Brey says. In addition, Red Hat is positioning Ceph Storage to work with other products, such as its OpenShift container platform.
“Our strategy is going to be increasingly use Ceph, not just the technology, but Ceph as a product for OpenShift environments,” he says. “Our view of the world is that while today there’s a lot of excitement in the application development world for container technology, the reality is the data science side of the world is seeing the possibility of containers also. We’re seeing a lot of the open-source tools being ported to run on top of Kubernetes and so we want to be able to support that. Given the ability to support massively scalable environments, we think it’s the ideal platform. Without projecting too much, you’ll see us with that kind of positioning.”
Automation A Key To Ceph 4
Modern workloads, rapid data proliferation, and distributed environments are taxing enterprise storage capabilities. Before long, the amount of data created outside the traditional datacenter will swamp that generated at the core, driving the need for scalable solutions like Ceph. Making Ceph easier to use will be crucial to driving enterprise adoption, and automation is what enables that simplification. That includes automating the installation process and some operational management tasks. For example, the company put a GUI onto the installer, with the software inspecting the hardware to ensure there is enough memory, that the network interface cards can handle the load, and that the disk subsystem can deliver the needed performance. Red Hat also added a dashboard for automated monitoring, problem detection, and resolution. The software can also detect and mitigate noisy neighbors, those virtual machines that consume an outsized share of I/O.
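The noisy-neighbor mitigation Brey describes is handled inside Ceph's own monitoring stack, but the underlying idea is simple to sketch. The snippet below is an illustrative, hypothetical detector, not Ceph code: given recent per-client I/O samples, it flags any client whose throughput exceeds a multiple of the cluster median.

```python
from statistics import median

def find_noisy_neighbors(iops_by_client, factor=3.0):
    """Flag clients whose recent IOPS exceed `factor` times the median.

    `iops_by_client` maps a client id to its average IOPS over some
    sampling window. This is a toy heuristic for illustration only;
    Ceph's actual monitoring and QoS machinery is far more sophisticated.
    """
    if not iops_by_client:
        return []
    baseline = median(iops_by_client.values())
    return [client for client, iops in iops_by_client.items()
            if iops > factor * baseline]

# One VM dominating the cluster's I/O stands out against the median.
samples = {"vm-a": 120, "vm-b": 140, "vm-c": 2500, "vm-d": 110}
print(find_noisy_neighbors(samples))  # → ['vm-c']
```

Using the median rather than the mean as the baseline keeps a single extreme outlier from masking itself by inflating the average.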
“In the past, Ceph has always been very much CLI-driven configuration management interface because Ceph was developed for these massively scalable, hyperscaler-type of installations, where everything is scripted and it really doesn’t make sense in those enterprise environments,” Brey says. “But again, we’re trying to open up the market for Ceph and that’s why we’re making these changes and adding these features. For the mainstream user, we’re trying to make it simpler and we’ll give you guidance on what the exact configuration should look like. But under the covers, if you want to get in and pop the hood and you want to tweak knobs and you want to get the absolute best performance for your particular workload, you can still do that. We haven’t taken anything away from a crowd of people who know Ceph and like Ceph and know how to tune it.”
Automation is also found in the integrated bucket notifications that support Kubernetes-native serverless architectures. The goal is to create automated data pipelines. Brey uses the example of a person getting an X-ray during a doctor's visit. The digital images are dropped into a bucket, which automatically triggers downstream serverless processes using Red Hat's AMQ Streams, a productized version of the open source Kafka data streaming platform. The image is analyzed, and if the patient is at risk, another downstream serverless process is triggered to label the image and the patient's record and move them to a clinical bucket for another doctor to review.
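A downstream consumer in such a pipeline receives event records in the AWS S3 notification format, which Ceph's bucket notifications also emit. The sketch below is a minimal, hypothetical routing step for the X-ray scenario: the bucket names and the `is_at_risk` callback stand in for the real image-analysis service, and a real consumer would copy and tag objects rather than just return a plan.

```python
import json

def route_xray_event(event_json, is_at_risk):
    """Parse an S3-style bucket-notification event and decide routing.

    `event_json` follows the AWS S3 event record layout ("Records",
    "eventName", "s3.bucket.name", "s3.object.key"). `is_at_risk` is a
    placeholder for the analysis step in the pipeline described above.
    """
    actions = []
    for record in json.loads(event_json).get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if record.get("eventName", "").startswith("ObjectCreated") and is_at_risk(key):
            # Plan: copy the flagged image to a clinical review bucket.
            actions.append(("copy", bucket, key, "clinical"))
    return actions

# Hypothetical event for a newly uploaded scan in an intake bucket.
event = json.dumps({"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "xray-intake"},
           "object": {"key": "patient-42/scan.dcm"}}}]})
print(route_xray_event(event, lambda key: "patient-42" in key))
# → [('copy', 'xray-intake', 'patient-42/scan.dcm', 'clinical')]
```

In production this logic would run in a serverless function subscribed to the AMQ Streams topic that the bucket notification feeds, so each upload drives the pipeline with no polling.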
The technology also can be used in other…