TECHnalysis Research Blog

October 21, 2022
Cloudera Extends Open Data Lakehouse Benefits to Hybrid Cloud

By Bob O'Donnell

With so much of the focus in digital transformation projects going to the application side, it’s easy to forget what a critical role data plays in these efforts. Yes, the move to cloud-native, container-based applications, whether hosted in public or private clouds, is critical, but so are projects to organize and analyze the vast amounts of data that most organizations generate or have access to. In particular, the need to support data portability and analytics tools across multiple public and private clouds (the now classic “hybrid cloud” model) is becoming increasingly important.

One important element of that work involves deciding how best to leverage the data assets that are available. Recently, many companies have started to consider what are being called data lakehouses. As the name suggests, a data lakehouse combines some of the characteristics of a data lake and a data warehouse into a single entity.

Data lakes typically hold huge amounts of unstructured and semi-structured data (text, images, audio, video, etc.) and are used for compiling troves of information on a given process or topic. Data warehouses, on the other hand, are commonly made up of structured data organized into tables of numbers and other values, and are used for traditional database query applications. Data lakehouses extend the flexibility and capability of the data they store by letting you apply the powerful analytics tools initially created for data warehouses to data lakes. In addition, they enable you to combine elements of the two data structure types for more sophisticated analysis, which many have discovered is extremely useful for AI (artificial intelligence) and ML (machine learning) applications.

A key proponent of the data lakehouse concept is Cloudera, a company that traces some of its roots back to the open-source Hadoop big data software utilities that served as a key driver for the growth of data lakes. At Cloudera’s recent Evolve 2022 event, the company unveiled some additions to its core Cloudera Data Platform (CDP) tools that should help enable the use of data lakehouses in more environments.

In particular, the company is bringing the ability to easily move data and analytics tools across multiple public clouds (including Amazon’s AWS and Microsoft’s Azure) as well as an organization’s private cloud. Cloudera refers to the concept as “hybrid data”, where data now takes on the flexibility and lock-in avoidance that hybrid cloud applications have begun to offer. To bring the hybrid data concept to life, Cloudera enabled three specific new capabilities. Portable Data Services lets companies move the analytics applications and services created for a particular data set across various public and private environments without needing to change a single line of code. Secure Data Replication moves an exact copy of the data itself across the various environments, ensuring that companies have access to the data they need for a given workload. Finally, Universal Data Distribution leverages the company’s new Cloudera DataFlow tool for ingesting, or importing, data into the platform, with a particular focus on data streams, allowing live data feeds to be integrated into the company’s enterprise data management tools as well. Like most aspects of Cloudera’s solution, DataFlow is built on open-source tools, notably Apache NiFi.

Collectively, the three services give organizations all the tools they need to run data analytics workloads on a wide range of data types across multiple platforms and physical locations. Not only does this give companies the flexibility they’ve grown to appreciate with hybrid cloud architectures, but it opens up new opportunities as well. For example, it can also be leveraged to try an analytics workload in different environments to better understand the unique hardware acceleration or platform benefits that different clouds (public or private) can offer. Ultimately, the goal is to help companies build a flexible data fabric that can extend to whatever type of environment is best suited for a given project or data set.

As has become clear with cloud computing in general, support for hybrid, multi-cloud solutions has become table stakes for modern application tools. As a result, vendors have either already created or are working to enable this type of flexible support. Logically, therefore, it makes tremendous sense to extend this type of approach to the data and analytics tools that often power these applications. With its latest extensions to CDP, Cloudera is taking an important step in that direction.

Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.