Fivetran, the company best known for helping enterprises build their data pipelines, on Tuesday announced the general availability of its newest product, the Fivetran Managed Data Lake Service.
The new service aims to take the repetitive work out of managing data lakes by automating it for the company's clients, freeing them to focus on building products on top of their data. At launch, the service supports Amazon S3, Azure Data Lake Storage (ADLS) and Microsoft OneLake, with support for Google Cloud on the roadmap.
Until now, Fivetran had supported only data warehouses, which are typically used to store structured, relational data that powers analytics and business intelligence (BI) applications. Data lakes, on the other hand, are meant to store both structured and unstructured data from a wide range of sources, for use cases that often include real-time analytics and machine learning workloads. Databricks has also popularized the concept of the lakehouse, which aims to combine the best of both worlds in a single data repository.
“The idea is that we’re bringing the scalable infrastructure that we’ve delivered to BI for the last nine years to AI and that whole workload environment,” Fivetran co-founder and COO Taylor Brown told me.
The Managed Data Lake Service uses Fivetran's existing 500+ connectors to pull in data, then normalizes and deduplicates it before writing it to one of the supported data lakes in either the Delta Lake or Apache Iceberg table format. Once the data is in the lake, users can work with the compute engine of their choice (think Databricks, Snowflake, Starburst or Redshift) to operationalize it, or bring it to a machine learning platform to power their new AI applications.
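Fivetran has not published how its normalization and deduplication step works internally. Purely as an illustration of what such a step generally involves, the sketch below assumes each incoming row carries a primary key and a sync timestamp. It normalizes column names and keeps only the newest version of each row. The function names, the `synced_at` column and the naming rule are all hypothetical, and the final write to a Delta Lake or Iceberg table is left out.

```python
from typing import Any


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical normalization rule: trim, lowercase and snake_case column names."""
    return {key.strip().lower().replace(" ", "_"): value for key, value in record.items()}


def deduplicate(records: list[dict[str, Any]], key: str, ts: str) -> list[dict[str, Any]]:
    """Keep the most recent version of each row, judged by the (assumed) timestamp column `ts`."""
    latest: dict[Any, dict[str, Any]] = {}
    for raw in records:
        row = normalize_record(raw)
        current = latest.get(row[key])
        # Replace the stored row only if this one was synced more recently.
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return list(latest.values())


# Example input: row 1 arrives twice, the second time with an updated email.
rows = [
    {"ID": 1, "Email": "a@example.com", "Synced At": 1},
    {"ID": 2, "Email": "b@example.com", "Synced At": 1},
    {"ID": 1, "Email": "a@new.example.com", "Synced At": 2},
]
print(deduplicate(rows, key="id", ts="synced_at"))
```

Keeping the latest version per key means a downstream engine reading the lake table sees one current row per record rather than every change the source emitted.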
"Fivetran has only really supported the data warehouses, […] and certainly some customers use those tools as data lakes, but we've had a lot of customers requesting that we support more of Iceberg and Delta Lake format into data lakes, particularly the larger customers," Brown said.
Brown told me that many of the customers who tried the new managed service during its preview period realized that they were building the same pipelines to load their data into data warehouses and data lakes.
One issue with data lakes is that it’s often hard to ensure that users only get access to the data that they are meant to use. In Tuesday’s announcement, Fivetran stressed that it integrates with existing data catalogs and governance solutions like AWS Glue, Databricks Unity Catalog and Microsoft Purview.
“We are very excited about Fivetran supporting Delta Lake as a direct destination,” said Databricks Director of Product Himanshu Raja. “With this new capability, customers can now use Fivetran to build an open lakehouse with Delta Lake powered by the Databricks Data Intelligence Platform. We are also very excited about the upcoming Fivetran integration with Unity Catalog to provide out-of-the-box governance and security for all Fivetran-generated tables.”
Until the end of August, Fivetran is making the new service available for free (up to $10,000 per customer). After that, Fivetran will apply its current consumption model to charge for it. “One of the benefits of using Fivetran’s Managed Data Lake Service is that the ingestion is free,” Brown said. “If you’re loading within Snowflake or Databricks or the other downstream consumers, you have to use the warehouse compute to actually ingest the data, which can be quite [expensive] in some cases.”