Google Cloud launches BigLake, a new cross-platform data storage engine – TechCrunch

At the Cloud Data Summit, Google today announced the preview launch of BigLake, a new data lake storage engine that makes it easier for enterprises to analyze the data in their data warehouses and data lakes.

The idea here is essentially to extend Google’s experience of running and managing its BigQuery data warehouse to data lakes on Google Cloud Storage, combining the best of data lakes and warehouses into a single service that abstracts away the underlying storage formats and systems.

It’s worth noting that this data can sit in BigQuery or live on AWS S3 and Azure Data Lake Storage Gen2. Through BigLake, developers gain access to one unified storage engine and the ability to query the underlying data stores through a single system without the need to move or duplicate data.

“Managing data across different lakes and warehouses creates silos and increases risk and costs, especially when data needs to be moved,” explains Gerrit Kazmaier, VP and GM of Databases, Data Analytics and Business Intelligence at Google Cloud, in today’s announcement. “BigLake enables companies to unify their data warehouses and lakes to analyze data without worrying about the underlying storage format or system, eliminating the need to duplicate or move data from a source and reducing costs and inefficiencies.”


Using policy tags, BigLake allows administrators to configure their security policies at the table, row, and column level. This covers data stored in Google Cloud Storage, as well as the two supported third-party systems, where BigQuery Omni, Google’s multi-cloud analytics service, enforces these security controls. Those controls also ensure that only the right data flows to tools such as Spark, Presto, Trino and TensorFlow. The service also integrates with Google’s Dataplex tool to provide additional data management capabilities.
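To make the three levels concrete, here is a minimal Python sketch of what table-, row- and column-level controls mean for a reader of a table. It is an illustration only and does not use any BigLake or BigQuery API: `TablePolicy`, `read_table`, the user names and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class TablePolicy:
    """Hypothetical policy for one table (not a BigLake or BigQuery type)."""
    allowed_readers: set[str]                # table level: who may read at all
    row_filter: Callable[[str, Row], bool]   # row level: which rows a user sees
    # column level: restricted column name -> users allowed to see it
    restricted_columns: dict[str, set[str]] = field(default_factory=dict)


def read_table(user: str, rows: list[Row], policy: TablePolicy) -> list[Row]:
    """Return only the rows and columns that `user` is allowed to see."""
    if user not in policy.allowed_readers:
        raise PermissionError(f"{user} may not read this table")
    visible = []
    for row in rows:
        if not policy.row_filter(user, row):
            continue  # row-level filter hides this row
        visible.append({
            col: val
            for col, val in row.items()
            if col not in policy.restricted_columns
            or user in policy.restricted_columns[col]
        })
    return visible


# Made-up sample data and policy for the illustration.
ORDERS = [
    {"region": "EU", "customer_email": "a@example.com", "amount": 120},
    {"region": "US", "customer_email": "b@example.com", "amount": 75},
]

POLICY = TablePolicy(
    allowed_readers={"eu_analyst", "admin"},
    row_filter=lambda user, row: user == "admin" or row["region"] == "EU",
    restricted_columns={"customer_email": {"admin"}},
)

eu_view = read_table("eu_analyst", ORDERS, POLICY)
admin_view = read_table("admin", ORDERS, POLICY)
```

In this toy version, the analyst sees only the EU row and never the email column, while the admin sees everything. The point of a shared policy layer is that the same rules apply regardless of which engine reads the data.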

Google notes that BigLake will provide fine-grained access controls and that its API will span Google Cloud, as well as file formats such as the open column-oriented Apache Parquet and open-source processing engines such as Apache Spark.


“The amount of valuable data organizations must manage and analyze is growing at an incredible rate,” Google Cloud software engineer Justin Levandoski and product manager Gaurav Saxena explain in today’s announcement. “This data is increasingly distributed across many locations, including data warehouses, data lakes, and NoSQL stores. As an organization’s data becomes more complex and spreads across different data environments, silos are created, increasing risks and costs, especially when that data needs to be moved. Our customers have made it clear; they need help.”

In addition to BigLake, Google also announced today that Spanner, its globally distributed SQL database, will soon get a new feature called “change streams.” It lets users track changes to a database in real time, be they inserts, updates or deletes. “This ensures customers always have access to the latest data as they can easily replicate changes from Spanner to BigQuery for real-time analytics, trigger downstream application behavior with Pub/Sub, or save changes to Google Cloud Storage (GCS) for compliance,” explains Kazmaier.
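The general idea behind a change stream can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Spanner’s API: `ChangeRecord`, `SourceTable` and `apply_changes` are hypothetical names, and the article does not describe the real record format. The source table appends every insert, update and delete to an ordered log, and a consumer replays that log to keep a replica in sync.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record (not Spanner's actual format)."""
    op: str                    # "INSERT", "UPDATE" or "DELETE"
    key: str
    new_value: Optional[dict]  # None for deletes


class SourceTable:
    """Toy table that logs every mutation in commit order."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}
        self.changes: list[ChangeRecord] = []

    def upsert(self, key: str, value: dict) -> None:
        op = "UPDATE" if key in self.rows else "INSERT"
        self.rows[key] = dict(value)
        self.changes.append(ChangeRecord(op, key, dict(value)))

    def delete(self, key: str) -> None:
        # Only log a delete if the row actually existed.
        if self.rows.pop(key, None) is not None:
            self.changes.append(ChangeRecord("DELETE", key, None))


def apply_changes(replica: dict[str, dict], changes: list[ChangeRecord]) -> None:
    """Replay change records, in order, against a downstream copy."""
    for change in changes:
        if change.op == "DELETE":
            replica.pop(change.key, None)
        else:
            replica[change.key] = dict(change.new_value)


source = SourceTable()
source.upsert("u1", {"name": "Ada"})
source.upsert("u1", {"name": "Ada L."})
source.upsert("u2", {"name": "Bob"})
source.delete("u2")

replica: dict[str, dict] = {}
apply_changes(replica, source.changes)
```

After the replay, the replica matches the source table. In the scenario Kazmaier describes, the consumer would be BigQuery, Pub/Sub or Cloud Storage rather than an in-memory dictionary.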

Google Cloud also today brought Vertex AI Workbench, a tool for managing the full lifecycle of a data science project, out of beta and into general availability, and launched Connected Sheets for Looker, as well as the ability to access Looker data models in its Data Studio BI tool.


This post, Google Cloud launches BigLake, a new cross-platform data storage engine – TechCrunch, was originally published at “https://techcrunch.com/2022/04/05/google-cloud-launches-biglake-a-new-cross-platform-data-storage-engine/”
