Tag: Big Data

Big Data BigQuery March 18, 2024

Efficient BigQuery Data Modeling: A Storage and Compute Comparison - BigQuery storage and compute comparison for normalized, denormalized, and nested design: an in-depth analysis with actionable optimizations.

Big Data BigQuery Billing dbt Feb. 11, 2024

Reducing BigQuery Costs by 100–200x with dbt Incremental Models - Reducing costs for dbt models in BigQuery.

Big Data BigQuery Data Science Jan. 8, 2024

How Google BigQuery becomes an even more powerful Data Lakehouse - Recap 2023: What were the major Updates and what can we expect in 2024?

Big Data BigQuery dbt Dec. 11, 2023

Reduce DBT Incremental Materialization Compute Cost in BigQuery - Utilizing partitioned tables and partition pruning to reduce BigQuery cost when using DBT.

Big Data BigQuery Oct. 9, 2023

Linting BigQuery SQL with sqlfluff - Using sqlfluff to lint BigQuery queries.

Big Data Sept. 4, 2023

Staying Up-to-Date with GCP: The Customizable Release Notes Solution - Stay informed with GCP Release Notes on your schedule and for your preferred products with this simple deployment.

Big Data BigQuery GIS July 10, 2023

Blueprints to BigQuery: A Deep Dive into Large-Scale Spatial Joins for Building Footprints - Improving data processing efficiency for Geo data in BigQuery.

Big Data BigQuery Storage July 10, 2023

BigQuery Storage Billing Models - Can you save on your BigQuery Storage costs? Let’s see by exploring the different pricing models and how to use the information available.

Big Data BigQuery June 5, 2023

BigQuery — Best Practices - An in-depth overview of BigQuery.

Big Data BigQuery May 15, 2023

BigQuery Data Warehousekeeping: Nested, Repeated, Arrays, Structs… - Cookbook: how to organize data in your Data Warehouse.

Big Data BigQuery May 8, 2023

BigQuery — keep fresh data while avoiding large-scale mutations - Avoid merge or join and use deduplication and clone in large dataset updates.

Big Data Dataplex May 1, 2023

Data Profiling Using Dataplex - It’s your data, but the profiler knows it better. Let’s find out how.

Big Data BigQuery Data Science Python April 17, 2023

Simplify Data Science Workflows on BigQuery with Fugue and Python - Speed Up Iteration and Cut Computation Cost.

Airflow Big Data Cloud Dataproc Cloud Storage March 13, 2023

Event Driven Data Processing on Google Cloud Platform - An example of event-driven data pipeline.

Big Data BigQuery Feb. 13, 2023

How to Deal with Wildcard Tables in BigQuery - A couple of tricks to speed up Your Data Warehousing.

Big Data BigQuery Billing Storage Feb. 6, 2023

How BigQuery Physical Storage works - Calculating which BigQuery billing model for storage to use.

Big Data BigQuery Jan. 16, 2023

BigQuery WINDOW Functions | Advanced Techniques for Data Professionals - A complete guide for maximizing the potential of BigQuery WINDOW functions to manipulate and transform data.

Big Data BigQuery Machine Learning Jan. 16, 2023

Streamlining Machine Learning with BigQuery ML: A Comprehensive Overview - Unlocking the Power of Big Data with BigQuery ML: A Beginner’s Guide.

Big Data BigQuery Data Science Dec. 26, 2022

How I use BigQuery Analytic Functions as a Data Scientist - Practical examples on how to use advanced SQL to do analyses in BigQuery.

Big Data BigQuery Dec. 19, 2022

Deduplication in BigQuery Tables: A Comparative Study of 7 Approaches - Analyzing and comparing 7 ways of deduplicating rows in a BigQuery table.

Big Data BigQuery GCP Experience Aug. 29, 2022

BigQuery resource management - A custom solution to monitor BigQuery.

Big Data BigQuery Aug. 22, 2022

Google gives BigQuery some new UI Updates - How the new Feature makes work easier for Data Scientists and Engineers.

Big Data BigQuery Data Science July 11, 2022

Awesome new Feature: Change History in Google BigQuery - Using The Append Change history TVF in BigQuery.

Big Data Cloud Dataproc June 20, 2022

Big Data Processing using Google Dataproc - Google Dataproc is a powerful option for running clusters with Hadoop and Spark applications.

Big Data Python June 13, 2022

How to build a DAG based Task Scheduling tool for Multiprocessor systems using python - Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag.

Big Data BigQuery Data Science June 6, 2022

A Senior’s Guide to Kickstart your BigQuery Journey - Missing basics you need to know when using BigQuery.

Big Data Cloud Dataproc June 6, 2022

Tuning Spark Applications to Efficiently Utilize Dataproc Cluster - Have you recently migrated your Spark application from the on-prem Yarn cluster to Dataproc? Then this blog post might help you to tune your Spark applications to efficiently utilize the GCP Dataproc and save cost.

Big Data BigQuery Cloud Functions GIS May 30, 2022

BigQuery Remote Functions, Cloud Functions 2.0, and Plus Codes Revisited - Using a BigQuery remote Cloud Function to convert geo coordinates to Plus Codes.

Big Data BigQuery May 9, 2022

Enhancing BigQuery SEARCH features with SEARCH INDEX - A faster way to find text in unstructured text and semi-structured JSON in BigQuery.

Big Data BigQuery Data Analytics Data Science April 18, 2022

Google Data Cloud Summit 2022: Recap - An overview of the many new updates coming to Google Cloud Platform!

Big Data Official Blog April 18, 2022

Hands-on learning lab: Stream Google Cloud data into Splunk Cloud - Google Cloud and Splunk’s hands-on lab takes you through core scenarios for data ingestion and data input in Google Cloud in 90 minutes or less.

Big Data Data Analytics Official Blog April 11, 2022

Limitless Data. All Workloads. For Everyone - Read about the newest innovations in data cloud announced at Google Cloud’s Data Cloud Summit.

Big Data Data Analytics March 14, 2022

Building a Data Lake on Google Cloud Platform - Big Data is gaining a lot of popularity. Here we explain how to build a big data pipeline on Google Cloud Platform using Open Source.

Big Data BigQuery Data Analytics Machine Learning March 7, 2022

Predicting the Fare on a Billion Taxi Trips with BigQuery - How long does it take and how much does it cost to analyse and train a model on a billion taxi trips in the cloud?

Apache Beam Big Data Kotlin Feb. 28, 2022

Error handling with Apache Beam, Asgarde with Kotlin - In a previous article, we presented a library allowing error handling with Apache Beam with less code.

Big Data Data Analytics Feb. 28, 2022

Data Workflow Modernization - Drive transformational improvement in users’ workflows, not an incremental improvement in the tools you use.

Big Data BigQuery Feb. 7, 2022

How to properly play Wordle using Dataflow and BigQuery. - This article will show you how to compute the best combination of words for Wordle using Dataflow and BigQuery.

Big Data Cloud Bigtable Feb. 7, 2022

Easy CSV importing into Cloud Bigtable - Importing CSV data into Bigtable with the cbt tool.

Big Data BigQuery Monitoring Jan. 31, 2022

Automated emails and data quality checks for your data - Formatting error messages in BigQuery email notifications.

Big Data Data Analytics GCP Experience Jan. 31, 2022

Journey of Transforming and Architecting Data Platforms using Lambda Architecture - An outline of architecting Data Platforms using Lambda architecture on Google Cloud.

Big Data Data Analytics Jan. 10, 2022

10 reasons why you are not ready to adopt data mesh - The goal of this article is to encourage constructive conversations around Data Mesh adoption by describing where Data Mesh may not be the right solution.

Big Data Machine Learning Vertex AI Jan. 3, 2022

How to set up custom Vertex AI pipelines step by step - MLOps using Vertex AI.

Big Data BigQuery NodeJS Dec. 13, 2021

Retrieve your BigQuery query history with NodeJS SDK - Retrieving BigQuery history logs with the BigQuery NodeJS SDK to understand which queries account for most of the billing.

Beginner Big Data BigQuery Dec. 13, 2021

Google BigQuery: An Introduction to Big Data Analytics Platform. - An overview of BigQuery.

Big Data Machine Learning Nov. 29, 2021

From Zero to Hero with Databricks on Google Cloud - This article will walk you through the main steps to become efficient with Databricks on Google Cloud.

Big Data BigQuery Nov. 22, 2021

How to extract real-time intraday data from Google Analytics 4 and Firebase in BigQuery - Bypassing automatic deletion of intraday tables to get real-time data from Firebase in BigQuery.

Big Data BigQuery Data Science Oct. 4, 2021

Mathematical Functions you should know in BigQuery - How to Work with Numbers in BigQuery.

Big Data BigQuery Cloud Dataproc GCP Experience Sept. 27, 2021

Comparing BigQuery Processing and Spark Dataproc - PayPal's approach to evaluating the migration of processes from on-prem to GCP.

Apache Beam Big Data Dataflow Aug. 16, 2021

Entity Resolution using Google Cloud Dataflow - This article illustrates how a data platform was modernized by implementing an entity resolution pipeline using Cloud Dataflow.

Big Data BigQuery Aug. 2, 2021

How to Sync data from MySQL to BigQuery - The purpose of this blog is to provide information on how data can be synced/replicated to BigQuery for data warehouse purposes.

Big Data Cloud Dataflow Cloud Pub/Sub July 5, 2021

Building a simple Google Cloud Dataflow pipeline: PubSub to Google Cloud Storage - This article examines building a streaming pipeline with Dataflow templates to feed downstream systems.

Big Data BigQuery Data Science Machine Learning June 28, 2021

Machine Learning with Google’s BigQuery - How to easily create and deploy ML Models with SQL.

Big Data BigQuery Data Science Public Datasets June 7, 2021

Working with OpenStreetMap Data - Analyzing OpenStreetMap data in BigQuery public dataset.

Big Data BigQuery June 7, 2021

Reverse US Geocoding in BigQuery - How to convert GPS coordinates into cities, counties, states and even ZIP codes for free!

Big Data BigQuery GCP Experience May 24, 2021

Learnings from Streaming 25 Billion Events to Google BigQuery - Experience of using BigQuery at PayPal.

Big Data BigQuery GCP Experience April 25, 2021

Hadoop to BigQuery Migration — New Edition - Process of migrating data from Impala and Hadoop to BigQuery.

Big Data Cloud Dataproc Python April 12, 2021

How to migrate your on-premise pyspark jobs to GCP using Dataproc Workflow Templates with Production-Grade Best Practices Standards - Complete pattern example of how to migrate (or create from scratch) pyspark jobs to GCP with Dataproc Workflow Templates.

Big Data BigQuery April 12, 2021

How to build efficient and performant Data Structures in BigQuery - Ways of using Denormalization and Nested Data.

Big Data BigQuery GCP Experience Infrastructure April 5, 2021

Real-Time data delivery at scale with BigQuery - Using BigQuery Authorized Views to cut storage and processing costs.

Big Data BigQuery March 29, 2021

How to process large BigQuery tables/job result in a single memory machine with python - Python library to load large amounts of data from BigQuery.

Big Data BigQuery Dataform March 22, 2021

Saving money with BigQuery and Dataform - An easy way to reduce cost and increase performance in Data Warehouses — find out how to implement partitioning using Dataform!

Big Data BigQuery Data Science March 1, 2021

BigQuery Hack: Flexible Queries For Any Number of Columns - How can we use BigQuery to handle tables with many columns? Here’s how using scripting and table metadata.

Big Data BigQuery Feb. 22, 2021

BigQuery repeated fields query optimization. - Optimization techniques for BigQuery queries when a table contains repeated fields.

Big Data BigQuery Feb. 15, 2021

USING BigQuery’s LAST_VALUE() function to fill missing data - LAST_VALUE function explained.

Big Data BigQuery Tutorial Feb. 8, 2021

A Simple Way to Query Table Metadata in Google BigQuery - Effortless approach to determine what is in the BigQuery dataset and which tables are useful for analysis with INFORMATION_SCHEMA and TABLES.

Big Data BigQuery Data Studio Firebase Feb. 8, 2021

How to calculate Real Active Users. What are the numbers? - A complete SQL guide for marketers and machine learning engineers. MAU, DAU and WAU, Firebase and BigQuery example with Data Studio template.

Big Data BigQuery Feb. 1, 2021

Generating Unique Keys In BigQuery - The Ideal Primary Key For Data Warehousing.

Big Data BigQuery Data Science Jan. 18, 2021

BigQuery Hack: 1000x More Efficient Aggregation Using Materialized View - Learn how to supercharge your aggregation queries using Materialized View.

Big Data Cloud Dataflow Jupyter Notebook Jan. 18, 2021

Computing Time Series metrics at scale in Google Cloud - This blog post shows how data scientists and engineers can use GCP Dataflow to compute time-series metrics in real-time or in batch to backfill data at scale, for example, to detect anomalies in market data or IoT devices.

AI Platform Notebooks Big Data Data Science GPU Jan. 18, 2021

An Accelerated Big Data Workflow for the Data Analyst - Explore and analyze 1B loan records with RAPIDS & Nvidia A100 GPUs on Cloud AI Platform.

Big Data BigQuery Jan. 4, 2021

BigTips: Removing Duplicates while Maintaining Row History - Do you have late arriving facts and have a need to maintain row history while removing duplicates in BigQuery? Come look here!

Big Data BigQuery GCP Experience Dec. 21, 2020

Our way of dealing with more than 2 billion records in the SQL database - Improving performance on a big MySQL table with GCP products.

Big Data BigQuery Data Analytics Data Studio Public Datasets Dec. 21, 2020

How to compute a growth rate in BigQuery using SQL - Analyzing Google Analytics public dataset with BigQuery to obtain various data.

Big Data BigQuery Dec. 14, 2020

BigTips: INFORMATION_SCHEMA Views in BigQuery, Part 2, with extra Scripts and Procedures! - Making the INFORMATION_SCHEMA a little easier to use!

Big Data BigQuery Dec. 14, 2020

BigTips: Random Numbers and Random Dates - Generating random numbers in a range, and random dates in BigQuery.

Big Data Cloud Dataproc Data Analytics Official Blog Dec. 7, 2020

Best practices to use Apache Ranger on Dataproc - Run managed open source like Apache Hadoop and Spark in the cloud. Get tips on secure deployment with Dataproc and the Apache Ranger authorization OSS.

Big Data BigQuery Nov. 22, 2020

How to de-duplicate rows in a BigQuery table - Duplicate data sometimes can cause wrong aggregates or results in joins. You probably need to remove those duplicate rows before doing any….

Big Data BigQuery Nov. 16, 2020

BigTips: INFORMATION_SCHEMA Views in BigQuery - Working with INFORMATION SCHEMA views in BigQuery.

Big Data BigQuery Security Nov. 16, 2020

BigQuery Authorised View verification workflow - Verify your Views in a BigQuery dataset, to make sure the Authorised Views are going to work without disrupting your ETL.

Big Data Data Analytics Docker Nov. 9, 2020

A step-by-step guide deploying Amundsen on Google Cloud Platform - Amundsen is Lyft’s Data Discovery Platform and metadata engine. It helps the data team to be more productive by saving time spent in the discovery phase: less time searching, more time finding.

Apache Beam Big Data Cloud Dataflow Oct. 26, 2020

Basic Streaming Data Enrichment on Google Cloud with Dataflow SQL - Learn the basics of Streaming and Batch Data Enrichment with Dataflow SQL.

Big Data Cloud Dataproc Data Analytics Official Blog Oct. 26, 2020

Preparing for serverless big data open source software - Serverless capabilities at Google Cloud continue to develop, and serverless is now meeting open source as tools like Dataproc let you build on your open foundation in the cloud.

Big Data BigQuery Sept. 28, 2020

Using BigQuery to Track and Estimate Home Heating Oil Deliveries - Google Sheets, BigQuery, and Public Data Sets to calculate Degree Days and K-Factor.

API Big Data BigQuery Machine Learning Sept. 7, 2020

How we enabled product and pricing-availability feeds as APIs for external partners - This post demonstrates how to package your training application when it needs to connect to an external (On-Prem / Multi-Cloud) database to fetch the required source dataset.

Big Data BigQuery Data Science Aug. 31, 2020

Google Cloud for Genomics - Building a scalable, reproducible, and secure data processing pipeline on the cloud.

Big Data BigQuery Data Studio Firebase Aug. 17, 2020

I stopped using Firebase Dashboards. I’ve built my own instead. - Displaying Firebase Crashlytics and Performance data in Data Studio.

Big Data BigQuery Infrastructure Terraform Aug. 17, 2020

Data lake on GCP using Terraform - Using Terraform to set up infrastructure-as-code for a Data Lake on Google Cloud Platform.

Big Data BigQuery Billing Aug. 10, 2020

Big Data in Google Cloud — Cost Monitoring (part II) - The article explains how to analyze Billing data in BigQuery in order to get insights about the most expensive queries, etc.

Big Data Cloud Data Fusion Tutorial Aug. 10, 2020

Building some Data Pipeline with Google Data Fusion - Step-by-step tutorial on getting started with Data Fusion and creating pipelines.

Apache Beam Big Data Cloud Dataflow Cloud Pub/Sub Java July 20, 2020

Performing Deduplication in Real Time streaming pipeline with Apache Beam stateful processing - An example of doing PubSub message content deduplication in Apache Beam running on Dataflow.

Big Data BigQuery Cloud Dataflow July 6, 2020

Kafka to BigQuery using Dataflow - In this article, two different methods to connect Kafka to BigQuery using Dataflow are evaluated.

Big Data Cloud Storage July 3, 2020

Migrating HDFS Data to Google Cloud Storage - Moving data from Hadoop cluster to Cloud Storage with Cloud Storage Connector.

Big Data Cloud Data Fusion June 29, 2020

I’m your father… Data Lineage with Cloud Data Fusion - How to use data lineage with Cloud Data Fusion, the fully managed, cloud-native, enterprise data integration service.

Big Data Cloud Dataproc June 22, 2020

Sqoop Data Ingestion on GCP - Using Apache Sqoop (bulk data transfer) in Cloud Dataproc.

Big Data BigQuery GCP Experience June 15, 2020

DNC Tech Choices: Why we chose BigQuery - Thoughts about migrating to BigQuery.

Big Data Cloud Dataprep Cloud Functions Serverless June 8, 2020

How to Automate a Cloud Dataprep Pipeline When a File Arrives - With a better mastery of Cloud Functions, you can trigger a Dataprep job via API when a file lands in a Cloud Storage bucket.

Airflow Big Data BigQuery June 1, 2020

Data Pipelines at PasarPolis using Airflow and BigQuery - Use Airflow for data orchestration on BigQuery to maintain a data warehouse.

AI Platform Notebooks Big Data Data Science Machine Learning June 1, 2020

Hands-on Big Data Analysis on GCP Using AI Platform Notebooks - Example of working with AI Platform Notebooks.

Big Data BigQuery Cloud Dataproc Jupyter Notebook May 25, 2020

Apache Spark BigQuery Connector — Optimization tips & example Jupyter Notebooks - Learn how to use the BigQuery Storage API with Apache Spark on Cloud Dataproc.

Big Data Data Catalog May 25, 2020

Google Cloud Data Catalog — Keep Up With Your On-Prem Hive Server - Code samples with a practical approach on how to ingest metadata from an on-premise Hive server into Google Cloud Data Catalog.

Big Data Data Catalog Data Science May 18, 2020

Google Cloud Data Catalog — Integrate Your On-Prem RDBMS Metadata - Code samples with a practical approach on how to ingest metadata from on-premise Relational Databases into Google Cloud Data Catalog.

Big Data BigQuery Cloud Dataproc May 18, 2020

Import SQL Server data in BigQuery - A list of four approaches for one-off data dumps from an RDBMS like SQL Server to BigQuery.

Big Data Terraform May 11, 2020

Query data in Google Cloud Storage with SQL using Apache Drill - Creating an Apache Drill cluster in GCP and querying data stored in GCS.

Big Data Compute Engine May 11, 2020

Cloud-native Bioinformatics: HPC to GCP - Describing a process of migrating genomic analysis workflows on HPC to GCP.

Big Data Cloud Dataproc May 4, 2020

Migrating Data Processing Hadoop Workloads to GCP - Intro to Dataproc as well as tips for best usage.

Beginner Big Data BigQuery April 27, 2020

Introduction to Arrays in BigQuery - Tutorial on working with arrays in BigQuery.

Big Data Cloud Dataflow Data Analytics Official Blog April 20, 2020

Introducing Dataflow template to stream data to Splunk - Learn how to set up a streaming pipeline for Google Cloud data into Splunk Cloud or Enterprise with this new Pub/Sub to Splunk Dataflow template.

Big Data BigQuery April 13, 2020

BigQuery Materialized Views and Why You Should be Using Them - TL;DR BigQuery materialized views are great. You should use them!

Big Data BigQuery Data Analytics Python April 13, 2020

Ibis: A Python Data Analysis Framework for Development and Production - An example of using Ibis (Python Data Analysis Productivity Framework) with BigQuery.

Big Data Cloud Storage Data Catalog March 28, 2020

Google Cloud Data Catalog Filesets: unlock its full potential - Enrich your Google Cloud Storage Filesets with useful statistics about your files.

Big Data BigQuery March 23, 2020

Using BigQuery Execution Plans to Improve Query Performance - Explanation of BigQuery's execution plan.

Big Data BigQuery Public Datasets March 16, 2020

Processing 10TB of Wikipedia Page Views - Part 1 - Processing and uploading Wikipedia page views into BigQuery.

Big Data BigQuery Cloud Dataflow March 9, 2020

Data ingestion Google Big Query without the headaches - Schema conversions on the fly without the headaches with Dataflow and BigQuery.

Big Data BigQuery GCP Experience Go March 9, 2020

Loading and transforming data to BigQuery at large scale - Using serverless data loading to BigQuery to reduce costs from $8K to $15 per day.

Big Data Cloud Bigtable Cloud Dataflow GCP Experience Feb. 24, 2020

How Spotify ran the largest Google Dataflow job ever for Wrapped 2019 - Spotify used Cloud Bigtable with Cloud Dataflow to lower the cost of running one of its biggest jobs.

Big Data Business Feb. 24, 2020

Snowflake announces general availability on Google Cloud - Snowflake is now available in the us-central1 (Iowa) and europe-west4 (Netherlands) regions with additional regions coming later this year.

Big Data BigQuery Data Loss Prevention API Data Studio Feb. 3, 2020

BigQuery, PII and DLP: The Perfect Match - Analyzing PII data in BigQuery with Data Loss Prevention and viewing results in Data Studio.

Big Data BigQuery Jan. 13, 2020

BigQuery Wildcards - The article describes how "*" wildcards can be used in BigQuery.

Big Data BigQuery Jan. 13, 2020

Why We Picked Google BigQuery over Snowflake as Our New Data Warehouse Solution - Comparing BigQuery and Snowflake for Data Warehouse selection.

Big Data Cloud AutoML Kaggle Jan. 13, 2020

AutoML and Big Data - Or how to use Google AutoML for 40+ GB datasets

Big Data BigQuery Dec. 23, 2019

Partition on any field with BigQuery - BigQuery has introduced integer partition capability. Now you can partition on a numeric field and, surprisingly, not only on that!

Big Data BigQuery Dec. 23, 2019

BigQuery Integer Partitioning is in Beta - Demonstrating a new BigQuery integer partition feature on the New York Taxi dataset.

Big Data Data Analytics Official Blog Dec. 23, 2019

Opening doors, embracing change with cloud data warehouses - Cloud data warehouse migrations bring technology changes and new ways of working for data analysts and administrators. Change management is important.

Big Data BigQuery Dec. 16, 2019

k-Means Clustering in BigQuery now does better initialization - The Scalable k-Means++ initialization option in BigQuery ML

Big Data BigQuery Cloud Dataflow Data Analytics Official Blog Dec. 16, 2019

Using HLL++ to speed up count-distinct in massive datasets - There’s a better way to do the count distinct function using Google’s HyperLogLog++ algorithm in Dataflow and BigQuery.

Big Data GCP Experience Dec. 9, 2019

Democratizing Dataproc — dunnhumby’s journey on Google Cloud Platform - Experience of using Cloud Dataproc on Google Cloud Platform.

Big Data BigQuery Serverless Dec. 2, 2019

Write efficient queries on BigQuery - A few tips that improve the speed of queries in BigQuery.

Big Data Cloud Dataflow Dec. 2, 2019

Trimming down the cost of running Google Cloud Dataflow at scale - Tips and tricks to lower the cost of running Dataflow pipelines

Big Data Business SAP Dec. 2, 2019

Google Cloud makes moves to appeal SAP and Oracle Users - A look at a recent development at Google Cloud.

Big Data Compute Engine Puppet Python Dec. 2, 2019

New ground — Automatic increase of Kafka LVM on GCP - Adding more storage to each node of a Kafka cluster on Google Cloud.

Big Data BigQuery Python Nov. 25, 2019

Simplify BigQuery ETL jobs using SQLAlchemy - Extract and move data between BigQuery and relational databases using a plugin for SQLAlchemy.

Big Data BigQuery Cloud Dataproc Nov. 25, 2019

Querying External Data with BigQuery - Demonstration of BigQuery querying Parquet files from Google Cloud Storage.

Big Data BigQuery Data Science GCP Experience Nov. 18, 2019

Batch Processing Pipelines for Better Data Analysis - An overview of how Gojek is using batch processing to generate useful insights from our data warehouse.

Big Data BigQuery Data Science Nov. 18, 2019

BigQuery workflow from the Jupyter notebook - In this article, you will get to know how to create and schedule a BigQuery workflow using JupyterLab and Cloud Composer.

Apache Beam Big Data BigQuery Cloud Dataflow Nov. 4, 2019

How to build a cleaning pipeline with BigQuery and DataFlow on GCP - Creating a small transformation pipeline on Dataflow to clean data in BigQuery.

Big Data BigQuery Data Science Nov. 4, 2019

Let the kids into the library - An opinionated attempt at building a data driven company in the cloud.

Big Data BigQuery Nov. 4, 2019

Return of the Living Data - A story about BigQuery and its underlying data formats.

Big Data BigQuery Data Science Python Oct. 28, 2019

How to get into BigQuery analysis on Kaggle with Python? - Exploring ways to use BigQuery in Kaggle.

Big Data BigQuery Data Studio Oct. 28, 2019

Unique dashboards for external customers with Google Cloud - Using BigQuery and Data Studio to create dashboards that are shared with different people.

Big Data Data Science Oct. 28, 2019

A gentle introduction to Apache Druid in Google Cloud Platform - The article describes how to set up and use Apache Druid on GCP.

Big Data Oct. 28, 2019

Deploying a Production Druid Cluster in Google Cloud Platform - The process of setting up an Apache Druid cluster on GCP.

Apache Beam Big Data Java Oct. 28, 2019

Testing in Apache Beam Part 1: Batch - A look into how to write unit and end-to-end tests in Beam.

Big Data BigQuery Official Blog Oct. 21, 2019

Migrating data warehouses to BigQuery: Introduction and overview - Solution series that helps you transition from an on-premises data warehouse to BigQuery.

Big Data BigQuery Official Blog Teradata Oct. 21, 2019

Migrating Teradata to BigQuery - Solution series that helps you transition from a Teradata data warehouse to BigQuery.

Big Data IoT Oct. 14, 2019

IoT Data Pipelines in GCP, multiple ways — Part 1 - Three-part series about IoT Data pipelines in Google Cloud Platform.

Big Data BigQuery Oct. 14, 2019

Plus Codes (Open Location Code) and Scripting in Google BigQuery - A closer look at why Plus Codes are important, and using Google BigQuery scripting to encode them!

Apache Beam Big Data BigQuery Oct. 6, 2019

Type safe BigQuery in Apache Beam with Spotify’s Scio - Using Scala's Beam library for type-safe queries in BigQuery.

Big Data BigQuery Sept. 30, 2019

BigQuery DeDuplication — Window Function vs Group by For Stitch - Comparing the performance of BigQuery deduplication using a window function vs. GROUP BY. If you are using Stitch, you can also delete in BigQuery.

Big Data Security Sept. 30, 2019

Help secure the pipeline from your data lake to your data warehouse - This article discusses the security controls designed to help manage data access to, and prevent data exfiltration from, the pipeline from data lake to data warehouse.

Big Data BigQuery Teradata Sept. 23, 2019

Teradata to Google BigQuery Migration. Converting the code - This article provides instructions on how to extract the schema of tables, views and SQL Queries from Teradata and convert it into BigQuery.

Big Data BigQuery Sept. 16, 2019

End-to-End Crypto Shredding (Part II): Data Deletion/Retention with Crypto Shredding - Crypto-deletion in various storages in GCP.

Big Data BigQuery Sept. 16, 2019

BigQuery Deduplication - Explore some techniques for deduplication in BigQuery both for the whole table and by partition.

Big Data BigQuery Sept. 16, 2019

A Journey into BigQuery Fuzzy Matching — 3 of [1, ∞) — NYSIIS - Another article in ongoing series about fuzzy matching in BigQuery.

Beginner Big Data BigQuery Sept. 9, 2019

The Caveat of Loading Data to Partitioned Table on BigQuery - Table Partitioning and Why

Apache Beam Big Data BigQuery Cloud Dataflow Sept. 2, 2019

Trimming down over 95% of your BigQuery costs using File Loads - Using BigQuery load jobs in Beam instead of streaming to reduce costs.

Big Data BigQuery Aug. 19, 2019

A Journey into BigQuery Fuzzy Matching — 2 of [1, ∞) — More Soundex and Levenshtein Distance - Doing fuzzy matching in BigQuery on first and last names.

Big Data BigQuery Aug. 19, 2019

Finding top programming language with BigQuery - Analyzing the GitHub public dataset with BigQuery to find the most popular programming languages based on number of repositories.

Big Data BigQuery Aug. 19, 2019

Tips and Tricks to Seamlessly Migrate BigQuery Dataset Across Regions - Description of cross-regional BigQuery data migration.

Big Data BigQuery Data Analytics Official Blog Aug. 12, 2019

Migrating Teradata and other data warehouses to BigQuery - Migration framework and architecture when moving a data warehouse, like Teradata, to Google Cloud BigQuery.

Big Data BigQuery Aug. 5, 2019

Efficient Aggregation, Roll-ups with BigQuery HyperLogLog++ functions - Description of incremental count distinct processing using BigQuery’s HyperLogLog++ functions and how they provide fast, scalable, incremental processing properties.

Big Data Data Analytics Machine Learning July 29, 2019

Beginners Introduction to Data Lifecycle on Google Cloud Platform - Description of 4 categories of data lifecycle on GCP.

Big Data BigQuery Data Science Java July 15, 2019

Beast: Moving Data from Kafka to BigQuery - GOJEK’s open source solution for moving data from Kafka to Google BigQuery.

Big Data Data Analytics Data Catalog Data Science July 8, 2019

Google Cloud Data Catalog hands-on guide: templates & tags with Python - This quickstart guide brings a practitioner approach to Data Catalog, covering Templates & Tags management using the Python client library.

Big Data BigQuery July 8, 2019

BigQuery for Big Data and AI - A brief intro to start working with BigQuery.

Big Data Data Analytics July 1, 2019

Data and Analytics on Google Cloud Platform - Overview of data and analytics services available on Google Cloud Platform.

Big Data BigQuery June 24, 2019

Optimising queries in BigQuery for Beginners - Learn what BigQuery contains under the hood and how to run efficient queries from a public session dataset in this step-by-step guide.

Big Data BigQuery Data Analytics GCP Experience June 17, 2019

A Song of Data and Fire: Building Bnext Wall (Data Lake) - Process of building data lake on Google Cloud Platform.

Big Data BigQuery Official Blog June 17, 2019

Building hybrid blockchain/cloud applications with Ethereum and Google Cloud - This post describes applications for making internet-hosted data available inside an immutable public blockchain by making BigQuery data available on-chain using a Chainlink oracle smart contract.

Big Data Official Blog Storage June 10, 2019

Announcing Snowflake on Google Cloud Platform - Snowflake (cloud-based data warehouse) will be available on GCP.

Big Data BigQuery Tutorial June 3, 2019

How to easy understand Analytics Functions on BigQuery - An in-depth explanation of analytical BigQuery functions.

Big Data BigQuery June 3, 2019

Loading Terabytes of Data From Postgres Into BigQuery - The article describes approaches to exporting data from PostgreSQL and loading it into BigQuery.

Big Data BigQuery Cloud Dataprep Machine Learning June 3, 2019

BigQuery GIS + ML on government open data - Analyzing & visualizing housing data using BigQuery.

Big Data Cloud Data Fusion Kubernetes June 3, 2019

Journey Continues — Onward and Upwards! - A brief overview of things that are going on around CDAP (Data Fusion).

Apache Beam Big Data Cloud Dataflow Cloud Pub/Sub Machine Learning May 27, 2019

Game of Thrones Twitter Sentiment with Keras, Apache Beam, BigQuery and PubSub - End-to-end solution for analyzing tweets using GCP products.

Big Data Cloud Data Fusion May 27, 2019

Building a Data Lake on Google Cloud Platform with CDAP - Using CDAP (Cask Data Application Platform) on GCP.

Big Data Official Blog May 27, 2019

Delivering end-to-end data analytics and data management solutions with Informatica - We’re extending our strategic partnership with Informatica to help more enterprises take advantage of hybrid and multi-cloud data management solutions.

Big Data Official Blog Storage May 6, 2019

Principles and best practices for data governance in the cloud - A white paper that outlines best practices and guidelines for organizations to establish data governance in a cloud-first world.

Big Data Cloud Data Fusion April 29, 2019

Google Data Fusion - Cloud Data Fusion is the brand-new fully-managed data engineering product from GCP. It will help users to efficiently build and manage…

Big Data Docker Tutorial April 15, 2019

Deploy Spark on Google Cloud, (Docker+Swarm) - Deploying a Spark cluster on Google Cloud using Docker containers and Docker Compose.

Big Data BigQuery Cloud Dataflow April 15, 2019

From data ingestion to insight prediction: Google Cloud smart analytics accelerates your business transformation - A more detailed look at the Cloud Next '19 news on analytics products.

Big Data BigQuery GCP Experience April 1, 2019

Reflections On Designing An Enterprise Data Warehouse - Description of the process of developing a data warehouse on Google Cloud using BigQuery.

Big Data BigQuery Official Blog March 25, 2019

Analyzing 3024 rice genomes characterized by DeepVariant - Exploring Rice genome dataset using BigQuery.

Big Data Python March 11, 2019

Enlightened DataLab Notebooks - Starting with Data Science on GCP.

Big Data BigQuery Cloud Marketplace R March 4, 2019

RStudio and BigQuery in under 30 minutes - The article describes the steps to provision an RStudio instance on Google Compute Engine and use it to do complex analytics on BigQuery.

Big Data March 4, 2019

What is Google Snappy? High-speed data compression and decompression - Pros and cons of using Snappy (data compression library from Google) for compression.

Big Data BigQuery Cloud Composer GCP Experience March 4, 2019

How did we build a Data Warehouse in six months? - Sharing the experience of creating a data warehouse on Google Cloud Platform.

Apache Beam Big Data Cloud Dataflow Official Blog Feb. 25, 2019

Real-time diagnostics from nanopore DNA sequencers on Google Cloud - A scalable, reliable, and cost-effective end-to-end pipeline for fast DNA sequence analysis built on Google Cloud and this new class of nanopore DNA sequencers.

Big Data Cloud Security Command Center Security Feb. 25, 2019

Google Cloud Platform Security Operations Center Data Lake - Some thoughts regarding security when building a data lake on Google Cloud Platform.

Big Data Google Cloud Platform Official Blog Jan. 28, 2019

Google is named a leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics - Gartner named Google a Leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics (DMSA).

Big Data Compute Engine Jan. 7, 2019

Deploying PySpark ML Model on Google Compute Engine as a REST API - Step-by-step tutorial on deploying a PySpark ML model on Google Compute Engine.

Big Data Nov. 26, 2018

How to capture and store tweets in Real Time with Apache Spark and Apache Kafka. Using cloud Platforms such as Databricks and GCP (Part 1) - Capture and store tweets in real time with Apache Spark and Apache Kafka.

Big Data Cloud Dataflow Cloud Datalab Python Serverless June 18, 2018

Analyzing Reddit’s Top Posts & Images With Google Cloud (Part 1) - Analyzing everything from Reddit.

Big Data Business May 21, 2018

Cask is joining Google Cloud - Cask is the company behind CDAP, an open source big data integration platform.

Big Data Cloud Datalab Cloud Pub/Sub Cloud Storage May 21, 2018

Data Science for Startups: Data Pipelines - Example of creating a data pipeline on Google Cloud Platform.

Apache Beam Big Data May 14, 2018

GCP Podcast - #126 Beam and Spark with Holden Karau

Big Data March 26, 2018

Public datasets: how nonprofits can drive social impact with planetary-scale data - Public datasets are freely hosted and accessible via Google BigQuery and Cloud Storage.

Big Data Business March 26, 2018

Room to Grow on the Big Data Maturity Curve - Report on Big Data ecosystems.

Big Data Business Official Blog March 19, 2018

Solutions : Build a Marketing Data Warehouse on Google Cloud Platform - Using a fictional online cosmetics retailer as an example of how to leverage Google Cloud products to get key insights.

Big Data Official Blog March 5, 2018

How to handle mutating JSON schemas in a streaming pipeline, with Square Enix - Explore how Square Enix supports handling of mutating JSON schemas in a streaming pipeline.

Big Data Machine Learning TensorFlow Nov. 20, 2017

Automating ML and IoT with cloud-based image rendering, training, and device delivery - Architectural solutions for 3D rendering and machine learning.

Big Data Teradata Nov. 20, 2017

Transitioning from Data Warehousing in Teradata to GCP Big Data - The article describes how you can transition from on-premises and cloud data warehousing to Google Cloud Platform.

Big Data Sept. 11, 2017

Plumbing Big Data Pipelines - Qubit (which provides personalization for companies communicating with their customers) describes its experience with different Google Cloud Platform products

Big Data Cloud Dataproc Aug. 20, 2017

Easier integration with Apache Spark and Hadoop via Google Cloud Dataproc Job IDs and Labels - Best practices for using Job IDs and labels

Big Data Machine Learning July 31, 2017

New hands-on labs for scientific data processing on Google Cloud Platform - 7 new labs to try out Google Cloud Platform Big Data and Machine Learning products to solve real-world scientific problems using a variety of public datasets.

Big Data July 24, 2017

Moving Thumbtack’s data infrastructure to Google Cloud Platform - Moving data from PostgreSQL and MongoDB to Google Cloud Dataproc and BigQuery

Big Data Cloud Bigtable July 3, 2017

How Qubit deduplicates streaming data at scale with Google Cloud Platform - How Qubit solved the problem of duplicated streaming data using Google Cloud Platform products

Big Data July 3, 2017

GCP Podcast - #83 Public Datasets with Mike Hamberg and Will Curran

Big Data Cloud Dataflow July 3, 2017

Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines

Big Data BigQuery June 26, 2017

The Google Data WareCity - Interesting and unique aspects of BigQuery’s data sharing capability

Big Data BigQuery June 26, 2017

GCE BigQuery vs AWS Redshift vs AWS Athena - A basic comparison of data loading and simple queries across Google BigQuery, Amazon Redshift, and its cousin Athena.

Big Data Cloud Dataflow June 19, 2017

Visualization and large-scale processing of historical weather radar (NEXRAD Level II) data - Processing historical weather data for visualization with Cloud Dataflow

Big Data Business May 8, 2017

That giant sucking sound? Hadoop moving into the cloud - Companies are starting to move their Hadoop environments to Google Cloud Platform because of its simplicity, stability, and maturity

Big Data BigQuery April 10, 2017

BI Performance Benchmarks with Google BigQuery

Big Data Cloud Dataflow March 27, 2017

Google Cloud Dataflow In the Smart Home Data Pipeline - Handling data from Nest devices via Google Cloud Dataflow

Big Data March 13, 2017

Visualizing Big Data with Google Cloud

Big Data BigQuery PubSub March 6, 2017

Combining Thomson Reuters data with Google BigQuery and Google Cloud Pub/Sub API - Proof of concept for using BigQuery to analyze data ingested from the Reuters API

Big Data March 6, 2017

Data Science on the Google Cloud Platform: the first book - Interview with Valliappa Lakshmanan, author of the upcoming book Data Science on Google Cloud Platform

Big Data

Building a Data Lake on GCP with CDAP - A first look at the open source platform from Google-acquired Cask.

 
