Tag: Big Data

Big Data GCP Experience Dec. 9, 2019

Democratizing Dataproc — dunnhumby’s journey on Google Cloud Platform - Experience of using Cloud Dataproc on Google Cloud Platform.

Big Data BigQuery Serverless Dec. 2, 2019

Write efficient queries on BigQuery - A few tips that improve query speed in BigQuery.

Big Data Google Cloud Dataflow Dec. 2, 2019

Trimming down the cost of running Google Cloud Dataflow at scale - Tips and tricks to lower the cost of running Dataflow pipelines

Big Data Business SAP Dec. 2, 2019

Google Cloud makes moves to appeal SAP and Oracle Users - A look at a recent development at Google Cloud.

Big Data Google Compute Engine Puppet Python Dec. 2, 2019

New ground — Automatic increase of Kafka LVM on GCP - Adding more storage to each node of a Kafka cluster on Google Cloud.

Big Data BigQuery Python Nov. 25, 2019

Simplify BigQuery ETL jobs using SQLAlchemy - Extract and move data between BigQuery and relational databases using a plugin for SQLAlchemy.

Big Data BigQuery Google Cloud Dataproc Nov. 25, 2019

Querying External Data with BigQuery - Demonstration of BigQuery querying Parquet files from Google Cloud Storage.

Big Data BigQuery Data Science GCP Experience Nov. 18, 2019

Batch Processing Pipelines for Better Data Analysis - An overview of how Gojek is using batch processing to generate useful insights from our data warehouse.

Big Data BigQuery Data Science Nov. 18, 2019

BigQuery workflow from the Jupyter notebook - In this article, you will learn how to create and schedule a BigQuery workflow using JupyterLab and Cloud Composer.

Apache Beam Big Data BigQuery Google Cloud Dataflow Nov. 4, 2019

How to build a cleaning pipeline with BigQuery and DataFlow on GCP - Creating a small transformation pipeline on Dataflow to clean data in BigQuery.

Big Data BigQuery Data Science Nov. 4, 2019

Let the kids into the library - An opinionated attempt at building a data driven company in the cloud.

Big Data BigQuery Nov. 4, 2019

Return of the Living Data - A story about BigQuery and its underlying data formats.

Big Data BigQuery Data Science Python Oct. 28, 2019

How to get into BigQuery analysis on Kaggle with Python? - Exploring ways to use BigQuery in Kaggle.

Big Data BigQuery Data Studio Oct. 28, 2019

Unique dashboards for external customers with Google Cloud - Using BigQuery and Data Studio to create dashboards that are shared with different people.

Big Data Data Science Oct. 28, 2019

A gentle introduction to Apache Druid in Google Cloud Platform - The article describes how to set up and use Apache Druid on GCP.

Big Data Oct. 28, 2019

Deploying a Production Druid Cluster in Google Cloud Platform - The process of setting up an Apache Druid cluster on GCP.

Apache Beam Big Data Java Oct. 28, 2019

Testing in Apache Beam Part 1: Batch - A look into how to write unit and end-to-end tests in Beam.

Big Data BigQuery Official Blog Oct. 21, 2019

Migrating data warehouses to BigQuery: Introduction and overview - Solution series that helps you transition from an on-premises data warehouse to BigQuery.

Big Data BigQuery Official Blog Teradata Oct. 21, 2019

Migrating Teradata to BigQuery - Solution series that helps you transition from a Teradata data warehouse to BigQuery.

Big Data IoT Oct. 14, 2019

IoT Data Pipelines in GCP, multiple ways — Part 1 - A three-part series about IoT data pipelines in Google Cloud Platform.

Big Data BigQuery Oct. 14, 2019

Plus Codes (Open Location Code) and Scripting in Google BigQuery - A closer look at why Plus Codes are important, and using Google BigQuery scripting to encode them!

Apache Beam Big Data BigQuery Oct. 6, 2019

Type safe BigQuery in Apache Beam with Spotify’s Scio - Using Scio, a Scala API for Apache Beam, for type-safe queries in BigQuery.

Big Data BigQuery Sept. 30, 2019

BigQuery DeDuplication — Window Function vs Group by For Stitch - Comparing the performance of BigQuery deduplication using a window function vs. GROUP BY. If you are using Stitch, you can also delete in BigQuery.

Big Data Security Sept. 30, 2019

Help secure the pipeline from your data lake to your data warehouse - This article discusses the security controls designed to help manage access to, and prevent data exfiltration from, the pipeline from data lake to data warehouse.

Big Data BigQuery Teradata Sept. 23, 2019

Teradata to Google BigQuery Migration. Converting the code - This article provides instructions on how to extract the schemas of tables, views, and SQL queries from Teradata and convert them for BigQuery.

Big Data BigQuery Sept. 16, 2019

End-to-End Crypto Shredding (Part II): Data Deletion/Retention with Crypto Shredding - Crypto-deletion across various storage services in GCP.

Big Data BigQuery Sept. 16, 2019

BigQuery Deduplication - Explore some techniques for deduplication in BigQuery both for the whole table and by partition.

Big Data BigQuery Sept. 16, 2019

A Journey into BigQuery Fuzzy Matching — 3 of [1, ∞) — NYSIIS - Another article in ongoing series about fuzzy matching in BigQuery.

Beginner Big Data BigQuery Sept. 9, 2019

The Caveat of Loading Data to Partitioned Table on BigQuery - Table Partitioning and Why

Apache Beam Big Data BigQuery Google Cloud Dataflow Sept. 2, 2019

Trimming down over 95% of your BigQuery costs using File Loads - Using BigQuery load jobs in Beam instead of streaming to reduce costs.

Big Data BigQuery Aug. 19, 2019

A Journey into BigQuery Fuzzy Matching — 2 of [1, ∞) — More Soundex and Levenshtein Distance - Doing fuzzy matching in BigQuery on first and last names.

Big Data BigQuery Aug. 19, 2019

Finding top programming language with BigQuery - Analyzing the GitHub public dataset with BigQuery to find the most popular programming languages based on number of repositories.

Big Data BigQuery Aug. 19, 2019

Tips and Tricks to Seamlessly Migrate BigQuery Dataset Across Regions - Description of cross-regional BigQuery data migration.

Big Data BigQuery Data Analytics Official Blog Aug. 12, 2019

Migrating Teradata and other data warehouses to BigQuery - Migration framework and architecture for moving a data warehouse, such as Teradata, to Google Cloud BigQuery.

Big Data BigQuery Aug. 5, 2019

Efficient Aggregation, Roll-ups with BigQuery HyperLogLog++ functions - Description of incremental count distinct processing using BigQuery’s HyperLogLog++ functions and how they provide fast, scalable, incremental processing properties.

Big Data Data Analytics Machine Learning July 29, 2019

Beginners Introduction to Data Lifecycle on Google Cloud Platform - Description of the four categories of the data lifecycle on GCP.

Big Data BigQuery Data Science Java July 15, 2019

Beast: Moving Data from Kafka to BigQuery - GOJEK’s open source solution for moving data from Kafka to Google BigQuery.

Big Data Data Analytics Data Catalog Data Science July 8, 2019

Google Cloud Data Catalog hands-on guide: templates & tags with Python - This quickstart guide brings a practitioner approach to Data Catalog, covering Templates & Tags management using the Python client library.

Big Data BigQuery July 8, 2019

BigQuery for Big Data and AI - A brief intro to start working with BigQuery.

Big Data Data Analytics July 1, 2019

Data and Analytics on Google Cloud Platform - Overview of data and analytics services available on Google Cloud Platform.

Big Data BigQuery June 24, 2019

Optimising queries in BigQuery for Beginners - Learn what BigQuery contains under the hood and how to run efficient queries on a public session dataset in this step-by-step guide.

Big Data BigQuery Data Analytics GCP Experience June 17, 2019

A Song of Data and Fire: Building Bnext Wall (Data Lake) - The process of building a data lake on Google Cloud Platform.

Big Data BigQuery Official Blog June 17, 2019

Building hybrid blockchain/cloud applications with Ethereum and Google Cloud - This post describes applications for making internet-hosted data available inside an immutable public blockchain by making BigQuery data available on-chain using a Chainlink oracle smart contract.

Big Data Official Blog Storage June 10, 2019

Announcing Snowflake on Google Cloud Platform - Snowflake (cloud-based data warehouse) will be available on GCP.

Big Data BigQuery Tutorial June 3, 2019

How to easy understand Analytics Functions on BigQuery - An in-depth explanation of analytical BigQuery functions.

Big Data BigQuery June 3, 2019

Loading Terabytes of Data From Postgres Into BigQuery - The article describes approaches to exporting data from PostgreSQL and loading it into BigQuery.

Big Data BigQuery Google Cloud Dataprep Machine Learning June 3, 2019

BigQuery GIS + ML on government open data - Analyzing & visualizing housing data using BigQuery.

Big Data Cloud Data Fusion Kubernetes June 3, 2019

Journey Continues — Onward and Upwards! - A brief overview of things that are going on around CDAP (Data Fusion).

Apache Beam Big Data Google Cloud Dataflow Google Cloud Pub/Sub Machine Learning May 27, 2019

Game of Thrones Twitter Sentiment with Keras, Apache Beam, BigQuery and PubSub - End-to-end solution to analyze tweets using GCP products.

Big Data Cloud Data Fusion May 27, 2019

Building a Data Lake on Google Cloud Platform with CDAP - Using CDAP (Cask Data Application Platform) on GCP.

Big Data Official Blog May 27, 2019

Delivering end-to-end data analytics and data management solutions with Informatica - We’re extending our strategic partnership with Informatica to help more enterprises take advantage of hybrid and multi-cloud data management solutions.

Big Data Official Blog Storage May 6, 2019

Principles and best practices for data governance in the cloud - A white paper that outlines best practices and guidelines for organizations to establish data governance in a cloud-first world.

Big Data Cloud Data Fusion April 29, 2019

Google Data Fusion - Cloud Data Fusion is the brand-new fully-managed data engineering product from GCP. It will help users to efficiently build and manage…

Big Data Docker Tutorial April 15, 2019

Deploy Spark on Google Cloud, (Docker+Swarm) - Deploying a Spark cluster on Google Cloud using Docker containers and Docker Compose.

Big Data BigQuery Google Cloud Dataflow April 15, 2019

From data ingestion to insight prediction: Google Cloud smart analytics accelerates your business transformation - A more detailed look at Cloud Next '19 news related to analytics products.

Big Data BigQuery GCP Experience April 1, 2019

Reflections On Designing An Enterprise Data Warehouse - Description of the data warehouse development process on Google Cloud using BigQuery.

Big Data BigQuery Official Blog March 25, 2019

Analyzing 3024 rice genomes characterized by DeepVariant - Exploring a rice genome dataset using BigQuery.

Big Data Python March 11, 2019

Enlightened DataLab Notebooks - Starting with Data Science on GCP.

Big Data BigQuery Cloud Launcher R March 4, 2019

RStudio and BigQuery in under 30 minutes - The article describes the steps to provision an RStudio instance on Google Compute Engine and use it to do complex analytics on BigQuery.

Big Data March 4, 2019

What is Google Snappy? High-speed data compression and decompression - Pros and cons of using Snappy (data compression library from Google) for compression.

Big Data BigQuery Cloud Composer GCP Experience March 4, 2019

How did we build a Data Warehouse in six months? - Sharing the experience of creating a data warehouse on Google Cloud Platform.

Apache Beam Big Data Google Cloud Dataflow Official Blog Feb. 25, 2019

Real-time diagnostics from nanopore DNA sequencers on Google Cloud - A scalable, reliable, and cost-effective end-to-end pipeline for fast DNA sequence analysis built on Google Cloud and this new class of nanopore DNA sequencers.

Big Data Cloud Security Command Center Security Feb. 25, 2019

Google Cloud Platform Security Operations Center Data Lake - Some thoughts regarding security when building a data lake on Google Cloud Platform.

Big Data Google Cloud Platform Official Blog Jan. 28, 2019

Google is named a leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics - Gartner named Google a Leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics (DMSA).

Big Data Google Compute Engine Jan. 7, 2019

Deploying PySpark ML Model on Google Compute Engine as a REST API - Step-by-step tutorial on deploying a PySpark ML model on Google Compute Engine.

Big Data Nov. 26, 2018

How to capture and store tweets in Real Time with Apache Spark and Apache Kafka. Using cloud Platforms such as Databricks and GCP (Part 1) - Capture and store tweets in Real Time with Apache Spark and Apache Kafka.

Big Data Cloud Datalab Google Cloud Dataflow Python Serverless June 18, 2018

Analyzing Reddit’s Top Posts & Images With Google Cloud (Part 1) - Analyzing everything from Reddit.

Big Data Business May 21, 2018

Cask is joining Google Cloud - Cask is behind CDAP, an open source big data integration platform.

Big Data Cloud Datalab Google Cloud Pub/Sub Google Cloud Storage May 21, 2018

Data Science for Startups: Data Pipelines - Example of creating a data pipeline on Google Cloud Platform.

Apache Beam Big Data May 14, 2018

GCP Podcast - #126 Beam and Spark with Holden Karau

Big Data March 26, 2018

Public datasets: how nonprofits can drive social impact with planetary-scale data - Public datasets are freely hosted and accessible via Google BigQuery and Cloud Storage.

Big Data Business March 26, 2018

Room to Grow on the Big Data Maturity Curve - Report on Big Data ecosystems.

Big Data Business Official Blog March 19, 2018

Solutions : Build a Marketing Data Warehouse on Google Cloud Platform - Using a fictional online cosmetics retailer as an example of how to leverage Google Cloud products to get key insights.

Big Data Official Blog March 5, 2018

How to handle mutating JSON schemas in a streaming pipeline, with Square Enix - Explore how Square Enix supports handling of mutating JSON schemas in a streaming pipeline.

Big Data Machine Learning TensorFlow Nov. 20, 2017

Automating ML and IoT with cloud-based image rendering, training, and device delivery - Architectural solutions for 3D rendering and machine learning.

Big Data Teradata Nov. 20, 2017

Transitioning from Data Warehousing in Teradata to GCP Big Data - The article describes how you can transition from on-premises and cloud data warehousing to Google Cloud Platform.

Big Data Sept. 11, 2017

Plumbing Big Data Pipelines - Qubit (which provides personalization for companies communicating with customers) describes its experience with different Google Cloud Platform products.

Big Data Google Cloud Dataproc Aug. 20, 2017

Easier integration with Apache Spark and Hadoop via Google Cloud Dataproc Job IDs and Labels - Best practices for using Job IDs and labels.

Big Data Machine Learning July 31, 2017

New hands-on labs for scientific data processing on Google Cloud Platform - 7 new labs to try out Google Cloud Platform Big Data and Machine Learning products to solve real-world scientific problems using a variety of public datasets.

Big Data July 24, 2017

Moving Thumbtack’s data infrastructure to Google Cloud Platform - Moving data from PostgreSQL and MongoDB to Google Cloud Dataproc and BigQuery

Big Data Google Cloud Bigtable July 3, 2017

How Qubit deduplicates streaming data at scale with Google Cloud Platform - How Qubit solved the issue of duplicated streaming data using Google Cloud Platform products.

Big Data July 3, 2017

GCP Podcast - #83 Public Datasets with Mike Hamberg and Will Curran

Big Data Google Cloud Dataflow July 3, 2017

Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines

Big Data BigQuery June 26, 2017

The Google Data WareCity - Interesting and unique aspects of BigQuery’s data sharing capability

Big Data BigQuery June 26, 2017

GCE BigQuery vs AWS Redshift vs AWS Athena - Basic comparison on data loading and simple queries between Google BigQuery and Amazon Redshift and its cousin Athena.

Big Data Google Cloud Dataflow June 19, 2017

Visualization and large-scale processing of historical weather radar (NEXRAD Level II) data - Processing historical weather data for visualization with Cloud Dataflow

Big Data Business May 8, 2017

That giant sucking sound? Hadoop moving into the cloud - Companies are starting to move their Hadoop environments to Google Cloud Platform because of its simplicity, stability, and maturity.

Big Data BigQuery April 10, 2017

BI Performance Benchmarks with Google BigQuery

Big Data Google Cloud Dataflow March 27, 2017

Google Cloud Dataflow In the Smart Home Data Pipeline - Handling data from Nest devices via Google Cloud Dataflow

Big Data March 13, 2017

Visualizing Big Data with Google Cloud

Big Data BigQuery PubSub March 6, 2017

Combining Thomson Reuters data with Google BigQuery and Google Cloud Pub/Sub API - Proof of concept for using BigQuery to analyze data ingested from the Reuters API.

Big Data March 6, 2017

Data Science on the Google Cloud Platform: the first book - Interview with Valliappa Lakshmanan, author of the upcoming book Data Science on the Google Cloud Platform.

Big Data

Building a Data Lake on GCP with CDAP - First look at the open source platform from Google-acquired Cask.

 

Contact

Zdenko Hrček
Třebanická 183
Prague, Czech Republic
Phone: +420 777 283 075
Email: zdenko@gcpweekly.com