Google's BigQuery and Go

February 15, 2019 5 minutes read

go • Notes

BigQuery • go • Google • Serverless • DataBases • DataPersistence

Notes

BigQuery(BQ) is Google’s serverless, highly scalable, enterprise data warehouse
SQL-like query language
Real time data analysis by “streaming ingestion” capability
Deals with gigabytes to petabytes of data volume
Offers data encryption and data access permissions via Cloud Identity and Access Management
Machine Learning and Bussiness Intel can be easily integrated with BQ
Support for Geospatial data

Interacting with BQ

To interact with the BQ we can use:

A CLI tool
The GCP BQ UI (New and Old Web UI)
REST API which also allows the use of several client libraries

First things first

To be able to use BQ, it must be enabled for a project. All new projects will have it enabled by default, but for old projects, it must be manually enabled.

On GCP Concepts

Projects

Any GCP resources that you allocate and use must belong to a project. In order to interact with most GCP resources, you must provide the identifying project information for every request. You can identify a project in either of two ways: a project ID, or a project number
Google Cloud Platform offers Cloud IAM, which lets you assign granular access to specific Google Cloud Platform resources and prevents unwanted access to other resources. Cloud IAM lets you control who (users) has what access (roles) to which resources by setting Cloud IAM policies on the resources.
The hierarchy of resources is as follows,

On possible queries

BQ lets you perform,

Run interactive queries
Run batch queries
Create a view, which is a virtual table defined by a SQL query. Views can be created by,
- Calling the tables.insert API method
- Using the GCP Console or classic BigQuery web UI
- Submitting a CREATE VIEW Data Definition Language (DDL) statement
Use partitioned tables to query a subset of your data,
- A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data.
In addition to querying and viewing data, you can manage data in BigQuery in the following ways:
- Listing projects, jobs, datasets, and tables
- Getting information about jobs, datasets, and tables
- Defining, updating, or patching datasets and tables
- Deleting datasets and tables
- Managing table partitions

BigQuery jobs

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.
Because jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status.

Datasets

A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery.
Datasets are strongly linked to their location (an analogous concept to AWS regions)
Allow controlled access.

Useful snippets

Table schemas

BigQuery allows you to specify a table’s schema when you load data into a table, and when you create an empty table.
To specify the schema one can use,
- The UI
- CLI
- Create a schema file in JSON format.
A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 128 characters. Can not use prefixes like _TABLE_,_FILE_ or _PARTITION.
Column names are case insensitive
Columns use standard SQL types
Columns accept NULLable modes (allow NULL entries). This is enabled by default
Here’s a link in how to send a table schema to the BQ
BigQuery performs best when your data is denormalized, so it is better to denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema. For example, a relational database used to track library books would likely keep all author information in a separate table. A key such as author_id would be used to link the book to the authors. In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author’s first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.
Here’s an example to have NESTED data in schema https://cloud.google.com/bigquery/docs/nested-repeated
Schemas can’t be changed, but can be improved by adding columns

BigQuery tables

A BigQuery table contains individual records organized in rows. Each record is composed of columns (also called fields).
Every table is defined by a schema that describes the column names, data types, and other information.
Table names must be unique for a dataset
Tables can be copied only to the same location (region, like in aws)
You can create a table in BigQuery go snippet to table creation:
- Manually using the GCP Console, the classic BigQuery web UI, or the command-line tool’s bq mk command
- Programmatically by calling the tables.insert API method
- From query results
- By defining a table that references an external data source
- When you load data
Table names can have up to 1024 characters

Partitioned tables

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
Partitioned tables are ordered by data load date or data date of arrival

Views

A view is a virtual table defined by a SQL query
You can query and create views in BigQuery by using either of,
- Query editor box in the GCP Console
- Compose Query option in the classic BigQuery web UI
- BigQuery command-line tool’s bq query command
- BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
- BigQuery client libraries
To create a view take into account the snippet https://cloud.google.com/bigquery/docs/views
Views must be created before being viewed
Caution viewing views is not the same as viewing tables, views are virtual, tables not