How to start with google bigquery
Table of Contents
Here is the list of thing I noticed when started to work with bigquery.
First you need to create a project — https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui.
That’s how you create a service key for bigquery — https://cloud.google.com/docs/authentication/getting-started. Select your project on the very top of the page, and then create JSON key. And don’t forget to set is as a Product Owner for the project (to make sure you don’t have any troubles with permissions).
After that you need to specify path to that file via environment variable GOOGLE_APPLICATION_CREDENTIALS.
Now you are ready to start the small project. Here is an example on using public data — https://gitlab.com/bullgare/bigquery-covid19.
Queries example
Bigquery is mostly an SQL, but it has some additions. For instance, it can have nested array of structures.
For instance, you could have a list of products inside your order with a scheme like that:
Orders table: order.id order.total order.products.id Products table: product.id product.name
You can select orders with product ids separated by a semicolon like that:
SELECT o.id, o.total, ARRAY_TO_STRING(ARRAY_AGG(ARRAY(SELECT id FROM o.products)), ";", "") AS product_ids FROM [...].orders
More on that — https://stackoverflow.com/a/40943459/801426
Or you can join product names from Products table:
WITH ( SELECT o.id, o.total, p.name AS product_name FROM [...].orders LEFT JOIN UNNEST(o.products) AS op INNER JOIN [...].products AS p ON p.id = op.id ) AS original SELECT original.id, original.total, ARRAY_TO_STRING(ARRAY_AGG(DISTINCT original.product_name), ";", "") AS product_names
Here is a list of functions — https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators.
Here is an article on how to work with arrays — https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays.
Working with bigquery in go
You can use module cloud.google.com/go/bigquery.
Here is how you can work with query parameters — https://cloud.google.com/bigquery/docs/parameterized-queries.
Here is a little example in go:
q := client.Query(`
SELECT
o.order_id,
o.total,
DATETIME(o.order_placed_at) AS order_placed_at,
FROM [...].orders AS o
WHERE
o.order_id IN UNNEST(@order_ids)
AND
o.order_placed_at BETWEEN TIMESTAMP_SUB(@current_date, INTERVAL @interval_days_from DAY) AND TIMESTAMP_SUB(@current_date, INTERVAL @interval_days_to DAY)
ORDER BY o.vendor.vendor_code, order_placed_at
LIMIT 2
;`)
q.Parameters = []bigquery.QueryParameter{
{
Name: "current_date",
Value: time.Now().Add(24 * time.Hour).Format("2006-01-02"),
},
{
Name: "interval_days_from",
Value: 3,
},
{
Name: "interval_days_to",
Value: 1,
},
{
Name: "order_ids",
Value: []int64(1,2,3),
},
}
err := q.Read(ctx)
if err != nil {
return fmt.Errorf("reading data for query: %w", err)
}
for {
var data [your-model]
err := iter.Next(&data)
if err == iterator.Done {
break
}
if err != nil {
return fmt.Errorf("iterating results: %w", err)
}
[work with your data]
}
You could do a healthcheck with a query like this:
SELECT TIMESTAMP_SUB("2020-04-20", INTERVAL 0 DAY);
LEAVE A COMMENT
Для отправки комментария вам необходимо авторизоваться.