How to start with google bigquery
Table of Contents
Here is the list of thing I noticed when started to work with bigquery.
First you need to create a project — https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui.
That’s how you create a service key for bigquery — https://cloud.google.com/docs/authentication/getting-started. Select your project on the very top of the page, and then create JSON key. And don’t forget to set is as a Product Owner for the project (to make sure you don’t have any troubles with permissions).
After that you need to specify path to that file via environment variable GOOGLE_APPLICATION_CREDENTIALS
.
Now you are ready to start the small project. Here is an example on using public data — https://gitlab.com/bullgare/bigquery-covid19.
Queries example
Bigquery is mostly an SQL, but it has some additions. For instance, it can have nested array of structures.
For instance, you could have a list of products inside your order with a scheme like that:
1 2 3 4 5 6 7 8 |
Orders table: order.id order.total order.products.id Products table: product.id product.name |
You can select orders with product ids separated by a semicolon like that:
1 2 3 4 5 6 |
SELECT o.id, o.total, ARRAY_TO_STRING(ARRAY_AGG(ARRAY(SELECT id FROM o.products)), ";", "") AS product_ids FROM [...].orders |
More on that — https://stackoverflow.com/a/40943459/801426
Or you can join product names from Products table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
WITH ( SELECT o.id, o.total, p.name AS product_name FROM [...].orders LEFT JOIN UNNEST(o.products) AS op INNER JOIN [...].products AS p ON p.id = op.id ) AS original SELECT original.id, original.total, ARRAY_TO_STRING(ARRAY_AGG(DISTINCT original.product_name), ";", "") AS product_names |
Here is a list of functions — https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators.
Here is an article on how to work with arrays — https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays.
Working with bigquery in go
You can use module cloud.google.com/go/bigquery
.
Here is how you can work with query parameters — https://cloud.google.com/bigquery/docs/parameterized-queries.
Here is a little example in go:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
q := client.Query(` SELECT o.order_id, o.total, DATETIME(o.order_placed_at) AS order_placed_at, FROM [...].orders AS o WHERE o.order_id IN UNNEST(@order_ids) AND o.order_placed_at BETWEEN TIMESTAMP_SUB(@current_date, INTERVAL @interval_days_from DAY) AND TIMESTAMP_SUB(@current_date, INTERVAL @interval_days_to DAY) ORDER BY o.vendor.vendor_code, order_placed_at LIMIT 2 ;`) q.Parameters = []bigquery.QueryParameter{ { Name: "current_date", Value: time.Now().Add(24 * time.Hour).Format("2006-01-02"), }, { Name: "interval_days_from", Value: 3, }, { Name: "interval_days_to", Value: 1, }, { Name: "order_ids", Value: []int64(1,2,3), }, } err := q.Read(ctx) if err != nil { return fmt.Errorf("reading data for query: %w", err) } for { var data [your-model] err := iter.Next(&data) if err == iterator.Done { break } if err != nil { return fmt.Errorf("iterating results: %w", err) } [work with your data] } |
You could do a healthcheck with a query like this:
1 |
SELECT TIMESTAMP_SUB("2020-04-20", INTERVAL 0 DAY); |
Similar Posts
LEAVE A COMMENT
Для отправки комментария вам необходимо авторизоваться.