# core


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

This module provides an ergonomic wrapper around `pandas_gbq` to
simplify working with BigQuery in pandas. The main functions are:

- [`read()`](https://motdam.github.io/bqdf/core.html#read) - Load data
  from BigQuery queries or tables into DataFrames with automatic type
  conversion
- [`to()`](https://motdam.github.io/bqdf/core.html#to) - Write
  DataFrames to BigQuery tables
- [`ex()`](https://motdam.github.io/bqdf/core.html#ex) - Execute queries
  without returning results (useful for DDL/DML)
- [`exists()`](https://motdam.github.io/bqdf/core.html#exists) - Check
  if a table exists
- `ensure()` - Read a table if it exists, otherwise create it from a
  query first

All functions include verbose timing and size reporting by default.

**Imports**

``` python
from nbdev.showdoc import *
```

**Credentials helper** - Get credentials from environment variable for
CI/CD, or fall back to ADC

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L22"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_creds

>  get_creds ()

*Get credentials and project_id from environment variable or ADC*

**Core read function** - Wraps `pandas_gbq.read_gbq` with automatic type
conversion and timing

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L44"
target="_blank" style="float:right; font-size:smaller">source</a>

### read

>  read (query_or_table:str, verbose:bool=True, convert_dtypes:bool=True,
>            date_cols:list=None, str_cols:list=['scv_id'],
>            use_bqstorage_api:bool=True, credentials=None, project_id=None,
>            **kwargs)

*Load data from BigQuery query or table into DataFrame*

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>query_or_table</td>
<td>str</td>
<td></td>
<td>BigQuery SQL query or table reference</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>True</td>
<td>Print timing and size info</td>
</tr>
<tr>
<td>convert_dtypes</td>
<td>bool</td>
<td>True</td>
<td>Convert to pandas nullable types</td>
</tr>
<tr>
<td>date_cols</td>
<td>list</td>
<td>None</td>
<td>Additional columns to convert to datetime</td>
</tr>
<tr>
<td>str_cols</td>
<td>list</td>
<td>[‘scv_id’]</td>
<td>Columns to keep as string/object type</td>
</tr>
<tr>
<td>use_bqstorage_api</td>
<td>bool</td>
<td>True</td>
<td>Use Storage API for faster downloads</td>
</tr>
<tr>
<td>credentials</td>
<td>NoneType</td>
<td>None</td>
<td>Google credentials (defaults to ADC)</td>
</tr>
<tr>
<td>project_id</td>
<td>NoneType</td>
<td>None</td>
<td>GCP project ID (defaults to credential’s project)</td>
</tr>
<tr>
<td>kwargs</td>
<td>VAR_KEYWORD</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Type conversion helper** - Converts BigQuery types to appropriate
pandas nullable types

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L70"
target="_blank" style="float:right; font-size:smaller">source</a>

### convert_bq_dtypes

>  convert_bq_dtypes (df:pandas.core.frame.DataFrame, date_cols:list=None,
>                         str_cols:list=None)

*Convert BigQuery data types to pandas-compatible types*

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>DataFrame to convert</td>
</tr>
<tr>
<td>date_cols</td>
<td>list</td>
<td>None</td>
<td>List of columns to convert to datetime</td>
</tr>
<tr>
<td>str_cols</td>
<td>list</td>
<td>None</td>
<td>List of columns to keep as string/object type</td>
</tr>
</tbody>
</table>

**Write function** - Wraps `pandas_gbq.to_gbq` with timing info

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L90"
target="_blank" style="float:right; font-size:smaller">source</a>

### to

>  to (df:pandas.core.frame.DataFrame, destination_table:str,
>          verbose:bool=True, credentials=None, project_id=None, **kwargs)

*Write DataFrame to BigQuery table*

**Memory helper** - Calculate DataFrame size in GB

**Execute function** - Run queries without returning results (DDL/DML
operations)

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L114"
target="_blank" style="float:right; font-size:smaller">source</a>

### ex

>  ex (query:str, project_id:str=None, verbose:bool=True, credentials=None,
>          **kwargs)

*Execute query in BigQuery without returning results*

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L133"
target="_blank" style="float:right; font-size:smaller">source</a>

### parse_query

>  parse_query (q:str)

*Parse query to count different operation types*

<table>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>q</td>
<td>str</td>
<td>Query string</td>
</tr>
</tbody>
</table>

``` python
def create_top_terms(period, days):
    return f"""
    CREATE OR REPLACE TABLE `bq-sandbox-motdam.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1;
    """

periods = [('today', 1), ('week', 8), ('month', 31), ('year', 366)]
query_pipeline = ''.join([create_top_terms(period, days) for period, days in periods])
final_table = 'bq-sandbox-motdam.temporary.top_terms_summary'
query_pipeline += f"""
CREATE OR REPLACE TABLE `{final_table}` AS
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `bq-sandbox-motdam.temporary.top_terms_today` as t
JOIN `bq-sandbox-motdam.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name;
"""
parse_query(query_pipeline)
```

    {'queries': 5, 'creates': 5, 'inserts': 0, 'deletes': 0, 'drops': 0}

**Table helper class** - Parse and represent BigQuery table references

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L147"
target="_blank" style="float:right; font-size:smaller">source</a>

### Table

>  Table (project:str, dataset:str, name:str)

``` python
t = Table.from_id(f'bq-sandbox-motdam.temporary.top_terms_summary')
t
```

    Table(project='bq-sandbox-motdam', dataset='temporary', name='top_terms_summary')

``` python
t.with_(dataset='dev', project="dev")
```

    Table(project='dev', dataset='dev', name='top_terms_summary')

``` python
t
```

    Table(project='bq-sandbox-motdam', dataset='temporary', name='top_terms_summary')

**Existence checker** - Check if a BigQuery table exists

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L172"
target="_blank" style="float:right; font-size:smaller">source</a>

### exists

>  exists (table_id:str, project_id:str=None, credentials=None,
>              verbose:bool=True, **kwargs)

*Check if a BigQuery table exists*

**Build read function** - Read table if exists, otherwise execute query
then read. Can be used to execute feature pipeline before executing
query

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L185"
target="_blank" style="float:right; font-size:smaller">source</a>

### build_read

>  build_read (table_id:str, query:str, project_id:str=None,
>                  force:bool=False, verbose:bool=True, credentials=None,
>                  **kwargs)

*Read table if exists, otherwise execute query then read*

**Drop function** - makes it easier to drop tables if needed

------------------------------------------------------------------------

<a href="https://github.com/motdam/bqdf/blob/main/bqdf/core.py#L193"
target="_blank" style="float:right; font-size:smaller">source</a>

### drop

>  drop (table_id:str, project_id:str=None, credentials=None,
>            verbose:bool=True, **kwargs)

*Drop a BigQuery table*
