# core This module provides an ergonomic wrapper around `pandas_gbq` to simplify working with BigQuery in pandas. The main functions are: - [`read()`](https://motdam.github.io/bqdf/core.html#read) - Load data from BigQuery queries or tables into DataFrames with automatic type conversion - [`to()`](https://motdam.github.io/bqdf/core.html#to) - Write DataFrames to BigQuery tables - [`ex()`](https://motdam.github.io/bqdf/core.html#ex) - Execute queries without returning results (useful for DDL/DML) - [`exists()`](https://motdam.github.io/bqdf/core.html#exists) - Check if a table exists - `ensure()` - Read a table if it exists, otherwise create it from a query first All functions include verbose timing and size reporting by default. **Imports** ``` python from nbdev.showdoc import * ``` **Credentials helper** - Get credentials from environment variable for CI/CD, or fall back to ADC ------------------------------------------------------------------------ source ### get_creds > get_creds () *Get credentials and project_id from environment variable or ADC* **Core read function** - Wraps `pandas_gbq.read_gbq` with automatic type conversion and timing ------------------------------------------------------------------------ source ### read > read (query_or_table:str, verbose:bool=True, convert_dtypes:bool=True, > date_cols:list=None, str_cols:list=['scv_id'], > use_bqstorage_api:bool=True, credentials=None, project_id=None, > **kwargs) *Load data from BigQuery query or table into DataFrame*

	Type	Default	Details
query_or_table	str		BigQuery SQL query or table reference
verbose	bool	True	Print timing and size info
convert_dtypes	bool	True	Convert to pandas nullable types
date_cols	list	None	Additional columns to convert to datetime
str_cols	list	[‘scv_id’]	Columns to keep as string/object type
use_bqstorage_api	bool	True	Use Storage API for faster downloads
credentials	NoneType	None	Google credentials (defaults to ADC)
project_id	NoneType	None	GCP project ID (defaults to credential’s project)
kwargs	VAR_KEYWORD

**Type conversion helper** - Converts BigQuery types to appropriate pandas nullable types ------------------------------------------------------------------------ source ### convert_bq_dtypes > convert_bq_dtypes (df:pandas.core.frame.DataFrame, date_cols:list=None, > str_cols:list=None) *Convert BigQuery data types to pandas-compatible types*

	Type	Default	Details
df	DataFrame		DataFrame to convert
date_cols	list	None	List of columns to convert to datetime
str_cols	list	None	List of columns to keep as string/object type

**Write function** - Wraps `pandas_gbq.to_gbq` with timing info ------------------------------------------------------------------------ source ### to > to (df:pandas.core.frame.DataFrame, destination_table:str, > verbose:bool=True, credentials=None, project_id=None, **kwargs) *Write DataFrame to BigQuery table* **Memory helper** - Calculate DataFrame size in GB **Execute function** - Run queries without returning results (DDL/DML operations) ------------------------------------------------------------------------ source ### ex > ex (query:str, project_id:str=None, verbose:bool=True, credentials=None, > **kwargs) *Execute query in BigQuery without returning results* ------------------------------------------------------------------------ source ### parse_query > parse_query (q:str) *Parse query to count different operation types*

	Type	Details
q	str	Query string

``` python def create_top_terms(period, days): return f""" CREATE OR REPLACE TABLE `bq-sandbox-motdam.temporary.top_terms_{period}` AS WITH ranked AS ( SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank, ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn FROM `bigquery-public-data.google_trends.international_top_rising_terms` WHERE country_name = 'United Kingdom' AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland') AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE() AND rank <= 100 GROUP BY region_name, term ) SELECT region_name, term as top_term_{period} FROM ranked WHERE rn = 1; """ periods = [('today', 1), ('week', 8), ('month', 31), ('year', 366)] query_pipeline = ''.join([create_top_terms(period, days) for period, days in periods]) final_table = 'bq-sandbox-motdam.temporary.top_terms_summary' query_pipeline += f""" CREATE OR REPLACE TABLE `{final_table}` AS SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year FROM `bq-sandbox-motdam.temporary.top_terms_today` as t JOIN `bq-sandbox-motdam.temporary.top_terms_week` as w ON t.region_name = w.region_name JOIN `bq-sandbox-motdam.temporary.top_terms_month` as m ON t.region_name = m.region_name JOIN `bq-sandbox-motdam.temporary.top_terms_year` as y ON t.region_name = y.region_name ORDER BY t.region_name; """ parse_query(query_pipeline) ``` {'queries': 5, 'creates': 5, 'inserts': 0, 'deletes': 0, 'drops': 0} **Table helper class** - Parse and represent BigQuery table references ------------------------------------------------------------------------ source ### Table > Table (project:str, dataset:str, name:str) ``` python t = Table.from_id(f'bq-sandbox-motdam.temporary.top_terms_summary') t ``` Table(project='bq-sandbox-motdam', dataset='temporary', name='top_terms_summary') ``` python t.with_(dataset='dev', project="dev") ``` Table(project='dev', dataset='dev', name='top_terms_summary') ``` python t ``` Table(project='bq-sandbox-motdam', dataset='temporary', name='top_terms_summary') **Existence checker** - Check if a BigQuery table exists ------------------------------------------------------------------------ source ### exists > exists (table_id:str, project_id:str=None, credentials=None, > verbose:bool=True, **kwargs) *Check if a BigQuery table exists* **Build read function** - Read table if exists, otherwise execute query then read. Can be used to execute feature pipeline before executing query ------------------------------------------------------------------------ source ### build_read > build_read (table_id:str, query:str, project_id:str=None, > force:bool=False, verbose:bool=True, credentials=None, > **kwargs) *Read table if exists, otherwise execute query then read* **Drop function** - makes it easier to drop tables if needed ------------------------------------------------------------------------ source ### drop > drop (table_id:str, project_id:str=None, credentials=None, > verbose:bool=True, **kwargs) *Drop a BigQuery table*