import pandas_gbq
import pandas as pdbqdf
Usage
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/motdam/bqdf.gitor from conda
$ conda install -c motdam bqdfor from pypi
$ pip install bqdfDocumentation
Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.
How to use
This lib provides convenience functions for streamlining the interface of the pandas-gbq library to perform CRUD operations in BigQuery more quickly
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
and refresh_date = current_date - 1
and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""Reading a BigQuery table
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()Downloading: 0%| |Downloading: 100%|██████████|
Loaded 5 rows × 7 cols (0.0000 GB) from query in 0.91s
| refresh_date | rank | term | score | percent_gain | country_name | week | |
|---|---|---|---|---|---|---|---|
| 0 | 2025-11-30 | 1 | man city vs leeds united | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-30 | 2 | everton vs newcastle | 100 | 49.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-30 | 3 | moderate rainfall warning | 60 | 48.5 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-30 | 4 | tottenham vs fulham | 100 | 44.5 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-30 | 5 | tom stoppard | 100 | 26.5 | United Kingdom | 2025-11-23 |
To recreate the above with the original library you would need the below boiler plate to inspect the results and convert columns into pandas friendly dtypes.
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()Downloading: 0%| |Downloading: 100%|██████████|
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 refresh_date 5 non-null datetime64[ns]
1 rank 5 non-null Int64
2 term 5 non-null object
3 score 5 non-null Int64
4 percent_gain 5 non-null Float64
5 country_name 5 non-null object
6 week 5 non-null datetime64[ns]
dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
memory usage: 427.0+ bytes
None
| refresh_date | rank | term | score | percent_gain | country_name | week | |
|---|---|---|---|---|---|---|---|
| 0 | 2025-11-30 | 1 | man city vs leeds united | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-30 | 2 | everton vs newcastle | 100 | 49.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-30 | 3 | moderate rainfall warning | 60 | 48.5 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-30 | 4 | tottenham vs fulham | 100 | 44.5 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-30 | 5 | tom stoppard | 100 | 26.5 | United Kingdom | 2025-11-23 |
Writing a df to BigQuery
The rest to function is unchanged beyond removing the redundant _gbq suffix. We can write our df back into BigQuery using hte to function.
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace') 0%| | 0/1 [00:00<?, ?it/s]100%|██████████| 1/1 [00:00<00:00, 6721.64it/s]
Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 4.83s
Executing SQL in BigQuery
The ex fucntion enables non df based CRUD operations within the same api which can be useful for creating feature processing pipelines.
table_id = """bq-sandbox-motdam.temporary.top_terms_today"""
if exists(table_id):
ex(f"drop table {table_id};")
ex(f"""
CREATE OR REPLACE TABLE {table_id} AS
SELECT region_name, term, COUNT(*) as appearances
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
AND refresh_date = CURRENT_DATE() - 1
GROUP BY region_name, term
""", project_id="bq-sandbox-motdam")
exists(table_id)
;Table bq-sandbox-motdam.temporary.top_terms_today exists
Executed 1 queries, 1 drops: Processed 0.0000 GB, 0 rows affected in 0.73s
Executed 1 creates: Processed 0.3265 GB, 0 rows affected in 2.26s
Table bq-sandbox-motdam.temporary.top_terms_today exists
''
You can then check whether your operations worked with exists.
Building and reading tables
The build_read function combines table creation and reading in one step. It checks if a table exists, executes a query to create it if needed (or if force=True), then reads and returns the result as a DataFrame. This is particularly useful for creating complex multi-step pipelines where you want to materialize intermediate tables and read the final result.
def create_top_terms(period, days):
return f"""
CREATE OR REPLACE TABLE `bq-sandbox-motdam.temporary.top_terms_{period}` AS
WITH ranked AS (
SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
AND rank <= 100
GROUP BY region_name, term
)
SELECT region_name, term as top_term_{period}
FROM ranked WHERE rn = 1;
"""
periods = [('today', 1), ('week', 8), ('month', 31), ('year', 366)]
query_pipeline = ''.join([create_top_terms(period, days) for period, days in periods])
final_table = 'bq-sandbox-motdam.temporary.top_terms_summary'
query_pipeline += f"""
CREATE OR REPLACE TABLE `{final_table}` AS
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `bq-sandbox-motdam.temporary.top_terms_today` as t
JOIN `bq-sandbox-motdam.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name;
"""
if exists(final_table):
drop(final_table)
build_read(final_table, query_pipeline, project_id="bq-sandbox-motdam", force=True)Table bq-sandbox-motdam.temporary.top_terms_summary exists
Executed 1 drops: Processed 0.0000 GB, 0 rows affected in 0.74s
Table bq-sandbox-motdam.temporary.top_terms_summary not found
Executed 5 queries, 5 creates: Processed 27.1335 GB, 0 rows affected in 14.30s
Downloading: 0%| |Downloading: 100%|██████████|
Loaded 4 rows × 5 cols (0.0000 GB) from table in 0.61s
| region_name | top_term_today | top_term_week | top_term_month | top_term_year | |
|---|---|---|---|---|---|
| 0 | England | man city vs leeds united | man united vs everton | moderate rainfall warning | moderate rainfall warning |
| 1 | Northern Ireland | man city vs leeds united | man united vs everton | moderate rainfall warning | moderate rainfall warning |
| 2 | Scotland | man city vs leeds united | liverpool vs psv | moderate rainfall warning | moderate rainfall warning |
| 3 | Wales | man city vs leeds united | man united vs everton | moderate rainfall warning | moderate rainfall warning |
drop can also be used as a shorthand to for dropping a single table.
Developer Guide
If you are new to using nbdev here are some useful pointers to get you started.
Install bqdf in Development mode
# make sure bqdf package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to bqdf
$ nbdev_prepare