# bqdf


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Usage

### Installation

Install latest from the GitHub
[repository](https://github.com/motdam/bqdf):

``` sh
$ pip install git+https://github.com/motdam/bqdf.git
```

or from [conda](https://anaconda.org/motdam/bqdf)

``` sh
$ conda install -c motdam bqdf
```

or from [pypi](https://pypi.org/project/bqdf/)

``` sh
$ pip install bqdf
```

### Documentation

Documentation can be found hosted on this GitHub
[repository](https://github.com/motdam/bqdf)’s
[pages](https://motdam.github.io/bqdf/). Additionally you can find
package manager specific guidelines on
[conda](https://anaconda.org/motdam/bqdf) and
[pypi](https://pypi.org/project/bqdf/) respectively.

## How to use

This lib provides convenience functions for streamlining the interface
of the pandas-gbq library to perform CRUD operations in BigQuery more
quickly

``` python
import pandas_gbq
import pandas as pd
```

``` python
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms` 
WHERE country_name = 'United Kingdom'
  and refresh_date = current_date - 1
  and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""
```

### Reading a BigQuery table

``` python
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 5 rows × 7 cols (0.0000 GB) from query in 0.91s

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

<table class="dataframe" data-quarto-postprocess="true" data-border="1">
<thead>
<tr style="text-align: right;">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">refresh_date</th>
<th data-quarto-table-cell-role="th">rank</th>
<th data-quarto-table-cell-role="th">term</th>
<th data-quarto-table-cell-role="th">score</th>
<th data-quarto-table-cell-role="th">percent_gain</th>
<th data-quarto-table-cell-role="th">country_name</th>
<th data-quarto-table-cell-role="th">week</th>
</tr>
</thead>
<tbody>
<tr>
<td data-quarto-table-cell-role="th">0</td>
<td>2025-11-30</td>
<td>1</td>
<td>man city vs leeds united</td>
<td>100</td>
<td>63.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">1</td>
<td>2025-11-30</td>
<td>2</td>
<td>everton vs newcastle</td>
<td>100</td>
<td>49.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">2</td>
<td>2025-11-30</td>
<td>3</td>
<td>moderate rainfall warning</td>
<td>60</td>
<td>48.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">3</td>
<td>2025-11-30</td>
<td>4</td>
<td>tottenham vs fulham</td>
<td>100</td>
<td>44.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">4</td>
<td>2025-11-30</td>
<td>5</td>
<td>tom stoppard</td>
<td>100</td>
<td>26.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
</tbody>
</table>

</div>

To recreate the above with the original library you would need the below
boiler plate to inspect the results and convert columns into pandas
friendly dtypes.

``` python
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
    'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 7 columns):
     #   Column        Non-Null Count  Dtype         
    ---  ------        --------------  -----         
     0   refresh_date  5 non-null      datetime64[ns]
     1   rank          5 non-null      Int64         
     2   term          5 non-null      object        
     3   score         5 non-null      Int64         
     4   percent_gain  5 non-null      Float64       
     5   country_name  5 non-null      object        
     6   week          5 non-null      datetime64[ns]
    dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
    memory usage: 427.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

<table class="dataframe" data-quarto-postprocess="true" data-border="1">
<thead>
<tr style="text-align: right;">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">refresh_date</th>
<th data-quarto-table-cell-role="th">rank</th>
<th data-quarto-table-cell-role="th">term</th>
<th data-quarto-table-cell-role="th">score</th>
<th data-quarto-table-cell-role="th">percent_gain</th>
<th data-quarto-table-cell-role="th">country_name</th>
<th data-quarto-table-cell-role="th">week</th>
</tr>
</thead>
<tbody>
<tr>
<td data-quarto-table-cell-role="th">0</td>
<td>2025-11-30</td>
<td>1</td>
<td>man city vs leeds united</td>
<td>100</td>
<td>63.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">1</td>
<td>2025-11-30</td>
<td>2</td>
<td>everton vs newcastle</td>
<td>100</td>
<td>49.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">2</td>
<td>2025-11-30</td>
<td>3</td>
<td>moderate rainfall warning</td>
<td>60</td>
<td>48.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">3</td>
<td>2025-11-30</td>
<td>4</td>
<td>tottenham vs fulham</td>
<td>100</td>
<td>44.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">4</td>
<td>2025-11-30</td>
<td>5</td>
<td>tom stoppard</td>
<td>100</td>
<td>26.5</td>
<td>United Kingdom</td>
<td>2025-11-23</td>
</tr>
</tbody>
</table>

</div>

### Writing a df to BigQuery

The rest [`to`](https://motdam.github.io/bqdf/core.html#to) function is
unchanged beyond removing the redundant \_gbq suffix. We can write our
df back into BigQuery using hte
[`to`](https://motdam.github.io/bqdf/core.html#to) function.

``` python
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')
```

      0%|          | 0/1 [00:00<?, ?it/s]100%|██████████| 1/1 [00:00<00:00, 6721.64it/s]

    Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 4.83s

### Executing SQL in BigQuery

The [`ex`](https://motdam.github.io/bqdf/core.html#ex) fucntion enables
non df based CRUD operations within the same api which can be useful for
creating feature processing pipelines.

``` python
table_id = """bq-sandbox-motdam.temporary.top_terms_today"""
if exists(table_id):
    ex(f"drop table {table_id};")

ex(f"""
CREATE OR REPLACE TABLE {table_id} AS
SELECT region_name, term, COUNT(*) as appearances
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
  AND refresh_date = CURRENT_DATE() - 1
GROUP BY region_name, term
""", project_id="bq-sandbox-motdam")
exists(table_id)
;
```

    Table bq-sandbox-motdam.temporary.top_terms_today exists
    Executed 1 queries, 1 drops: Processed 0.0000 GB, 0 rows affected in 0.73s
    Executed 1 creates: Processed 0.3265 GB, 0 rows affected in 2.26s
    Table bq-sandbox-motdam.temporary.top_terms_today exists

    ''

You can then check whether your operations worked with
[`exists`](https://motdam.github.io/bqdf/core.html#exists).

### Building and reading tables

The [`build_read`](https://motdam.github.io/bqdf/core.html#build_read)
function combines table creation and reading in one step. It checks if a
table exists, executes a query to create it if needed (or if
`force=True`), then reads and returns the result as a DataFrame. This is
particularly useful for creating complex multi-step pipelines where you
want to materialize intermediate tables and read the final result.

``` python
def create_top_terms(period, days):
    return f"""
    CREATE OR REPLACE TABLE `bq-sandbox-motdam.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1;
    """

periods = [('today', 1), ('week', 8), ('month', 31), ('year', 366)]
query_pipeline = ''.join([create_top_terms(period, days) for period, days in periods])
final_table = 'bq-sandbox-motdam.temporary.top_terms_summary'
query_pipeline += f"""
CREATE OR REPLACE TABLE `{final_table}` AS
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `bq-sandbox-motdam.temporary.top_terms_today` as t
JOIN `bq-sandbox-motdam.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `bq-sandbox-motdam.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name;
"""
if exists(final_table):
    drop(final_table)

build_read(final_table, query_pipeline, project_id="bq-sandbox-motdam", force=True)
```

    Table bq-sandbox-motdam.temporary.top_terms_summary exists
    Executed 1 drops: Processed 0.0000 GB, 0 rows affected in 0.74s
    Table bq-sandbox-motdam.temporary.top_terms_summary not found
    Executed 5 queries, 5 creates: Processed 27.1335 GB, 0 rows affected in 14.30s
    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 4 rows × 5 cols (0.0000 GB) from table in 0.61s

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

<table class="dataframe" data-quarto-postprocess="true" data-border="1">
<thead>
<tr style="text-align: right;">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">region_name</th>
<th data-quarto-table-cell-role="th">top_term_today</th>
<th data-quarto-table-cell-role="th">top_term_week</th>
<th data-quarto-table-cell-role="th">top_term_month</th>
<th data-quarto-table-cell-role="th">top_term_year</th>
</tr>
</thead>
<tbody>
<tr>
<td data-quarto-table-cell-role="th">0</td>
<td>England</td>
<td>man city vs leeds united</td>
<td>man united vs everton</td>
<td>moderate rainfall warning</td>
<td>moderate rainfall warning</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">1</td>
<td>Northern Ireland</td>
<td>man city vs leeds united</td>
<td>man united vs everton</td>
<td>moderate rainfall warning</td>
<td>moderate rainfall warning</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">2</td>
<td>Scotland</td>
<td>man city vs leeds united</td>
<td>liverpool vs psv</td>
<td>moderate rainfall warning</td>
<td>moderate rainfall warning</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">3</td>
<td>Wales</td>
<td>man city vs leeds united</td>
<td>man united vs everton</td>
<td>moderate rainfall warning</td>
<td>moderate rainfall warning</td>
</tr>
</tbody>
</table>

</div>

[`drop`](https://motdam.github.io/bqdf/core.html#drop) can also be used
as a shorthand to for dropping a single table.

## Developer Guide

If you are new to using `nbdev` here are some useful pointers to get you
started.

### Install bqdf in Development mode

``` sh
# make sure bqdf package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to bqdf
$ nbdev_prepare
```
