Getting started
This guide walks you through:
- Creating your first R2 bucket and enabling its data catalog.
- Creating an API token that query engines need to authenticate with your data catalog.
- Using PyIceberg to create your first Iceberg table in a marimo Python notebook.
- Using PyIceberg to load sample data into your table and query it.
 
Prerequisites
- Sign up for a Cloudflare account.
- Install Node.js.

Node.js version manager
Use a Node version manager like Volta or nvm to avoid permission issues and to switch between Node.js versions. Wrangler, discussed later in this guide, requires Node.js 16.17.0 or later.
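If you are unsure which Node.js version you have, the following Python sketch compares the output of node --version against the 16.17.0 minimum stated above. It assumes node is on your PATH and prints a version string of the usual vMAJOR.MINOR.PATCH form; it is a convenience check of our own, not part of Wrangler.

```python
import re
import shutil
import subprocess

# Minimum Node.js version that this guide says Wrangler requires.
MIN_NODE_VERSION = (16, 17, 0)


def parse_node_version(text: str) -> tuple[int, int, int]:
    """Turn output such as 'v18.19.0' into the tuple (18, 19, 0)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"Unrecognized Node.js version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def node_is_new_enough(text: str, minimum: tuple[int, int, int] = MIN_NODE_VERSION) -> bool:
    # Tuples compare element by element, so (18, 0, 0) >= (16, 17, 0) is True.
    return parse_node_version(text) >= minimum


if __name__ == "__main__":
    if shutil.which("node") is None:
        print("Node.js was not found on your PATH.")
    else:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
        version = result.stdout.strip()
        verdict = "OK for Wrangler" if node_is_new_enough(version) else "too old for Wrangler"
        print(f"Node.js {version}: {verdict}")
```

Comparing tuples of integers avoids the classic string-comparison trap where "9.0.0" sorts after "16.17.0".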
1. Create an R2 bucket

Using Wrangler:
- If you are not already logged in, run:
  npx wrangler login
- Create an R2 bucket:
  npx wrangler r2 bucket create r2-data-catalog-tutorial

Or, using the dashboard:
- In the Cloudflare dashboard, go to the R2 object storage page.
- Select Create bucket.
- Enter the bucket name: r2-data-catalog-tutorial
- Select Create bucket.
 
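If you prefer to script the Wrangler step, a minimal Python sketch can wrap the same bucket-create command with subprocess. The helper names create_bucket_command and create_bucket are hypothetical, and the default is a dry run that only prints the command, so nothing touches your account until you pass dry_run=False.

```python
import shlex
import subprocess

BUCKET_NAME = "r2-data-catalog-tutorial"


def create_bucket_command(name: str) -> list[str]:
    """Build the same Wrangler command shown above as an argument list."""
    return ["npx", "wrangler", "r2", "bucket", "create", name]


def create_bucket(name: str, dry_run: bool = True) -> list[str]:
    command = create_bucket_command(name)
    if dry_run:
        # Show what would run without contacting Cloudflare.
        print("Would run:", shlex.join(command))
    else:
        # check=True raises subprocess.CalledProcessError if Wrangler exits
        # with a non-zero status, so a failed create does not go unnoticed.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    create_bucket(BUCKET_NAME)  # dry run; pass dry_run=False to create the bucket
```

Passing an argument list instead of a shell string avoids quoting issues. On Windows you may need to use npx.cmd in place of npx when calling it without a shell.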
2. Enable the data catalog for your bucket

Using Wrangler, enable the catalog on your R2 bucket:
  npx wrangler r2 bucket catalog enable r2-data-catalog-tutorial
When you run this command, note the "Warehouse" and "Catalog URI" values. You will need them later.

Or, using the dashboard:
- In the Cloudflare dashboard, go to the R2 object storage page.
- Select the bucket: r2-data-catalog-tutorial.
- Switch to the Settings tab, scroll down to R2 Data Catalog, and select Enable.
- Once the catalog is enabled, note the Catalog URI and Warehouse name.
 
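Before pasting the Warehouse and Catalog URI into the notebook, a quick local check can catch a leftover placeholder or a truncated copy. The sketch below is a hypothetical helper, check_catalog_settings; it assumes the Catalog URI is a full https:// URL, only checks the warehouse name for emptiness and stray whitespace, and does not contact Cloudflare.

```python
from urllib.parse import urlparse


def check_catalog_settings(catalog_uri: str, warehouse: str) -> list[str]:
    """Return human-readable problems with the noted values (empty list if none)."""
    problems = []
    if "<" in catalog_uri or "<" in warehouse:
        problems.append("A <PLACEHOLDER> value has not been replaced yet")
    parsed = urlparse(catalog_uri.strip())
    # Assumption: the Catalog URI is an HTTPS endpoint with a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("Catalog URI should be a full https:// URL")
    if not warehouse.strip():
        problems.append("Warehouse name is empty")
    elif warehouse != warehouse.strip():
        problems.append("Warehouse name has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    # Replace these with the values you noted above.
    for problem in check_catalog_settings("<CATALOG_URI>", "<WAREHOUSE>"):
        print("-", problem)
```

An empty list only means the values look well-formed; the real test is whether the notebook connects.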
3. Create an API token

Iceberg clients (including PyIceberg) must authenticate to the catalog with an R2 API token that has both R2 and catalog permissions.
- In the Cloudflare dashboard, go to the R2 object storage page.
- Select Manage API tokens.
- Select Create API token.
- Select the R2 Token text to edit your API token name.
- Under Permissions, choose the Admin Read & Write permission.
- Select Create API Token.
- Note the Token value.
 
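Rather than pasting the token straight into the notebook file, you could read all three values from environment variables. This is a minimal sketch under stated assumptions: the variable names R2_WAREHOUSE, R2_CATALOG_URI, and R2_TOKEN and the helper load_catalog_settings are hypothetical choices, not names Cloudflare or PyIceberg expect.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names: pick whatever names suit your setup.
ENV_NAMES = {
    "warehouse": "R2_WAREHOUSE",
    "catalog_uri": "R2_CATALOG_URI",
    "token": "R2_TOKEN",
}


def load_catalog_settings(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read the Warehouse, Catalog URI, and API token from environment variables."""
    env = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))
    return {key: env[var] for key, var in ENV_NAMES.items()}
```

In the notebook's connection cell you could then call settings = load_catalog_settings() and pass settings["warehouse"], settings["catalog_uri"], and settings["token"] to RestCatalog as warehouse, uri, and token. Keeping the token out of the .py file makes it safer to commit the notebook to version control.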
4. Install uv

You need a Python package manager; this guide uses uv. If you do not already have uv installed, follow the installing uv guide.
5. Install marimo and set up your project

This guide uses marimo as the Python notebook.
- Create a directory to store the notebook:
  mkdir r2-data-catalog-notebook
- Change into the new directory:
  cd r2-data-catalog-notebook
- Initialize a new uv project (this creates a pyproject.toml; uv creates the project's .venv virtual environment when you first add dependencies):
  uv init
- Add marimo and the required dependencies:
  uv add marimo pyiceberg pyarrow pandas
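To confirm the dependencies landed in the project environment, a small standard-library sketch can look each package up with importlib.util.find_spec. The helper missing_modules is hypothetical; it assumes the import names match the package names added above, which holds for these four.

```python
import importlib.util

# Import names of the packages installed with uv add above.
REQUIRED_MODULES = ("marimo", "pyiceberg", "pyarrow", "pandas")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All tutorial dependencies are installed.")
```

Saved as, say, check_deps.py (a hypothetical file name) in the project directory, run it with uv run python check_deps.py so that it uses the project's virtual environment rather than your system Python.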
6. Create a Python notebook to interact with the data warehouse

- Create a file called r2-data-catalog-tutorial.py.
- Paste the following code into your r2-data-catalog-tutorial.py file:

import marimo

__generated_with = "0.11.31"
app = marimo.App(width="medium")


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _():
    import pandas
    import pyarrow as pa
    import pyarrow.compute as pc
    import pyarrow.parquet as pq

    from pyiceberg.catalog.rest import RestCatalog

    # Define catalog connection details (replace variables)
    WAREHOUSE = "<WAREHOUSE>"
    TOKEN = "<TOKEN>"
    CATALOG_URI = "<CATALOG_URI>"

    # Connect to R2 Data Catalog
    catalog = RestCatalog(
        name="my_catalog",
        warehouse=WAREHOUSE,
        uri=CATALOG_URI,
        token=TOKEN,
    )
    return (
        CATALOG_URI,
        RestCatalog,
        TOKEN,
        WAREHOUSE,
        catalog,
        pa,
        pandas,
        pc,
        pq,
    )


@app.cell
def _(catalog):
    # Create default namespace if needed
    catalog.create_namespace_if_not_exists("default")
    return


@app.cell
def _(pa):
    # Create simple PyArrow table
    df = pa.table({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "score": [80.0, 92.5, 88.0],
    })
    return (df,)


@app.cell
def _(catalog, df):
    # Create or load Iceberg table
    test_table = ("default", "people")
    if not catalog.table_exists(test_table):
        print(f"Creating table: {test_table}")
        table = catalog.create_table(
            test_table,
            schema=df.schema,
        )
    else:
        table = catalog.load_table(test_table)
    return table, test_table


@app.cell
def _(df, table):
    # Append data
    table.append(df)
    return


@app.cell
def _(table):
    print("Table contents:")
    scanned = table.scan().to_arrow()
    print(scanned.to_pandas())
    return (scanned,)


@app.cell
def _():
    # Optional cleanup: to run it, uncomment the lines below and run this cell.
    # print(f"Deleting table: {test_table}")
    # catalog.drop_table(test_table)
    # print("Table dropped.")
    return


if __name__ == "__main__":
    app.run()

- Replace the CATALOG_URI, WAREHOUSE, and TOKEN placeholder values with the Catalog URI and Warehouse you noted when enabling the data catalog, and the token value you noted when creating the API token.
- Launch the notebook editor in your browser:
  uv run marimo edit r2-data-catalog-tutorial.py
Once the notebook connects to the catalog, the catalog and its namespaces and tables appear in marimo's Datasources panel.
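The notebook reads the whole table with table.scan(). PyIceberg's Table.scan also accepts a row_filter (a string or an expression such as pyiceberg.expressions.GreaterThanOrEqual) and selected_fields, so you can push a filter and a column list into the scan. The function high_scorers below is a hypothetical sketch built on those two parameters; it takes the loaded table as an argument and imports nothing itself.

```python
def high_scorers(table, min_score: float = 85.0):
    """Return id, name, and score for rows with score >= min_score as a pandas DataFrame."""
    scan = table.scan(
        # PyIceberg parses a string filter into an Iceberg expression.
        row_filter=f"score >= {min_score}",
        # Read only the columns you need.
        selected_fields=("id", "name", "score"),
    )
    return scan.to_pandas()
```

To try it, paste the function and a line such as print(high_scorers(table)) into a new cell; marimo passes table in from the cell that loads it. With the sample data, the filter keeps Bob (92.5) and Charlie (88.0). Because the append cell adds the same three rows each time it runs, expect each name once per run.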
 
In the Python notebook above, you:
- Connected to your catalog.
- Created the default namespace.
- Created a simple PyArrow table.
- Created (or loaded) the people table in the default namespace.
- Appended sample data to the table.
- Printed the contents of the table.
- (Optional) Dropped the people table created for this tutorial.
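As an alternative to uncommenting the cleanup cell, a guarded helper avoids an error when the table is already gone. This is a hypothetical sketch, drop_tutorial_objects, that relies on the PyIceberg catalog methods table_exists, drop_table, list_tables, and drop_namespace; it drops the namespace only when asked and only once the namespace holds no tables.

```python
def drop_tutorial_objects(catalog, namespace="default", table_name="people", drop_namespace=False):
    """Drop the tutorial table (and optionally its namespace); return what was dropped."""
    dropped = []
    identifier = (namespace, table_name)
    # Skip the drop if the table is missing, so reruns do not raise.
    if catalog.table_exists(identifier):
        catalog.drop_table(identifier)
        dropped.append(f"{namespace}.{table_name}")
    # Remove the namespace only on request, and only once it holds no tables.
    if drop_namespace and not catalog.list_tables(namespace):
        catalog.drop_namespace(namespace)
        dropped.append(namespace)
    return dropped
```

In the notebook you could call drop_tutorial_objects(catalog) from a new cell. Leave drop_namespace at False if other tables share the default namespace.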
© 2025 Cloudflare, Inc.