Automating Excel Workflows in Box Using Python, Box SDK, and OpenPyXL
Automate Excel workflows on Box using Python, Box SDK, and openpyxl to sync data, eliminate manual updates, and streamline reporting.
Join the DZone community and get the full member experience.
Join For FreeIn many organizations, MS Excel remains the go-to tool for storing and sharing structured data, whether it’s tracking project progress, managing audit logs, or maintaining employee or resource details. Yet, a surprisingly common challenge persists: data is still being copied and updated manually.
Teams across different functions, especially management and DevOps, often find themselves entering or syncing data from one source into Excel spreadsheets manually and repeatedly. This not only consumes time but also introduces room for errors and inconsistencies.
For example:
- A manager who regularly fetches data from a project board to Excel to track progress.
- A DevOps engineer who tracks resource utilization across environments.
- An auditor who needs to sync logs from internal tools into an Excel sheet stored in Box for compliance review.
These tasks are ripe for automation.
Box is a popular cloud storage provider used by several organizations to store files related to projects and employees. With the help of modern tools like the Box Python SDK, openpyxl, and pandas, it's possible to read and write Excel files directly in your code, no downloading or manual editing required.
In this blog post, we’ll explore how to:
- Connect to Box using Python
- Read Excel files stored in Box
- Update and append new data to sheets
- Upload the updated file back to Box
This workflow is especially useful for DevOps, SREs, and team leads who need to keep operational data synced across systems, or for managers and analysts looking to automate routine Excel updates.
Setting Up Your Box App
Step 1: Create a Box Developer App
- Go to the Box Developer Console.
- Click Create New App.
- Choose "Custom App" as the app type.
- Give your app a name and description.
- Select "OAuth 2.0 with JWT (Server Authentication)".
This will generate a new application that can interact with your Box account using server-side authentication.
Step 2: Configure App Permissions
Once your app is created:
- Go to your app’s Configuration tab.
- Under Application Scope, enable:
- Read and write all files and folders stored in the Box.
Important: Once configured, you'll need to submit the app for authorisation by your Box Admin (if your account is part of a Box Enterprise). Until it’s approved, API calls will be limited or denied.
Step 3: Generate and Download Configuration File
Back in the Configuration tab:
- Scroll down to the App Credentials section.
- Click Generate a Public/Private Keypair.
- A config JSON file will be downloaded. This contains:
- Client ID and Secret
- JWT Public Key ID
- Private Key (for signing the JWT)
- Passphrase for the private key
- Enterprise ID
Step 4: Share the Box Folder/File With the Box App Automation User
- Go to your app's General settings.
- Under the Service account ID, copy the automation user ID.
- Share the box folder/file with this automation user ID. This will give read and write privileges to the box app.
Keep this file safe and private — it grants full API access.
Installing Required Libraries
Before diving into code, let's install the required Python libraries. These will enable interaction with Box and manipulation of Excel files.
pip install boxsdk openpyxl pandas
Here’s a quick overview of what each library does:
- boxsdk: The official Python SDK for Box’s APIs. It allows you to authenticate, access files, upload/download content, and manage folders using the Box Developer API.
- openpyxl: A powerful library for reading and writing Excel .xlsx files in Python. It lets you work with individual cells, formulas, styles, and sheets.
- pandas: A data analysis library in Python. Useful when you want to process or filter Excel data in a tabular format using DataFrames.
Authenticating With Box in Python
To interact with Box via the SDK, you'll first need to authenticate your application.
def init_box_client():
auth = JWTAuth.from_settings_file(‘<path/to/config.json>’)
client = Client(auth)
return client
Once authenticated, the client can now access any file, folder, or metadata your app has permissions for (Yes, you need to add the app you created as an editor to the file you want to automate).
Reading Excel Files from Box
Once authenticated, you can access Excel files in Box either by their file ID or by searching for their name.
1. Accessing a File by ID
ID is the last part of the URL of a box file.
box_file = client.file(file_id)
2. Downloading to Memory Using BytesIO
You don’t need to write the file to disk. You can load it directly into memory:
import io
from openpyxl import load_workbook
file_stream = io.BytesIO()
box_file.download_to(file_stream)
file_stream.seek(0) # Rewind to beginning
workbook = load_workbook(file_stream)
3. Accessing a Specific Sheet
sheet = workbook["Sheet1"]
Or dynamically get the first sheet:
sheet = workbook.active
4. Reading into a Pandas DataFrame (Optional)
If you'd prefer to use pandas for data analysis:
import pandas as pd
file_stream.seek(0) # Ensure pointer is at start
df = pd.read_excel(file_stream, sheet_name="Sheet1")
Overall, the read Excel function would look like this:
def load_excelsheet_from_box(client, id, sheet_name):
box_file_ref = client.file(id)
box_file_metadata = box_file_ref.get()
file_stream = io.BytesIO()
box_file_ref.download_to(file_stream)
file_stream.seek(0)
workbook = load_workbook(file_stream)
sheet = workbook[sheet_name]
return [box_file_ref, box_file_metadata, workbook, sheet]
Working With Excel Data
With your Excel sheet loaded, you can now read and manipulate rows using openpyxl or pandas.
1. Iterating Over Rows With openpyxl
for row in sheet.iter_rows(min_row=2, values_only=True):
print(row)
This will print data from the second row onward (assuming row 1 is headers).
2. Accessing Headers and Filtering
You can extract headers using:
headers = [cell.value for cell in sheet[1]]
You might use this to filter or map column positions.
3. Using pandas for Heavy Processing
If you want to filter, pivot, or merge data, load it into a DataFrame:
df = pd.read_excel(file_stream)
filtered_df = df[df["Status"] == "Active"]
This approach is powerful when working with large Excel sheets or when doing analysis.
Saving Changes Back to Box
After modifying the Excel workbook, the next step is to save the updated file and upload it back to Box.
1. Saving to Memory
We use BytesIO to avoid writing to disk:
updated_stream = io.BytesIO()
workbook.save(updated_stream)
updated_stream.seek(0)
2. Uploading to Box
Box allows you to replace the contents of a file using its ID:
client.file(file_id).update_contents_with_stream(updated_stream)
This creates a new version of the file in Box, preserving version history and allowing collaboration without risk of losing older data.
End-to-End Example: Sync Missing Names
Here’s a complete example that demonstrates reading from Box, checking for missing names, appending new entries, and uploading the updated file back to Box.
Use Case
Let’s say you want to track details of all employees who are onboarding to your platform.
from boxsdk import Client, JWTAuth
from openpyxl import load_workbook
import io
# Step 1: Authenticate
auth = JWTAuth.from_settings_file("box_config.json")
client = Client(auth)
# Step 2: Download the Excel file from Box
file_id = "1234567890"
box_file = client.file(file_id)
file_stream = io.BytesIO()
box_file.download_to(file_stream)
file_stream.seek(0)
# Step 3: Load and read the Excel sheet
workbook = load_workbook(file_stream)
sheet = workbook["Sheet1"]
existing_names = {
(row[0] or "").strip().lower()
for row in sheet.iter_rows(min_row=2, values_only=True)
if row[0]
}
# Sample new data. This should come from your platforms’ api.
new_rows = [
{"name": "Suresh", "id": "A001", "owner": "Team A"},
{"name": "Ramesh", "id": "B002", "owner": "Team B"},
]
# Step 4: Append new entries
for entry in new_rows:
name = (entry.get("name") or "").strip().lower()
if name and name not in existing_names:
sheet.append([entry["name"], entry.get("id", ""), entry.get("owner", "")])
# Step 5: Save and re-upload
updated_stream = io.BytesIO()
workbook.save(updated_stream)
updated_stream.seek(0)
box_file.update_contents_with_stream(updated_stream)
Conclusion
Automating Excel tasks with Box can save hours of manual effort each week. Whether you're a management team syncing reports or a DevOps team tracking infrastructure resources, this workflow helps:
- Reduce human error
- Improve efficiency
- Ensure data consistency
By combining Box SDK, openpyxl, and pandas, you unlock a powerful set of tools to manipulate Excel files in the cloud, without even opening a UI. From daily reports to audit tracking, once set up, these automations are a game-changer for your productivity.
You can refer to the Box documentation for detailed information about available Box APIs:
Opinions expressed by DZone contributors are their own.
Comments