Logan Fisher
BSBA: Computer Information Systems
Concentration: Data Analytics
  • Residence:
    North Carolina
  • City:
    Greenville
  • Age:
    24
Excel
Python
Data & Viz
{(Excel)}
| VBA Macros | PowerQuery |
| M | Query Staging |
| Relational Data Joins |
{(Python)}
| pandas | plotly | matplotlib |
| Batch Data Scripting |
{(Data Modeling and Viz)}
| PowerBI | Tableau |
| Advanced Dax |
| Custom |
[Columns-Measures-Tables]
 

Python - Combining Excel Files

This Google Colab script automates merging Excel files from ZIP archives. It installs essential libraries, uploads and extracts ZIP files, compiles a unified column schema, standardizes data, and exports the combined DataFrame into a new Excel file.

Installing Dependencies

!pip install pandas openpyxl zipfile36

These libraries enable data manipulation, Excel file processing, and ZIP file handling.

Uploading Files via Google Colab

from google.colab import files
uploaded = files.upload()

This allows users to upload ZIP archives, storing files in a dictionary with filenames as keys.

Extracting Uploaded ZIP Files

import zipfile, io, os

for filename in uploaded.keys():
    with zipfile.ZipFile(io.BytesIO(uploaded[filename]), 'r') as zip_ref:
        zip_ref.extractall('/content/extracted_files')

Extracts ZIP file contents into /content/extracted_files for further processing.

Preparing for Excel File Processing

import pandas as pd, warnings
warnings.filterwarnings("ignore", category=UserWarning, module='openpyxl')

Suppresses non-critical warnings and prepares data processing libraries.

Collecting Unique Columns

combined_df = pd.DataFrame()
directory = '/content/extracted_files'
all_columns = set()

for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        file_path = os.path.join(directory, filename)
        try:
            df = pd.read_excel(file_path, engine='openpyxl')
            all_columns.update(df.columns)
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

Gathers all unique column headers to create a standardized format for merging.

Complete Script

!pip install pandas openpyxl zipfile36

from google.colab import files
uploaded = files.upload()

import zipfile, io, os

for filename in uploaded.keys():
    with zipfile.ZipFile(io.BytesIO(uploaded[filename]), 'r') as zip_ref:
        zip_ref.extractall('/content/extracted_files')

import pandas as pd, warnings
warnings.filterwarnings("ignore", category=UserWarning, module='openpyxl')

combined_df = pd.DataFrame()
directory = '/content/extracted_files'
all_columns = set()

for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        file_path = os.path.join(directory, filename)
        try:
            df = pd.read_excel(file_path, engine='openpyxl')
            all_columns.update(df.columns)
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        file_path = os.path.join(directory, filename)
        try:
            df = pd.read_excel(file_path, engine='openpyxl')
            df = df.reindex(columns=all_columns)
            df['filename'] = filename
            combined_df = pd.concat([combined_df, df], ignore_index=True)
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

combined_df.to_excel('combined_file.xlsx', index=False)
from google.colab import files
files.download('combined_file.xlsx')

This is the complete version of the script, combining all steps into a single execution.

Logan Fisher