LA County Small Business Finder

Overview

This project had two goals: to uncover patterns among small businesses in Los Angeles County and to build a user-friendly dashboard for searching those businesses. Users can filter businesses by city, zip code, or business type. I also analyzed age demographics, which can indicate whether a small business is located in an area that matches its target demographic.

Data Sources

I gathered data from the Office of Finance, the U.S. Small Business Administration, and the U.S. Census Bureau.

Data Cleaning

Data Cleaning Notebook in R

Active Business

Removed invalid/null zip codes, NAICS business descriptions, and business names. I decided to keep invalid coordinate values so as not to impede data analysis.
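
The filtering rule above can be sketched in a few lines. This is a minimal illustration in Python, not the code from the R notebook. The field names (`zip_code`, `naics_description`, `business_name`) are hypothetical, and the definition of a valid zip code (five digits, optionally followed by a four-digit extension) is an assumption.

```python
import re

# Hypothetical field names; the real dataset's headers may differ.
REQUIRED_FIELDS = ("zip_code", "naics_description", "business_name")

# Assumption: a valid zip code is 5 digits, optionally followed by "-" and 4 digits.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def is_valid_record(record):
    """Return True when the zip code, NAICS description, and name are all usable."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            return False
    return bool(ZIP_PATTERN.match(str(record["zip_code"]).strip()))


def clean_active_businesses(records):
    """Drop records that fail validation; coordinate values are never inspected."""
    return [record for record in records if is_valid_record(record)]
```

Because the coordinate fields are never checked, a record with a missing or malformed location still survives the filter, which mirrors the decision to keep invalid coordinates.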

Small Business

Removed invalid/null zip codes, combined contact names from columns marked as 'unknown', and removed miscellaneous columns.

California Demographics

Kept only the columns relevant to the age groups in California zip codes, extracted zip codes from geographic area names, and dropped any rows containing null values.
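
As a rough sketch of the zip code extraction, the Python snippet below pulls a five-digit code out of an area name and drops rows with missing values. It assumes the area names look like `ZCTA5 90001`, which is an assumption about the Census export rather than something taken from the notebook. The field name `geographic_area_name` and the age column names are hypothetical.

```python
import re

# Matches a standalone 5-digit code, e.g. the "90001" in "ZCTA5 90001".
ZIP_IN_NAME = re.compile(r"\b(\d{5})\b")


def extract_zip(area_name):
    """Return the 5-digit zip code found in an area name, or None if absent."""
    match = ZIP_IN_NAME.search(area_name or "")
    return match.group(1) if match else None


def clean_demographics(rows, age_columns):
    """Keep only the zip code and age columns, dropping rows with any null value."""
    cleaned = []
    for row in rows:
        zip_code = extract_zip(row.get("geographic_area_name"))
        ages = {column: row.get(column) for column in age_columns}
        if zip_code is None or any(value is None for value in ages.values()):
            continue
        cleaned.append({"zip_code": zip_code, **ages})
    return cleaned
```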

Future Work

Improving data quality

I encountered several challenges while linking the Small Business dataset to the Active Business dataset, and the method I used did not work well. I created a unique key by combining the business name and street address, but this approach let me use only 10% of the small business dataset on the map.
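
The key-based link can be sketched as follows, again in Python rather than the notebook's R. The normalization step (lowercasing, removing punctuation, collapsing whitespace) is my assumption about one way to reduce mismatches and is not necessarily what the original key did. All field names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse repeated whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", str(text).lower())
    return " ".join(text.split())


def make_key(name, address):
    """Build a composite key from the business name and street address."""
    return f"{normalize(name)}|{normalize(address)}"


def link_datasets(small, active):
    """Attach NAICS descriptions and locations to small businesses that match by key.

    Returns the linked records and the share of small businesses that matched.
    """
    active_by_key = {
        make_key(record["business_name"], record["street_address"]): record
        for record in active
    }
    linked = []
    for record in small:
        key = make_key(record["business_name"], record["street_address"])
        match = active_by_key.get(key)
        if match is not None:
            linked.append({
                **record,
                "naics_description": match["naics_description"],
                "location": match.get("location"),
            })
    match_rate = len(linked) / len(small) if small else 0.0
    return linked, match_rate
```

Tracking the match rate makes it easy to see whether a change to the key, such as a different normalization, actually raises the share of linked businesses.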

Linking the two datasets is critical because the Active Business data contains NAICS business descriptions that categorize businesses consistently, along with business location coordinates. In contrast, the small business dataset relies on user-entered business descriptions, which produced so many unique descriptions that I could not group businesses by them.

To address this issue, I am considering two potential solutions. One approach involves using a Language Model (LM) to automatically categorize the existing small business descriptions. Alternatively, I could explore the option of obtaining a dataset with more accurate business descriptions.

Gathering data from other counties

Another valuable enhancement would involve retrieving small business data from other counties in California. This broader scope would provide a comprehensive view of small businesses across the state. Furthermore, it would empower users to locate nearby small businesses in any zip code of their choice.