Overview
The purpose of this project was twofold: to uncover patterns among small businesses in Los Angeles County and to build a user-friendly dashboard for searching those businesses. Users can filter businesses by city, zip code, or business type. I also analyzed age demographics by zip code, which can indicate whether a small business is located in an area that matches its target demographic.
Data Sources
I gathered data from sources including the Office of Finance, the U.S. Small Business Administration, and the U.S. Census Bureau.
Data Cleaning
Data Cleaning Notebook in R
Active Business
Removed rows with invalid or null zip codes, NAICS business descriptions, or business names. I kept rows with invalid coordinate values so as not to impede the data analysis.
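A minimal Python sketch of the row filtering described above, assuming each record is a dictionary with hypothetical field names (`business_name`, `zip_code`, `naics_description`, `latitude`) and that a "valid" zip code is a 5-digit code, optionally with a ZIP+4 suffix. It is an illustration of the rule, not the R notebook's code. Coordinates are deliberately left unchecked, matching the decision to keep rows with invalid coordinates.

```python
import re

# Assumption: a valid ZIP is 5 digits, optionally followed by a ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def is_valid_zip(value):
    """Return True for a string that looks like a ZIP or ZIP+4 code."""
    return isinstance(value, str) and ZIP_PATTERN.match(value.strip()) is not None


def is_present(value):
    """Treat None and blank strings as null."""
    return isinstance(value, str) and value.strip() != ""


def clean_active_businesses(rows):
    """Drop rows with an invalid/null ZIP code, NAICS description, or name.

    Coordinates are intentionally not validated, so rows with invalid
    coordinates are kept.
    """
    return [
        row
        for row in rows
        if is_valid_zip(row.get("zip_code"))
        and is_present(row.get("naics_description"))
        and is_present(row.get("business_name"))
    ]


# Hypothetical sample records; field names are illustrative only.
rows = [
    {"business_name": "Taco Stand", "zip_code": "90012",
     "naics_description": "Full-service restaurants", "latitude": None},
    {"business_name": "", "zip_code": "90012",
     "naics_description": "Florists", "latitude": 34.05},
    {"business_name": "Book Nook", "zip_code": "9001",
     "naics_description": "Book stores", "latitude": 34.06},
]
cleaned = clean_active_businesses(rows)
```

Only the first sample record survives: the second has a blank name and the third a 4-digit zip code, while the first is kept despite its missing latitude.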
Small Business
Removed rows with invalid or null zip codes, combined contact names from columns marked as 'unknown', and dropped miscellaneous columns.
California Demographics
Kept only the columns for age groups in California zip codes, extracted the zip codes from the geographic area names, and dropped any rows containing null values.
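The following Python sketch illustrates these three steps. It assumes geographic area names take the form "ZCTA5 90001" (the Census label style for ZIP Code Tabulation Areas) and uses hypothetical age-group column names (`under_18`, `age_18_34`, `age_35_64`, `age_65_plus`); the real table headers differ.

```python
import re

# Assumption: area names look like "ZCTA5 90001"; take the standalone 5-digit code.
ZCTA_PATTERN = re.compile(r"\b(\d{5})\b")

# Hypothetical age-group column names, for illustration only.
AGE_COLUMNS = ["under_18", "age_18_34", "age_35_64", "age_65_plus"]


def extract_zip(area_name):
    """Return the first standalone 5-digit code in the area name, or None."""
    match = ZCTA_PATTERN.search(area_name or "")
    return match.group(1) if match else None


def clean_demographics(rows):
    """Keep the ZIP code plus age-group columns; drop rows with any null."""
    cleaned = []
    for row in rows:
        record = {"zip_code": extract_zip(row.get("geographic_area_name"))}
        # Keep only the age-group columns; everything else is discarded.
        for col in AGE_COLUMNS:
            record[col] = row.get(col)
        # Drop the row if any retained value (including the ZIP) is null.
        if all(value is not None for value in record.values()):
            cleaned.append(record)
    return cleaned


rows = [
    {"geographic_area_name": "ZCTA5 90001", "under_18": 100, "age_18_34": 200,
     "age_35_64": 300, "age_65_plus": 50, "margin_of_error": 5},
    {"geographic_area_name": "ZCTA5 90002", "under_18": None, "age_18_34": 150,
     "age_35_64": 250, "age_65_plus": 40, "margin_of_error": 4},
]
demographics = clean_demographics(rows)
```

The second sample row is dropped because one of its age-group values is null, and the unrelated `margin_of_error` column is discarded from the first.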
Analysis
Data Analysis Notebook in R
Small business count vs. population (by zip code)
I counted the small businesses in each zip code, paired each count with that zip code's population, and ran a correlation analysis. The results show a relatively weak correlation between population and the number of small businesses.
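A minimal Python sketch of this analysis, assuming the input is a list of business zip codes plus a mapping from zip code to population, and assuming a Pearson correlation (the text does not state which coefficient was used). The sample data are made up so the result is easy to check.

```python
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def business_population_correlation(business_zips, population_by_zip):
    """Count businesses per ZIP code, pair counts with population, return r."""
    counts = Counter(business_zips)
    # Only ZIP codes present in both datasets contribute a (population, count) pair.
    shared = sorted(set(counts) & set(population_by_zip))
    populations = [population_by_zip[z] for z in shared]
    business_counts = [counts[z] for z in shared]
    return pearson_r(populations, business_counts)


# Made-up sample: business counts rise in lockstep with population.
business_zips = ["90001", "90002", "90002", "90003", "90003", "90003", "99999"]
population_by_zip = {"90001": 100, "90002": 200, "90003": 300}
r = business_population_correlation(business_zips, population_by_zip)
```

In this sketch, zip codes that have a population but no small businesses drop out of the pairing; counting them as zero is an alternative that would change the coefficient.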
Small businesses established by year in LA County
The graph shows exponential growth in the number of small businesses established each year until 2003, followed by a noticeable decline. The data do not explain this decline; possible factors include the growth of large corporations, buyouts of small businesses, or insufficient funding. I did not expect this result, and to me it underscores the importance of going the extra mile to support small businesses.
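The per-year counts behind a chart like this can be produced with a simple aggregation. This Python sketch assumes each business has a start date stored as an ISO-format string (`YYYY-MM-DD`); that format, and the helper names, are assumptions for illustration. Missing or unparseable dates are skipped.

```python
from collections import Counter
from datetime import date


def businesses_per_year(start_dates):
    """Count businesses by year established, skipping missing/invalid dates."""
    counts = Counter()
    for value in start_dates:
        try:
            counts[date.fromisoformat(value).year] += 1
        except (TypeError, ValueError):
            # None raises TypeError; malformed strings raise ValueError.
            continue
    return dict(sorted(counts.items()))


def peak_year(counts):
    """Return the year with the most businesses established."""
    return max(counts, key=counts.get)


counts = businesses_per_year(
    ["2001-05-01", "2003-01-02", "2003-07-09", None, "not a date", "2004-03-03"]
)
```

Here `counts` maps each year to the number of businesses started in it, and `peak_year(counts)` picks out the year where the curve tops out.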
Future Work
Improving data quality
Linking the Small Business dataset to the Active Business dataset proved difficult, and the method I used was not effective. I created a key by combining each business's name and street address, but only 10% of the small business records could be matched this way, so only that 10% could be shown on the map.
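A minimal Python sketch of a name-plus-address key and a way to measure its match rate, using hypothetical `name` and `address` fields. `exact_key` mirrors the approach described above; `normalized_key` (lowercasing, stripping punctuation, collapsing whitespace) is an illustrative alternative, not something this project used, and it would not reconcile abbreviations such as "Street" vs. "St".

```python
import re


def exact_key(name, address):
    """Join key built from the business name and street address, as-is."""
    return f"{name}|{address}"


def normalized_key(name, address):
    """Hypothetical looser key: lowercase, drop punctuation, collapse spaces."""
    def norm(text):
        text = re.sub(r"[^a-z0-9 ]", "", text.lower())
        return " ".join(text.split())
    return f"{norm(name)}|{norm(address)}"


def match_rate(small, active, key_fn):
    """Fraction of small-business records whose key appears in the active data."""
    active_keys = {key_fn(r["name"], r["address"]) for r in active}
    matched = sum(1 for r in small if key_fn(r["name"], r["address"]) in active_keys)
    return matched / len(small)


# Made-up records: the same cafe is written differently in each dataset.
small = [
    {"name": "Joe's Cafe", "address": "123 Main St."},
    {"name": "Book Nook", "address": "45 Oak Ave"},
]
active = [
    {"name": "JOES CAFE", "address": "123 MAIN ST"},
    {"name": "Book Nook", "address": "45 Oak Ave"},
]
exact_rate = match_rate(small, active, exact_key)
normalized_rate = match_rate(small, active, normalized_key)
```

With exact keys only the identically spelled record links (a rate of 0.5 here); the normalized key also links the cafe. Measuring the match rate this way makes it easy to compare candidate keys before putting anything on the map.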
Linking the two datasets matters because the Active Business data contains standardized NAICS business descriptions, which categorize businesses consistently, as well as business location coordinates. The Small Business dataset, by contrast, relies on user-entered business descriptions, which produce so many unique values that I could not meaningfully group businesses by description.
To address this, I am considering two solutions: using a language model (LM) to automatically categorize the existing small business descriptions, or obtaining a dataset with more accurate business descriptions.
Gathering data from other counties
Another valuable enhancement would be to add small business data from other California counties. This would give a statewide view of small businesses and let users find nearby small businesses in any California zip code.