Portfolio

 

My Projects

 

Project 01

 

SEC EDGAR Financial Data Pipeline

Python · pandas · zipfile · requests · SQLAlchemy · MySQL

ETL pipeline that bulk downloads 10 years (1.82GB) of SEC EDGAR financial data across 40 quarterly ZIP files. Files are extracted and renamed locally, filtered for R&D, Net Income, and Revenue, then loaded into MySQL for downstream analysis.

Pipeline Steps
1
Bulk download 40 quarterly ZIP files from SEC EDGAR
2
Extract and rename num.txt and sub.txt from each zip
3
Filter for R&D, Net Income, and Revenue
4
Clean and type-convert data
5
Load into MySQL — financial_data and metadata tables
How to Run
1
Create a MySQL database called aggregated_stock_data
2
Create tables using schema in queries_edgar.sql
3
Fill in your MySQL credentials in the script
4
Run edgbatch_DL&EXT_SQL.py
See it in action →

Data Source

SEC EDGAR Financial Statement Data Sets (2014 Q1 – 2023 Q4)

sec.gov/data-research →
Tools Used
  • Python — pandas, zipfile, requests, SQLAlchemy
  • MySQL
Project 02

 

R&D Expense Spikes Predict Short-Term Price Declines

Python · yfinance · zipfile · requests · SQLAlchemy · MySQL

R&D expense spikes predict short-term price declines with 55–68% accuracy across 1,404 public companies. 1.82GB SEC EDGAR data, ETL from zip to MySQL to Python, enriched with yfinance price data.

Definitions
01
A spike is identified when exactly one quarter's R&D expense exceeds the average of all three quarters for that year.
02
Spike percent is defined by the percentage increase in spending relative to the three-quarter average.
03
Spikes are sorted into buckets: 0–10%, 10–20%, 20–50%, 50–100%, 100%+.
04
Price decreases are measured using stock prices taken 30 days before and after the spike date.
Findings
R&D Spike Magnitude Companies Drop Rate
0 – 10%~280 55.6%
10 – 20%~280 61.1%
20 – 50%~280 67.3%
50 – 100%~280 68.5%
100%+~284 66.1%

Tracking post-spike short-term price behavior across all bands suggests a systematic and repeatable market inefficiency.

Analytical Queries
  1. Ranking the first 10 companies by spike percent
  2. Sorting all companies' average spike percentage into 4 buckets
  3. Comparing the spike percentage to the prior filing date's spike percentage
  4. Finding the running average of spike percentage per company
  5. Finding the max spike percentage per company
Limitations
  • Stock price window of 30 days may predate 10-K filings
  • Only 3 quarters of data available — Q4 not filed in 10-K
  • Data range limited to 2014–2023, not live
  • Some data lost due to ticker symbol resolution issues

A version addressing all listed limitations is planned.

See it in action →
Data Source

SEC EDGAR Financial Statement Data Sets (2014 Q1 – 2023 Q4)

sec.gov/data-research →
Tools Used
  • Python — yfinance, zipfile, requests, SQLAlchemy
  • MySQL
Project 03

 

Personal Finance Tracker

Python · FastAPI · SQLAlchemy · SQLite · APScheduler

Showcasing a fulling working SQLite Database with a mobile app.

 

A mobile expense and subscription tracker with a FastAPI backend. Full CRUD for expenses and subscriptions, automatic recurring expense detection, and spending summaries broken down by day, month, year, category and yearly projections.

Actual App Running on Expo Go
Iphone App
Features
  • Full CRUD for expenses and subscriptions
  • Automatic subscription management
  • Spending summaries by day, month, year, and all-time
  • Category breakdowns across all transactions
  • Recurring expense detection with annualized projections
API Endpoints
1
GET /expenses — retrieve all recorded expenses
2
POST /expenses — add a new expense entry
3
PUT /expenses/{id} — update an existing expense
4
DELETE /expenses/{id} — remove an expense
5
GET /subscription · POST · PUT · DELETE — full subscription management
See it in action →
Tools Used
  • Python — FastAPI, SQLAlchemy, APScheduler
  • SQLite
Project 04

 

S&P 500 Anomaly Detection

Python · scikit-learn · FastAPI · React · Claude AI · yfinance

Real-time anomaly detection across all S&P 500 stocks using Z-Score and Isolation Forest, with AI-generated summaries explaining the likely cause of each anomaly using recent news. Deployed as a full-stack application with a FastAPI backend and React frontend.

How It Works
1
Pulls 90 days of price and volume data for all 500 S&P 500 constituents via yfinance
2
Flags anomalous days using Z-Score (|z| > 3) on both daily return and volume
3
Runs Isolation Forest as a second detection layer high confidence where both models agree
4
Fetches relevant news headlines for each anomaly via NewsAPI
5
Claude AI generates summaries explaining each anomaly
6
Results served via FastAPI and rendered in a filterable, searchable React dashboard
Detection Methods
01
Z-Score measures how many standard deviations a stock's return or volume deviates from its own 90-day history. Threshold set at |z| > 3.
02
Isolation Forest detects outliers by measuring how easily a data point can be isolated from the rest. Catches complex patterns missed by Z-Score alone.
03
High confidence anomalies are rows flagged by both models — approximately 1% of all observations across 500 stocks.
More Info →
Live App

The dashboard shows all high confidence anomalies with their return and volume Z-scores, signal type, and AI-generated summary. Click any row to expand the analysis.

 

Launch LIVE APP →
(Allow one minute to load)

 

Data Source

Yahoo Finance via yfinance — 90-day rolling window, all S&P 500 constituents

GitHub →
Tools Used
  • Python — pandas, yfinance, scikit-learn
  • FastAPI · Render
  • React · Vercel
  • Claude AI (Anthropic)
  • NewsAPI