Posts by Collection

projects

getPaid

Published:

Abstract:

Created a SparkML RandomForest model to predict total employee compensation. Queried the data with SparkSQL and used PySpark scripts for EDA, preprocessing, and model training, achieving an R² score of 0.98.
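For readers unfamiliar with the metric, the R² score reported above is the coefficient of determination, 1 - SS_res / SS_tot. The snippet below is a minimal plain-Python sketch of that formula, not the project's SparkML evaluation code; the function name and the compensation figures are hypothetical and chosen only for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Hypothetical compensation values, for illustration only.
actual = [50_000.0, 60_000.0, 70_000.0, 80_000.0]
predicted = [51_000.0, 59_000.0, 71_000.0, 79_000.0]
score = r2_score(actual, predicted)
```

A score close to 1 means the predictions explain nearly all of the variance in the target; a score of 0 means the model does no better than predicting the mean.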

Citation:


Cancer Detection as a Microservice

Published:

Abstract:

Trained a Voting Classifier to predict cancer type with the Wisconsin Breast Cancer Dataset. Hosted the model as a Flask-based microservice running within a Docker container.

Citation:


Product Shipping Status Classifier

Published:

Abstract:

Created a RandomForest model to predict supply-chain issues that might delay product shipments to customers. Used PCA and LDA for visualization and dimensionality reduction. The model achieved 97% accuracy with PCA and 92% with LDA.

Citation:


Interpretable Parking Ticket Location Classifier

Published:

Abstract:

Built a SparkML RandomForest classifier to predict the police precinct of ticketed vehicles using the NYC parking-ticket dataset. Used PySpark for parallel computation. Achieved 98% accuracy and identified police precincts with anomalous ticketing practices.

Citation:



LLM-as-a-SupremeCourt-Judge

Published:

Abstract:

Developed an AI framework with LLMs (GPT API, fine-tuning, chain-of-thought reasoning) to evaluate 200+ Supreme Court cases, revealing how factors such as panel size, prompting strategies, and ideology personas influence alignment between LLM reasoning and human judges.
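One way to picture how panel size can affect alignment is a simple majority vote over per-judge LLM outputs, compared against the human outcome for each case. The sketch below is a plain-Python illustration under stated assumptions: the function names, the "affirm"/"reverse" labels, the tie handling, and the sample votes are all hypothetical, and no LLM API is called.

```python
from collections import Counter


def panel_decision(votes):
    """Return the majority label of a panel, or None on a tie or empty panel."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # Assumption: a tied panel yields no decision.
    return ranked[0][0]


def alignment_rate(panel_votes_per_case, human_outcomes):
    """Fraction of cases where the panel majority matches the human outcome.

    Tied panels count as not aligned (an assumption of this sketch).
    """
    if len(panel_votes_per_case) != len(human_outcomes) or not human_outcomes:
        raise ValueError("need one human outcome per case")
    matches = sum(
        panel_decision(votes) == outcome
        for votes, outcome in zip(panel_votes_per_case, human_outcomes)
    )
    return matches / len(human_outcomes)


# Hypothetical votes from simulated judges, for illustration only.
cases = [
    ["affirm", "reverse", "affirm"],   # majority: affirm
    ["reverse", "reverse", "affirm"],  # majority: reverse
    ["affirm", "reverse"],             # tie: no decision
]
human = ["affirm", "affirm", "affirm"]
rate = alignment_rate(cases, human)
```

Odd panel sizes avoid ties with two labels, which is one reason panel size can change the measured alignment in a setup like this.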

Citation:


Detecting Deception: Intelligent Systems for Fighting Misinformation

Published:

Abstract:

Designed a logic-gated fact-checking system integrating RoBERTa classifiers with contradiction detection, validated on 7,200+ claims. Improved explainability and reliability by maintaining 83% accuracy while enabling selective overrides for low-confidence predictions.
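The idea of a selective override for low-confidence predictions can be sketched as a small gate: keep the classifier's label when it is confident, and let a contradiction signal take over only when it is not. This is an illustrative plain-Python sketch, not the system's actual logic; the 0.7 threshold, the label strings, the `Verdict` type, and the gate rules are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    label: str   # final label for the claim
    source: str  # which component produced the label (useful for explanations)


def gated_verdict(clf_label, clf_confidence, contradiction_found, threshold=0.7):
    """Combine a classifier output with a contradiction signal.

    Assumed rules for this sketch:
    - confident classifier predictions are kept as-is;
    - low-confidence predictions are overridden to "false" when a
      contradiction with trusted evidence was detected;
    - otherwise the low-confidence label is kept but flagged.
    """
    if not 0.0 <= clf_confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if clf_confidence >= threshold:
        return Verdict(clf_label, "classifier")
    if contradiction_found:
        return Verdict("false", "contradiction_override")
    return Verdict(clf_label, "classifier_low_confidence")


# Hypothetical inputs, for illustration only.
confident = gated_verdict("true", 0.93, contradiction_found=True)
overridden = gated_verdict("true", 0.55, contradiction_found=True)
flagged = gated_verdict("true", 0.55, contradiction_found=False)
```

Recording the `source` alongside the label is what makes each decision explainable: a reviewer can see whether the classifier or the contradiction check produced it.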

Citation:



The Sound of Suffrage - Modeling Gender and Power in Parliamentary Speech

Published:

Abstract:

Developing an NLP pipeline over 200 years of Hansard debates (1803–2005), using LLMs to study gender bias, framing, and speaker dynamics. Applying representation analysis and framing detection to uncover temporal shifts in discourse and to generate insights on bias and fairness in AI systems.

Citation: