Projects — Afnitha P AbdulRahman

PUBG Game Winner Prediction

Goal: Predict a player's win placement percentage (winPlacePerc) using in-game performance metrics to identify factors that influence winning probability.

Dataset

Source: Proprietary dataset of real-world gameplay statistics from Rubixe AI Solutions. Size: 4,436,306 rows × 33 columns. Target: winPlacePerc.

Tools & Techniques

Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, CatBoost, XGBoost. Preprocessing, feature engineering (walkDistance, headshot_rate, totalDistance, healsnboosts), model training and comparison.

Implementation

Exploratory analysis and sampling strategy to handle scale.
Feature engineering to normalize metrics across match types.
Trained CatBoostRegressor, Random Forest, and XGBoost models; tuned hyperparameters with cross-validation.
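
The combined features named under Tools & Techniques could be derived along the following lines. This is a minimal sketch, not the project's actual code: the raw column names headshotKills, rideDistance, swimDistance, heals, and boosts are assumptions about the dataset schema, and the exact formulas used in the project are not stated.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three combined features added.

    Column names other than kills and walkDistance are assumed, not confirmed.
    """
    out = df.copy()
    # Share of kills that were headshots; rows with zero kills get 0.0
    # instead of a division-by-zero result.
    safe_kills = out["kills"].where(out["kills"] > 0)
    out["headshot_rate"] = (out["headshotKills"] / safe_kills).fillna(0.0)
    # Total distance covered on foot, in vehicles, and swimming.
    out["totalDistance"] = (
        out["walkDistance"] + out["rideDistance"] + out["swimDistance"]
    )
    # Combined count of healing and boost items used.
    out["healsnboosts"] = out["heals"] + out["boosts"]
    return out


sample = pd.DataFrame(
    {
        "kills": [4, 0],
        "headshotKills": [1, 0],
        "walkDistance": [1500.0, 50.0],
        "rideDistance": [300.0, 0.0],
        "swimDistance": [0.0, 10.0],
        "heals": [2, 0],
        "boosts": [3, 1],
    }
)
features = add_engineered_features(sample)
```

Masking zero-kill rows before dividing keeps headshot_rate finite for players who never scored a kill.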

Key Findings

walkDistance and damageDealt each show a strong positive correlation with winPlacePerc.
Kills and headshot kills are high-impact features. Excessive healing correlates with defensive play.
Best model: CatBoostRegressor, R² = 0.93 (RMSE ≈ 0.08).
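
For reference, the two reported metrics are defined as follows. The sketch below uses plain Python and made-up values purely to illustrate the formulas; it does not reproduce the project's numbers.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative placement percentages, not project data.
y_true = [0.0, 0.25, 0.5, 0.75, 1.0]
y_pred = [0.1, 0.25, 0.4, 0.75, 1.0]
example_rmse = rmse(y_true, y_pred)
example_r2 = r2_score(y_true, y_pred)
```

Because winPlacePerc lies between 0 and 1, an RMSE near 0.08 means a typical prediction is off by about eight percentage points of placement.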

Challenges & Solutions

Scale: used representative sampling and optimized Pandas operations to iterate quickly.
Match-type imbalance: created normalized features (killsNorm, damageDealtNorm, matchDurationNorm).
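
One simple way to build such normalized features is to scale each metric by lobby size, so that a kill in a small match counts for more than a kill in a full one. The sketch below is an assumption about how this might be done, not the project's confirmed method: the matchId column, the playersJoined helper column, and the scaling formula are all illustrative.

```python
import pandas as pd


def add_match_normalized_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with lobby-size-scaled versions of three metrics."""
    out = df.copy()
    # Number of player rows recorded for each match (assumed matchId column).
    out["playersJoined"] = out.groupby("matchId")["matchId"].transform("count")
    # Assumed scale factor: 1.0 for a 100-player lobby, 1.5 for 50 players.
    scale = (100 - out["playersJoined"]) / 100 + 1
    out["killsNorm"] = out["kills"] * scale
    out["damageDealtNorm"] = out["damageDealt"] * scale
    out["matchDurationNorm"] = out["matchDuration"] * scale
    return out


matches = pd.DataFrame(
    {
        "matchId": ["a", "a", "b"],
        "kills": [3, 0, 2],
        "damageDealt": [250.0, 0.0, 180.0],
        "matchDuration": [1800, 1800, 1300],
    }
)
normalized = add_match_normalized_features(matches)
```

Using groupby with transform keeps the result aligned row by row with the original frame, which avoids a separate merge step on a multi-million-row dataset.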

Repository: github.com/Afnitha701/PUBG_Winner_Prediction

Credit Score Classification

Goal: Classify customers into Good / Bad credit history to support lending decisions and reduce default risk.

Dataset

Source: Proprietary dataset from GoodCredit Bank, obtained through an internship. Size: 23,896 rows × 92 columns. Target variable: Bad_label (0 = Good, 1 = Bad).

Tools & Techniques

Python, Pandas, Scikit-learn, Gradient Boosting. Data cleaning, feature selection, SMOTE for imbalance handling, model evaluation using confusion matrix, precision, recall, F1, and Gini.
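
The confusion-matrix metrics listed above can be computed directly from true and predicted labels. The following is a minimal plain-Python sketch with made-up labels, treating Bad (1) as the positive class; it is meant to show the definitions, not the project's evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only: 1 = Bad, 0 = Good.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(labels, predictions)
```

On an imbalanced target, recall on the Bad class is usually the number to watch, since accuracy can stay high while most defaulters are missed.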

Implementation

Feature selection and type conversion to address high dimensionality.
Applied SMOTE and robust cross-validation.
Tested Logistic Regression, Decision Tree, KNN, and Gradient Boosting; tuned hyperparameters for best performance.
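
The core idea behind SMOTE is to create synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified plain-Python illustration of that idea, not the library implementation used in the project; the function name smote_sketch and its parameters are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between neighbours.

    minority: list of numeric feature vectors from the minority class.
    k: how many nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the line segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(base, neighbour)]
        )
    return synthetic


# Four illustrative minority points on a unit square.
minority_points = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
synthetic_points = smote_sketch(minority_points, n_new=6)
```

Oversampling of this kind is normally applied to the training folds only; resampling before the split lets synthetic copies of test rows leak into training and inflates the scores.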

Key Findings

Gradient Boosting performed best, with accuracy of roughly 93–95% and a reported Gini of 1.0 on the evaluation dataset (Gini = 2 × AUC − 1, so this corresponds to AUC = 1.0).
Top predictors: credit limit, current balance, and payment frequency.
EDA showed customers with low past due amounts and stable credit limits have better credit outcomes.
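
The Gini coefficient used in credit scoring is conventionally derived from ROC AUC as Gini = 2 × AUC − 1. The sketch below computes both from scores in plain Python using the pairwise definition of AUC; the labels and scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2 * roc_auc(y_true, scores) - 1


# Illustrative data: 1 = Bad, 0 = Good; higher score = riskier.
y = [0, 0, 1, 1]
partly_separated = [0.1, 0.4, 0.35, 0.8]
fully_separated = [0.1, 0.2, 0.8, 0.9]
gini_partial = gini(y, partly_separated)
gini_perfect = gini(y, fully_separated)
```

A Gini of 1.0 therefore means every Bad customer received a higher score than every Good customer. The pairwise loop is quadratic in the number of rows, so it suits small checks rather than full datasets.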

Challenges & Solutions

High dimensionality: used feature selection and removed low-variance fields.
Missing values: median/mode imputation and domain-driven replacements.
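
Median/mode imputation can be sketched in Pandas as below: numeric columns are filled with their median and all other columns with their most frequent value. The column names are hypothetical, and the domain-driven replacements mentioned above are not covered here.

```python
import pandas as pd


def impute_median_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the median and other gaps with the mode.

    Assumes every column has at least one non-missing value.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() can return several values; take the first one.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical columns for illustration.
raw = pd.DataFrame(
    {
        "credit_limit": [1000.0, None, 3000.0, 5000.0],
        "card_type": ["gold", "silver", None, "gold"],
    }
)
clean = impute_median_mode(raw)
```

The median is preferred over the mean for skewed monetary fields because a few very large limits or balances do not pull the fill value upward.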

Repository: github.com/Afnitha701/Credit_Card_Fraud_Detection

Hospital Stay Duration Prediction

Goal: Predict patient length of stay to help hospitals optimize bed management and resources.

Dataset

Source: Mixed numerical and categorical healthcare data from an internship at Rubixe AI Solutions. Size: 318,438 rows × 18 columns.

Tools & Techniques

Python, Pandas, Scikit-learn, Matplotlib, Seaborn. Preprocessing included label encoding for ordinal features, scaling, and resampling to address class imbalance.
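
For ordinal features, the encoding should preserve the natural order of the levels. A minimal sketch with an explicit mapping follows; the severity levels shown are hypothetical, since the dataset's actual categories are not listed here.

```python
# Hypothetical severity levels, listed from least to most severe.
SEVERITY_ORDER = ["Minor", "Moderate", "Extreme"]


def ordinal_encode(values, ordered_levels):
    """Map each value to its rank in ordered_levels (0 = lowest).

    Raises KeyError for a value that is not in ordered_levels.
    """
    lookup = {level: rank for rank, level in enumerate(ordered_levels)}
    return [lookup[v] for v in values]


encoded = ordinal_encode(
    ["Moderate", "Minor", "Extreme", "Minor"], SEVERITY_ORDER
)
```

An explicit order matters because scikit-learn's LabelEncoder assigns codes by sorted label, which for these levels would place Extreme below Minor.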

Implementation

Exploratory Data Analysis to understand length-of-stay patterns by severity, admission type, and ward.
Tried Logistic Regression, Decision Tree, KNN, and ensembles; tuned with GridSearchCV and RandomizedSearchCV.

Key Findings

Emergency admissions and higher illness severity correlated with longer stays.
Department and ward type affect average stay duration significantly.
Ensemble model achieved ≈83% testing accuracy after tuning.

Challenges & Solutions

Class imbalance: applied resampling prior to training.
Overfitting: tuned tree-based hyperparameters to improve generalization.

Repository: github.com/Afnitha701/Hospital_Stay_Duration_Prediction

Sales Performance Dashboard (Power BI)

Goal: Build an interactive dashboard to analyze revenue, customers, and product performance for business decision-making.

Dataset

Synthetic practice dataset with four tables: Sales_Data, Customer_Data, Products_Data, and Regions_Table. Approximately 5,000 rows across all tables.

Tools & Techniques

Power BI Desktop, Power Query, DAX for calculated measures, interactive visuals and slicers for ad-hoc analysis.

Implementation

Designed star-schema model and cleaned data in Power Query.
Implemented DAX measures for Profit, Revenue, and KPI tracking.
Built interactive visuals: bar charts, maps, KPI cards, and slicers for dynamic filtering.

Key Findings

Revenue was concentrated among the top customers and regions; the top products contributed the majority of sales.
The distributor channel produced the highest revenue share, ahead of export and wholesale.
Interactive slicers enabled fast ad-hoc answers for managers.

Challenges & Solutions

Missing profit column: created Profit measure using DAX.
Formatting and layout: standardized visuals and alignment to make the report easier for business users to read.

Repository: github.com/Afnitha701/Power-BI-Sales-Performance-Dashboard