About

Data Science is my passion. I love working with data and using it to support better decisions. I am a graduate student at San Jose State University. I have worked on projects that involved cleaning, preparing, and analyzing data; building and testing statistical models; and running predictive analysis on historical data, using languages such as R, Python, and SQL along with data visualization software such as Tableau.

Skills I Have

Mixed background in Statistics, Mathematics, and Business

  • Python, SQL, R
  • Microsoft Access, MS SQL Server, MySQL, PostgreSQL
  • Linear Regression, Logistic Regression, Decision Tree Classification, Random Forests, Time Series Forecasting, K-Means Clustering, Ensemble Methods, Naive Bayes, kNN, NLP
  • Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, spaCy, NLTK
  • Tableau, Power BI, MS Excel, Google Sheets

Projects

Developed end-to-end Machine Learning projects to solve real-world business problems.

Sentiment Analysis with Machine Learning (NLP)

Objective: To determine the sentiment of a given review (negative or positive).

• Performed sentiment analysis on more than 100K rows of Amazon fine food reviews, which span a period of more than ten years.
• Refined the data with preprocessing methods such as stop-word removal and lemmatization.
• Implemented different vectorization techniques, including count vectorization, bigram n-grams, and TF-IDF, to encode the text as numeric data.
• Developed models that were up to 87% effective at determining the sentiment of a given review.
Link to GitHub Repository
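
The preprocessing and vectorization steps above can be sketched in plain Python. This is a minimal illustration rather than the project's code: the stop-word list, the sample reviews, and the function names are all hypothetical, and the project may well have used library vectorizers instead.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "was", "and", "this", "it"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def ngrams(tokens, n_max=2):
    """Return every unigram up to n_max-gram as a space-joined string."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    term_lists = [ngrams(tokenize(d)) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors

# Hypothetical sample reviews.
reviews = ["This coffee is great", "The coffee was stale", "Great tea and great price"]
vectors = tfidf(reviews)
```

Terms that appear in fewer reviews receive a higher weight, so the bigram "coffee great" outweighs the more common unigram "coffee" in the first review. This sketch uses the unsmoothed textbook formula, so its weights will differ from those of library implementations that apply smoothing or normalization.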

Air Fare Prediction

A regression model that predicts airfares (Random Forest Regressor, Label Encoder, Mutual Information).

• Performed data preprocessing, feature engineering, and feature selection on more than 100K rows of data with more than 30 features to identify the features that most affect the target variable, Price.
• Trained machine learning models to predict flight ticket prices for various airlines, compared their performance, and tuned their hyperparameters to reach an R-squared value of approximately 85%.
Link to GitHub Repository
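
For readers unfamiliar with the metric, R-squared measures the share of variance in the target that a model explains. A minimal sketch of the calculation follows; the prices are made-up numbers, not the project's data, and the function name is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical ticket prices and model predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
score = r_squared(actual, predicted)
```

A score of 1.0 means the predictions match the actual values exactly, while a score of 0.0 means the model does no better than always predicting the mean price.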

Data Scientist Salary Estimator

Created a model that estimates data science salaries to help data scientists negotiate their income when they get a job.

• Performed Exploratory Data Analysis (EDA) on the data set to understand it and surface its main patterns.
• Engineered features from the text of each job description to quantify the value companies put on Python, Excel, AWS, and Spark.
• Optimized Linear, Lasso, and Random Forest Regressors using GridSearchCV to reach the best model.
• The Random Forest model far outperformed the other approaches on the test and validation sets, with an MAE of about $11K.
Link to GitHub Repository
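
Two ideas from the bullets above, keyword features taken from job-description text and mean absolute error (MAE), can be sketched as follows. The job description, the salary figures, and the helper names are hypothetical and only illustrate the approach.

```python
import re

# Skills named in the project description.
KEYWORDS = ("python", "excel", "aws", "spark")

def keyword_flags(description):
    """Return a 1/0 flag per keyword, matching whole words only."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {k: int(k in words) for k in KEYWORDS}

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical job description.
flags = keyword_flags("Seeking an analyst with Python and AWS experience; Excel a plus.")

# Hypothetical salaries in thousands of dollars: actual vs. predicted.
mae = mean_absolute_error([100, 120, 90], [110, 105, 95])
```

Matching on whole words keeps a description that mentions "sparkling" from being flagged for Spark. An MAE of 10 on these figures would mean the predictions are off by $10K on average.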

911 Emergency Calls Analysis (EDA)

911 is an emergency telephone number for the North American Numbering Plan (NANP). Analyzing an emergency calls dataset and discovering hidden trends and patterns helps ensure that emergency response teams are better equipped to deal with emergencies.

When incidents such as road accidents and fires are concentrated in specific areas, those areas have a high demand for ambulance services. Road accidents in some areas might be due to road conditions that need to be improved. A high frequency of respiratory emergencies might be due to harmful air pollutants in that area. Association rule mining can help discover such patterns. The dataset contains 911 emergency calls from Montgomery County in the Commonwealth of Pennsylvania. The attributes chosen include the type of emergency, the time stamp, and the township where the emergency occurred.
Highlights:
• Emergency calls were made from a total of 197 zip codes.
• The largest number of emergency calls (43,075) came from Norristown township, zip code 19401.
• About 16.8% (7,233 of 43,075) of the emergency calls in this area relate to vehicle accidents alone. This area therefore needs measures that address common causes of such accidents, such as reckless driving, driving in bad weather conditions, and running red lights.
Link to GitHub Repository
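
The kind of counting behind these highlights can be sketched with the standard library. The call records below are a tiny hypothetical sample, not the actual dataset; only the final percentage uses the figures quoted above.

```python
from collections import Counter

# Hypothetical (zip code, call category) records.
calls = [
    ("19401", "Traffic"), ("19401", "EMS"), ("19401", "EMS"),
    ("19446", "Fire"), ("19446", "EMS"), ("19464", "Traffic"),
]

# Number of calls per zip code and the busiest zip code.
calls_per_zip = Counter(zip_code for zip_code, _ in calls)
top_zip, top_count = calls_per_zip.most_common(1)[0]

# Share of calls in the busiest zip code that are traffic related.
traffic_in_top = sum(1 for z, category in calls if z == top_zip and category == "Traffic")
sample_share = 100 * traffic_in_top / top_count

# The share quoted in the highlights: 7,233 of 43,075 calls.
reported_share = round(100 * 7233 / 43075, 1)
```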

Contact

Feel free to connect and start a conversation with me, or reach me at:


  • LinkedIn
  • GitHub
  • Email