My Portfolio

Welcome

Welcome to my portfolio! I am David Mwale, a final-year MSc student in Data Science for Health and Social Care at the University of Edinburgh. I am using this platform to showcase my evolving journey in data science and my aim of using data to improve health and social care outcomes. Over the past two years, I have immersed myself in a continuous process of learning foundational concepts, unlearning, and relearning through practical application. What I have learned, shaped by intensive coursework and hands-on projects, is much more than I can showcase on this platform. I am experienced in Python, SQL, R, and health data analysis, with a focus on reproducible research and ethical data practices. Explore my work, and feel free to connect with me to discuss potential collaborations, data science role openings, or insights.

My Projects

  • Data Wrangling in R: Cleaning and transforming datasets using tidyverse and other R packages such as naniar and janitor. A quick example is my work with the Iris dataset in the adjacent tab, where I cleaned the data and handled missing values.

  • Data Visualization: Creating insightful visualizations with ggplot2 and interactive dashboards. Throughout the projects in this portfolio, I use ggplot2 to make visualizations.

  • Statistical Modelling for Epidemiology: Applying statistical models to analyse health data trends. In the stats modelling tab, I apply one of the most commonly used models, logistic regression, to analyse health data, alongside packages such as epitools and reportROC.

  • Machine Learning: Building predictive models using tidymodels. A highlight of this project is under the Machine Learning tab, where I use both logistic regression and random forest models to predict diabetes.

Skills

Languages

  • R: Proficient in data manipulation, visualization, and modelling, with experience in reproducible workflows using R Markdown and Quarto.

  • SQL: Skilled in querying health databases, including optimizing joins and writing subqueries and common table expressions (CTEs), using SQLite, MySQL, and PostgreSQL.

  • Python: Experienced in pandas for data wrangling and matplotlib/seaborn for visualization.

  • Version Control: Git and GitHub for collaborative and reproducible research.

  • Microsoft Azure Databricks: Experienced in using Microsoft Azure to create storage containers, resource groups, SQL databases, and workspaces, and to build data pipelines.