Hi, I'm Renata and I'm looking for a new role

About me

Hi! I'm Renata — a curious mind who turned spreadsheets into dashboards and questions into models. With over 15 years of experience connecting data with real business decisions, I started my journey in the Tax domain, where I created performance dashboards, trained teams, and simplified workflows through automation — even before I knew what ETL was! When I became a mother, I decided to reinvent myself. I moved into tech and fell in love with programming, data storytelling, and Machine Learning. Today, I bring together my business background, analytical mindset, and developer skills to create real impact. Whether building dashboards or predictive models, my goal is the same: make data clear, human, and powerful.

I'm looking for a position as a Data Analyst, Junior Data Scientist, or Python Developer.

A little about myself.

  • E-mail: recalesi@yahoo.com.br
  • Country: Brazil
  • City: São Paulo
  • A little more about me: I'm married and have two daughters.
  • Hobby: I play piano

“I am a tenacious and lifelong learner, fueled by enthusiasm to collaborate with the company and drive toward achieving exceptional results.”

Portfolio

Welcome to my portfolio — a collection of data-driven projects that demonstrate my ability to transform complex problems into actionable insights through Python, Power BI, and Microsoft Fabric.

These projects span Machine Learning modeling, regression analysis, time management optimization, clustering with PySpark, and advanced data storytelling — all built with libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, and skmultilearn. Each initiative reflects not only technical skill, but also a real-world focus and business context.

I’d like to thank Data Science Academy for their guidance and inspiration throughout my learning journey.

If you’re curious to explore dashboards, notebooks or code, you’ll find the links to my GitHub and project pages below. Enjoy the data-driven dive!

Individual Projects
Group Projects

Case Study: Amphibians Multi-Label Classifier

Introduction

This project aims to identify the presence of amphibians in the vicinity of two road construction sites in Poland. Our objective is to predict which species occur near these construction sites and to assess the environmental impact of the construction on these animals.

Problem Description

The dataset provided by the UC Irvine Machine Learning Repository includes information on buildings, reservoirs, urban or rural areas, reservoir surface characteristics, water reservoir types, the number of reservoirs, minimum distances from buildings and roads, reservoir maintenance status, and shore types.

One of the initial challenges was understanding the different types of amphibians in the dataset. As I lack expertise in amphibians, I first had to learn that the dataset covers not just frogs but several species of frogs, newts, and toads, and that each species prefers a specific type of environment.

Another challenge was comprehending the information in the dataset and understanding the meaning of each feature. Upon closer examination, it was surprising to find discrepancies between the expected values and the actual values provided. For instance, for the variable "TR," we expected values ranging from 'a' to 'j' or from 1 to 10. However, the dataset included different numeric categories such as ‘1’, ‘5’, ‘12’, ‘14’, ‘15’, ‘11’, ‘2’, and ‘7’ for the TR feature, with no accompanying information regarding their meanings. These numbers likely represent internal codes used by the authors of the dataset to denote water types, but without a clear key, it was challenging to interpret their significance. This lack of clarity hindered our ability to explore certain features in more detail. Such preliminary analysis is crucial before constructing a model.
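
This kind of discrepancy is easy to surface with a quick pandas check. Here is a minimal sketch; the file name, separator, and header handling follow my reading of the UCI download and are assumptions:

```python
import pandas as pd

# File name, separator, and header-row handling follow the UCI download
# and are assumptions, not part of the original write-up.
df = pd.read_csv("dataset.csv", sep=";", skiprows=1)

# The documentation suggests TR categories 'a' to 'j' (or 1 to 10), but
# the column actually holds a different set of numeric codes.
print(sorted(df["TR"].unique()))
```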

Methodology

In our feature engineering process, we performed numerical and categorical transformations and created new features and classifications to aid in the analysis of amphibian types. Before modeling, we visualized relationships and built charts to better understand the data. For the model itself, we used skmultilearn's Binary Relevance scheme with a Random Forest base classifier, which assumes that all class labels are independent of each other. We employed pandas for data manipulation, matplotlib and seaborn for visualization, and scikit-learn's MultiLabelBinarizer for preprocessing. Our workflow involved understanding the data, exploring variable categories, transforming variables, and optimizing memory usage; we also addressed missing values and unique-value considerations, converting categorical types to binary representations.
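
To make the setup concrete, here is a minimal sketch of Binary Relevance with a Random Forest base learner; the synthetic dataset is only a stand-in for the preprocessed amphibians features:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from skmultilearn.problem_transform import BinaryRelevance

# Synthetic stand-in for the preprocessed features (X) and the binarized
# multi-label target (y), e.g. as produced by MultiLabelBinarizer.
X, y = make_multilabel_classification(n_samples=200, n_features=15,
                                      n_classes=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Binary Relevance trains one Random Forest per label, under the
# assumption that the class labels are independent of each other.
model = BinaryRelevance(classifier=RandomForestClassifier(random_state=42),
                        require_dense=[True, True])
model.fit(X_train, y_train)
preds = model.predict(X_test)

print("Hamming Loss:", round(hamming_loss(y_test, preds), 2))
print("F1 Score (micro):", round(f1_score(y_test, preds, average="micro"), 2))
```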

Analysis and Results

The model achieved a total Hamming Loss of 0.25 and an F1 Score of 0.69. This result is satisfactory, especially when compared with the Label Powerset and Classifier Chain schemes, whose Hamming Losses were 29.76% and 27.38%, respectively. Our ROC curves indicate good coverage, with values close to 1. Thus, Random Forest with the Binary Relevance scheme demonstrated superior performance for our classification problem.

Lessons Learned

This project marked my first experience with multilabel classification. As a result, I now feel more comfortable tackling other multilabel classification problems. I am confident that this project has significantly contributed to my knowledge in this area.

You can see more details about this project in my article on Medium.

Project information


Case Study: Rental Bikes in Seoul (Linear Regression)

Introduction

Bike-sharing systems have become an essential part of urban transportation. Accurate predictions of bike rental demand can help optimize fleet management, improve user experience, and ensure a stable supply of rental bikes.

Our dataset comprises hourly records, including weather data (temperature, humidity, etc.) and the total number of bikes rented. Each observation represents a specific hour of the day.

Our goal is to build a predictive model that accurately estimates the bike rental count for each hour based on various factors. This project is not finished yet; what you will see on my GitHub page is a work in progress.

Data analyzed with Power BI

Data analyzed with Matplotlib and Seaborn

Glancing at each categorical feature's distribution

Looking for Outliers

The Machine Learning Models' Plots

Ranking

A little introduction to show you the Web Application in action



To achieve this, I used Linear Regression. The user inputs calendar, weather, and hour data, and the trained model processes this information to estimate the number of bikes available.
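
A minimal sketch of that flow, assuming illustrative feature columns and values rather than the project's exact ones:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative hourly records: hour of day, temperature (°C), humidity (%).
train = pd.DataFrame({
    "hour":        [7, 8, 9, 17, 18, 19],
    "temperature": [12.0, 14.5, 16.0, 21.0, 19.5, 18.0],
    "humidity":    [60, 55, 50, 40, 45, 48],
})
rented = [254, 690, 410, 980, 1200, 870]   # bikes rented in each hour

model = LinearRegression().fit(train, rented)

# The web app collects the same fields from the user and returns the
# model's estimate of the rental count for that hour.
user_input = pd.DataFrame({"hour": [8], "temperature": [15.0],
                           "humidity": [52]})
print(round(model.predict(user_input)[0]))
```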

Project information


A Data-Driven Journey to Mastering Time Management


How I Turned My Study Routine into a Personal Analytics Project Using Microsoft Fabric and Power BI

Background

During my postgraduate studies in Data Science, I found myself overwhelmed by complex subjects like mathematics and statistics. I realized that my biggest challenge wasn’t just the content—it was the way I was managing my time. I wasn’t studying efficiently, and I needed a smarter approach.

The Challenge

Like many learners, I struggled with distractions and an uneven study schedule. Traditional calendars and to-do lists weren’t cutting it. I wanted more than reminders—I wanted real insights into how I was spending my time and whether I was allocating it wisely across subjects.

The Solution

  • Applied Pomodoro and Timeboxing techniques to improve focus.
  • Retrieved calendar event data using Microsoft Graph API connected to my business account for secure access.
  • Processed the data using Python and PySpark in Microsoft Fabric Notebooks (see the sketch after this list).
  • Organized the data into a medallion architecture with Bronze, Silver, and Gold layers.
  • Cleaned and transformed data with Power Query and built semantic models.
  • Queried the information with SQL Analytics Endpoint and visualized everything in Power BI dashboards.
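
As referenced above, here is a minimal sketch of the Bronze-to-Silver step in a Fabric notebook; the Lakehouse path, field names, and table name are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: the raw Graph response uploaded to the Lakehouse Files area
# (path is an assumption); events sit in the payload's "value" array.
raw = spark.read.option("multiline", "true").json(
    "Files/bronze/calendar_events.json")
events = raw.select(F.explode("value").alias("e")).select("e.*")

# Silver: keep the fields needed for time analysis and fix the types
# (adjust the parsing if Graph's 7-digit fractional seconds yield nulls).
silver = events.select(
    F.col("subject"),
    F.to_timestamp("start.dateTime").alias("start_time"),
    F.to_timestamp("end.dateTime").alias("end_time"),
    F.col("categories"),
)
silver.write.mode("overwrite").format("delta").saveAsTable("silver_events")
```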

What I Gained

  • A clear understanding of how I spent my study time.
  • Insight into where I faced the most doubts or needed extra help.
  • A reusable framework for tracking productivity and learning progress.
  • Hands-on mastery of Microsoft Fabric and Power BI, demonstrated in a real-world use case.

The Outcome

By transforming my personal challenge into a professional project, I not only optimized my study routine but also created a powerful portfolio piece. It’s a practical example of how data can improve everyday life—and how technology can support continuous learning.

🔧 Techniques Used

  • Data extraction using Microsoft Graph API
  • Data manipulation and transformation with Python, PySpark, and Power Query
  • Layered architecture organization (Bronze, Silver, Gold)
  • Analytical dashboard creation with Power BI
  • Analysis with SQL and integration with Dataflow Gen2

🔐 Access to the TimeManagement Project

The TimeManagement Dashboard is a personal initiative focused on time management and study routine tracking, developed using Power BI, Microsoft Fabric, and Microsoft 365.

Because this project contains sensitive data and insights that reflect my personal journey, the repository remains private on GitHub. However, I’m open to sharing access with individuals who are genuinely interested in exploring the structure, visualizations, and solutions applied.

📩 If you’d like to view the project, feel free to reach out to me directly. I’ll be happy to generate a temporary access token with read-only permissions so you can explore the dashboard and related materials.

Access is granted selectively and securely, with the goal of preserving my privacy while encouraging meaningful exchanges with professionals, students, and enthusiasts in the field.

🎓 Video Series: Managing My Time with Microsoft Fabric & Power BI

This video series documents the technical journey I undertook to optimize my study schedule through data engineering and visualization tools. Each video highlights a unique stage of my process.


📘 Video 1: Retrieving Calendar Data with Microsoft Graph

I used Microsoft Graph and Graph Explorer to access calendar data from Outlook. Queries were refined using Postman, ensuring correct time zone settings. The results were saved as a JSON file and uploaded to a Lakehouse in Microsoft Fabric.
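
The refined query can be expressed in Python roughly as follows; the token, date range, and time zone are placeholders, not the exact values from the video:

```python
import requests

# Placeholders: supply a real OAuth access token and your own date range.
url = "https://graph.microsoft.com/v1.0/me/calendarView"
params = {"startDateTime": "2024-01-01T00:00:00",
          "endDateTime": "2024-01-31T23:59:59"}
headers = {"Authorization": "Bearer <ACCESS_TOKEN>",
           # Ask Graph to return event times in the local time zone.
           "Prefer": 'outlook.timezone="E. South America Standard Time"'}

resp = requests.get(url, params=params, headers=headers)
resp.raise_for_status()

# Save the raw payload as JSON for upload to the Fabric Lakehouse.
with open("calendar_events.json", "w", encoding="utf-8") as f:
    f.write(resp.text)
```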


📘 Video 2: Data Organization and Layering in Microsoft Fabric

I introduced my data workspace and structured raw calendar data using Bronze, Silver, and Gold layers. I developed a function to clean and transform data, applied basic feature engineering, and converted a DataFrame to JSON for use in PySpark.


📘 Video 3: Structuring Data with PySpark

Defined a schema and created a PySpark table with key event data. Performed transformations and validated the dataset—then saved the output to the Silver layer and prepared it for enhancement in the Gold layer.
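
As a minimal sketch, an explicit schema for the event table might look like this; the field names mirror the sketches above and are my assumptions, not the exact ones from the video:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (ArrayType, StringType, StructField,
                               StructType, TimestampType)

# Illustrative schema for cleaned calendar events.
event_schema = StructType([
    StructField("subject", StringType(), True),
    StructField("start_time", TimestampType(), True),
    StructField("end_time", TimestampType(), True),
    StructField("categories", ArrayType(StringType()), True),
])

spark = SparkSession.builder.getOrCreate()
# Enforcing the schema up front catches malformed records before they
# reach the Silver layer.
events = spark.createDataFrame([], schema=event_schema)
events.printSchema()
```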


📘 Video 4: Data Extraction and Category Analysis

Extracted large datasets using a PySpark notebook. Created new columns and performed category-based analysis to uncover insights on time usage, calendar events, and data patterns.
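
A category aggregation of this kind can be sketched in PySpark as follows; the table and column names are carried over from the earlier sketches and remain assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.read.table("silver_events")  # table name is an assumption

# Hours spent per calendar category, to show where the time actually goes.
time_by_category = (
    events
    .withColumn("category", F.explode("categories"))
    .withColumn("hours", (F.col("end_time").cast("long")
                          - F.col("start_time").cast("long")) / 3600)
    .groupBy("category")
    .agg(F.round(F.sum("hours"), 1).alias("total_hours"))
    .orderBy(F.desc("total_hours"))
)
time_by_category.show()
```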


📘 Video 5: Building Efficient Calendars for Study and Learning

Transitioned from a unified calendar to separate ones focused on studying and learning English. Uploaded JSON files, processed them using PySpark, and refined the setup with a clean, three-step method—later combining the tables in Power Query.


📘 Video 6: SQL Analysis and Power Query Deep Dive

Started with SQL analysis in Microsoft Fabric and moved into Power Query transformations. I selected relevant columns, generated new ones, and validated the transformation logic for calendar event data.


📘 Video 7: Advanced Transformations with Dataflow Gen2

Performed detailed transformations using Dataflow Gen2—changing column types, extracting strings, filtering rows, renaming fields, and building targeted datasets for Power BI dashboards.


📘 Video 8: Power BI + Microsoft Fabric Integration

Connected Power BI Desktop directly to my Lakehouse in Microsoft Fabric. Imported tables to begin analysis and created visualizations based on structured calendar and time usage data.


First page of my personal time management dashboard, built with Microsoft Fabric and Power BI. This project helps me track and optimize my study routines. It is still in progress, with Planner integration and more data updates coming soon!


Project information


Clustering Analysis

Introduction

This notebook explores different ways to group data into meaningful clusters, helping to identify patterns and structure within complex information. The goal is to find the most effective clustering model—one that balances accuracy, efficiency, and practical usability.

To achieve this, we tested several widely used approaches, including K-Means, Bisecting K-Means, and Gaussian Mixture models. By comparing these methods, we aimed to understand which one performs best in organizing data meaningfully.

Throughout the analysis, we evaluated key indicators that measure how well the clusters are formed, such as the Silhouette Score and Davies-Bouldin Index. Additionally, we assessed processing times to determine how efficiently PySpark handles clustering tasks.
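
To illustrate how such a comparison can be run, here is a minimal PySpark sketch that fits all three models and reports the silhouette and wall-clock time. Note that Spark's ClusteringEvaluator covers only the silhouette, so the Davies-Bouldin Index would be computed separately; the toy data and parameters are assumptions:

```python
import time
from pyspark.sql import SparkSession
from pyspark.ml.clustering import BisectingKMeans, GaussianMixture, KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Toy two-dimensional feature vectors standing in for the real dataset.
data = spark.createDataFrame(
    [(Vectors.dense(float(x % 10), float(x % 7)),) for x in range(200)],
    ["features"])

evaluator = ClusteringEvaluator()  # silhouette with squared Euclidean

for Algo in (KMeans, BisectingKMeans, GaussianMixture):
    start = time.time()
    preds = Algo(k=3, seed=42).fit(data).transform(data)
    print(f"{Algo.__name__}: silhouette={evaluator.evaluate(preds):.3f}, "
          f"seconds={time.time() - start:.1f}")
```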

All of this work was executed within Microsoft Fabric, a unified analytics platform that enabled scalable data processing and simplified orchestration of clustering experiments.

This notebook serves as a comprehensive overview of our findings, offering visual representations, comparisons, and insights into how these clustering techniques behave under different conditions. Ultimately, the goal is to provide a clear understanding of the strengths and trade-offs of each model, supporting informed decisions when applying clustering to future challenges.


📘 Introduction to Clustering


📘 Clustering with Bar Chart


📘 Clustering with Line Chart


📘 Clustering with Bar Chart Over Time

Project information


🚴 Real-Time Data Engineering with Microsoft Fabric – BikeTelemetry Project

In this project, I orchestrated a full-cycle real-time analytics pipeline for urban bike telemetry using Microsoft Fabric and KQL. The initiative involved designing and implementing a robust Eventhouse to receive the incoming data stream, refining ingestion strategies, and engineering transformations to ensure data quality and reliability.

🧩 Key Accomplishments

  • ✅ Event House Configuration: Initiated a Fabric Eventstream to capture telemetry events from the urban bike system in real time.
  • 🧠 Data Cleaning and Standardization: Engineered a transformation pipeline to fix inconsistencies in neighborhood naming using dynamic join logic.
  • 🏠 Lakehouse Deployment: Established a Lakehouse to consolidate cleaned records and serve as a durable analytical store.
  • ⏳ Temporal Gap Resolution: Investigated and resolved a hidden ingestion gap between manual append and policy activation, isolating and correcting a 366-record delta.
  • 📊 Validation & Reconciliation: Implemented checks to confirm full sync between raw and cleaned datasets (Bikes vs BikesCleaned) — with zero data loss post-fix.

This hands-on project showcases my proficiency in real-time data flows, temporal diagnostics, and query optimization using KQL — alongside practical problem-solving in stream ingestion and lakehouse architecture.

📊 Bicycle EventStream Dashboard

This dashboard was created using data from the Bicycling EventStream sample available in Microsoft Fabric. Although the dataset is simulated, it mirrors real-time behavior from actual bike-sharing systems—making it ideal for testing ingestion flows, building dashboards, and writing advanced KQL queries.

The dashboard is divided into two pages. In this first section, you'll find a breakdown of data records, including ingestion volume over time, distribution by neighborhood, and detailed record patterns. These visuals demonstrate how real-time data—whether simulated or live—can be transformed into actionable insights for operational analysis and decision-making.

📘 Bicycle EventStream Dashboard part 1

🚲 Citywide Dock Status Analysis

In this second chapter of our Bicycle EventStream Dashboard, we shift our focus from ingestion volume to actionable patterns of behavior. Powered by Microsoft Fabric and real-time streaming data, this section dives into how bike dock availability fluctuates across London — from quiet neighborhoods to mobility hotspots.

Using two dynamic visuals, we reveal the operational pressure points:

  • Visual 1: A citywide overview showing which dock stations are full or empty most frequently, offering a spatial lens on urban accessibility.
  • Visual 2: A detailed breakdown of bikepoints with the highest saturation and scarcity rates — highlighting the specific locations where intervention is most urgent.

These insights go beyond observation. They form the backbone for decisions in redistribution, station expansion, and fleet optimization — making the case for data as a frontline urban planning ally.

📘 Bicycle EventStream Dashboard part 2

Project information


Commodities Agrícolas Inteligentes (Smart Agricultural Commodities)

Skills

Technical proficiency across data science, programming, analytics and enterprise tools.

🔍 Programming & Data Science

Python 90%
SQL 75%
Machine Learning 60%
Exploratory Data Analysis & Forecasting 80%
Multivariate & Statistical Analysis 50%
Scikit-learn & ML Algorithms 60%
MongoDB 40%
C# with .NET 40%
R 30%

📊 BI, Cloud & Enterprise Tools

Power BI & DAX 70%
Looker Studio 40%
Microsoft Azure 60%
Microsoft Fabric 70%
Amazon SageMaker 40%
Databricks 60%
Docker & Kubernetes 40%
ETL & Feature Engineering 70%
Data Storytelling & Dashboarding 70%
SAP (FI / Localization) 80%
Git & Version Control 60%
Excel & Financial Modeling 80%

Education & Certifications

🎓 Academic Background

  • Postgraduate Diploma in Data Science (2024–2025) – DSA
  • Data Scientist 4.0 (2023–2025) – DSA
  • MBA in Governance in Tax Planning (2015–2017) – Infoco
  • Bachelor of Business Administration – Finance (2003–2007) – UMC

📜 Licenses & Certifications

  • SQL for Data Analysis and Data Science – DSA (2025)
  • Data Scientist 4.0 Course – DSA (2025)
  • AI Applied to Finance – DSA (2023)
  • Deploying ML Models – DSA (2023)
  • Programming and ML in C# and .NET Core – DSA (Aug 2022)
  • Cloud Computing for DS & AI – DSA (Jul 2022)
  • Advanced Power BI for Data Analysis – DSA (Jun 2021)
  • Cloud Storage and Processing with Microsoft Azure – DSA (Oct 2021)
  • MongoDB for Python – MongoDB University (2021)
  • Genetic Algorithms – Udemy (2021)
  • SAP FI Academy – Ka Solution (2011)

Experiences

Professional Experience

Python Developer & Data Analyst

Dec 2021 - May 2023

Xcelis, Remote

  • Agile methodologies
  • Working with clean code principles
  • Automated tests and CI/CD pipelines
  • Unit testing and integration testing
  • Building visualizations with Power BI, DAX, and M language
  • Troubleshooting
  • Working with a remote team

Python Developer

2020 - 2021

RThomaz Tecnologia de Dados MEI, São Paulo, SP

  • Worked as a freelancer in my own company.
  • Employed Python, pandas, and C# for data munging and dashboard creation.
  • Developed Power BI dashboards.
  • Employed Python, pandas, Matplotlib, and Seaborn to construct charts.
  • Applied Scikit-learn to construct Machine Learning Models.
  • Utilized standard database tools such as SQL in Microsoft SSMS.
  • Applied Genetic Algorithms.

Data Analyst

2020 - 2021

Project: Price of the Wheat Commodity Around the World, São Paulo, SP

  • Analyzed global time series data encompassing temperature, precipitation, and worldwide wheat production, exportation, and importation.
  • Leveraged a suite of data analysis and visualization tools, including Python libraries (pandas, numpy, matplotlib, chardet), Jupyter Notebook, Colab, and Power BI, to conduct in-depth data analysis and create impactful visualizations.

Recommendations

“Renata participated with me in an award-winning data science project in the Data Science Academy 2021 data science mentoring program. An excellent professional, dedicated, committed to results, and always seeking knowledge!”

Rafael Schema, M.Sc. – Data Scientist