How I Met My Data Science Mentor

Blossom Onunekwu
10 min read · Aug 5, 2020
Learning about KMeans clusters in a Jupyter notebook

This is part 2 of How a black woman with no CS degree or bootcamp is teaching herself data science. If you haven’t read it, don’t forget to check it out here.

In this blog post, I’ll disclose how I met my data science mentor who ALSO happens to be a black woman!

Remember the woman I met at the Women in Data meet-up in November? Well, her name was Susan. While I was at the meetup, Susan invited me and a few others to join another organization, ATLytiCS. The tagline for ATLytiCS is Using Data Science for Good. And the reason it’s capitalized weirdly is that it starts with ATLanta and ends with (C)ommunity (S)ervice. Many of our projects were health-based, meaning we’d be using data to help solve a public health issue in Atlanta. There was an opioid project, a homeless population project, and near the end of the year, ATLytiCS was hosting a hackathon. I’d never participated in a hackathon, but that’s not what attracted me to ATLytiCS. Using data to solve health problems was music to my Public Health bachelor’s ears. The hackathon and $10,000 prize money were just icing on the cake.

The hackathon was in late December/early January of this year, I believe. And it required us to work in teams and design a marketing campaign for Porsche. Using consumer data, teams were to design a campaign to help Porsche sell their latest electric vehicle to the Atlanta market.

I think it goes without saying that my team lost.

But I thought ATLytiCS was a great organization to join. So I made sure to do more than just show up. I signed up to be part of the Advocacy team because I talk too much and I love presenting.

And while I was basically just waiting for impactful things to do in the organization, I continued my daily routine of learning data science. At this point, I was still hashing out the lessons from the Udemy course I talked about in my last blog post.

My daily routine looked something like this:

6 AM-6:15 — Wake up

6:15–7:30ish — Data Science

7:30–9 AM — Do chores around the house, tidy up my living area, check Feedly

9 AM–10:15 AM — Gym

And then work on projects for my clients for the rest of the day until about 11 PM or midnight. Oh, and food and other breaks sprinkled in there too (while my name is Blossom, I haven’t quite learned the art of photosynthesis).

My schedule changed a bit once I picked up a part-time job. Staying at home for so long behind a computer was nice, but it was getting a little hectic at home and lonely. So I found a job at Staples mid-February. But I still made sure to wake up and get at least one hour of data science in.

Sometimes, however, I’d have to do a shift that starts at 8, which meant I had to either go to the gym earlier or not at all. And if I went earlier, that would mean no data science practice for the day.

Even though I wasn’t practicing data science as much because of work, I still did my best to remain vigilant about the activities going on with ATLytiCS. I volunteered to work on a project involving the opioid crisis in Atlanta, but because I was still at an introductory level in Python and data scraping, I wasn’t very confident about my capabilities and let other people do the heavier lifting. I was much better at preparing and giving presentations for ATLytiCS.

So I made sure to stick to the Advocacy team and join in on their biweekly Zoom calls. Honestly, in the beginning, it was rough because they had their meetings at 10 AM on Saturdays and that’s when I was asleep! I would set an alarm and join in to learn about the latest presentation we should prepare for. Near the end of the meetings, each of our team members would brief us on how they were doing.

I was slowly but surely relearning how I could fit data science back into my schedule. Everything was going okay, and then bam!

Pandemic.

The good thing about the pandemic is that my hours at Staples got cut drastically. But the bad thing is that it was becoming even more difficult to commit an hour a day to data science. I barely had the motivation to get up early in the morning at the butt crack of dawn. And don’t even get me started on working out.

Working out started to become more like a chore and I absolutely hated at-home workouts. So much so that I eventually just stopped working out.

And when I stopped working out, I’m pretty sure I stopped trying to learn data science daily as well. I believe it became more of a weekly thing to do, or maybe twice a week (I know, it sounds horrible). Even though we had so much time in the world to stay inside and do work, it just didn’t feel right to me. It was hard for me to stay disciplined when I was in the house almost all the time and in the same place doing the same things.

The world had stopped. Even in ATLytiCS, a few of our presentations had gotten either canceled or postponed because we were in the thick of a pandemic.

My mentor was right under my nose

Ok, Blossom, great exposition, but how in the world did you find your mentor?!

At the end of one of our advocacy meetings, when we went around the virtual circle to tell each other how we were doing, I was honest with them. I told them I was learning data science but was having trouble disciplining myself. I thought I was a master at self-discipline until this pandemic hit.

At that time, Melissa, another member of the advocacy team, shared that she works as an actual data scientist. And I’d known who Melissa was before the conversation, but it was just now clicking for me that I actually knew a black female data scientist. She was a part of my advocacy team! Right under my nose!

And it’s not like it was a secret that she was a data scientist, either. But I was expecting her job title to be “Data Scientist” and not “Analyst Architect”. I’m learning that people who study data science don’t have to have “Data Scientist” as the name of their role. I guess it’s similar to how you can study chemistry but you can apply to roles that are more than just “Chemist.”

Once Melissa learned about my passion for learning data science, she offered to help me along my lonely journey. She sent out a recurring, twice-weekly invite for us to talk about data science.

It was like a breath of fresh air. I was meeting with her twice a week and would show her what I was doing on Udemy.

I say it’s great to have another pair of eyes on you when you’re self-learning because you don’t know what you don’t know. There were many times when she gave me tips on being a better data scientist that my course didn’t cover. Like calling plt.show() at the end of a plotting cell in a Jupyter notebook to get rid of the mumbo jumbo (the text description of the plot object) that otherwise shows up above every visualization. And importing your libraries all at once in the beginning and not right when you need them. These are pointers I wouldn’t have gotten from that Udemy course alone.
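For anyone curious, here is a minimal sketch of those two tips in Python. The data is made up purely for illustration, and the "Cell" comments only mark where I would split things up in a notebook.

```python
# Cell 1: group every import at the top of the notebook.
import matplotlib.pyplot as plt
import numpy as np

# Cell 2: made-up data, purely for illustration.
x = np.linspace(0, 10, 50)
y = 2 * x + 1

# plt.plot returns a list of Line2D objects. If a call like this is the
# last line of a cell, Jupyter echoes that list as text above the chart.
lines = plt.plot(x, y)
plt.title("Illustrative line plot")

# Ending the cell with plt.show() draws the figure and returns None,
# so there is nothing left for Jupyter to echo.
plt.show()
```

Another common trick is ending the last plotting line with a semicolon, which also keeps Jupyter from echoing the return value.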

But even though she was helping me, she never hesitated to point out that sometimes she was learning new things from me, too. So if you’re wondering, Udemy courses aren’t awful. They’re still an affordable source of quality learning.

Building my own data science projects

While my Udemy course did come with capstones and projects of its own, Melissa still encouraged me to venture out and do my own projects using what I’ve learned.

One of the very first models I learned about was linear regression. So I used Kaggle to find a data set that would be appropriate for the model.

Let me tell you. Finding data sets is one of my least favorite things about data science!

It wasn’t too difficult to find one for linear regression, but once we started getting to other more complicated algorithms, it was becoming even more and more challenging to find a good data set. Sometimes it would take me 30 minutes to a whole hour just to find a data set.

I currently get all my data sets from Kaggle. If you all have other places you go to get data, please drop them below. Before, I would type in the name of the algorithm and then filter by dataset to find some data. So for this, I typed in “linear regression.” This way worked until I learned about more advanced models.

Once I found a data set, I would look and briefly skim to see how the uploaders used it for their model. Then I would download the data set and close the window. By the way, don’t do that. Always leave your tabs open. Even if you have 20+ tabs! When Melissa told me that, the RAM-conscious part of me started crying.

You see, in data science, there’s a lot of copying and pasting. And that’s okay. So just in case you have a question on the code you pasted and want to see how the originator used it, you can easily go back to the tab because you didn’t close it.

My mentor told me that once you’re done with the model, THEN you can close the tabs.

I would use what I learned from my Udemy teacher (Jose Portilla) to create my model, referencing, copying and pasting from my notes.

Most of the time, I’d learn that my notes weren’t enough. I was dealing with real data now, and not just the tidy practice data sets that ship with libraries like sklearn and seaborn (e.g., the iris and Titanic data sets).

I remember with my linear regression, my data revolved around cars at a dealership. Cars have names, condition status, colors, and other features that aren’t exactly numerical. I tried to feed it into my model once and received an error I never got while working with the datasets by Udemy. It’s because many machine learning models can only use numerical data. It was then that my mentor told me to google “feature engineering”. And from there, “label encoding”. Having a mentor really helps you know what it is you don’t know.
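Here is a rough sketch of what label encoding looks like in practice. This is not my actual project code: the tiny dealership table and its column names are made up for illustration, and I’m assuming pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical dealership data (invented for this example).
cars = pd.DataFrame({
    "color": ["red", "blue", "red", "black", "blue", "black"],
    "condition": ["new", "used", "used", "new", "new", "used"],
    "mileage": [10, 60000, 45000, 5, 20, 80000],
    "price": [30000, 12000, 15000, 32000, 31000, 9000],
})

# Fitting LinearRegression on the raw text columns would typically fail,
# because "red" or "used" cannot be converted to a number.
# Label encoding swaps each category for an integer code instead.
encoded = cars.copy()
for column in ["color", "condition"]:
    encoded[column] = LabelEncoder().fit_transform(encoded[column])

# Every feature is numeric now, so the model can be fit.
X = encoded[["color", "condition", "mileage"]]
y = encoded["price"]
model = LinearRegression().fit(X, y)
predictions = model.predict(X)
```

One caveat: label encoding hands out codes in sorted order, so the model treats the categories as if they were ranked. For features with no natural order, like color, one-hot encoding (for example with pd.get_dummies) is often the safer choice.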

(By the way, if you’re new to data science you might be confused about what I mean by model. I promise I’m not pushing out mini computerized versions of Tyra Banks. I actually cannot explain to you in English what a model is. I just know what it does. But I will break this down in the near future!)

It took a bit of time for me to finish that model, and to be honest, I don’t think it’s even perfect. But like I said in my last blog post, perfect is the enemy of done!

My mentor also encouraged me to push my projects to GitHub. When I’m ready to start applying for jobs, GitHub will serve as a portfolio.

I was wary about putting my work on GitHub, though, because I knew my first model wasn’t neat or descriptive with lots of notes. And I knew there were things I still should have included, but I didn’t. I didn’t want recruiters to see my work and see an unfinished project.

It wasn’t until I came across a job posting MONTHS later that I decided to just bite the bullet and post on GitHub. After all, Melissa told me earlier that I could always go back and edit a repository (basically a blog post) on GitHub.

You could even leave entire blocks of code followed by an error message and still post it to GitHub! And if recruiters see it, so what? They most likely will even be happy to see that you at least tried.

I honestly don’t know why that idea of publishing unfinished and unpolished work sounded so foreign to me. It’s merely showing what you have so far and editing to make it better. It’s what I do for my own blog posts.

You can check out my first linear regression model here [link].

And by the way, I got the interview! I think that’s enough to celebrate, don’t you think? The interview is next week, so if you follow me on LinkedIn, you’ll know if I got the job!

Lessons I’ve learned so far from my data science mentor

Having a mentor can be really helpful when you’re teaching yourself. Besides the technical fun stuff I’ve learned from her, here are a few more golden nuggets of truth Melissa gave me that I think can also be applicable to you:

You are competing against people with STEM degrees, people from India, China, people from boot camps, etc. So you have to be serious and do the best you can.

Nothing you’re doing is unique. As in, I shouldn’t try too hard to write code from scratch all the time because most times, someone has already done it on StackOverflow.

— StackOverflow is your friend.

If you don’t have time to code, at least be reading about data science and the different Python libraries. Not gonna lie, I haven’t been doing a great deal of this. However, I have been following a lot of nice women in data on Twitter. Does that count?

I suck at googling. Data science involves programming and a lot of programming is just googling.

And lastly, by being a minority (woman) within a minority (African American) in tech, people will look at me as a representation of all black women in data science. So when I’m given opportunities like a job or an internship, I have to put my game face on and stay ready so I don’t have to get ready.

Cheers to you all for making it to the end! I do plan on adding a video series on my YouTube channel, so I will link to that when done.

For now, I’m going to keep writing about my data science journey here. In my next blog posts, I will be translating common techy words like “algorithm” and “model” into English (wish me luck).

If you like what you’re reading, don’t forget to clap it up and share it. I know I’m traveling a path that might not be so well-lit, but I’m hoping this can shine a light for those of you who may also feel like you’re traveling alone.

Don’t forget to check out part 1: How a black woman with no CS degree or bootcamp is teaching herself data science

My twitter where I follow data babes and retweet job postings

Blossom Onunekwu

I'm a college and health blogger and freelance writer with a passion for food and sanity. Have a laugh or two with my witty, informative posts!