How to Find Unique Data Science Project Ideas That Make Your Portfolio Stand Out
If you're trying to build a career in Data Science, side-projects are a surefire way to demonstrate your skills and boost your chances of getting a job or promotion.
In 2023, however, it's no longer enough to knock out a Titanic survival predictor or MNIST digit classifier. Projects like these have been done a million times, and they add very little to your profile because there's no way for employers to verify that the work is your own. For all they know, you've just copied and pasted the code from some random Kaggle Grandmaster.
The best way to stand out from the crowd is through a set of unique, interesting projects that showcase your skills and interests. But, if you're anything like me, then you'll know that coming up with project ideas and finding unique data sources is hard.
In this article, I'll show you how I approach this problem. After a brief overview of what makes a good Data Science project, I'll explain my system for generating project ideas and illustrate it with plenty of examples from my own portfolio. My aim is to give you a repeatable way of generating ideas of your own, so you can build a unique portfolio and advance your career in Data Science.
What makes a good Data Science project?
Data Science projects can take many forms, but all the best ones have three characteristics:
- They have a narrow scope
- They're original (in some small way)
- They're relevant to real-world problems
Let's talk through these one by one.
Narrow scope
In my experience, a good Data Science project is narrow in scope: it focuses on solving a very specific problem (or part of a problem).
I know that might seem counterintuitive. In the real world, after all, a typical Data Science task will have many stages, from Problem Definition and Data Collection to Analysis and Visualisation.
When you're building a portfolio, however, each individual Data Science side project doesn't need to cover all of these. Don't get me wrong: across your portfolio as a whole you need to show that you understand each of these stages. Each individual project, however, needs to be very specific in its aims. It doesn't need to prove that you can do everything; it just needs to plug specific gaps that the rest of your portfolio and CV don't cover.
Originality
There's a reason so many Data Scientists caution you against relying too heavily on "classic" projects like Titanic Survival Prediction: they're boring, and it's hard to set yourself apart by doing them because so many others have already attempted them. The best projects, by contrast, reflect some of your personality and interests and give you space to show a bit of creativity in your approach.
I'm not saying it's a bad idea to use existing datasets, and I'm definitely not saying you have to reinvent the wheel. There will be plenty of times when a simple Linear Regression model with a well-known dataset is all that's needed. But – and here's the clincher – if you do plan on using a well-known dataset, you still need to show that you can think creatively and critically about how to use it. Don't just reproduce others' work with this dataset; try and identify a new approach you can take with the data and/or show how it links back to the industry you're targeting. I'll explain how I approach this very shortly.
Relevance
The third sign of a good Data Science project is that it's relevant to the industry or companies you're targeting. Not only does this signal that you're capable of solving real-world problems, but doing these kinds of projects also builds your understanding of the industry's challenges, giving you great fodder for interviews.
The reason relevance is so important is that, as techies, I think we have a tendency to get so absorbed in the details of our algorithms and tech stacks that we forget to make it clear to non-techies which problems we're solving. If we can't articulate that, we're unlikely to convince anyone to adopt our solutions, let alone drive any real value.
My personal approach to generating project ideas
How do I find projects that satisfy these three conditions? My process essentially boils down to four steps:
- Write a list of problems faced in the industry you're interested in
- For each problem, identify ways in which machine learning could be used to tackle part of the issue
- Find a dataset
- Find ONE way to put your unique spin on things
Let's talk through these in a bit more detail.
Write a list of problems faced in the industry you're interested in
All organisations are essentially problem-solving machines – their whole reason for existing is to solve people's problems.
Specifically, there are two types of problem that all organisations wrestle with: (a) big picture problems, and (b) operational problems.
An organisation's big picture problem is the reason for its existence. Take Netflix as an example. What's Netflix's big picture problem? Telling stories. The fundamental problems that Netflix are trying to tackle include deciding which stories to tell, how to tell them, and how to make sure as many people as possible are moved by them.
An organisation's operational problems, by contrast, tend to be far less specific to that organisation. Operational problems are the day-to-day problems a business faces as it tries to solve its big picture problem. Netflix, for example, faces operational problems like churn (people cancelling their subscriptions), marketing (how to attract new customers), recruitment, demand forecasting and measuring performance, plus thousands more. As you can see, operational problems aren't really specific to Netflix at all; in some form, every organisation faces them.

Because organisations are so obsessed with solving problems, the first thing I do when trying to come up with a good Data Science project idea is to make a list of all the problems faced by the organisations I'm interested in working for. How do I identify these? A good place to start is companies' vision statements and About pages, which are usually where they've tried to articulate, in their own words, the problems they're trying to solve.
Then, open up Google and search for things like "industry trends in {the-industry-im-interested-in, e.g. media, finance, basket weaving, etc.}". Chances are that you'll be able to find some fancy-pants report written by a consultant or industry analyst that neatly summarises the main problems for you. Thank you very much, Deloitte!
For each problem, identify ways in which machine learning could be used to tackle part of the issue
Now that you've got your list of problems, take some time to think about how machine learning could be used to tackle them. How do you do this? First, start by writing down a list of common machine learning tasks. Four of the most common, for example, are:
- Classification: categorising things into buckets
- Ranking: sorting options into an optimal order
- Regression: predicting continuous values
- Community detection: identifying hidden patterns or clusters in the data
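
If it helps to make the four tasks above more concrete, here's a minimal scikit-learn sketch of what each one might look like in code. The data is randomly generated purely for illustration, and scikit-learn has no dedicated ranking estimator, so ranking is shown in one common framing: score the options, then sort by the score.

```python
# A minimal sketch of the four task types in scikit-learn.
# All of the data below is randomly generated, purely for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# Classification: categorising things into buckets
y_class = rng.integers(0, 2, size=100)
clf = LogisticRegression().fit(X, y_class)

# Regression: predicting continuous values
y_cont = rng.normal(size=100)
reg = LinearRegression().fit(X, y_cont)

# Ranking: one common framing is to score each option, then sort by the score
scores = clf.predict_proba(X)[:, 1]
ranked = np.argsort(scores)[::-1]

# Community detection / clustering: finding hidden groups in the data
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
```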
Now, go back through your list of industry problems and try to work out how those machine learning approaches could apply. Your aim is to produce a simple grid that looks something like this (again, using Netflix as an example):
[Grid: Netflix's operational problems as rows, machine learning tasks as columns, with many of the cells still marked with a question mark]
As you can see from my grid, it's OK if there are lots of question marks! You're not expected to know it all at this stage.
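
If you'd rather rough the grid out in code than in a spreadsheet, here's a minimal pandas sketch of the same idea. The row labels and cell entries below are hypothetical placeholders, not an actual analysis of Netflix.

```python
# A rough sketch of a problems-vs-ML-tasks grid in pandas.
# The row labels and cell entries are hypothetical placeholders.
import pandas as pd

problems = ["Churn", "Marketing", "Recruitment", "Demand forecasting"]
tasks = ["Classification", "Ranking", "Regression", "Community detection"]

# Start with question marks everywhere, then fill in the combinations you can think of
grid = pd.DataFrame("?", index=problems, columns=tasks)
grid.loc["Churn", "Classification"] = "Predict which customers will cancel"
grid.loc["Demand forecasting", "Regression"] = "Predict hours viewed per title"

print(grid)
```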
If you're struggling to get off the ground and you're not sure which machine learning tasks could be relevant, you could start by thinking about what some relevant datasets could be, and then take a look at sites like Kaggle to see if others have used machine learning on any similar datasets. For example, if you identify churn as an operational problem your target company faces, go onto Kaggle, search for datasets related to churn, and see what others have done with them.
Another way to get ideas is to see whether your target company has a blog where they document their Data Science and Engineering work. Here, you'll be able to see examples of problems they have worked on and how machine learning is being used to tackle them. Of course, small companies might not have these kinds of blogs, but large ones often will. Some of my personal favourites are the blogs of Netflix, Tripadvisor, Duolingo, Meta and Spotify. By reading these, you'll quickly get a sense of how their Data Science teams are framing their companies' big picture and operational problems in terms of machine learning.
If both of those approaches are drawing blanks, try searching on Google Scholar for something like "machine learning {the-industry-im-interested-in, e.g. media, finance, basket weaving, etc.}". Chances are that you'll be able to find some examples of where people have tried applying machine learning in this space.
Find a dataset
You've now got a list of machine learning tasks you could try. Deciding which one you should develop into a project will depend on many things, especially the availability of datasets.
As I always say, there's a lot more to life than Data Science, and you shouldn't spend more time than necessary hunting for good datasets. For my projects, I almost always use existing public datasets, because the juiciest data tends to be tightly (and often rightly) guarded by big corporates and is hard to get hold of.
Where can you find good data? Kaggle is an obvious place to start, but if you're looking for something a bit more bespoke you might want to check out sources like BigQuery Public Datasets and Harvard Dataverse. I've also found a lot of success with looking at the GitHub pages of Data Science academics, which often include customised datasets which are freely available to use. That's how I came across the Fragile Families dataset and Hate Speech Data.
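
As an illustration of how low-friction some of these sources can be, here's a minimal sketch of pulling a BigQuery public dataset into pandas. It assumes you have a Google Cloud project configured and the google-cloud-bigquery client installed; the natality table is just one illustrative public dataset.

```python
# A minimal sketch of querying a BigQuery public dataset into pandas.
# Assumes a Google Cloud project is configured and that
# `pip install google-cloud-bigquery db-dtypes` has been run.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT year, AVG(weight_pounds) AS avg_birth_weight
    FROM `bigquery-public-data.samples.natality`
    GROUP BY year
    ORDER BY year
"""

df = client.query(query).to_dataframe()
print(df.head())
```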
Put your unique spin on the project
Once you've decided on a task and dataset, you need to think carefully about how to put your unique spin on the project. The crucial thing to recognise here is that you don't have to reinvent the wheel. It's perfectly OK to take inspiration from past work, using pre-existing datasets and tackling the problems using common algorithms like Linear Regression and Random Forests. Your goal is not to make the project original in every single way; your goal is to find ONE meaningful way in which you can put your unique spin on things.
To put this in really practical terms, that means picking one of the following ways to differentiate your project from the others that are out there:
- Modelling approach – Use a new/underused type of model, for example a fine-tuned model from HuggingFace
- Feature engineering – Find a new way to extract features out of the dataset
- Visualisation – Ditch the matplotlib or seaborn default charts and create some really compelling or interactive visuals
- Code organisation – Rather than just organising your code as a continuous stream of consciousness, refactor it into proper pipelines
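
To give a flavour of the first option, here's a minimal sketch of loading a pre-trained text classification model through the HuggingFace transformers library. The checkpoint named below is just a widely available public sentiment model used for illustration; for a real project you'd pick, or fine-tune, something suited to your task.

```python
# A minimal sketch of the "new/underused model" route: a pre-trained
# HuggingFace model loaded via the transformers library. The checkpoint
# below is an illustrative public sentiment model, not a recommendation.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("This show was an absolute masterpiece."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```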
For example, when I built a hate speech classifier for my portfolio, I didn't use a particularly innovative data source or visualisation/engineering approach, but I was still able to differentiate the project by using a cutting-edge NLP model from HuggingFace. When I completed the Fragile Families Challenge project, on the other hand, my modelling approach was incredibly simple (Linear Regression and Random Forests); I differentiated my work by using NLP techniques to extract more nuanced features and by using sklearn.pipeline.Pipeline to neatly organise and package up my code. On another occasion, when completing the most clichéd project possible (at the time) – an analysis of mobility patterns during Covid-19 lockdowns – I differentiated by producing customised visualisations through which I tried to show my flair for communicating data insights.
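
To illustrate that last point about code organisation, here's a generic sketch of the kind of structure sklearn.pipeline.Pipeline gives you: text feature extraction chained to a Random Forest. It shows the general pattern rather than the exact pipeline from the project above, and the inputs are hypothetical.

```python
# A generic sketch of organising feature extraction and modelling with
# sklearn.pipeline.Pipeline. This is the general pattern, not the exact
# pipeline from the project described above; `texts` and `y` are hypothetical.
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

model = Pipeline([
    ("features", TfidfVectorizer(max_features=500)),     # turn free text into numeric features
    ("forest", RandomForestRegressor(n_estimators=200)),
])

# Hypothetical usage, where `texts` is a list of free-text responses
# and `y` is a continuous outcome to predict:
# model.fit(texts, y)
# predictions = model.predict(texts)
```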

Conclusion
The point I'm making is that "good" Data Science projects don't need to be ground-breaking OR back-breaking. They just need to be narrow, relevant and have some element of originality/creativity.
If any of these ideas seem a little out of your comfort zone, don't worry. Data Science projects are as much about your own learning and development as they are about signalling your competency to employers, so don't be afraid to take on challenges which seem (ahem) challenging.
Of course, Data Science projects alone will not be enough to get you a job or promotion. For that, you'll need to take a broader approach and focus on networking, learning and validating your skills. If you'd like to read about how I've approached this, check out some of my recent articles:
11 Practical Things That Helped Me Land My First Data Science Job
If you'd like to get unlimited access to all of my stories (and the rest of Medium.com), you can sign up via my referral link for $5 per month. It adds no extra cost to you vs. signing up via the general signup page, and helps to support my writing as I get a small commission. If you can't afford this (I'd completely understand!), it would mean a lot if you followed me. Thanks for reading!