
Image by author
If you know how to build a machine learning decision tree, congratulations: you have the same level of coding expertise as ChatGPT and the thousands of other data scientists competing for the job you want.
A fascinating trend among hiring managers lately is that raw coding ability is no longer enough. To get hired, you need to go a step beyond knowing languages, frameworks, and how to search StackOverflow. You need a lot more conceptual understanding and an understanding of the current data science landscape, including things you think should only concern the CEO of a company, such as data governance and ethics.
There are many technical and non-technical data science skills you should know, but if you're struggling to get a job, these less common data science skills could be your ticket to getting a foot in the door.
Previously, data scientists worked in isolation, in dark underground basements, producing models. The models would create predictions or insights; those would be passed on to senior management executives, who would act accordingly without understanding the model that had produced these predictions. (I'm exaggerating a little, but not that much.)
Today, leadership takes a much more active role in understanding data scientists' products. That means that you, as a data scientist, must be able to explain why the models do what they do, how they work, and why they came up with that particular prediction.
While you could show your boss the actual code that runs your model, it's much more useful (read: employable) to be able to show them how your model works through visualization. For example, imagine you've developed a machine learning model that predicts customer churn for a telecommunications company. Instead of a screenshot of your lines of code, you could use a flowchart or decision tree diagram to visually explain how the model segments customers and identifies those at risk of churn. This makes the model logic transparent and easier to understand.
Knowing how to illustrate code is a rare skill, but it's certainly worth developing. There are no courses on it yet, but I recommend trying a free tool like Miro to create a flowchart documenting your decision tree. Better yet, try explaining your code to a friend or family member who is not a data scientist. The less technical your audience, the better.
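As a rough illustration of the flowchart idea, here is a minimal sketch that turns a small decision tree into Mermaid flowchart text, which many diagram tools and Markdown renderers can draw. The churn rules, the `CHURN_TREE` structure, and the `tree_to_mermaid` helper are all hypothetical, made up for this example; a real tree would come from your trained model.

```python
from itertools import count

# Hypothetical, hand-written churn rules for illustration only.
# In practice these splits would be read from a trained model.
CHURN_TREE = {
    "question": "Contract is month-to-month?",
    "yes": {
        "question": "More than 3 support calls in 90 days?",
        "yes": {"leaf": "High churn risk"},
        "no": {"leaf": "Medium churn risk"},
    },
    "no": {"leaf": "Low churn risk"},
}


def tree_to_mermaid(tree):
    """Return Mermaid flowchart text for a nested-dict decision tree."""
    lines = ["flowchart TD"]
    ids = count()  # generates n0, n1, n2, ... in visiting order

    def visit(node):
        node_id = f"n{next(ids)}"
        if "leaf" in node:
            # Leaves (final predictions) are drawn as rounded boxes.
            lines.append(f'    {node_id}("{node["leaf"]}")')
            return node_id
        # Decision nodes (questions) are drawn as diamonds.
        lines.append(f'    {node_id}{{"{node["question"]}"}}')
        for answer in ("yes", "no"):
            child_id = visit(node[answer])
            lines.append(f"    {node_id} -->|{answer}| {child_id}")
        return node_id

    visit(tree)
    return "\n".join(lines)


if __name__ == "__main__":
    print(tree_to_mermaid(CHURN_TREE))
```

Pasting the printed text into any Mermaid-compatible editor should produce a diagram a non-technical stakeholder can follow from top to bottom.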
Image by author
Many data scientists tend to focus more on the model algorithms than the nuances of the input data. Feature engineering is the process of selecting, modifying, and creating features (input variables) to improve the performance of machine learning models.
For example, if you are working on a predictive real estate price model, you can start with basic features such as square footage, number of bedrooms, and location. Through feature engineering, however, you can create more nuanced features. You can calculate the distance to the nearest public transportation station or create a feature that represents the age of the property. You could even combine existing features to create new ones, such as a “location desirability score” based on crime rates, school ratings, and proximity to amenities.
It is a rare skill because it requires not only technical knowledge, but also deep domain knowledge and creativity. You really need to understand your data and the problem at hand, and then creatively transform the data to make it more useful for modeling.
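The real estate features described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`year_built`, `crime_rate`, `school_rating`, `amenity_score`), the 0-to-1 scaling of the score inputs, and the weights are all invented for the example, not taken from any real dataset.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def engineer_features(listing, stations, today=None):
    """Derive new model inputs from a raw listing (hypothetical schema)."""
    today = today or date.today()

    # Distance to the closest station, given (lat, lon) pairs.
    nearest_km = min(
        haversine_km(listing["lat"], listing["lon"], s_lat, s_lon)
        for s_lat, s_lon in stations
    )

    # Assumed: each input is already scaled to 0..1, and a higher
    # crime_rate is worse. The weights are arbitrary placeholders.
    desirability = (
        0.4 * (1 - listing["crime_rate"])
        + 0.4 * listing["school_rating"]
        + 0.2 * listing["amenity_score"]
    )

    return {
        "property_age_years": today.year - listing["year_built"],
        "km_to_nearest_station": nearest_km,
        "location_desirability": desirability,
    }
```

In a real project the weights would not be guessed; you would test whether each engineered feature actually improves validation performance and drop the ones that do not.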
Feature engineering is often covered as part of larger machine learning courses on platforms like Coursera, edX, or Udacity. But I think the best way to learn is through practical experience. Work with real-world data and experiment with different feature engineering strategies.
Here's a hypothetical question: Imagine you're a data scientist at a healthcare company. You have been tasked with developing a predictive model to identify patients at risk of suffering from a certain disease. What will probably be your biggest challenge?
If you answered “dealing with ETL pipelines,” you are wrong. Your biggest challenge is likely to be ensuring that your model is not only effective but also compliant, ethical and sustainable. That includes ensuring that all data you collect for the model complies with regulations like HIPAA and GDPR, depending on your location. You need to know when it is legal to use that data, how you should anonymize it, what consent you require from patients, and how to obtain that consent.
And you must be able to document data sources, transformations, and model decisions so that a non-expert can audit the model. This traceability is vital not only for regulatory compliance but also for future audits and model improvements.
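To make the documentation point concrete, here is a minimal sketch of two habits: replacing direct identifiers with a keyed hash, and logging each transformation so it can be audited later. The record fields, the `hypothetical_ehr_export` source name, and the helper functions are assumptions made for illustration. Pseudonymizing identifiers this way is unlikely, by itself, to satisfy HIPAA or GDPR, so treat it as a starting point to discuss with your compliance team, not as a recipe.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(
        secret_key, patient_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def log_step(lineage, step, source, description):
    """Append one human-readable entry to the lineage log."""
    lineage.append({
        "step": step,
        "source": source,
        "description": description,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })


def prepare_records(records, secret_key, lineage):
    """Drop direct identifiers and record what was done, and why."""
    cleaned = []
    for rec in records:
        # Keep every field except the direct identifiers.
        row = {k: v for k, v in rec.items()
               if k not in ("name", "patient_id")}
        row["patient_key"] = pseudonymize(rec["patient_id"], secret_key)
        cleaned.append(row)
    log_step(
        lineage,
        step="pseudonymize",
        source="hypothetical_ehr_export",  # placeholder source name
        description="Dropped name; replaced patient_id with HMAC-SHA256 key.",
    )
    return cleaned
```

The lineage list is deliberately plain: a non-expert auditor should be able to read it top to bottom and see where the data came from and what happened to it.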
Where to learn about data governance: It's dense, but a great resource is the Global Data Management Community.
Picture of datasedo
“I know data science is basically knowing statistics, creating models, and finding trends, but if you ask me, I couldn't think of any real ethical dilemmas. I think data science just reveals the real facts,” says Reddit user Carlos_tec17, mistakenly.
Beyond legal compliance, there is an ethical aspect to consider. You need to ensure that any model you create does not inadvertently introduce biases that could lead to unequal treatment of certain groups.
I love the example of Amazon's old hiring model to illustrate why ethics are important. If you're not familiar with this, Amazon's data scientists attempted to speed up their hiring workflow by creating a model that could screen potential hires based on their resumes. The problem was that they trained the model based on their existing resume base, which was very male-dominated. Their new model was biased toward male hiring. That is extremely unethical.
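One simple way to start checking a screening model for this kind of bias is to compare how often each group is selected. The sketch below is an assumption-laden illustration, not a description of Amazon's system or a full fairness audit: the group labels and decisions are invented, and the ratio it computes is only one of many possible fairness measures.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Share of candidates selected per group.

    decisions: iterable of (group, was_selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest.

    Returns 1.0 when nobody was selected in any group, since there
    is then no difference between groups to measure.
    """
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest


if __name__ == "__main__":
    # Invented model outputs, for illustration only.
    decisions = (
        [("men", True)] * 6 + [("men", False)] * 4
        + [("women", True)] * 3 + [("women", False)] * 7
    )
    rates = selection_rates(decisions)
    print(rates, disparate_impact_ratio(rates))
```

A ratio well below 1 means one group is selected far less often than another. A commonly cited rule of thumb from US employment guidelines (the “four-fifths rule”) treats a ratio under 0.8 as a warning sign worth investigating, though passing it does not prove a model is fair.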
We are already past the “move fast and break things” stage of data science. Now, as a data scientist, you should know that your decisions will have a real impact on people. Ignorance is no longer an excuse; you need to be fully aware of all the possible ramifications your model could have and why you make the decisions you do.
UMichigan has a useful course on the “ethics of data science.” I also liked this book, which illustrates why and how ethics emerge even in “numbers-based” sciences like data science.
A secret trick is that the better you know how to market, the easier it will be for you to get a job. And by “market” I mean “knowing how to make things attractive.” With the ability to market, you will be better able to craft a resume that sells your skills. You'll be better at charming an interviewer. And specifically in data science, you'll be able to better explain why your model (and the results of your model) matter.
Remember, it doesn't matter how good your model is if you can't convince anyone that it's necessary. For example, imagine that you have developed a model that can predict equipment failures in a manufacturing plant. In theory, your model could save the company millions in unplanned downtime. But if you can't communicate that fact to senior management, your model will languish unused on your computer.
With marketing skills, you can demonstrate the use and need for your model with a compelling presentation that highlights the financial benefits, potential for increased productivity, and long-term advantages of adopting your model.
This is a very rare skill in the world of data science because most data scientists are numbers people at heart. Most aspiring data scientists truly believe that simply doing your best and keeping your head down is a winning career strategy. Unfortunately, it's not computers that hire you, but people. Being able to market yourself, your skills, and your products is a real advantage in today's job market.
To learn how to market, I recommend some free beginner courses, such as “Marketing in a Digital World,” offered on Coursera. I especially liked the section on “Offering product ideas that hold up in a digital world.” There are no specific data science marketing courses, but I liked this blog post, which explains how to market yourself as a data scientist.
It's tough out there. Although the Bureau of Labor Statistics projects growth in data scientist employment, many entry-level data science applicants are finding it difficult to get a job, as these Reddit posts illustrate. There is competition from ChatGPT, and the layoff vultures are circling.
To compete and stand out in the job market, you have to go beyond technical skills. Data governance, ethics, model visualization, feature engineering, and marketing skills make you a more thoughtful, strong, and intriguing candidate for hiring managers.
Nate Rosidi is a data scientist who also works in product strategy. He is also an adjunct professor teaching analytics and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.