Become An Effective Machine Learning Team Lead
Managing communication, infrastructure and documentation
While some of us were lucky enough to work only with great team leads, most of us have had both great and terrible experiences. And although terrible leadership can make team members’ lives miserable, bitter experiences can also forge great team leads out of team members, teaching them which behaviours to avoid.
Technical management of software engineering projects is well established, with multiple tools and techniques, such as Agile, at a team lead’s disposal. Machine learning projects, by contrast, are hard to fit into these paradigms: accurately predicting timelines, task outcomes and task feasibility is challenging.
Navigating projects with high uncertainty at every step requires skills and knowledge that machine learning team leads can only gain through experience. In this article, I summarise the lessons I learned as a team member under good and bad management, and through leading machine learning projects myself.
While the article focuses on specific techniques for managing machine learning projects, certain generic aspects of project management are particularly important, and I will stress them as well.
First, I discuss why managing machine learning projects can be challenging. I then cover ways to overcome these challenges: through improved communication, better-managed infrastructure and documentation.
While software engineering and machine learning projects have much in common, leading machine learning projects can be different due to several reasons:
- Machine learning projects often carry higher risks. Unlike software projects, where the task is to build a system from deterministic blocks, the feasibility of the problems tackled by machine learning projects is often unknown beforehand: will we have enough data, will the data be of reasonable quality, will the model be large enough to capture the relationships, will we have enough compute power?
- It is harder to set deadlines, milestones and plans. The field is still establishing itself, and since there are no out-of-the-box solutions to every problem, machine learning models often need to be built from scratch. Hence, machine learning projects end up being research-oriented, and predicting timeframes for them is difficult, as many decisions can only be made once a preceding phase has been completed.
- Lack of performance metrics makes tracking and evaluating performance challenging. Some problems have a clear goal: when evaluating a spam filter, for example, we can objectively tell that it filters out 99% of spam messages. For more subjective problems, it is harder to define performance metrics. Consider a beautification filter: how can we measure whether it performs well without a user study? And obtaining sufficient amounts of reliable data from users is often costly, slow and challenging, especially under tight GDPR requirements.
- Multidisciplinary projects are very common. Team members are likely to have very narrow specialisations, so for the project to progress, the communication flow within the team needs to be excellent.
Many of the difficulties discussed above can be overcome by focusing on the aspects that matter for machine learning projects: communication, infrastructure and documentation.
It is not customary for software engineering teams to prepare presentations for meetings. It is, nevertheless, much more common in machine learning projects, as many things are easier to communicate via a plot: loss curves, data distributions, artefacts in the generated results, and more.
Visualising helps build communication skills, lets everyone take ownership of their work, facilitates information sharing and encourages feedback from the rest of the team, as it is much easier to point to the slide when asking for clarification.
Machine learning projects are often cross-disciplinary, and keeping everyone engaged is difficult when the material is not delivered well. To hold attention, it is important to start with a high-level picture and always explain things in simple terms.
Practical Notes
- Start and finish on time. Factor in a few minutes at the beginning for people to join and have a chat to tune in for the meeting. Small talk at the start won’t take much time away from technical discussion but adds to team morale.
- Give people time to gather their thoughts, unmute themselves on Zoom and speak up. After asking if there are any questions, count to ten in your head.
- Be consistent and lead by example. If you take something as a rule, practice it. Asked the team to present? Set the standard.
- If you are not sure if the point has gotten through, ask the team member to talk through the task and identify potential challenges.
- Not getting any feedback? Perhaps the question is not formulated well. Instead of asking “Are there any questions or issues with the current approach?”, be more specific and let people own the solution: rephrase it as “What are better ways to load the data?”.
- Praise in public, criticise in private — an all-time classic. Want to encourage a behavioural pattern? Reward it in a group meeting. For example, many people are uncomfortable asking questions, but raising questions and highlighting issues is important. Did someone raise an issue in a group chat during the week? A group meeting is a good time to thank them. A great book on how to give feedback is “Radical Candor” by Kim Scott.
- Don’t try to do both – engineering and managing. If you are leading the project, then lead and manage. Delegate. With how specialised each machine learning team member is, micromanaging would not only infuriate the team but would also damage the results.
- Team members need to know why what they are doing is important. Let the team participate in the process of task formation and prioritisation, most of the team members are domain experts. It will be easier to delegate if the decision on how to proceed is coming from the team — your role then comes down to moderation.
- Honesty helps build trust between you and team members. If you don’t understand something, be honest about it. If you see that the task is not very interesting but necessary, don’t try to sell it, but explain things as they are. Be honest and this will be appreciated.
- Everyone has different priorities, expectations and fears. Learning what a person wants from work helps the team lead set them the right tasks.
- Unless you know the person, be careful about assuming what they want. Dropping “It will be good for your CV” when assigning a task that goes against a team member’s goals can easily be perceived as cheap manipulation.
Machine learning is an area in which being systematic is an unavoidable necessity. The quality of the decisions depends on how data, code and results are organised.
However, being systematic is also very challenging due to the complex relations between every aspect of the final model — data, architecture and hyper-parameters. Furthermore, each experiment generates tremendous amounts of data, and since projects are research-oriented, each module is bound to change frequently.
Whenever giving feedback or guiding the team in building the code or preparing the data, ask yourself: will it be easy for someone new joining the team to get their head around what is happening? If the answer is no, you need to re-think the process.
Infrastructure
Interestingly, an excellent source of project-management wisdom is warfare literature, for example “The Art of War” by Sun Tzu. One of the lessons the book shares is that the fate of a battle is decided before it starts: the side that prepares best wins.
One of the most important aspects of combat is supply lines. Infrastructure is like supply lines — for the project to evolve, infrastructure shouldn’t have sudden breaks or blocks and should be easy to use and expand.
The best practices in managing data and infrastructure that I have come across so far are:
- Have config files for training, and log them. This helps reproducibility, lets you go back and see how a solution was obtained, and reduces errors when setting up experiments.
- Have config files for the training data. This can be as simple as a list of input-output pairs for the model. It ensures that you know exactly what data the model is trained on and simplifies the dataloader.
- Use sensible default values for the arguments — people running the code will expect it to work out of the box without having to dig too deep. If you need to tweak a few hyper-parameters, a bash script with the command to execute is handy, as it is easier to modify than text in the terminal.
- Where resources allow, parts of deployment that can be automated must be automated. Automation saves the team time by eliminating repetition and prevents errors.
- How the training data is organised tremendously affects data loading, training speed, flexibility and the convenience of analysing results. Make sure that the data is separated by type rather than dumped into a single folder — this makes it easier to visually explore existing data and add new data.
- Training and testing pipelines must be as similar as possible in how the data is pre-processed and post-processed and how the model is loaded.
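The config-file and sensible-defaults advice above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed setup; the argument names and paths are made up for the example:

```python
import argparse
import json
from pathlib import Path

def parse_args(argv=None):
    # Sensible defaults mean the script runs out of the box.
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--data-config", default="data/train_pairs.json")
    parser.add_argument("--run-dir", default="runs/latest")
    return parser.parse_args(argv)

def log_config(args):
    # Persist the exact configuration next to the run's outputs so the
    # experiment can be reproduced (or audited) later.
    run_dir = Path(args.run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    config_path = run_dir / "config.json"
    config_path.write_text(json.dumps(vars(args), indent=2))
    return config_path
```

A bash script wrapping the call (`python train.py --lr 1e-3 --run-dir runs/exp42`) then becomes the easy-to-edit record of how each experiment was launched.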
Model tracking
With multiple team members working on different parts of the same model, we need to be able to compare all the results. A local TensorBoard for each team member would be highly inefficient, while piling all the experiments into the same folder would be very cumbersome.
There are powerful alternatives to TensorBoard with excellent toolkits for analysing model performance, such as wandb (which can be free for small teams) and aimstack (completely free at the time of writing).
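To illustrate the idea of a single shared tracking store without depending on a specific tool, here is a minimal stand-in. In practice wandb or aimstack provide this plus dashboards and comparison UIs; the class below is only a sketch of the shape such logging takes, with made-up file names:

```python
import json
import time
from pathlib import Path

class ExperimentLogger:
    """Minimal stand-in for a shared tracker such as wandb or aimstack:
    every team member's runs land in one store, so results stay comparable."""

    def __init__(self, store_dir, run_name, config=None):
        self.run_dir = Path(store_dir) / run_name
        self.run_dir.mkdir(parents=True, exist_ok=True)
        # Keep the run's configuration next to its metrics.
        (self.run_dir / "config.json").write_text(json.dumps(config or {}))
        self._metrics = self.run_dir / "metrics.jsonl"

    def log(self, step, **metrics):
        # Append one timestamped row per logged step.
        row = {"step": step, "time": time.time(), **metrics}
        with self._metrics.open("a") as f:
            f.write(json.dumps(row) + "\n")

    def history(self):
        # Read the run back for plotting or comparison.
        with self._metrics.open() as f:
            return [json.loads(line) for line in f]
```

Pointing `store_dir` at a shared network path already gives the team one place to compare runs; the dedicated tools add the visualisation layer on top of exactly this kind of record.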
Model comparison
Machine learning projects generate lots of models that need to be compared. However, running tests in an isolated development environment to validate performance isn’t enough. To surface potential issues, testing and comparison must be performed as part of the full pipeline, on the dedicated inference device, as soon as possible.
Have a dedicated test data set that best reflects the final product scenario, with as many corner cases as possible, to help identify problems with the final model. For example, for a human-detection computer vision problem, the set should include a diverse range of people with various body shapes, skin tones and complexions.
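One way to make such a corner-case test set actionable is to report the metric per slice rather than only in aggregate, so a model that is weak on one group cannot hide behind the average. A minimal sketch, where the field names are illustrative:

```python
from collections import defaultdict

def per_slice_recall(examples, predictions):
    """Recall per corner-case slice (e.g. by body shape or skin tone).

    `examples` is a list of dicts with "slice" and "label" keys;
    `predictions` is a parallel list of 0/1 predictions.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for example, pred in zip(examples, predictions):
        if example["label"] == 1:
            positives[example["slice"]] += 1
            hits[example["slice"]] += int(pred == 1)
    # Recall within each slice: true positives / all positives in the slice.
    return {s: hits[s] / positives[s] for s in positives}
```

A per-slice table like this is also a natural thing to attach to the weekly presentation, since a single number rarely tells the whole story.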
When comparing results, always have a benchmark — it is a good idea to start the project with one. Always have a clear metric reflecting the model properties required for production. That can be either a mathematical metric or product pilot results backed by user data.
Keep track of how the model was built and which aspects were crucial for its performance to improve. Document it — for example, a ranking table with links to demos, code and comments is a good starting point.
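Such a ranking table can be generated straight from the tracked results rather than maintained by hand. A hedged sketch, with made-up record fields, that renders a markdown table sorted best-first by the chosen metric:

```python
def ranking_table(runs, metric):
    """Render a markdown ranking table from run records, best first.

    Each run is a dict with "name", "commit", "demo" and metric values;
    the field names here are illustrative, not a fixed schema.
    """
    ranked = sorted(runs, key=lambda r: r[metric], reverse=True)
    lines = [f"| rank | run | {metric} | commit | demo |",
             "| --- | --- | --- | --- | --- |"]
    for i, run in enumerate(ranked, start=1):
        # Short commit hash keeps the table readable while staying traceable.
        lines.append(
            f"| {i} | {run['name']} | {run[metric]:.3f} "
            f"| {run['commit'][:7]} | {run['demo']} |")
    return "\n".join(lines)
```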
Code management
When working on multiple experiments, a branch explosion problem is inevitable. I have seen projects with 500+ GitHub branches, and there is obviously no way they were all tracked and well-documented.
Encourage merging branches, or archiving them when they are no longer needed — otherwise it is easy to get lost. Encourage small commits and PRs; large ones take ages to review and eventually become outdated.
Use frameworks that help simplify and abstract parts of the code away, such as pytorch-lightning, which brings structure to the training code and removes the parts related to setting up the environment.
Another challenge is associating experiments with the code. Which code were the models trained on? Is it easy to trace? There is a solution: either keep the project well documented or use an automated logger such as MLflow.
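A lightweight way to tie an experiment to the code it ran on is to record the git commit hash with every run. The sketch below is a best-effort illustration of that idea; MLflow, for instance, records the source commit as a run tag automatically when launched from a checkout:

```python
import subprocess

def current_commit():
    """Best-effort git commit hash for the working tree, or None when
    the code is not running inside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def run_metadata(config):
    # Store the commit alongside the config so every experiment can be
    # traced back to the exact code it was trained with.
    return {"config": config, "commit": current_commit()}
```

Saving this metadata next to the model checkpoint answers the "which code was this trained on?" question without archaeology through branches.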
Documentation provides reproducibility, accountability, knowledge sharing, quality and performance control, and a structured way of looking at the problem. Despite being very important, it is often overlooked.
Rule of thumb: if a question arises more than twice, the answer needs to be documented. Similarly, any important process that is done only once — for example, data preparation or environment setup — must also be documented.
There are several important documents that can significantly improve the quality of communication between the team members and the quality of the solution: design document, roadmap and weekly notes.
Design doc
I had a lot of misunderstandings with my PhD supervisor when he insisted on starting to write an article at a very early stage of the project. Later in my career, I discovered this was a very useful practice.
Writing down the skeleton of the project slows you down, but it gives you more time to think over every part of the solution and helps define constraints and metrics. The key is to remove ambiguity in the outcomes of the project.
The design doc explains the ins and outs of the project and reports the results. It evolves through every stage of the project: starting as a plan, it becomes an explanation of how the final result was achieved.
In machine learning projects it is common to have failed experiments, and while these can be irrelevant for the final solution, they also serve as a source of knowledge and should have a dedicated section.
Don’t forget to include the data: which datasets have been used and how they were cleaned. Since it is customary to reuse data from project to project, a detailed data section will help future projects.
As the doc is likely to be skimmed by someone at a later stage, or by someone with a non-technical background, always start with the motivation and the big picture of the project in simple terms. As in group meetings, visualise processes — for example, a diagram of the proposed pipeline is an important part of the design doc.
Roadmap
A roadmap is a step-by-step plan of the project with the key milestones and the steps that need to be taken to achieve the goal. A good roadmap contains a goal, its description, the timeframe, the task owner, the main findings and links to the results.
Setting a timeframe for machine learning tasks can be difficult, as outcomes might not be clear — for example, setting a roadmap task with an expectation of 98% recall is not realistic, as the experiment might or might not work out. Instead, tasks should be split into small components, with a block of code or a dataset as the deliverable.
Weekly notes
For machine learning projects visualisation is very important, and hence the ideal format is a presentation with as much visual information as possible and links to the code behind the described results. These are typically stored in a shared space, where each team member can look up references without nudging the person responsible.
For you as a team lead, it is useful to have a summary of each person’s weekly results before the meeting, as it makes it easier to ask targeted questions during their presentation.
Practical notes:
- Since quite a few documents need to be maintained, it is not unreasonable to keep a single doc with links to the main documents.
- Need to onboard new members? Have an onboarding doc, but make sure that you have done all the steps from scratch yourself so as not to omit anything.
- Encourage team members to document the tasks they have performed, with links to code and results — this will help them during performance reviews, when they need to recall what was done throughout the year.
- Machine learning projects are guided by research. Most of the industrial projects I have been part of relied on pre-trained models or on re-implementing models from papers. To effectively track state-of-the-art developments and communicate existing work to the team, a good practice is to keep a shared doc with the links.
Since the team leader takes responsibility for the project delivery, one of the essential qualities of a leader is long-term thinking about the success of the team and every individual in particular.
Think about the project as a way to help the team as a whole, and each team member individually, to grow. Give autonomy to each team member and encourage ownership, proactively reward the right behaviours and best practices, and visualise, visualise and once more visualise, setting the example for the rest of the team. An excellent list of qualities to encourage in your team is summarised by Amazon.
Liked the author? Stay connected!
Have I missed anything? Do not hesitate to leave a note, comment or message me directly on LinkedIn or Twitter!