Launching an ML Project: A Quick Guide for Founders and Managers

In this article, I aim to shed light on the complexities and nuances inherent in machine learning (ML) projects and highlight key considerations for product founders.

Treat this article as a checklist for reducing the risk of errors when managing an ML project. It should be most useful to product owners and to management team members such as team leads, technical leads, and project managers.

Key Considerations Before Starting Your Project

Before embarking on any ML project, it's crucial to address the following questions:

Research Objectives and Business Goals

The outcome of an ML project, which relies on experimentation, is largely dependent on the precise definition of the business problem. It's essential to clearly articulate the project's objectives and the criteria for its completion.

The Necessity of ML for the Task at Hand

One of the first steps is to determine whether the problem can be solved without ML, either through manual efforts or traditional coding. This may require consulting with one or two ML domain experts, but it won't take much time.

If you're looking to launch a new product feature but lack sufficient product metrics, you're essentially working with a product hypothesis. In such cases, adopting a lean startup approach to test this hypothesis more cost-effectively, for instance, through manual emulation of an ML model, can be wise. This method helps validate the product's feasibility.

Economic Justification of the Project

ML development typically demands significant financial investment due to its reliance on iterative experimentation and hypothesis testing. While experiments often provide insights into solving the problem, not all hypotheses get validated. Predicting the outcome of each training iteration of the model can also be challenging.

I'm skeptical of clients who cannot articulate the economic benefits of ML development. Imagine having an established business process involving your employees, which you wish to replace with an ML model. This approach is justified if you've compared the costs of employee salaries against the expenses of implementing the ML project and found the latter to be economically more beneficial. For instance, the advantage of an ML system for quality control in a small window installation company might not be immediately apparent.
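The comparison described above can be sketched as simple break-even arithmetic. This is a minimal illustration, not a costing model: the function name, its parameters, and all figures are hypothetical, and it assumes the manual and ML running costs stay constant each month.

```python
def breakeven_months(monthly_manual_cost, ml_build_cost, monthly_ml_running_cost):
    """Months until a one-off ML investment is repaid by monthly savings.

    Returns None when running the model costs at least as much as the
    manual process, i.e. the project never pays for itself.
    """
    monthly_saving = monthly_manual_cost - monthly_ml_running_cost
    if monthly_saving <= 0:
        return None
    return ml_build_cost / monthly_saving


# Hypothetical figures: salaries of 12,000 per month, a 90,000 build cost,
# and 3,000 per month to host and maintain the model.
print(breakeven_months(12_000, 90_000, 3_000))  # 10.0
```

If the result is None, or the payback period is longer than the expected lifetime of the process, the economic case for the project is weak.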

We're keen to tackle any challenge presented to us, but our priority is to assist companies in creating products that foster their growth and development. Therefore, we always highlight risks to ensure our clients are fully aware of what they're committing to. A technical and economic justification serves as a basic source of metrics and target values for evaluating the quality of the models and algorithms developed.

Developing ML Algorithms

Machine learning (ML) technologies have seamlessly integrated into our lives, gaining popularity and becoming accessible to almost everyone. In the business realm, numerous ready-to-use ML models can tackle various tasks after a few training iterations on a dataset.

However, when a project aims to address a unique problem, it becomes research and development. The team has to explore different models, as well as algorithms for preprocessing (preparing the inputs to the model) and postprocessing (transforming the model's output for further use in the system). ML projects typically adopt an iterative approach: each iteration tests hypotheses and ends with conclusions and a plan for future work. Before a project starts, it is nearly impossible to predict how long it will take to adapt models and launch them into production, so costs can only be estimated roughly.

Data Requirements

Every ML model is trained on data, which could include images, text, audio, and video materials. Gathering a dataset of sufficient size is a crucial first step for any ML project; as a rule, the larger the dataset, the better. For example, the well-known Segment Anything Model (SAM) was trained on a dataset of 11 million images and over 1 billion masks.

The dataset requirements should be clarified before the project starts: a dataset of a specific size must be provided by a certain date. We always establish this requirement with our clients at the start of ML projects. Otherwise, data collection may drag on while the development team is unable to proceed efficiently. It may also turn out that the dataset needs to be larger than initially thought. As the client, you should be prepared for this risk, even though it is hard to estimate in advance.

Your Expectations for the Final Outcome

Aligning the expectations of the client and the team from the outset is crucial for maintaining focus on achieving target metrics. It's essential to discuss and agree on the following:

  • The desired accuracy of the algorithm, its rationale, and how it's calculated. Ensuring that both you and the team understand the calculation methods is vital, as well as outlining any assumptions, limitations, and risks that could affect the expected accuracy. For example, in one of our case studies, we'll show how accuracy might vary with weather conditions, time of day, etc.

  • The operational environment for the ML model: whether it needs to run in offline or online mode, on specific hardware, with certain data processing speed requirements, etc. The environment can significantly impact the methods used to achieve results. For instance, the aforementioned SAM requires over 30 GB of RAM for stable operation, making it unsuitable for offline use on mobile devices.

  • Whether the project has access to the computational resources needed for training the ML model. Different models and training frequencies may require different computational resources. If frequent model training is necessary and the process is time-consuming, consider renting a powerful virtual machine (VM) on popular platforms or purchasing dedicated hardware. The cost of equipment can be substantial. For instance, a single NVIDIA A100 GPU typically costs on the order of $10,000 to $20,000, and a multi-GPU server built around it can cost $100,000 to $200,000 or more, depending on the specifications. Therefore, it's crucial to assess the need for such hardware at the project's outset.
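To make the first point concrete, here is one way the team and the client could agree on how accuracy is calculated and reported per operating condition. It is a sketch under assumptions: the record layout (condition, predicted label, true label) and the sample data are invented for illustration, and plain accuracy may not be the right metric for every task.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Share of correct predictions for each condition.

    records: iterable of (condition, predicted_label, true_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, predicted, actual in records:
        total[condition] += 1
        if predicted == actual:
            correct[condition] += 1
    return {condition: correct[condition] / total[condition] for condition in total}


# Invented evaluation records, split by time of day.
records = [
    ("day", "car", "car"),
    ("day", "truck", "truck"),
    ("day", "car", "car"),
    ("day", "car", "truck"),
    ("night", "car", "truck"),
    ("night", "car", "car"),
]
print(accuracy_by_condition(records))  # {'day': 0.75, 'night': 0.5}
```

Reporting the metric per condition, rather than as a single number, makes it visible early when a target is met in one setting but not in another.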

When Evaluating an ML Project: Key Considerations

From here on, the material is aimed mainly at managers working on ML projects, though product owners may find the development insights useful as well.

Once these preliminary discussions with the client are settled, you can proceed to evaluate the ML project. It's crucial to consider:

  • Dataset Size and Labeling Responsibility: Clarify whether labeling can be automated and who will validate the labeled dataset.

  • Review of Existing Solutions: Typically conducted during the initial phase or pre-project research to understand how similar challenges were addressed.

  • Model Training Iterations and Expected Outcomes: Estimate the number of iterations and anticipated improvements.

  • Pre- and Post-Processing Needs: If data processing is required before or after the model's application, include this in your project estimate.

  • Algorithm Accuracy Verification Methods: Ensure there's a plan for measuring the accuracy of your algorithms.

  • Additional considerations should include:

    • Model "grounding" and optimization for deployment on specific hardware.

    • If the model needs to run in an environment not supporting Python, allocate time for code translation and adaptation to different programming languages. ML models are generally developed in Python due to its extensive array of ready-to-use tools for such tasks.

    • Optimizing the algorithm's execution speed.

    • Further training of the model with real-world data as the software is used.

Kickstarting Your ML Project

The initial stages should focus on reducing uncertainty by verifying:

  • That the problem has a known solution.

  • Which models are appropriate for addressing it.

  • Whether the chosen model is compatible with the required operating environment.

Addressing these high-uncertainty questions early on is crucial.

Structuring the ML Project Process Within Your Team

Adhering to four fundamental aspects is essential:

Iterations and Client Demos

ML project workflows are inherently iterative, so plan for regular updates and a client demo at the end of each cycle.

Task Prioritization and Decomposition

While clients can view tasks in broad strokes, teams need tasks broken down into segments of no more than 16 hours for effective progress monitoring. Identifying tasks that can run in parallel and prioritizing them is key.

Eliminating Work Blockers

Ensure all obstacles are removed for your team, such as providing high-performance computing resources for experiments. Pre-emptively simulating ML model results for the development team can prevent blockers, allowing them to proceed without waiting for the final ML outcomes.
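Simulating model results can be as simple as a stub that exposes the interface the real model is expected to have. The sketch below is hypothetical throughout: the class name, the `predict` method, and the shape of the returned dictionary are assumptions to be replaced by whatever contract your team agrees on.

```python
import random


class StubDefectDetector:
    """Stand-in for a model that is still being trained.

    It ignores its input and returns plausible-looking results, so the rest
    of the system can be built and tested against the agreed interface.
    """

    LABELS = ("ok", "defect")

    def __init__(self, seed=0):
        # A fixed seed keeps the fake results reproducible across test runs.
        self._rng = random.Random(seed)

    def predict(self, image_bytes):
        return {
            "label": self._rng.choice(self.LABELS),
            "confidence": round(self._rng.uniform(0.5, 1.0), 2),
        }


detector = StubDefectDetector()
print(detector.predict(b"raw image data"))
```

Once the real model is ready, it replaces the stub behind the same interface, and the surrounding code does not need to change.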

Checkpoints

Given the nature of ML tasks, teams might focus excessively on metric improvements at the expense of time. Setting clear checkpoints prevents aimless exploration.
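A checkpoint can be written down as an explicit stopping rule. The following is one possible sketch, not a standard method: the function, its thresholds, and the sample numbers are hypothetical, and it assumes a metric where higher is better.

```python
def should_continue(metric_history, hours_spent, hour_budget, min_gain=0.01, patience=2):
    """Decide at a checkpoint whether another iteration is justified.

    metric_history: metric value after each completed iteration.
    Stops when the time budget is used up, or when the metric has improved
    by less than min_gain over the last `patience` iterations.
    """
    if hours_spent >= hour_budget:
        return False
    if len(metric_history) <= patience:
        return True  # too few iterations to judge the trend
    recent_gain = metric_history[-1] - metric_history[-1 - patience]
    return recent_gain >= min_gain


print(should_continue([0.70, 0.78, 0.81], hours_spent=40, hour_budget=120))          # True
print(should_continue([0.80, 0.81, 0.812, 0.813], hours_spent=60, hour_budget=120))  # False
```

The exact thresholds matter less than agreeing on them before the work starts, so that the decision to stop is not made ad hoc.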

Managing Customer Expectations and Communicating Results

First and foremost, divide the work into short iterations to frequently showcase progress to the client. Regularly solicit their feedback. This simple rule ensures both the team leader and the client stay informed and engaged.

Secondly, visualize the results. Clients "eat with their eyes," and ML metrics are often not translated into a business-friendly language. Thus, it's the team's task to either implement a metric understandable to the client or explain how ML metrics are derived from specific parameters and coefficients.

For instance, in tasks such as object detection or segmentation in images, a visual demonstration of results is a winning strategy.
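The other option mentioned above, a metric the client can read directly, might look like the sketch below. It restates recall and precision as plain sentences; the function name, the defect-detection framing, and the counts are hypothetical, and the code assumes at least one real defect and at least one alert, since it would otherwise divide by zero.

```python
def business_summary(true_positives, false_positives, false_negatives):
    """Turn detection counts into a sentence a non-expert can act on."""
    # Recall: share of real defects that the model found.
    recall = true_positives / (true_positives + false_negatives)
    # Precision: share of the model's alerts that were real defects.
    precision = true_positives / (true_positives + false_positives)
    return (
        f"The model catches {recall:.0%} of real defects; "
        f"{precision:.0%} of its alerts are real defects."
    )


# Invented counts: 90 defects found, 30 false alarms, 10 defects missed.
print(business_summary(90, 30, 10))
# The model catches 90% of real defects; 75% of its alerts are real defects.
```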

Thirdly, avoid setting unrealistic expectations for the client. Make it clear that working on an ML algorithm doesn't always guarantee the desired outcome. Some experiments may not progress to production, yet they are crucial for the team to understand the right dependencies and make the best choice.

For example, conducting several experiments simultaneously allows for their results to be compared, and the most effective one to proceed to production. However, even an experiment with less effective outcomes is considered successful as it guided the team towards a more suitable solution.

Instead of a Conclusion

If asked about the essentials for a successful ML project implementation, I would certainly highlight the following points:

  1. Clearly define the business problem that needs solving with ML technologies.

  2. Ensure that using ML is justified and effective for your specific issue.

  3. Engage both the team and the product owner in understanding the processes and nuances of ML development.

  4. Identify and document any risks that might arise during the project.

  5. Establish data and information requirements, ensuring the client provides what's needed for the project.

  6. Clearly communicate the desired outcome, ensuring both the team and client share the same vision.

  7. Verify the team has all necessary resources to complete the task.

  8. Reduce uncertainty at every project stage.

  9. Conduct frequent cycles of results demonstration and feedback collection.

  10. Set checkpoints throughout the ML task resolution process.

  11. Visualize results in a way that's understandable to non-experts.

This is a foundational list that can be optimized and expanded based on your project's specifics.

Need help with challenges that require machine learning algorithms?

Feel free to book a free call with our CTO, or leave your contact details on our website. We'll advise you, answer all your questions, and if you like what you hear, we'll deliver the best solution for any challenge.
