In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions for planning an AI project that succeeds beyond its initial stages. Reasons commonly given for AI project failures include not having a plan ahead of time, not getting executive or business leadership buy-in, and failing to find the proper team to execute the project. Chasing a hot technology trend without a proper strategy often leads companies down the path of failure.
Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally—and AI technologies are not just being applied within robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences are seeing business improvements through insights generated by AI and ML.
#9 How does my infrastructure look on day 3 vs. day 300?
AI projects are constantly changing and evolving. The algorithms or software could change, as could the computing infrastructure: a model might start out running on company-owned servers and later move to a public cloud or a hybrid platform. If a company has aligned its AI data strategy with the organization’s overall compute strategy (see question #4), this is not much of a problem.
“For example, today a company might be running on premises, with one or two data scientists running from their laptops with an external GPU,” says Ben David. “I know that if everything works out in a year, then I’ll have 20 data scientists, and then I’ll need a heavier infrastructure. You want to plan for that. Again, the notion is that if you know it on day one, two and three, etc., then you can plan ahead for it.”
As data volumes scale and the models become more complex, the need for more robust compute grows with them. Otherwise, having 20x the volume of data means that your models will take 20x longer, reducing productivity and agility. Compute needs pipes that can saturate it, so make sure that you can expand your pipes (i.e., your network) accordingly.
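The arithmetic behind this point can be sketched in a few lines. This is a back-of-the-envelope illustration, not a sizing tool: it assumes run time scales linearly with data volume and inversely with compute capacity, and the function names and sample figures (10 hours, 8 GPUs, 2 GB/s per GPU) are hypothetical.

```python
def scaled_run_hours(base_hours, data_growth, compute_growth=1.0):
    """Estimate run time after data and compute growth.

    Assumes run time scales linearly with data volume and inversely
    with compute capacity (a simplification, not a measured law).
    """
    if compute_growth <= 0:
        raise ValueError("compute_growth must be positive")
    return base_hours * data_growth / compute_growth


def required_network_gbps(num_gpus, gbytes_per_s_per_gpu):
    """Aggregate bandwidth (gigabits per second) needed to keep every GPU fed."""
    return num_gpus * gbytes_per_s_per_gpu * 8  # 8 bits per byte


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(scaled_run_hours(10, data_growth=20))                     # compute unchanged
    print(scaled_run_hours(10, data_growth=20, compute_growth=20))  # compute grows too
    print(required_network_gbps(num_gpus=8, gbytes_per_s_per_gpu=2.0))
```

Under these assumptions, growing compute in step with data keeps run time flat, while the network requirement grows with the number of accelerators being fed.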
One frequent and expensive mistake companies make is not planning for significant data growth over the course of the project. Amassing 20x more data means a significant increase in storage costs and additional delays, often because more data is stored in cold tiers and has to be moved back and forth to hot/fast tiers. Those reads and writes are time consuming. Some companies tier some data in the cloud for economies of scale and flexible capacity, which introduces management overhead in the form of multiple namespaces and different operational models.
Newer file systems, such as WekaFS, manage the different tiers under a single namespace with throughput that is comparable to local storage. Using a modern file system can dramatically reduce the cost and management burden, helping you keep productivity high as data increases. File systems of this kind are designed from the ground up to support exabytes of data and AI and ML workloads.
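To make the cost and delay of tiering concrete, here is a minimal sketch of the trade-off. All prices, capacities, and link speeds are hypothetical placeholders, the function names are invented for illustration, and the model ignores egress fees, request charges, and protocol overhead.

```python
def monthly_storage_cost(total_tb, hot_fraction, hot_price_per_tb, cold_price_per_tb):
    """Monthly cost when a fraction of the data sits on the hot tier and the rest is cold."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * hot_price_per_tb + cold_tb * cold_price_per_tb


def recall_hours(data_tb, link_gbps):
    """Hours needed to move data from the cold tier back to the hot tier.

    Uses decimal units (1 TB = 8,000 gigabits) and assumes the link
    is fully utilized, which real transfers rarely achieve.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return data_tb * 8000 / link_gbps / 3600


if __name__ == "__main__":
    # Hypothetical early stage: 50 TB, all on the hot tier.
    print(monthly_storage_cost(50, 1.0, hot_price_per_tb=100, cold_price_per_tb=20))
    # Hypothetical later stage: 20x the data, 10% kept hot.
    print(monthly_storage_cost(1000, 0.1, hot_price_per_tb=100, cold_price_per_tb=20))
    # Time to bring 100 TB back from cold storage over a 10 Gbps link.
    print(round(recall_hours(100, link_gbps=10), 1))
```

Even with most data pushed to a cheaper tier, the later-stage bill in this sketch is several times the early one, and each large recall costs hours of waiting, which is the overhead the paragraphs above describe.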
#10 How do we future-proof the project?
Ben David says he sees many companies kick off AI projects with high hopes for success but without taking a holistic view of the entire project, so they run into trouble down the line when it comes to growth. “We see projects that are starting with some environments that are adequate for one to five data scientists, but then the environment expands and suddenly they need additional infrastructure,” he says. “More often than not, you see customers trying to extend their existing infrastructure instead of re-architecting it.”
For example, a data scientist might start work on a single laptop, and then additional data scientists are brought in, and suddenly the team needs to work on a network-attached storage appliance. Alternatively, a project might start in the cloud, but then the team suddenly has 10 to 50 data scientists contributing to the project, so business leaders determine that it is more cost-effective to buy on-premises equipment for the computing, network, and storage environment. Having a strategy for managing that growth and scaling the project can help future-proof a company’s AI project.
Conclusion
It is possible for many AI projects to succeed without having all of the answers or without following the strategies that were laid out here. Nevertheless, long-term success requires an AI team that is willing to be flexible on infrastructure changes, willing to fine-tune its model, and forward-thinking enough to have a plan to move and store data safely and efficiently. With these plans in place, your chances for success will go beyond the 15% to 50% success rates that many of today’s AI projects experience.
Over the past few weeks we explored WEKA’s new insideHPC Guide:
- Introduction, #1 Have we clearly defined a goal and identified the right questions to get us there?, #2 What data is required to achieve your goal or solve your problem?
- #3 Where will I get my data if I don’t have it already?, #4 What is our organizational compute strategy: on-premises, cloud, or hybrid?, #5 What is our plan to move and store the data?
- #6 How will we remove bias and validate our model’s results?, #7 How often will we fine-tune the models?, #8 How do we deploy a new model?
- #9 How does my infrastructure look on day 3 vs. day 300?, #10 How do we future-proof the project?, Conclusion
Download the complete “10 Questions to Ask When Starting With AI” guide, courtesy of WEKA.