Artificial intelligence offers companies new opportunities to optimize their business processes and save resources. However, the success of an AI workflow does not only depend on powerful algorithms and GPUs. Efficient storage solutions are crucial.
Uwe Kemmer is Director EMEAI Field Engineering at Western Digital Corporation.
Artificial intelligence has ushered in a new era of technological achievements, from the impressive performance of intelligent language models to generative AI's ability to create images from text. Companies also benefit from these new opportunities: with technical know-how, they are able to develop and train their own AI models. There are hardly any limitations on which business processes can be analyzed and optimized by AI.
When it comes to AI hardware, the focus is often on GPUs and their enormous computing power. GPUs alone, however, cannot carry an AI workflow. The reason is the data involved. Whether for analysis, training, or rapid decision-making, AI requires and generates huge amounts of information at every step. Storage systems—at the edge, on-premises, or in the cloud—serve as the infrastructure to collect, store, and manage these datasets.
Choosing the right storage technology is often decisive for using AI efficiently and ultimately achieving the desired results. It is all the more important to know where and how data is used throughout an AI workflow. Fortunately, despite its apparent complexity, the process can be broken down into four fundamental steps: data collection, model creation, training, and deployment.
The four phases of the AI workflow
1. Data Collection: The starting point for any AI workflow is the collection of large amounts of data. During collection, raw data is generated from sources such as sensors, cameras, and databases. The storage solutions used in this phase must efficiently capture and organize structured and unstructured formats such as images, texts, and videos. This is fundamental to the entire AI process. Typically, raw data lands on a local storage platform but can also be gradually uploaded to the cloud for analysis.
In some cases, physical data transport devices—such as an external hard drive or a rugged edge server—are required to move large amounts of information to the data center. This method is usually employed when a network upload would take too long or cost too much. Rugged edge solutions can also ensure seamless data collection in extreme environments such as a desert or the open ocean, where no internet connection is available.
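The trade-off between bulk physical transport and network upload can be sketched in a few lines of Python. The shipping time and the pure bandwidth comparison below are illustrative assumptions; a real decision would also weigh cost, security, and logistics:

```python
def choose_transfer(dataset_tb: float, uplink_mbps: float,
                    shipping_days: float = 2.0) -> str:
    """Pick bulk physical transport vs. incremental network upload.

    Compares the time to push the dataset over the uplink against an
    assumed shipping time for a physical transport device.
    """
    bits = dataset_tb * 8e12                      # terabytes -> bits
    upload_days = bits / (uplink_mbps * 1e6) / 86_400
    return "physical transport" if upload_days > shipping_days else "network upload"
```

For example, 100 TB over a 100 Mbit/s uplink would occupy the line for roughly three months, so shipping a drive wins; a single terabyte over a gigabit link uploads in a few hours.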
2. Model Creation: With a clearly defined problem in mind, AI experts in this phase work through various processing steps to refine the algorithms and extract the desired insights from the data. Model creation and the training phase are the most compute-intensive processes in the AI workflow. Choosing the right storage media is particularly important here and is not necessarily limited to fast all-flash arrays. Hard disk drives (HDDs) play a crucial role in storing large datasets and snapshots for future training. Machine learning algorithms repeatedly process these datasets to optimize the model. While HDDs provide cost-effective mass storage, flash delivers the speed that lets training and model development proceed without delay.
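As an illustration of such a hybrid HDD/flash setup, a simple placement policy could keep recently accessed training shards on flash and migrate cold snapshots to HDD capacity storage. The one-week window below is a hypothetical threshold, not a vendor recommendation:

```python
import time

FLASH_WINDOW_S = 7 * 86_400  # shards read within the last week stay on flash

def assign_tier(last_access_ts, now=None):
    """Hybrid tiering sketch: recently accessed data goes to fast
    flash, older snapshots to cost-effective HDD capacity storage."""
    if now is None:
        now = time.time()
    return "flash" if now - last_access_ts < FLASH_WINDOW_S else "hdd"
```

Real tiering software would also consider file size, access frequency, and pending training jobs, but the principle is the same: flash for the hot working set, HDDs for the long tail.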
3. Training: During training, the previously refined model is applied to and tested against a comprehensive dataset. Training times vary greatly: even the most popular large language models have taken up to a year, while other models may need anywhere from hours to months. The duration always depends on the problem and the dataset used. Essentially, every AI model is trained in iterative loops, with the model refined before each run. The required GPU performance is immense, and the resulting data must be accessible for the next round of training. At first glance, a pure flash setup seems ideal for the training phase. In practice, however, an AI model should always be able to draw on the largest possible data pool so that insights from past iterations continue to feed into the training. As with model creation, a hybrid approach of HDDs and flash is therefore optimal.
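The iterative loop described above can be made concrete with a toy example: a gradient-descent fit of a one-parameter linear model that snapshots a checkpoint after every epoch. In a real workflow these checkpoints would land on capacity storage (HDDs) while the active working set stays on flash; the numbers here are purely illustrative:

```python
def train(xs, ys, epochs=100, lr=0.05):
    """Toy iterative training loop: fit y = w*x by gradient descent,
    keeping a checkpoint of the model after every epoch."""
    w = 0.0
    checkpoints = []
    for _ in range(epochs):
        # mean-squared-error gradient with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        checkpoints.append(w)  # snapshot after each iteration
    return w, checkpoints
```

Even this miniature example shows why the data pool grows with every run: one hundred epochs already produce one hundred checkpoints that later iterations may want to revisit.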
4. Deployment: Once the training process is completed and the algorithm is finalized, it can be deployed. The most common method is to use the cloud and enable use via web-based services. For example, companies can utilize an algorithm across multiple locations or offer it as a service. In combination with edge locations, this can also be supplemented with real-time data analysis. Of course, deployment on a smaller scale is just as feasible. In the case of SMEs, the algorithm can reside on the local server and be accessible throughout the corporate network.
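A minimal version of such a web-based deployment can be sketched with Python's standard library alone. The fixed weights and the endpoint shape are hypothetical stand-ins for a finished model, not a production serving stack:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for the finished model: a fixed linear scorer.
    The weights are hypothetical 'trained' parameters."""
    weights = [0.4, 0.6]
    return sum(w * f for w, f in zip(weights, features))

class InferenceHandler(BaseHTTPRequestHandler):
    """Minimal JSON endpoint so the model can be reached from
    anywhere on the corporate network."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        score = predict(json.loads(body)["features"])
        payload = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

For an SME, a handful of lines like these running on a local server may already cover the "accessible throughout the corporate network" scenario; cloud deployments wrap the same idea in managed infrastructure.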
Individual storage strategy for each AI workflow
Based on the described phases of the AI workflow, important insights can be gained for choosing the right storage. Fundamentally, there is no single solution. Rather, the optimal storage strategy depends on the specific use case. It is important to be aware of the individual requirements of the AI model and not to lose sight of the desired goal:
Data collection strategy: What is the basic approach to data collection— bulk transfer or incremental upload? In some scenarios, a physical data transfer device or a rugged edge server may be necessary.
Training environment: Is the training conducted in the cloud, on the company's own system, or possibly directly with an external provider offering a pre-trained model? Each option has its own advantages and necessary trade-offs.
Application: Who is intended to use the final algorithm, and how is it accessible? If the goal is, for example, edge inferencing, then it must be ensured that the hardware requirements for the necessary edge scenarios are met everywhere.
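The checklist above could be captured as a simple record per AI project. The field names and allowed values below are illustrative assumptions, not a standardized schema:

```python
from dataclasses import dataclass

@dataclass
class StorageStrategy:
    """Hypothetical record of the storage checklist for one AI project."""
    collection: str    # e.g. "bulk_transfer" or "incremental_upload"
    training_env: str  # e.g. "cloud", "on_premises", "external_provider"
    deployment: str    # e.g. "cloud_service", "edge_inference", "local_server"

    def needs_edge_hardware(self) -> bool:
        # Edge inferencing implies hardware requirements at every edge site.
        return self.deployment == "edge_inference"
```

Writing the decisions down in this form makes it easy to check, per project, whether the chosen deployment path imposes extra hardware requirements.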
AI and data go hand in hand
Artificial intelligence is here to stay. The extent of the changes for society is still difficult to foresee. However, it is already clear that data plays a crucial role. It is no coincidence that the new SI prefixes ronna and quetta were introduced in 2022 to keep the exponentially growing global data volumes quantifiable. For AI, mere storage capacity is secondary; far more important are the speed and efficiency with which models can access information and operate on it. An ill-considered storage strategy imposes an avoidable bottleneck in the long run.
When introducing artificial intelligence, companies should therefore keep a close eye on the interplay of data collection, model creation, training, and deployment, because the choice of suitable storage solutions is ultimately decisive for the success or failure of an AI workflow.