The Rise of the AI Architect
How AI architects can bridge AI and engineering to ship reliable decision-making systems
AI systems are rapidly reshaping many areas of modern life, from seemingly simple tasks such as detecting a cat in an image to the autonomous operation of a vehicle. Society is also growing more accepting of AI technologies. Most AI systems are embedded in complex software. What makes AI-enabled systems unique is that they have decision making at their heart. This added capability places new demands on the processes and guidelines used to build such systems.
Architecting fundamentals can help teams navigate these complexities and build and operate such systems successfully. Historically, an architect would help build a system and could then “walk away” or move on to a new project. In the age of AI, this is no longer the case. Observability, integrated development, testing, the voice of the user, and the centrality of all of these to business outcomes place new demands on the architect.
Challenges that impact AI-enabled systems
Complex integrations: for example, will the system be deployed in a hybrid cloud or on premises? What dependencies exist on external software?
High performance bar: the system is expected to support key business functions or activities that the organization depends on.
Cost: depending on the technology, using AI/ML can incur significant costs.
Troubleshooting: if not well architected or instrumented, the system can be difficult to troubleshoot.
What can architecture do to help?
Modeling and robust conceptual design identify the aspects of the system that are unique to the AI technology.
Tactics and patterns partition the system to improve performance and meet non-functional requirements.
Robust architecting maintains a holistic view of the end-to-end system.

The rise of AI architects
As technology progresses, the practice of architecting evolves with it, and we are now witnessing the rise of the AI architect. The modern AI architect will work closer to operations and must help manage the complexity of these systems. The role's responsibilities include:
The architect is the principal approver at the acceptance gates that take an AI prototype to a production deployment. The architect ensures the concept has merit, evaluates the results and performance of the prototype, guides the design of the production model, and finally accepts the production deployment.
The architect integrates and synthesizes solutions that balance the needs of the data science, data engineering, software development, operations, and business teams.
The architect is the principal owner of the non-functional requirements and interfaces across the AI system.
The architect must be involved in observability and needs to understand what decisions the software is making.
From Idea to Production Life Cycle
As AI engineering matures, a notional process is emerging for taking an AI idea to production. The initial stage is discovery and evaluation, which determines whether the data, algorithms, and compute resources exist to build a system that will have impact. The next stage is prototyping, in which complexity is added to the software by introducing more elements of the production domain to test how well the new system would work. At the end of the prototype phase, a key decision is made on whether the prototype warrants being taken to a production environment.
After the prototyping phase, the model is run in a controlled execution, where it is observed and monitored to ensure it is working as expected and delivering value. Once this monitoring is complete, the system is rolled out to production in a controlled manner, with another layer of monitoring to ensure that the systems it now affects still function as expected. Finally, with these gates passed, a full deployment can be done, and the system enters a monitoring and evaluation phase. The evaluation determines whether the model needs to be adapted or retrained.
Once the system is in operation, a new set of metrics for system observability needs to be captured. Below are some basic ones; the list should not be considered exhaustive:
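The staged progression described above can be sketched as a small gated state machine. This is only an illustrative sketch: the stage names are paraphrased from the text, and the `Stage` enum and `next_stage` function are hypothetical names introduced here, not part of any established framework.

```python
from enum import Enum


class Stage(Enum):
    # Stage names paraphrase the life cycle in the text (hypothetical labels).
    DISCOVERY_AND_EVALUATION = 1
    PROTOTYPING = 2
    CONTROLLED_EXECUTION = 3
    CONTROLLED_ROLLOUT = 4
    FULL_DEPLOYMENT = 5
    MONITOR_AND_EVALUATE = 6


def next_stage(current: Stage, gate_passed: bool) -> Stage:
    """Advance exactly one stage, and only when the acceptance gate is passed."""
    if not gate_passed or current is Stage.MONITOR_AND_EVALUATE:
        # A failed gate (or the final stage) keeps the system where it is.
        return current
    return Stage(current.value + 1)


stage = Stage.PROTOTYPING
stage = next_stage(stage, gate_passed=False)  # gate not met: stays in prototyping
stage = next_stage(stage, gate_passed=True)   # gate met: moves to controlled execution
print(stage.name)
```

Modeling each gate as an explicit boolean decision keeps the approval step visible, so a stage cannot be skipped silently.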
Accuracy
Data throughput
Area under the curve (AUC)
Time for processing
Cost per transaction
How often these metrics should be reviewed depends on how time-sensitive AI performance is for the enterprise. That said, the operations teams should review these observability metrics frequently, and the other stakeholders should review them at least weekly, to see how AI model performance is affecting their areas of responsibility.
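As a rough illustration, three of the metrics listed above can be computed with plain Python. This sketch assumes a binary classifier, and it assumes that "area under the curve" refers to the area under the ROC curve; the function names and sample values are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cost_per_transaction(total_cost, n_transactions):
    """Average spend per processed transaction."""
    return total_cost / n_transactions


# Hypothetical sample data for illustration only.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))            # 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))    # 0.75
print(cost_per_transaction(120.0, 4000))
```

In practice a team would likely rely on an established metrics library and its monitoring stack; the point here is only that each metric has a precise, checkable definition that can be agreed on before the review cadence is set.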
Specific areas an AI architect can influence
As mentioned, the AI architect needs a solid understanding of AI/ML and data science topics and must be able to communicate in their language. This includes understanding the role of non-functional requirements as applied to AI/ML systems. Architects should extend their architecture modeling knowledge to cover systems that perform non-human decision making. They also need to understand how users and stakeholders of the system will be affected and aided, and where friction points may arise. They need to stay close to the rapid prototyping that supports software development, guiding and evaluating it. The architect needs to consider interfaces and how their management will affect the decision making of the system. The use of AI/ML technologies inherently adds a layer of complexity. The architect who assumes that a simple AI application can be built in a less rigorous manner does so at their own peril: adding a layer of AI to almost any application raises its complexity significantly.

The AI architect must by necessity straddle AI/ML and software engineering so that they can navigate the alphabet soup of techniques, algorithms, and technologies that exist. It is very easy for an AI architect to be overwhelmed by the many other roles needed to deploy modern AI systems. Many enterprises have several functional teams: data, development, MLOps, and legal and compliance. Below are some of the major facets that the architect influences and needs to be involved with:
Data Team
The architect works with the data team to ensure that the product question can indeed be answered by the data that exists or is to be used. The architect is pivotal in setting requirements for the speed, format, and consistency checks that must be met.
Development Team
The architect works to ensure that the designs being created meet the top-level designs, interfaces, and non-functional requirements. Here the architect acts more as a reviewer, and also helps clarify engineering requirements and provides guidance on design questions and challenges.
MLOps Team
In this role, the architect defines how well the production system is required to perform and how the system is to be instrumented and monitored for correct execution. The architect also defines the canaries and fail-safe mechanisms that keep issues with the AI components from compromising the full production environment, and lays out the mechanisms for updating the model and the system.
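One way to picture a fail-safe mechanism of this kind is a wrapper that routes requests to a non-AI fallback once the model fails repeatedly. The following is a minimal sketch under stated assumptions: `FailSafeModel`, `max_failures`, and the fallback rule are hypothetical names chosen for illustration, and a real deployment would add logging, alerting, and a controlled way to re-engage the model.

```python
class FailSafeModel:
    """Wraps a model callable and disengages it after repeated failures."""

    def __init__(self, model, fallback, max_failures=3):
        self.model = model            # callable that may raise on error
        self.fallback = fallback      # deterministic, non-AI default behavior
        self.max_failures = max_failures
        self.failures = 0             # consecutive failure count
        self.engaged = True           # False once the model is disengaged

    def predict(self, x):
        if not self.engaged:
            return self.fallback(x)
        try:
            result = self.model(x)
            self.failures = 0         # a success resets the failure streak
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.engaged = False  # stop calling the model entirely
            return self.fallback(x)


def always_fails(x):
    # Stand-in for a broken model, used only to exercise the wrapper.
    raise RuntimeError("model unavailable")


guarded = FailSafeModel(always_fails, fallback=lambda x: "default")
for _ in range(3):
    guarded.predict({"amount": 10})
print(guarded.engaged)  # False
```

The design choice worth noting is that the caller always receives an answer: the production system degrades to its fallback behavior rather than failing along with the AI component.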
Legal and Compliance Team
In this capacity, the architect provides the technical insight, design guidance, and evidence that the AI system is meeting the legal and compliance needs of the enterprise. The architect also oversees re-design activities and troubleshooting to ensure legal and compliance aspects are indeed met.
Incident Run Book
The architect should be a principal in developing an incident run book. The run book should take a layered approach to addressing an error in the AI model or its interfaces. Clear mechanisms and architecture need to exist to disengage the model from a production system and put the production system into a nominal configuration. The run book should also state the expected outputs of the different stages of the system; that is, it should document what “correct” looks like at each stage, and why, to enable quicker troubleshooting. The error handling within the system should be indexed in the run book so that system, data, and model tracing can occur. Finally, the run book should include clear traceability to software repositories, configuration information, and points of contact.
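The idea of documenting what "correct" looks like at each stage can be made concrete by pairing every stage with a written expectation and an executable check. This is a hedged sketch only: the `StageCheck` structure, the stage names (`ingest`, `features`, `model`), and the sample outputs are all hypothetical and would differ for any real system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StageCheck:
    name: str                          # stage identifier used in the run book
    expected: str                      # human-readable description of "correct"
    is_correct: Callable[[Any], bool]  # executable version of that description


def first_failing_stage(checks, outputs) -> Optional[str]:
    """Walk the stages in order and return the first one whose output is wrong."""
    for check in checks:
        if not check.is_correct(outputs[check.name]):
            return check.name
    return None


# Hypothetical stages and expectations for illustration only.
checks = [
    StageCheck("ingest", "at least one record received", lambda rows: len(rows) > 0),
    StageCheck("features", "no missing feature values", lambda f: None not in f),
    StageCheck("model", "score lies in [0, 1]", lambda s: 0.0 <= s <= 1.0),
]

outputs = {"ingest": [{"id": 1}], "features": [0.2, None], "model": 0.7}
print(first_failing_stage(checks, outputs))  # features
```

Checking the stages in order supports the layered troubleshooting described above: the earliest failing stage is usually the most useful place to start tracing.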
Conclusion
These are exciting times to be an AI architect. The field is in its infancy, and there are many excellent and challenging domains where one can make a significant impact. In this new era, one needs to accept that there will be constant learning and adaptation. As the techniques and concepts of AI/ML multiply, the possible combinations and applications keep growing. That said, a solid foundation in AI/ML concepts helps mitigate short-term knowledge gaps. Are you ready to become an AI architect?




