LLMOps, short for large language model operations, covers the practices and tools teams use to deploy, run, and monitor LLM-based applications in production. It adapts MLOps practices to concerns specific to language models, such as prompt management, evaluation, versioning, retrieval, latency, and cost control.
A support team running a retrieval-augmented assistant, for example, uses LLMOps to version prompts, score answer quality on a fixed test set, and watch latency and spend as traffic grows.
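The evaluation loop in that example can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation. Everything in it is hypothetical: `PromptVersion`, `evaluate`, the substring-match scoring rule, the per-1K-token price, and `fake_model`, which stands in for a real LLM client. The prompt version id is derived from a hash of the template text, so any edit to the prompt produces a new id.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptVersion:
    template: str

    @property
    def version_id(self) -> str:
        # Content-addressed version: any edit to the template yields a new id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


@dataclass
class EvalReport:
    version_id: str
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float


def evaluate(
    prompt: PromptVersion,
    test_set: list[tuple[str, str]],
    call_model: Callable[[str], tuple[str, int]],
    usd_per_1k_tokens: float,
) -> EvalReport:
    """Score one prompt version on a fixed test set of (question, expected) pairs."""
    correct = 0
    latencies = []
    tokens = 0
    for question, expected in test_set:
        start = time.perf_counter()
        answer, used = call_model(prompt.template.format(question=question))
        latencies.append(time.perf_counter() - start)
        tokens += used
        # Hypothetical scoring rule: the expected text must appear in the answer.
        if expected.lower() in answer.lower():
            correct += 1
    n = len(test_set)
    return EvalReport(
        version_id=prompt.version_id,
        accuracy=correct / n if n else 0.0,
        mean_latency_s=sum(latencies) / n if n else 0.0,
        total_cost_usd=tokens * usd_per_1k_tokens / 1000,
    )


def fake_model(prompt: str) -> tuple[str, int]:
    # Hypothetical stand-in for a real LLM client: canned answer plus token count.
    if "refund" in prompt:
        return "Refunds take 5 business days.", 40
    return "I am not sure.", 20


tests = [
    ("How long do refunds take?", "5 business days"),
    ("Where is my order?", "tracking link"),
]
report = evaluate(PromptVersion("Answer briefly: {question}"), tests, fake_model, usd_per_1k_tokens=0.5)
print(report.accuracy, report.total_cost_usd)  # 0.5 0.03
```

Substring matching is a crude quality score chosen for brevity; a production harness would likely use a richer scoring method, but the shape stays the same: a fixed test set goes in, and a per-version report of quality, latency, and spend comes out.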
Most LLMOps tooling tracks the model and the application code, but a frequent blind spot is the data state behind each run. Reproducing an answer that worked in a pilot requires binding the run to its exact inputs and data version, not only to the model version. Run binding, which records that state with each run, and reproducible execution, which replays a run against it, close that gap.
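Run binding can be sketched in a few lines of Python. This minimal version assumes the data state that matters is a retrieval corpus held in memory as a dict mapping document id to text. `RunBinding`, `bind_run`, `corpus_fingerprint`, `diff`, and the model name `model-2024-06` are all hypothetical. Each run records hashes of the prompt, the input, and the corpus alongside the model version and parameters, so two runs can be compared field by field.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def corpus_fingerprint(documents: dict[str, str]) -> str:
    """Hash the corpus so a run records which data state it saw.
    Sorting by document id makes the result independent of insertion order."""
    h = hashlib.sha256()
    for doc_id in sorted(documents):
        h.update(doc_id.encode("utf-8") + b"\0")
        h.update(sha256_text(documents[doc_id]).encode("ascii") + b"\n")
    return h.hexdigest()


@dataclass
class RunBinding:
    model_version: str
    prompt_hash: str
    input_hash: str
    data_version: str
    params: dict = field(default_factory=dict)

    def run_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) gives a stable id.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return sha256_text(canonical)


def bind_run(model_version: str, prompt: str, user_input: str,
             documents: dict[str, str], params: dict) -> RunBinding:
    return RunBinding(
        model_version=model_version,
        prompt_hash=sha256_text(prompt),
        input_hash=sha256_text(user_input),
        data_version=corpus_fingerprint(documents),
        params=dict(params),
    )


def diff(a: RunBinding, b: RunBinding) -> list[str]:
    """Names of the fields that differ between two bindings."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


docs_v1 = {"faq-1": "Refunds take 5 business days."}
docs_v2 = {"faq-1": "Refunds take 3 business days."}
question = "How long do refunds take?"
prompt = "Answer briefly: {question}"
pilot = bind_run("model-2024-06", prompt, question, docs_v1, {"temperature": 0.0})
replay = bind_run("model-2024-06", prompt, question, docs_v2, {"temperature": 0.0})
print(diff(pilot, replay))  # ['data_version']
```

Here the model, prompt, and input match but the corpus changed, so the diff points at the data version rather than the model. Storing hashes instead of contents keeps each record small, at the cost of needing the original data retained elsewhere to actually replay a run.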