Topic: Running LLM-based chatbots on edge and low-compute environments
Abstract: Deploying conversational AI systems on resource-constrained edge devices requires sophisticated optimization techniques to preserve response quality while staying within tight compute and memory budgets. This paper presents a comprehensive study of the optimization and deployment of an end-to-end conversational AI pipeline, comprising Whisper automatic speech recognition (ASR), a 3-billion-parameter Llama-3 large language model (LLM), and a lightweight text-to-speech (TTS) module, running entirely on a Raspberry Pi.
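The cascaded structure of the pipeline described above (speech in, speech out) can be sketched as below. All component functions are hypothetical stubs standing in for the Whisper ASR, Llama-3 LLM, and TTS stages; this is an illustration of the cascade, not the paper's implementation, which would invoke optimized on-device runtimes for each stage.

```python
def transcribe(audio: bytes) -> str:
    """Stub ASR stage: a real system would run a Whisper model here."""
    return "what is the weather today"

def generate_reply(prompt: str) -> str:
    """Stub LLM stage: a real system would run the 3B Llama-3 model here."""
    return f"You asked: {prompt}"

def synthesize(text: str) -> bytes:
    """Stub TTS stage: a real system would render the reply to audio."""
    return text.encode("utf-8")

def voice_pipeline(audio: bytes) -> bytes:
    """End-to-end cascade: audio in, ASR -> LLM -> TTS, audio out."""
    transcript = transcribe(audio)
    reply = generate_reply(transcript)
    return synthesize(reply)
```

Each stage hands plain text to the next, so on a single-board computer the three models can be swapped or quantized independently without changing the pipeline contract.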