On Device LLM Serving Engine
Optimizing and supporting language models for execution on an NPU backend.
Summary
We are currently working on model acceleration using an NPU backend.
In particular, we collaborate with machine learning engineers to support language models, optimize model inference, and improve performance on edge devices. This effort has yielded strong inference performance on target devices.
As this work is confidential, detailed information cannot be disclosed.