On Device LLM Serving Engine

Optimizing and supporting language models for execution on an NPU backend.

Summary

We are currently working on model acceleration using an NPU backend.

In particular, we are collaborating with machine learning engineers to support language models, optimize model inference, and improve performance on edge devices. This effort has delivered strong inference performance.

As this work is confidential, detailed information cannot be disclosed.