AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators
Researchers developed ODMA, a memory allocation strategy that improves large language model (LLM) serving performance on memory-constrained accelerators by up to 27%. The technique addresses the bandwidth limitations of LPDDR-based systems through adaptive bucket partitioning and dynamic generation-length prediction.