AI · Bullish · arXiv - CS AI · 1d ago · 7/10
ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators
Researchers have developed ODMA, an on-demand memory allocation strategy that improves large language model (LLM) serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR-class systems through adaptive bucket partitioning and dynamic generation-length prediction.
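
To make the two ideas named above more concrete, here is a minimal sketch of how bucket partitioning driven by a predicted generation length might look. It is an illustration under stated assumptions, not ODMA's actual algorithm: the function names `make_buckets` and `pick_bucket`, the quantile-based boundaries, and the clamp-to-largest fallback are all hypothetical choices made for this example.

```python
from bisect import bisect_left


def make_buckets(observed_lengths, num_buckets):
    """Derive bucket capacities (in tokens) from quantiles of past generation lengths.

    Hypothetical helper: re-running it on fresh observations is what makes
    the partitioning "adaptive" in this sketch.
    """
    if not observed_lengths:
        raise ValueError("need at least one observed length")
    ordered = sorted(observed_lengths)
    n = len(ordered)
    bounds = []
    for i in range(1, num_buckets + 1):
        # Index of the i-th quantile; the last bucket covers the maximum seen.
        idx = min(n - 1, (i * n) // num_buckets - 1)
        bounds.append(ordered[max(idx, 0)])
    # Drop duplicate boundaries so every bucket is distinct.
    return sorted(set(bounds))


def pick_bucket(buckets, predicted_length):
    """Return the smallest bucket capacity that fits the predicted length.

    Predictions larger than every bucket are clamped to the largest one
    (an assumption of this sketch; a real system would need an overflow path).
    """
    pos = bisect_left(buckets, predicted_length)
    return buckets[min(pos, len(buckets) - 1)]


history = [10, 20, 30, 40, 50, 60, 70, 80]
buckets = make_buckets(history, num_buckets=4)
print(buckets)                   # [20, 40, 60, 80]
print(pick_bucket(buckets, 25))  # 40
```

Sizing each request's reservation to the nearest bucket above its predicted length, rather than to the maximum possible length, is the general way such schemes aim to reduce wasted memory; how ODMA itself predicts lengths and adjusts boundaries is described in the paper, not here.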