jprafael on March 24, 2023, on: LoRA: Low-Rank Adaptation of Large Language Models

If this works, is there any theory for why training models with low-rank layers (y = (A.B).x + b) directly doesn't work? (Or does it?)
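To make the layer in the question concrete, here is a minimal pure-Python sketch of y = (A.B).x + b, assuming A has shape d_out x r and B has shape r x d_in. The sizes, the helper names (`matvec`, `matmul`, `low_rank_forward`, `random_matrix`) and the random initialization are all illustrative assumptions, not anything from the LoRA paper. It only shows the forward pass and the parameter count of such a layer; it says nothing about how well one trains.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply matrix P (n x r) by matrix Q (r x m)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def low_rank_forward(A, B, b, x):
    """Compute y = A (B x) + b without forming the d_out x d_in product A B."""
    h = matvec(B, x)  # r-dimensional bottleneck activation
    return [y_i + b_i for y_i, b_i in zip(matvec(A, h), b)]

def random_matrix(rows, cols, rng):
    """Gaussian-initialized matrix (an arbitrary choice for this sketch)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out, r = 8, 6, 2  # hypothetical sizes, chosen only for illustration
A = random_matrix(d_out, r, rng)
B = random_matrix(r, d_in, rng)
b = [0.0] * d_out
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

# Factored forward pass versus the equivalent dense layer W = A B.
y_factored = low_rank_forward(A, B, b, x)
W = matmul(A, B)
y_dense = [y_i + b_i for y_i, b_i in zip(matvec(W, x), b)]
max_diff = max(abs(u - v) for u, v in zip(y_factored, y_dense))

# Parameter counts: r * (d_in + d_out) + d_out versus d_in * d_out + d_out.
low_rank_params = r * (d_in + d_out) + d_out
dense_params = d_in * d_out + d_out
```

With these sizes the factored layer has fewer parameters than the dense one, and the two forward passes agree up to floating-point rounding, since W = A.B is just a dense matrix constrained to rank at most r.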