oh neat, i'll check that one out. ssd/128gb unified isn't much slower than vram for me when i'm running a predefined set of prompts, since i load from disk anyway and only do one forward pass per prompt, pulling in part of the model at a time. it's a bit slower for cpu inference, but i've only had to do that with one model so far.
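a minimal sketch of that load-part-at-a-time idea, assuming the weights are split into per-layer .npy shards on disk (the file layout, layer count, and dims here are all made up for illustration) — each layer gets memory-mapped only while its matmul runs, so you never hold the whole model in memory:

```python
import os
import tempfile
import numpy as np

# fake setup: write a few small "layer" shards to disk (stand-in for real weights)
tmp = tempfile.mkdtemp()
rng = np.random.default_rng(0)
dim = 64
paths = []
for i in range(4):
    p = os.path.join(tmp, f"layer{i}.npy")
    np.save(p, rng.standard_normal((dim, dim)).astype(np.float32) / np.sqrt(dim))
    paths.append(p)

def forward(x, layer_paths):
    # one forward pass per prompt: stream each layer off disk,
    # keeping at most one layer's weights mapped at a time
    for p in layer_paths:
        w = np.load(p, mmap_mode="r")  # mmap — pages are read on demand
        x = np.maximum(x @ w, 0.0)     # matmul + relu
        del w                          # drop the mapping before the next layer
    return x

h = forward(rng.standard_normal(dim).astype(np.float32), paths)
print(h.shape)
```

same shape of thing works with safetensors-style lazy loading; the point is just that for a single pass per prompt you pay the disk read once either way, so keeping everything resident in vram doesn't buy much.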
but yeah, on-demand loading would mean a lot of ssd churn, so i'd only do it for testing or for grabbing some hidden-state vectors.