Regarding self-hosting, there shouldn't be any copyright infringement as long as you are not re-serving the content. Even then, there are exceptions, e.g. archive.org and Google Cache.
The most likely legal concern is accidentally overloading a server (an unintentional denial of service), but as long as the tool respects robots.txt and paces its requests, that shouldn't happen.
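For what it's worth, here is a rough sketch of what "respects robots.txt and paces itself" could look like in Python, using only the standard library's `urllib.robotparser`. The class name `PoliteGate`, the user agent string `example-archiver`, and the one-second default `min_delay` are all made up for illustration; this is not how any particular tool does it.

```python
import time
import urllib.robotparser


class PoliteGate:
    """Hypothetical helper: decides whether and when a URL may be requested."""

    def __init__(self, robots_txt, user_agent="example-archiver", min_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        # Parse robots.txt text that the caller has already downloaded.
        self.parser.parse(robots_txt.splitlines())
        # Use the site's Crawl-delay when it is longer than our own minimum.
        site_delay = self.parser.crawl_delay(user_agent)
        self.delay = max(min_delay, float(site_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Block until at least `self.delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would keep one gate per host, skip any URL where `allowed(url)` is false, and call `wait_turn()` right before each request to that host. The `clock` and `sleep` parameters are only there so the pacing can be tested without actually waiting.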