>If Apache is your web server, you can use Spark as an ETL tool for R.
Oh dear, no. The Apache Software Foundation oversees the development of both the Apache HTTP Server and Spark, a cluster-computing engine you can use to query data in Hadoop clusters. They are unrelated projects. Confusing, I know.
I thought about stopping there but decided to keep reading. I'm not sure the authors know what ETL tooling means: some of the entries are just bespoke R-only packages that do nothing more than extract data into R.
I see Pentaho Kettle on the list, and as a user I wouldn't recommend it for large jobs unless you have plenty of time. In my experience, row-processing speeds of hundreds of rows per second are typical and thousands per second are rare. Loading tens of millions of rows is an exercise in patience.
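To put those throughput numbers in perspective, here is a quick back-of-envelope calculation (the 50 million row count and the specific rates are illustrative assumptions, not measurements from the comment above):

```python
def load_time_hours(rows: int, rows_per_second: float) -> float:
    """Hours needed to load `rows` at a sustained `rows_per_second`."""
    return rows / rows_per_second / 3600

# 50 million rows at a few hundred rows/second vs. a few thousand:
slow = load_time_hours(50_000_000, 500)    # hundreds/sec: ~27.8 hours
fast = load_time_hours(50_000_000, 5_000)  # thousands/sec: ~2.8 hours
print(f"at 500 rows/s: {slow:.1f} h; at 5,000 rows/s: {fast:.1f} h")
```

Even at the rarer thousands-per-second rate, a large load still takes hours, which is what makes the "exercise in patience" remark ring true.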