Cascading

The Cascading project is a collection of applications, languages, and APIs for developing data-intensive applications.

It was originally designed to provide a much more user-friendly API over Apache Hadoop MapReduce, but has evolved over the years to support other computing platforms, such as Apache Tez, and to offer a stand-alone (local-mode) backend for streaming data locally.

Cascading, embedded in a serverless fabric (sans Hadoop), is currently used at Salesforce to ingest and (re)pave 8 TB of data a day, powering multiple products for internal stakeholders and external customers.

One of its key features is how the query planner can partition large, complex computation graphs and place them efficiently into a cluster.

Cascading was acquired in 2016 (via the acquisition of Concurrent, Inc.), but it was not abandoned. New versions continue to be released under a new domain name, https://cascading.wensel.net/ (formerly https://cascading.org/).

Tessellate, built on Cascading, was designed to be used stand-alone or as a Clusterless workload.

Reach out to learn more.

Chris K Wensel
Data and Analytics Architect