We are happy to announce the first GA release (1.0) of Spring for Apache Hadoop, almost one year to the day after its first milestone release.
During that time we have incorporated a great deal of your feedback to drive the roadmap, so thanks to everyone in the community who has helped! While new features have been added over the year, the goal of Spring for Apache Hadoop remains the same: to simplify the development of Hadoop-based applications.
Simplified Programming Model & Consistency
What we have observed is that, using the standard out-of-the-box tools that come with Hadoop, you can easily end up with Hadoop applications that are poorly structured collections of command line utilities, scripts and pieces of code stitched together. The different origins of the various projects in the Hadoop ecosystem, such as Hive and Pig focusing on declarative usage or Cascading and HBase for a programmatic angle, have led to different approaches to configuration and API design.
Spring for Apache Hadoop provides a consistent programming and configuration model across a wide range of Hadoop ecosystem projects: rather than dictating what to use, the framework embraces and enhances your technology stack, staying true to the core Spring principles.
Spring’s familiar Template API design pattern is applied to Hadoop, with the results being helper classes such as HBaseTemplate, HiveTemplate and PigTemplate. This brings with it familiar Spring data access Template features such as translation to Spring’s portable data access exception hierarchy, thread-safe access to underlying resources, and lightweight object mapping features.
Java-centric APIs, such as Cascading, can be used freely, with or without additional configuration, through Spring Framework's excellent Java configuration.
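To make the Template idea above concrete, here is a minimal sketch of the pattern in plain Java. It is not the Spring for Apache Hadoop API: Client, ClientFactory, ClientCallback, Template and SketchDataAccessException are all hypothetical names invented for illustration. The sketch only shows the two behaviours mentioned above, managing the underlying resource on the caller's behalf and translating low-level checked exceptions into a single unchecked hierarchy.

```java
import java.io.Closeable;
import java.io.IOException;

public class TemplateSketch {

    // Hypothetical unchecked exception, standing in for a portable data access hierarchy.
    static class SketchDataAccessException extends RuntimeException {
        SketchDataAccessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical client for some store; a real one would wrap a connection.
    interface Client extends Closeable {
        String query(String statement) throws IOException;
    }

    // Hypothetical factory that opens a new client for each call.
    interface ClientFactory {
        Client open() throws IOException;
    }

    // The caller's code, which only sees an open client.
    interface ClientCallback<T> {
        T doWithClient(Client client) throws IOException;
    }

    static class Template {
        private final ClientFactory factory;

        Template(ClientFactory factory) {
            this.factory = factory;
        }

        // Opens a client, runs the callback, always closes the client,
        // and rethrows any IOException as an unchecked exception.
        <T> T execute(ClientCallback<T> action) {
            try (Client client = factory.open()) {
                return action.doWithClient(client);
            } catch (IOException e) {
                throw new SketchDataAccessException("Store access failed", e);
            }
        }
    }

    public static void main(String[] args) {
        // An in-memory stand-in client, so the sketch runs without a cluster.
        ClientFactory working = () -> new Client() {
            public String query(String statement) {
                return "rows for: " + statement;
            }

            public void close() {
            }
        };

        Template template = new Template(working);
        System.out.println(template.execute(c -> c.query("show tables")));
    }
}
```

The caller never opens or closes the client and never has to catch a checked exception, which is the main convenience the Template classes aim for.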
Start small and grow as needed
Another theme that has emerged over the past year is to encourage an approach where you can start small and grow into complex solutions. The introduction of various Runner classes allows the execution of Hive and Pig scripts, vanilla Map/Reduce or Streaming jobs, and Cascading flows, as well as the invocation of generic JVM-based scripts before and after them, all through the familiar JDK Callable contract. You can mix and match the runners as needed, and as complexity grows you can easily upgrade to Spring Batch, so that multiple steps can be coordinated in a stateful manner and administered using a REST API. Spring Batch’s rich functionality for handling the ETL processing of large files translates directly into Hadoop use cases for the ingestion and export of files from HDFS. The use of Spring for Apache Hadoop in combination with Spring Integration allows for rich processing of event streams that can be transformed, enriched and filtered before being read from or written to HDFS or other storage such as NoSQL stores, for which Spring Data provides plenty of support.
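Because the runners share the JDK Callable contract, they can be chained with nothing more than java.util.concurrent. The sketch below illustrates that idea only; the three lambdas are hypothetical stand-ins for a Hive runner, a Pig runner and a job runner, not the real runner classes, and runAll is a helper invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RunnerChainSketch {

    // Calls each step in order and collects the results.
    // An exception from any step stops the chain and propagates to the caller.
    static List<Object> runAll(List<Callable<?>> steps) throws Exception {
        List<Object> results = new ArrayList<>();
        for (Callable<?> step : steps) {
            results.add(step.call());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins; real runners would submit work to a cluster.
        Callable<String> hiveStep = () -> "hive-script-done";
        Callable<String> pigStep = () -> "pig-script-done";
        Callable<Boolean> jobStep = () -> Boolean.TRUE;

        List<Callable<?>> steps = new ArrayList<>();
        steps.add(hiveStep);
        steps.add(pigStep);
        steps.add(jobStep);

        System.out.println(runAll(steps));
    }
}
```

A plain loop like this keeps no state between runs, which is the point at which moving to Spring Batch, as described above, starts to pay off.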
We have covered a variety of scenarios through the sample applications (no need to compile them, they are already compiled and ready to be downloaded) that complement the comprehensive user documentation (it even includes a section on how to get started with Spring for Apache Hadoop using Amazon’s Elastic MapReduce service).
Additionally, as a companion to the samples, you can use the recent Spring Data book to explore the full feature set that can be achieved using Spring technologies, Hadoop and NoSQL.
Spring for Apache Hadoop is being tested daily against the various Hadoop 1.x distributions (such as vanilla Apache Hadoop, Cloudera CDH3 and CDH4, Greenplum HD): we want to make sure Spring for Apache Hadoop works reliably no matter your Hadoop environment. We are also actively working to improve the user experience: Spring for Apache Hadoop is provided out of the box in the Greenplum HD distribution.
We are keeping a close eye on Hadoop 2.x development and are working towards providing support for it in the near future as well.
If you are using Spring for Apache Hadoop, we would love to hear from you. Please take our survey and share your experiences.
As always, we look forward to your feedback!
The author royalties from Spring Data book sales are donated to the Creative Commons organization.