Spark Summit East 2016 Report Day 1
Matei Zaharia told us about all the amazing things coming in the next release in May. These include API consolidation, the new structured streaming, and performance enhancements. A preview of this can be run inside the Databricks Cloud.
Shaun Connolly from Hortonworks gave an excellent keynote on Spark and Hadoop in practice, a really interesting talk about all that you can do with the wealth of open source tools in the space.
Anjul Bhambhri from IBM gave a great keynote about IBM’s innovation in, investment in, and enhancements to Spark through SystemML and other features and libraries they are adding. IBM’s billion dollar investment and its weaving of Spark into products like Watson really help extend Spark into the Enterprise.
Netflix gave a great talk (as their team always does) on an interesting application they are working on that makes Netflix cool. “Distributed Time Travel for Feature Generation” was a really cool example of adding historic data to Spark jobs. They will be open sourcing another tool around this soon. Netflix is a great contributor to open source, and their embrace of releasing amazing tools and products is an example to all enterprises.
Prasad Chalasani from MediaMath gave an excellent talk, “Monte Carlo Simulations in Ad-Lift Measurement Using Spark”. They are doing some cool data science with Spark to determine whether advertisements are working.
The great news coming out of Databricks is that a free Community Edition of the Databricks Cloud will be made available soon. Spark Summit East 2016 participants will be the beta testers and get early access. I will review this once I get my account. From the demos, it has the same great features as the Databricks Cloud, along with social features for up to 3 users to share notebooks, extensive documentation with built-in runnable notebooks, and the entire content from the Inde Spark MOOC, also with built-in runnable notebooks. Again, Databricks is embracing the free training of developers in Spark and helping build the community. This is a must do.
Another cool item from Databricks is their Dashboard view of Databricks Cloud Notebooks. You can now share HTML views of your notebook over the Internet. This is for both the enterprise version and the community edition. This is awesome! I will embed them in some upcoming posts once I get access. This allows for sharing of live data and is a great asset for learning, bloggers and the enterprise. Thanks Databricks! Indeed, “Spark, I am Your Father”, rings true. Databricks is a great example of a company stewarding an amazing open source project.
Read on The Big Data Zone