
over 1 year ago

GraphFrames in Spark + new Medicare data

Quick reminder: we’re holding online office hours this Thursday, 7/28, at 12pm ET, where IBM Developer Advocate David Taieb will demo how to use GraphX in Spark (we’ll have open Q&A, too).

If you’d like to get a head start on the material, check out David’s detailed blog, where he covers:

  • How to use GraphFrames (and any other Spark packages) within an IPython notebook, including for the IBM Analytics for Apache Spark service on Bluemix.
  • The pixiedust module that, among other things, provides a simple API to create compelling in-context interactive visualizations.
  • How to create a graph from data stored in the Cloudant JSON database service.
  • A few of the graph computation APIs provided by GraphFrames.
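To get a feel for the data shape before the session: GraphFrames models a graph as two tables, a vertex table with an `id` column and an edge table with `src` and `dst` columns. The sketch below is not David's code and does not use Spark. It is a plain-Python stand-in, assuming a handful of made-up JSON documents (the airport and flight records are hypothetical), that shows how JSON documents split into those two tables and how a degree count falls out of the edge table.

```python
import json
from collections import Counter

# Hypothetical JSON documents, standing in for records read from a
# JSON document store. The airports and flights are made up.
raw_docs = """
[
  {"type": "airport", "id": "BOS", "name": "Boston"},
  {"type": "airport", "id": "JFK", "name": "New York"},
  {"type": "airport", "id": "SFO", "name": "San Francisco"},
  {"type": "flight", "src": "BOS", "dst": "JFK"},
  {"type": "flight", "src": "SFO", "dst": "JFK"},
  {"type": "flight", "src": "JFK", "dst": "SFO"}
]
"""

docs = json.loads(raw_docs)

# Vertex table: one row per airport, keyed by "id".
vertices = [{"id": d["id"], "name": d["name"]}
            for d in docs if d["type"] == "airport"]

# Edge table: one row per flight, with "src" and "dst" referring to vertex ids.
edges = [{"src": d["src"], "dst": d["dst"]}
         for d in docs if d["type"] == "flight"]


def in_degrees(edge_rows):
    """Count incoming edges per vertex id (ids with no incoming edges are omitted)."""
    return dict(Counter(e["dst"] for e in edge_rows))


def out_degrees(edge_rows):
    """Count outgoing edges per vertex id (ids with no outgoing edges are omitted)."""
    return dict(Counter(e["src"] for e in edge_rows))


print(in_degrees(edges))
print(out_degrees(edges))
```

In GraphFrames itself, the same two tables would be Spark DataFrames passed to `GraphFrame(vertices, edges)`, and the degree counts are exposed as the `inDegrees` and `outDegrees` properties.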

David’s exercises and code are also available in a completed Jupyter Notebook. If you haven’t already, you can RSVP for the webinar here.

NEW! Medicare cost and medical research study data from 4Quant (Data available in JSON dataframe and raw formats; simple DBC Scala Notebook example also available)

Biomedical image analytics company 4Quant has generously shared some of their medical industry datasets for your use during the hackathon. Data includes cost and usage information for U.S. Medicare, as well as study and research output from medical publications worldwide.

4Quant suggests searching for leading indicators in health care costs and usage in these two datasets and establishing the links between them. Check out their GitHub repo for data access, possible use case hypotheses, and demos in Python and Scala.


We're here to help. If you have any questions about the hackathon, post on the discussion forum or email us, and we'll respond as soon as we can.