From time to time I help clients make the most of their data. Below is a non-exhaustive list of the tools I have experience with.

  • Amazon Web Services (AWS)
  • SQL databases, including the MySQL, PostgreSQL and Hive (on Qubole) dialects
  • self-written Python scripts using NumPy, SciPy and pandas
  • common queueing systems like PBS and SGE
  • methods from network theory
  • C++-based packages for network-related problems
  • D3.js and Phaser.js to create interactive JavaScript figures
  • GNU/Linux and Mac OS X

My consulting work involves solving problems at multiple levels of abstraction.

Meta problems

  • what kind of questions do you want answered?
  • is the right data to answer those questions already available?
  • what kind of available data is actually usable?
  • which other data sources can we acquire?
  • planning: given the answers to all of the above, let’s find the most efficient way to answer the initially posed questions

Number crunching

  • collect relevant internal data
  • collect relevant external data (e.g. from social network sites)
  • data analysis using statistical methods and state-of-the-art tools of distributed computing


  • extraction of the relevant information from the analysis
  • interactive summary of the results in a visually appealing manner

I’d be happy to help with your projects, too! Please contact me if you think I could help solve your problem.

Written by: Ben

Ben is a final-year PhD student at the Robert Koch-Institute / HU Berlin and a freelance data scientist.