Scalable Processing of Scientific Data in the Age of Data Science
Speaker: Johann-Christoph Freytag, HU/ECDF
My talk addresses different aspects of Data Science to emphasize that Data Science comprises more than machine learning. In the first part of the talk, I outline how research in the natural sciences has changed over the last centuries, leading to the data-driven approach to science that we observe today. I describe a specific example from astronomy that exemplifies this new approach to scientific research. Based on this historical view, I discuss how to define the term data science more precisely. In the second part, I turn to scalable data processing: the use of massive amounts of data for research motivated platforms such as Hadoop, Spark, and Flink, which only became feasible due to dramatic changes in computer hardware. The talk discusses some properties of these systems and contrasts them with existing relational DBMSs. In the third part of the talk, I focus on the challenges of privacy when managing and manipulating personal data in domains such as health and medicine or business (customer data). I outline these challenges and briefly describe possible solutions that still allow scientists to use personal data in a useful manner.