Accepted papers will be announced by 30 April
5 May 2019
Last day for registration
Hurry up! Register before 5 May.
Conference
Gdansk University of Technology
Organization Committee
The people putting it together
Olgun Aydin
Co-head
Lives in Gdansk, Poland
Ania Rybinska
Co-head
Lives in Gdansk, Poland
Michal Maj
Co-head
Lives in Gdansk, Poland
Patryk Jasik
Member
Lives in Gdansk, Poland
Asst. Prof. Dr. Karol Flisikowski
Member
Lives in Gdansk, Poland
Diversity
satRday is dedicated to providing a harassment-free and inclusive conference experience for all in attendance regardless of, but not limited to, gender, sexual orientation, disabilities, physical attributes, age, ethnicity, social standing, religion or political affiliation.
We do not tolerate harassment of participants (including organisers and vendors) in any form. Sexual innuendos and imagery are not appropriate for any conference venue, including presentations.
Anyone violating these rules may be given warning or expelled from the conference (without a refund) at the discretion of the conference organisers.
Our code of conduct/anti-harassment policy can be found here.
Contact us
Should you have any questions, please write to us!
Machine Learning meets Design
Machine Learning meets Design
We are surrounded by machine learning models. They decide which articles we see, which ads are displayed, and what kind of shop offers, movies, clips and medical therapies are recommended for us. But do we understand how these decisions are made?
Machine learning models are complex. A single decision is sometimes calculated based on millions of parameters. How can we possibly understand them? We desperately need an effective language to explore, explain, debug and validate complex models.
During this talk we will present a visual language designed to explore predictive models. We will show visual elements of this language, describe how they were designed, and how they can be used.
Bio:
Przemysław Biecek is a statistician obsessed with interpretable machine learning and model visualisation. He works as an associate professor at Warsaw University of Technology and, recently, at Samsung R&D. A maker interested in model-human interfaces and applications of predictive models in personalised medicine, and the initiator of the Beta and Bit project for data literacy.
Date: 18 May 2019
Category: Invited talks
Design meets Machine Learning
Design meets Machine Learning
We are surrounded by machine learning models. They decide which articles we see, which ads are displayed, and what kind of shop offers, movies, clips and medical therapies are recommended for us. But do we understand how these decisions are made?
Machine learning models are complex. A single decision is sometimes calculated based on millions of parameters. How can we possibly understand them? We desperately need an effective language to explore, explain, debug and validate complex models.
During this talk we will present a visual language designed to explore predictive models. We will show visual elements of this language, describe how they were designed, and how they can be used.
Bio:
Hanna Piotrowska is a graphic designer based in Warsaw, focusing mainly on data visualization, branding and visual communication. Awarded in Information Is Beautiful Awards, HOW International Design Awards, Polish Graphic Design Awards, KTR and others. Strongly interested in UX, data art and theory of perception.
Date: 18 May 2019
Category: Invited talks
Drawing ROC curves
Drawing ROC curves
Sometimes you want to draw a ROC curve while not having a classification model (yet). For example: you consider investing in a new credit scoring system. You hear you may increase your AUROC (area under ROC) by 5 percentage points. In order to translate it into financial results, you need to draw a nonexistent ROC curve.
I will review available options of modelling ROC curves for credit scoring and show which fit the real data best.
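One classical parametric option, shown here only as a sketch and not necessarily the model recommended in the talk, is the binormal ROC curve, ROC(t) = Φ(a + b·Φ⁻¹(t)), which can be drawn in base R without any fitted classifier:

```r
# Binormal ROC model: ROC(t) = pnorm(a + b * qnorm(t))
# The parameters a and b below are illustrative, not fitted to any real portfolio.
a <- 1.2   # separation parameter
b <- 1.0   # shape parameter (b = 1 gives a symmetric curve)

t   <- seq(0, 1, length.out = 200)   # false positive rate grid
roc <- pnorm(a + b * qnorm(t))       # true positive rate

auroc <- pnorm(a / sqrt(1 + b^2))    # closed-form AUROC for the binormal model
plot(t, roc, type = "l", xlab = "FPR", ylab = "TPR",
     main = sprintf("Binormal ROC, AUROC = %.3f", auroc))
abline(0, 1, lty = 2)                # random-classifier diagonal
```

Shifting `a` by a fitted amount is one way to visualise what "5 percentage points more AUROC" would look like as a curve.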
Bio:
Błażej Kochański tries to teach statistics at Politechnika Gdańska, consults banks in risk management if they want to listen, pretends to study physics, helps a startup fashion company to build data intelligence, promotes SME cooperation, raises a 3-year old and a bunch of terrible teens who look nice in pictures.
Date: 18 May 2019
Category: Invited talks
Joint modeling and dynamic predictions
Joint modeling and dynamic predictions with applications to cancer research using R package frailtypack
In medical research, different kinds of patient information are gathered over time, together with clinical outcome data such as overall survival (OS). Joint models enable the analysis of correlated data of different types, such as individual repeated data together with OS. The repeated data may be recurrent events, e.g., the appearance of new lesions, or a longitudinal outcome called a biomarker, e.g., tumor size. Moreover, joint models are useful for individual dynamic predictions of death using a patient’s history. The talk will introduce joint frailty models for recurrent events and a terminal event, as well as present a multivariate joint model for a longitudinal marker, recurrent events and survival in the context of treatment evaluation in cancer clinical trials. The implementation of these methods will be presented using the R package frailtypack.
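As a rough illustration of the package interface (using frailtypack's bundled readmission data; the model formula and smoothing parameters here are illustrative, not the talk's actual analysis), a joint frailty model for recurrent events and a terminal event can be fitted like this:

```r
# Joint frailty model sketch with frailtypack, using the package's bundled
# readmission data set; the kappa smoothing values are illustrative only.
library(frailtypack)
data(readmission)

fit <- frailtyPenal(
  Surv(time, event) ~ cluster(id) + sex + terminal(death),
  formula.terminalEvent = ~ sex,     # covariates for the terminal event
  data    = readmission,
  n.knots = 8,                       # spline knots for the baseline hazards
  kappa   = c(1e8, 1e11)             # one smoothing parameter per hazard
)
print(fit)
```

The fitted object can then feed the dynamic-prediction functions the talk discusses.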
Bio:
Agnieszka Król graduated in Mathematics at Politechnika Gdańska. She did her PhD training in Biostatistics at the University of Bordeaux on the development and application of statistical models for the analysis of survival and longitudinal data coming from clinical trials on cancer treatment evaluation. As a postdoctoral research fellow she continued her research on correlated survival outcomes at the Lunenfeld-Tanenbaum Research Institute (LTRI) at Mount Sinai Hospital, Toronto and, currently, at AstraZeneca. In parallel, she develops statistical software (R packages) for the proposed methods and publishes them for other researchers. Mother of two wonderful kids: a two-year-old girl and a 7-month-old boy.
Date: 18 May 2019
Category: Invited talks
Modern and beautiful dashboards: building Shiny Apps using SemanticUI components
Dominik Krzemiński, Krzysztof Sprycha
Ever wonder why the majority of Shiny apps look the same? Do you want your Shiny app to stand out? If yes, this tutorial on building extraordinary Shiny dashboards is for you. In this session, we will cover how to build dashboards with SemanticUI components. First, we will review our options and discuss the differences between the shinydashboard and semantic.dashboard packages. Second, we will create a first app using standard Bootstrap components. Next, we will switch to semantic.dashboard and extend the app with SemanticUI components to create a fresh and highly interactive UI using both elements. We will further create SemanticUI Shiny inputs and show how to build more advanced components such as text inputs, date inputs, dropdowns and more. After that, we will change the default theme and show how to modify it. We will demonstrate how to use CSS and classes to bring a dashboard from research to business level. Finally, we will add an Extras section covering deployment and authentication. We assume our participants have basic knowledge of R Shiny. The only requirement is to bring a laptop with the latest versions of the shiny, shinydashboard, shiny.semantic and semantic.dashboard packages installed.
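As a rough sketch of what the workshop covers (layout and widget choices here are illustrative, not the workshop's exact code), a minimal semantic.dashboard app mirrors the familiar shinydashboard structure:

```r
# Minimal semantic.dashboard skeleton; same header/sidebar/body layout
# as shinydashboard, but rendered with SemanticUI components.
library(shiny)
library(semantic.dashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Demo"),
  dashboardSidebar(
    sidebarMenu(
      menuItem(tabName = "plot_tab", text = "Plot")
    )
  ),
  dashboardBody(
    tabItems(
      tabItem(tabName = "plot_tab",
              box(title = "Histogram", plotOutput("hist")))
    )
  )
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(100)))
}

shinyApp(ui, server)
```

Because the API mirrors shinydashboard, switching an existing app is often a matter of changing the loaded package and adjusting box and theme options.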
Bio
Dominik Krzemiński is an Open Source Tech Lead at Appsilon Data Science. He enjoys contributing to open source tools, primarily in R and Python. Dominik participated in Google Summer of Code program twice, developing Python tools supporting neuroscientific analysis. Before focusing entirely on Appsilon’s open source projects, he participated in bioinformatics data analysis and visualization projects. Currently, Dominik pursues PhD in computational neuroscience at Cardiff University. Privately, a fan of all kinds of board sports and capoeira.
Krzysztof Sprycha is graduate of Electronics and Telecommunication at Technical University in Gdańsk with specialization in Sound and Vision engineering. Before diving into data-science related topics he used to create apps for mobile platforms as well as games with Unreal Engine. Currently working at Appsilon Data Science as a Software Engineer. After hours he loves spinning records and rollerskating among other activities.
Date: 17-05-2019
Category: Workshops
Introduction to deep learning in R with Keras
Michal Maj
With the release of the R Keras package (https://keras.rstudio.com/) by JJ Allaire and Francois Chollet at the end of 2017 / beginning of 2018, the topic of artificial neural networks, and especially deep learning, became red-hot within the R community.
In this workshop you will get answers for the following questions:
• What are fully connected and convolutional neural networks?
• How to build a sequential model in Keras (the keras_model_sequential() function)?
• How to compile and fit neural networks in Keras (the compile() and fit() functions)?
• How to add regularization to neural networks (L1, L2, dropout)?
• How to save and load existing models?
• How to perform data ingestion and augmentation using generators?
• How to use pre-trained models and perform fine-tuning?
• How to use callbacks?
Please make sure to bring your laptop with an up-to-date version of R and RStudio, and install Keras beforehand:
Setup Keras (make sure to install the required prerequisites before installing Keras using the commands below)
install.packages("keras")
library(keras)
install_keras()                       # CPU version
install_keras(tensorflow = "gpu")     # GPU version (recommended)
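To check that the setup works, a minimal sequential model touching several of the workshop topics can be defined as follows (layer sizes and the commented-out training call are illustrative, not the workshop's exact architecture):

```r
# A small fully connected network built with the Keras sequential API.
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.3) %>%                     # dropout regularization
  layer_dense(units = 10, activation = "softmax")   # 10-class output

model %>% compile(
  loss      = "categorical_crossentropy",
  optimizer = "adam",
  metrics   = "accuracy"
)

# Training, saving and loading (assuming x_train / y_train exist):
# model %>% fit(x_train, y_train, epochs = 5, validation_split = 0.2)
# save_model_hdf5(model, "model.h5")
# model <- load_model_hdf5("model.h5")
```

If `compile()` runs without error, the Keras backend is installed correctly.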
Bio
I’m a Data Scientist and freelancer working primarily in R. Over the last 3+ years I have worked for companies such as Appsilon Data Science, Grupa Wirtualna Polska and PBS, solving complicated machine learning problems and building advanced Shiny applications. Nowadays I’m interested mostly in deep learning and its applications.
Date: 17-05-2019
Category: Workshops
Web Scraping with rvest
Andrew Collier
There’s a wealth of data available on the internet which can be used for data augmentation or to create entirely new datasets. In this workshop you’ll learn how to use R to selectively scrape content from websites.
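A minimal taste of the workflow (the URL and CSS selector are placeholders, not examples from the workshop itself):

```r
# Scrape all first-level headings from a page with rvest.
library(rvest)

page <- read_html("https://example.com")   # download and parse the HTML

headings <- page %>%
  html_nodes("h1") %>%         # select elements via a CSS selector
  html_text(trim = TRUE)       # extract their text content

headings
```

The same `html_nodes()` + extractor pattern generalises to tables (`html_table()`) and attributes (`html_attr()`).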
Andrew Collier is a Data Scientist. He works mostly in R, Python and SQL, but dabbles in a range of other technologies.
Date: 17-05-2019
Category: Workshops
Introduction to Tidyverse
Jakub Borkowski, Jan Wasilewski
No matter whether you have already encountered the R environment or have just heard about the general idea, we are here not only to introduce this brilliant, complete tool but also to provide you with further guidelines. R's greatest advantages are its simplicity and robustness, and the tidyverse is one example of such a simple yet powerful solution: it consists of all the packages an entry-level data scientist could need. So if you are considering learning data science in R, or maybe taking up an R programming course, this lecture will let you stop racking your brain and start immediately. The majority of the tidyverse components and their usage will be discussed.
You will find the following setup helpful during the meeting:
tidyverse installed and loaded. Use the install.packages("tidyverse") command.
A Slack account created; code chunks will be uploaded there. Link: https://slack.com/
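For a first taste of what the workshop covers, a typical tidyverse pipeline on the built-in mtcars data might look like this (illustrative, not the workshop's exact code):

```r
# Filter, group and summarise with dplyr verbs chained by the pipe.
library(tidyverse)

mtcars %>%
  as_tibble() %>%
  filter(cyl %in% c(4, 6)) %>%                      # keep 4- and 6-cylinder cars
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%      # one row per group
  arrange(desc(mean_mpg))
```

Each verb does one small thing, which is what makes the tidyverse approachable for beginners.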
Bio
Jakub Borkowski; Maths student, AI solutions enthusiast, pension founder. An unsteady soul keeps me on the run, in search of new development tracks. Recently, I have learned how to implement a handful of machine learning models, bake a soufflé properly and manage a small hotel. Moreover, with my colleague and a few Shiny developers, we decided to improve the healthcare system in our country. My main interests are developing AI-powered solutions, exploring foreign countries (mainly Africa and the Iberian Peninsula), instrumental jazz, tennis, and occasionally risk management methods and startup founding.
Jan Wasilewski; Studies financial mathematics, for the time being driven by a vision of getting more and more familiar with machine learning. During the semester he is eager to delve into a variety of more and less professional realms; while travelling he happens to be just a laid-back traveller. If you would not mind learning to the rhythm of African drums, you should definitely listen to the lecture.
Interests: Travels around Asia, Bayesian neural networks, underlying mathematical theory of AI, white blues and salsa.
Date: 17-05-2019
Category: Workshops
Nonnegative Matrix Factorization as a Tool
Nonnegative Matrix Factorization as a Tool to Segment Respondents in a High Dimensional Survey
A good segmentation is required to have the following features: it should be balanced, the segments should be distinctive, the discovered over- and under-indexed features within segments should create a meaningful story, and ideally the number of differentiating factors driving the segmentation should be small.
The last requirement is often a bottleneck in the scenario of a survey where respondents are asked an enormous number of questions.
One solution among many for this use case is nonnegative matrix factorization, which in one attempt segments both respondents and their features!
I'll present the concept of the NMF decomposition and its applications in R, with an explanation of the diagnostic plots.
Working with high-dimensional data? Often facing the need to group observations? Then this is a good presentation for you.
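As an illustration of the idea (using the CRAN NMF package on random data; the talk may use a different implementation and real survey responses), segmenting respondents with NMF can be sketched as:

```r
# Factor a nonnegative respondents-by-features matrix V into W %*% H.
library(NMF)

set.seed(1)
V <- matrix(runif(100 * 20), nrow = 100)  # 100 respondents x 20 survey features

res <- nmf(V, rank = 3)      # look for 3 latent segments
W <- basis(res)              # respondent-by-segment weights
H <- coef(res)               # segment-by-feature profiles

segments <- apply(W, 1, which.max)  # assign each respondent to a segment
table(segments)
```

Inspecting the rows of `H` shows which features drive each segment, which is exactly the "meaningful story" requirement above.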
Date: 18 May 2019
Category: Talks
predPCR: an automated classification of qPCR curves
predPCR: an automated classification of qPCR curves
Quantitative Real-Time PCR (qPCR) is a simple and high-throughput method common in research, forensics, and medicine. The popularity of qPCR has resulted in a plethora of software devoted to the analysis of qPCR data, including R packages. Despite that fact, there are no methods for the automated assessment of qPCR results. Thus, we propose the predPCR tool for reproducible classification of qPCR data. Our model benefits not only from advancements in machine learning, such as Bayesian optimization of hyperparameters, but also from theoretical advancements in the field of qPCR analysis. A combination of these two factors results in a powerful (AUC ~ 0.95) yet simple to interpret model. We present the complete analytical workflow, along with supplementary tools explaining the decision-making process of predPCR, tailored for users less fluent in machine learning. The machine learning approach enabled reliable, scalable, and automated qPCR curve classification with broad potential clinical and epidemiological applications. A web server is available from http://www.smorfland.uni.wroc.pl/shiny/predPCR/.
Bio:
Bioinformatician (Warsaw University of Technology), R enthusiast, a founding member of STWUR (Wroclaw R User Group) and the Why R? foundation.
Date: 18 May 2019
Category: Talks
Sit, relax, monitor. How to maintain models and how R can help?
Sit, relax, monitor. How to maintain models and how R can help?
In today’s financial institutions, analytical models are high-value strategic assets. As models are needed to run the business and keep up with regulations, they must be managed for optimal performance once in production. Model performance can degrade over time, no matter how good the model is. In this talk, I will discuss best practices for preventing an output disaster from a data science perspective.
Bio:
A graduate in computer science, econometrics and sociology at the University of Gdańsk. For 3 years now, a data analyst at the BEST debt collection company, where she deals with strategy optimization based on data mining and machine learning techniques. Previously associated with the PBS research company, where she created analytical solutions for the telecom, energy and financial industries.
Date: 18 May 2019
Category: Talks
Tuning & Bootstrapping Performance ML Model
Tuning & Bootstrapping Performance ML Model
Building machine learning models seems simple: we have many easily applicable libraries with functions for different purposes. However, it does not always turn out that our models bring the expected results. Sometimes it even happens that a model learned on the training set is evaluated highly, and after testing on new data sets it assesses reality incorrectly. Tuning hyperparameters, and modeling by re-sampling the data and performing inference on the re-drawn samples, may be a way to improve the quality of already built models. The main advantage of such bootstrap samples is the measurability of inference based on repeated real samples from the data.
Bio:
Monika Nawrocka is a data analyst at BEST S.A. and holds a doctorate in optimization and mathematical modeling. Beginning her career in machine learning, she sought constructive remarks about her own work from outstanding scientists during scientific conferences at renowned institutions, including the Massachusetts Institute of Technology, Osaka City University and Imperial College London.
Date: 18 May 2019
Category: Talks
A Case Study on Machine Learning Classification Algorithms in R
Olgun Aydin , Ezgi Nazman Çabuk
With the continuous development of machine learning packages in R, statistical classification algorithms have been widely used to learn the performance of discrete label outputs (groups) with the help of these packages in many fields. On the other hand, researchers can be confronted with data sets that have no groups, even though the data can be grouped by many other statistical methods. Data Envelopment Analysis (DEA), a non-parametric method based on linear programming principles, can group the data into efficient and inefficient units.
It is also important to evaluate these grouping results with classification algorithms. As is well known, Logistic Regression, Decision Trees, Artificial Neural Networks and Support Vector Machines are among the most widely used classification methods in the literature.
In this respect, we aimed to compare these classification performances on a DEA grouping result. First, we obtained a data set from the Social Progress Index, which includes data from 123 non-OECD countries on 7 indicators: undernourishment (% of population; 5 signifies ≤5), depth of food deficit (calories/undernourished person), deaths from infectious disease (deaths/100,000), traffic deaths (deaths/100,000), greenhouse gas emissions (CO2 equivalents per GDP), maternal mortality (deaths/100,000 live births), and life expectancy (in years). The undernourishment, depth of food deficit, deaths from infectious disease, and traffic deaths were considered as indicators related to life expectancy. Second, we constructed an input-oriented Charnes-Cooper-Rhodes (CCR) model (Charnes, Cooper, and Rhodes, 1978), where life expectancy is the output and the other indicators are inputs. As a result, efficient and inefficient countries were identified in terms of these indicators using the Benchmarking package in R.
Finally, we evaluated the grouping results in terms of accuracy rate, using several classification algorithms such as Logistic Regression, Decision Trees, Artificial Neural Networks and Support Vector Machines on this grouped data set, using the glm, keras and caret packages in R.
Bio:
Olgun Aydin is a PhD candidate at the Department of Statistics at Mimar Sinan University, and is studying deep learning for his thesis. He also works as a data scientist. Olgun is familiar with big data technologies, such as Hadoop and Spark, and is a very big fan of R. He has already published academic papers about the application of statistics, machine learning, and deep learning. He loves statistics, and loves to investigate new methods and share his experience with other people.
Ezgi Nazman Çabuk is a Research Assistant in the Gazi University Department of Statistics, where she has been studying for a PhD since 2015. She completed her Bachelor's degree at the Ege University Department of Statistics in 2012, and was an Erasmus student at the University of Hradec Kralove (Informatics and Management) between September 2011 and June 2012. From 2013 to 2015 she completed her Master of Science in Statistics at Gazi University, on combined cluster and discriminant analysis. She is currently working on her PhD thesis. Her research interests lie in the area of statistics, data mining and machine learning, ranging from theory to implementation.
Date: 18 May 2019
Category: Talks
A shiny application enabling facial attractiveness evaluation for purposes of plastic surgery
Lubomír Štěpánek
The ways to comprehensively evaluate facial attractiveness, and to compare facial images of patients before and after a facial plastic surgery procedure, are still unclear and require ongoing research.
In this study, we have developed a web-based Shiny application providing facial image processing, both manual and automated landmarking, facial geometry computations, and machine-learning models allowing identification of the geometric facial features associated with an increase in facial attractiveness after undergoing rhinoplasty, a common facial plastic surgery.
Patients’ facial image data were processed, landmarked and analysed using the application. Facial attractiveness was measured on a Likert scale by a board of independent observers. Built-in machine-learning approaches were used to select predictors of increased facial attractiveness after undergoing rhinoplasty.
The Shiny web framework makes it possible to develop a complex web interface, including an HTML, CSS and JavaScript front end and an R-based back end bridging the C++ library dlib, which performs the image computations. In addition, the connected shinyjs package offers clickable user-server interaction useful for the landmarking.
keywords: shiny, R, machine learning, facial attractiveness, plastic surgery
Bio:
Biostatistician, Software Developer, Junior Lecturer, PhD Candidate at First Faculty of Medicine, Charles University & Faculty of Biomedical Engineering, Czech Technical University in Prague (CZ)
I am a Master’s student in Statistics and a Ph.D. candidate in Biomedical Informatics focused on medical decision-making systems and facial attractiveness evaluation for purposes of plastic surgery. Moreover, I am a big R enthusiast, a medical doctor – former oncologist, and a junior university lecturer in Introductory Informatics, Statistics and R courses. Besides, I work as a data analyst for a small company consulting machine-learning, econometrics, actuarial sciences, and insurances.
Date: 18 May 2019
Category: Talks
Elasticsearch and R - deal with it!
Bartłomiej Staszkiewicz
As larger quantities of data are being stored and managed by enterprises of all kinds, NoSQL storage solutions are becoming more popular. Elasticsearch is a popular, high-performance NoSQL data storage option, but it is often unfamiliar to end users and difficult to navigate for day-to-day analytic tasks. It provides a distributed full-text search engine with an HTTP web interface and schema-free JSON documents.
This presentation will briefly discuss the benefits and disadvantages of Elasticsearch on Amazon Web Services (Amazon Elasticsearch Service) and describe in detail, with examples, how to efficiently transfer data between ES and R. Three packages designed for this work will be introduced: elastic, elasticsearchr and uptasticsearch (an R package with a similar counterpart for Python). In addition, methods for dealing with nested data (from JSON) and converting it to a data frame will be presented.
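As a rough sketch of one of the three packages (the elastic package; the host, index and query here are placeholders, and the exact call signatures depend on the package version):

```r
# Query an Elasticsearch index from R and flatten the JSON hits.
library(elastic)

conn <- connect(host = "localhost", port = 9200)           # cluster connection
res  <- Search(conn, index = "logs", q = "status:error",   # Lucene query string
               size = 100)

# Hits come back as nested lists parsed from JSON; flatten them to a data frame.
hits <- res$hits$hits
df   <- do.call(rbind, lapply(hits, function(h) as.data.frame(h$`_source`)))
```

Handling deeper nesting is where the conversion methods covered in the talk come in.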
Bio:
Data Analyst at Excedo (joint venture of Nikkei Japan and Financial Times)
Date: 18 May 2019
Category: Talks
drake: reproducible workflow management in R
Dominik Rafacz
drake (Landau 2018, https://cran.r-project.org/package=drake) is an R package for reproducible data analysis workflows. Inspired by the GNU Make build system, drake controls all tasks involved in a data analysis. Its abstraction system incorporates all analytic steps, from data acquisition to report generation. The package orchestrates the execution of the workflow and automatically caches the results of every step. Additionally, drake monitors the dependencies between these tasks, pointing out when the alteration of a single task affects the others. It reduces the time needed to rerun the workflow, as drake evaluates only the altered steps while relying on cached results for the non-altered ones. Finally, drake simplifies access to parallelization backends and streamlines running the workflow on multicore systems or even supercomputers. drake pipelines, designed directly in R and exported as R objects, further enhance workflow reproducibility. During my presentation, I will show how, with the help of containers and the drake package, to reach the pinnacles of reproducibility in R.
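A toy drake plan illustrating the idea (the targets, file name and model here are illustrative, not examples from the talk):

```r
# Declare the analysis as a plan of named targets with dependencies.
library(drake)

plan <- drake_plan(
  raw    = read.csv(file_in("data.csv")),   # file_in() tracks the file itself
  clean  = na.omit(raw),                    # depends on `raw`
  model  = lm(y ~ x, data = clean),         # depends on `clean`
  report = summary(model)                   # depends on `model`
)

make(plan)   # runs only targets that are missing or outdated
```

After editing, say, the cleaning step, a second `make(plan)` reruns just `clean` and its downstream targets, pulling everything else from the cache.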
Bio:
Student of Data Analysis and Engineering at Warsaw University of Technology. R enthusiast, passionate about data analysis and bioinformatics
Date: 18 May 2019
Category: Lightning Talks
Bayesian inference in big data analysis
Katarzyna Sidorczuk
The ‘large p small n’ problem occurs when the number of available covariates is significantly larger than the sample size. Such short fat data is a common issue in medical studies, where the availability of patients is limited. Frequentist methods fail in such cases, as the correction for multiple testing results in very high p-values. In consequence, it is difficult to distinguish significant effects from noise. To address these problems, we used the Stan software for Bayesian modeling and inference. The R interface to Stan is available as the rstan package, allowing users to easily fit models from R and access their outputs. The Bayesian approach provides a full value distribution over, e.g., the effect size, which is more interpretable than p-values. We applied this method to the analysis of data derived from peptide arrays, collections of short protein fragments used in research and diagnostics. They allow testing thousands of peptides simultaneously, bringing an advantage to the search for new biomarkers used in the clinical diagnosis of different diseases, including cancers. We compared frequentist and Bayesian methods and present the advantages of the latter in big data analysis.
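A minimal rstan sketch showing the fitting interface (the actual peptide-array model in the talk is far more elaborate; the toy model and priors below are illustrative):

```r
# Fit a simple normal model with rstan and inspect the posterior.
library(rstan)

model_code <- "
data { int<lower=1> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model {
  mu ~ normal(0, 10);       // weakly informative priors
  sigma ~ cauchy(0, 5);
  y ~ normal(mu, sigma);
}
"

set.seed(42)
fit <- stan(model_code = model_code,
            data = list(N = 20, y = rnorm(20)),
            iter = 1000, chains = 2)

print(fit, pars = c("mu", "sigma"))  # full posteriors instead of a p-value
```

The posterior summaries returned here are the "value distribution over the effect size" referred to above.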
Bio:
Bioinformatician, MSc student at the Faculty of Biotechnology, University of Wrocław, member of STWUR (Wrocław R User Group).
Date: 18 May 2019
Category: Lightning Talks
HaDeX - analysis of HDX-MS data
Complex and precise experiments like mass spectrometry generate an enormous amount of data. Such datasets require manual pre-processing, which, due to their size, is tedious, time-consuming and error-prone. To automatise these steps and provide a whole analytic workflow, we present HaDeX, an R package for the analysis and visualization of Hydrogen/Deuterium Exchange Mass Spectrometry (HDX-MS) data. As we do not want to limit our tools to users familiar with programming, HaDeX is also available as a Shiny web server. It facilitates complete data analysis, including quality control and a Bayesian framework for differential analysis. The sheer volume of data requires highly efficient data processing, which is ensured by the data.table package. Our tool also provides a collection of data visualizations that comprehensively summarize HDX-MS results. In addition, our analytic methodology is discussed in depth in the package vignette. The package is available on GitHub: https://github.com/michbur/HaDeX.
Bio:
Bioinformatician, Data Scientist and PhD Student in the Institute of Biochemistry and Biophysics Polish Academy of Sciences
Date: 18 May 2019
Category: Lightning Talks
AmyloGram: analysis of proteins in R
The structure, and therefore function, of proteins is encoded in the linear sequence of amino acids. Our toolkit, the biogram R package, provides a set of useful tools for encoding protein sequences into features understandable by machine learning algorithms. Our software, inspired by natural language processing, extracts n-grams of amino acids from proteins and selects only the most informative ones using the Quick Permutation Test (QuiPT), which we developed. We present the advantages of our approach using AmyloGram, an R package and Shiny server (link) for the prediction of amyloids, proteins associated with a number of clinical disorders (e.g., Alzheimer’s, Creutzfeldt-Jakob’s and Huntington’s diseases). Amyloid proteins are extremely diverse sequence-wise, but all of them can undergo a unique self-aggregation. AmyloGram effectively recognizes the patterns responsible for this behavior (AUC = 0.8972), outperforming existing amyloid-predicting software. Moreover, AmyloGram’s predictions were verified experimentally, as our tool led to the discovery of a novel amyloid protein, MspA, produced by Methanospirillum hungatei JF-1. www.smorfland.uni.wroc.pl/shiny/AmyloGram/
Bio:
Experimentalist and bioinformatician, PhD student in the BioTechNan program at the Faculty of Biotechnology, University of Wrocław.