Digital Skills

Advanced Data Management and Manipulation using R

The analysis of large data sets (“big data”) is becoming increasingly important in science and elsewhere. In this course, you will learn how to use R to manage and manipulate large data sets, i.e. to sort, merge, subset, aggregate and reshape data, including outlier detection and gap filling algorithms.

For advanced data manipulation, we are going to use recent developments such as plyr/dplyr (“A Grammar of Data Manipulation”), the pipe operator (%>%) for simpler R coding and data.table for the fast aggregation of large data sets. Furthermore, we will have a closer look at R database connections, MySQL queries and the creation of new databases from R. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.

Individual Performance and Assessment: In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (24 learning hours)
Biennial (Spring semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich

Plant Science Data Analysis Clinic

In this course, students will bring a problem from their PhD or master’s project to share with the class and will get tailored advice about the appropriate statistical analysis or approach for their data. They will work on their problem with guidance from the instructor and peer support through regular online meetings. At the end of the course, students will present their solution to the class.

Workshop 1: Introduction and organization; students’ presentations of their problems.
Weeks 2-11: Weekly online group classes (1.5 hours).
Workshop 2: Final presentations of students’ projects, discussion and feedback.

Learning Objectives:
- Students will gain hands-on experience using GitHub within RStudio for version control and collaboration.
- Students will develop an R script tailored to analyze their specific research problem.
- Students will contribute to a shared repository of R scripts, enhancing collective learning and code reusability.

Individual Performance and Assessment: Active participation includes a mandatory 6 hours of in-person engagement, 10 online classes, and the submission of an R script presenting the solution to the problem.

1 ECTS (24 learning hours)
Annual (Fall Semester)
Lecturer: Dr. Natacha Bodenhausen
Location: University of Basel
Course Code in UNIBAS VVZ: 76283

Reporting using Quarto, R Markdown & Shiny

R Markdown, Quarto and Shiny are powerful R tools for static and dynamic reports, publications and dashboards that can be made fully reproducible using a highly intuitive notebook interface. In this course, you will learn to create Markdown documents consisting of code, text and the YAML header. We will use CSS files to format our reports and look at further customizations such as section headings, citations, cross-references, animations, interactive plots, tables, comments and many more. While the main focus of this course will be on R Markdown and Quarto, we will also introduce Shiny for interactive web applications, shinythemes and htmlwidgets, and finally learn how to embed Shiny into R Markdown and Quarto documents. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.

Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend *both* course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Spring Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich

Scientific Visualisation Using R

Visualisations can decide the success of scientific lectures, poster presentations or journal articles. In this course you will get a very brief introduction to general design principles and guidelines for data visualisations. Based on this theoretical framework, we will spend most of the course time learning how to use R as a powerful graphics tool to create a wide range of customised graphics that include, but are not limited to, traditional scatter plots, bar plots, mosaic plots, box plots, density plots, violin plots and interactive graphics, as well as grid-based geographic maps and state-of-the-art multipanel conditioning plots. You will learn about the two pillars of the R graphics system, i.e. Traditional and Grid graphics. The course focuses on the latter system and on more recent developments such as ggplot2 and other advanced libraries based on the “Grammar of Graphics” concept. Depending on the course progress, there will be scope for students to work on small projects and / or their own data sets.

Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich or University of Basel

Introduction to Machine learning for Plant Scientists - Module 1 & 2

This course will introduce machine learning with emphasis on plant sciences. In Module 1 we will discuss topics like data pre-processing, feature extraction, clustering, regression, and classification. In Module 2, we will take first steps towards modern deep learning. Both modules consist of 50% lectures and 50% hands-on programming in Python, where students will directly implement the theory they have learned in software that helps solve problems in plant sciences. Module 2 also includes homework that has to be submitted. In addition, a discussion round on an additional day will provide feedback on the individual assignments and on students' own data processing pipelines for Module 2.

Students with a non-technical background will be introduced to machine learning. Emphasis is on hands-on programming and implementation of basic machine learning concepts to demystify the subject, equip participants with all necessary insights and tools to develop their own solutions, and to come up with original ideas for problems related to the context of plant sciences. Particular importance is placed on reconciling predictions generated by automated processes with reality. By the end of the course, students will be able to decide where (and where not) to use machine learning, what method to choose for what research task, and how to critically evaluate model outputs in the context of plant sciences.

Prior Knowledge: Students should bring their laptops to the exercises because we will program on laptops directly. It is required that students enrolling in this course have successfully passed a course in basic data science and are familiar with programming (preferably in Python). Teaching assistants will help with all programming exercises.

Individual Performance and Assessment: Participation in Module 1 yields 1 ECTS. Participation in Module 1 and Module 2 and successful fulfilment of the homework assignments yield 3 ECTS.

1-3 ECTS (30-90 learning hours)
Annually (Autumn Semester)
Lecturer: Prof. Dr. Jan Dirk Wegner (UZH)
Location: ETH Zurich

Compositional Data Analysis

Compositional data analysis is a methodology used to describe the parts/compounds of a whole, conveying relative information. Typical examples in different fields are: geology (geochemical elements), medicine (body composition: fat, bone, lean), food industry (food composition: fat, sugar, etc.), chemistry (chemical composition), ecology (abundance of different species), agriculture (nutrient balance ionomics), environmental sciences (soil contamination), plant science (water, carbon and nitrogen content, composition of soil or microbial communities, species composition) and genetics (genotype frequency). This type of data appears in most applications, and the importance of consistent statistical methods should not be underestimated. Compositional data analysis is the solution to the problem of how to perform a proper statistical analysis of this type of data, i.e. to solve the problem of spurious correlation, as it was named by Karl Pearson. This course will introduce compositional data analysis with emphasis on plant sciences.

Individual Performance and Assessment: tba.

1 ECTS (24 learning hours)
Biennial (fall semester 2024)
Lecturer: Dr. Matthias Templ (ZHAW)

General Linear and Linear Mixed Models in R

Together with the Ecology Program

In this 6-day block course, the participants will learn to analyse experimental and observational data with general linear and linear mixed models. The course will be held as a workshop, with lecture-type parts introducing important concepts and exercises in which the participants will work on data sets provided or their own data. A key goal will be that the participants learn to recognize the essential structure of data sets and to implement it adequately in statistical models with fixed and random effects. Specifically, the course will deal with issues of experimental design, analysis of variance, hypothesis testing, variance components, models with multiple error terms as well as balanced and unbalanced data.
This course is not about generalized linear mixed models [GLMM, non-normal data], although it is possible to deal with such data in the projects.

Individual Performance and Assessment: In order to obtain the ECTS point, each participant is required to actively participate in the case-study work, discussions, and presentations during the course days.

1 ECTS (30 learning hours)
Annually (Spring semester)
Lecturer: Dr. Pascal Niklaus (UZH)
Location: University of Zurich

Introduction to Genome-Wide Association Studies (GWAS)

In collaboration with URPP

In this course, we will discuss the pre-eminent tool for identifying genes that underlie natural phenotypic variation: genome-wide association studies (GWAS). Originally developed by human geneticists to fine-map genes that underlie human disease, GWAS have the capacity to revolutionize all of the biological sciences. Plant biologists, in particular, have already taken advantage of improvements in sequencing technology in order to characterize genetic variation across the genomes of several species. Doing so has enabled the use of GWAS to fine-map genes that underlie ecologically and agriculturally relevant traits. At the beginning of the course, we will provide an introduction to GWAS. Then, we will discuss the history of gene mapping and the genetic and statistical background on which GWAS are based. The course has a strong practical component, and students will gain experience analyzing real data on the computer. At the end of the course, students will be able to interpret GWAS results and carry out their own analyses. We will also discuss basic concepts (and challenges) in population genetics, genomics, and quantitative genetics. For preparation, the students will have to read some literature which will be sent out prior to the course.

Individual Performance and Assessment: This 2-day course will be split between lectures and tutorials. Required: attendance, active participation during the exercises (16 hours) and handing in of an individual exercise after the course (14 hours of preparation work).

1 ECTS (30 learning hours)
Annually (Spring semester)
Lecturer: Prof. Thomas Wicker
Location: University of Zurich

Introduction to R

This basic introduction to R focuses on the technical aspects of data organisation, handling, analysis and presentation using the widely used command line program R. This course is not an introduction to statistics, but lays the foundation for the efficient use of statistical applications of R, which are introduced in other courses. No previous experience with programming languages is required. The course addresses students who would like to become familiar with a single, powerful and freely available alternative to spreadsheet programmes (Excel), other, less flexible commercial statistical packages (SPSS, JMP, Minitab etc.) and graphics software for presenting data (Excel, SigmaPlot etc.). Topics covered include the proper organisation of the workspace, reading and writing data files, using R as a calculator, using logical operators, manipulating data frames, summarising and aggregating data, programming ‘ifelse’ statements, loops and short routines, handling time fields in data frames, and drawing and customising graphs. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.

Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich

Introduction to Structural Equation Modeling

In collaboration with PhD Program Ecology.

Structural equation models are increasingly used in ecology and evolution to disentangle the complex direct and indirect interactions that occur in nature. This course is an introduction to structural equation modeling (SEM) aimed at biologists who want to answer questions in observational and experimental settings. For more details, see abstract.

Individual Performance and Assessment: Active participation throughout the course. Full attendance. The assignment must be completed to obtain 1 ECTS.

1 ECTS (30 learning hours)
Annually (Autumn Semester 2023)
Lecturer: Frank Pennekamp (UZH)
Location: University of Zurich

Statistical Modelling

In statistical modeling, the relationships between a response variable and one or more explanatory variables are estimated. In this class, we consider the theory of linear regression with one or more explanatory variables. Moreover, we also study robust methods and nonlinear models. Several numerical examples will illustrate the theory. You will learn to perform a regression analysis and interpret the results correctly. We will use the statistical software R to get hands-on experience with this. You will also learn to interpret and critique regression analyses done by others.

Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend all course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Biennial (Spring Semester)
Lecturer: Dr. Matthias Templ (ZHAW) and Barbara Templ (ETHZ)
Location: ETH Zurich

Explore the responsible use of AI in generating scientific texts, images, audio and code for scientific purposes

Interested in using generative AI in your scientific work processes, but in a responsible and ethically responsive way? This block course for PhD students allows experimenting with generative AI to generate texts, images and audio that can be used in science, from scientific presentations to publications. PhD students are invited to experiment together in a problem-based setting around concerns and critical topics when using AI-based tools with research data and private data, and for generating output such as images, text, short videos, audio or code. Invited experts will share their knowledge, for example on generating scientific illustrations or customizing AI models, in several hands-on workshops.

1 ECTS (30 learning hours)
New Course from autumn semester 2024 and spring semester 2025
Lecturers: Melanie Paschke, Jeanine Reutemann, Thomas Frei, Réka Mihálka, Alexis Shakas, Paulina Zybinska and other experts
Location: ETH Zurich

Get R_eady: Dynamic Reporting & Reproducibility in Research

by UZH School for Transdisciplinary Studies

Larger collections of data are becoming increasingly available. To exploit their potential, statistical analysis skills are needed. The direct link between data and visualization/reporting of results is highly relevant in all empirical research disciplines, as several scientific fields have recently been criticized for lack of reproducibility.

Dynamic reporting tools can be used to directly link data, visualization and analysis outputs, allowing for rapid adaptation after possible changes in the dataset, e.g. after data preparation, validation or in the context of manuscript revision.

Tailored to applications in empirical research, the course covers the basics of dynamic programming in R, including examples of dynamic reports for presentations, manuscripts, and HTML websites. Research methodology is reflected upon, especially in relation to reproducibility, Open Science and transdisciplinarity. Exemplary reports from different disciplines will be compiled and presented by the students.

Open and Reproducible Science: Dependable Computations and Statistics

by UZH School for Transdisciplinary Studies

The course is divided into five topics. Version control through GitLab and the tricks and techniques learned in reproducible computing are practiced throughout the seven course weeks. Students acquire and practice skills in R programming, such as the writing and use of bespoke functions as well as unit testing. The practice part includes several aspects of good statistical practice, such as the correct use and interpretation of p-values, sample size calculations, and multiple and sequential testing. The course concludes with a summary look at metadata and their importance for reproducibility.

Version control
Reproducible computing with R
Questionable Research Practices
Good statistical practice
Tools in R for metadata handling

Education at FGCZ

Courses at the Functional Genomics Center Zurich
