The analysis of large data sets (“big data”) is becoming increasingly important in science and elsewhere. In this course, you will learn how to use R to manage and manipulate large data sets, i.e. to sort, merge, subset, aggregate and reshape data, including outlier detection and gap filling algorithms.
For advanced data manipulation, we are going to use recent developments such as plyr/dplyr (“A Grammar of Data Manipulation”), the pipe operator (%>%) for simpler R coding, and data.table for the fast aggregation of large data sets. Furthermore, we will take a closer look at R database connections, MySQL queries and the creation of new databases from R. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.
Individual Performance and Assessment: In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.
1 ECTS (24 learning hours)
Annually (Spring semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich
R Markdown, Quarto and Shiny are powerful R tools for creating static and dynamic reports, publications and dashboards in a fully reproducible way using a highly intuitive notebook interface. In this course, you will learn to create Markdown documents consisting of code, text and the YAML header. We will use CSS files to format our reports and look at further customizations such as section headings, citations, cross-references, animations, interactive plots, tables, comments and many more. While the main focus of this course will be on R Markdown and Quarto, we will also introduce Shiny for interactive web applications, shinythemes and htmlwidgets, and, last but not least, learn how to embed Shiny into R Markdown and Quarto documents. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.
Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend *both* course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.
1 ECTS (30 learning hours)
Annually (Spring Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich
Visualisations can determine the success of scientific lectures, poster presentations or journal articles. In this course you will get a very brief introduction to general design principles and guidelines for data visualisations. Based on this theoretical framework, we will spend most of the course time learning how to use R as a powerful graphics tool to create a wide range of customised graphics, including (but not limited to) traditional scatter plots, bar plots, mosaic plots, box plots, density plots, violin plots and interactive graphics, as well as grid-based geographic maps and state-of-the-art multipanel conditioning plots. You will learn about the two pillars of the R graphics system, i.e. Traditional and Grid graphics. The course focuses on the latter system and on more recent developments such as ggplot2 and other advanced libraries based on the “grammar of graphics” concept. Depending on the course progress, there will be scope for students to work on small projects and / or their own data sets.
Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.
1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich
This course will introduce machine learning with emphasis on plant sciences. In Module 1, we will discuss topics like data pre-processing, feature extraction, clustering, regression, and classification. In Module 2, we will take first steps towards modern deep learning. Both modules consist of 50% lectures and 50% hands-on programming in Python, where students will directly implement the theory they have learned as software that helps to solve problems in plant sciences. Module 2 also includes homework that has to be submitted. In addition, a discussion round on an additional day will provide feedback on the individual assignments and on the students' own data processing pipelines for Module 2.
Students with a non-technical background will be introduced to machine learning. Emphasis is on hands-on programming and implementation of basic machine learning concepts to demystify the subject, equip participants with all necessary insights and tools to develop their own solutions, and to come up with original ideas for problems related to plant sciences. Particular importance is placed on reconciling predictions generated by automated processes with reality. By the end of the course, students will be able to decide where (and where not) to use machine learning, what method to choose for what research task, and how to critically evaluate model outputs in the context of plant sciences.
Prior Knowledge: Students should bring their laptops to the exercises because we will program on laptops directly. It is required that students enrolling in this course have successfully passed a course in basic data science and are familiar with programming (preferably in Python). Teaching assistants will help with all programming exercises.
Individual Performance and Assessment: Participation in Module 1 yields 1 ECTS. Participation in Module 1 and Module 2 and successful fulfilment of the homework assignments yields 3 ECTS.
1-3 ECTS (30-90 learning hours)
Annually (Autumn Semester)
Lecturer: Prof. Dr. Jan Dirk Wegner (UZH)
Location: ETH Zurich
In collaboration with URPP
The aim of this course is to introduce students to the Linux/Unix command line and shell scripting by taking a hands-on approach. Short lectures present an overview of the Linux/Unix command line focusing on commands for working with files/directories and text files. Students also practice how to install and run software. Participants learn how to write simple shell scripts as they are often used to automate repetitive tasks and to build software pipelines. They will also discuss recommendations for reproducible research such as good coding practices. The course is composed of lectures and guided computer exercises. Students will spend most of the time solving computer exercises.
Individual Performance and Assessment: Attendance at lectures and active participation in the hands-on exercises are required.
0 ECTS (8 learning hours)
Annually (Autumn semester)
Lecturer: Dr. Deepak Tanwar (UZH)
Location: ETH Zurich
Compositional data analysis is a methodology used to describe the parts or components of a whole, which convey only relative information. Typical examples in different fields are: geology (geochemical elements), medicine (body composition: fat, bone, lean), food industry (food composition: fat, sugar, etc.), chemistry (chemical composition), ecology (abundance of different species), agriculture (nutrient balance ionomics), environmental sciences (soil contamination), plant science (water, carbon and nitrogen content, composition of soil or microbial communities, species composition) and genetics (genotype frequency). This type of data appears in most applications, and the importance of consistent statistical methods for it cannot be overstated. Compositional data analysis addresses the problem of how to perform a proper statistical analysis of this type of data, i.e. how to avoid what Karl Pearson named spurious correlation. This course will introduce compositional data analysis with emphasis on plant sciences.
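As a small illustration of why relative information needs special treatment, the sketch below applies the centered log-ratio (clr) transform, one commonly used log-ratio transform for compositional data. It is not necessarily the method taught in this course, and the sample values and function names are hypothetical.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs,
    i.e. the log of each part divided by the geometric mean of all parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical plant sample: water, carbon and nitrogen content (arbitrary units)
sample = [70.0, 25.0, 5.0]
composition = closure(sample)
clr_values = clr(composition)

# The same sample measured in different units gives the same clr values,
# because only the ratios between parts carry information.
rescaled_clr = clr(closure([700.0, 250.0, 50.0]))

print(composition)
print(clr_values)
print(rescaled_clr)
```

The clr values always sum to zero, and they do not change when all parts are multiplied by the same factor. This is the sense in which a composition carries relative rather than absolute information.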
Individual Performance and Assessment: tba.
1 ECTS (24 learning hours)
Biannually (fall semester 2024)
Lecturer: Dr. Matthias Templ (ZHAW)
together with Ecology Program
In this 6-day block course, the participants will learn to analyse experimental and observational data with general linear and linear mixed models. The course will be held as a workshop, with lecture-type parts introducing important concepts and exercises in which the participants will work on data sets provided or their own data. A key goal will be that the participants learn to recognize the essential structure of data sets and to implement it adequately in statistical models with fixed and random effects. Specifically, the course will deal with issues of experimental design, analysis of variance, hypothesis testing, variance components, models with multiple error terms as well as balanced and unbalanced data.
This course is not about generalized linear mixed models [GLMM, non-normal data], although it is possible to deal with such data in the projects.
Individual Performance and Assessment: In order to obtain the ECTS point, each participant is required to actively participate in the case-study work, discussions, and presentations during the course days.
1 ECTS (30 learning hours)
Annually (Spring semester)
Lecturer: Dr. Pascal Niklaus (UZH)
Location: University of Zurich
In collaboration with URPP
In this course, we will discuss the pre-eminent tool for identifying genes that underlie natural phenotypic variation: genome-wide association studies (GWAS). Originally developed by human geneticists to fine-map genes that underlie human disease, GWAS have the capacity to revolutionize all of the biological sciences. Plant biologists, in particular, have already taken advantage of improvements in sequencing technology in order to characterize genetic variation across the genomes of several species. Doing so has enabled the use of GWAS to fine-map genes that underlie ecologically and agriculturally relevant traits. At the beginning of the course, we will provide an introduction to GWAS. Then, we will discuss the history of gene mapping and the genetic and statistical background on which GWAS are based. The course has a strong practical component, and students will gain experience analyzing real data on the computer. At the end of the course, students will be able to interpret GWAS results and carry out their own analyses. We will also discuss basic concepts (and challenges) in population genetics, genomics, and quantitative genetics. For preparation, the students will have to read some literature which will be sent out prior to the course.
Individual Performance and Assessment: This 2-day course will be split between lectures and tutorials. Required: attendance, active participation during the exercises (16 hours) and handing in of an individual exercise after the course (14 hours of preparation work).
1 ECTS (30 learning hours)
Annually (Spring semester)
Lecturer: Prof. Thomas Wicker
Location: University of Zurich
This basic introduction to R focuses on the technical aspects of data organisation, handling, analysis and presentation using the widespread command line program R. This course is not an introduction to statistics, but lays the foundation for the efficient use of the statistical applications of R, which are introduced in other courses. No previous experience with programming languages is required. The course addresses students who would like to become familiar with a single, powerful and freely available alternative to spreadsheet programmes (Excel), other, less flexible commercial statistical packages (SPSS, JMP, Minitab etc.) and graphics software for presenting data (Excel, SigmaPlot etc.). Topics covered include the proper organisation of the workspace, reading and writing data files, using R as a calculator, using logical operators, manipulating data frames, summarising and aggregating data, programming ‘ifelse’ statements, loops and short routines, handling time fields in data frames, and drawing and customising graphs. Depending on the course progress, there will be scope for individuals to work on small projects and / or their own data sets.
Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.
1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: Dr. Jan Wunder
Location: ETH Zurich
In collaboration with PhD Program Ecology.
Structural equation models are increasingly used in ecology and evolution to disentangle the complex direct and indirect interactions that occur in nature. This course is an introduction to structural equation modeling (SEM) aimed at biologists who want to answer questions in observational and experimental settings. For more details, see abstract.
Individual Performance and Assessment: Active participation throughout the course. Full attendance. The assignment must be completed to obtain 1 ECTS.
1 ECTS (30 learning hours)
Annually (Autumn Semester 2023)
Lecturer: Frank Pennekamp (UZH)
Location: University of Zurich
In statistical modeling, the relationships between a response variable and one or more explanatory variables are estimated. In this class, we consider the theory of linear regression with one or more explanatory variables. Moreover, we also study robust methods and nonlinear models. Several numerical examples will illustrate the theory. You will learn to perform a regression analysis and interpret the results correctly. We will use the statistical software R to get hands-on experience with this. You will also learn to interpret and critique regression analyses done by others.
Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend all course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.
1 ECTS (30 learning hours)
Annually (Spring Semester)
Lecturer: Dr. Matthias Templ (ZHAW) and Barbara Templ (ETHZ)
Location: ETH Zurich
Interested in using generative AI in your scientific work processes, but in a responsible and ethically responsive way? This block course for PhD students allows experimenting with generative AI to generate texts, images and audio that can be used in science, from scientific presentations to publications. PhD students are invited to experiment together in a problem-based setting around concerns and critical topics that arise when using AI-based tools with research data and private data, and for generating output such as images, text, short videos, audio or code. In several hands-on workshops, invited experts will share their knowledge, for example on the generation of scientific illustrations or the customization of AI models.
1 ECTS (30 learning hours)
New Course in Autumn semester 2024
Lecturers: Melanie Paschke, Jeanine Reutemann and other experts
Location: ETH Zurich
by UZH School for Transdisciplinary Studies
Larger collections of data are becoming increasingly available. To exploit their potential, statistical analysis skills are needed. The direct link between data and visualization/reporting of results is highly relevant in all empirical research disciplines, as several scientific fields have recently been criticized for lack of reproducibility.
Dynamic reporting tools can be used to directly link data, visualization and analysis outputs, allowing for rapid adaptation after possible changes in the dataset, e.g. after data preparation, validation or in the context of manuscript revision.
Tailored to applications in empirical research, the course covers the basics of dynamic programming in R, including examples of dynamic reports for presentations, manuscripts, and html websites. Research methodology is reflected upon, especially in relation to reproducibility, Open Science and transdisciplinarity. Exemplary reports from different disciplines will be compiled and presented by the students.
by UZH School for Transdisciplinary Studies
The course is divided into five topics. Version control through GitLab and the tricks and techniques learned in Reproducible computing are practiced throughout the seven course weeks. Students acquire and practice skills in R programming, such as writing and using bespoke functions as well as unit testing. The practice part includes several aspects of Good Statistical Practice, such as the correct use and interpretation of p-values, sample size calculations, and multiple and sequential testing. The course concludes with a summary look at metadata and their importance for reproducibility.