Digital Skills

Advanced Data Management and Manipulation using R

The analysis of large data sets (“big data”) is becoming increasingly important in science and elsewhere. In this course, you will learn how to use R to manage and manipulate large data sets, i.e., to sort, merge, subset, aggregate and reshape data, including outlier detection and gap-filling algorithms.

For advanced data manipulation, we are going to use novel developments such as plyr/dplyr (“A Grammar of Data Manipulation”), the pipe operator (%>%) for simpler R code, and data.table for fast aggregation of large data sets. Furthermore, we will take a closer look at connecting R to databases, MySQL queries and creating new databases from R. Depending on the course progress, there will be scope for individuals to work on small projects and/or their own data sets.

Individual Performance and Assessment: In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (24 learning hours)
Annually (Spring semester)
Lecturer:
Dr. Jan Wunder

Compositional Data Analysis

Compositional data analysis is a methodology for describing the parts (components) of a whole, which convey relative information. Typical examples from different fields are: geology (geochemical elements), medicine (body composition: fat, bone, lean), food industry (food composition: fat, sugar, etc.), chemistry (chemical composition), ecology (abundance of different species), agriculture (nutrient balance ionomics), environmental sciences (soil contamination), plant science (water, carbon and nitrogen content, composition of soil or microbial communities, species composition) and genetics (genotype frequencies). This type of data appears in many applications, and the importance of consistent statistical methods for it cannot be overstated. Compositional data analysis addresses how to perform a proper statistical analysis of such data, i.e., how to avoid what Karl Pearson called spurious correlation. This course introduces compositional data analysis with an emphasis on plant sciences.
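
To illustrate what "relative information" means in practice, the sketch below implements the centred log-ratio (clr) transformation, one standard tool in compositional data analysis. It is written in Python rather than taken from the course material, assumes strictly positive parts, and uses made-up sample values. Two samples with the same proportions but different totals give the same clr coordinates, so the transformation depends only on the ratios between parts.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centred log-ratio transform: log of each part relative to the geometric mean."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]

# Hypothetical plant samples: water, carbon and nitrogen content (arbitrary units).
sample_a = [60.0, 30.0, 10.0]
sample_b = [120.0, 60.0, 20.0]  # same proportions, twice the total

print(closure(sample_a))
print(clr(sample_a))
print(clr(sample_b))  # equal to clr(sample_a) up to floating-point rounding
```

Because the clr coordinates of a composition always sum to zero, they are analysed with methods that respect this constraint rather than treated as independent variables.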

Individual Performance and Assessment: tba.

1 ECTS (24 learning hours)
Every two years (fall semester 2024)
Lecturer:
Dr. Matthias Templ (ZHAW)

General Linear and Linear Mixed Models in R

together with Ecology Program 

In this 6-day block course, participants will learn to analyse experimental and observational data with general linear and linear mixed models. The course will be held as a workshop, with lecture-type parts introducing important concepts and exercises in which participants work on the data sets provided or on their own data. A key goal is that participants learn to recognize the essential structure of data sets and to implement it adequately in statistical models with fixed and random effects. Specifically, the course will deal with issues of experimental design, analysis of variance, hypothesis testing, variance components, models with multiple error terms, and balanced and unbalanced data.
This course is not about generalized linear mixed models (GLMMs, non-normal data), although it is possible to deal with such data in the projects.

Individual Performance and Assessment: In order to obtain the ECTS point, each participant is required to actively participate in the case-study work, discussions and presentations during the course days.

1 ECTS (30 learning hours)
Annually (first: spring 2022)
Lecturer: Dr. Pascal Niklaus (UZH)

Introduction to Functional Genomics

The aim of the course is to enable participants to design and interpret functional genomics experiments and to critically evaluate the available technical options. Demonstrations of technologies available at the FGCZ will be included. In the postgenomic era, the emphasis of research is shifting from merely accumulating sequence data towards identifying the functional significance of gene products. The goal of functional genomics is to understand the relationship between genome sequence and phenotype. An important aspect is the measurement of molecular activities with the high-throughput ‘omics’ technologies transcriptomics, proteomics and metabolomics. The course comprises a theoretical introduction to mass spectrometry, the key technology for protein and metabolite analyses, and to transcriptional profiling. The diverse set of available technologies and the most recent developments will be presented, including bioinformatic approaches to analysing and interpreting large amounts of data.

Individual Performance and Assessment: In order to obtain the credit point, active participation during all three course days is mandatory.

1 ECTS (30 learning hours)
Every two years (last 2017, next to be announced)
Lecturer: Dr. Bernd Roschitzki, Dr. Endre Laczko, Andrea Patrignani, Dr. Lucy Poveda and Dr. Giancarlo Russo (Functional Genomics Center Zurich)

Introduction to Genome-Wide Association Studies (GWAS)

In collaboration with URPP

In this course, we will discuss the pre-eminent tool for identifying genes that underlie natural phenotypic variation: genome-wide association studies (GWAS). Originally developed by human geneticists to fine-map genes that underlie human disease, GWAS have the capacity to revolutionize all of the biological sciences. Plant biologists, in particular, have already taken advantage of improvements in sequencing technology to characterize genetic variation across the genomes of several species. This has enabled the use of GWAS to fine-map genes that underlie ecologically and agriculturally relevant traits. At the beginning of the course, we will provide an introduction to GWAS. Then, we will discuss the history of gene mapping and the genetic and statistical background on which GWAS are based. The course has a strong practical component, and students will gain experience analyzing real data on the computer. At the end of the course, students will be able to interpret GWAS results and carry out their own analyses. We will also discuss basic concepts (and challenges) in population genetics, genomics and quantitative genetics. To prepare, students will need to read literature that will be sent out before the course.

Individual Performance and Assessment: This 2-day course will be split between lectures and tutorials. Required: attendance, active participation during the exercises (16 hours) and handing in of an individual exercise after the course (14 hours of preparation work).

1 ECTS (30 learning hours)
Annually (next: spring semester 2024)
Lecturer: Prof. Thomas Wicker

Introduction to Machine Learning for Plant Scientists

This course introduces machine learning with an emphasis on plant sciences. In Module 1, we will discuss topics such as data pre-processing, feature extraction, clustering, regression and classification. In Module 2, we will take first steps towards modern deep learning. Both modules consist of 50% lectures and 50% hands-on programming in Python, in which students directly implement the theory they have learned as software to help solve problems in plant sciences. Module 2 also includes homework that has to be submitted. In addition, a discussion round on an additional day will provide feedback on the individual assignments and on students' own data processing pipelines for Module 2.

Students with a non-technical background will be introduced to machine learning. The emphasis is on hands-on programming and implementation of basic machine learning concepts to demystify the subject, equip participants with the insights and tools necessary to develop their own solutions, and come up with original ideas for problems in the context of plant sciences. Particular importance is placed on reconciling predictions generated by automated processes with reality. By the end of the course, students will be able to decide where (and where not) to use machine learning, which method to choose for which research task, and how to critically evaluate model outputs in the context of plant sciences.

Individual Performance and Assessment: Participation in Module 1 yields 1 ECTS.
Participation in Module 1 and Module 2 and successful completion of the homework assignments yields 3 ECTS.

1-3 ECTS (30-90 learning hours)
Annually (Autumn Semester 2023)
Lecturer:
Prof. Dr. Jan Dirk Wegner (UZH)

Introduction to Meta-analysis and Research Synthesis in Ecology

In collaboration with PhD Program Ecology

This course aims to promote and facilitate the thoughtful and critical use of meta-analysis for research synthesis in ecology by: 1) explaining the principles and advantages of meta-analysis for research synthesis, 2) demonstrating the range of applications of meta-analysis in ecology, 3) promoting understanding of the assumptions and limitations of meta-analysis, and 4) providing first-hand experience in question formulation, data extraction, database design, use of software for meta-analysis and report preparation. The course program includes lectures on the history of meta-analysis, types of quantitative research synthesis, conversion of ecological data to effect sizes, and question formulation; combining effect sizes across studies and testing for moderators in meta-analysis (meta-regression); a practical session on conducting meta-analysis using the OpenMEE software; publication bias, dealing with varying research quality and non-independence of observations; the format of a meta-analysis report, a review of case studies of meta-analysis in ecology, and a critique of meta-analysis. Participants also complete practical exercises on data extraction, inclusion criteria and effect-size metrics for their own meta-analysis, testing for moderators, testing for publication bias in their own dataset, and considering sources of non-independence.
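
The idea of combining effect sizes across studies can be made concrete with the simplest pooling rule, the fixed-effect inverse-variance weighted mean. The Python sketch below is an illustration only and is not taken from the course: the effect sizes and variances are hypothetical, and random-effects models and meta-regression (which software such as OpenMEE also supports) are not covered. Studies with smaller sampling variance, i.e. more precise estimates, receive larger weights.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted (fixed-effect) mean of study effect sizes.

    Returns the pooled effect and its standard error.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one sampling variance per effect size")
    weights = [1.0 / v for v in variances]  # precision of each study
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Hypothetical effect sizes (e.g. log response ratios) and their sampling variances.
effects = [0.20, 0.35, 0.10]
variances = [0.04, 0.01, 0.02]
pooled, se = fixed_effect_pool(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se  # approximate 95% confidence interval
print(f"pooled effect = {pooled:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

A fixed-effect model assumes all studies estimate one common effect; when that is implausible, a random-effects model adds a between-study variance term to each weight.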

1 ECTS (30 learning hours)
Annually (next: spring semester 2024)
Lecturer: Prof. Dr. Julia Koricheva, UK

Introduction to R

This basic introduction to R focuses on the technical aspects of data organisation, handling, analysis and presentation using the widespread command-line program R. This course is not an introduction to statistics, but lays the foundation for efficiently using the statistical applications of R that are introduced in other courses. No previous experience with programming languages is required. The course is aimed at students who would like to become familiar with a single, powerful and freely available alternative to spreadsheet programmes (Excel), less flexible commercial statistical packages (SPSS, JMP, Minitab, etc.) and graphics software for presenting data (Excel, SigmaPlot, etc.). Topics covered include the proper organisation of the workspace, reading and writing data files, using R as a calculator, using logical operators, manipulating data frames, summarising and aggregating data, programming ‘ifelse’ statements, loops and short routines, handling time fields in data frames, and drawing and customising graphs. Depending on the course progress, there will be scope for individuals to work on small projects and/or their own data sets.

Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: 
Dr. Jan Wunder

Introduction to Structural Equation Modeling

In collaboration with PhD Program Ecology.

Structural equation models are increasingly used in ecology and evolution to disentangle the complex direct and indirect interactions that occur in nature. This course is an introduction to structural equation modeling (SEM) aimed at biologists who want to answer questions in observational and experimental settings. For more details, see abstract.

Individual Performance and Assessment: Active participation throughout the course. Full attendance. The assignment must be completed to obtain 1 ECTS.

1 ECTS (30 learning hours)
Annually (Autumn Semester 2023)
Lecturer:
Frank Pennekamp (UZH)

Introduction to UNIX/Linux and Bash Scripting (BIO 609, introductory course for BIO 610)

In collaboration with URPP

The aim of this course is to introduce students to the Linux/Unix command line and shell scripting by taking a hands-on approach. Short lectures present an overview of the Linux/Unix command line focusing on commands for working with files/directories and text files. Students also practice how to install and run software. Participants learn how to write simple shell scripts as they are often used to automate repetitive tasks and to build software pipelines. They will also discuss recommendations for reproducible research such as good coding practices. The course is composed of lectures and guided computer exercises. Students will spend most of the time solving computer exercises.

Individual Performance and Assessment: Attendance at lectures and active participation in the hands-on exercises are required.

0 ECTS (8 learning hours)
Annually (next autumn 2023)
Lecturer: Dr. Deepak Tanwar (UZH)

Next-Generation Sequencing 1: Introductory Course - Assembly, Annotation and Transcriptomes (BIO 610)

In collaboration with URPP

Handling the huge amounts of data produced by next-generation sequencing (NGS) requires both experimental knowledge and computational skills. The aim of this course is to familiarize participants with experimental methods and data analysis for NGS. Topics include fundamental analysis of sequence data, UNIX tools and RNA-seq analysis. Learning outcomes:

  - Understand the concepts of NGS technologies
  - Understand the basic operation of the UNIX operating system
  - Design a research experiment and its data analysis addressing biologically relevant issues affecting populations of plants or animals
  - Map NGS data onto a reference genome and estimate gene expression levels
  - Understand differential gene expression and polymorphism analysis using NGS data
  - Understand algorithms for de novo assembly and alignment of NGS data
  - Understand the basic bioinformatics of large datasets for practical use in genetic analyses

Individual Performance and Assessment: Attendance at lectures and active participation in the hands-on exercises are required.

1 ECTS (30 learning hours)
Annually (fall semester)
Lecturer: Prof. Kentaro Shimizu, Prof. Jun Sese (Japan), Dr. Rie Inatsugi, Dr. Masaomi Hatakeyama and Dr. Jianqiang Sun, UZH

Next-Generation Sequencing 2: Advanced Course - Transcriptomes, Variant Calling and Biological Interpretation (BIO 634)

In collaboration with URPP

The goal is to introduce students to data processing and analysis in next-generation sequencing (NGS). Building on the course BIO610 "Next-Generation Sequencing for Model and Non-Model Species", it extends participants' knowledge of NGS analysis and their computing skills through a hands-on approach.

Individual Performance and Assessment: Attendance at lectures and active participation in the hands-on exercises are required.

1 ECTS (30 learning hours)
Annually (fall semester)
Lecturer: Dr. Gregor Roth and Prof. Kentaro Shimizu (UZH)

Microbiomics II: Metabarcoding - from Bioinformatics to Statistics

This computer block course provides a thorough introduction to the application of next-generation sequencing techniques for analyzing the diversity of microbial communities, with a main focus on the metabarcoding technique. The topics covered range from bioinformatic processing of sequencing data to the most important approaches in multivariate statistics. Through a combination of theoretical lectures and hands-on computer exercises, participants will learn the computational steps from processing raw sequencing reads to the final statistical evaluations.

Individual Performance and Assessment: In order to obtain the ECTS points, participants are required to actively participate during the four course days.

1 ECTS (28 learning hours)
Annually (Spring semester)
Lecturer: Martin Hartmann, Institute of Agricultural Sciences, ETH Zurich

QTL Analysis in Arabidopsis

This course is an introduction to current methods used in the study of polygenic variation in plants. In particular, we’ll explore the use of quantitative genetic experiments, quantitative trait locus (QTL) analyses and linkage disequilibrium (LD) mapping as tools for dissecting the genetic details of continuous variation. The course will concentrate on providing students with the basic statistical and conceptual foundation for understanding continuous variation, as well as an introduction to various mapping methods and current challenges in QTL cloning. Finally, we will collect phenotypic data on an Arabidopsis thaliana experimental population and conduct basic mapping analyses in a hands-on lab setting.

Individual Performance and Assessment: In order to obtain the ECTS, each participant is required to actively participate in classroom discussions and computer-based analyses.

1 ECTS (24 learning hours)
Every two years (last 2021, next 2023)
Lecturer:
Prof. Ueli Grossniklaus (University of Zurich) and Prof. Tom Juenger (University of Texas at Austin)

Reporting using R Markdown and Shiny

R Markdown and Shiny are powerful R packages for creating static and dynamic reports, publications and dashboards in a fully reproducible way, using an intuitive notebook interface. In this course, you will learn to create R Markdown documents consisting of code, text and a YAML header. We will use CSS files to format our reports and look at further customizations such as section headings, citations, cross-references, animations, interactive plots, tables, comments and much more. While the main focus of this course will be on R Markdown, we will also introduce Shiny for interactive web applications, shinythemes and htmlwidgets, and, last but not least, learn how to embed Shiny in R Markdown documents. Depending on the course progress, there will be scope for individuals to work on small projects and/or their own data sets.

Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend *both* course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Spring Semester)
Lecturer: 
Dr. Jan Wunder

Statistical Modelling

In statistical modeling, the relationships between a response variable and one or more explanatory variables are estimated. In this class, we consider the theory of linear regression with one or more explanatory variables. We also study robust methods and nonlinear models. Several numerical examples will illustrate the theory. You will learn to perform a regression analysis and interpret the results correctly, using the statistical software R for hands-on practice. You will also learn to interpret and critique regression analyses done by others.
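
For the simplest case of one explanatory variable, the least-squares estimates have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the fitted line pass through the point of means. The Python sketch below illustrates only this case with hypothetical data; it is not course material. In R, the same fit would typically be obtained with `lm(y ~ x)`, and the robust and nonlinear methods mentioned above are not covered here.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for the model y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be defined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # fitted line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical data: fertiliser dose (x) and plant height (y).
dose = [1.0, 2.0, 3.0, 4.0]
height = [3.1, 4.9, 7.2, 8.8]
a, b = simple_ols(dose, height)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```

With an intercept in the model, the residuals of a least-squares fit sum to zero, which is a quick sanity check on any implementation.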

Individual Performance and Assessment: In order to obtain the credit points, participants are required to attend all course days and hand in an assignment to be carried out at home. The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Spring Semester)
Lecturer: 
Dr. Matthias Templ (ZHAW) and Barbara Templ (ETHZ)

Scientific Visualisation Using R

Visualisations can determine the success of scientific lectures, poster presentations or journal articles. In this course you will get a very brief introduction to general design principles and guidelines for data visualisation. Building on this theoretical framework, we will spend most of the course time learning how to use R as a powerful graphics tool to create a wide range of customised graphics that include, but are not limited to, traditional scatter plots, bar plots, mosaic plots, box plots, density plots, violin plots and interactive graphics, as well as grid-based geographic maps and state-of-the-art multipanel conditioning plots. You will learn about the two pillars of the R graphics system, i.e., traditional and grid graphics. The course focuses on the latter and on more recent developments such as ggplot2 and other advanced libraries based on the “grammar of graphics” concept. Depending on the course progress, there will be scope for students to work on small projects and/or their own data sets.

Individual Performance and Assessment: Attendance and active participation during the course days (16 hours). In order to obtain the credit points, participants are required to hand in an assignment to be carried out at home (preparation work of 14 hours). The details will be explained during the course. The assignment is due no later than one week after the course has ended.

1 ECTS (30 learning hours)
Annually (Fall Semester)
Lecturer: 
Dr. Jan Wunder

Tutorial on how to work with Clusters

tba.

Individual Performance and Assessment: tba

0 ECTS (30 learning hours)
Annually (coming soon)
Lecturer:
Aria Minder (ETHZ GDC) (UZH)

Visual analytics of large-scale biological data

In this course, we will focus on omics data (mainly genomics and transcriptomics data) and combined data such as GWAS and eQTL data. The course is a mixture of theoretical lectures and interactive, practical sessions. The hands-on training will introduce the most commonly applied tools in the field as well as some less commonly used but nonetheless very useful ones. Depending on the participants’ programming abilities, we will use GUI-based tools as well as R/Bioconductor and other scripting languages. Learning outcomes:

  - Understand the process of visual analytics
  - Know the basics and the do’s and don’ts of visualization
  - Learn how to visualize large-scale genome data
  - Learn how to visualize transcriptional regulation and abundance
  - Understand the challenges of GWAS and eQTL data visualization and learn new approaches to address them

Individual Performance and Assessment: Active participation is expected on all course days (24 hours). Participants will be given practical tasks; their performance will be assessed on their degree of commitment, their ability to apply the theoretical concepts to the task in question, and their creativity. A summary of the completed task and a course diary must be handed in after the course (approx. 6 hours of effort).

1 ECTS (30 learning hours)
Every two years (last 2019)
Lecturer:
PD Dr. Kay Nieselt, Center for Bioinformatics Tübingen, Integrative Transcriptomics, University of Tübingen

CROSS-LISTED COURSES

Get R_eady: Dynamic Reporting & Reproducibility in Research

by UZH School for Transdisciplinary Studies

Larger collections of data are becoming increasingly available. To exploit their potential, statistical analysis skills are needed. The direct link between data and visualization/reporting of results is highly relevant in all empirical research disciplines, as several scientific fields have recently been criticized for lack of reproducibility.

Dynamic reporting tools can be used to link data, visualization and analysis outputs directly, allowing rapid adaptation after changes in the dataset, e.g., after data preparation or validation, or in the context of manuscript revision.

Tailored to applications in empirical research, the course covers the basics of dynamic reporting in R, including examples of dynamic reports for presentations, manuscripts and HTML websites. Research methodology is reflected upon, especially in relation to reproducibility, Open Science and transdisciplinarity. Students will compile and present exemplary reports from different disciplines.

Open and Reproducible Science: Dependable Computations and Statistics

by UZH School for Transdisciplinary Studies

The course is divided into five topics. Version control with GitLab and the techniques learned in reproducible computing are practiced throughout the seven course weeks. Students acquire and practice R programming skills such as writing and using bespoke functions as well as unit testing. The course also covers several aspects of good statistical practice, such as the correct use and interpretation of p-values, sample size calculations, and multiple and sequential testing. It concludes with a brief look at metadata and their importance for reproducibility.

  1. Version control
  2. Reproducible computing with R
  3. Questionable Research Practices
  4. Good statistical practice
  5. Tools in R for metadata handling
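
Multiple testing is one of the good-statistical-practice topics listed above. As an illustration not taken from the course material, the Python sketch below implements the Holm step-down adjustment, one common method for controlling the family-wise error rate; in R the same adjustment is available as `p.adjust(p, method = "holm")`. The p-values in the example are made up.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank k - 1) is multiplied by m - k + 1, capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never falls below earlier ones.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four tests
print(holm_adjust(raw))
```

Each p-value is multiplied by the number of hypotheses not yet rejected at its rank, and the running maximum keeps the adjusted values in the same order as the raw ones. Holm's method is uniformly more powerful than the simple Bonferroni correction while controlling the same error rate.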
