SCINet Geospatial Research Workshop 2020
Harnessing SCINet computational resources in geospatial data science to further sustainable and intensified agriculture.
Hosted By: SCINet Geospatial Research Working Group with support from USDA ARS, SCINet Scientific Computing Initiative
Dates: 8/25/2020 - 9/1/2020
Goals
The 2020 SCINet Geospatial Workshop continues the efforts outlined at the 2019 workshop held in Las Cruces, NM. The two overarching goals of this workshop are to:
- provide hands-on learning experiences (tutorials) on workflows to access the Ceres high-performance computing (HPC) system and conduct geospatial and machine learning research at scale,
- foster research efforts that were previously unattainable due to computational limitations or technical bottlenecks. This includes developing infrastructure and exploring state-of-the-art machine learning methods applicable to geospatial sciences.
Organizing Committee
Rowan Gaffney, Physical Scientist, Ft Collins, CO
Kerrie Geil, SCINet Postdoc, Las Cruces, NM
Amy Hudson, SCINet Postdoc, Las Cruces, NM
Yanghui Kang, SCINet Postdoc, Beltsville, MD
Suzy Stillman, SCINet Postdoc, Las Cruces, NM
How to Participate
If you are coming to this website after the workshop sessions have ended, welcome! All the tutorials we covered during the workshop have been formatted such that you can follow along with them anytime on your own and at your own pace from our webpages. Access session content, including all tutorials, using the “Session#” tabs at the very top and bottom of this homepage. If you run into any errors, feel free to notify the organizing committee so that we can correct the content. Thank you and happy learning!
All members of the working group as well as non-members from USDA ARS are welcome to participate! We also welcome our University collaborators who have USDA SCINet accounts. We hope that everyone will attend the general session of the working group (Session 1) and then choose other sessions to attend based on their own interests and skill level.
The workshop is split over 6 separate Zoom sessions (as well as a pre-meeting assistance session) that will include:
- computational infrastructure and resource development for the ARS geospatial research community (e.g. the common data library and the geospatial workbook),
- hands-on tutorials to assist researchers in utilizing the Ceres HPC system, and
- research presentations from successful efforts using machine learning to address agricultural issues.
To follow along with the tutorials you need to already have or apply for a SCINet account and be able to successfully log in to your account. We recommend applying for an account by 8/12/2020 at the latest, as the process can take 1-2 weeks for final approval. Please note, if you need help accessing your SCINet account you should plan on attending the pre-meeting login assistance session on 8/19/2020 (Session 0), but make sure you have applied for an account well in advance of this session.
To follow along with the Session 4 Tutorial: Computational Reproducibility Tools, make sure you create a free personal GitHub account and remember your GitHub username and password. You will also, of course, need a SCINet account as described above.
Please register for each session individually using the registration links below so we can have an idea of how many people will be present at each event. Note, each session will have a separate Zoom link and password so you must register for each session you would like to attend.
Lastly, review the pre-meeting checklist and background information on the Pre-meeting page to ensure you are prepared for the workshop sessions.
Schedule / Registration
Note: All workshop sessions are open to all scientists and scientific staff at USDA ARS. We also welcome ARS contractors and University collaborators who have a SCINet account. Please make sure to register separately for each session you plan on attending (the Zoom join details are different for each session).
Quick Links to Content Below:
- 8/19/2020, 11am-1pm MDT, Session 0: Pre-meeting SCINet Account Login Assistance
- 8/25/2020, 11am-2pm MDT, Session 1: Annual Meeting of the SCINet Geospatial Research Working Group
- 8/27/2020, 11am-1pm MDT, Session 2: Tutorial: Intro to the Ceres HPC System Environment
- 8/27/2020, 1:30-2:30pm MDT, Session 3: Tutorial: Intro to Distributed Computing on the Ceres HPC System Using Python and Dask
- 8/28/2020, 10:30am-12:30pm MDT, Session 4: Tutorial: Computational Reproducibility Tools
- 8/28/2020, 1:00-2:30pm MDT, Session 5: Tutorial: Distributed Machine Learning: Using Gradient Boosting to Predict NDVI Dynamics
- 9/1/2020, 11am-2pm MDT, Session 6: Symposium: Challenges and opportunities in leveraging machine learning techniques to further sustainable and intensified agriculture
Session 0: Pre-meeting SCINet Account Login Assistance
Wednesday August 19, 11am - 1pm MDT
No registration required, just show up at 11am MDT: session completed
Prerequisites: None
This pre-meeting session with the SCINet Virtual Research Support Core (VRSC) is for anyone planning to participate in the Session 2-5 tutorials who is having trouble accessing their SCINet account.
Please ensure that you have applied for a SCINet account well in advance of this pre-meeting session, as new accounts must pass through multiple approvals (including your supervisor's) before receiving final approval. The suggested final date for applying for a new account in order to be ready for this pre-meeting session is Wednesday, Aug 12. Go to https://scinet.usda.gov/signup/ to start the account application process.
Session 1: Annual Meeting of the SCINet Geospatial Research Working Group
Tuesday August 25, 11am - 2pm MDT
Registration Required: session completed
Prerequisites: None
The full content of this session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
We encourage everyone from USDA ARS, members and non-members alike, to attend this general session.
Time (MDT) | Agenda
---|---
11:00-11:10 | Welcome and Session Rules
11:10-11:30 | Review of the 2019 workshop
11:30-11:45 | Details on the upcoming 2020 sessions
11:45-12:00 | Introduction to the SCINet postdocs
12:00-12:15 | Break
12:15-1:15 | Working Session: SCINet common data library
1:15-1:45 | Working Session: geospatial workbook
1:45-2:00 | Proposals for new working group initiatives
Session 2: Tutorial: Introduction to the Ceres High-Performance Computing System Environment (SSH, JupyterHub, Basic Linux, SLURM batch script)
Thursday August 27, 11am - 1pm MDT
Registration Required: session completed
Prerequisites: have a SCINet account and be able to log in (apply for an account here)
The full content of this tutorial session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
This interactive follow-along session will demonstrate how to access the SCINet Ceres HPC system using Secure Shell (SSH) at the command line as well as the JupyterHub web interface. We will also cover how to access JupyterLab and RStudio on the Ceres HPC through the JupyterHub web interface, basic Linux commands, and how to write a SLURM batch script to submit a compute job on the Ceres HPC.
We will not troubleshoot individual SCINet account access problems during this session. If you are having trouble accessing your account please plan to attend Session 0.
Session 3: Tutorial: Introduction to Distributed Computing on the Ceres HPC System Using Python and Dask
Thursday August 27, 1:30pm - 2:30pm MDT
Registration Required: session completed
Prerequisites: basic Python (or other basic programming) skills helpful (expertise not required), have a SCINet account and be able to log in (apply for an account here)
The full content of this tutorial session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
This session will be an interactive follow-along about how to compute in parallel on the Ceres HPC system using Python tools. Participants will use their own SCINet account to walk through a Jupyter Notebook and execute Python code on the Ceres HPC system.
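Dask's core idea is to split a large dataset into chunks, run the same computation on every chunk in parallel, and combine the partial results. The sketch below shows only that split-apply-combine pattern, using Python's standard library (`concurrent.futures`) rather than Dask itself; the function names, chunk size, and worker count are hypothetical choices for illustration. Threads keep the example simple, but they do not speed up pure-Python CPU work because of the GIL; Dask instead schedules chunks across worker processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a list into consecutive chunks (analogous to Dask partitions)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Per-chunk work: return (sum, count) so chunk results can be combined."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by mapping over chunks in parallel, then reducing."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = chunked(values, chunk_size)
    # "Map" step: each chunk is processed independently by the pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, chunks))
    # "Reduce" step: combine the partial sums and counts.
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


data = [float(i) for i in range(10_001)]
print(parallel_mean(data))
```

With Dask, a comparable computation on a NumPy array `arr` could be written as `dask.array.from_array(arr, chunks=1000).mean().compute()`, where Dask builds the chunked task graph and runs it on its workers.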
We will not cover how to log in to your SCINet account or troubleshoot individual account access problems during this session. If you are having trouble accessing your account please plan to attend Session 0. If you are new to working in an HPC environment, attending Session 2 first will be helpful but not required.
Session 4: Tutorial: Computational Reproducibility Tools (Git/GitHub, Conda, Docker/Singularity containers)
Friday August 28, 10:30am - 12:30pm MDT
Registration Required: session completed
Prerequisites: basic Linux, create a free GitHub account for yourself and remember your username/password, have a SCINet account and be able to log in (apply for an account here)
The full content of this tutorial session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
This interactive follow-along session will demonstrate how to use Git/GitHub, the Conda package/environment management system, and Docker/Singularity containers on the Ceres HPC system. During the Git/GitHub portion we will cover how to copy an existing GitHub repo to your SCINet/Ceres account, make a change to the repo locally, push the repo to your own GitHub account, and open a pull request to get your changes incorporated into the original repo. The Conda portion will cover how to access or install Conda on Ceres, how to use Conda to download software on Ceres, how to use Conda environments to document all the software you are using and avoid dependency conflicts, and how to save your Conda environment details to a specification file so that you can quickly recreate your complete software environment for any project. We will also cover how containers allow your code to run successfully on different operating systems, how to use (and create) a Docker image, and how to use Singularity on the Ceres HPC to run a container from a Docker image.
We will not cover basic Linux, how to log in to your SCINet account, or troubleshoot individual account access problems during this session. If you are having trouble accessing your account please plan to attend Session 0. If you need basic Linux help or are new to working in an HPC environment please plan to first attend Session 2.
Session 5: Tutorial: Distributed Machine Learning: Using Gradient Boosting to Predict NDVI Dynamics
Friday August 28, 1:00pm - 2:30pm MDT
Registration Required: session completed
Prerequisites: basic Python and basic HPC skills helpful (expertise not required), have a SCINet account and be able to log in (apply for an account here)
The full content of this tutorial session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
This interactive follow-along tutorial uses a gradient boosting machine learning model (XGBoost) to predict NDVI (from Harmonized Landsat Sentinel-2 imagery) from daily weather (PRISM) and physiographic variables (soil properties) at the Central Plains Experimental Range (CPER) Long-Term Agroecosystem Research (LTAR) site. Participants will use their own SCINet account to walk through a Jupyter Notebook and execute Python code on the Ceres HPC system.
The workflow involves:
- Set up a cluster on Ceres (Dask Distributed)
- Read data and interpolate onto a consistent grid (Xarray, Dask DataFrame)
- Merge/shuffle/split the data (Dask-ML, scikit-learn)
- Optimize the hyperparameters (Dask-ML, scikit-learn, XGBoost)
- Train a distributed XGBoost model (scikit-learn, XGBoost, Dask Distributed, datashader)
- Quantify the accuracy and visualize the results (scikit-learn, SHAP)
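Gradient boosting builds an ensemble of small decision trees, fitting each new tree to the residual errors of the ensemble so far and shrinking its contribution by a learning rate. The sketch below illustrates only that core idea with single-split trees ("stumps") on one feature in plain Python. It is not XGBoost's algorithm (XGBoost adds regularization, second-order gradient information, and distributed training), and the data are made-up values standing in for a single predictor and NDVI; `n_rounds` and `learning_rate` are hypothetical settings.

```python
from statistics import mean


def fit_stump(x, residuals):
    """Find the single threshold on x that best splits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    if best is None:  # only one distinct x value, so no split is possible
        avg = mean(residuals)
        return lambda xi: avg
    _, threshold, left_val, right_val = best
    return lambda xi: left_val if xi <= threshold else right_val


def fit_boosted(x, y, n_rounds=50, learning_rate=0.3):
    """Least-squares gradient boosting with stumps; returns a predict function."""
    base = mean(y)  # start from a constant prediction
    stumps = []

    def predict(xi):
        return base + learning_rate * sum(stump(xi) for stump in stumps)

    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


# Hypothetical toy data: one predictor and an NDVI-like response (made-up values).
x = list(range(10))
y = [0.2] * 5 + [0.6] * 5
model = fit_boosted(x, y)
print(round(model(2), 3), round(model(7), 3))
```

A smaller learning rate needs more rounds to fit the data but tends to generalize better; tuning such settings is what the hyperparameter optimization step above addresses.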
We will not cover basic Python, basic distributed/parallel computing, how to log in to your SCINet account, or troubleshoot individual account access problems during this session. If you are having trouble accessing your account please plan to attend Session 0. If you have limited experience working on an HPC system we recommend first attending Sessions 2 and 3.
Session 6: Symposium: Challenges and opportunities in leveraging machine learning techniques to further sustainable and intensified agriculture
Tuesday September 1, 11am - 2pm MDT
Registration Required: session completed
Prerequisites: None
Content from the session is available on the session page. Access it using the “Session#” tabs at the very top and bottom of this homepage.
This session is for USDA ARS scientists, scientific staff, and University collaborators who are interested in learning how machine learning is being used in agricultural research. Four invited speakers from outside USDA ARS will give talks on using machine learning for a range of agricultural research questions, followed by a panel discussion.
AGENDA (MDT)
11-11:10 Drs Yanghui Kang & Amy Hudson, USDA-ARS SCINet Postdocs
- Welcome
11:10-11:40 Dr Matthew Jones, University of Montana
- Predicting rangeland fractional cover for the western U.S. with random forests and multitask learning
11:45-12:15 Dr Liheng Zhong, Descartes Labs
- How to use statistical data to train classifiers
12:20-12:50 Dr Vasit Sagan, Saint Louis University
- UAV-satellite spatio-temporal data fusion and deep learning for yield prediction
12:55-1:25 Dr Jingyi Huang, University of Wisconsin
- Characterizing field-scale soil moisture dynamics with big data and machine learning: challenges and opportunities for digital agriculture
1:30-1:40 Short Break
1:40-2:15 Panel Discussion
- What are the merits and pitfalls of machine learning techniques in comparison to traditional deterministic (physical/process-based) and probabilistic modeling/analysis for agricultural research?
- USDA ARS is currently working to increase and improve computational and machine learning capabilities across the agency. What skill sets are required to conduct research with “big data” using machine learning techniques? Do you have suggestions on the best approaches for developing these skill sets?
- What do you see as the future directions for machine learning and advanced computing techniques (e.g. cloud, HPC) in agricultural research?
More information about our invited speakers can be found on our Session 6 page.
Website Content Metadata
Website Content: CC BY-SA Rowan Gaffney / Kerrie Geil 2020 (get source code).
Website Theme: workshop-template-b by evanwill is built using Jekyll on GitHub Pages. The site is styled using Bootstrap with FontAwesome icons.