Musings about Remote Development with Visual Studio Code

[ til  vscode  ]

Short snippets of notes on the VS Code Remote Development extensions from the perspective of a remote compute user


When running computational codes on a remote HPC (high-performance computing) cluster, I need to ensure that I am running the correct version of the code with the intended configuration files, which define the parameters of the job to be run. I run computational fluid dynamics (CFD) codes using OpenFOAM v5.0 (a C++ library for CFD), and need to keep track, in the form of configuration files, of the combination of constants + boundary and initial conditions + spatio-temporal discretization schemes + solvers + number of compute nodes that I use for the CFD simulations. Often, I have to check these configurations using vim on the remote cluster environment before executing my codes on the cluster. This can be frustrating, as I need to cross-check that changes to these files on the local desktop environment and on the remote cluster environment are in sync with each other.

Amongst my scatter-brained angst of ensuring that the correct jobs are run on the HPC and the correct results are retrieved from the HPC back to my local desktop environment, I wonder - is there a more user-friendly way for me to work and make changes directly on the remote environment, without going through vim and (clumsily, since I do tend to make mistakes in vim) messing up my configuration files?


Challenges of doing remote dev work in Data Analytics

When I started my current role as a data engineer in a Data Analytics team, the team was just starting to move code development work onto an on-premises development cloud. Running data preprocessing and model training codes on our laptops took hours or even days to complete, since our work laptops were meant for general-purpose usage and weren’t powerful enough (2 cores, 4 threads) to run deep learning codes and computationally intensive workflows without crashing halfway. Running our codes on CentOS-based compute VM instances in our development cloud was smoother and faster after the VM instances were scaled up to 32 cores; however, editing code on the fly within the remote instance did not feel as intuitive as developing/debugging code in the GUI code editors on our Windows-based local machines.

Initially, my remote development workflow looked similar to this:

  1. Write/edit code on Notepad++ / Visual Studio Code / Jupyter Notebook within local machine
  2. Debug code on Windows CMD terminal / Ubuntu on WSL
  3. If code runs correctly for reduced computational load (correct logic, no syntax errors etc.), refactor/modify code to run for full computational load if required.
  4. Upload code files + dependencies to remote instance via FTP/SFTP filesystem using WinSCP/Filezilla
  5. SSH into remote instance and run code on remote instance
  6. Download output files generated by code from remote instance to local machine
  7. Check if I am getting the expected output files
  8. Repeat from Step 1

Problem is, what if we are working with large datasets and we can’t store them on our local machine? Do we have to buy even more external storage just to be able to download and work with the data on our local machine, only to re-upload changes back to the remote instance?

That’s where the Remote Development Extension in VS Code comes in to make writing and developing code directly on the remote instance easier - we can now write code closer to our data sources residing in the remote environment. When more data comes in after our Proof of Concept for a data science project gets the buy-in from our clients/business users, the ability to work closer to our data sources within a remote environment (be it a data lake, database, data mart etc.) becomes even more important.


Setting up a remote dev environment in Visual Studio Code

To get started, we need:

  1. An OpenSSH compatible SSH client
  2. Visual Studio Code or Visual Studio Code Insiders
  3. Remote Development extension pack on Visual Studio Code

At the time of this post, my local machine is running on Windows 10 version 1809.

OpenSSH client for Windows users

If you are running Windows 10, check that you are on version 1803 or newer - on these versions, OpenSSH Client is installed automatically. If not, update to the latest version of Windows 10 either via Automatic Updates or manually.

To check if you have OpenSSH Client installed, go to Settings -> Manage optional features and check if you can see OpenSSH Client. If yes, OpenSSH Client is installed on your local machine.

Otherwise, from Manage optional features -> Add a feature, select OpenSSH Client and (if desired, but not needed for getting Remote Dev to work) OpenSSH Server to install.

If you are running an earlier version of Windows, you can use Git for Windows, which contains ssh.exe in its install path. Note that I have not tested the setup on earlier versions of Windows - this option is stated in the Troubleshooting section of the VS Code Remote Development Docs.

Note that PuTTY is not supported on Windows since the ssh command must be in the path.
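
Since Remote - SSH relies on an ssh command being reachable on the path, a quick way to confirm this before installing anything else is to look it up programmatically. The following is a minimal Python sketch (the function names find_ssh_client and ssh_version are my own, not part of VS Code or OpenSSH) that reports where the client lives and which version it is:

```python
import shutil
import subprocess


def find_ssh_client(name="ssh"):
    """Return the full path of the SSH client found on PATH, or None."""
    return shutil.which(name)


def ssh_version(ssh_path):
    """Return the version banner printed by `ssh -V`."""
    # OpenSSH writes its version banner to stderr rather than stdout.
    result = subprocess.run([ssh_path, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_ssh_client()
    if path is None:
        print("No ssh executable found on PATH")
    else:
        print(path)
        print(ssh_version(path))
```

If the script reports that no ssh executable was found, the OpenSSH Client feature (or Git for Windows) still needs to be installed or added to the path.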

Visual Studio Code or Visual Studio Code Insiders

Remote Development in Visual Studio Code was initially released in beta through Visual Studio Code Insiders on 2nd May 2019, and has been released as a Preview extension in Visual Studio Code since the May 2019 (version 1.35) update. A few colleagues and I have been playing around with Remote Development since its beta release on Visual Studio Code Insiders; since then, we have conducted an internal team briefing (with our ever-helpful system architect leading the briefing) on how to set up Remote - SSH in Visual Studio Code for use with our development cloud. In fact, I’m using Remote - WSL in Visual Studio Code to write this post and pushing updates to GitHub via Git on Visual Studio Code.

Stable releases are on Visual Studio Code; however, if you would like to test out improvements and new features that the Visual Studio Code team are working on for future releases, you could play around with Visual Studio Code Insiders - it’s like having a sneak peek into what’s coming up in the next stable release of Visual Studio Code.

At the point of writing this post, Visual Studio Code Insiders supports 32-bit ARMv7l (or ARMv8 in 32-bit mode) glibc-based Linux (Raspbian Stretch/9+) hosts and Alpine Linux.

Remote Development extension

The Remote Development extension pack consists of the following:

  1. Remote - SSH
  2. Remote - Containers
  3. Remote - WSL

To install all these extensions, go to Extensions on VS Code and search for Remote Development in the Marketplace.

Configuring SSH-based authentication for Remote - SSH

According to the Visual Studio Code docs on Remote - SSH, the instructions for configuring SSH-based authentication are as follows:

  1. Check to see if you already have an SSH key. The public key is typically located at ~/.ssh/id_rsa.pub on macOS/Linux, and at %USERPROFILE%\.ssh\id_rsa.pub on Windows.
    • If you do not have a key, run the following command in a terminal / command prompt to generate an SSH key pair. This command creates a 4096-bit RSA key pair: ssh-keygen -t rsa -b 4096
  2. Add the contents of your local public key (the id_rsa.pub file) to the appropriate authorized_keys file(s) on the remote host:
    • Run the following commands in a local command prompt:
     SET REMOTEHOST=your-user-name-on-host@host-fqdn-or-ip-goes-here
    
     scp %USERPROFILE%\.ssh\id_rsa.pub %REMOTEHOST%:~/tmp.pub
     ssh %REMOTEHOST% "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat ~/tmp.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys && rm -f ~/tmp.pub"
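
The remote half of the command above creates ~/.ssh, tightens its permissions, and appends the public key to authorized_keys. As a rough illustration of what that one-liner does, here is a Python sketch meant to run on the remote host. The function name authorize_key is hypothetical, and the duplicate check is my own addition: the original command appends the key again on every run.

```python
import os
from pathlib import Path


def authorize_key(public_key_line, ssh_dir):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was appended, False if it was already present.
    """
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(parents=True, exist_ok=True)   # mkdir -p ~/.ssh
    os.chmod(ssh_dir, 0o700)                     # chmod 700 ~/.ssh

    auth_file = ssh_dir / "authorized_keys"
    key = public_key_line.strip()
    text = auth_file.read_text() if auth_file.exists() else ""

    # Addition over the original one-liner: skip keys that are already there.
    if key in (line.strip() for line in text.splitlines()):
        return False

    with auth_file.open("a") as handle:
        # Avoid gluing the new key onto an unterminated last line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(auth_file, 0o600)                   # chmod 600 authorized_keys
    return True
```

On the remote host it could be called as authorize_key(Path("~/tmp.pub").expanduser().read_text(), Path("~/.ssh").expanduser()), where tmp.pub is the file uploaded by the scp step.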
    

Since I might need to access multiple remote hosts using SSH, I configured SSH-based authentication using a dedicated SSH key for Remote SSH in VS Code. The instructions in the docs are as follows:

  1. Run the following command in a terminal / command prompt to generate an SSH key pair. This command creates a 4096-bit RSA key pair: ssh-keygen -t rsa -b 4096 -f %USERPROFILE%\.ssh\id_rsa-remote-ssh

  2. In VS Code, run Remote-SSH: Open Configuration File… in the Command Palette (F1), select an SSH config file, and add (or modify) a host entry as follows:

     Host name-of-ssh-host-here
     User your-user-name-on-host
     HostName host-fqdn-or-ip-goes-here
     IdentityFile ~/.ssh/id_rsa-remote-ssh
    
  3. Add the contents of the local id_rsa-remote-ssh.pub file generated in step 1 to the appropriate authorized_keys file(s) on the SSH host:

    • Run the following commands in a local command prompt:
     SET REMOTEHOST=your-user-name-on-host@host-fqdn-or-ip-goes-here
    
     scp %USERPROFILE%\.ssh\id_rsa-remote-ssh.pub %REMOTEHOST%:~/tmp.pub
     ssh %REMOTEHOST% "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat ~/tmp.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys && rm -f ~/tmp.pub"
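
When several remote hosts are involved, the host entry from step 2 ends up being written repeatedly with different values. A small Python sketch of how such an entry could be generated, and how to check whether an alias is already in a config file, is shown here. The function names and the example values (dev-cloud, alice, 10.0.0.5) are hypothetical, and the parser is deliberately simplified: it only recognises the space-separated "Host alias" form.

```python
def render_host_entry(alias, user, hostname, identity_file):
    """Return an SSH config block for one host, ending with a newline."""
    lines = [
        f"Host {alias}",
        f"    User {user}",
        f"    HostName {hostname}",
        f"    IdentityFile {identity_file}",
    ]
    return "\n".join(lines) + "\n"


def has_host(config_text, alias):
    """Return True if a `Host` line in config_text names the given alias."""
    for line in config_text.splitlines():
        parts = line.split()
        # SSH config keywords are case-insensitive; a Host line may list
        # several aliases separated by spaces.
        if len(parts) >= 2 and parts[0].lower() == "host" and alias in parts[1:]:
            return True
    return False


if __name__ == "__main__":
    entry = render_host_entry(
        "dev-cloud", "alice", "10.0.0.5", "~/.ssh/id_rsa-remote-ssh"
    )
    print(entry, end="")
    print(has_host(entry, "dev-cloud"))
```

The rendered text can be pasted into the file opened by Remote-SSH: Open Configuration File…, after checking with has_host that the alias is not already defined.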
    

Setting Up SSH-Agent

To enable SSH Agent automatically on Windows, start PowerShell as an Administrator and run the following commands:

# Make sure you're running as an Administrator
Set-Service ssh-agent -StartupType Automatic
Start-Service ssh-agent
Get-Service ssh-agent

Now the agent will be started automatically on login.

Once the agent has started, add your identity using the command:

ssh-add

A few colleagues faced errors when running the above command. A workaround recommended by my team’s chief architect is to run the following command:

sc.exe create sshd binPath=C:\Windows\System32\OpenSSH\ssh.exe

and then try to add the identity again.

Use the following command to check whether the identity was added successfully:

ssh-add -l
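
To check the listing from a script rather than by eye, the output of ssh-add -l can be parsed. The sketch below assumes the usual OpenSSH layout of one identity per line ("bits fingerprint comment (type)") and treats any other line, such as the message printed when the agent holds no identities, as not being an identity. The function name parse_identities is my own, and the exact output format may differ between OpenSSH versions.

```python
def parse_identities(listing):
    """Parse `ssh-add -l` output into a list of identity dictionaries."""
    identities = []
    for line in listing.splitlines():
        parts = line.split()
        # Identity lines start with the key size in bits; other lines
        # (e.g. "The agent has no identities.") are skipped.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        key_type = ""
        if parts[-1].startswith("(") and parts[-1].endswith(")"):
            key_type = parts.pop()[1:-1]
        identities.append(
            {
                "bits": int(parts[0]),
                "fingerprint": parts[1],
                # The comment is often the key path and may contain spaces.
                "comment": " ".join(parts[2:]),
                "type": key_type,
            }
        )
    return identities
```

Feeding it the captured output, for example subprocess.run(["ssh-add", "-l"], capture_output=True, text=True).stdout, and checking that some comment ends with id_rsa-remote-ssh confirms that the dedicated key was loaded.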

Connecting to Remote Instance via Visual Studio Code

After configuring SSH-based authentication on the local machine, we are ready to run Remote - SSH on VS Code to connect to our remote instance.

  1. Run Remote-SSH: Connect to Host… from the Command Palette (F1) and enter the host and your user on the host in the input box as follows: user@hostname

  2. After a moment, VS Code will connect to the SSH server and set itself up. VS Code will keep you up-to-date using a progress notification and you can see a detailed log in the Remote - SSH output channel.

  3. After you are connected, you’ll be in an empty window. You can then open a folder or workspace on the remote machine using File > Open… or File > Open Workspace…

  4. Install any extensions you want to use on this host from the Extensions view.

Here lies a problem we are currently facing: VS Code and its extensions are installed on the remote instance for each user who connects remotely via VS Code. Each per-user VS Code installation on the remote instance takes up more than 1 GB of disk space, which had not been factored in when we first upgraded our remote instance. As Remote Development is still in Preview, improvements to the extension are expected over the next few months. In the meantime, the space allocation required to support VS Code on the remote instance will need to be factored in when we eventually move to our new development cloud.
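
To put a number on the per-user footprint, the size of each user's installation directory on the remote instance can be summed up. The Python sketch below assumes the server-side files live under ~/.vscode-server (this path is my assumption, not something stated above, so adjust it to your setup). It adds up file sizes, which approximates but does not exactly match the disk blocks actually allocated.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count the targets of symlinked files
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


if __name__ == "__main__":
    # Assumed install location of the remote VS Code files for this user.
    server_dir = os.path.expanduser("~/.vscode-server")
    size_gb = directory_size_bytes(server_dir) / 1024 ** 3
    print(f"{server_dir}: {size_gb:.2f} GB")
```

Running this for each user's home directory gives a rough figure for how much space to reserve when sizing the new development cloud.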

Concluding Remarks

VS Code Remote Development is a godsend when working with large datasets that are stored remotely - it is now possible to write and edit code even closer to where the data resides without leaving VS Code. I’m looking forward to further developments of the VS Code Remote Development extension, and to reaping the full benefits of remote development in the near future.

Credit goes to my team’s awesome chief architect for proposing the use of the Visual Studio Code Remote Development setup in the team’s dev environment (even before the extension was made available in stable), as well as to the contributors to the VS Code Remote Development extension (code contributors and docs writers).

References

  1. Remote Development using SSH
Written on August 22, 2019