Integrating Cloudflare Tunnels with a Linux Server

Recently, I started using Cloudflare tunnels to expose my server to the web. They provide both security and accessibility: I do not have to open any ports on my server, and the links used to access the services on my server are all served over HTTPS.

That got me thinking about whether I could use the same approach for web development on a Linux server. The problem with doing web development remotely is that you cannot easily open the app in your browser to test and see the changes you have made.

Background

Cloud-based development has been around for quite a while, and such a development environment brings many benefits:

Everyone uses the same dependencies and environment. This reduces the chances of “Oh, it worked fine on my local machine”.
More computing power. Cloud servers are likely to provide more computing power than your local machine, and they can be scaled when needed.
Cost savings (in a general sense). With the machines in the cloud, you likely do not need to shell out cash for specced-out machines for your developers.

However, development in the cloud does have its drawbacks. For example, if you are doing web development, you cannot just open localhost:8080 in your browser to access the service you created. Instead, you need to expose port 8080 and then access it through IP-ADDRESS:8080. This may seem acceptable at first, but imagine a team environment: many ports would need to be exposed just for development. This poses a significant security risk, as it is never a good idea to expose ports unless necessary.

This is where Cloudflare tunnels come in. They let you access services listening on localhost ports without exposing those ports to the public.

Use Case

In my case, I intend to teach a class using a cloud-based development environment. When a student runs their React app or Node.js server, they should be able to access the site through a predefined subdomain.

Requirements

Automated - There should be minimal to no manual input from me.
Hassle-free - Students should be able to develop their app just as they would on their local machine.
Availability - The web link should be available as soon as the student starts running the service. It should also be removed once the student is done using it, which in this case means when the user exits the terminal session.

Constraints

In my case, students work in a Node environment and run the npm run dev command to start their service. However, this solution should be easy to adapt to any other command that starts listening on a port.

Another constraint is that each user can only use one port, which is assigned to them and defined in the PORT environment variable. This reduces the chances of port numbers clashing between users.

Breaking it up

To solve this problem, we can break it down into two parts:

Detecting when the npm run dev command is called and when the user exits the session
Interacting with Cloudflare:
- Check if the tunnel configuration for that port already exists
- Add a new configuration for that port if it does not exist
- Remove the configuration for a particular port

Detecting Commands

To detect when a command is run, we can use trap, a Bash builtin. trap runs a command of our choosing when the shell receives a given signal, and Bash also supports a few special pseudo-signals of its own, such as DEBUG and EXIT.

For example, before executing every simple command, Bash raises a DEBUG signal. We can trap this signal and check the command that is about to run, which Bash makes available in the BASH_COMMAND variable.
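To see the DEBUG trap in action without touching your own shell, here is a small Python sketch that starts a throwaway, non-interactive Bash process with a DEBUG trap and captures what it prints. It assumes bash is on the PATH. The trap fires before `echo hello` runs, and BASH_COMMAND holds that command's text.

```python
import subprocess

# A tiny Bash script: trap DEBUG and print the command Bash is about to execute.
SCRIPT = r'''
trap 'echo "about to run: $BASH_COMMAND"' DEBUG
echo hello
'''


def run_bash(script: str) -> str:
    """Run a Bash snippet in a separate process and return its stdout."""
    result = subprocess.run(["bash", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout


print(run_bash(SCRIPT))
```

The trap line itself is not reported, because the trap only exists once that command has run.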

Detect npm run dev

To detect npm run dev, we “trap” every DEBUG signal and check whether the command about to run is npm run dev.

COMMAND_TO_MONITOR='npm run dev'

function trap_command(){
    if [ "$BASH_COMMAND" == "$COMMAND_TO_MONITOR" ]; then
        if [ -z "$PORT" ]; then
            echo "no port defined in env. not accessible to public"
        else
            echo "your port number is $PORT"
        fi
    fi
}

trap 'trap_command' DEBUG

The snippet above creates a variable COMMAND_TO_MONITOR holding the command we want to monitor. It then defines a function, trap_command, to run every time a DEBUG signal is raised. This function compares the command about to run ($BASH_COMMAND) with the command we want to monitor. If they match, it checks whether PORT is defined in the environment variables; the -z flag tests whether a string is empty. If no PORT is defined, we just print a message. The last line registers trap_command as the handler for DEBUG.

If a port is defined, this is where we will later check and update the tunnel configuration. For now, we just print the port number that the service is running on.

Detect Exit

To detect when the user has exited the session, we can use the EXIT signal. As the name suggests, Bash raises this signal when the shell exits, which for a terminal session means when the user exits.

function exit_cleanup(){
    if [ -z "$PORT" ]; then
        echo "nothing to do"
    else
        echo "cleaning port $PORT"
    fi
}

trap 'exit_cleanup' EXIT

In the above snippet, we define a function exit_cleanup that will handle some housekeeping tasks. One of them is removing this particular user's tunnel configuration, which we will add later. For now, we check whether PORT is defined in the environment variables. If it is, we “clean” the port; otherwise, we do nothing.

The last line traps the EXIT signal so that exit_cleanup runs when the signal is detected.
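The two snippets can be tried together in a throwaway Bash process before wiring them into anyone's login shell. The Python harness below is a testing sketch, not part of the setup itself. It stubs out npm with an empty shell function (an assumption, so the test does not need Node.js), then runs the "session" once with PORT set and once without. The DEBUG handler fires for npm run dev, and the EXIT handler fires when the shell ends.

```python
import os
import subprocess

# The two snippets from this post, plus a stub npm function (test-only assumption).
MONITOR = r'''
npm() { :; }
COMMAND_TO_MONITOR='npm run dev'

function trap_command(){
    if [ "$BASH_COMMAND" == "$COMMAND_TO_MONITOR" ]; then
        if [ -z "$PORT" ]; then
            echo "no port defined in env. not accessible to public"
        else
            echo "your port number is $PORT"
        fi
    fi
}

function exit_cleanup(){
    if [ -z "$PORT" ]; then
        echo "nothing to do"
    else
        echo "cleaning port $PORT"
    fi
}

trap 'trap_command' DEBUG
trap 'exit_cleanup' EXIT
npm run dev
'''


def simulate_session(port=None):
    """Run the monitoring snippet in a separate Bash process and return its output lines."""
    env = {k: v for k, v in os.environ.items() if k != "PORT"}
    if port is not None:
        env["PORT"] = str(port)
    result = subprocess.run(["bash", "-c", MONITOR], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, `simulate_session(3000)` returns the port message followed by the cleanup message.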

Side Note

Note that this was not the initial plan. The initial plan was to “clean up” as soon as the service was closed. However, upon further consideration, I decided to monitor when the user exits the session instead, for two reasons:

For some reason, SIGINT and SIGTERM signals are not detected when I use CTRL + C to close the node server. There are other issues as well, which I detail in the appendix.
More importantly, the user will likely have to restart the server every so often to apply changes, so it is not wise to keep making API calls to update the tunnel configuration on every restart.

Interacting with Cloudflare

To interact with Cloudflare, we can use its HTTP API. Cloudflare also provides a command-line tool, cloudflared, but it requires quite a bit of setup. Also, making it available system-wide might pose a security threat, as users would have access to configuration unrelated to tunnels.

The HTTP API uses bearer tokens that can be scoped. So we can limit a token's scope to what the script needs (tunnel configuration, plus DNS records for the zone, since we create those too) and revoke the token if a security issue is detected.

API Interaction

To interact with the API, we will use a Python script rather than a Bash script. Python has the well-supported requests library for making HTTP requests, and JSON responses are easier to manipulate in Python than in Bash.
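For reference, here is a minimal sketch of what the tunnel-configuration calls could look like. It uses the standard library's urllib.request instead of requests so it has no dependencies, and it only builds the requests without sending them. The endpoint path reflects my reading of Cloudflare's v4 API (`.../accounts/{account_id}/cfd_tunnel/{tunnel_id}/configurations`). Treat it, along with the placeholder account ID, tunnel ID and token, as assumptions to verify against the official API reference.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def tunnel_config_request(account_id: str, tunnel_id: str, token: str, config=None):
    """Build a GET for the tunnel configuration, or a PUT replacing it when config is given."""
    url = f"{API_BASE}/accounts/{account_id}/cfd_tunnel/{tunnel_id}/configurations"
    headers = {"Authorization": f"Bearer {token}"}  # scoped API token
    data = None
    method = "GET"
    if config is not None:
        # PUT replaces the whole configuration, so config must be the complete one
        data = json.dumps({"config": config}).encode()
        headers["Content-Type"] = "application/json"
        method = "PUT"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To send a request, pass it to `urllib.request.urlopen` and decode the JSON body.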

Our Python script will:

Get the hostname from the configuration tagged to the provided port
Create a new configuration if the provided port does not have one
Delete the configuration tagged to a port

To achieve this, our script needs a flag to tell GET and DELETE actions apart. It also needs a flag to know which port we are interested in.

flag	description	values
--type	type of action you want the script to perform	GET | DELETE
--port, -p	port you are interested in	any valid port number

./cloudflareCall.py --type GET -p 3000
# The script will get the configuration of port 3000 and
# output the hostname tagged to that port.

./cloudflareCall.py --type DELETE -p 3000
# The script will delete the configuration tagged to port 3000

Implementation

At the start, we check that the flags exist and that their values are valid. Once the flags are checked, we run the function that matches the type flag.
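One way to do this check is with argparse from the standard library. The sketch below shows one possible way to parse the flags and is not the author's actual script. `choices` restricts `--type` to GET or DELETE, and a small converter rejects anything that is not a TCP port number (1 to 65535).

```python
import argparse


def port_number(value: str) -> int:
    """argparse type converter: accept only valid TCP port numbers."""
    try:
        port = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not a number")
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"{port} is outside the port range 1-65535")
    return port


def parse_args(argv=None):
    """Parse --type and --port/-p; argparse prints an error and exits on bad input."""
    parser = argparse.ArgumentParser(
        description="Get or delete the Cloudflare tunnel configuration for a port")
    parser.add_argument("--type", required=True, choices=["GET", "DELETE"],
                        help="type of action you want the script to perform")
    parser.add_argument("--port", "-p", required=True, type=port_number,
                        help="port you are interested in")
    return parser.parse_args(argv)
```

For example, `parse_args(["--type", "GET", "-p", "3000"])` gives `type='GET'` and `port=3000`. Invalid input such as `--type POST` makes argparse exit with status 2, so the monitoring script never receives a bogus hostname.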

type: GET

If the type is GET, we get the tunnel configuration from Cloudflare and filter it by the port provided. If there is no configuration for that port, we create one: we build a new configuration using a builder, append it to the existing configurations, and send the result in a PUT request.
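Here is a minimal sketch of the filter-then-append step, operating on the configuration as plain dictionaries. It assumes each ingress rule maps a hostname to a service such as `http://localhost:3000`. It also assumes the list ends with a catch-all rule that has no hostname, because cloudflared requires the last ingress rule to be a catch-all. The `port-<port>.<domain>` naming scheme is hypothetical and stands in for the author's builder.

```python
def service_for(port: int) -> str:
    """Local service URL that an ingress rule for this port points at (assumed format)."""
    return f"http://localhost:{port}"


def build_rule(port: int, domain: str) -> dict:
    """Hypothetical stand-in for the builder: one subdomain per port."""
    return {"hostname": f"port-{port}.{domain}", "service": service_for(port)}


def get_or_create_hostname(config: dict, port: int, domain: str):
    """Return (hostname, config_to_put); config_to_put is None if no PUT is needed."""
    ingress = config.get("ingress", [])
    for rule in ingress:
        if rule.get("service") == service_for(port) and "hostname" in rule:
            return rule["hostname"], None  # already configured
    rule = build_rule(port, domain)
    if ingress and "hostname" not in ingress[-1]:
        # keep the existing catch-all rule as the last entry
        new_ingress = ingress[:-1] + [rule, ingress[-1]]
    else:
        # no catch-all yet: add a default one so the rule list stays valid
        new_ingress = ingress + [rule, {"service": "http_status:404"}]
    return rule["hostname"], {**config, "ingress": new_ingress}
```

The input configuration is never mutated, so the caller can still compare it with, or restore, the version it fetched.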

Updating the tunnel configuration does not create a DNS CNAME record. Hence we need to make another call to create a DNS record if there is none for the hostname. We build the new DNS record and make a POST request to create it.
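The DNS side can be sketched the same way. Cloudflare routes a hostname to a tunnel through a proxied CNAME record pointing at `<tunnel-id>.cfargotunnel.com`. The payload below has that shape and would be sent as the POST body to the zone's `dns_records` endpoint. The helper names, and checking against a list of records already fetched from the zone, are assumptions for illustration.

```python
def build_dns_record(hostname: str, tunnel_id: str) -> dict:
    """POST body that routes hostname to the tunnel through a CNAME record."""
    return {
        "type": "CNAME",
        "name": hostname,
        "content": f"{tunnel_id}.cfargotunnel.com",
        "proxied": True,  # the record must be proxied for traffic to reach the tunnel
    }


def needs_dns_record(existing_records: list, hostname: str) -> bool:
    """True if none of the zone's existing records is a CNAME for hostname."""
    return not any(r.get("type") == "CNAME" and r.get("name") == hostname
                   for r in existing_records)
```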

At this point, the link should be available for use.

type: DELETE

For type DELETE, we make a GET request and filter the result to check whether there is a configuration to delete. If no configuration is linked to that port number, we do not need to do anything.

If there is a configuration, we remove it and make a PUT request to update the configuration on Cloudflare.

Following that, we also remove the DNS records linked to this configuration.
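Removing a port is the mirror image of adding one. This sketch assumes each ingress rule maps a hostname to a service such as `http://localhost:3000`. It drops every rule pointing at the port and returns the affected hostnames, which are what we need to find and delete the matching DNS records. An empty list means there is nothing to do, so no PUT is sent.

```python
def remove_rules_for_port(config: dict, port: int):
    """Return (removed_hostnames, config_to_put); config_to_put is None if nothing matched."""
    service = f"http://localhost:{port}"  # assumed service format for a student's port
    ingress = config.get("ingress", [])
    removed = [r["hostname"] for r in ingress
               if r.get("service") == service and "hostname" in r]
    if not removed:
        return [], None  # nothing to delete, so skip the PUT
    # keep everything else, including the catch-all rule at the end
    kept = [r for r in ingress if r.get("service") != service or "hostname" not in r]
    return removed, {**config, "ingress": kept}
```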

As the Cloudflare tunnel configuration API only offers GET and PUT methods (there is no call to add or remove a single rule), this raises the issue of a race condition, discussed further below.

Full script implementation is located here.

Putting it together

After implementing the ability to interact with Cloudflare, we can call the Python script from our monitoring script.

Updated monitoring script

command_to_monitor='npm run dev'

function trap_command(){
  if [ "$BASH_COMMAND" == "$command_to_monitor" ]; then
    if [ -z "$PORT" ]; then
      echo "no port defined in env. not accessible to public"
    else
      hostname=$(cd /usr/local/bin; ./cloudflareCall.py --type GET --port "$PORT")
      echo "your service is deployed on $hostname"
    fi
  fi
}

trap 'trap_command' DEBUG

function exit_command(){
  if [ -z "$PORT" ]; then
    echo "nothing to do"
  else
    echo "cleaning up cloudflare configs"
    result=$(cd /usr/local/bin; ./cloudflareCall.py --type DELETE --port "$PORT")
    echo "$result"
  fi
  echo "EXIT FUNCTION RAN"
}

trap 'exit_command' EXIT

We updated it so that it will GET the hostname every time npm run dev is called, and DELETE the configuration when the user exits. We place the Python script in /usr/local/bin so every user has access to it.

Problem (for the future)

The above solution does bring about some problems. Here I will discuss two main issues.

1. Race condition

Due to a limitation of the API exposed by Cloudflare, we can only update tunnel configurations with a PUT request that replaces the whole configuration. This means that to add a new configuration, we need to get the latest configuration, add the new configuration to it, and make the PUT request. The same applies to deleting configurations.

In a multi-user environment, if two users' scripts run at the same time, one PUT can overwrite the other. Some users might then find their configurations removed even though they did not exit the system.
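To make the problem concrete, here is a deterministic replay of one unlucky interleaving. The ports and the list-of-ports model of the configuration are made up for illustration. Two students start npm run dev at about the same time, both scripts GET the same configuration, and each PUTs its own copy back:

```python
def put_with_new_port(snapshot: list, port: int) -> list:
    """What one script run does: take the config it fetched, add its port, PUT the result."""
    return snapshot + [port]


def replay_lost_update() -> list:
    stored = [3001]                 # configuration currently stored on Cloudflare
    snapshot_b = list(stored)       # Bob's script GETs the configuration
    snapshot_c = list(stored)       # Carol's script GETs it before Bob has PUT
    stored = put_with_new_port(snapshot_b, 3002)  # Bob PUTs [3001, 3002]
    stored = put_with_new_port(snapshot_c, 3003)  # Carol PUTs [3001, 3003]
    return stored


print(replay_lost_update())  # [3001, 3003]: Bob's port 3002 is gone
```

Neither script did anything wrong on its own. The loss comes entirely from the read-modify-write window between GET and PUT, which is why serialising the scripts fixes it.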

To resolve this, we can use a mutex lock so that only one user can run the script at any one time; other users wait their turn. To implement this, we can add the following snippet to our Python script:

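import fcntl
import time

lock_file = 'LOCK_FILE_PATH'  # placeholder: a file every user can write to
wait_time = 1   # seconds to wait between attempts
time_out = 10   # retries before giving up (about 10 seconds)

# The lock belongs to the open file. Closing the file (which Python does
# automatically once nothing references it) releases the lock, so the file
# object must outlive acquire_lock() and is kept at module level.
lock_fd = None

def acquire_lock():
    global lock_fd
    lock_fd = open(lock_file, 'w')
    count = 0
    while True:
        try:
            # non-blocking exclusive lock: raises OSError if another process holds it
            fcntl.lockf(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError:
            if count == time_out:
                lock_fd.close()
                lock_fd = None
                return False
            count = count + 1
            time.sleep(wait_time)

def release_lock():
    global lock_fd
    if lock_fd is not None:
        # unlock through the same file object that took the lock, then close it
        fcntl.lockf(lock_fd, fcntl.LOCK_UN)
        lock_fd.close()
        lock_fd = None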

The above snippet defines functions to acquire and release the lock. acquire_lock retries once per second and gives up after roughly 10 seconds, returning False so the program can exit instead of waiting forever.

We can call acquire_lock wherever we update or delete tunnel configurations. The lock must always be released with release_lock once the operation is completed, even if it fails.

Here is some pseudo code on how we will use it to update the configs.

if acquire_lock():
    try:
        get_latest_config()
        add_new_config()
        update_config()
    finally:
        release_lock()
else:
    print("Cannot acquire lock after 10 seconds.")

2. Inconsistency due to failed requests

If the DNS update fails, the tunnel configuration will still have been updated. Hence the tunnel configuration and the DNS records may become inconsistent.

This might cause the user’s app to be inaccessible at the intended domain name.

To resolve this issue, we may want to use a transactional model instead. When a request failure is detected anywhere along the pipeline, we run revert operations that undo all changes as if the operation never happened. This is out of the scope of this post. Maybe I will do it in the future.
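A full transactional model is out of scope, but the core idea can be sketched as a compensating action. Remember the old configuration, and if the DNS step fails, PUT the old configuration back. The three callables are hypothetical stand-ins for the real API calls. This is not a true transaction: the revert itself can fail, and without the lock described earlier, the revert could also overwrite another user's change.

```python
from typing import Callable


def update_with_rollback(get_config: Callable[[], dict],
                         put_config: Callable[[dict], None],
                         create_dns: Callable[[str], None],
                         new_config: dict,
                         hostname: str) -> bool:
    """Apply the tunnel change, then the DNS change; undo the tunnel change if DNS fails."""
    old_config = get_config()
    put_config(new_config)            # step 1: tunnel configuration updated
    try:
        create_dns(hostname)          # step 2: DNS record created
    except Exception:
        put_config(old_config)        # compensate: restore the configuration we replaced
        return False
    return True
```

Returning False instead of re-raising lets the monitoring script print a friendly message, which fits the tolerance for occasional manual fixes described below.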

Note that, given the use case, we can tolerate some slight inconsistencies. If a student cannot access the site, they can approach their instructor, who will resolve the issue manually.

Conclusion

With this, we have solved the issues we set out to tackle: website links are automatically generated without additional input from the user. Though there are still some issues with this solution, it will do for now. Anyway, software is always about improving on previous work :)

Happy Coding!

Appendix (for those interested)

Why clean up on exit?

Above, I mentioned that I wanted to “clean up” when the user closes the service (CTRL + C) but decided against it after further consideration. Here I will detail the other issues I faced.

One main issue was that CTRL + C is used everywhere, not just for closing the service you are developing. How can we know that a particular CTRL + C closed a node server?

The following are things I tried to make it work:

I tried to resolve this issue by getting the command run just before a CTRL + C action. This did not work because, for some reason, a script has its own command history, so it cannot see the previous command.

Another way was to use the .bash_history file, which records all commands run. This sounds possible at first. However, I found out that this history file is only updated when the user exits the session, not immediately after each command. To work around this, we can add a snippet to the .bashrc file (for example, PROMPT_COMMAND='history -a') that appends to the .bash_history file after every command. Assuming the SIGINT or SIGTERM signal is received, this would work, but only in a single-instance environment. For full-stack development, a student may run two node instances at the same time for manual testing, so you might detect the wrong service closing.

We could also add certain graceful shutdown functionalities to the services we are running. However, that would violate the “Hassle-free” requirement of this project as students will need to make these changes.

Alternative possible solution

VSCode can SSH into the server and forward ports over SSH, giving the user access to the server's ports from their local machine. I knew this from the beginning, but it would still require students to learn how to connect to the server over SSH through VSCode.

For context, I am using shellinabox so that students can access the terminal through a web browser. All students use the same environment, which reduces the chances of issues caused by different operating systems. If you are interested in finding out more about what I am teaching, you can visit the Coding Crayons website. However, since you read this far, you probably do not need it, so please recommend it to your friends if they are interested :)
