Exploring the Dangers of Path Traversal Vulnerabilities in Web Applications
Path traversal vulnerabilities are a critical class of security flaw that allows attackers to access files and directories outside the intended scope of a web application. By injecting ".." sequences and other directory traversal patterns into variables that reference files, attackers can navigate the server's file system and gain unauthorized access to sensitive files, configuration files, and other critical data.
This might include:
- Application code and data.
- Credentials for back-end systems.
- Sensitive operating system files.
In some cases, an attacker might be able to write to arbitrary files on the server, allowing them to modify application data or behavior, and ultimately take full control of the server.
Reading arbitrary files via path traversal
Imagine a shopping application that displays images of items for sale. This might load an image using the following HTML:
<img src="/loadImage?filename=218.png">
The loadImage URL takes a filename parameter and returns the contents of the specified file. The image files are stored on disk in the location /var/www/images/. To return an image, the application appends the requested filename to this base directory and uses a filesystem API to read the contents of the file. In other words, the application reads from the following file path:
/var/www/images/218.png
This application implements no defenses against path traversal attacks. As a result, an attacker can request the following URL to retrieve the /etc/passwd file from the server's filesystem:
https://insecure-website.com/loadImage?filename=../../../etc/passwd
This causes the application to read from the following file path:
/var/www/images/../../../etc/passwd
The sequence ../ is valid within a file path, and means to step up one level in the directory structure. The three consecutive ../ sequences step up from /var/www/images/ to the filesystem root, and so the file that is actually read is:
/etc/passwd
On Unix-based operating systems, this is a standard file containing details of the users that are registered on the server, but an attacker could retrieve other arbitrary files using the same technique.
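The path arithmetic above can be reproduced with a minimal sketch. It assumes the application simply concatenates the base directory and the filename parameter; the function name build_image_path is hypothetical, and posixpath.normpath stands in for the resolution the operating system performs when the file is opened.

```python
import posixpath

BASE_DIR = "/var/www/images/"

def build_image_path(filename: str) -> str:
    """Naively append the user-supplied filename to the base directory."""
    return BASE_DIR + filename

requested = build_image_path("../../../etc/passwd")
print(requested)                      # /var/www/images/../../../etc/passwd
print(posixpath.normpath(requested))  # /etc/passwd
```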
On Windows, both ../ and ..\ are valid directory traversal sequences. The following is an example of an equivalent attack against a Windows-based server:
https://insecure-website.com/loadImage?filename=..\..\..\windows\win.ini
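As a rough illustration of the Windows case, the sketch below uses Python's ntpath module to normalize both separator styles. The base directory C:\inetpub\images\ and the helper name resolve_windows are assumptions made for the example; the source does not say where the Windows application stores its images.

```python
import ntpath

# Hypothetical Windows base directory; the article does not name one.
BASE_DIR = "C:\\inetpub\\images\\"

def resolve_windows(filename: str) -> str:
    """Concatenate and normalize the path the way Windows would resolve it."""
    return ntpath.normpath(BASE_DIR + filename)

print(resolve_windows("..\\..\\..\\windows\\win.ini"))  # C:\windows\win.ini
print(resolve_windows("../../../windows/win.ini"))      # C:\windows\win.ini
```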
File path traversal, traversal sequences blocked with absolute path bypass
Many applications that place user input into file paths implement defenses against path traversal attacks. These can often be bypassed.
If an application strips or blocks directory traversal sequences from the user-supplied filename, it might be possible to bypass the defense using a variety of techniques.
It might be possible to use an absolute path from the filesystem root, such as filename=/etc/passwd, to directly reference a file without using any traversal sequences.
Example Scenario:
Imagine a web application that allows users to download files from a specific directory on the server. The application includes a check to prevent path traversal attacks by blocking traversal sequences like "../". However, it does not adequately handle absolute path inputs.
A check of this kind correctly identifies and blocks traversal sequences like "../". However, it does not prevent an attacker from supplying an absolute path, such as /etc/passwd, to access files outside the intended directory.
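A minimal sketch of such a flawed check might look as follows. The base directory /var/www/files and the function name build_download_path are hypothetical; the sketch only builds the path and does not read any file. The key detail is that the path-joining function discards the base directory when the second argument is absolute.

```python
import posixpath

BASE_PATH = "/var/www/files"  # hypothetical download directory

def build_download_path(file_name: str) -> str:
    """Flawed check: rejects "../" but still accepts absolute paths."""
    if "../" in file_name or "..\\" in file_name:
        raise ValueError("traversal sequence detected")
    # join() discards BASE_PATH entirely when file_name is absolute.
    return posixpath.join(BASE_PATH, file_name)

print(build_download_path("report.pdf"))   # /var/www/files/report.pdf
print(build_download_path("/etc/passwd"))  # /etc/passwd
```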
Secure Implementation:
A secure implementation also checks whether file_name starts with a "/", which rejects absolute paths.
It then resolves both the base path and the full path to their canonical forms using os.path.realpath(), so that any symbolic links or relative components are resolved before the path is used.
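One way these two checks could be combined is sketched below. The function name safe_download_path and its parameters are assumptions, not the original author's code, and the final containment test is included because resolving the paths only helps if the results are then compared.

```python
import os

def safe_download_path(base_path: str, file_name: str) -> str:
    """Return the canonical path for file_name, or raise ValueError."""
    # Reject absolute paths outright.
    if file_name.startswith("/") or os.path.isabs(file_name):
        raise ValueError("absolute paths are not allowed")
    # Resolve symbolic links and relative components on both paths.
    canonical_base = os.path.realpath(base_path)
    canonical_full = os.path.realpath(os.path.join(canonical_base, file_name))
    # The resolved path must still sit inside the resolved base directory.
    if not canonical_full.startswith(canonical_base + os.sep):
        raise ValueError("path escapes the base directory")
    return canonical_full
```

A caller would wrap this in a try/except block and return an error response when ValueError is raised, opening the file only when a path is returned.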
File path traversal, traversal sequences stripped non-recursively
In this scenario, the application removes traversal sequences (such as ../ or ..\) in a single, non-recursive pass, typically using string manipulation functions or regular expressions that detect and delete these sequences from the input path.
For example, a basic input filter might remove "../" from user input to prevent directory traversal. Because the filter does not re-scan its own output, an attacker can bypass it with nested sequences such as "....//" or "..././": once the inner "../" is removed, the remaining characters collapse into a fresh "../". Encoding techniques that a simple filter does not recognize can have the same effect.
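The effect of single-pass stripping can be seen in a short sketch. Both function names are hypothetical. The second function shows what repeating the removal would look like, although repeated stripping on its own is still weaker than validating the canonical path.

```python
def strip_once(file_name: str) -> str:
    """Single-pass filter: removes each "../" found, but never re-scans."""
    return file_name.replace("../", "")

def strip_until_stable(file_name: str) -> str:
    """Repeat the removal until no "../" remains in the result."""
    while "../" in file_name:
        file_name = file_name.replace("../", "")
    return file_name

print(strip_once("....//etc/passwd"))          # ../etc/passwd
print(strip_once("..././etc/passwd"))          # ../etc/passwd
print(strip_until_stable("....//etc/passwd"))  # etc/passwd
```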
File Path Traversal with Stripped Traversal Sequences and Superfluous URL-Decode
A file path traversal vulnerability can become even more problematic when an application tries to strip traversal sequences but fails to handle cases where superfluous URL-decoding allows an attacker to bypass these defenses. This kind of sanitization can sometimes be bypassed by URL encoding, or even double URL encoding, the ../ characters. This results in %2e%2e%2f and %252e%252e%252f respectively. Let's walk through an example of this scenario.
Vulnerable Code Example
Consider a web application that allows users to download files by specifying a file name in a URL parameter. The application attempts to prevent path traversal by stripping out traversal sequences like ../, but it doesn't account for URL-encoded input that can bypass these defenses.
An attacker can exploit this by URL-encoding the traversal sequence, for example as %2e%2e%2f%2e%2e%2f, so that the stripping logic does not recognize it.
Here, %2e is the URL-encoded representation of . and %2f is the URL-encoded representation of /. When decoded, %2e%2e%2f%2e%2e%2f becomes ../../.
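A minimal sketch of the flawed ordering, assuming the application strips "../" first and URL-decodes afterwards. The function name sanitize_then_decode is hypothetical; urllib.parse.unquote is the standard library's URL-decoding function.

```python
from urllib.parse import unquote

def sanitize_then_decode(raw_name: str) -> str:
    """Flawed order: strip "../" first, then URL-decode the result."""
    stripped = raw_name.replace("../", "")
    # Superfluous decode: turns %2e%2e%2f back into "../" after filtering.
    return unquote(stripped)

print(sanitize_then_decode("../../etc/passwd"))              # etc/passwd
print(sanitize_then_decode("%2e%2e%2f%2e%2e%2fetc/passwd"))  # ../../etc/passwd
# Double encoding survives one decode performed earlier in the stack:
print(unquote("%252e%252e%252f"))                            # %2e%2e%2f
```

Decoding the input fully before any filtering or validation takes place, and not decoding again afterwards, avoids this mismatch.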
File path traversal, validation of start of path
When dealing with file path traversal vulnerabilities, validating that the requested file path starts with a specific base directory path is crucial. This ensures that users cannot navigate outside the intended directory, thereby preventing unauthorized access to sensitive files. Here's how you can implement such a validation correctly.
An application may require the user-supplied filename to start with the expected base folder, such as /var/www/images. In this case, it might be possible to include the required base folder followed by suitable traversal sequences. For example: filename=/var/www/images/../../../etc/passwd.
Vulnerable Code Example
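First, consider a flawed implementation that builds a file path from user input but validates only the raw string, without resolving traversal sequences first.
An attacker could supply input such as /var/www/images/../../../etc/passwd, which begins with the required base folder yet resolves to /etc/passwd.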
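The flaw can be sketched as follows, assuming the check is a plain string-prefix test on the unresolved input. The function name naive_prefix_check is hypothetical, and posixpath.normpath is used only to show where the accepted path really points.

```python
import posixpath

BASE_PATH = "/var/www/images"

def naive_prefix_check(file_name: str) -> str:
    """Flawed: tests the raw string prefix without resolving ".." first."""
    if not file_name.startswith(BASE_PATH):
        raise ValueError("path must start with the base folder")
    return file_name

accepted = naive_prefix_check("/var/www/images/../../../etc/passwd")
print(posixpath.normpath(accepted))  # /etc/passwd
```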
Secure Implementation
To secure the application, we need to ensure that the constructed file path still starts with the base path after any relative components have been resolved.
A secure check compares the canonical full path against the canonical base path with os.sep appended. The trailing separator matters: without it, a sibling directory whose name merely begins with the base path (for example /var/www/images_backup) would also pass the prefix test. With it, the requested file must be inside the allowed directory, which prevents access to files outside the base directory, such as /etc/passwd.
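A minimal sketch of the canonical prefix check, using os.path.realpath() and os.sep as named in the text. The function name is_within_base and its parameters are assumptions made for the example.

```python
import os

def is_within_base(base_path: str, file_name: str) -> bool:
    """True only if file_name resolves to a location inside base_path."""
    canonical_base = os.path.realpath(base_path)
    canonical_full = os.path.realpath(os.path.join(canonical_base, file_name))
    # os.sep stops a sibling such as "<base>_backup" from matching the prefix.
    return canonical_full.startswith(canonical_base + os.sep)
```

Because the comparison happens after resolution, the same function rejects relative traversal, absolute paths outside the base, and inputs that start with the base folder but climb back out of it.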
File path traversal, validation of file extension with null byte bypass
Null byte injection is a technique used by attackers to bypass file extension validation by inserting a null byte (%00 in URL encoding) into the input. This can cause the application to misinterpret the input, allowing the attacker to bypass security checks. Here’s how to handle this and ensure robust file path validation and extension checks.
Vulnerable Code Example
Consider an application that validates the file extension but does not handle null byte injection.
An attacker could exploit this by inserting a null byte before the required extension, for example filename=../../etc/passwd%00.txt, so that the extension check still passes.
After URL-decoding, ../../etc/passwd%00.txt becomes ../../etc/passwd\x00.txt. C and many C-based file APIs interpret the null byte as a string terminator, so the file name may end up being treated as ../../etc/passwd.
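The mismatch between the extension check and the file API can be illustrated with a short sketch. Both function names are hypothetical, and the truncation is simulated with a string split, because Python's own open() rejects paths containing an embedded null byte rather than truncating them.

```python
from urllib.parse import unquote

def has_allowed_extension(file_name: str) -> bool:
    """Flawed check: only inspects the end of the raw string."""
    return file_name.endswith(".txt")

def name_seen_by_c_api(file_name: str) -> str:
    """Simulate a C-style API that stops reading at the first null byte."""
    return file_name.split("\x00", 1)[0]

payload = unquote("../../etc/passwd%00.txt")
print(has_allowed_extension(payload))  # True
print(name_seen_by_c_api(payload))     # ../../etc/passwd
```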
Secure Implementation
To secure the application against null byte injection and ensure proper file extension validation, follow these steps:
- Remove Null Bytes: Strip out null bytes from the user input.
- Canonicalize and Validate: Ensure the file path is within the allowed directory.
- Validate File Extension: Check the file extension after canonicalizing the path.
Here’s how to implement it:
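What follows is a sketch of these steps rather than a definitive implementation. The function name resolve_safe_path, its parameters, and the ALLOWED_EXTENSIONS set are assumptions made for the example; the library calls (urllib.parse.unquote, os.path.realpath, os.path.splitext) are standard Python.

```python
import os
from urllib.parse import unquote

ALLOWED_EXTENSIONS = {".txt"}  # hypothetical allow-list

def resolve_safe_path(base_path: str, raw_name: str) -> str:
    """Return a canonical path inside base_path, or raise ValueError."""
    file_name = unquote(raw_name)              # URL-decode the input
    file_name = file_name.replace("\x00", "")  # remove null bytes
    canonical_base = os.path.realpath(base_path)
    full_path = os.path.join(canonical_base, file_name)
    canonical_full = os.path.realpath(full_path)  # canonicalize
    # Validate the extension on the resolved path, not the raw input.
    if os.path.splitext(canonical_full)[1].lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("file extension not allowed")
    # Directory allow-listing: the result must stay inside the base directory.
    if not canonical_full.startswith(canonical_base + os.sep):
        raise ValueError("path escapes the base directory")
    return canonical_full
```

The function only resolves and validates the path. The caller opens the returned path and treats ValueError as a rejected request.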
Explanation
- URL-Decoding: The input is URL-decoded to handle any encoded traversal sequences.
- Remove Null Bytes: The code replaces any null bytes in the file name with an empty string, mitigating null byte injection.
- Construct Full Path: The code constructs the full path by combining the base path and the user-provided file name.
- Canonicalization: The code uses os.path.realpath() to resolve both the base path and the full path to their canonical forms, ensuring proper path resolution.
- Validate File Extension: The file extension is validated after the full path has been canonicalized, ensuring the check is performed on the actual resolved path.
- Directory Whitelisting: The code checks if the canonical full path starts with the canonical base path, ensuring the requested file is within the allowed directory.
By following these steps, the application can effectively prevent file path traversal attacks, including those involving null byte injection.