When working with IPv6 addresses in URLs, we encounter a fundamental parsing ambiguity, because the colon both separates the groups of the address and introduces the port number. Compare the correctly bracketed form:
http://[2001:db8:1f70::999:de8:7648:6e8]:100/
versus the invalid interpretation:
http://2001:db8:1f70::999:de8:7648:6e8:100/
The IETF standardized the solution in RFC 3986 Section 3.2.2:
- IPv6 addresses must be enclosed in square brackets when used in URLs
- The port number appears after the closing bracket
Example of valid format:
https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/path
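The same two rules apply when constructing a URL, not only when parsing one. The following is a minimal sketch, assuming a hypothetical helper named build_url (it is not part of any library), that uses the standard ipaddress module to decide whether a host needs brackets:

```python
import ipaddress


def build_url(scheme, host, port=None, path="/"):
    """Hypothetical helper: assemble a URL, bracketing IPv6 literals."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal at all (e.g. a DNS name), so no brackets.
        is_ipv6 = False
    host_part = f"[{host}]" if is_ipv6 else host
    port_part = f":{port}" if port is not None else ""
    return f"{scheme}://{host_part}{port_part}{path}"


print(build_url("https", "2001:db8::1", 8080, "/path"))
print(build_url("http", "example.com"))
```

Hostnames and IPv4 addresses pass through unchanged, so the helper can be used for any kind of host.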
Python Example
from urllib.parse import urlparse
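def parse_ipv6_url(url):
    parsed = urlparse(url)
    if ']' in parsed.netloc:
        # Bracketed IPv6 literal: split at the closing bracket
        host_part, _, rest = parsed.netloc.partition(']')
        port_part = rest[1:] or None  # Drop the leading ':'; None if no port
        return (host_part.rpartition('[')[2], port_part)  # Remove opening bracket
    # Hostname or IPv4 address: at most one colon, which introduces the port
    host, _, port = parsed.netloc.partition(':')
    return (host, port or None)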
# Usage:
print(parse_ipv6_url("http://[2001:db8::1]:8080")) # ('2001:db8::1', '8080')
JavaScript Example
function parseIPv6URL(url) {
  const urlObj = new URL(url);
  let hostname = urlObj.hostname;
  let port = urlObj.port;
  // Remove IPv6 brackets if present
  if (hostname.startsWith('[') && hostname.endsWith(']')) {
    hostname = hostname.slice(1, -1);
  }
  return { hostname, port };
}
// Usage:
console.log(parseIPv6URL("https://[2001:db8::1]:443"));
// { hostname: "2001:db8::1", port: "" }
// (443 is the default port for https, so URL.port is the empty string)
Developers should be aware of these scenarios:
// Invalid cases (rejected by strict parsers such as the WHATWG URL class):
"http://2001:db8::1:8080" // Missing brackets
"http://[2001:db8::1]:notaport" // Non-numeric port
// Valid but tricky cases:
"http://[2001:db8::1]:" // Empty port: permitted by RFC 3986, treated as no port
"http://[::1]" // Localhost IPv6, default port
"https://[2001:db8::]:443" // Explicit default port
"http://[::ffff:192.168.1.1]" // IPv4-mapped IPv6 address
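How a given parser treats these cases is worth checking directly. The sketch below, assuming a hypothetical helper named host_and_port, runs several of them through Python's urllib.parse. Note that urllib is fairly lenient: for a malformed port it typically raises ValueError only when the port attribute is read, not when the URL is split.

```python
from urllib.parse import urlsplit


def host_and_port(url):
    """Return (hostname, port), or None if urllib rejects the URL."""
    try:
        parts = urlsplit(url)
        # Accessing .port is what triggers validation of the port text.
        return parts.hostname, parts.port
    except ValueError:
        return None


for candidate in [
    "http://2001:db8::1:8080",        # missing brackets
    "http://[2001:db8::1]:notaport",  # non-numeric port
    "http://[2001:db8::1]:",          # empty port
    "http://[::1]",                   # no port
    "http://[::ffff:192.168.1.1]",    # IPv4-mapped address
]:
    print(candidate, "->", host_and_port(candidate))
```

The first two cases come back as None, while the remaining ones yield the unbracketed address with a port of None.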
Here's a simple regex pattern that checks the bracketed URL form (it does not verify that the bracketed text is a well-formed IPv6 address):
const ipv6UrlPattern = /^https?:\/\/\[([0-9a-fA-F:]+)\](?::(\d+))?(\/.*)?$/;
// Test cases:
console.log(ipv6UrlPattern.test("http://[2001:db8::1]:80")); // true
console.log(ipv6UrlPattern.test("http://[::1]/path")); // true
console.log(ipv6UrlPattern.test("http://2001:db8::1")); // false
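A pattern like this only checks the overall shape, so a string such as "http://[:::::]" still matches. One possible way to tighten it, sketched here in Python under the assumption of a hypothetical helper named is_valid_ipv6_url, is to let the regex find the bracketed part and then hand it to the standard ipaddress module:

```python
import ipaddress
import re

# Same shape as the pattern above; '.' is also allowed inside the brackets
# so that IPv4-mapped forms reach the ipaddress check.
IPV6_URL = re.compile(r"^https?://\[([0-9a-fA-F:.]+)\](?::(\d+))?(/.*)?$")


def is_valid_ipv6_url(url):
    """Hypothetical helper: regex for the shape, ipaddress for the address."""
    match = IPV6_URL.match(url)
    if match is None:
        return False
    try:
        ipaddress.IPv6Address(match.group(1))
    except ValueError:
        # The bracketed text is not a well-formed IPv6 address.
        return False
    port = match.group(2)
    # Ports are limited to 0-65535; a missing port is acceptable.
    return port is None or (len(port) <= 5 and int(port) <= 65535)


print(is_valid_ipv6_url("http://[2001:db8::1]:80"))
print(is_valid_ipv6_url("http://[:::::]"))
```

This sketch does not handle zone identifiers or userinfo, which would need additional rules.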
For production use, consider these battle-tested libraries:
- Python: Use urllib.parse (standard library) or yarl for advanced parsing
- JavaScript: The built-in URL class or whatwg-url for legacy environments
- Java: java.net.URI with custom IPv6 handling
- C++: Boost.Asio's ip::address_v6 parser
When working with IPv6 addresses in URLs, developers face a unique syntactic challenge due to the colon character's dual purpose:
// Problematic case:
http://2001:db8:1f70::999:de8:7648:6e8:100/
// Is ":100" part of the IP or the port?
The IETF standard (RFC 3986) specifies that IPv6 addresses in URLs must be enclosed in square brackets:
// Correct format:
http://[2001:db8:1f70::999:de8:7648:6e8]:100/
Here's how major programming languages handle IPv6 URLs:
Python Example
from urllib.parse import urlparse
url = "http://[2001:db8::1]:8080/path"
result = urlparse(url)
print(result.hostname) # Output: 2001:db8::1
print(result.port) # Output: 8080
JavaScript Example
const { URL } = require('url');
const myURL = new URL('http://[2001:db8::1]:8080');
console.log(myURL.hostname); // '[2001:db8::1]' (the URL class keeps the brackets)
console.log(myURL.port); // '8080'
Common pitfalls include:
- Forgetting brackets when the IPv6 address contains compressed zeros (::)
- Not properly escaping brackets in regular expressions
- Mishandling zone identifiers (which use % character)
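For link-local addresses with zone IDs, the address is conventionally written with a % separator, which gives this raw form:
http://[fe80::1%eth0]:8080/
Because a bare % is not valid inside a URI, RFC 6874 requires the separator to be percent-encoded as %25. Support varies: some parsers, including the WHATWG URL class, reject zone IDs entirely.
http://[fe80::1%25eth0]:8080/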
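The percent-encoded form can be produced mechanically. The following is a minimal sketch, assuming a hypothetical helper named zone_url and an input written in the usual "address%zone" notation. It validates only the address part and leaves the zone name unchecked apart from percent-encoding it:

```python
import ipaddress
from urllib.parse import quote


def zone_url(scoped_address, port, scheme="http"):
    """Hypothetical helper: build a URL from 'addr%zone', encoding '%' as '%25'."""
    address, _, zone = scoped_address.partition("%")
    # Validate only the address part; raises ValueError if it is malformed.
    ipaddress.IPv6Address(address)
    host = address if not zone else f"{address}%25{quote(zone, safe='')}"
    return f"{scheme}://[{host}]:{port}/"


print(zone_url("fe80::1%eth0", 8080))  # http://[fe80::1%25eth0]:8080/
```

Whether the resulting URL is accepted depends on the consumer, so it is worth testing against the actual client or library in use.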
Here's a regex pattern to validate IPv6 URLs with ports:
/^https?:\/\/\[([a-f0-9:]+)\](?::(\d+))?/i