How to Correctly Parse IPv6 Addresses with Port Numbers in URLs: Colon Ambiguity Resolution



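When an IPv6 address appears in a URL, the colon does double duty: it separates the groups of the address and it also introduces the port number. Brackets remove the ambiguity:

http://[2001:db8:1f70::999:de8:7648:6e8]:100/

versus the ambiguous, invalid form, in which a parser cannot tell whether the final :100 is the port or part of the address:

http://2001:db8:1f70::999:de8:7648:6e8:100/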

The IETF standardized the solution in RFC 3986 Section 3.2.2:

  • IPv6 addresses must be enclosed in square brackets when used in URLs
  • The port number appears after the closing bracket

Example of valid format:

https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/path
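
The same rule applies when constructing a URL rather than parsing one. Below is a minimal sketch, assuming a hypothetical helper named format_authority (it is not part of any library), that uses the standard ipaddress module to decide whether a host needs brackets:

```python
import ipaddress


def format_authority(host, port=None):
    """Build the host[:port] part of a URL, bracketing IPv6 literals.

    format_authority is a hypothetical helper for illustration only.
    """
    try:
        needs_brackets = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a DNS name), so no brackets.
        needs_brackets = False
    hostport = f"[{host}]" if needs_brackets else host
    return hostport if port is None else f"{hostport}:{port}"


print(format_authority("2001:db8:85a3::8a2e:370:7334", 8080))
print(format_authority("example.com", 443))
```

IPv4 literals and host names pass through unchanged, so the helper can be applied to any host value before it is joined with a port.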

Python Example

from urllib.parse import urlparse

def parse_ipv6_url(url):
    """Return (host, port); host has no brackets, port is an int or None."""
    parsed = urlparse(url)
    # .hostname strips the brackets and lowercases the host.
    # .port raises ValueError for a non-numeric or out-of-range port.
    return (parsed.hostname, parsed.port)

# Usage:
print(parse_ipv6_url("http://[2001:db8::1]:8080"))  # ('2001:db8::1', 8080)

JavaScript Example

function parseIPv6URL(url) {
    const urlObj = new URL(url);
    let hostname = urlObj.hostname;
    let port = urlObj.port;
    
    // Remove IPv6 brackets if present
    if (hostname.startsWith('[') && hostname.endsWith(']')) {
        hostname = hostname.slice(1, -1);
    }
    
    return { hostname, port };
}

// Usage (a scheme's default port, such as 443 for https, is reported as ""):
console.log(parseIPv6URL("https://[2001:db8::1]:8443"));
// { hostname: "2001:db8::1", port: "8443" }

Developers should be aware of these scenarios:

// Invalid cases (strict parsers reject these; some, like Python's urllib.parse, raise only when the port is read):
"http://2001:db8::1:8080"      // Missing brackets
"http://[2001:db8::1]:notaport" // Non-numeric port

// Valid but tricky cases:
"http://[2001:db8::1]:"        // Empty port: allowed by RFC 3986, means the default port
"http://[::1]"                 // Localhost IPv6, default port
"https://[2001:db8::]:443"     // Explicit default port
"http://[::ffff:192.168.1.1]"  // IPv4-mapped IPv6 address

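The cases listed above can be checked against Python's standard library. This is a rough sketch, assuming a hypothetical helper named try_parse; it reflects how urllib.parse behaves, and other parsers may classify the same inputs differently:

```python
from urllib.parse import urlsplit


def try_parse(url):
    """Return (hostname, port), or None if urllib.parse rejects the URL.

    try_parse is a hypothetical helper for illustration only.
    """
    try:
        parts = urlsplit(url)
        # Reading .port is what triggers validation of the port text.
        return parts.hostname, parts.port
    except ValueError:
        return None


for candidate in (
    "http://2001:db8::1:8080",
    "http://[2001:db8::1]:notaport",
    "http://[2001:db8::1]:",
    "http://[::1]",
    "https://[2001:db8::]:443",
    "http://[::ffff:192.168.1.1]",
):
    print(candidate, "->", try_parse(candidate))
```

The first two inputs come back as None, while the empty-port form parses with a port of None, which matches its meaning of "use the default port".
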
Here's a simple regex pattern that checks the overall shape of an IPv6 URL (it does not verify that the address itself is well-formed):

const ipv6UrlPattern = /^https?:\/\/\[([0-9a-fA-F:.]+)\](?::(\d+))?(\/.*)?$/;

// Test cases:
console.log(ipv6UrlPattern.test("http://[2001:db8::1]:80"));  // true
console.log(ipv6UrlPattern.test("http://[::1]/path"));        // true
console.log(ipv6UrlPattern.test("http://2001:db8::1"));       // false
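
A pattern like this only checks which characters appear between the brackets, so a string such as [1:2:3] would still match. One way to close that gap is to hand the captured host to a real address parser. Here is a minimal Python sketch, assuming a hypothetical function named validate_ipv6_url and a similar pattern rewritten for Python's re module:

```python
import ipaddress
import re

# Shape check only: scheme, bracketed host, optional port, optional path.
IPV6_URL = re.compile(r"^https?://\[([0-9a-fA-F:.]+)\](?::(\d+))?(/.*)?$")


def validate_ipv6_url(url):
    """Return (address, port) for a well-formed IPv6 URL, else None.

    validate_ipv6_url is a hypothetical helper for illustration only.
    """
    match = IPV6_URL.match(url)
    if match is None:
        return None
    try:
        # The address parser rejects hosts the regex lets through.
        address = ipaddress.IPv6Address(match.group(1))
    except ValueError:
        return None
    port = int(match.group(2)) if match.group(2) else None
    if port is not None and port > 65535:
        return None
    return address, port


print(validate_ipv6_url("http://[2001:db8::1]:80"))
print(validate_ipv6_url("http://[1:2:3]:80"))
```

The regex stays deliberately loose here; the address parser and the port range check do the real validation.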

For production use, consider these battle-tested libraries:

  • Python: Use urllib.parse (standard library) or yarl for advanced parsing
  • JavaScript: The built-in URL class or whatwg-url for legacy environments
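  • Java: java.net.URI, which accepts bracketed IPv6 literals (getHost() returns the address with its brackets)
  • C++: Boost.Asio's ip::address_v6 parser, for the address itself once the brackets are removed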

When working with IPv6 addresses in URLs, developers face a unique syntactic challenge due to the colon character's dual purpose:


// Problematic case:
http://2001:db8:1f70::999:de8:7648:6e8:100/
// Is ":100" part of the IP or the port?

The IETF standard (RFC 3986) specifies that IPv6 addresses in URLs must be enclosed in square brackets:


// Correct format:
http://[2001:db8:1f70::999:de8:7648:6e8]:100/

Here's how major programming languages handle IPv6 URLs:

Python Example


from urllib.parse import urlparse

url = "http://[2001:db8::1]:8080/path"
result = urlparse(url)
print(result.hostname)  # Output: 2001:db8::1
print(result.port)      # Output: 8080

JavaScript Example


const { URL } = require('url');
const myURL = new URL('http://[2001:db8::1]:8080');
console.log(myURL.hostname); // '[2001:db8::1]' (the URL class keeps the brackets)
console.log(myURL.port);     // '8080'

Common pitfalls include:

  • Forgetting the brackets, which are required for every IPv6 literal, including those with compressed zeros (::)
  • Not properly escaping brackets in regular expressions
  • Mishandling zone identifiers (which use the % character)

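For link-local addresses with zone IDs, the address is often written as:


http://[fe80::1%eth0]:8080/

That raw form is not a valid URL, because % introduces a percent-encoded octet. RFC 6874 specifies that the delimiter be written as %25. Parser support varies, and browsers that follow the WHATWG URL Standard reject zone identifiers altogether. The RFC 6874 form is:


http://[fe80::1%25eth0]:8080/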
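
Because library support for zone identifiers is uneven, it can help to separate the zone by hand before passing the address on. This is a minimal sketch, assuming a hypothetical helper named split_zone that accepts only the %25-encoded form:

```python
import ipaddress
from urllib.parse import unquote


def split_zone(bracketed_host):
    """Split '[addr%25zone]' into (addr, zone); zone is None when absent.

    split_zone is a hypothetical helper for illustration only.
    """
    if not (bracketed_host.startswith("[") and bracketed_host.endswith("]")):
        raise ValueError("expected a bracketed IPv6 literal")
    inner = bracketed_host[1:-1]
    address, delimiter, zone = inner.partition("%25")
    if "%" in address:
        # A bare '%' means the delimiter was not percent-encoded.
        raise ValueError("zone delimiter must be written as %25")
    # Raises ValueError if the part before the zone is not valid IPv6.
    ipaddress.IPv6Address(address)
    return address, (unquote(zone) if delimiter else None)


print(split_zone("[fe80::1%25eth0]"))
print(split_zone("[2001:db8::1]"))
```

The zone name is returned decoded, ready to be combined with the address in whatever form the socket API in use expects.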

Here's a regex pattern to validate IPv6 URLs with ports:


/^https?:\/\/\[([a-f0-9:]+)\](?::(\d+))?/i