Regular Expressions Guide: Master Pattern Matching and Text Validation
Practical guide to using regular expressions for pattern matching, text validation, and string manipulation in your applications. Learn regex syntax, common patterns, best practices, and implementation across different programming languages.
Regular Expressions: Essential Pattern Matching Tool
Regular expressions (regex or regexp) are patterns used to match, search, and manipulate text. They're supported in virtually every programming language and are essential for tasks like validation, parsing, search and replace, and data extraction. Regex syntax can seem cryptic at first, but once learned it makes most text processing tasks shorter and more reliable. This guide covers basic patterns through advanced techniques, with practical examples.
Regex Basics
Understanding fundamental regex concepts and syntax.
Literal Character Matching
The simplest regex is a literal string that matches itself:
Rules:
- Most characters match themselves literally
- Case-sensitive by default (use flags to change)
- Matches first occurrence unless using global flag
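As a minimal Python sketch of these rules (the sample sentence is made up for illustration): Python's `re` module has no global flag, so `re.search` returns the first match and `re.findall` returns every match.

```python
import re

text = "The cat sat near another Cat."

# A literal pattern matches exactly the characters it spells out.
first = re.search(r"cat", text)
first_position = first.start()  # index of the first occurrence

# Matching is case-sensitive unless a flag says otherwise.
all_exact = re.findall(r"cat", text)
all_any_case = re.findall(r"cat", text, flags=re.IGNORECASE)

print(first_position, all_exact, all_any_case)
```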
Metacharacters
Special characters with special meanings in regex:
Basic Metacharacters:
- `.` - Matches any character except newline
- `^` - Matches start of string/line
- `$` - Matches end of string/line
- `*` - Matches 0 or more repetitions
- `+` - Matches 1 or more repetitions
- `?` - Matches 0 or 1 repetition
- `\` - Escape character
- `|` - Alternation (OR)
- `()` - Grouping
- `[]` - Character class
- `{}` - Quantifier
Escaping Metacharacters: To match metacharacters literally, escape them with a backslash:
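A short Python sketch of escaping, using a made-up price string. `re.escape` is the standard-library helper that escapes an arbitrary literal for you.

```python
import re

price_text = "Total: $9.99 (was $19.99)"

# Unescaped, "." matches any character and "$" is an end anchor,
# so both are escaped here to match them literally.
literal_prices = re.findall(r"\$\d+\.\d{2}", price_text)

# re.escape builds the escaped form of an arbitrary literal string.
needle = "(was $19.99)"
found = re.search(re.escape(needle), price_text) is not None

print(literal_prices, found)
```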
Character Classes
Character classes match any single character from a set:
Syntax:
- `[abc]` - Matches a, b, or c
- `[^abc]` - Matches any character except a, b, or c
- `[a-z]` - Matches any lowercase letter
- `[A-Z]` - Matches any uppercase letter
- `[0-9]` - Matches any digit
- `[a-zA-Z]` - Matches any letter
Predefined Character Classes:
- `\d` - Digit `[0-9]`
- `\D` - Non-digit `[^0-9]`
- `\w` - Word character `[a-zA-Z0-9_]`
- `\W` - Non-word character `[^a-zA-Z0-9_]`
- `\s` - Whitespace `[ \t\n\r\f\v]`
- `\S` - Non-whitespace `[^ \t\n\r\f\v]`
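The classes above can be tried out with a few lines of Python. The sample string is invented. In Python 3, `\w` and `\d` also match non-ASCII letters and digits in `str` patterns unless `re.ASCII` is set, which does not matter for this ASCII-only input.

```python
import re

sample = "Order A7 shipped to 42 Elm_St, unit b-3."

# A set matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# A negated class matches everything NOT listed; here it strips non-digits.
digits_only = re.sub(r"[^0-9]", "", sample)

# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", sample)

print(vowels, digits_only, words)
```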
Quantifiers
Quantifiers specify how many times a pattern should match:
Basic Quantifiers:
- `*` - 0 or more (greedy)
- `+` - 1 or more (greedy)
- `?` - 0 or 1 (optional)
- `{n}` - Exactly n times
- `{n,}` - n or more times
- `{n,m}` - Between n and m times
Greedy vs Lazy:
- Greedy (default): Match as much as possible
- Lazy (add `?`): Match as little as possible
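The difference between greedy and lazy matching is easiest to see on markup-like text. The following Python sketch uses a made-up HTML fragment and also shows a bounded `{n,m}` quantifier.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the LAST ">" it can reach.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the FIRST ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)

# {2,3} accepts only runs of two or three digits between word boundaries.
two_or_three = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

print(greedy, lazy, two_or_three)
```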
Anchors
Anchors match positions, not characters:
- `^` - Start of string (or line in multiline mode)
- `$` - End of string (or line in multiline mode)
- `\b` - Word boundary
- `\B` - Non-word boundary
Word Boundaries: `\b` matches the position between a word character (`\w`) and a non-word character, or between a word character and the start or end of the string:
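A small Python sketch of `\b` versus `\B`, with an invented sample string:

```python
import re

text = "cat concatenate cat's bobcat"

# \bcat\b matches "cat" only as a whole word ("cat" and the one in "cat's").
whole_words = re.findall(r"\bcat\b", text)

# \Bcat\B matches "cat" only when it is buried inside a longer word.
inside_words = re.findall(r"\Bcat\B", text)

print(whole_words, inside_words)
```

Only `concatenate` satisfies `\Bcat\B`: in `bobcat` the final `t` sits at the end of the string, which counts as a word boundary.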
Groups and Capturing
Parentheses create groups for capturing or grouping:
Capturing Groups: `(pattern)` - Captures matched text
Non-Capturing Groups: `(?:pattern)` - Groups without capturing
Named Groups: `(?<name>pattern)` - Captures matched text under a name (Python writes this as `(?P<name>pattern)`)
Reference previously captured groups:
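The group types and a backreference can be sketched in Python as follows. The sample strings are illustrative, and the named-group syntax shown is Python's `(?P<name>...)` form.

```python
import re

# Capturing groups: each pair of parentheses is numbered left to right.
date = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
year, month, day = date.groups()

# Non-capturing group: groups "ab" for the + quantifier without capturing it.
repeats = re.findall(r"(?:ab)+", "ababab ab")

# Named groups are read back by name instead of by number.
named = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-03")
named_year = named.group("year")

# Backreference: \1 must repeat exactly what group 1 captured.
doubled_words = re.findall(r"\b(\w+) \1\b", "this is is a test test case")

print(year, month, day, repeats, named_year, doubled_words)
```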
Common Regex Patterns
Frequently used regex patterns for validation and matching.
Email Validation
Email validation patterns from simple to comprehensive:
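As a rough Python sketch, the deliberately simple pattern below checks only for "something, an @, something, a dot, something". The pattern and the name `looks_like_email` are illustrative assumptions, not a complete validator.

```python
import re

# Assumed simple pattern: no whitespace or extra "@", and a dot in the domain.
SIMPLE_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value: str) -> bool:
    """Return True if value has the basic shape local@domain.tld."""
    return SIMPLE_EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))
```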
Simple Email Pattern:
Standard Email Pattern:
Comprehensive Email Pattern:
Note: Perfect email validation is complex. For production, consider using dedicated libraries or simply checking for @ and basic format.
Phone Number Validation
Phone number patterns for various formats:
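One possible flexible pattern, sketched in Python. It assumes ten-digit North American style numbers with an optional `+1` prefix; other regions would need different rules, and `PHONE_RE` and `is_phone` are hypothetical names.

```python
import re

# Optional "+1" or "1" prefix, optional parentheses around the area code,
# and "-", "." or space as optional separators.
PHONE_RE = re.compile(r"(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")

def is_phone(value: str) -> bool:
    return PHONE_RE.fullmatch(value) is not None

print(is_phone("(555) 123-4567"))
```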
Flexible Pattern:
URL Validation
URL matching patterns:
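A minimal Python sketch: a loose pattern finds URL-like substrings, and the standard-library `urllib.parse.urlparse` then splits each candidate into parts. The pattern is an illustrative assumption and will accept many strings that are not valid URLs.

```python
import re
from urllib.parse import urlparse

# Loose candidate finder: scheme, "://", then anything up to whitespace or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

text = "See https://example.com/guide?x=1 and http://test.org for details"
urls = URL_RE.findall(text)

# Let a real parser take the candidate apart.
first = urlparse(urls[0])

print(urls, first.netloc, first.path, first.query)
```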
Note: URLs are complex. For production, use URL parsing libraries.
Password Strength
Password validation patterns:
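A common way to combine several requirements is one lookahead per rule. The Python sketch below assumes a policy of at least 8 characters with a lowercase letter, an uppercase letter, a digit, and a symbol; the policy and the name `is_strong` are assumptions for illustration.

```python
import re

PASSWORD_RE = re.compile(
    r"(?=.*[a-z])"     # at least one lowercase letter
    r"(?=.*[A-Z])"     # at least one uppercase letter
    r"(?=.*\d)"        # at least one digit
    r"(?=.*[^\w\s])"   # at least one symbol (not a word character or whitespace)
    r".{8,}"           # at least 8 characters overall
)

def is_strong(password: str) -> bool:
    return PASSWORD_RE.fullmatch(password) is not None

print(is_strong("Passw0rd!"))
```

Each lookahead checks its rule from the start of the string without consuming anything, so the rules can be satisfied in any order.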
Combined Password Requirements:
Date Formats
Common date format patterns:
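A regex can check the shape of a date but not the calendar. This Python sketch assumes the `YYYY-MM-DD` format and hands the final check to `datetime.strptime`; `is_valid_iso_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Shape check only: four-digit year, month 01-12, day 01-31.
ISO_DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def is_valid_iso_date(value: str) -> bool:
    if not ISO_DATE_RE.fullmatch(value):
        return False
    try:
        # Rejects impossible dates such as February 30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(is_valid_iso_date("2024-02-29"))
```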
Number Formats
Numeric validation patterns:
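Three illustrative numeric patterns, sketched in Python. The exact rules (optional sign, required comma grouping) are assumptions chosen for the example.

```python
import re

INTEGER_RE = re.compile(r"[+-]?\d+")                          # -42, 7, +100
DECIMAL_RE = re.compile(r"[+-]?(?:\d+\.\d+|\.\d+|\d+)")       # 3.14, .5, 10
THOUSANDS_RE = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")    # 1,234,567.89

def matches(pattern: re.Pattern, value: str) -> bool:
    """True only if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None

print(matches(THOUSANDS_RE, "1,234,567.89"))
```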
Username Validation
Username patterns:
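A sketch of one plausible username rule in Python: 3 to 16 characters, starting with a letter, then letters, digits, underscores, or hyphens. The rule itself is an assumption; sites differ.

```python
import re

# First character must be a letter; 2 to 15 more allowed characters follow.
USERNAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,15}")

def is_username(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None

print(is_username("alice_01"))
```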
Credit Card Numbers
Credit card validation patterns:
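A minimal Python sketch that pairs a format check with the Luhn checksum. The 13 to 19 digit range and the helper names are assumptions; the sample number is a widely used test value, not a real card.

```python
import re

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def looks_like_card(raw: str) -> bool:
    digits = re.sub(r"[ -]", "", raw)           # drop spaces and hyphens
    if not re.fullmatch(r"\d{13,19}", digits):  # format check only
        return False
    return luhn_ok(digits)

print(looks_like_card("4111 1111 1111 1111"))
```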
Note: Always use additional validation (Luhn algorithm) for credit cards.
Using Regex in Programming Languages
Regex implementation across different languages.
JavaScript
JavaScript regex methods and syntax:
Creating Regex:
Testing and Matching:
Search and Replace:
Split:
Flags:
- `g` - Global (find all matches)
- `i` - Case-insensitive
- `m` - Multiline
- `s` - Dotall (. matches newline)
- `u` - Unicode
- `y` - Sticky
Python
Python re module:
Common Functions:
Compiled Patterns:
Flags:
- `re.IGNORECASE` or `re.I` - Case-insensitive
- `re.MULTILINE` or `re.M` - Multiline mode
- `re.DOTALL` or `re.S` - Dot matches all
- `re.VERBOSE` or `re.X` - Allow comments
- `re.ASCII` or `re.A` - ASCII-only matching
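A compact Python sketch covering the common `re` functions, a compiled pattern, and two flags combined. The sample text and variable names are made up for illustration.

```python
import re

text = "Alice: 30\nBob: 25"

# Common functions
first_word = re.match(r"\w+", text).group()      # match only at the start
first_number = re.search(r"\d+", text).group()   # first match anywhere
all_numbers = re.findall(r"\d+", text)           # every match, as strings
masked = re.sub(r"\d+", "#", text)               # search and replace
parts = re.split(r"[:\n]\s*", text)              # split on a pattern

# Compiled pattern with flags: VERBOSE allows comments and layout,
# MULTILINE makes ^ and $ match at each line.
line_re = re.compile(
    r"""
    ^(?P<name>\w+)   # name at the start of a line
    :\s*             # separator
    (?P<age>\d+)$    # age at the end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)
pairs = [(m["name"], int(m["age"])) for m in line_re.finditer(text)]

print(first_word, first_number, all_numbers, parts, pairs)
```

Compiling is mainly worthwhile when the same pattern is reused many times or benefits from a readable multi-line layout.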
PHP
PHP PCRE functions:
Java
Java Pattern and Matcher:
Other Languages
Advanced Regex Techniques
Advanced patterns and techniques for complex matching.
Lookahead and Lookbehind
Zero-width assertions that don't consume characters:
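"Zero-width" means the asserted text is checked but never becomes part of the match. A Python sketch with an invented price list shows this for lookahead and lookbehind in both positive and negative form:

```python
import re

prices = "apple $5, pear 7 EUR, fig $12"

# Positive lookbehind: digits preceded by "$"; the "$" is not in the result.
dollars = re.findall(r"(?<=\$)\d+", prices)

# Positive lookahead: digits followed by " EUR"; " EUR" is not in the result.
euros = re.findall(r"\d+(?= EUR)", prices)

# Negative lookahead: whole numbers NOT followed by " EUR".
not_euros = re.findall(r"\b\d+\b(?! EUR)", prices)

# Negative lookbehind: digits NOT preceded by "$" or by another digit.
not_dollars = re.findall(r"(?<![$\d])\d+", prices)

print(dollars, euros, not_euros, not_dollars)
```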
Positive Lookahead: `(?=pattern)` - Asserts that what follows matches pattern
Negative Lookahead: `(?!pattern)` - Asserts that what follows doesn't match pattern
Positive Lookbehind: `(?<=pattern)` - Asserts that what precedes matches pattern
Negative Lookbehind: `(?<!pattern)` - Asserts that what precedes doesn't match pattern
Practical Example - Password Validation:
Conditional Patterns
Match based on conditions:
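Python's `re` module supports conditionals of the form `(?(id)yes|no)`, which test whether an earlier group took part in the match. In this sketch the closing parenthesis is required only when an opening one was matched; the three-digit code is an arbitrary example.

```python
import re

# Group 1 optionally captures "("; (?(1)\)) demands ")" only if group 1 matched.
AREA_CODE_RE = re.compile(r"(\()?\d{3}(?(1)\))")

def is_area_code(value: str) -> bool:
    return AREA_CODE_RE.fullmatch(value) is not None

print(is_area_code("(555)"), is_area_code("555"), is_area_code("(555"))
```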
Recursive Patterns
Some regex flavors support recursion (PCRE, Perl):
Note: Not supported in JavaScript.
Performance Optimization
Practical Applications
Real-world regex use cases.
Form Validation
Complete form validation example:
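One way to structure form validation is a table of field names and compiled patterns. The Python sketch below uses hypothetical fields (`username`, `email`, `zip_code`) and deliberately simple rules; a real form would define its own.

```python
import re

# Hypothetical field rules, one compiled pattern per field.
FIELD_RULES = {
    "username": re.compile(r"[A-Za-z][A-Za-z0-9_]{2,15}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip_code": re.compile(r"\d{5}(?:-\d{4})?"),
}

def validate_form(form: dict) -> dict:
    """Return a dict of field -> error message; empty means the form passed."""
    errors = {}
    for field, pattern in FIELD_RULES.items():
        value = form.get(field, "").strip()
        if not pattern.fullmatch(value):
            errors[field] = f"invalid {field}"
    return errors

good = {"username": "alice_01", "email": "alice@example.com", "zip_code": "12345-6789"}
bad = {"username": "1bad", "email": "nope", "zip_code": "1234"}
print(validate_form(good), validate_form(bad))
```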
Data Extraction
Extract data from text:
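A Python sketch of extraction with capture groups, run on an invented invoice sentence. The field names and patterns are illustrative assumptions.

```python
import re

text = "Invoice #1042 dated 2024-03-15 for $1,250.00; contact billing@example.com"

def first_match(pattern: str, source: str, group: int = 0):
    """Return the requested group of the first match, or None."""
    found = re.search(pattern, source)
    return found.group(group) if found else None

fields = {
    "invoice": first_match(r"#(\d+)", text, 1),
    "date": first_match(r"\d{4}-\d{2}-\d{2}", text),
    "amount": first_match(r"\$([\d,]+\.\d{2})", text, 1),
    "email": first_match(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text),
}

print(fields)
```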
Log File Parsing
Parse log files:
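Named groups make log parsing readable. This Python sketch assumes a web server access line in the common log format; the sample line is invented, and other log formats would need their own pattern.

```python
import re

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    found = LOG_RE.match(line)
    return found.groupdict() if found else None

sample = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_line(sample)
print(entry)
```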
Syntax Highlighting
Basic syntax highlighting:
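A very small highlighter can be sketched as one alternation of named groups, wrapping each match in a tag named after its group. The token set (a few Python-like keywords, quoted strings, numbers, `#` comments) and the tag format are assumptions; a real highlighter needs a proper tokenizer.

```python
import re

TOKEN_RE = re.compile(
    r"""
    (?P<comment>\#.*)
  | (?P<string>"[^"\n]*"|'[^'\n]*')
  | (?P<keyword>\b(?:def|return|if|else|for|while|import)\b)
  | (?P<number>\b\d+(?:\.\d+)?\b)
    """,
    re.VERBOSE,
)

def highlight(code: str) -> str:
    def wrap(match: re.Match) -> str:
        kind = match.lastgroup  # name of the alternative that matched
        return f"<{kind}>{match.group()}</{kind}>"
    return TOKEN_RE.sub(wrap, code)

print(highlight("return 42  # done"))
```

Because the scan runs left to right, a keyword inside a quoted string is consumed as part of the string and is not tagged separately.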
Input Sanitization
Clean user input:
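A Python sketch of basic cleanup: strip control characters, drop tag-like text, and collapse whitespace. The steps and the name `sanitize` are illustrative assumptions, and regex tag stripping is not a substitute for proper output escaping.

```python
import re

def sanitize(raw: str) -> str:
    # Remove ASCII control characters except tab, newline, and carriage return.
    no_control = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", raw)
    # Remove anything that looks like an HTML tag.
    no_tags = re.sub(r"<[^>]*>", "", no_control)
    # Collapse runs of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", no_tags).strip()

print(sanitize("  Hello\x00 <b>world</b>\n\n!  "))
```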