The Parsing Engine: Tokenization, Validation, and Extraction
Once command-line tokens are passed to a command's execution closure, they go through a three-step pipeline: Tokenization, Constraint Validation, and Parameter Extraction.
This page covers the algorithms, state transitions, and resolution priorities that make up the clish parsing engine.
The Input Schema (ArgSchema)
To parse raw command-line strings correctly, the parser needs a lookup table that describes what options and flags are valid. The macro generates this as an ArgSchema struct:
```rust
pub struct ArgSchema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named_chars: &'static [char],
    pub short_named_targets: &'static [&'static str],
    pub short_flag_chars: &'static [char],
    pub short_flag_targets: &'static [&'static str],
}
```
* named / flags: Long names (e.g., --port or --force).
* short_named_chars / short_flag_chars: Short aliases (e.g., 'p' or 'f').
* short_named_targets / short_flag_targets: The long names that the short aliases expand to, paired one-to-one by index with the corresponding short character slice.
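To make the index pairing concrete, here is a minimal sketch that fills in an ArgSchema for a hypothetical command with --port/-p (takes a value), --force/-f, and a --verbose flag with no short alias. The resolve_short helper and ShortKind enum are hypothetical, and the sketch assumes names are stored without their leading dashes. The macro's actual output may differ.

```rust
// Local copy of the ArgSchema layout shown above, so this sketch compiles on its own.
pub struct ArgSchema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named_chars: &'static [char],
    pub short_named_targets: &'static [&'static str],
    pub short_flag_chars: &'static [char],
    pub short_flag_targets: &'static [&'static str],
}

// Hypothetical schema. Assumption: long names are stored without "--".
pub static EXAMPLE_SCHEMA: ArgSchema = ArgSchema {
    named: &["port"],
    flags: &["force", "verbose"],
    short_named_chars: &['p'],         // 'p' is at index 0 ...
    short_named_targets: &["port"],    // ... so it expands to "port", also at index 0
    short_flag_chars: &['f'],
    short_flag_targets: &["force"],    // "verbose" has no short alias
};

// Hypothetical classification of a short character.
#[derive(Debug, PartialEq)]
pub enum ShortKind {
    Named(&'static str),
    Flag(&'static str),
}

// Find the character's position in `chars` and return the target at the same index.
fn short_target(chars: &[char], targets: &[&'static str], c: char) -> Option<&'static str> {
    chars.iter().position(|&x| x == c).map(|i| targets[i])
}

// Hypothetical helper: map a short alias back to its long name.
pub fn resolve_short(schema: &ArgSchema, c: char) -> Option<ShortKind> {
    if let Some(name) = short_target(schema.short_named_chars, schema.short_named_targets, c) {
        return Some(ShortKind::Named(name));
    }
    short_target(schema.short_flag_chars, schema.short_flag_targets, c).map(ShortKind::Flag)
}
```

With this schema, 'p' resolves to the named option "port", 'f' to the flag "force", and any other character to None.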
Step 1: Tokenization (ParsedArgs::parse)
The tokenization loop is a state-based parser that scans the list of argument strings (&[String]) from left to right. It separates tokens into three disjoint buckets:
* positional: A sequence of standard argument values.
* named: A map from long option names to a list of string values (HashMap<String, Vec<String>>).
* flags: A set of present flag names (HashSet<String>).
The flowcharts below show how the tokenization loop processes each argument: first the main loop, then the long-option branch, then the short-option branch.
```mermaid
flowchart TD
    Start([Get Next Arg]) --> CheckAfterDD{after_double_dash ?}
    CheckAfterDD -->|Yes| Pos[Push to positional]
    CheckAfterDD -->|No| CheckDash{Starts with - ?}
    CheckDash -->|No| Pos
    CheckDash -->|Yes| CheckDoubleDash{Arg == '--' ?}
    CheckDoubleDash -->|Yes| ModePos[Set 'after_double_dash = true']
    CheckDoubleDash -->|No| CheckLong{Starts with '--' ?}
    CheckLong -->|Yes| ParseLong[Parse Long Option/Flag]
    CheckLong -->|No| ParseShort[Parse Short Option/Flag]
    Pos --> Next([Advance Index])
    ModePos --> Next
    ParseLong --> Next
    ParseShort --> Next
    Next --> CheckRemaining{More args left?}
    CheckRemaining -->|Yes| Start
    CheckRemaining -->|No| End([Return ParsedArgs])
```
```mermaid
flowchart TD
    ParseLong[Parse Long Option/Flag] --> CheckEq{Contains '=' ?}
    CheckEq -->|Yes| SplitEq[Split to name/value, push to named]
    CheckEq -->|No| CheckNamedLong{Is Named Option?}
    CheckNamedLong -->|Yes| FetchVal[Take next arg as value]
    CheckNamedLong -->|No| CheckFlagLong{Is Flag?}
    CheckFlagLong -->|Yes| InsertFlag[Insert to flags]
    CheckFlagLong -->|No| ErrLong[Error: Unknown Option]
    ErrLong --> EndErr([Return ErrorKind])
```
```mermaid
flowchart TD
    ParseShort[Parse Short Option/Flag] --> BundleCheck{Len == 2?}
    BundleCheck -->|Yes| SingleShort[Process single short char]
    BundleCheck -->|No| BundleLoop[Loop characters: flag bundling]
    SingleShort --> IsShortNamed{Is Named Option?}
    IsShortNamed -->|Yes| FetchValShort[Take next arg as value]
    IsShortNamed -->|No| IsShortFlag{Is Flag?}
    IsShortFlag -->|Yes| InsertFlagShort[Insert to flags]
    IsShortFlag -->|No| ErrShort[Error: Unknown Option]
    BundleLoop --> ForEach[For each char]
    ForEach --> IsCharFlag{Is Flag?}
    IsCharFlag -->|Yes| InsertFlagBundle[Insert to flags]
    IsCharFlag -->|No| IsCharNamed{Is Named Option?}
    IsCharNamed -->|Yes| ErrBundle[Error: Missing Value for option]
    IsCharNamed -->|No| ErrBundleUnk[Error: Unknown Option]
    ErrShort --> EndErr([Return ErrorKind])
    ErrBundle --> EndErr
    ErrBundleUnk --> EndErr
```
Key Tokenization Rules
- The -- (double dash) separator: If the parser encounters a token that is exactly --, it sets an internal state variable, after_double_dash = true. For every subsequent argument on the command line, the parser bypasses all option/flag checks and pushes the argument directly into the positional list.
- Flag bundling: If a short argument contains multiple characters (like -abc), the parser loops over each character:
  - If the character is registered as a short flag (e.g. 'a' for --all), it adds the flag to the flags set.
  - If the character is registered as a short named option, which requires a value (e.g. 'p' for --port), it returns a MissingValue error, because a named option cannot take its value from inside a character bundle. Pass the value as a separate argument instead (e.g. -p 8080).
  - Otherwise, it returns an UnknownOption error.
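The flowcharts and rules above can be sketched as a single loop. This is a simplified, hypothetical reimplementation, not clish's actual ParsedArgs::parse: ErrorKind is reduced to two variants, names are assumed to be stored without dashes, and reporting a missing trailing value (e.g. a final --port with nothing after it) as MissingValue is an assumption the flowcharts do not cover.

```rust
use std::collections::{HashMap, HashSet};

// Repeats the ArgSchema layout shown earlier so this sketch compiles on its own.
pub struct ArgSchema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named_chars: &'static [char],
    pub short_named_targets: &'static [&'static str],
    pub short_flag_chars: &'static [char],
    pub short_flag_targets: &'static [&'static str],
}

// Hypothetical, reduced error type (clish's ErrorKind has more variants).
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    UnknownOption(String),
    MissingValue(String),
}

// The three disjoint buckets described in Step 1.
#[derive(Debug, Default)]
pub struct ParsedArgs {
    pub positional: Vec<String>,
    pub named: HashMap<String, Vec<String>>,
    pub flags: HashSet<String>,
}

// Map a short char to its long name via the parallel chars/targets slices.
fn short_target(chars: &[char], targets: &[&'static str], c: char) -> Option<&'static str> {
    chars.iter().position(|&x| x == c).map(|i| targets[i])
}

fn has(list: &[&str], name: &str) -> bool {
    list.iter().any(|n| *n == name)
}

impl ParsedArgs {
    pub fn parse(args: &[String], schema: &ArgSchema) -> Result<ParsedArgs, ErrorKind> {
        let mut out = ParsedArgs::default();
        let mut after_double_dash = false;
        let mut i = 0;
        while i < args.len() {
            let arg = &args[i];
            if after_double_dash || !arg.starts_with('-') {
                // Everything after "--", and anything not starting with '-', is positional.
                out.positional.push(arg.clone());
            } else if arg == "--" {
                after_double_dash = true;
            } else if let Some(body) = arg.strip_prefix("--") {
                if let Some((name, value)) = body.split_once('=') {
                    // --name=value: split and store (no schema check, as in the flowchart).
                    out.named.entry(name.to_string()).or_default().push(value.to_string());
                } else if has(schema.named, body) {
                    // --name value: consume the next argument as the value.
                    i += 1;
                    let value = args.get(i).ok_or_else(|| ErrorKind::MissingValue(body.to_string()))?;
                    out.named.entry(body.to_string()).or_default().push(value.clone());
                } else if has(schema.flags, body) {
                    out.flags.insert(body.to_string());
                } else {
                    return Err(ErrorKind::UnknownOption(arg.clone()));
                }
            } else {
                let chars: Vec<char> = arg.chars().skip(1).collect();
                if chars.len() == 1 {
                    // Single short option, e.g. -p 8080 or -f.
                    let c = chars[0];
                    if let Some(name) = short_target(schema.short_named_chars, schema.short_named_targets, c) {
                        i += 1;
                        let value = args.get(i).ok_or_else(|| ErrorKind::MissingValue(name.to_string()))?;
                        out.named.entry(name.to_string()).or_default().push(value.clone());
                    } else if let Some(name) = short_target(schema.short_flag_chars, schema.short_flag_targets, c) {
                        out.flags.insert(name.to_string());
                    } else {
                        return Err(ErrorKind::UnknownOption(arg.clone()));
                    }
                } else {
                    // Bundle such as -abc: flags are accepted, named options are an error.
                    for c in chars {
                        if let Some(name) = short_target(schema.short_flag_chars, schema.short_flag_targets, c) {
                            out.flags.insert(name.to_string());
                        } else if let Some(name) = short_target(schema.short_named_chars, schema.short_named_targets, c) {
                            return Err(ErrorKind::MissingValue(name.to_string()));
                        } else {
                            return Err(ErrorKind::UnknownOption(format!("-{c}")));
                        }
                    }
                }
            }
            i += 1;
        }
        Ok(out)
    }
}
```

With a schema that registers --port/-p as named and --force/-f and --all/-a as flags, the input -p 8080 -fa file.txt -- --force yields port = ["8080"], flags {force, all}, and positional ["file.txt", "--force"]. The final --force is positional because it follows --.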
Step 2: Constraint Validation (validate_params)
Once tokenization successfully completes, the parser validates relationships between named parameters and flags by checking the static ParamEntry definitions.
* conflicts_with (Mutual Exclusion): If parameter A has conflicts_with = ["B"], the validator checks whether both A and B are present in the parsed maps. If they are, it returns ErrorKind::Conflict { name: "A", other: "B" }.
* requires (Prerequisites): If parameter A has requires = ["B"], the validator checks whether A is present. If it is, B must also be present in the maps. If B is missing, it returns ErrorKind::Requires { name: "A", requires: "B" }.
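A minimal sketch of this check follows. It assumes a trimmed-down, hypothetical ParamEntry with only name, conflicts_with, and requires fields (the real struct carries more), simplified ErrorKind variants, and that a parameter counts as present if it appears in either the named map or the flags set.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, trimmed-down ParamEntry: only the fields this check reads.
pub struct ParamEntry {
    pub name: &'static str,
    pub conflicts_with: &'static [&'static str],
    pub requires: &'static [&'static str],
}

// Simplified stand-ins for the two validation errors described above.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Conflict { name: &'static str, other: &'static str },
    Requires { name: &'static str, requires: &'static str },
}

// Assumption: a parameter is "present" if it is in either bucket.
fn is_present(name: &str, named: &HashMap<String, Vec<String>>, flags: &HashSet<String>) -> bool {
    named.contains_key(name) || flags.contains(name)
}

pub fn validate_params(
    entries: &[ParamEntry],
    named: &HashMap<String, Vec<String>>,
    flags: &HashSet<String>,
) -> Result<(), ErrorKind> {
    for entry in entries {
        // Constraints only apply to parameters that were actually supplied.
        if !is_present(entry.name, named, flags) {
            continue;
        }
        for &other in entry.conflicts_with {
            if is_present(other, named, flags) {
                return Err(ErrorKind::Conflict { name: entry.name, other });
            }
        }
        for &req in entry.requires {
            if !is_present(req, named, flags) {
                return Err(ErrorKind::Requires { name: entry.name, requires: req });
            }
        }
    }
    Ok(())
}
```

This sketch returns the first violation it finds, in declaration order. Whether clish stops at the first error or collects all of them is not specified here.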
Step 3: Parameter Extraction & Value Resolution
If validation passes, the generated closure extracts parameters one by one. clish resolves values according to a strict priority list:
```mermaid
graph TD
    Start[Resolve Parameter Value] --> CheckCLI{Provided on CLI?}
    CheckCLI -->|Yes| UseCLI[Use CLI Value]
    CheckCLI -->|No| CheckEnv{Has 'env' attribute & env var set?}
    CheckEnv -->|Yes| UseEnv[Use ENV Value]
    CheckEnv -->|No| CheckDefault{Has 'default' attribute?}
    CheckDefault -->|Yes| UseDefault[Use Default Value]
    CheckDefault -->|No| CheckRequired{Is parameter required?}
    CheckRequired -->|Yes| Err[Return Missing Error]
    CheckRequired -->|No| UseNone[Resolve to None / Empty Vector]
```
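The priority chain can be sketched as one function. This is a hypothetical helper, not clish's API: Fallbacks and ResolveError are illustrative names, only single-valued parameters are covered (a repeatable parameter would resolve to an empty Vec rather than None), and the environment lookup is passed in as a closure so the sketch can be tested without modifying the process environment. In real code that closure might be |k| std::env::var(k).ok().

```rust
// Hypothetical error for a required parameter with no value from any source.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    Missing(&'static str),
}

// Hypothetical per-parameter settings mirroring the env/default/required attributes.
pub struct Fallbacks {
    pub name: &'static str,
    pub env: Option<&'static str>,
    pub default: Option<&'static str>,
    pub required: bool,
}

// Resolve a single-valued parameter: CLI value, then env var, then default,
// then an error if the parameter is required, otherwise None.
pub fn resolve<F>(cli: Option<String>, fb: &Fallbacks, env_lookup: F) -> Result<Option<String>, ResolveError>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(value) = cli {
        return Ok(Some(value));
    }
    if let Some(key) = fb.env {
        if let Some(value) = env_lookup(key) {
            return Ok(Some(value));
        }
    }
    if let Some(default) = fb.default {
        return Ok(Some(default.to_string()));
    }
    if fb.required {
        Err(ResolveError::Missing(fb.name))
    } else {
        Ok(None)
    }
}
```

For a parameter with a hypothetical env = "APP_PORT" and default = "8080", a CLI value always wins, APP_PORT is used only when the CLI is silent, and "8080" is used only when both are absent.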
1. Extracting Raw Tokens
The runtime uses helper functions in the command::parse module of clish-core to extract string values by positional index or option name:
* parse_required(&parsed, index): Retrieves a positional argument from the positional vector at the given index. If index is out of bounds, it returns ErrorKind::MissingArgument.
* parse_optional(&parsed, index): Retrieves a positional argument as Option<String>. Returns None if out of bounds.
* parse_variadic(&parsed, start_index): Collects all positional arguments from start_index to the end of the vector into a Vec<String>.
* parse_named(&parsed, name): Retrieves the first value associated with name in the named options map.
* parse_named_many(&parsed, name): Retrieves all values associated with name as a Vec<String> (supporting repeatable options like --tag a --tag b).
* parse_flag(&parsed, name): Returns true if name is present in the flags set.
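The helpers can be sketched over a minimal ParsedArgs. The return types here (Result<String, _>, Option<String>, Vec<String>, bool) are inferred from the descriptions above and are assumptions, as is the unit-variant ErrorKind:

```rust
use std::collections::{HashMap, HashSet};

// Minimal stand-in for the tokenizer output.
#[derive(Debug, Default)]
pub struct ParsedArgs {
    pub positional: Vec<String>,
    pub named: HashMap<String, Vec<String>>,
    pub flags: HashSet<String>,
}

// Simplified error type (assumption: the real variant may carry more detail).
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    MissingArgument,
}

// Positional value at `index`, or an error if there are too few positionals.
pub fn parse_required(parsed: &ParsedArgs, index: usize) -> Result<String, ErrorKind> {
    parsed.positional.get(index).cloned().ok_or(ErrorKind::MissingArgument)
}

// Positional value at `index`, or None if out of bounds.
pub fn parse_optional(parsed: &ParsedArgs, index: usize) -> Option<String> {
    parsed.positional.get(index).cloned()
}

// All positionals from `start_index` to the end (empty if start is past the end).
pub fn parse_variadic(parsed: &ParsedArgs, start_index: usize) -> Vec<String> {
    parsed.positional.get(start_index..).map(|s| s.to_vec()).unwrap_or_default()
}

// First value supplied for a named option.
pub fn parse_named(parsed: &ParsedArgs, name: &str) -> Option<String> {
    parsed.named.get(name).and_then(|v| v.first()).cloned()
}

// Every value supplied for a repeatable option, e.g. --tag a --tag b.
pub fn parse_named_many(parsed: &ParsedArgs, name: &str) -> Vec<String> {
    parsed.named.get(name).cloned().unwrap_or_default()
}

// Whether a flag was present.
pub fn parse_flag(parsed: &ParsedArgs, name: &str) -> bool {
    parsed.flags.contains(name)
}
```

Using slice get(start_index..) lets parse_variadic return an empty Vec instead of panicking when fewer positionals were supplied than the start index.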
2. Resolving Fallbacks
If a parameter value is not present on the CLI, the extractor checks:
1. Environment Fallback: If an env key is configured, it calls std::env::var(env_var_name). If set, that value is used.
2. Default Value: If a default value is configured in the attribute, the parser uses the default string representation.
3. Parsing into Target Types (FromStr)
For any type other than String, clish attempts to parse the resolved string value into the target Rust type:
```rust
let value = raw_string_val
    .parse::<T>()
    .map_err(|_| ErrorKind::invalid_value(raw_string_val, std::any::type_name::<T>()))?;
```
Because this step relies on Rust's standard FromStr trait, you can use any custom type as a parameter in clish as long as your type implements FromStr. If parsing fails, it yields an InvalidValue error, which explains what token failed and what type was expected.
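As an illustration, the sketch below defines a hypothetical LogLevel enum with a FromStr implementation and a generic convert function that mirrors the snippet above. ErrorKind and its invalid_value constructor are simplified stand-ins for clish's own types:

```rust
use std::str::FromStr;

// Simplified stand-in for clish's error type.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    InvalidValue { value: String, expected: &'static str },
}

impl ErrorKind {
    pub fn invalid_value(value: &str, expected: &'static str) -> Self {
        ErrorKind::InvalidValue { value: value.to_string(), expected }
    }
}

// Hypothetical user-defined parameter type.
#[derive(Debug, PartialEq)]
pub enum LogLevel {
    Debug,
    Info,
    Warn,
}

impl FromStr for LogLevel {
    type Err = String;

    // Case-insensitive match on the level name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "debug" => Ok(LogLevel::Debug),
            "info" => Ok(LogLevel::Info),
            "warn" => Ok(LogLevel::Warn),
            other => Err(format!("unknown log level: {other}")),
        }
    }
}

// Generic conversion step: the type's own error is discarded and replaced
// with an InvalidValue that records the raw token and the target type name.
pub fn convert<T: FromStr>(raw: &str) -> Result<T, ErrorKind> {
    raw.parse::<T>()
        .map_err(|_| ErrorKind::invalid_value(raw, std::any::type_name::<T>()))
}
```

The same convert call handles built-in types such as u16 and the custom LogLevel alike, since both implement FromStr. Note that the string returned by std::any::type_name is meant for diagnostics and its exact format is not guaranteed to be stable.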
Next, let's look at the Error Pipeline to see how parsing errors are serialized, returned, and rendered in the terminal.