
The Parsing Engine: Tokenization, Validation, and Extraction

Once command-line tokens are passed to a command's execution closure, they go through a rigorous three-step pipeline: Tokenization, Constraint Validation, and Parameter Extraction.

This page covers the algorithms, state transitions, and resolution priorities that make up the clish parsing engine.


The Input Schema (ArgSchema)

To parse raw command-line strings correctly, the parser needs a lookup table that describes what options and flags are valid. The macro generates this as an ArgSchema struct:

pub struct ArgSchema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named_chars: &'static [char],
    pub short_named_targets: &'static [&'static str],
    pub short_flag_chars: &'static [char],
    pub short_flag_targets: &'static [&'static str],
}
  • named / flags: Long names (e.g., --port or --force).
  • short_named_chars / short_flag_chars: Short aliases (e.g., 'p' or 'f').
  • short_named_targets / short_flag_targets: Long names paired by index with the short characters, so short_named_targets[i] is the long name for short_named_chars[i] (and likewise for flags).
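The index pairing between the `*_chars` and `*_targets` slices can be sketched as a small lookup. The struct is copied from above; the helper `long_name_for_short`, the `ShortKind` enum, and the `EXAMPLE` schema values are hypothetical and only illustrate the layout, they are not part of clish.

```rust
pub struct ArgSchema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named_chars: &'static [char],
    pub short_named_targets: &'static [&'static str],
    pub short_flag_chars: &'static [char],
    pub short_flag_targets: &'static [&'static str],
}

/// Which kind of parameter a short alias points at (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ShortKind {
    Named(&'static str),
    Flag(&'static str),
}

/// Hypothetical helper: find the position of `c` in a `*_chars` slice and
/// read the long name stored at the same position in the `*_targets` slice.
pub fn long_name_for_short(schema: &ArgSchema, c: char) -> Option<ShortKind> {
    if let Some(i) = schema.short_named_chars.iter().position(|&x| x == c) {
        return schema.short_named_targets.get(i).copied().map(ShortKind::Named);
    }
    if let Some(i) = schema.short_flag_chars.iter().position(|&x| x == c) {
        return schema.short_flag_targets.get(i).copied().map(ShortKind::Flag);
    }
    None
}

// Made-up schema for a command with --port/-p, --force/-f, and --all/-a.
pub static EXAMPLE: ArgSchema = ArgSchema {
    named: &["port"],
    flags: &["force", "all"],
    short_named_chars: &['p'],
    short_named_targets: &["port"],
    short_flag_chars: &['f', 'a'],
    short_flag_targets: &["force", "all"],
};
```

With this layout, `long_name_for_short(&EXAMPLE, 'a')` yields `Some(ShortKind::Flag("all"))`, because 'a' sits at index 1 of `short_flag_chars` and "all" sits at index 1 of `short_flag_targets`.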

Step 1: Tokenization (ParsedArgs::parse)

The tokenization loop is a state-based parser that scans the list of argument strings (&[String]) from left to right. It separates tokens into three disjoint buckets:

  1. positional: A sequence of standard argument values.
  2. named: A map from long option names to a list of string values (HashMap<String, Vec<String>>).
  3. flags: A set of present flag names (HashSet<String>).

Here is a visual flowchart of how the tokenization loop processes each argument:

flowchart TD
    Start([Get Next Arg]) --> CheckDash{Starts with - ?}

    CheckDash -->|No| Pos[Push to positional]
    CheckDash -->|Yes| CheckDoubleDash{Arg == '--' ?}

    CheckDoubleDash -->|Yes| ModePos[Set 'after_double_dash = true']
    CheckDoubleDash -->|No| CheckLong{Starts with '--' ?}

    CheckLong -->|Yes| ParseLong[Parse Long Option/Flag]
    CheckLong -->|No| ParseShort[Parse Short Option/Flag]

    Pos --> Next([Advance Index])
    ModePos --> Next
    ParseLong --> Next
    ParseShort --> Next

    Next --> CheckRemaining{More args left?}
    CheckRemaining -->|Yes| Start
    CheckRemaining -->|No| End([Return ParsedArgs])

flowchart TD
    ParseLong[Parse Long Option/Flag] --> CheckEq{Contains '=' ?}
    CheckEq -->|Yes| SplitEq[Split to name/value, push to named]
    CheckEq -->|No| CheckNamedLong{Is Named Option?}
    CheckNamedLong -->|Yes| FetchVal[Take next arg as value]
    CheckNamedLong -->|No| CheckFlagLong{Is Flag?}
    CheckFlagLong -->|Yes| InsertFlag[Insert to flags]
    CheckFlagLong -->|No| ErrLong[Error: Unknown Option]
    ErrLong --> EndErr([Return ErrorKind])

flowchart TD
    ParseShort[Parse Short Option/Flag] --> BundleCheck{Len == 2?}
    BundleCheck -->|Yes| SingleShort[Process single short char]
    BundleCheck -->|No| BundleLoop[Loop characters: flag bundling]

    SingleShort --> IsShortNamed{Is Named Option?}
    IsShortNamed -->|Yes| FetchValShort[Take next arg as value]
    IsShortNamed -->|No| IsShortFlag{Is Flag?}
    IsShortFlag -->|Yes| InsertFlagShort[Insert to flags]
    IsShortFlag -->|No| ErrShort[Error: Unknown Option]

    BundleLoop --> ForEach[For each char]
    ForEach --> IsCharFlag{Is Flag?}
    IsCharFlag -->|Yes| InsertFlagBundle[Insert to flags]
    IsCharFlag -->|No| IsCharNamed{Is Named Option?}
    IsCharNamed -->|Yes| ErrBundle[Error: Missing Value for option]
    IsCharNamed -->|No| ErrBundleUnk[Error: Unknown Option]

    ErrShort --> EndErr([Return ErrorKind])
    ErrBundle --> EndErr
    ErrBundleUnk --> EndErr

Key Tokenization Rules

  • The -- (Double Dash) Separator: If the parser encounters a token that is exactly --, it sets an internal state variable, after_double_dash = true. For every subsequent argument on the command line, the parser skips all option/flag checks and pushes the argument directly into the positional list.
  • Flag Bundling: If a short argument contains multiple characters (like -abc), the parser loops over each character.
    • If a character is registered as a short flag (e.g. 'a' for --all), it adds the flag to the flags set.
    • If a character is registered as a short named option (which requires a value, e.g. 'p' for --port), it returns a MissingValue error. This is because a named option cannot retrieve its value from inside a character bundle.
    • Otherwise, it returns an UnknownOption error.
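The flowcharts and rules above can be condensed into one loop. This is a minimal sketch, not the actual ParsedArgs::parse: `Schema`, `Parsed`, `TokenError`, `long_for`, `take_value`, and `tokenize` are hypothetical names, and short aliases are stored as (char, long name) pairs instead of parallel slices to keep the code short. It also assumes that a named option with no following argument produces a missing-value error, which the flowchart does not spell out.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-ins for this sketch, not the real clish types.
pub struct Schema {
    pub named: &'static [&'static str],
    pub flags: &'static [&'static str],
    pub short_named: &'static [(char, &'static str)],
    pub short_flags: &'static [(char, &'static str)],
}

#[derive(Debug, Default)]
pub struct Parsed {
    pub positional: Vec<String>,
    pub named: HashMap<String, Vec<String>>,
    pub flags: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    UnknownOption(String),
    MissingValue(String),
}

// Map a short character to its long name, if it is registered in `table`.
fn long_for(table: &[(char, &'static str)], c: char) -> Option<&'static str> {
    table.iter().find(|(ch, _)| *ch == c).map(|(_, long)| *long)
}

// Consume the argument at `*i` as the value for `name`.
// Assumption: running out of arguments yields MissingValue.
fn take_value(args: &[String], i: &mut usize, name: &str) -> Result<String, TokenError> {
    let value = args.get(*i).cloned();
    *i += 1;
    value.ok_or_else(|| TokenError::MissingValue(name.to_string()))
}

pub fn tokenize(args: &[String], schema: &Schema) -> Result<Parsed, TokenError> {
    let mut out = Parsed::default();
    let mut after_double_dash = false;
    let mut i = 0;
    while i < args.len() {
        let arg = args[i].as_str();
        i += 1;
        if after_double_dash || !arg.starts_with('-') {
            out.positional.push(arg.to_string());
        } else if arg == "--" {
            after_double_dash = true;
        } else if let Some(body) = arg.strip_prefix("--") {
            if let Some((name, value)) = body.split_once('=') {
                // As in the flowchart, `--name=value` goes straight to `named`.
                out.named.entry(name.to_string()).or_default().push(value.to_string());
            } else if schema.named.iter().any(|n| *n == body) {
                let value = take_value(args, &mut i, body)?;
                out.named.entry(body.to_string()).or_default().push(value);
            } else if schema.flags.iter().any(|f| *f == body) {
                out.flags.insert(body.to_string());
            } else {
                return Err(TokenError::UnknownOption(arg.to_string()));
            }
        } else {
            let chars: Vec<char> = arg.chars().skip(1).collect();
            if chars.len() == 1 {
                // Single short option: it may take a value from the next argument.
                let c = chars[0];
                if let Some(long) = long_for(schema.short_named, c) {
                    let value = take_value(args, &mut i, long)?;
                    out.named.entry(long.to_string()).or_default().push(value);
                } else if let Some(long) = long_for(schema.short_flags, c) {
                    out.flags.insert(long.to_string());
                } else {
                    return Err(TokenError::UnknownOption(arg.to_string()));
                }
            } else {
                // Bundle such as -abc: only flags are allowed inside it.
                for c in chars {
                    if let Some(long) = long_for(schema.short_flags, c) {
                        out.flags.insert(long.to_string());
                    } else if let Some(long) = long_for(schema.short_named, c) {
                        return Err(TokenError::MissingValue(long.to_string()));
                    } else {
                        return Err(TokenError::UnknownOption(format!("-{c}")));
                    }
                }
            }
        }
    }
    Ok(out)
}
```

Under these assumptions, the arguments `-af --port=80 -p 81 in.txt -- --force` would set the flags all and force, collect 80 and 81 under port, and leave in.txt and --force as positionals, while `-ap` fails because port cannot read a value from inside a bundle.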

Step 2: Constraint Validation (validate_params)

Once tokenization successfully completes, the parser validates relationships between named parameters and flags by checking the static ParamEntry definitions.

  • conflicts_with (Mutual Exclusion): If parameter A has conflicts_with = ["B"], the validator checks if both A and B are present in the parsed maps. If they are, it returns ErrorKind::Conflict { name: "A", other: "B" }.
  • requires (Prerequisites): If parameter A has requires = ["B"], the validator checks if A is present. If it is, B must also be present in the maps. If B is missing, it returns ErrorKind::Requires { name: "A", requires: "B" }.
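Both checks reduce to set membership tests over the names that survived tokenization. The sketch below assumes a simplified `Param` struct in place of ParamEntry and a local `RelationError` enum that mirrors the two error shapes described above; `validate` and the order in which the two rules are checked are assumptions, not the actual validate_params.

```rust
use std::collections::HashSet;

// Simplified stand-in for ParamEntry, holding only the relationship data.
pub struct Param {
    pub name: &'static str,
    pub conflicts_with: &'static [&'static str],
    pub requires: &'static [&'static str],
}

// Local stand-in mirroring the Conflict and Requires error shapes.
#[derive(Debug, PartialEq)]
pub enum RelationError {
    Conflict { name: &'static str, other: &'static str },
    Requires { name: &'static str, requires: &'static str },
}

/// `present` holds every named option and flag found on the command line.
pub fn validate(params: &[Param], present: &HashSet<String>) -> Result<(), RelationError> {
    // Relationships only matter for parameters that were actually supplied.
    for p in params.iter().filter(|p| present.contains(p.name)) {
        // Mutual exclusion: fail on the first conflicting name that is also present.
        if let Some(other) = p.conflicts_with.iter().copied().find(|o| present.contains(*o)) {
            return Err(RelationError::Conflict { name: p.name, other });
        }
        // Prerequisites: fail on the first required name that is absent.
        if let Some(requires) = p.requires.iter().copied().find(|r| !present.contains(*r)) {
            return Err(RelationError::Requires { name: p.name, requires });
        }
    }
    Ok(())
}
```

Because only supplied parameters are inspected, a requires entry on an absent parameter never triggers an error.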

Step 3: Parameter Extraction & Value Resolution

If validation passes, the generated closure extracts parameters one by one. clish resolves values according to a strict priority list:

graph TD
    Start[Resolve Parameter Value] --> CheckCLI{Provided on CLI?}
    CheckCLI -->|Yes| UseCLI[Use CLI Value]
    CheckCLI -->|No| CheckEnv{Has 'env' attribute & env var set?}

    CheckEnv -->|Yes| UseEnv[Use ENV Value]
    CheckEnv -->|No| CheckDefault{Has 'default' attribute?}

    CheckDefault -->|Yes| UseDefault[Use Default Value]
    CheckDefault -->|No| CheckRequired{Is parameter required?}

    CheckRequired -->|Yes| Err[Return Missing Error]
    CheckRequired -->|No| UseNone[Resolve to None / Empty Vector]

1. Extracting Raw Tokens

The runtime uses helper functions in clish_core::command::parse to extract raw string values by positional index or by name:

  • parse_required(&parsed, index): Retrieves a positional argument from the positional vector at the given index. If index is out of bounds, returns ErrorKind::MissingArgument.
  • parse_optional(&parsed, index): Retrieves a positional argument as Option<String>. Returns None if index is out of bounds.
  • parse_variadic(&parsed, start_index): Collects all positional arguments from start_index to the end of the vector into a Vec<String>.
  • parse_named(&parsed, name): Retrieves the first value associated with name in the named options map.
  • parse_named_many(&parsed, name): Retrieves all values associated with name as a Vec<String> (supporting repeatable options like --tag a --tag b).
  • parse_flag(&parsed, name): Returns true if name is present in the flags set.
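To make the behavior of these helpers concrete, here is a rough sketch over a local stand-in for the parsed buckets. The function names come from the list above, but the bodies, the exact signatures, the owned `String` return types, and the payload of `MissingArgument` are assumptions rather than the library's real definitions.

```rust
use std::collections::{HashMap, HashSet};

// Local stand-ins for this sketch, not the real clish types.
#[derive(Default)]
pub struct ParsedArgs {
    pub positional: Vec<String>,
    pub named: HashMap<String, Vec<String>>,
    pub flags: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    MissingArgument(usize), // assumed payload: the missing positional index
}

pub fn parse_required(p: &ParsedArgs, index: usize) -> Result<String, ErrorKind> {
    p.positional.get(index).cloned().ok_or(ErrorKind::MissingArgument(index))
}

pub fn parse_optional(p: &ParsedArgs, index: usize) -> Option<String> {
    p.positional.get(index).cloned()
}

pub fn parse_variadic(p: &ParsedArgs, start_index: usize) -> Vec<String> {
    // `skip` past the end simply yields an empty vector.
    p.positional.iter().skip(start_index).cloned().collect()
}

pub fn parse_named(p: &ParsedArgs, name: &str) -> Option<String> {
    // Only the first occurrence is returned for a single-valued option.
    p.named.get(name).and_then(|values| values.first()).cloned()
}

pub fn parse_named_many(p: &ParsedArgs, name: &str) -> Vec<String> {
    p.named.get(name).cloned().unwrap_or_default()
}

pub fn parse_flag(p: &ParsedArgs, name: &str) -> bool {
    p.flags.contains(name)
}
```

In this sketch only parse_required can fail; every other helper maps absence to None, an empty vector, or false, which leaves the decision about whether absence is an error to the resolution step.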

2. Resolving Fallbacks

If a parameter value is not present on the CLI, the extractor checks the following, in order:

  1. Environment Fallback: If an env key is configured, it calls std::env::var(env_var_name). If the variable is set, that value is used.
  2. Default Value: If a default value is configured in the attribute, the parser uses the default's string representation.
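The full priority chain (CLI, then environment, then default, then the required check) fits in a single function. This is a sketch under stated assumptions: `Fallbacks`, `Missing`, `resolve`, and `process_env` are hypothetical names, and the environment lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
// Hypothetical per-parameter configuration for this sketch.
pub struct Fallbacks {
    pub name: &'static str,
    pub env: Option<&'static str>,
    pub default: Option<&'static str>,
    pub required: bool,
}

// Hypothetical error carrying the name of the unresolved parameter.
#[derive(Debug, PartialEq)]
pub struct Missing(pub &'static str);

/// Resolve one parameter: CLI value, then env var, then default,
/// then either an error (required) or None (optional).
pub fn resolve(
    cli: Option<String>,
    spec: &Fallbacks,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Result<Option<String>, Missing> {
    if let Some(value) = cli {
        return Ok(Some(value));
    }
    if let Some(value) = spec.env.and_then(|key| env_lookup(key)) {
        return Ok(Some(value));
    }
    if let Some(default) = spec.default {
        return Ok(Some(default.to_string()));
    }
    if spec.required {
        Err(Missing(spec.name))
    } else {
        Ok(None)
    }
}

/// Lookup backed by the process environment. An unset variable, or one
/// that is not valid Unicode, is treated as absent.
pub fn process_env(key: &str) -> Option<String> {
    std::env::var(key).ok()
}
```

A caller would pass `process_env` as the lookup in normal operation, for example `resolve(cli_value, &spec, process_env)`, and substitute a closure over a fixed map in tests.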

3. Parsing into Target Types (FromStr)

For any type other than String, clish attempts to parse the resolved string value into the target Rust type:

let value = raw_string_val
    .parse::<T>()
    .map_err(|_| ErrorKind::invalid_value(raw_string_val, std::any::type_name::<T>()))?;

Because this step relies on Rust's standard FromStr trait, you can use any custom type as a clish parameter as long as it implements FromStr. If parsing fails, the result is an InvalidValue error that reports the token that failed and the type that was expected.
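As an illustration of the FromStr requirement, the sketch below defines a made-up `LogLevel` parameter type and a generic `convert` function shaped like the snippet above. `LogLevel`, `ParseLogLevelError`, `convert`, and the local `InvalidValue` struct are all hypothetical; only `FromStr`, `str::parse`, and `std::any::type_name` are standard library items.

```rust
use std::str::FromStr;

// Made-up custom parameter type.
#[derive(Debug, PartialEq)]
pub enum LogLevel {
    Quiet,
    Normal,
    Verbose,
}

#[derive(Debug, PartialEq)]
pub struct ParseLogLevelError(pub String);

impl FromStr for LogLevel {
    type Err = ParseLogLevelError;

    // Accepts the three level names, ignoring ASCII case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "quiet" => Ok(LogLevel::Quiet),
            "normal" => Ok(LogLevel::Normal),
            "verbose" => Ok(LogLevel::Verbose),
            _ => Err(ParseLogLevelError(s.to_string())),
        }
    }
}

// Local stand-in for the error described above: the failing token plus
// the name of the type that was expected.
#[derive(Debug, PartialEq)]
pub struct InvalidValue {
    pub token: String,
    pub expected: &'static str,
}

/// Parse a resolved string into any `FromStr` type, discarding the
/// type-specific error in favor of a uniform `InvalidValue`.
pub fn convert<T: FromStr>(raw: &str) -> Result<T, InvalidValue> {
    raw.parse::<T>().map_err(|_| InvalidValue {
        token: raw.to_string(),
        expected: std::any::type_name::<T>(),
    })
}
```

The same `convert` call works for built-in types such as u16 and for `LogLevel`, which is the property that lets a custom type be used as a parameter without any parser-specific glue.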


Next, let's look at the Error Pipeline to see how parsing errors are serialized, returned, and rendered in the terminal.