Paper status: completed

Ambiguity, Nondeterminism and State Complexity of Finite Automata

Ambiguity Analysis of Finite Automata (1)Nondeterminism Measures in Finite Automata (1)State Complexity of Finite Automata (1)Comparisons of Nondeterministic Finite Automata (1)

Original Link

Price: 0.100000

2 readers

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper surveys measures of ambiguity and nondeterminism in finite automata, focusing on their impact on state complexity, and reveals how increased ambiguity or nondeterminism can reduce the number of states needed in NFAs.

Abstract

Acta Cybernetica 23 (2017) 141–157. Ambiguity, Nondeterminism and State Complexity of Finite Automata Yo-Sub Han a , Arto Salomaa b , and Kai Salomaa c Abstract The degree of ambiguity counts the number of accepting computations of a nondeterministic finite automaton (NFA) on a given input. Alternatively, the nondeterminism of an NFA can be measured by counting the amount of guessing in a single computation or the number of leaves of the computation tree on a given input. This paper surveys work on the degree of ambiguity and on various nondeterminism measures for finite automata. In particu- lar, we focus on state complexity comparisons between NFAs with quantified ambiguity or nondeterminism. Keywords: finite automata, nondeterminism, degree of ambiguity, state complexity Dedicated to the memory of Zolt´ an ´ Esik (1951–2016). 1 Introduction Finite automata are a fundamental model of computation that has been system- atically studied since the 1950’s. At the same time many important questions on finite automata and regular languages remain open [7, 18, 52]. The last decades have seen much work on the descriptional complexity, or state complexity, of r

Mind Map

In-depth Reading

English Analysis~23 min read · 30,169 chars

1. Bibliographic Information

1.1. Title

Ambiguity, Nondeterminism and State Complexity of Finite Automata

1.2. Authors

Yo-Sub Han: Affiliated with the Basic Science Research Program through NRF funded by MEST (2015R1D1A1A01060097) and the Yonsei University Future-leading Research Initiative of 2016.
Arto Salomaa
Kai Salomaa: Supported by the Natural Sciences and Engineering Research Council of Canada Grant OGP0147224.

1.3. Journal/Conference

This paper was published in Acta Cybernetica, Vol. 25, No. 2, pp. 241–266, 2021. Acta Cybernetica is a well-regarded journal in theoretical computer science, focusing on areas such as automata theory, formal languages, and computational complexity.

1.4. Publication Year

2021 (based on confirmed publication in Acta Cybernetica).

1.5. Abstract

The paper surveys existing research on the concepts of ambiguity and nondeterminism in nondeterministic finite automata (NFAs). It defines the degree of ambiguity as the number of accepting computations for a given input string. It also explores various measures of nondeterminism, such as counting the amount of guessing in a single computation or the number of leaves in a computation tree. The primary focus of the survey is to compare the state complexity of NFAs when their ambiguity or nondeterminism is quantified, meaning how the minimum number of states required to recognize a language changes based on these measures.

1.6. Original Source Link

The official source link for the paper is: https://toc.yonsei.ac.kr/~emmous/papers/actacyb_amb_sc.pdf This is an officially published paper in Acta Cybernetica.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses revolves around understanding and quantifying the descriptive power of nondeterministic finite automata (NFAs) compared to deterministic finite automata (DFAs) and other variants. While DFAs and NFAs recognize the same class of languages (regular languages), NFAs can often represent these languages more compactly, requiring fewer states. This succinctness comes from nondeterminism and ambiguity.

The problem is important because finite automata are fundamental models in computation, and understanding their efficiency and complexity is crucial for designing efficient algorithms and systems. Previous research has explored state complexity (the minimum number of states required) for DFAs and NFAs. However, nondeterminism itself can be measured in various ways, and the ambiguity of an NFA (how many ways a string can be accepted) adds another layer of complexity.

Specific challenges or gaps in prior research include systematically comparing state complexity across different degrees of ambiguity and different amounts of nondeterminism, especially when these measures are unbounded or grow as a function of the input length. The paper's entry point is to survey and synthesize the existing work in these areas, highlighting known separation results and identifying open problems for future study.

2.2. Main Contributions / Findings

The paper's primary contributions are:

Comprehensive Survey: It provides a thorough overview of research on degree of ambiguity and various nondeterminism measures for finite automata.
Focus on State Complexity Comparisons: It systematically reviews the state complexity trade-offs between NFAs with different quantified levels of ambiguity (e.g., unambiguous, finitely ambiguous, polynomially ambiguous, exponentially ambiguous) and nondeterminism (e.g., finite branching, tree width).
Highlighting Separation Results: The paper emphasizes key theoretical results demonstrating super-polynomial separations in state complexity between different classes of NFAs. For instance, it notes that:
- UFAs can be exponentially more succinct than DFAs.
- Finitly Ambiguous NFAs (FNFAs) can be exponentially more succinct than UFAs.
- Polynomially Ambiguous NFAs (PNFAs) can be super-polynomially more succinct than FNFAs.
- General NFAs (potentially exponentially ambiguous) can be super-polynomially more succinct than PNFAs.
Quantifying Nondeterminism: It details various measures of nondeterminism, such as guessing, branching, trace, and tree width, and their relationships.
Spectrum Result for Finite Nondeterminism: It presents the "spectrum" result by Goldstine et al. [11], showing that different finite amounts of branching in an NFA can lead to incremental savings in state complexity, often with precise bounds.
Identification of Open Problems: The paper concludes by pointing out significant gaps in current knowledge, particularly regarding state complexity comparisons for NFAs where the amount of nondeterminism is unbounded and measured as a function of input length. It suggests areas for future research, including applying communication complexity techniques to these unbounded measures and comparing NFAs based on both ambiguity and nondeterminism.

These findings solve the problem of consolidating scattered research, providing a clear landscape of known results, and charting a path for future exploration in the descriptional complexity of finite automata.

3.1. Foundational Concepts

To understand this paper, a reader needs a basic grasp of automata theory fundamentals and computational complexity concepts.

Finite Automata (FA): A mathematical model of computation.
- Deterministic Finite Automaton (DFA): An FA where for each state and input symbol, there is exactly one transition to a next state. This means the computation path for any given input string is unique.
  - Formal Definition: A DFA is a tuple $A = (Q, \Sigma, \delta, q_0, F)$ $A = (Q, Σ, δ, q_{0}, F)$ , where:
    - $Q$ : A finite set of states.
    - $\Sigma$ : A finite set of input symbols, called the alphabet.
    - $\delta: Q \times \Sigma \to Q$ : The transition function. For a given state $q \in Q$ and input symbol $a \in \Sigma$ , $\delta(q, a)$ gives a unique next state.
    - $q_0 \in Q$ : The initial state.
    - $F \subseteq Q$ : The set of final (or accepting) states.
- Nondeterministic Finite Automaton (NFA): An FA where for each state and input symbol, there can be zero, one, or multiple transitions to next states. This introduces the possibility of multiple computation paths for a single input string. An NFA accepts a string if at least one of its possible computation paths leads to a final state after reading the entire string.
  - Formal Definition: An NFA is a tuple $A = (Q, \Sigma, \delta, q_0, F)$ $A = (Q, Σ, δ, q_{0}, F)$ , where:
    - $Q$ : A finite set of states.
    - $\Sigma$ : A finite set of input symbols, the alphabet.
    - $\delta: Q \times \Sigma \to 2^Q$ : The transition function. For a given state $q \in Q$ and input symbol $a \in \Sigma$ , $\delta(q, a)$ returns a set of possible next states (which can be empty). $2^Q$ denotes the power set of $Q$ .
    - $q_0 \in Q$ : The initial state.
    - $F \subseteq Q$ : The set of final states.
Regular Languages: A class of formal languages that can be recognized by finite automata (both DFAs and NFAs recognize exactly the same class of languages).
State Complexity: For a given regular language $L$ , the state complexity of $L$ is the minimum number of states required for a DFA to recognize $L$ . Similarly, nondeterministic state complexity is the minimum number of states required for an NFA to recognize $L$ . The paper extends this to compare NFAs with different degrees of ambiguity or nondeterminism.
Ambiguity: In the context of NFAs, ambiguity refers to the number of distinct accepting computations an NFA can have for a given input string.
- An unambiguous NFA (UFA) has at most one accepting computation for any input string.
- A finitely ambiguous NFA (FNFA) has a bounded maximum number of accepting computations for any input string, regardless of its length.
- A polynomially ambiguous NFA (PNFA) has a maximum number of accepting computations that grows polynomially with the input string's length.
- An exponentially ambiguous NFA has a maximum number of accepting computations that grows exponentially with the input string's length.
Nondeterminism Measures: Different ways to quantify the "amount of guessing" or "choices" an NFA makes. These include guessing, branching, trace, and tree width, which are formally defined in the methodology section.
Regular Expressions: A sequence of characters that define a search pattern, often used to describe regular languages. There's a relationship between regular expressions and finite automata.
Communication Complexity: A field in theoretical computer science that studies the amount of communication required between two or more parties to compute a function whose inputs are distributed among them. This technique is mentioned as a powerful tool for proving lower bounds in state complexity.
Super-polynomially separated: A term used in state complexity to indicate that simulating a device from one class (e.g., NFA) with a device from another class (e.g., PNFA) results in an increase in the number of states that grows faster than any polynomial function. For example, if an $n$ -state NFA requires $2^n$ states in a PNFA, they are super-polynomially separated.

3.2. Previous Works

The paper extensively references prior research, forming the backbone of its survey. Key works include:

Maslov (1970) [34]: One of the earliest works to consider the state complexity of basic operations on regular languages. This laid the groundwork for quantifying the size of finite automata.
Book et al. (1971) [3]: First systematically considered ambiguity in regular expressions and finite state machines, and the relationship between these concepts. They established fundamental ideas about how ambiguity is defined and preserved.
Brüggemann-Klein and Wood (1998) [4]: Introduced the more restrictive notion of one-unambiguity (or 1-determinism) for regular expressions, where the position automaton is deterministic. This showed that not all regular languages can be denoted by 1-unambiguous regular expressions, contrasting with the fact that all have unambiguous regular expressions.
Ravikumar and Ibarra (1989) [42]: Systematically studied size trade-offs between unambiguous, finitely ambiguous, polynomially ambiguous, and exponentially ambiguous NFAs. They were pioneers in comparing state complexity based on degrees of ambiguity, establishing the concept of super-polynomial separation.
Leung (1998, 2005) [30, 32]: Provided critical separation results. Leung [30] established that general NFAs can be super-polynomially separated from PNFAs, constructing $n$ -state NFAs for which any equivalent PNFA needs $2^n-1$ states. Leung [32] also showed that finitely ambiguous NFAs can be exponentially more succinct than UFAs ( $2^n-1$ states for an $n$ -state FNFA).
Hromkovič et al. (2002, 2011) [19, 20]: Used powerful communication complexity techniques to prove state complexity separations. Hromkovič et al. [19] provided a simplified proof for the NFA-to-PNFA separation (Theorem 6). Hromkovič and Schnitger [20] solved the long-standing open question by proving a super-polynomial separation between PNFAs and FNFAs (Theorem 7), and more generally between NFAs with different polynomial degrees of growth for ambiguity (Theorem 8). They also studied nondeterminism measures like tree width and advice.
Mandel and Simon (1977) [33] & Reutenauer (1977) [43]: Showed that it's decidable whether an NFA is finitely ambiguous or polynomially ambiguous, and Reutenauer provided an algorithm to compute the polynomial degree of growth.
Weber and Seidl (1991) [50]: Provided a simpler structural characterization of finitely ambiguous and polynomially ambiguous NFAs, leading to polynomial time algorithms for deciding these properties and computing the polynomial degree of growth. They also improved the upper bound for the maximal finite degree of ambiguity of an $n$ -state FNFA.
Kintala and Fischer (1980) [25] & Kintala and Wotschke (1980) [26]: Early work on quantifying nondeterminism. Kintala and Wotschke first quantified nondeterminism in finite automata computations and observed a significant difference in determinization size blow-up for NFAs with different finite numbers of nondeterministic choices.
Goldstine et al. (1990, 1992, 2002) [10, 11, 12]: Conducted extensive research on descriptional complexity and nondeterminism. Goldstine et al. [11] established the "spectrum result" for finite branching (Theorems 17 & 18), showing incremental savings in state complexity for NFAs with different finite branching amounts. They also showed an exponential blow-up for converting a general NFA to a finite branching NFA. Goldstine et al. [12] explored the subtle relationship between ambiguity and guessing.
Palioudakis et al. (2012, 2013, 2014, 2015) [38, 39, 40, 41]: Researched tree width and other nondeterminism measures. They characterized state complexity for NFAs with finite tree width (Theorem 19), compared nondeterminism measures, and studied bounds for converting NFAs to MDFAs.
Kappes (2000) [23]: Showed that an NFA with $n$ states and branching $k$ can be simulated by a deterministic finite automaton with multiple initial states (MDFA) with $k \cdot n$ states and $k$ initial states.

3.3. Technological Evolution

The study of finite automata began in the 1950s as a foundational model of computation. Initially, the focus was on their expressive power (what languages they can recognize). State complexity emerged as a field to quantify the size of automata.

The evolution of understanding nondeterminism and ambiguity in finite automata can be traced as follows:

Early 1970s: Initial systematic studies of ambiguity in regular expressions and NFAs (Book et al.). The concept of unambiguous regular expressions was introduced.
Early 1980s: Quantifying nondeterminism directly for finite automata computations (Kintala and Wotschke), leading to observations about determinization size blow-up.
Late 1980s - Early 1990s: Systematic comparisons of state complexity for NFAs with different degrees of ambiguity (Ravikumar and Ibarra). Decidability results for finite and polynomial ambiguity (Mandel and Simon, Reutenauer, Weber and Seidl). The "spectrum" of nondeterminism for finite branching NFAs (Goldstine et al.).
Late 1990s - 2000s: Major separation results proved, showing super-polynomial succinctness for various classes of NFAs (Leung, Hromkovič et al.). Introduction of communication complexity techniques for proving state complexity lower bounds. Exploration of different nondeterminism measures like tree width and their relationships.
2010s - Present: Refinements of bounds, studies on unary alphabets, operations on UFAs, and descriptional complexity for input-driven pushdown automata. Continued identification of open problems concerning unbounded nondeterminism.

This paper's work fits into the current state of this timeline by providing a concise summary of these developments, synthesizing the known state complexity comparisons, and clearly outlining the remaining open questions in the field.

3.4. Differentiation Analysis

Compared to other surveys or foundational texts, this paper's core differentiation lies in its specific focus and synthesis:

Joint Consideration of Ambiguity and Nondeterminism: Many works might focus on one aspect or the other. This paper explicitly brings together the degree of ambiguity and various nondeterminism measures, analyzing their relationships and combined impact on state complexity.
Emphasis on State Complexity Comparisons and Separation Results: The paper is not just a definition of terms but a comprehensive review of the succinctness hierarchies among different NFA classes. It meticulously details exponential and super-polynomial separation results, which are crucial for understanding the inherent computational power gains from nondeterminism and ambiguity.
Clear Identification of Open Problems: A significant contribution is the explicit enumeration of open research questions, particularly concerning unbounded nondeterminism and the application of communication complexity to these problems. This provides a roadmap for future research in the field.
Beginner-Friendly Definitions: While rigorous, the paper introduces and formally defines all the key measures (degree of ambiguity, guessing, branching, tree width, etc.), making it accessible to those new to the specific sub-field of descriptional complexity.

In essence, while the individual results surveyed are from various researchers, the paper's innovation is in its structured, comparative synthesis and its forward-looking perspective on the remaining theoretical challenges.

4. Methodology

The paper is a survey, so it doesn't propose a new methodology in the experimental sense. Instead, its "methodology" is to rigorously define the concepts and measures it surveys, and then present the known theoretical results (theorems and propositions) that compare these measures in terms of state complexity. This section will detail those definitions and the underlying principles.

4.1. Principles

The core principle guiding the analysis in this paper is the quantitative comparison of different forms of nondeterminism and ambiguity in finite automata through their state complexity. This involves:

Formal Definition: Precisely defining various metrics for ambiguity and nondeterminism.
Classification: Categorizing NFAs based on the growth rate or boundedness of these metrics (e.g., finitely ambiguous, polynomially ambiguous).
Succinctness Comparison: Investigating the minimum number of states required for equivalent automata belonging to different classes, often leading to separation results (e.g., exponential blow-ups).
Relationships: Exploring mathematical relationships and bounds between different ambiguity and nondeterminism measures.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Basic Definitions and Notations

Set of Positive Integers: $\mathbb{N}$
Cardinality of a Set: $|F|$ for a finite set $F$ .
Alphabet: A finite set of symbols, $\Sigma$ .
Set of Strings: $\Sigma^*$ over an alphabet $\Sigma$ .
Empty String: $\varepsilon$ .
Bounded Language: A subset of $a_1^* a_2^* \cdots a_k^*$ , where $a_i \in \Sigma$ are (not necessarily distinct) elements.
Nondeterministic Finite Automaton (NFA): An NFA $A$ is formally defined as a tuple $A = (Q, \Sigma, \delta, q_0, F)$ :
- $Q$ : A finite set of states.
- $\Sigma$ : The input alphabet.
- $\delta: Q \times \Sigma \to 2^Q$ : The transition function. This function maps a state and an input symbol to a set of possible next states. For example, $\delta(q, b)$ returns the set of states reachable from $q$ on input $b$ .
- $q_0 \in Q$ : The initial state.
- $F \subseteq Q$ : The set of final (or accepting) states.
Extended Transition Function: The transition function $\delta$ is extended to handle strings: $\delta: Q \times \Sigma^* \to 2^Q$ . For a state $q$ and string $w$ , $\delta(q, w)$ is the set of all states reachable from $q$ by reading $w$ .
Language Recognized by an NFA: The language L(A) recognized by NFA $A$ is the set of all strings $w$ for which at least one computation path starting from $q_0$ and reading $w$ ends in an accepting state. $L(A) = \{w \in \Sigma^* \mid \delta(q_0, w) \cap F \neq \emptyset\}$ .
Deterministic Finite Automaton (DFA): A DFA is a special case of an NFA where for all $q \in Q$ and $b \in \Sigma$ , $|\delta(q, b)| \leq 1$ . This means there is at most one next state for any given state-symbol pair. The paper notes that DFAs can have undefined transitions (i.e., $|\delta(q,b)|=0$ for some pairs).
State Complexity of a Language:
- State complexity of a regular language $L$ : The number of states in the minimal DFA recognizing $L$ .
- Nondeterministic state complexity of a regular language $L$ : The number of states in a state-minimal NFA recognizing $L$ .
Computation of an NFA: For a string $w = b_1 b_2 \cdots b_k$ (where $b_i \in \Sigma$ and $k \geq 0$ ), a computation of $A$ on $w$ is a sequence of states $(p_1, \ldots, p_\ell)$ such that:
- $p_1 \in \delta(q_0, b_1)$ (the first state is reachable from the initial state via the first symbol).
- $p_{j+1} \in \delta(p_j, b_{j+1})$ for $j = 1, \ldots, \ell-1$ (each subsequent state is reachable from the previous one via the next symbol).
- Either $\ell = k$ (a complete computation), or $\ell < k$ and $\delta(p_\ell, b_{\ell+1}) = \emptyset$ (the computation halts prematurely if no transition is defined).
Accepting Computation: A complete computation (where $\ell = k$ ) that ends in a final state ( $p_k \in F$ ).
Set of Computations: $\mathrm{comp}_A(w)$ is the set of all computations of $A$ on $w$ .
Set of Accepting Computations: $\mathrm{comp}_A^{\mathrm{acc}}(w)$ is the set of all accepting computations of $A$ on $w$ .

4.2.2. Degree of Ambiguity

The degree of ambiguity quantifies how many ways an NFA can accept a given string.

Degree of Ambiguity of an NFA on a String $w$ : $\mathrm{da}_A(w)$ is defined as the number of accepting computations of $A$ on $w$ . $\mathrm{da}_A(w) = |\mathrm{comp}_A^{\mathrm{acc}}(w)|$
- Explanation: This counts all distinct paths from the initial state to a final state, consuming the entire string $w$ .
Degree of Ambiguity of an NFA on Strings of Length $m$ : $\mathrm{da}_A(m)$ is defined as the maximum degree of ambiguity for any string of length $m$ . $\mathrm{da}_A(m) = \operatorname*{max} \{\mathrm{da}_A(w) \mid w \in \Sigma^m\}$
- Explanation: This provides a measure of ambiguity that depends on the length of the input string.
Finite (Bounded) Ambiguity: An NFA $A$ is finitely ambiguous if the values $\mathrm{da}_A(m)$ for all $m \in \mathbb{N}$ are bounded by some constant. The maximum such bound is denoted: $\mathrm{da}_A^{\mathrm{sup}} = \operatorname*{sup}_{m \in \mathbb{N}} \mathrm{da}_A(m)$
- Explanation: If an NFA is finitely ambiguous, there's a fixed upper limit on how many ways any string can be accepted, regardless of how long the string is.
Unambiguous NFA (UFA): An NFA $A$ is unambiguous if $\mathrm{da}_A^{\mathrm{sup}} = 1$ . This means any string has at most one accepting computation. Every DFA is inherently unambiguous.
Classes of NFAs by Ambiguity: Following Ravikumar and Ibarra [42], the paper categorizes NFAs into five classes based on their degree of ambiguity growth rate:
1. DFAs: Deterministic Finite Automata.
2. UFAs: Unambiguous NFAs ( $\mathrm{da}_A^{\mathrm{sup}} = 1$ ).
3. FNFAs: Finitely Ambiguous NFAs ( $\mathrm{da}_A^{\mathrm{sup}}$ is a finite constant $>1$ ).
4. PNFAs: Polynomially Ambiguous NFAs. An NFA $A$ is strictly polynomially ambiguous if it is not finitely ambiguous and there exists a polynomial $p(\cdot)$ such that $\mathrm{da}_A(m) \leq p(m)$ for all $m \in \mathbb{N}$ . The polynomial degree of growth is the minimal degree of such a polynomial.
5. General (Potentially Exponentially Ambiguous) NFAs: An NFA $A$ is strictly exponentially ambiguous if it is not polynomially ambiguous, meaning $\mathrm{da}_A(m)$ grows faster than any polynomial.

4.2.3. Nondeterminism Measures

These measures quantify the "amount of guessing" or "choices" made by an NFA.

Branching of a Transition: For a state $q \in Q$ and input symbol $b \in \Sigma$ , the branching is $|\delta(q, b)|$ , which is the number of possible next states.
Guessing of a Computation ( $\gamma_A(C)$ ): For a computation $C = (p_1, \ldots, p_\ell)$ on a string $w = b_1 b_2 \cdots b_k$ , the guessing $\gamma_A(C)$ (introduced by Goldstine et al. [11]) is defined as: $\gamma_A(C) = \log_2 {|\delta(q_0, b_1)|} + \sum_{i=1}^{\ell-1} \log_2 {|\delta(p_i, b_{i+1})|}$
- Explanation: This measure quantifies the amount of guessing in bits of information.
  - $\log_2 {|\delta(q_0, b_1)|}$ : Represents the number of bits needed to choose the first state $p_1$ from the set of possible initial transitions (i.e., $\delta(q_0, b_1)$ ).
  - $\sum_{i=1}^{\ell-1} \log_2 {|\delta(p_i, b_{i+1})|}$ : Sums the bits needed to choose the next state at each subsequent nondeterministic step $i$ along the computation path $C$ .
  - If an NFA is a DFA, all branching factors $|\delta(q,b)|$ are 1, so $\log_2(1)=0$ , and the guessing is 0.
Branching of a Computation ( $\beta_A(C)$ ): The branching $\beta_A(C)$ is defined as $2^{\gamma_A(C)}$ .
- Explanation: This is the product of the branching factors of the individual transitions along the computation path $C$ . It represents the total number of choices made along that specific path.
Guessing of a String $w \in L(A)$ ( $\gamma_A(w)$ ): This is a "best-case" measure, defined as the guessing of the best accepting computation for $w$ . $\gamma_A(w) = \mathrm{min} \{\gamma_A(C) \mid C \in \mathrm{comp}_A^{\mathrm{acc}}(w)\}$
- Explanation: For a string accepted by the NFA, this finds the accepting computation that involves the minimum amount of guessing.
Maximum Guessing of a String $w \in \Sigma^*$ ( $\gamma_A^{\mathrm{max}}(w)$ ): This is a "worst-case" measure, defined as the maximum guessing over all computations (not necessarily complete or accepting) for $w$ . $\gamma_A^{\mathrm{max}}(w) = \operatorname*{max} \{\gamma_A(C) \mid C \in \mathrm{comp}_A(w)\}$
- Explanation: This captures the maximum amount of nondeterminism an NFA might encounter on any path for a given input, regardless of whether it leads to acceptance or completion.
Branching of a String $w$ ( $\beta_A(w)$ ): This corresponds to the best-case guessing. $\beta_A(w) = 2^{\gamma_A(w)}$
Trace of a String $w$ ( $\tau_A(w)$ ): This corresponds to the worst-case guessing. $\tau_A(w) = 2^{\gamma_A^{\mathrm{max}}(w)}$
Tree Width of a String $w$ ( $\mathrm{tw}_A(w)$ ): This measure quantifies the total amount of nondeterminism by counting all possible computation paths. $\mathrm{tw}_A(w) = |\mathrm{comp}_A(w)|$
- Explanation: This is equivalent to the number of leaves in the computation tree of $A$ on $w$ . It's also called 'leaf size' in some literature.
Measures as Functions of Length ( $\chi_A(m)$ ): Similar to degree of ambiguity, these nondeterminism measures can be expressed as functions of input length $m$ by taking the maximum value over all strings of that length. If $\chi$ is any of tw, $\gamma$ , $\gamma^{\mathrm{max}}$ , $\beta$ , or $\tau$ , then $\chi_A(m)$ is: $\chi_A(m) = \operatorname*{max} \{\chi_A(w) \mid w \in \Sigma^m\}$
- Explanation: This allows for characterizing the growth rate of nondeterminism with input length.
Finite $\chi$ Function: The $\chi$ function of $A$ is finite if $\chi_A^{\mathrm{sup}} = \operatorname*{sup}_{m \in \mathbb{N}} \chi_A(m)$ is finite.

4.2.4. NFA Class Separations

The paper defines what it means for one class of automata to be super-polynomially separated from another.

Super-Polynomially Separated: Consider two classes of automata, $X$ $X$ and $Y$ $Y$ . Class $Y$ $Y$ is super-polynomially separated from class $X$ $X$ if there exists a family of languages $L_n$ $L_{n}$ (for $n \in \mathbb{N}$ $n \in N$ ) such that:
1. Each $L_n$ can be recognized by an $n$ -state automaton from class $Y$ .
2. For any polynomial p(n) and sufficiently large $n$ , any automaton from class $X$ recognizing $L_n$ requires more than p(n) states.
- Explanation: This means that converting an automaton from class $Y$ to an equivalent one in class $X$ results in an exponential or super-polynomial size blow-up in the worst case. This quantifies the succinctness advantage of class $Y$ over class $X$ .

5. Experimental Setup

As a survey paper focusing on theoretical concepts and existing results in state complexity, this paper does not describe a traditional experimental setup involving datasets, evaluation metrics, and baselines in the empirical sense. Instead, the "setup" of this field of study relies on:

5.1. Datasets

In the context of descriptional complexity research, "datasets" are regular languages designed to demonstrate specific state complexity lower bounds or separation results. These languages are not collections of real-world data but rather carefully constructed formal languages.

Example from the Paper: The paper mentions the family of languages $L_n = (0 + (0 1^*) ^{n-1} 0)^*$ $L_{n} = (0 + (0 1^{*})^{n - 1} 0)^{*}$ , for $n \geq 1$ $n \geq 1$ , used by Leung [30].
- Source: This is a theoretically constructed language.
- Characteristics: It's a regular language. It's designed to be recognized by a small n-state NFA that is exponentially ambiguous, but which requires a much larger PNFA to recognize it.
- Example Data Sample (for $n=2$ ): For $n=2$ , $L_2 = (0 + (01^*) 0)^*$ . Strings in $L_2$ could include: $\varepsilon$ , 0, 00, 010, 000, 0100, 0010, 0110, etc. The NFA for $L_n$ is designed such that for certain strings, the number of accepting computations grows exponentially with the length or with $n$ .
- Purpose: Such languages are chosen because they exhibit the worst-case behavior for the state complexity transformation being studied (e.g., NFA to PNFA). They are effective for validating theoretical bounds rather than empirical performance.

5.2. Evaluation Metrics

The primary evaluation metric throughout this research area is state complexity, which refers to the minimum number of states required by an automaton to recognize a given language.

State Complexity (SC):
1. Conceptual Definition: State complexity quantifies the size of the minimal automaton (either DFA or NFA) required to recognize a particular regular language. When comparing different classes of automata, it measures the succinctness advantage of one class over another. A higher state complexity for an equivalent automaton in a different class indicates a size blow-up.
2. Mathematical Formula: While there isn't a single universal formula, state complexity is typically expressed as a function of the number of states $n$ in the original automaton. For instance, when converting an $n$ -state NFA to an equivalent DFA, the state complexity is often bounded by $2^n$ .
3. Symbol Explanation:
  - $n$ : The number of states in the initial (source) automaton.
  - State complexity result: Often expressed as O(f(n)), f(n), or specific bounds like $2^n-1$ or $1 + \sum_{j=1}^{k} \binom{n-1}{j}$ , indicating the number of states in the target automaton.
Degree of Ambiguity Measures:
- $\mathrm{da}_A(w)$ : Number of accepting computations for string $w$ .
- $\mathrm{da}_A(m)$ : Maximum degree of ambiguity for strings of length $m$ .
- $\mathrm{da}_A^{\mathrm{sup}}$ : Finite maximum degree of ambiguity.
- These are used to classify NFAs and are then correlated with state complexity.
Nondeterminism Measures:
- $\gamma_A(C)$ : Guessing of a computation $C$ (in bits).
- $\beta_A(C)$ : Branching of a computation $C$ (number of choices).
- $\gamma_A(w)$ : Best-case guessing for string $w$ .
- $\gamma_A^{\mathrm{max}}(w)$ : Worst-case guessing for string $w$ .
- $\beta_A(w)$ : Best-case branching for string $w$ .
- $\tau_A(w)$ : Trace (worst-case branching) for string $w$ .
- $\mathrm{tw}_A(w)$ : Tree width (number of computations) for string $w$ .
- These measures are quantified for NFAs and then their impact on state complexity is analyzed.

5.3. Baselines

In state complexity research, "baselines" are not competitor models in an experimental sense, but rather the alternative classes of finite automata against which the succinctness of a given NFA class is compared. The paper implicitly compares:

Unambiguous NFAs (UFAs) against DFAs.
Finitely Ambiguous NFAs (FNFAs) against UFAs.
Polynomially Ambiguous NFAs (PNFAs) against FNFAs.
General NFAs against PNFAs.
NFAs with different finite branching amounts against each other.
NFAs with finite tree width against DFAs.
NFAs with limited nondeterminism against Deterministic Finite Automata with Multiple Initial States (MDFAs).

The goal is to establish how much nondeterminism or ambiguity can reduce the number of states required to recognize a language, by comparing the state complexity of an NFA from one class to an equivalent NFA from a "less powerful" or "more restricted" class.

6. Results & Analysis

This section synthesizes the key findings presented in the paper, focusing on the state complexity comparisons for different degrees of ambiguity and nondeterminism.

6.1. Core Results Analysis

The paper's results are presented as theorems and observations, establishing decision problems, upper bounds, and crucial separation results regarding the state complexity of NFAs with quantified ambiguity or nondeterminism.

6.1.1. Degree of Ambiguity: Characterization and Complexity

Decidability of Ambiguity Type:
- Theorem 1 (Weber and Seidl [50]): It is decidable in polynomial time whether a given NFA is finitely ambiguous, strictly polynomially ambiguous, or strictly exponentially ambiguous. Furthermore, the polynomial degree of growth can be computed in polynomial time. This is a foundational result, allowing for algorithmic classification of NFAs by their ambiguity.
Complexity of Testing Ambiguity Degree:
- Theorem 2 (Chan and Ibarra [5]): For a given NFA $A$ and an integer $k \in \mathbb{N}$ , testing whether the degree of ambiguity of $A$ is at least $k$ is PSPACE-complete. This implies that determining the exact finite degree of ambiguity for an arbitrary $k$ is computationally hard.
Upper Bound for Finite Ambiguity:
- Theorem 3 (Weber and Seidl [50]): The degree of ambiguity of an $n$ -state FNFA is at most $5^{\frac{n}{2}} \cdot n^n$ . This provides an upper bound on how large the number of accepting computations can be for an NFA that is finitely ambiguous.

6.1.2. Ambiguity and State Complexity Separations

These results demonstrate the succinctness advantages of NFAs with higher degrees of ambiguity over those with lower degrees.

UFA vs. DFA / FNFA vs. UFA:
- Theorem 4 (Leiss [29], Leung [32]):
  - For each $n \in \mathbb{N}$ , there exists an UFA with $n$ states such that the minimal equivalent DFA has $2^n - 1$ states. (This highlights an exponential size blow-up during determinization of UFAs).
  - For each $n \in \mathbb{N}$ , there exists an FNFA with $n$ states such that any equivalent UFA has $2^n - 1$ states. (This shows an exponential succinctness advantage for FNFAs over UFAs).
Bounded Languages (Special Case):
- Theorem 5 (Ravikumar and Ibarra [42]): Any NFA recognizing a bounded language can be converted to an FNFA with at most polynomial size blow-up. However, the classes of FNFAs (respectively, UFAs) recognizing bounded languages are super-polynomially separated from UFAs (respectively, DFAs). This suggests that for bounded languages, while ambiguities can be reduced to finite, there's still a significant cost to strict unambiguity or determinism.
General NFA vs. PNFA:
- Theorem 6 (Leung [30], Hromkovič et al. [19]): The class of NFAs is super-polynomially separated from the class of PNFAs. This means exponentially ambiguous NFAs can be drastically more succinct than polynomially ambiguous ones. Leung's original proof yields an optimal $2^n-1$ state blow-up for the transformation.
PNFA vs. FNFA and Different Polynomial Degrees:
- Theorem 7 (Hromkovič and Schnitger [20]): For $n \in \mathbb{N}$ , there exists a PNFA $A$ with a number of states polynomial in $n$ such that any FNFA recognizing L(A) has at least $2^{\Omega(n^{1/3})}$ states. This provides a super-polynomial separation between polynomially ambiguous and finitely ambiguous NFAs.
- Theorem 8 (Hromkovič and Schnitger [20]): This generalizes Theorem 7. For positive integers $r$ and $t = (r/k^2)^{1/3}$ , there exist languages $L_{r,k}$ recognized by NFAs with degree of ambiguity $O(m^k)$ and $k \cdot \mathrm{poly}(r)$ states, such that any NFA for $L_{r,k}$ with degree of ambiguity $o(m^k)$ (i.e., of a lower polynomial degree or finite) requires at least $2^{\Omega(r^{(1/3}/k^{5/3)})}$ states. This establishes a super-polynomial separation between NFAs having different polynomial degrees of growth for ambiguity.

6.1.3. Limited Nondeterminism: Growth Rates and Complexity

Tree Width Growth Rates:
- Theorem 9 (Hromkovič et al. [19]): For any NFA $A$ , the function $\mathrm{tw}_A(m)$ (tree width on strings of length $m$ ) is either bounded by a constant, between linear and polynomial in $m$ , or in2^{\Theta(m)}

(`exponential`). It cannot be unbounded and sublinear. This theorem provides a fundamental characterization of how `tree width` can grow.
*   **Complexity of Bounded Guessing**:
    *   **Theorem 10 (Leung [31])**: For a given `NFA`  $A$ , it is `PSPACE-complete` to decide whether  $\gamma_A(m)$  (`guessing` on strings of length  $m$ ) is bounded. This shows that determining if guessing remains finite is computationally hard.
*   **Sublinear Guessing Growth**:
    *   **Theorem 11 (Simon [48], Goldstine et al. [12])**: For each  $k \in \mathbb{N}$ , there is an `NFA`  $A$  such that  $\gamma_A(m) = \Theta(\sqrt[k]{m})$ . This proves that `guessing` can be unbounded but `sublinear`, a unique characteristic not seen with `tree width`. This also implies `branching`  $\beta_A(m)$  can be  $2^{\Theta(\sqrt[k]{m})}$ .
*   **Upper Bound for Finite Tree Width**:
    *   **Theorem 13 (Palioudakis et al. [38])**: The `tree width` of an  $n$ -state `finite tree width NFA` is at most  $2^{n-2}$ . Also, for every  $n \geq 2$  and  $1 \leq k \leq 2^{n-2}$ , there exists an  $n$ -state `NFA` with optimal `tree width`  $k$ .

### 6.1.4. Relationships Between Nondeterminism Measures and Ambiguity

*   **Proposition 1 (Palioudakis et al. [39])**: If  $A$  is an `NFA` with finite `tree width`, then

\mathrm{tw}_A^{\mathrm{sup}} \leq \tau_A^{\mathrm{sup}} \leq 2^{\mathrm{tw}_A^{\mathrm{sup}} - 1}

*   **Explanation**: This shows a direct relationship between the maximum `tree width` and the maximum `trace` (worst-case branching), indicating that `trace` can be exponentially larger than `tree width` in the worst case, but `tree width` is a lower bound for `trace`.
*   **Theorem 14 (Hromkovič et al. [19])**: If  $A$  is a `minimal NFA`, then for all  $m \in \mathbb{N}$

\operatorname*{max} (\gamma_A^{\operatorname*{max}}(m), \mathrm{da}_A(m)) \leq \mathrm{tw}_A(m) = O(\mathrm{da}_A(m) \cdot \gamma_A^{\operatorname*{max}}(m))
\$\$
*   **Explanation**: This relates `maximum guessing`, `degree of ambiguity`, and `tree width`. It shows that `tree width` upper bounds both `maximum guessing` and `degree of ambiguity`, and is itself bounded by their product.

Theorem 15 (Goldstine et al. [12]): Let $A$ be an NFA. If $\gamma_A(m)$ (guessing) is non-constant and sublinear, then the complete ambiguity of $A$ must be unbounded. Conversely, if $\gamma_A(m)$ is in $O(1)$ (bounded) or $\Theta(m)$ (linear), then complete ambiguity may be either bounded or unbounded. This reveals a subtle connection: sublinear, unbounded guessing forces unbounded ambiguity.

6.1.5. Limited Nondeterminism and State Complexity Separations

General NFA vs. Finite Branching NFA:
- Theorem 16 (Goldstine et al. [11]): For each $n \in \mathbb{N}$ , there exists an $n$ -state NFA $A$ such that any finite branching NFA recognizing L(A) needs at least $2^{n-1}$ states. This is an exponential separation between general NFAs and those restricted to finite branching.
Spectrum Result for Finite Branching: These theorems by Goldstine et al. [11] show how state complexity varies with different finite amounts of branching.
- Theorem 17 (Goldstine et al. [11]): Let $A$ $A$ be a minimal DFA of size $2^n - 1$ $2^{n} - 1$ , $n \geq 2$ $n \geq 2$ . Then:
  - (i) For $2 \leq k \leq \frac{n}{\log_2 n}$ , the optimal size of an NFA with branching $k$ for L(A) is at least $2^{\frac{n}{k}}$ .
  - (ii) For $k \geq \frac{n}{\log_2 n}$ , an NFA with branching $k$ for L(A) has size at least $n$ .
- Theorem 18 (Goldstine et al. [11]): For $n \geq 2$ $n \geq 2$ , there exists a minimal NFA $A_n$ $A_{n}$ with $n+1$ $n + 1$ states such that if $\sigma_n[k]$ $σ_{n} [k]$ denotes the optimal size of an NFA with branching $k$ $k$ recognizing $L(A_n)$ $L (A_{n})$ , then the following relations hold:
  - (i) $\sigma_n[1] = 2^n$ , and $2^{\frac{n}{k}} \leq \sigma_n[k] < 2k \cdot 2^{\frac{n}{k}}$ when $2 \leq k < \frac{n}{\log_2 n}$ .
  - (ii) $n+1 \leq \sigma_n[k] < 2k \cdot 2^{\frac{n}{k}}$ when $\frac{n}{\log_2 n} \leq k < n$ .
  - (iii) $n+1 \leq \sigma_n[k] < 4n$ , when $k \geq n$ .
  - Explanation: These theorems demonstrate a "spectrum" of state complexity reductions. As $k$ (the allowed branching) increases, the state complexity for NFAs recognizing certain languages decreases, showing a trade-off. For very low $k$ , the cost is exponential, but it reduces as $k$ approaches $n$ .
Finite Tree Width NFA vs. DFA:
- Theorem 19 (Palioudakis et al. [38]): For an NFA $A$ $A$ with $n$ $n$ states having tree width at most $k \leq n-1$ $k \leq n - 1$ , the language L(A) has a DFA of size $1 + \sum_{j=1}^{k} \binom{n}{j}$ $1 + \sum_{j = 1}^{k} (j n)$ . Furthermore, for every $1 \leq k \leq n-1$ $1 \leq k \leq n - 1$ , there exists an $n$ $n$ -state NFA $A_{n,k}$ $A_{n, k}$ with tree width $k$ $k$ over a binary alphabet such that the minimal DFA for $L(A_{n,k})$ $L (A_{n, k})$ has $1 + \sum_{j=1}^{k} \binom{n-1}{j}$ $1 + \sum_{j = 1}^{k} (j n - 1)$ states.
  - Explanation: Unlike general NFAs, NFAs with finite tree width can be determinized to DFAs with polynomial size blow-up (specifically, the sum of binomial coefficients, which is polynomial in $n$ for fixed $k$ ). This is a significant difference from the exponential $2^n$ blow-up for general NFAs.
NFA with Finite Branching vs. MDFA:
- Theorem 20 (Kappes [23]): An NFA with $n$ states and branching $k$ can be simulated by an MDFA (deterministic finite automaton with multiple initial states) with $k \cdot n$ states and $k$ initial states.
- Almost Matching Lower Bound (Palioudakis et al. [40]): For infinitely many $n, k \in \mathbb{N}$ $n, k \in N$ , there exists an $n$ $n$ -state NFA with branching $k$ $k$ such that any equivalent MDFA needs at least $\frac{k}{1 + \log k} \cdot n$ $\frac{k}{1 + l o g k} \cdot n$ states.
  - Explanation: This shows a linear trade-off between branching and the number of states/initial states for MDFAs, demonstrating that MDFAs are quite efficient in simulating finite branching.

6.2. Data Presentation (Tables)

The paper presents its results primarily through theorems and propositions, stating bounds and relationships in text rather than in formal tables. Therefore, there are no specific tables to transcribe from the paper for this section. The quantitative results are embedded within the theorem statements as described above.

6.3. Ablation Studies / Parameter Analysis

As a survey paper, this work does not present original experimental results or ablation studies. It reviews existing theoretical results that often involve parameter analysis (e.g., the effect of $k$ in branching $k$ on state complexity) as part of their proofs or constructions. The theorems listed in the previous section implicitly cover this type of analysis by showing how state complexity varies with parameters like $n$ (number of states) or $k$ (degree of ambiguity/nondeterminism).

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper provides a rigorous and comprehensive survey of the relationship between ambiguity, nondeterminism, and state complexity in finite automata. It meticulously defines various measures for quantifying both ambiguity (e.g., finite, polynomial, exponential) and nondeterminism (e.g., guessing, branching, tree width). A central theme is the descriptional complexity comparison, specifically how the succinctness (minimal number of states) of NFAs varies across different classes defined by these measures.

The survey highlights several key findings:

Super-polynomial separation results are well-established for NFAs with different degrees of ambiguity (e.g., general NFAs are super-polynomially more succinct than PNFAs, PNFAs are super-polynomially more succinct than FNFAs, and FNFAs are exponentially more succinct than UFAs).
For finite nondeterminism, the "spectrum" result by Goldstine et al. precisely quantifies the state complexity savings for NFAs with different finite branching values, showing an incremental trade-off.
NFAs with finite tree width have a polynomial determinization cost to DFAs, a stark contrast to the exponential cost for general NFAs.
Relationships and bounds between different nondeterminism measures (e.g., tree width and trace) and between ambiguity and guessing have been established.

Overall, the paper consolidates a significant body of theoretical research, providing a clear picture of the computational benefits of nondeterminism and ambiguity in terms of automata size.

7.2. Limitations & Future Work

The authors explicitly outline several limitations in the current state of research and suggest promising avenues for future work:

State Complexity of Unbounded Nondeterminism: Very little is known about the state complexity of NFAs where the amount of nondeterminism is unbounded and measured as a function of input length. This is a significant gap compared to the detailed results for finite nondeterminism.
Succinctness Comparisons for Unbounded Branching/Tree Width: There are currently no known succinctness comparisons between NFAs of different unbounded branching or unbounded tree width. This means it's unclear if different growth rates for these unbounded measures yield similar super-polynomial separations as seen with ambiguity.
Application of Communication Complexity Techniques: The authors suggest exploring whether the powerful communication complexity techniques (successfully used for ambiguity separations by Hromkovič et al.) can be applied to establish good lower bounds and separation results for NFAs where branching or tree width is measured as a function of input length.
Succinctness Comparison: Ambiguity vs. Nondeterminism Measures: A further topic of interest is comparing the succinctness of NFAs based on their degree of ambiguity against those based on their branching or tree width. Only a few tentative results exist in this direction.

7.3. Personal Insights & Critique

This paper serves as an invaluable resource for anyone studying the descriptional complexity of finite automata.

Clarity and Structure: The paper's strength lies in its clear definitions and systematic presentation of complex theoretical results. For a beginner, the carefully defined terms and the structured survey approach make an otherwise daunting field much more accessible.
Identification of Gaps: The explicit outlining of open problems is a critical contribution. It not only summarizes what is known but also effectively guides future research by highlighting the frontiers of knowledge in this area. This encourages new researchers to tackle these challenging questions.
Theoretical vs. Practical Impact: While the results are highly theoretical, quantifying the inherent succinctness of nondeterminism and ambiguity has fundamental implications for understanding the limits of computation and the trade-offs between computational models. For example, knowing that determinization can cause exponential blow-ups informs the design of algorithms that operate on NFAs.
Complexity of Quantification: The array of nondeterminism measures (guessing, branching, trace, tree width) and their subtle differences underscore the inherent complexity of nondeterminism itself. The relationships between these measures and ambiguity (e.g., Theorem 15) reveal that these concepts are deeply intertwined in non-obvious ways.
Opportunity for Communication Complexity: The suggestion to apply communication complexity techniques to unbounded nondeterminism measures is particularly intriguing. This powerful tool has proven successful in other areas of computational complexity and could unlock significant breakthroughs in understanding the state complexity of NFAs with function-of-length nondeterminism.

One potential area for deeper exploration (though outside the scope of a survey) could be the practical implications or approximate scenarios where these theoretical worst-case bounds are observed. For a beginner, understanding why a specific language constructed to demonstrate an exponential blow-up is "hard" to determinize could provide deeper intuition beyond just the mathematical proof. However, the paper's focus is clearly on the theoretical landscape, and it executes that mission flawlessly.

Similar papers

Recommended via semantic vector search.

No similar papers found yet.

Ambiguity, Nondeterminism and State Complexity of Finite Automata

TL;DR Summary

Abstract

Mind Map

In-depth Reading

English Analysis~23 min read · 30,169 chars

1. Bibliographic Information

1.1. Title

1.2. Authors

1.3. Journal/Conference

1.4. Publication Year

1.5. Abstract

1.6. Original Source Link

2. Executive Summary

2.1. Background & Motivation

2.2. Main Contributions / Findings

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

3.2. Previous Works

3.3. Technological Evolution

3.4. Differentiation Analysis

4. Methodology

4.1. Principles

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Basic Definitions and Notations

4.2.2. Degree of Ambiguity

4.2.3. Nondeterminism Measures

4.2.4. NFA Class Separations

5. Experimental Setup

5.1. Datasets

5.2. Evaluation Metrics

5.3. Baselines

6. Results & Analysis

6.1. Core Results Analysis

6.1.1. Degree of Ambiguity: Characterization and Complexity

6.1.2. Ambiguity and State Complexity Separations

6.1.3. Limited Nondeterminism: Growth Rates and Complexity

6.1.5. Limited Nondeterminism and State Complexity Separations

6.2. Data Presentation (Tables)

6.3. Ablation Studies / Parameter Analysis

7. Conclusion & Reflections

7.1. Conclusion Summary

7.2. Limitations & Future Work

7.3. Personal Insights & Critique

Similar papers