Tseitin Encoding and DIMACS Format [5 points]
Short description
Write a program which translates a description of a formula in NNF into a DIMACS CNF formula using Tseitin encoding.
Input format
The input file contains a description of a single formula in NNF using a very simplified SMT-LIB format following grammatical rules:
<formula> ::= `(' `and' <formula> <formula> `)'
| `(' `or' <formula> <formula> `)'
| `(' `not' <variable> `)'
| <variable>
Here <variable>
is a sequence of alphanumeric characters starting with a letter. Newline or a sequence of whitespace characters should be allowed wherever space is allowed.
See the example input files
Output format
The output should be a CNF description in the DIMACS format.
c
c start with comments
c
c
p cnf 5 3
1 -5 4 0
-1 5 3 4 0
-3 -4 0
- The file can start with comments, that is lines beginning with the character
c
. - Right after the comments, there is the line
p cnf nbvar nbclauses
indicating that the instance is in CNF format wherenbvar
is the maximum index of a variable appearing in the file;nbclauses
is the exact number of clauses contained in the file. - Then the clauses follow.
- Each clause is a sequence of distinct non-null numbers between
-nbvar
andnbvar
ending with0
on the same line - It cannot contain the opposite literals i and -i simultaneously
- Positive numbers denote the corresponding variables. Negative numbers denote the negations of the corresponding variables.
- Each clause is a sequence of distinct non-null numbers between
Use the part with comments to describe which variables in the CNF instance represent which original variables from the input formula. It should be also clear which variables are auxiliary (for gates) and which variable in the CNF correspond to the root node of the derivation tree of the input formula.
Extended Description
The typical invocation of your program should be
formula2cnf [input [output]]
Parameters input
and output
specify the input and output files respectively, if they are missing standard input should be read or standard output should be written to.
The program should allow an option which specifies if the Tseitin encoding should use equivalences or only left-to-right implications. (See also the problem number 5 in the first practical.