Unixxdos

Download latest version: unixxdos 1.0

Page created on 2010-11-25 by André Gillibert

License

Unixxdos is free software, released under the WTFPL license.

Introduction

Unixxdos is a conservative alternative to dos2unix and unix2dos. It provides a symmetrical operation, so that passing data through unixxdos twice returns the original data unchanged.

Behavior

Any LF not preceded by a CR becomes a CRLF.

A CRLF sequence becomes an LF unless it's preceded by another CR.

A CRCRLF sequence is kept unchanged.

A CR not followed by a LF is kept unchanged.

Base rules:
CR		-> CR
CR LF		-> LF
LF		-> CR LF
CR CR LF	-> CR CR LF

Significant samples:
LF LF		-> CR LF CR LF
LF CR		-> CR LF CR
LF CR LF	-> CR LF LF
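
The rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the actual unixxdos source: the function name and the regex-based approach are assumptions made for the example. It relies on the observation that only the number of CRs directly before an LF matters.

```python
import re

def unixxdos(data: bytes) -> bytes:
    """Illustrative sketch of the base rules (not the real program)."""
    def swap(match):
        run = match.group(0)
        cr_count = len(run) - 1  # number of CRs directly before the LF
        if cr_count == 0:
            return b"\r\n"       # LF -> CR LF
        if cr_count == 1:
            return b"\n"         # CR LF -> LF
        return run               # CR CR LF (or more CRs) is kept unchanged
    # A CR that is not part of a run ending in LF never matches, so it is kept.
    return re.sub(rb"\r*\n", swap, data)

# Print the three significant samples and their transformations.
for sample in (b"\n\n", b"\n\r", b"\n\r\n"):
    print(sample, "->", unixxdos(sample))
```

A regex substitution keeps the sketch short; a real filter would more likely process its input byte by byte with a small amount of state.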

Proof

A binary blob passed twice through unixxdos is unchanged.

Lemmas: Any sequence terminated by an LF is still terminated by an LF after transformation; this follows directly from the four base rules. Likewise, any sequence terminated by a CR followed by a character that is neither CR nor LF is transformed into a sequence of the same form.

Any file can be cut into chunks of three types: "(?<!\r)\r*\n" sequences (type N chunks), "\r+([^\r\n]|$)" sequences (type R chunks) and "[^\r\n]" characters (type C chunks). Note: (?<!\r) is a negative Perl 5 look-behind assertion, which means that the specified sequence is not preceded by a \r (see perlre(1)).
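
The chunk decomposition can be tried out with Python's re module, which supports the same look-behind syntax. This is a sketch under assumptions: the names CHUNK and chunks are made up for the example, and \Z replaces $ because Python's $ also matches just before a trailing newline.

```python
import re

CHUNK = re.compile(
    r"(?P<N>(?<!\r)\r*\n)"        # type N: zero or more CRs ending in an LF
    r"|(?P<R>\r+(?:[^\r\n]|\Z))"  # type R: CRs followed by an ordinary char or end of input
    r"|(?P<C>[^\r\n])"            # type C: a single ordinary character
)

def chunks(text):
    """Return (type, chunk) pairs in input order."""
    return [(m.lastgroup, m.group(0)) for m in CHUNK.finditer(text)]

# Show the decomposition of a small mixed sample.
for kind, chunk in chunks("a\r\nb\r\r\n\rx\n\r"):
    print(kind, repr(chunk))
```

Joining the chunks back together reproduces the input, which is a quick way to check that no byte falls outside the three types.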

The program transforms each chunk independently of the others, so that unixxdos(chunk1 . chunk2) is equal to (unixxdos(chunk1) . unixxdos(chunk2)), where dot is string concatenation. That's trivial if one of the chunks is type C, and, thanks to the lemmas, it also holds true if chunk1 is type N and chunk2 is type R, or the reverse. Moreover, each chunk is transformed into a chunk of the same type.

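Now, we just have to prove that unixxdos(unixxdos(chunk)) = chunk. This is trivial for type C and type R chunks, which are kept unchanged. It also holds true for type N chunks: LF and CR LF are exchanged with each other, so a second pass restores the original, and chunks with two or more CRs before the LF are kept unchanged.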
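
The claim can also be checked by brute force on short inputs. The sketch below is not a substitute for the proof and does not use the real program: unixxdos here is a compact, assumed rendering of the base rules, and first_counterexample is a hypothetical helper that tries every blob up to a given length over the alphabet CR, LF and one ordinary character.

```python
import itertools
import re

def unixxdos(data: bytes) -> bytes:
    # Assumed model of the base rules: only the CR count before each LF matters.
    table = {0: b"\r\n", 1: b"\n"}  # LF -> CR LF, CR LF -> LF
    return re.sub(
        rb"\r*\n",
        lambda m: table.get(len(m.group(0)) - 1, m.group(0)),  # 2+ CRs: unchanged
        data,
    )

def first_counterexample(max_len=8, alphabet=b"\r\nx"):
    """Return the first blob that two passes fail to restore, or None."""
    for n in range(max_len + 1):
        for combo in itertools.product(alphabet, repeat=n):
            blob = bytes(combo)
            if unixxdos(unixxdos(blob)) != blob:
                return blob
    return None

print(first_counterexample())
```

Three symbols are enough for the search because the rules never distinguish between characters other than CR and LF.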

Synopsis

unixxdos < dos.txt > unix.txt
unixxdos < unix.txt > dos.txt

Invoking

unixxdos reads data from stdin and writes the transformed data to stdout. It accepts no command-line arguments.