Study this page and 11.6 (binary files), 11.8 (stringstreams), and 11.9 (strstreams)
and write down the questions,
doubts, and surprises that you have on a piece of paper.
Assigned Work Due
Hand in a question, doubt or surprise and the page it happens on.
With your name. For credit.
Input
Download [ example ] and save as source and then try these commands:
The file command will make a good guess at the kind of file you have:
file example
The wc command will count the lines, words, and characters in a file:
wc -l example
wc -w example
wc -c example
The cat command outputs text files to the terminal (among other things).
cat example
For long files use more to see the file one screen at a time: tap space for the next page and 'q' to quit.
The od command will tell you precisely what is in a file -- character by character or byte by byte. It makes unprintable and "white space" characters explicit.
od -c example (characters)
od -x example (hexadecimal bytes)
od -cx example (both)
The book uses direct access into a normal line-oriented text file. I personally avoid designing programs and software that do this. I prefer handling irregular lines one line after another... It is much simpler. But here is an example -- a program to read any 10 characters from a file: [ 11peek.cpp ] (Download and try it.... there may be bugs).
Unformatted data is efficient because our program can go directly to any random place in the file: direct access or in IBM-speak "DASD" (dazz-dee). We have seek to move to a character in a random access file and tell to find out the current position in the file. Call them like this:
fstreamvar.tellp(); Returns the position where characters will be put.
fstreamvar.seekp(position); Go to position to put characters out. The book shows the other versions: all are useful.
fstreamvar.tellg(); Returns the position for getting characters in.
fstreamvar.seekg(position); Go to position to get characters in. The book shows the other versions: all are useful.
Demonstration of seek and tell with formatted data [ 11seek.cpp ]
But to be able to use direct access your program must be able to calculate where the data is.
Start by designing the layout of the data in the unformatted file. Draw a simple picture. Tabulate the data and calculate where it is in a simple case. Work out formulas that calculate where the general data is and hence code for reading and writing it. Document any special constraints. For example the file in lab06 must always start with zero or more real records followed by zero or more blank records. Another example would be to require that the file is always sorted in a particular way. A third trick is to place data near a position calculated by a hash function, so that it is quick to find it again.
It is also wise not to mix formatted input/output with unformatted input/output. As a rule use 'read' and 'write' to handle the data. Stick to the following:
fstreamvar.read(address_of_object, sizeof(object));
fstreamvar.write(address_of_object, sizeof(object));
These copy bytes into and out of primary memory. Read copies them from disk into RAM and write does the reverse. No transformation occurs; the bytes are copied as is. Both functions expect the address as a char*, so the address of any other type of object must be cast. [ passwd.dat ] (file) [ 11read.cpp ] (read file) [ 11write.cpp ] (change file).
Another rule that preserves sanity: never, ever switch from reading to writing without first executing a seekp. Similarly, when you have written some data and want to read from a direct access file, always take care to do a seekg before reading. However, you can safely execute a series of reads (or a series of writes) and get each item from the file in turn.
A direct access file is a repository for pieces of RAM. We can store them, keep them and then read them back in again. But when read they are not necessarily anywhere near their old address.
Never store pointers or pointer-based data in an unformatted file. This means that you cannot store any of the Standard Library classes (vectors, strings, deques, etc.) in objects that are written and read into unformatted files.
It is better to use a database to handle persistent data than invent your own data format for a direct access file!
More Experience with Random Access Files
The fifth lab for this course worked nicely last time I taught this
course. But in 2007 the programs started to misbehave -- badly. First,
the compiler started whining about adding a "sizeof" to a "streampos".
Secondly, the listing program started to output garbage and terminated
with a "Segmentation" fault. I checked the old compiled
code and it worked. But the moment I recompiled the program
it went badly wrong -- accessing the 123,456th character in the
file (or some such silly number).
Reason: the updates to the GNU compiler changed the internal coding of objects like the user password in the lab. I guess that it added typeid information at the start of the object. So, what my code thought was a length was something else....
The solution was to reconstruct the "passwd" file from scratch using newly compiled programs throughout.
Now, I'm hoping that they do not upgrade the compiler before the next lab.
So, add another reason for avoiding direct access binary files.
The format of the data is compiler dependent.
stringstream and strstream -- real cool stuff
[ tss2004.cpp ]
Handy for converting an argument to the main program into a numeric value. Also useful for interpreting a buffer of direct access data, or placing numeric data into character format in a direct access file.
If you have a program that needs to both read and write data to
a file use an fstream.
What is the difference between fstream, strstream, and stringstream?
The first is attached to a file, the second to a char*, and the
last to a C++ string variable. Note -- strstreams are older than stringstreams,
are deprecated, and may be removed in future versions of C++.
What happens if you clear() and no flags are set?
Nothing, I think: clear() resets the stream state to good, and that is already the state when no flags are set.
Do we have to learn all these new words for types?
Learn the pattern that underlies them, pick one item
from each line below:
Why do I need to access the arguments of main?
(1) See the lab work. (2) When commands are input they have the name
of the program plus other data:
11peek name_of_file
and so these programs need to get this information. The operating system puts the arguments into the arguments of main.
. . . . . . . . . ( end of section CSci202 Computer Science II, Session 11 Files and Strings)
Laboratory 6 Information Security
[ lab06.html ]
Next
Introduction to Containers
(chapter 12) +
[ 12.html ]
Including a quiz on exceptions, I/O, Files, Arguments to Main, ...
in next session