inzaghi

asked on

Reading encrypted bytes from a file

I have a file that contains the following

encrypteddata
followed by \r and \n
encrypteddata

I want to read the encrypted binary data into a byte array. How can this be done, assuming that each line length could vary and that the encrypteddata could itself contain \r and \n? This has to be done line by line.
TimYates

InputStream.read() until you encounter \r followed by \n?
Why are you posting this - we're dealing with this already? It's also different from what you posted before which was (from memory)

name#encrypteddata\r\n
Avatar of inzaghi
inzaghi

ASKER

I have posted this because the original question was closed, and in the other question that was posted, Giant2 was not happy with us discussing the problem there.

I just thought it would be unfair not to allocate any more points.
OK - now you've done it, i'll stop posting in the 'first' one. You'd also better talk about how you were going to use an Oracle encryption function
I don't understand what you are discussing here.
I didn't post to this thread.
Why is my nickname there?
Because the Asker explains that he had posted the multiple questions because of you.
Everyone,
Calm down and let's see what's up with all these similar questions....
OK - question to inzaghi :

in what type are you holding the 'encrypted data'
Avatar of inzaghi

ASKER

The data is written to a file which consists of the following

Username#encryptedData
Username#encryptedData
Username#encryptedData

each line is followed by writeByte('\n') & writeByte('\r')
The encrypted data is encrypted using des3 and will be loaded into an oracle database
using external tables.

For my testing purposes I wish to read the file back in, how can I read line by line in?
Is there a potential for the encrypted data to contain \n and \r?

This will be loaded into
Correction  (i see they're in a file):

are these in binary or ascii form?
Avatar of inzaghi

ASKER

in what type are you holding the 'encrypted data' > 
byte[]
Avatar of inzaghi

ASKER

all data is written as bytes, the username is also written as bytes
>>all data is written as bytes

OK. That won't work then. Can you explain why?
Avatar of inzaghi

ASKER

The reason why the encrypted data is written as bytes is that the des3 functionality in Java produces the encrypted data as bytes.
I needed a way of writing the username together with the encrypted data to one file.
That's the reason why I chose a DataOutputStream.

eg username#encryptedData

This data is then to be loaded into an Oracle database.

Does this make sense?
>>The reason why ...

OK i know. I actually meant 'can you explain why it won't work'.

If the encrypted data is not large, this is what i'd do:

textFilePrintWriter.println(username + "#" + byteArrayToHexString(encryptedBytes));
Avatar of inzaghi

ASKER

The main idea is that the file will be loaded into an Oracle database.

The reason why I wish to read the file back in is just for testing purposes, which is why I asked how to read it line by line.
Having written the file as i mentioned it will be simplicity itself to reconstitute the data:

String s = bufferedReader.readLine();
String[] atoms = s.split("#");
String username = atoms[0];
byte[] encryptedData = hexStringToByteArray(atoms[1]);
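The two helpers used above, byteArrayToHexString and hexStringToByteArray, are not shown anywhere in the thread. A minimal sketch of what they might look like follows; the class name HexCodec is made up for illustration, and lower-case hex digits are an arbitrary choice.

```java
final class HexCodec {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Two hex digits per byte, so the result never contains '#', '\r' or '\n'.
    static String byteArrayToHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Inverse of byteArrayToHexString; accepts upper- or lower-case digits.
    static byte[] hexStringToByteArray(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string has odd length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

Hex doubles the size of the data on disk, which is why the suggestion above is qualified with "if the encrypted data is not large".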

Avatar of inzaghi

ASKER

>>OK i know. I actually meant 'can you explain why it won't work'
I am not too sure why you are asking me this; you said previously that there would be a problem if the encrypted data contained \n or \r
>>I am not too sure why you are asking me this

Just making sure you know what i'm going on about ;-)
Avatar of inzaghi

ASKER

I think CEHJ you are right. I am going to use BufferedWriter to write the data as strings, and I will encode the encrypted data using the class w3c.tools.codec.Base64Encoder.

When I load the data into an Oracle table I can then decode the base64 string into a byte array using UTL_ENCODE.BASE64_DECODE.

When the byte array is encoded in base64 I am correct in assuming it will not contain the character '#'?

If I am not asking too much, what characters does the encoded string contain?
>>When the byte array is encoded in base64 I am correct in assuming it will not contain the character '#'?

I'm afraid you can't assume that, but this is getting much closer to the correct solution. If you're using a Writer though, you should be able to choose a separator that minimizes the possibility of a 'collision'

>>If I am not asking too much, what characters does the encoded string contain?

Don't understand you there ...
ASKER CERTIFIED SOLUTION
CEHJ
Avatar of inzaghi

ASKER

What I meant was that when a byte array is encoded, it can only contain the following characters:
 A..Z, a..z, 0..9, + and /.  Therefore how can it contain the # character?

If I am to base64 encode the encrypted data, is this safe?  ie I am assuming this data to the human eye is worthless
>>If I am to base64 encode the encrypted data, is this safe?  

Yes of course - it's already encrypted before base64.

Shall check on the other issue
>>Shall check on the other issue

No, you can't count on '#' not being included in the base64, as it can use any value in the range 32 to 95 (i.e. ASCII characters from SPACE to UNDERSCORE).

So use the method i mentioned. As it happens, if you use UTF-8, only the record separator should occupy more than one byte
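For what it's worth, standard Base64 as defined in RFC 4648 draws only from A-Z, a-z, 0-9, '+' and '/', plus '=' for padding, so '#' cannot appear in its output; the 32 to 95 range is the one used by uuencode. A quick way to check this for a given encoder is sketched below. It assumes java.util.Base64 (Java 8 and later) rather than the w3c encoder mentioned earlier, and the class name is made up.

```java
import java.util.Base64;

final class Base64AlphabetCheck {
    // Encode every possible byte value once, so the output
    // exercises the encoder's whole alphabet.
    static String encodeAllByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return Base64.getEncoder().encodeToString(all);
    }

    // True if the text consists only of the RFC 4648 alphabet,
    // optionally followed by up to two '=' padding characters.
    static boolean usesOnlyBase64Alphabet(String encoded) {
        return encoded.matches("[A-Za-z0-9+/]*={0,2}");
    }
}
```

If a different encoder is used, running its output through usesOnlyBase64Alphabet would show whether it follows the same alphabet. Some encoders also insert line breaks into long output, which would matter for a line-oriented file.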
Avatar of inzaghi

ASKER

Hi CEHJ,

In your solution

userName + '\u241E' + base64EncodeIt(encryptedData);
what does UTF '\u241E' do? does this mean the encrypted data when encoded will not contain this character.

I dont think I understand the purpose of using UTF.
> userName + '\u241E' + base64EncodeIt(encryptedData);

that's unnecessary, see my earlier comment.

>>what does UTF '\u241E' do?

It's the Unicode symbol for the record separator (U+241E, SYMBOL FOR RECORD SEPARATOR)

>>I dont think I understand the purpose of using UTF.

The purpose of UTF-8 is to ensure that there are never any problems encoding the name, even if characters like ß,â,ü etc. are included, and to encode the record separator itself

>>that's unnecessary, see my earlier comment.

There's no point in pushing the length thing - i've already recommended that independently of your previous input and inzaghi doesn't want to do it ;-)


> There's no point in pushing the length thing - i've already recommended that independently of your
> previous input and inzaghi doesn't want to do it ;-)

not really up to you.

inzaghi,

Using my suggestion you don't need to worry about any encoding of any of your data, and can simply store the bytes directly.
>>not really up to you.

That's precisely my point - it's up to inzaghi ;-)
> That's precisely my point - it's up to inzaghi ;-)

then don't comment, it doesn't help anyone.
Avatar of inzaghi

ASKER

I need to use encoding as the process to read the file will be external tables in oracle 9i.

Hi CEHJ,

In your earlier comment you said save the lot as UTF-8

userName + '\u241E' + base64EncodeIt(encryptedData);

how can that be done, is there some functionality on the BufferedWriter object?

Yes. A PrintWriter would be easier to use btw:

PrintWriter out = new PrintWriter(new OutputStreamWriter(new FileOutputStream("yourFile.txt"), "UTF8"));
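A fuller sketch of writing one record per line with that kind of PrintWriter follows. It assumes java.util.Base64 (Java 8 and later) in place of the base64EncodeIt helper named in the thread, writes to any OutputStream so it can be tried in memory, and uses made-up class and method names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class RecordWriter {
    static final String UNICODE_RECORD_SEPARATOR = "\u241E";

    // userName, then the separator, then the Base64 text of the encrypted bytes.
    static String formatRecord(String userName, byte[] encryptedData) {
        return userName + UNICODE_RECORD_SEPARATOR
                + Base64.getEncoder().encodeToString(encryptedData);
    }

    // Writes an explicit CRLF instead of println(), so the line ending
    // does not depend on the platform the program runs on.
    static void writeRecord(PrintWriter out, String userName, byte[] encryptedData) {
        out.print(formatRecord(userName, encryptedData));
        out.print("\r\n");
    }

    // In-memory demonstration; for a real file, wrap a FileOutputStream instead.
    static byte[] demo() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(buf, StandardCharsets.UTF_8))) {
            writeRecord(out, "alice", new byte[] {0x0D, 0x0A, 0x23});
        }
        return buf.toByteArray();
    }
}
```

The sample payload deliberately contains \r, \n and '#'; after Base64 encoding none of them survive in the line, which is the point of encoding before writing.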

> I need to use encoding as the process to read the file will be external tables in oracle 9i.

why exactly, what is the column type?
Avatar of inzaghi

ASKER

the column types are
String for the userName
Raw for the encrypted data

This will be loaded into an Oracle table with the line separator set as above.
The string after the line separator field will be converted to a raw using the Oracle libraries
And how is the loading going to be done exactly?

> The string after the line separator field will be converted to a raw using the Oracle libraries

why?
I don't see the reason you need to encode the data just to insert it into the database, seems like a redundant step.
Avatar of inzaghi

ASKER

Because this was originally encrypted data using des3, which is then converted to a String using base64 encoding, we need to decode it back as the field in the table is of type raw
Sounds so far like it would be better to save each encrypted block to a (separate) file, and at the same time generate a single script that inserts each row into the database.
Then all you need do is run the script.
(or you could of course insert data into db directly yourself).
> which is then converted to a String using base64 encoding

yes but why are you encoding it in the 1st place :)
Avatar of inzaghi

ASKER

>> Sounds so far like it would be better to save each encrypted block to a (separate) file, and at the same time generate a single script that inserts each row into the database.

I need a way of linking the username to the encrypted block of data, there could be more than one encrypted block of data per username.  Each encrypted block requires one record in the table.

The reason why I am encoding it is that I can then write it to a file using the PrintWriter or BufferedWriter objects in Java.  Therefore I can convert the data to strings and also use the newLine() methods to separate each record in the file.

This can then be loaded into an Oracle database or be read back in by a Java process, in which I can then tokenise each element of the line.
Sounds to me like you're proceeding in the right direction. Make sure you don't do unnecessary de/encoding steps though
For normalization purposes you may like to consider keeping the encrypted data in a separate table from the usernames (ids)
> I need a way of linking the username to the encrypted block of data

the import script would handle that

> The reason why I am encoding it is because I then can write it to a file using the printwriter or
> bufferedwriter objects in java.

you don't need to use a writer, binary data can be written directly without encoding using a stream

> or be read back in by a Java process, in which I can then tokenise each element of the line.

no need to tokenise when reading from db
so i still don't see any need for encoding, you're just making what sounds like a fairly simple task excessively complicated.
> For normalization purposes you may like to consider keeping the encrypted data in a separate table from the usernames (ids)

can we stay on topic please
Avatar of inzaghi

ASKER

The only encoding I will do is on the encrypted byte data when writing to the file, and decoding it before I load into an Oracle table

Is this ok?
It is on topic. The topic is to help inzaghi
objects, if you've got an alternative way of doing this, please don't make nebulous statements like:

>>

> I need a way of linking the username to the encrypted block of data

the import script would handle that

>>

but provide a clear implementation direction
> The only encoding I will do is on the encrypted byte data when writing to the file, and decoding it before I
> load into an Oracle table
> Is this ok?

it's unnecessary, just save the encrypted data as is, and load that into oracle.
Avatar of inzaghi

ASKER

I would like to save the encrypted data as it is but how do I link it to the userName?
>>>>
The only encoding I will do is on the encrypted byte data when writing to the file

>>it's unnecessary

>>>>

Wrong. It's necessary - unless you use field length delimiting
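Field length delimiting means writing the number of bytes ahead of the bytes themselves, so the reader never has to scan for a separator and the encrypted data can contain anything, including \r, \n and '#'. A minimal sketch with DataOutputStream and DataInputStream is below. The record layout (writeUTF for the name, a 4-byte length, then the raw bytes) and all names are assumptions for illustration. Such a file is not line-oriented, so whether an Oracle loader can consume it is a separate question that this sketch does not address.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LengthPrefixedRecords {
    static final class Entry {
        final String userName;
        final byte[] encryptedData;
        Entry(String userName, byte[] encryptedData) {
            this.userName = userName;
            this.encryptedData = encryptedData;
        }
    }

    static void writeRecord(DataOutputStream out, String userName, byte[] encryptedData)
            throws IOException {
        out.writeUTF(userName);              // writes its own 2-byte length prefix
        out.writeInt(encryptedData.length);  // explicit length of the binary field
        out.write(encryptedData);            // raw bytes, no encoding needed
    }

    static Entry readRecord(DataInputStream in) throws IOException {
        String userName = in.readUTF();
        byte[] data = new byte[in.readInt()];
        in.readFully(data);                  // reads exactly that many bytes
        return new Entry(userName, data);
    }

    // In-memory round trip, for testing that what was written can be read back.
    static Entry roundTrip(String userName, byte[] encryptedData) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                writeRecord(out, userName, encryptedData);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return readRecord(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```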
> I would like to save the encrypted data as it is but how do I link it to the userName?

the generated import script would insert the userName along with the corresponding encrypted data.
Avatar of inzaghi

ASKER

Gentlemen,

Thanks for all your efforts, I have really appreciated your help and comments.
I have decided to go with the method of using the PrintWriter and writing the data out as strings.  This will make life easier when loading this data into Oracle as well, as we have a record separator to use as the delimiter.

Question: When we load into Oracle, what do we specify as the record separator?
If we tokenise the string in Java, what do we specify as the string tokenizer? Is it
'\u241E'?
> This will make life easier when loading this data into Oracle as well, as we have a record separator to use as the delimiter.

Don't really agree but if that's what you want to do.
The approach I suggested above would allow you to import your data directly using SQL Loader, doesn't get much easier than that :)  And a lot safer than the approach you are looking at.

> If we tokenise the string in Java what do we specify as the string tokenizer is it

use whatever it is you use.
Why do you need to read the file again to tokenise it?
What are you actually going to use to import the data into oracle?
>>If we tokenise the string in Java what do we specify as the string tokenizer is it
'\u241E'

Yes.

final static String UNICODE_RECORD_SEPARATOR = "\u241E";

...

String[] atoms = s.split(UNICODE_RECORD_SEPARATOR);
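Put together with the Base64 decoding step, reading one line back for testing might look like the sketch below. It assumes java.util.Base64 (Java 8 and later) as the decoder and that each line has already been read with BufferedReader.readLine(), which strips the trailing line terminator; the class and method names are made up.

```java
import java.util.Base64;

final class RecordParser {
    static final String UNICODE_RECORD_SEPARATOR = "\u241E";

    static String userNameOf(String line) {
        return split(line)[0];
    }

    static byte[] encryptedDataOf(String line) {
        return Base64.getDecoder().decode(split(line)[1]);
    }

    // A limit of 2 keeps everything after the first separator in one piece
    // and keeps an empty second field instead of dropping it.
    private static String[] split(String line) {
        String[] atoms = line.split(UNICODE_RECORD_SEPARATOR, 2);
        if (atoms.length != 2) {
            throw new IllegalArgumentException("no record separator in line");
        }
        return atoms;
    }
}
```

The decoded bytes can then be passed to the des3 decryption step to confirm the round trip.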
Avatar of inzaghi

ASKER

>>Why do you need to read the file again to tokenise it?

Just for testing purposes to see if we can decrypt the data back in.


>> What are you actually going to use to import the data into oracle?

We are going to use the external table features of Oracle 9i, which is apparently better than SQL Loader. It allows files to be seen as tables.
Avatar of inzaghi

ASKER

What about in Oracle: when we use the external tables feature, do we need to specify the field delimiter?
> Just for testing purposes to see if we can decrypt the data back in

Be easier to do that check while you're processing the data.

> We are going to use the external table features of Oracle 9i, which is apparently better than SQL Loader.
> It allows files to be seen as tables.

Not sure I follow, do you have any doco on that?

>>What about in Oracle: when we use the external tables feature, do we need to specify the field delimiter?

I thought you were storing the encrypted data in the field (external table) separately from the username and that the concatenation of username + encrypted data is for testing purposes?
so are you creating a file to use as your external table, or are you going to insert it into your table?
Avatar of inzaghi

ASKER

Hi CEHJ,

we are loading the userName & encrypted data into one table,
the primary key will be the userName + another field which may be generated from a sequence.
Suppose we have the following table
userName
id
encryptedData

the userName and id will be the primary key.

We will load the userName and encrypted data into the same table.  What do I specify as the field delimiter?
Avatar of inzaghi

ASKER

>> so are you creating a file to use as your external table, or are you going to insert it into your table?

will insert it into our table
> will insert it into our table

then again there's no need for encoding :)
you can insert your binary data directly.
>>What do I specify as the field delimiter?

It's not clear that you need one. Why would you if you're storing the data and username separately?
> What do I specify as the field delimiter?

Depends on how you are doing the bulk insert.
You can most likely pick whatever you like (that does not appear in your data).
eg. \t
Avatar of inzaghi

ASKER

but I still need a way of linking the username to that bit of encrypted data and also
there could be more than one bit of encrypted data per user.
The linkage would be done by the tables would it not?
Avatar of inzaghi

ASKER

The username and data are written to a file
as
userName + '\u241E' + base64EncodeIt(encryptedData);
userName + '\u241E' + base64EncodeIt(encryptedData);
userName + '\u241E' + base64EncodeIt(encryptedData);
userName + '\u241E' + base64EncodeIt(encryptedData);


when I bulk load into Oracle I have to specify the delimiter to separate the data so it is loaded into the correct columns.

In Oracle how do I specify that '\u241E' is my delimiter?
Try

FIELDS TERMINATED BY '\u241e'

in sql loader
tab should be fine.

what are you using to do the loading?
> FIELDS TERMINATED BY '\u241e'
> in sql loader

???
If that doesn't work, try

FIELDS TERMINATED BY 'x241e'

or

FIELDS TERMINATED BY x'241e'

(not sure of exact syntax)



The following is of importance for loading UTF-8

http://www.akadia.com/services/ora_sql_loader_utf8.html
Avatar of inzaghi

ASKER

Ok, one last thing: if I was to use the DataOutputStream class to write my data
then I would use
dos.write(userName.getBytes());
dos.writeByte('#');
dos.write(encryptedData);
dos.writeByte('\r');
dos.writeByte('\n');

Now if I was to load this straight into Oracle (without having to read the data back in with Java), is it ok to specify my field delimiter as simply the '#' symbol?  I am assuming that the encrypted data will not contain the '#' delimiter!!

>>I am assuming that the encrypted data will not contain the '#' delimiter!!

As mentioned before - you *can't* assume that. This is how i'd do it:

final String CRLF = "\r\n";

dos.write(userName.getBytes("UTF8"));
dos.write(UNICODE_RECORD_SEPARATOR.getBytes("UTF8"));
dos.write(encryptedData);
dos.write(CRLF.getBytes("UTF8"));
Avatar of inzaghi

ASKER

Maybe I don't understand.
The encrypted data is binary data returned from the des3 utility. I see that this may contain the # symbol, but I don't understand why it may not contain the UTF-8 record separator.

I apologise for asking all these questions, but it's just to help me understand
Avatar of inzaghi

ASKER

Any thoughts on my last posting?
Sorry - somehow i missed your last question. The answer is that in utf-8 the separator is encoded as

E2 90 9E

so the probability of those 3 bytes occurring by accident is slim
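Slim is not the same as zero, so if raw bytes are written next to this separator it is cheap to check each block before writing it. The sketch below prints the UTF-8 bytes of the separator and scans a byte array for them; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class SeparatorBytes {
    // UTF-8 encoding of U+241E, three bytes long.
    static final byte[] SEPARATOR_UTF8 = "\u241E".getBytes(StandardCharsets.UTF_8);

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Naive scan: true if the separator's byte sequence occurs anywhere in data.
    static boolean containsSeparator(byte[] data) {
        for (int i = 0; i + SEPARATOR_UTF8.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < SEPARATOR_UTF8.length && match; j++) {
                match = data[i + j] == SEPARATOR_UTF8[j];
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

A block for which containsSeparator returns true would need encoding or a different layout. The CRLF terminator is only two bytes, so it is considerably more likely than the separator to turn up inside raw encrypted data.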
Avatar of inzaghi

ASKER

I have decided in the end to go for the base64-encoded option using UTF character transformation suggested by CEHJ.

I would like to thank you both for your efforts in helping me find the solution.

No problem
> Now if I was to load this straight into Oracle (without having to read the data back in with Java), is it ok to
> specify my field delimiter as simply the '#' symbol?

If that's the case then you definitely don't actually need a delimiter, or any encoding.

Good luck with it :)