dadadude

File manipulation python (indexing and extraction)

I just wanted to check if my algorithm is right.
This is what my attached file looks like:
0      1818      1148
0      1818      1147
0      1819      1147
0      1818      1146
etc...
i have a list: [1,2,3,4,1,1] and the indices are [0,1,2,3,4,5], so we can see that indices 0, 4 and 5 belong to the same group.
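
The grouping described above can be sketched in a few lines. This is only an illustration: the name group_indices is made up here, and it assumes the class labels sit in a plain Python list.

```python
from collections import defaultdict


def group_indices(labels):
    """Map each class label to the list of indices that carry it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)


groups = group_indices([1, 2, 3, 4, 1, 1])
print(groups[1])  # the indices that share class 1
```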

and that's how i read the file to extract the groups:

for i in range(0,len(classList)):
			classe = classList[i].getGraphemClass()
			count = 0
			x0 = []
			y0 = []
			
			while count < len(classList):
				if classList[count].getGraphemClass() == classe:
					for j in range(0,len(classList[count].getListOfGraphems())):
						nb = str(classList[count].getListOfGraphems()[j])
						
						for line in file:
							line = line.strip()
							if not line:
								break
							else:
								line = line.split()
								if line[0] == nb:
									x0.append(line[1])
									y0.append(line[2])
									
				count = count + 1

coordonnes.txt
dadadude

ASKER

is that code ok?
for i in range(0,len(classList)):
			x0 = []
			y0 = []
			file = open('C:\\Decomposition\\Analyse\\image139\\feat\\coordonnes.txt')
			for j in range(0,len(classList[i].getListOfGraphems())):
				classe = int(classList[i].getListOfGraphems()[j])	
				
				while 1:
					lines = file.readlines(100000)
					if not lines:
						break
					for line in lines:
						line = line.split()
						if int(line[0]) == classe:
							x0.append(int(line[1]))
							y0.append(int(line[2]))
					
						
			for c in range(0,len(x0)):
				x = x0[c]
				y = y0[c]
				self.im[x,y,0] = classList[i].getR()
				self.im[x,y,1] = classList[i].getV()
				self.im[x,y,2] = classList[i].getB()

ASKER CERTIFIED SOLUTION
pepr

You re-read the file as many times as you have classes.  This is very inefficient.  You should also close the file after opening it.
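
One way to avoid the repeated reads is to load the file once into a dictionary keyed by segment number and then look the points up in memory. The following is a rough sketch, not a definitive implementation: load_points and color_segments are hypothetical names, and it assumes each non-empty line holds three integers (segment, x, y) as in the posted sample, and that the class objects offer getR(), getV(), getB() and getListOfGraphems() as in the question.

```python
from collections import defaultdict


def load_points(path):
    """Read the coordinate file once; map segment id -> list of (x, y)."""
    points = defaultdict(list)
    with open(path) as f:  # the with block closes the file automatically
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            seg, x, y = (int(v) for v in fields)
            points[seg].append((x, y))
    return points


def color_segments(im, class_list, points):
    """Paint every point of every segment with the color of its class."""
    for c in class_list:
        color = (c.getR(), c.getV(), c.getB())
        for seg in c.getListOfGraphems():
            for x, y in points.get(int(seg), []):
                for channel, value in enumerate(color):
                    im[x, y, channel] = value
```

With this layout the file is read exactly once, no matter how many classes there are, and each lookup in the dictionary is cheap.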
i have 63 classes, and it's very slow.
and there are more than 4000 attributes; each attribute has a list of coordinates (x,y), as u can see in the file.
at first i created 4000 files with x,y in them. it worked fine but it was very slow.
it's the first time that i face such a problem.

I might be reading the file in the wrong way.
as you can see in the file that i posted at first, i have 2943 elements.
and i am still confused about which format i should use for the file.
should i use the one from the previous question or this one?
With the first question it worked perfectly and i was able to color the image, but it was very slow!
Now with this algorithm it's not working, although i go through the file correctly, i guess.
ok i cleaned the code. it's not reading all the classes!! it's just giving me one! i don't get it.
it looks logical to me, but the results are weird. the indexing of the file is not working well.
original = 'C:/Decomposition/Analyse/image139/images/image139.png'
		im = imread(original,0)
		file = open('C:/Decomposition/Analyse/image139/feat/coordonnes.txt')
		for i in range(0,len(classList)):
			for j in range(0,len(classList[i].getListOfGraphems())):
				classe = int(classList[i].getListOfGraphems()[j])
				points = []
				for line in file:             
					lst = line.split()
					if len(lst) == 0:
						break              
					if int(lst[0]) == classe:
						  points.append( (int(lst[1]), int(lst[2])) )
								
				for c in range(0,len(points)):
					x = points[c][0]
					y = points[c][1]
					im[x,y,0] = classList[i].getR()
					im[x,y,1] = classList[i].getV()
					im[x,y,2] = classList[i].getB()

SOLUTION
ok here i am i'll post for u an image with colored segments: it will be all explained on it:
For the file format: plain text is fine.  Choose whichever format is easiest to generate.  The earlier format may be better because it takes less space.  Disk speed probably limits the processing more than a slightly more complex algorithm would (and it is not really more complex, it only looks so).

But "In the face of ambiguity, refuse the temptation to guess." -- try in Python interactive mode:

>>> import this
sorry, wrong post earlier. ok here i am, i'll post for u an image with colored segments; it will all be explained on it:
I circled a segment as an example. The circled segment will look like this in the file:

x:

X1    Y1
X2    Y2
.        .
.        .
Xn   Yn

so basically these coordinates will help me color the segments that belong to the same class with the same color
Thank you, for your help.

test.png
I will explain it in a better way:
Step 1: create classes:
i have a list (called book in the program) like [0,0,0,1,4,5,....]. Each value in it is a class, and each index is a segment.

suppose that i have 9 classes; self.g3 is the number of classes:

#code book construction
		classList = []
		#self.g3 = number of classes
		#self.code = contains the classes with the indices beings the segments
		for i in range(0,self.g3):
			#create random colors
			r = round(rand.uniform(0,255))
			v = round(rand.uniform(0,255))
			b = round(rand.uniform(0,255))
			listOfGraphems = []
			#loop throught the list (self.code) 
			#if self.code[j] == i (classe number)
			#add the index to the list.
			#i have a class called Book so i have all the list of segments in that class with the class ID in that case it is i
			for j in range(0,len(self.code)):
				if self.code[j] == i:
					listOfGraphems.append(j)
			classList.append(book(i,listOfGraphems,r,v,b))

then i move on to the next step, which i posted earlier, to color the image.
as for imread: i load the RGB image that i want to color.
I think that the code is much clearer now: i stopped looping over indices and switched to Python-style iteration! thank u for this info it's great and way easier!!!!!!!
for c in classList:
			graph =  c.getListOfGraphems()
			r = c.getR()
			v = c.getV()
			b = c.getB()
			for i in range(0,len(graph)):
				classe = int(graph[i])
				points = []
				for line in file:             
					lst = line.split()
					if len(lst) == 0:
						break              
					if int(lst[0]) == classe:
						  points.append( (int(lst[1]), int(lst[2])) )
						  
								
					for p in points:
						x = p[0]
						y = p[1]
					
						im[x,y,0] = r
						im[x,y,1] = v
						im[x,y,2] = b

still can't color the segments!! WEIRD!!
Dear sir,
it worked!! with the other code that u posted!! as u said it was very simple!!!!!!!!!!!!! thank you sooo much for ur help.

Please can u tell me what will be ur algorithm just in simple on how to read this file if u were me.
and do u have advise me to use objects in lists as i am doing, because i find it very easy to manipulate objects, makes my work much easier.

I will also have to change how i iterate through the other lists.

Thank you.
Sincerely,
Hani.
Solution:

for c in classList:
			wanted =  c.getListOfGraphems()
			r = c.getR()
			v = c.getV()
			b = c.getB()
			coord = []
			for seg in wanted:    # use a separate name so the red value r is not overwritten
				points = a[seg][1]
				for p in points:
					x = p[0]
					y = p[1]
					im[x,y,0] = r
					im[x,y,1] = v
					im[x,y,2] = b

		return im,self.variance

by the way i am working on an interactive genetic algorithm with a beautiful interface. i would like u to see the executable version when i finish it. I still have to fix some problems related to pyqt4, but they will be solved i am sure.
Hi Hani.  You apparently own a kind of clever head (a bit younger than mine -- I guess from "ur newspeak").  I guess you are Italian, based on test.png and on "verde".  You may be a postgraduate student who is trying the genetic algorithm approach to OCR.  I appreciate your politeness expressed formally.  Still, we are both just people on a kind of forum where informal behaviour is rather the norm ;)  To summarize: no need for "Dear sir" :)  I prefer "friendship" to "hierarchy", as we all have things to accept from and to give to the others -- and one feels better among friends than in a hierarchy.  Now back to the problem...

I strongly suggest not using tabs for indentation in your Python sources.  Even though it seems to be a marginal problem, it could get worse later.  The reason is that tabs can be interpreted in a way you do not expect.  They are not visible, and if combined with spaces...  Python relies on indentation, and with tabs combined with spaces it goes wrong very easily.  Use 4 spaces for one level.  Use no tabs in your sources.  This is a slightly stronger recommendation than the "Python style guide" gives, but I do recommend it.
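
To find the places where tabs crept in, a small check like the following may help. It is only a sketch: lines_with_tab_indent is a hypothetical helper, and many editors can do the same job by showing whitespace or converting tabs to spaces.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# The second line is indented with a tab, the third with four spaces.
sample = "for i in range(3):\n\tprint(i)\n    print(i)\n"
print(lines_with_tab_indent(sample))
```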

Please can u tell me what will be ur algorithm just in simple on how to read this file if u were me.

Do you mean how to read the data from the text file?  What format to choose?  Or do you mean the source file?  Let's continue in comments...


and do u have advise me to use objects in lists as i am doing, because i find it very easy to manipulate objects, makes my work much easier.

Definitely yes.  I prefer the Object Oriented approach whenever it is suitable.  And Python is extremely nice to express the approach.  Actually, every value used in Python is represented as an object.  Every variable is only a name coupled with an untyped reference to the object.  Every assignment means only copying the reference.  This way the list of integers is equally complex as the list of other objects.
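
A tiny illustration of the reference behaviour described above. The Segment class is made up for this example and is not part of the code in the question.

```python
class Segment:
    """Hypothetical stand-in for one of the objects kept in a list."""

    def __init__(self, ident):
        self.ident = ident
        self.points = []


first = Segment(0)
alias = first                       # copies the reference, not the object
alias.points.append((1818, 1148))   # visible through both names

segments = [first, Segment(1)]      # the list stores references too
segments[0].points.append((1818, 1147))

copy_of_points = list(first.points)  # an explicit copy is a new list
copy_of_points.append((0, 0))        # does not change first.points

print(alias is first, len(first.points), len(copy_of_points))
```

Because the list only holds references, putting objects into a list costs no more than putting integers there.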

For the future: I do not use pyqt, so I will not be able to help you there ;)
Thank you pepr for the comments. They were really helpful and solved all the problems. I also learned a new thing about lists and files so it's great. I'll keep you posted on my coding. When i finish it i'll post the executable and code.

Very good user