Solved

SPSS selection some records and save it on another file

Posted on 2013-01-20
45
539 Views
Last Modified: 2013-01-23
Hello everyone. I want to ask question regarding SPSS or if we can do the same work in EXCEL.

suppose i have file which contains 500000 records and in spss it looks like this
v1    v2      v3
2      3        1
23    34      2
43    5        1
34    6        3

okay... so in V3 there are only 1, 2 and 3 numbers which means something to me.
i want to take 1000 records from v1, v2 which only belong to number 1 in v3. and save it on my desktop..
how can i do that?
thanks
0
Comment
Question by:reoroman
  • 23
  • 18
  • 4
45 Comments
 
LVL 29

Expert Comment

by:gowflow
ID: 38800201
well easy and several solutions for this in Excel can you post a sample of the file you have and I will write a macro for you ? If you do not want a macro in Excel you can use filtering and select in the col C which is V3 a tick on 1 and it will list all the rows that have 1 and you can simply copy the first 1000 but in this case a small macro is better.

Can you post a sample of this data so it would  be easier for you to see the result ?
gowflow
0
 

Author Comment

by:reoroman
ID: 38800268
Thank you for offering the help, kindly find the attached file.
i know how to do it manually, but that makes me sick, i need the script or macro. that would be great.
the text file is huge one. don't open it in notepad, it would take you 30 minutes, you can open it in excel very fast after importing it as comma delimited. also you can open it in notepad2. Thanks again
training.zip
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800313
What Excel you prefer 2003 or 2007 ?
gowflow
0
 

Author Comment

by:reoroman
ID: 38800317
I have 2010.. so i think 2007 will do the work also,,
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800377
yes anyway you hv large file and 2003 will not load the whole thing. Now I got it into 2007 I need to know what columns you want to sort on as you have many that are zeroes and many that have values and several that have 1 I will post a small sample and just higlight in yellow the ones that you want to use a filter we can use many and we can sort the whole thing as we want so feel free to laydown all that what you want so you get a nice finsih product that meet exactly your requirement.

gowflow
0
 

Author Comment

by:reoroman
ID: 38800420
ok... thank you so much for your help and for being here.
i will give an example.. so based on it you can send me the script which also i can modify for number of records and which class i want . what i mean by class is 1 or 2 or 3 or 4 or 5 which are in the last column.
suppose that i need 900 belongs to class 1, and 800 belongs to class 2 and 700 belongs to class 3 and 600 belongs to class 4 and 500 belongs to class 5.
so if the script allows me to modify the number of records (raw)s that i need to copy and from which class they belong to. that would be very great.
0
 

Author Comment

by:reoroman
ID: 38800424
sorry i forgot to say the last column pretenses the classes
0
 

Author Comment

by:reoroman
ID: 38800444
example...
example.JPG
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800589
ok great let me ask more questions:

1) How often the data changes and how often oyu require this type of reporting/merging ??
the question here is to know If I developp the macro presuming that the data has been imported to Excel (which is easier) or if this is requested like very often twice a day on multiple requests and data keep on changing like the txt file would come at very freuent times like 3  4 times a day then we would developp the solution to read directly from the txt file.

2) In case we are working out of the Excel would these diffrent criterias that you reuqested be all in the SAME run or each one on a diffrent run
I explain: Would you require input of all these
suppose that i need 900 belongs to class 1, and 800 belongs to class 2 and 700 belongs to class 3 and 600 belongs to class 4 and 500 belongs to class 5.
at the same time to produce 1 output  Say all in the same File1 ???

OR

you would want 1 output for 900 belongs to class 1 say File1
800 belongs to class 2 say File2
700 belongs to class 3 say File3
600 belongs to class 4 say File4
500 belongs to class 5 say File5

let me know.
gowflow
0
 

Author Comment

by:reoroman
ID: 38800625
yes i need them at the same file output, not in different files, and it would be better if the file is CSV-(comma delimited) or excel and later i save it to csv.

data will not be changed, as i will work only on two files, the one i sent you and the other one similar to it.. classes will not be changed but records are different... so i think the same coed will do the work. all i have to do is to change numbers of records that i want to export them every time. The classes are static. Thanks
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800629
ok by changes I mean you have a system that produces the whole data very often like twice a day you hv a new set of the training.txt you send me ??? or you have addidtional lines to add to the existing one you send ??
gowflow
0
 

Author Comment

by:reoroman
ID: 38800650
no...just these data .. there will not be any system that will produce another data or any modification to the data i sent you.. i will not add any lines to it.. all work will be on these data only. the other file name is out.txt and it contains same classes with different rerecords, in this case i can only rename the file name in the script from training to out. thanks
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800666
ok will deal with out.txt later.

More questions: (I am a pain sometimes ... :)

1) Is it a possibility to sort the file by the Class ?? like all the 1 then all the 2 then etc .... makes it easier for handling large data.

2) If sorting is fine what would be the next col to sort by so we don't mess the sort initial sequence ?

3) The product that comes out you wanted in a .csv file you said or a .txt ? for sure it will be in a diffrent file than the one we are working on.

gowflow
0
 

Author Comment

by:reoroman
ID: 38800680
for the first Q: yes we can sort it by class, no problem. sorting is fine

Q2: I care only about sorting the last column (classes), so it is only one. no more.

Q3 yes please i want it in CS-comma delimited.
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38800694
ok fine no more questions pls allow me some time to work on it as code is pretty straight forward but need to test the speed on this amount of data that I think will be the main challenge.

Will revert.
gowflow
0
 

Author Comment

by:reoroman
ID: 38800695
okay, take your time. thanks a lot.
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38801410
ok reoroman here is it.

1) Load this file it is a macro excel 2007 file that contain a 2 sheets
Main and
Training

Training has a small amount of rows (1000 or so) just to be able to upload here.
you should import your Training.txt into this workbook onto a new sheet and call it Training you can delete this training that is in here. after you import it into this workbook SAVE the workbook and you are ready to go.

2) Activate macroes in this workbook.
3) here is how it work in sheet main you need to input whatever you want to request the macro to perform in Col A and B (leave Column C,D,E for the macro to give you a trace of what it is doing)
So when you input all the data that you want in Col A and B hit the button and answer the questions once done you will have the result in a separate file that will be opened behind this one view the data and it will be save into the same directory as the existing workbook.

4) Once your export is finished save the csv file created and close your macro file saving the data. Becoz your file is big some 116mb or so it will take some time when you save it.

5) Anytime you need to produce a new file open it again and run it it will always save it with today and actual time it has run so it does not overlap and you always have the latest version.

Give me your comments so I can advise you how to do with the other file you have.
gowflow
Training-sample.xlsm
0
 
LVL 9

Expert Comment

by:richdiesal
ID: 38801434
If you still want to do this in SPSS, you can do it in one step.  Under the Data menu, select Select Cases.  Click If condition is satisfied.  Select Delete unselected cases at the bottom and then the If button.  Enter whatever criterion you want to use (for example: v3 = 1).  Hit OK twice and Save with a new file name.
0
 

Author Comment

by:reoroman
ID: 38801786
I think you misunderstand me... the code doesn't work as it should be...
look...

there are 42 columns .. okay.. the last one is the description of data which i give it number value (and we agreed to name is as class).  so every class has a certain amount of records which are :
1 = 97278
2 = 4107
3 = 391458
4 = 52
5 = 1126

say i want to filter the last column (number 42) and check only class number 4 i should see only 52 records.. so if i need only 30 records (raws) from this class i can copy them and past them in new excel sheet,  and so on for the rest of all classes.. please check your script and advice me if it's okay.
0
 
LVL 9

Expert Comment

by:richdiesal
ID: 38801893
Reoroman, the technique I described above for SPSS will do what you're describing in one step - no macros/programming required.  If you need to filter on multiple categories, you can do so by adding additional terms to the If.
0
 

Author Comment

by:reoroman
ID: 38801897
dear richdiesal, yes i am interested doing it in SPSS,

your method is to reduce certain rows to any number in any column. this is not what we are talking about.

what i need is to choose certain rows which belongs to certian value in the last column.
example in spss, i have a 42 columns. okay
v1 v2 v3 ....v42
0    0   2 ...  1          so all existing value of v42 are only (5 numbers, (1,2,3,4,5)) so what i need is to get a for example 1000 records belong to number 2. number 2 has 4107 records. and to get 30 records from number 4, which has 52 records and so on...

thanks
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38801907
did you try my code ???
It does automatically what you input in Col A and B !!!! I tested here it is working very fine.

Well this was my next question when you have 9254 rows or something like that how will you choose the rows ? manually ? as the macro is always picking the first say you want 300 rows it will pick the first 300 rows I was going to propose from row to row but first wanted your input on this first attempt does it work fine there and does it copy the rows you put with the class in col A and B of main ???

lets do it 1 step at a time.
gowflow
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 

Author Comment

by:reoroman
ID: 38801976
to : gowflow
Thank you so much the code works like a champ. also i want to know from richdiesal about SPSS method before awarding the points. you done great job. :-)

by: richdiesalPosted on 2013-01-21 at 18:55:30ID: 38801893

Click If condition is satisfied.  Select  Delete unselected cases at the bottom and then the If button.  Enter   whatever criterion you want to use (for example: v3 = 1).  
i did this then what?? please write more details, guide me through it step by step. i am new to SPSS. Thanks
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38801992
Sorry I cannot assist you in SPSS as do not know about this software what does SPSS stand for ??
gowflow
0
 
LVL 9

Expert Comment

by:richdiesal
ID: 38802837
Hi reoroman -

I'm having a little trouble following your English, so I'm sorry if this isn't correct. But I'm interpreting your problem like this:

1) You have many variables.  One of those variables (lets say a variable called v42) is a grouping variable (for example, it indicates some cases belong to Group 1, some to Group 2, some to Group 3, some to Group 4).

2) You want to delete all cases belonging to Groups 1, 2 and 3 in order to produce a single dataset containing only data from Group 4.

3) To do this, open the dataset and select Select Cases.  This opens the Select Cases dialog, which allows you to specify which cases will be used in subsequent analyses.  SPSS has built in functionality to analyze only portions of a dataset without deleting cases.  However, you said you wanted to delete those cases (i.e. make a new dataset), so that's what I'm explaining.  To do this, check Delete unselected cases at the bottom.  This will remove unselected cases from the datsaet.

4) Click the If condition is satisfied radio option.  This is telling SPSS to "select" only cases (you are calling them "rows") that meet whatever you specify in the "If" dialog.

5) Click the If button under If condition is satisfied.

6) This opens the If dialog box, which allows you to specify which cases are to be selected.  If you only wanted cases that belong to  Group 4 in variable v42, you would type: v42 = 4

7) Click OK, and then click OK again.  The dataset that is showing should now only contain the cases you want.  File / Save As to give it a new name or you'll overwrite your original dataset.

It's important to remember that SPSS doesn't treat cells independently - it is NOT a spreadsheet program like Excel.  When you Select Cases based upon one variable, it always selects all values within the entire cases you have indicated.
0
 

Author Comment

by:reoroman
ID: 38803704
here are the list of records which belongs to each group of these 5 groups.
1 = 97278
2 = 4107
3 = 391458
4 = 52
5 = 1126

when i do what you said for v42=4, that would delete all other groups in v42 columns and keeps only all rows that belong to group 4 which are 54 rows or records.

that is not what i want.

I need to be able to choose among v42=4 say for example 20 rows (records), not all 52 rows (records). also i need all other groups to be there, not to delete them, or/and i have the right to choose among every group records.. say i want 30 records from every group of those 5 groups.

to be more clear ,,, this is this is the input:
1 = 97278
2 = 4107
3 = 391458
4 = 52
5 = 1126

i need this output

1 = 30
2 = 30
3 = 30
4 = 30
5 = 30
0
 

Author Comment

by:reoroman
ID: 38804015
Dear gowflow,

The code is fine, but I have an issue here, can you please help with it.

As we mentioned earlier that your code will take a certain records which belongs to certain groups (classes), and save it as (output1).  
For example, Say that I need to the code to expert 30 records from every class of 5 classes which are (1,2,3,4,5) and repeat this operation for 3 times. So the problem is the same records will be duplicated in the different 3 output files.

In this case I need you to modify the code so every time it export records, it deletes them from the main training file, so that when I repeat the code again, the second output (output 2) will be with a new records, not repeated (duplicated) as in the previous output1, and so on for the output 3.

Many thanks in advanced..
Hope you can help me on this. :-)
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38804204
yes no sweat. Are you sure you want to delete them ???? if there would be an other way we can do this by putting a flag on the column 43 so if it is imported not to be imported again will it be ok ???
I ususally no favor deleting real data !!!
Let me know
gowflow
0
 

Author Comment

by:reoroman
ID: 38804245
wow... can we really do this...oh please do this.. i am waiting to see it
0
 
LVL 29

Accepted Solution

by:
gowflow earned 500 total points
ID: 38804305
ok here it is and follow the instructions to the DOT or you may mix up your data.

1) Make a copy of your current workbook that contain the data and the macro and give it a new name say backup_Training.xlsm. (This is a backup in cas somehting goes wrong we can go back to your present data)
2) Open your workbook.
3) We need to make a small change to the training sheet. goto row1 last column now it is AP you have the class.  If you noticed the first row I had put  header1 header2 header3 etc.... till header 42 (I hope you have this in your workbook) if you do not have this you will need to insert a row in row1 and type say V1 in cell A1 or header1 whatever suits you and drag it till Col AQ you should have there V43 or Header43 or whatever it is better to have the last digit a number so when you drag them they increment and you know the column number this way.
4) Once done with this SAVE the workbook (don't close it). We will now update the code.
5)  goto VBA or press ALT F11 you will see in the left pane module1 doubleclick on it and from the Edit menu selectt all and delete all the code that is in module1.
6) Paste the below code into module1. To do this in this code window here at the bootom of the window there is a button called Select All press on it and it will highlight all the code put the mouse on top of the code and right click choose copy then paste it in the blank area of module1 in your workbook.


Sub OutputData()
Dim WS As Worksheet
Dim WSTemp As Worksheet
Dim WSTR As Worksheet
Dim WBOut As Workbook
Dim WSOut As Worksheet
Dim WSMaxRow As Long, WSTRMaxRow As Long, WSTRMaxCol As Long, WSOutMaxRow As Long, I As Long, J As Long
Dim RngFilter As Range
Dim RngOut As Range
Dim swbPath As String, swbName As String
Dim Start, Finish

'---> Set Local Variables
Set WS = ActiveSheet
WSMaxRow = WS.UsedRange.Rows.Count

Set WSTR = Sheets("Training")
WSTRMaxRow = WSTR.UsedRange.Rows.Count
WSTRMaxCol = WSTR.UsedRange.Columns.Count
swbPath = ActiveWorkbook.Path
If Right(swbPath, 1) <> "\" Then swbPath = swbPath & "\"

'---> Create new Output Workbook
Set WBOut = Workbooks.Add
swbName = "Output - " & Format(Now, "dd-mmm-yyyy hhmm") & ".csv"
WBOut.SaveAs Filename:=swbPath & swbName, FileFormat:=xlCSV
Set WSOut = WBOut.ActiveSheet
WSOutMaxRow = WSOut.UsedRange.Rows.Count
WBOut.Windows(WBOut.Name).WindowState = xlMinimized
ActiveWorkbook.Windows(1).WindowState = xlMaximized

'---> Delete all sheets in output except WSOut
Application.ScreenUpdating = False
Application.DisplayAlerts = False
For Each WSTemp In WBOut.Worksheets
    If WSTemp.Name <> WSOut.Name Then WSTemp.Delete
Next WSTemp
Application.DisplayAlerts = True

'---> Start Preparing Requested Data
For I = 2 To WSMaxRow
    If WS.Cells(I, 1) <> "" And WS.Cells(I, 2) <> "" Then
        Start = TimeValue(Now)
        '---> Filter the Coresponding Class
        WSTR.Range(WSTR.Cells(1, 1), WSTR.Cells(1, WSTRMaxCol)).AutoFilter _
            field:=WSTRMaxCol - 1, _
            Criteria1:=WS.Cells(I, 2)
                    
        WSTR.Range(WSTR.Cells(1, 1), WSTR.Cells(1, WSTRMaxCol)).AutoFilter _
            field:=WSTRMaxCol, _
            Criteria1:="="

        '---> Compile the Data
        J = 1
        For Each RngFilter In WSTR.UsedRange.SpecialCells(xlCellTypeVisible).EntireRow
            If J > WS.Cells(I, 1) Then Exit For
            If RngFilter.Row <> 1 Then
                If Not RngOut Is Nothing Then
                    Set RngOut = Range(RngOut, RngFilter)
                Else
                    Set RngOut = RngFilter
                End If
                J = J + 1
            End If
        Next RngFilter
        
        '---> Copy the Data
        If Not RngOut Is Nothing Then
            RngOut.Copy
            WSOut.Range("A" & WSOutMaxRow).PasteSpecial Paste:=xlPasteValues
            RngOut.Columns(WSTRMaxCol) = "Y"
            WSOutMaxRow = WSOut.UsedRange.Rows.Count + 1
        End If
         
        '---> Update Trace
        Finish = TimeValue(Now)
        Application.ScreenUpdating = True
        If J - 1 = 0 Then
            WS.Cells(I, 3) = "Not Found"
            WS.Cells(I, 4) = J - 1
            WS.Cells(I, 5) = Format(Finish - Start, "hh:mm:ss")
        Else
            WS.Cells(I, 3) = "Done"
            WS.Cells(I, 4) = J - 1
            WS.Cells(I, 5) = Format(Finish - Start, "hh:mm:ss")
        End If
        DoEvents
        Application.ScreenUpdating = False
        
        '---> Remove Filter and clear variables
        WSTR.ShowAllData
        Set RngOut = Nothing
        Set RngFilter = Nothing
    End If
Next I

'---> Remove final filter
WSTR.AutoFilterMode = False

'---> Save workbook
WBOut.Save
Application.ScreenUpdating = True

'---> Give feedback
MsgBox ("A total of " & WSOutMaxRow & " rows of data have been exported to workbbok:" & Chr(10) & WBOut.FullName & " Successfully.")

End Sub

Open in new window



7) SAVE the workbook and close it.
8) Open it and give it a try

Let  me know
gowflow
0
 

Author Comment

by:reoroman
ID: 38804621
this is fantastic... thanks a lot, how can i find you in future?
0
 

Author Closing Comment

by:reoroman
ID: 38804659
100% excellent job. Thanks a lot.. Regards reoroman at hotmail
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38804918
Tks for your kind appreciation. I will keep this question monitored in case you need other help do not worry just drop a line in here and I will be glad to assist. I am glad you are statissfied with the result as it is my main aim.
Rgds/gowflow
0
 

Author Comment

by:reoroman
ID: 38804943
I am very very glad.. you are very helpful. thanks a lot ,,,
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38804955
your welcome my pleasure. Sorry we cannot communicate via hotmail is is against the rules in EE.
gowflow
0
 
LVL 9

Expert Comment

by:richdiesal
ID: 38805256
Oh, you wanted to select 30 cases randomly from within each category?  That is a stratified sampling strategy.  This is also a built-in function in SPSS.  But I am glad you got it figured out.  :)
0
 

Author Comment

by:reoroman
ID: 38806358
Dear gowflow, after i finished the work using the code, now i faced a problem that the software (3rd party) doesn't read classes from 1 to 5 ... but from 0 to 4... can you please send me a code that change the numbers ib last column:-):-):-)thanks
0
 

Author Comment

by:reoroman
ID: 38806376
So that number 1 will be 0 and 2 will be 1 and 3 will be 2, 4 will be 3,  and 5 will be 4
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38806758
what do you mean software 3rd party ? is it a macro in Excel ? can you upload it ?
gowflow
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38807430
so??
0
 

Author Comment

by:reoroman
ID: 38808353
hi, sorry i was sleeping..

well... the problem is after finishing the work using your macro, which is fabulous work.
because the exported data in CSV format i want to use it with software i have which only works with the classes 0,1,2,3,4  and data in the original excel sheet and exported CSV one is having 1,2,3,4,5 classes.
in this case the software i have will gives me error.. because it demands that classes should be from 0 to 4, not from 1 to 5.

now i need to convert all classes in the excel sheet before i process them using your macro.

so i can do it manually by using ctrl+H and replace all 1 class to 0 class... and 2 class to 1 class// but that is time consuming.. so can you please do a small macro that can do this instead of doing it manually.

Thanks
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38808736
Yes sure. No problem not for points but as this question threads is already too long I suggest you post a new question with the explanation as you just did as question and I will be glad to look at it.

Pls put the link here so I can assist you.
gowflow
0
 

Author Comment

by:reoroman
ID: 38808871
ok, sure thing, thanks
0
 

Author Comment

by:reoroman
ID: 38808904
0
 
LVL 29

Expert Comment

by:gowflow
ID: 38809023
Tks attended. I will stop monitoring this question as we have a new question that  I will keep monitoring. So any time you need help just put your comment in the last question we have.
gowflow
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Entering time in Microsoft Access can be difficult. An input mask often bothers users more than helping them and won't catch all typing errors. This article shows how to create a textbox for 24-hour time input with full validation politely catching …
I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now