Statistical Packages

57

Solutions

152

Contributors

Statistical packages are software titles, such as JMP and GNU Octave, and programming languages, such as MATLAB, R and SAS, that are used to discover, explore and analyze data and suggest useful conclusions, either to learn something unexpected or to confirm a hypothesis. The field includes the design and analysis of techniques to give approximate but accurate solutions to hard problems in statistics, econometrics, time-series, optimization and 2D- and 3D-visualization. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.


I have an XML file that I need to parse and load into a data frame. The XML contains duplicate tags, so

xmldataframe <- xmlToDataFrame( "C:\\Sample.XML")

does not work and throws the following error:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("C",  :
  duplicate subscripts for columns

When I remove the duplicate tags manually, it works. The problem is that I have a large volume of real-time XML and can't correct all of it by hand, because I can't find the duplicate tags.

1. Is there a way to find the duplicate tags so I can remove them manually?
2. If there are duplicates, can their values be combined into the same column of the data frame?

Here is the sample XML:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IesEnhancedAttributes>
    <EnhancedAttribute>
        <action>C</action>
        <cleiCode>SDDFDFDFD</cleiCode>
        <physicalDescription>Small Form Factor(SFF), (e.g., SFP, GBIC, XFP, XPAK)</physicalDescription>
        <height_metric unit="mm">8.6</height_metric>
        <height_english unit="in">0.339</height_english>
        <width_metric unit="mm">13.7</width_metric>
        <width_english unit="in">0.539</width_english>
        <depth_metric unit="mm">56.5</depth_metric>
        <depth_english unit="in">2.224</depth_english>
            <depth_english unit="in">3.333</depth_english>
        <weight_metric unit="NS"></weight_metric>
        <weight_english unit="NS"></weight_english>
        <MaximumPowerUsage …
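One possible way to approach both questions, sketched with the same XML package that provides xmlToDataFrame. The inline XML is a shortened, hypothetical stand-in for the sample above, the helper name child_elements is made up for illustration, and joining repeated values with "; " is only an assumption about what combining them into one column should mean.

```r
library(XML)

# Shortened stand-in for the sample file (hypothetical values).
xml_text <- '<IesEnhancedAttributes>
  <EnhancedAttribute>
    <action>C</action>
    <depth_english unit="in">2.224</depth_english>
    <depth_english unit="in">3.333</depth_english>
  </EnhancedAttribute>
  <EnhancedAttribute>
    <action>D</action>
    <depth_english unit="in">1.5</depth_english>
  </EnhancedAttribute>
</IesEnhancedAttributes>'

doc <- xmlParse(xml_text, asText = TRUE)  # for a file: xmlParse("C:\\Sample.XML")
records <- getNodeSet(doc, "//EnhancedAttribute")

# Element children of one record (hypothetical helper).
child_elements <- function(rec) getNodeSet(rec, "./*")

# 1. Tags that occur more than once within each record.
dup_tags <- lapply(records, function(rec) {
  tags <- vapply(child_elements(rec), xmlName, character(1))
  unique(tags[duplicated(tags)])
})

# 2. One row per record, with repeated tags joined into a single cell.
rows <- lapply(records, function(rec) {
  kids <- child_elements(rec)
  tags <- vapply(kids, xmlName, character(1))
  vals <- vapply(kids, function(n) paste(xmlValue(n), collapse = ""), character(1))
  groups <- split(vals, factor(tags, levels = unique(tags)))
  vapply(groups, paste, character(1), collapse = "; ")
})
all_tags <- unique(unlist(lapply(rows, names)))
xmldataframe <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) unname(r[all_tags]))),
  stringsAsFactors = FALSE
)
names(xmldataframe) <- all_tags
```

Here dup_tags lists, per record, the tag names that would trip xmlToDataFrame, and xmldataframe keeps both depth_english values of the first record in one cell. Attributes such as unit are ignored in this sketch.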

Hello All Experts,
I am a student keen on learning "Data Analytics". Which is the best platform for learning it for FREE?
I want to learn 'Data Science (Statistics)' and 'SAS/R' from scratch.
Any videos? Any websites? Any blogs?

Thanks,

Regards,
Satish Kumar G N
I have what I thought was a well prepared dataset.  I wanted to use the Apriori Algorithm in R to look for associations and come up with some rules.  I have about 16,000 rows (unique customers) and 179 columns that represent various items/categories.  The data looks like this:

Cat1  Cat2  Cat3  Cat4  Cat5 ... Cat179
1,        0,       0,        0,      1,     ...  0
0,        0,       0,        0,      0,     ...  1
0,        1,       1,        0,      0,     ...  0
...

I thought having a comma separated file with binary values (1/0) for each customer and category would do the trick, but after I read in the data using:

>data5 = read.csv("Z:/CUST_DM/data_test.txt",header = TRUE,sep=",")

and then run this command:

> rules = apriori(data5, parameter = list(supp = .001,conf = 0.8))

I get this error:

Error in asMethod(object):
column(s) 1, 2, 3, ...178 not logical or a factor. Discretize the columns first.  

I understand discretizing in general, but I guess not in this context. Everything is a 1 or 0. I even changed the columns from INT to CHAR and received the same error. I also had the (unique) customer ID in column 1, but I understand that isn't necessary when the data is in this flat-file form. I'm sure I'm missing something obvious; I'm new to R.

What am I missing?  Thanks for your input.
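A minimal sketch of one common fix, assuming the arules package is what provides apriori here: read.csv loads the 0/1 columns as integers, which cannot be coerced to transactions directly, so they are converted to logical values first (TRUE meaning the customer has the item). The small data frame is a hypothetical stand-in for data5; if a customer ID column is present, it would need to be dropped before the conversion.

```r
# Hypothetical stand-in for data5: one row per customer, one 0/1 column per category.
data5 <- data.frame(
  Cat1 = c(1, 0, 0, 1),
  Cat2 = c(0, 0, 1, 1),
  Cat3 = c(0, 0, 1, 1),
  Cat4 = c(1, 1, 0, 1)
)

# Convert the 0/1 flags to a logical item matrix (TRUE = item present).
item_matrix <- as.matrix(data5) == 1

# The mining step needs arules; it is skipped when the package is not installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(item_matrix, "transactions")
  rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
}
```

Converting the columns to factors would also silence the error, but then "0" and "1" both become items, which is usually not what is wanted for basket-style rules.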
Hi All,
While using the REF keyword in my logical file, I get the compilation error "Record name same as name of file being created".

DDS of LF -

*************** Beginning of data *************************************
                                            REF(ACCOUNT)                
                R USEREF                                                
                  ACCLVL    R               REFFLD(ACCLEVELID ACCOUNT)  
                  ACCORG    R               REFFLD(ACTORGCOD  ACCOUNT)  
                  ACCNUM    R               REFFLD(ACCOUNTNUM ACCOUNT)  
****************** End of data ****************************************

May I know why that is?
The issue is that when I set a different one, neither my texblock.Text nor my listbox.Items is updated.

Any help is much appreciated :)

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices.WindowsRuntime;
using Windows.Foundation;
using Windows.Foundation.Collections;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Controls.Primitives;
using Windows.UI.Xaml.Data;
using Windows.UI.Xaml.Input;
using Windows.UI.Xaml.Media;
using Windows.UI.Xaml.Navigation;
using Windows.Services.Maps;
using Windows.Devices.Geolocation;


// The Blank Page item template is documented at https://go.microsoft.com/fwlink/?LinkId=402352&clcid=0x409

namespace New_World_Map
{
    /// <summary>
    /// An empty page that can be used on its own or navigated to within a Frame.
    /// </summary>
    public sealed partial class MainPage : Page
    {
       

     

        List<string> stringlist = new List<string>();

        public MainPage()
        {
            this.InitializeComponent();

            this.RightTapped += MainPage_RightTapped;

            mapscontrol.CenterChanged += Mapscontrol_CenterChanged;

            listbox.DoubleTapped += Listbox_DoubleTapped;

            listview.Items.Add("Zoom In");

            listview.Items.Add("Zoom Out");

            listview.Items.Add("Navigate North");

            listview.Items.Add("Navigate South");

 …
Hi Experts,

 I am looking for a data science project (using Python) with complete source code and documentation. Please help me with this; I will appreciate your help in this regard.

Thanks,
SRK,
Team, I need help resolving a laptop build that is continuously failing at the BitLocker stage of the task sequence. It is specific to this laptop model, and I suspect it is related to some BIOS configuration.
Can you advise or direct me, please?
Laptop Model = HP Elite X2 1012

______________________________________________________________________________________________________________________________________________
Error in logs:

... r
Initial TPM state: 55
Creating TPM owner authorization value
Succeeded loading resource DLL 'C:\Windows\CCM\1033\TSRES.DLL'
Taking ownership of TPM
uStatus == 0, HRESULT=80070005 (e:\nts_sccm_release\sms\framework\tscore\tpm.cpp,645)
pTpm->TakeOwnership( sOwnerAuth ), HRESULT=80070005 (e:\nts_sccm_release\sms\client\osdeployment\bitlocker\bitlocker.cpp,522)
InitializeTpm(), HRESULT=80070005 (e:\nts_sccm_release\sms\client\osdeployment\bitlocker\bitlocker.cpp,1313)
ConfigureKeyProtection( keyMode, pwdMode, pszStartupKeyVolume ), HRESULT=80070005 (e:\nts_sccm_release\sms\client\osdeployment\bitlocker\bitlocker.cpp,1552)
pBitLocker->Enable( argInfo.keyMode, argInfo.passwordMode, argInfo.sStartupKeyVolume, argInfo.bWait ), HRESULT=80070005 (e:\nts_sccm_release\sms\client\osdeployment\bitlocker\main.cpp,382)
'TakeOwnership' failed (2147942405)
Failed to take ownership of TPM. Ensure that Active Directory permissions are properly configured
Access is denied. (Error: 80070005; Source: Windows)
write.csv(df,file="~C:/Users/anitha/Documents/social_media analysis/socialmedia/tweets.csv",row.names=FALSE,append = TRUE)
Error in file(file, ifelse(append, "a", "w")) :
  cannot open the connection
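Two things in that call are likely culprits; a hedged sketch of a workaround follows. A path beginning with "~C:/" is unlikely to resolve to a real location (it should start with either "~/" or "C:/"), and file() also reports "cannot open the connection" when the target folder does not exist. Separately, write.csv ignores append (with a warning), so appending needs write.table. The helper name append_csv and the temporary path are illustrative assumptions.

```r
# Append a data frame to a CSV file, writing the header only when the file is new.
append_csv <- function(df, path) {
  # Make sure the target folder exists before opening the file.
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  new_file <- !file.exists(path)
  write.table(df, file = path, sep = ",", qmethod = "double",
              row.names = FALSE, col.names = new_file, append = !new_file)
}

# Illustrative data and path; a real path would look like "C:/Users/.../tweets.csv".
df <- data.frame(id = 1:2, text = c("first tweet", "second, with a comma"),
                 stringsAsFactors = FALSE)
csv_path <- file.path(tempdir(), "socialmedia", "tweets.csv")
if (file.exists(csv_path)) file.remove(csv_path)  # start clean so the example is repeatable

append_csv(df, csv_path)  # creates the file with a header
append_csv(df, csv_path)  # appends rows without repeating the header
result <- read.csv(csv_path, stringsAsFactors = FALSE)
```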
It's supposed to be a map guide: an accurate GPS for a car that gives the exact route the car must take along the roads.

The underlined line is the one the debugger flags as wrong.

any other

using System;


using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices.WindowsRuntime;
using Windows.Foundation;
using Windows.Foundation.Collections;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Controls.Primitives;
using Windows.UI.Xaml.Data;
using Windows.UI.Xaml.Input;
using Windows.UI.Xaml.Media;
using Windows.UI.Xaml.Navigation;
using Windows.Devices.Geolocation;
using Windows.Services.Maps;


// The Blank Page item template is documented at http://go.microsoft.com/fwlink/?LinkId=402352&clcid=0x409

namespace App75
{
    /// <summary>
    /// An empty page that can be used on its own or navigated to within a Frame.
    /// </summary>
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            this.InitializeComponent();

            button.Tapped += Button_Tapped;
        }

        private async void Button_Tapped(object sender, TappedRoutedEventArgs e)
        {
            BasicGeoposition b1 = new BasicGeoposition();

            b1.Latitude = Convert.ToDouble(startpositionlatitude.Text);

            b1.Longitude = Convert.ToDouble(startpositionlongitude.Text);

            BasicGeoposition b2 = new BasicGeoposition();

Hi,

I am fairly new to R and am doing some simple visualization in a Shiny app. I am trying to flip a bar chart downward using scale_y_reverse(). It works well when I run my code in the R console, but when I run it in Shiny the bar chart is not flipped. Below is my code from the server part:

output$trendbarPlot <- renderPlotly({
                              mydat <- mydatCopy %>% filter(Country ==input$Country)
                              

attacksbarplot = ggplot(data=mydat,aes(x=as.factor(Year))) + geom_bar() + theme_bw(base_size=35) + xlab("") + ylab("") + theme(axis.text.x = element_blank(), axis.ticks=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),panel.border=element_blank())  + scale_y_reverse()


attacksbarplotnol = ggplot(data=mydat,aes(x=as.factor(Year))) + geom_bar() + theme_bw(base_size=15) + xlab("") + ylab("") + theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.ticks=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),panel.border=element_blank()) +  scale_y_reverse()
 
                              })

The attached file shows the flipped bar chart I need in Shiny.

Does anyone know how I can solve this issue?
FlippedChart.png
Hi,
 We have a common situation where we want to loop through a folder containing 'n' files and execute a package created for each file, so that we can import each file into a corresponding table in our SQL Server database. The packages are all included in the project.

For each of these files in the folder, a script task identifies the full path of the package and assigns it to a variable, which I then try to use in an expression on the package. The question is which property I should set in the expression: 'Connection', to the full path of the package, or just 'PackageName', to the package name alone?

When I try to set PackageName in the expression using a project reference, the designer complains: "Failed to locate the specified package in the project". We tried both the project reference and the external reference with the file system, to no avail.

Should I be using a project reference at all? Or should I use an external reference with the file system, which also doesn't seem to work? We are not deploying the packages anywhere else right now.

Nothing seems to change the execution at run time. What is the right way to do this? What exactly should go in the expression to switch the packages dynamically?

Thank you.
My data:

Gage_number Latitude    Longitude   Date    Gage_1  Gage_2  Gage_3

1   35.02   -80.84  1/1/2002    0.23    0   0.7
2   35.03   -81.04  1/2/2002    0   0   0.2
3   35.06   -80.81  1/3/2002    3.2 2.1 0.1
This is just a subset of the data; I have around 50 gauge stations. I want to find the spatial autocorrelation of rainfall between my gauge stations, based on the distance between them. I have created my distance matrix, but I don't want to use any library in R. I want to do all the steps in a function.

loc <- read.table("rain_data.txt",header=TRUE,fill=TRUE)  
gauge.dists <- as.matrix(dist(cbind(loc$Latitude, loc$Longitude))) # distance matrix from both coordinates

Since the distance between gauges is not uniform, I want to use a certain bin size to decide on the distance lags.

Pseudocode:

If the distance between gauge pair 1-2 is 1 meter, assign a distance lag of 1, and so on: lag 1 means an inter-gauge distance of 1 meter, and lag 5 means an inter-gauge distance of 5 meters. After creating that matrix I will find the autocorrelation between gauge pairs.

Gage pair   date    RainA   RainB       Gage pair   date    RainA   RainB

1-2 1/1/2002    0.23    0       1-3 1/1/2002    0.23    0.7
1-2 1/2/2002    0   0       1-3 1/2/2002    0   0.2
1-2 1/3/2002    3.2 2.1     1-3 1/3/2002    3.2 0.1
I have a hard time translating it into a loop or a function. Any ideas?
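A rough sketch of how the binning and pairwise correlation could be wrapped in one function, using only functions that ship with R (dist and cor come from the stats package, which is loaded by default). It assumes a layout that the question does not confirm: coords has one row per gauge, and rain has one column per gauge and one row per date. The function name lag_correlation and the bin_size value are hypothetical, and distances are in coordinate units (degrees here), not meters.

```r
# Hypothetical inputs built from the three sample rows.
coords <- data.frame(
  Latitude  = c(35.02, 35.03, 35.06),
  Longitude = c(-80.84, -81.04, -80.81)
)
rain <- cbind(
  Gage_1 = c(0.23, 0, 3.2),
  Gage_2 = c(0, 0, 2.1),
  Gage_3 = c(0.7, 0.2, 0.1)
)

lag_correlation <- function(coords, rain, bin_size) {
  d <- unname(as.matrix(dist(coords)))          # pairwise distances between gauges
  pairs <- which(upper.tri(d), arr.ind = TRUE)  # each gauge pair exactly once
  out <- data.frame(
    gage_a = pairs[, "row"],
    gage_b = pairs[, "col"],
    dist   = d[pairs]
  )
  # Distance lag: bin 1 covers (0, bin_size], bin 2 covers (bin_size, 2 * bin_size], ...
  out$lag <- ceiling(out$dist / bin_size)
  # Correlation of the two rainfall series for every pair.
  out$cor <- mapply(function(a, b) cor(rain[, a], rain[, b]),
                    out$gage_a, out$gage_b)
  # Average correlation of all pairs that fall in the same lag.
  list(pairs = out, by_lag = tapply(out$cor, out$lag, mean, na.rm = TRUE))
}

res <- lag_correlation(coords, rain, bin_size = 0.1)
res$pairs
res$by_lag
```

Averaging the pair correlations within each lag is only one simple choice; a gauge that records no rain on every date has zero variance, so cor returns NA for its pairs, which is why na.rm = TRUE is used.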
I am a bit new to R, so I am not sure whether this is possible or whether it is more difficult than I am assuming.

Objective: I want to find the correlation between diagnosis codes. If patient #1 has condition X, what is the likelihood that he will at some point also have condition Y?

Here is what I have:
136,337 Unique patient IDs (74,527 Female, 61,810 Male)
34,442 Unique Diagnosis that exists in my population
7,777,728 Unique observations

So my 2 questions are:
1. How should I lay out my table for R?
Right now I have the table columns as:
ID, SEX, Diagnosis

2. What should my R script look like in order to create correlation coefficients between all my diagnosis codes?

FYI: Yes, I also have a time stamp per diagnosis code, but adding it now would only add more confusion to the confusion I already have.
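One possible layout and calculation, sketched in base R on a tiny hypothetical table. The long ID/Diagnosis layout from the question is kept (SEX could live in a separate one-row-per-patient table), and two measures are computed: the conditional likelihood P(Y | X) and the phi coefficient, which is the ordinary Pearson correlation of two 0/1 indicators. A dense patient-by-diagnosis matrix with 136,337 rows and 34,442 columns is likely too large for memory, so at full scale a sparse matrix or grouped diagnosis codes would probably be needed; that part is not shown.

```r
# Hypothetical long table: one row per patient and diagnosis.
dx <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3, 3, 4),
  Diagnosis = c("X", "Y", "X", "Z", "X", "Y", "Z", "Y"),
  stringsAsFactors = FALSE
)
dx <- unique(dx)  # a code counts once per patient

# Patient-by-diagnosis indicator matrix: 1 if the patient ever had the code.
has <- 1 * (unclass(table(dx$ID, dx$Diagnosis)) > 0)

# Co-occurrence counts: co[x, y] = number of patients with both x and y.
co <- crossprod(has)
n_with <- diag(co)  # number of patients with each code

# cond_prob[x, y] = P(patient has y | patient has x); rows are divided by n_with.
cond_prob <- co / n_with

# Phi coefficient between every pair of diagnosis indicators.
phi <- cor(has)

cond_prob["X", "Y"]  # likelihood of Y among patients who have X
```

Note that cond_prob is not symmetric: the likelihood of Y given X generally differs from the likelihood of X given Y, which a single correlation coefficient cannot express.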
I have an Excel file to which I want to add two new columns, then group and sum the new and other columns in RStudio and save the output. I am not entirely sure how to do this.

Adding two new columns:
If Sec_flag is "Y", I want to add a new column called Sec_checked and put a 1 as the value.
If stu_status is "Ret", I want to add another new column, Stu_check, and put a 1 as the value.

Group & Sum
I would like to group the data by the columns Year, Month, Stu_status, Point1, Point2 and Point3 and sum the values in stu_fee, stu_return_fee, student_count, Sec_checked and Stu_check.
Over time I will add new data points to my Excel file, so I would like to be able to add these in future and get new groupings.

I tried using plyr, but I don't know how to add the new columns and group and sum the data.
setwd("C:/Desktop/rtest")
system("java -version")

library(xlsx)
mydata <- read.xlsx("stu_d_sample.xlsx", sheetName = "Sample") 
mydata


library(plyr)
groupColumns = c("year","month", "Stu_status","Point1","Point2","Point3")
dataColumns = c("stu_fee", "stu_return_fee","student_count", "Sec_checked", "stu_check")
res = ddply(mydata, groupColumns, function(x) colSums(x[dataColumns]))  # use mydata; baseball is plyr's example dataset
head(res)


stu_d_sample---Copy.xlsx
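A base R sketch of the two steps, shown on a few hypothetical rows instead of the real sheet. The column names follow the question's wording and may differ in case from the actual file; giving non-matching rows a 0 (rather than leaving them empty) is an assumption made so that the sums work. The output file name is illustrative.

```r
# Hypothetical rows standing in for the "Sample" sheet.
mydata <- data.frame(
  Year = c(2017, 2017, 2017),
  Month = c(1, 1, 2),
  Stu_status = c("Ret", "Ret", "New"),
  Point1 = "A", Point2 = "B", Point3 = "C",
  Sec_flag = c("Y", "N", "Y"),
  stu_fee = c(100, 50, 70),
  stu_return_fee = c(10, 0, 5),
  student_count = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# New indicator columns: 1 when the condition holds, 0 otherwise.
mydata$Sec_checked <- as.integer(mydata$Sec_flag == "Y")
mydata$Stu_check <- as.integer(mydata$Stu_status == "Ret")

# Group by the six key columns and sum the five value columns.
res <- aggregate(
  cbind(stu_fee, stu_return_fee, student_count, Sec_checked, Stu_check) ~
    Year + Month + Stu_status + Point1 + Point2 + Point3,
  data = mydata,
  FUN = sum
)

# Save the grouped output.
write.csv(res, file.path(tempdir(), "stu_summary.csv"), row.names = FALSE)
```

Re-running the same script after new rows are added to the workbook produces the new groupings. With the formula interface, aggregate drops rows that contain NA in any of the listed columns, so missing values would need to be handled beforehand.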
2 Questions about regression in R
 
  Question 1:
 
  Let's say I create a model that relates the number of unique words found in a corpus to the number of lines read. Notice that this model takes the logs of both the outcome and the predictor.
 
  x <- lm( log(Words) ~ log(Lines) )
 
  Does that mean that exp(predict(x,list(Lines=100000))) will give me the number of words for a given number of lines? Or will it give me the LOG of a number of words for a given number of lines?
 
  Question 2:
 
  How do I invert this model so that I can input a number of words, and get back a prediction for the number of lines required in order to obtain this quantity of words?
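Both questions can be checked numerically. predict() returns values on the scale of the response in the formula, here log(Words), so wrapping it in exp() gives a value on the word-count scale. Inverting the model means solving the fitted line for Lines. The data below is synthetic and follows an exact power law purely so that the result is easy to verify; the function name lines_needed is illustrative.

```r
# Synthetic data following Words = 5 * Lines^0.6 exactly (illustrative only).
corpus <- data.frame(Lines = c(100, 1000, 10000, 100000))
corpus$Words <- 5 * corpus$Lines^0.6

x <- lm(log(Words) ~ log(Lines), data = corpus)

# Question 1: predict() is on the log scale; exp() converts it back to words.
log_words <- predict(x, newdata = data.frame(Lines = 100000))
words <- exp(log_words)

# Question 2: log(Words) = a + b * log(Lines)  =>  Lines = exp((log(Words) - a) / b)
a <- coef(x)[[1]]
b <- coef(x)[[2]]
lines_needed <- function(words) exp((log(words) - a) / b)

lines_needed(5000)  # lines required to reach 5000 unique words under this fit
```

On real data, exp(predict(...)) is best read as a typical (median-like) value rather than the mean, under the usual normal-error assumption. Inverting the fitted line is also not the same as fitting lm(log(Lines) ~ log(Words)); the two give different answers unless the fit is perfect.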
Hello all,
I have a situation where a common value is to be computed from the available data, but the data consists of different summary statistics. For example:
Suppose there are apples in different boxes and the average size of an apple is to be determined. The available data consists of the mean size from one basket, the standard deviation of size from another basket, and the minimum and maximum sizes from other boxes. Is there any way to derive a general value to represent the size of an apple?