Statistical Packages

137 Solutions, 317 Contributors

Statistical packages are software titles, such as JMP and GNU Octave, and programming languages, such as MATLAB, R and SAS, that are used to discover, explore and analyze data and suggest useful conclusions, either to learn something unexpected or to confirm a hypothesis. The field includes the design and analysis of techniques to give approximate but accurate solutions to hard problems in statistics, econometrics, time-series, optimization and 2D- and 3D-visualization. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

Q: How to unite interaction columns from R emmeans package dynamically generated in R-Shiny?

I am building an R-Shiny app in which I need to wrangle the output from the 'emmeans' package. However, in this interactive environment the user may enter many factors, so the structure of the single-tibble 'emmeans' output varies with each run depending on the selections made. It could range from a single main effect to multiple 3-way interactions (mixed with main effects and 2-way interactions) arranged in wide format.

For instance, assuming the user selects FctrA (with levels A and B) and FctrB (with levels C, D, and E), the interaction FctrA_FctrB will be automatically considered as well. When (~FctrA, ~FctrB, ~FctrA+FctrB) are submitted to 'emmeans', the output tibble is structured as follows:

- the leftmost side of the tibble contains FctrA results (levels, estimates, SE, df, CLs);
- the centermost block contains FctrB results;
- the rightmost side of the tibble contains the interaction results.

So far so good, except that the FctrA levels occupy a single column and the FctrB levels also occupy a single column, whereas the interaction portion has its levels split into two columns, one for FctrA and one for FctrB.

This discrepancy in dimensions prevents gathering, spreading, or stacking the separate blocks.

My question is: How can I tell Shiny ('tidyr') to find those split interaction columns and concatenate them …

Hi, I have a Dell PERC 740p RAID card that I just put in a computer. The computer recognizes it, but I can't get into its BIOS with Ctrl+R. I see the POST message, but this one doesn't say "press Ctrl+R" to get into the BIOS. It says...

"PowerEdge Expandable RAID controller BIOS copyright AVAGO Technologies"

Then it shows "Initializing virtual drives" and goes on its merry way.

Question: is there another way to get into the BIOS? Am I doing something wrong?

Thanks all

Hi,

I am studying ETL tools and have recently heard many people say that ETL can simply be done with scripts or code, e.g. R and Qview. Does this mean that ETL tools such as MS SSIS are now useless?

What are the pros and cons of implementing ETL logic in code versus using ETL tools?

It seems that running ETL code at the container level behind a RESTful API already lets the ETL process do load balancing, parallel execution, and scale-out (by container). Is that correct? If so, are ETL tools no longer needed?

The new MariaDB X3 platform apparently can even skip the ETL process, since it can stream data directly from OLTP to OLAP. Does that mean ETL is no longer needed?

Hello,

You can see I have commented out a couple of tables. The student with B.EMPLID = '4373198' does not have a residency, and even with a left outer join he does not appear. What needs to be done for him to appear in the result set?

SELECT DISTINCT B.EMPLID  
 , T.FIRST_NAME_SRCH  
 , T.LAST_NAME_SRCH  
 , B.STRM  
 , B.DESCR  
 , B.CLASS_NBR  
 , B.SUBJECT  
 , B.CATALOG_NBR  
 , B.CLASS_SECTION  
 , B.ENRL_STATUS_REASON  AS STATUS
 , X.XLATSHORTNAME AS ENRL_STATUS_REASON  
 , B.ENRL_ACTN_RSN_LAST AS ActionReasonLastStatus  
  , T.PHONE  
 , U.EMAIL_ADDR  
 , B.ENRL_ADD_DT  
 , B.ENRL_DROP_DT  
 , A.ACCOUNT_BALANCE
  , VW.DESCR
 , VW.REF1_DESCR
 , (  
 SELECT O.COMMENTS  
  FROM PS_PERSON_COMMENT O  
 WHERE B.EMPLID = COMMON_ID  
   AND ADMIN_FUNCTION = 'SFAC'  
   AND CMNT_CATEGORY = 'FYI'  
   AND COMMENT_DT <= GETDATE()  
   AND COMMENTS IS NOT NULL  
   AND SEQ_3C = (  
 SELECT (MAX(SEQ_3C))  
  FROM PS_PERSON_COMMENT O2  
 WHERE O2.COMMON_ID = O.COMMON_ID  
   AND ADMIN_FUNCTION = 'SFAC'  
   AND CMNT_CATEGORY = 'FYI'  
   AND COMMENT_DT <= GETDATE() ))
   --,R.RESIDENCY
  -- ,S.SRVC_IND_CD
  -- ,MAX(SRVC_IND_DTTM)
  FROM PS_CLASS_TBL_SE_VW  B
LEFT OUTER JOIN XLATTABLE_VW X ON B.ENRL_STATUS_REASON = X.FIELDVALUE
LEFT OUTER JOIN PS_PERSONAL_DATA T ON B.EMPLID = T.EMPLID
LEFT OUTER JOIN PS_EMAIL_ADDRESSES U ON B.EMPLID = U.EMPLID  
LEFT OUTER JOIN PS_ACCOUNT_TOT_VW A ON B.EMPLID = A.EMPLID
LEFT OUTER JOIN PS_ITEM_SF_VW VW  …

Hi Experts

Could you give me an overview of how to use the R language to obtain data from Facebook?

Thanks in advance.

/usr/local/sbin/smsbox -v 4 /home/admin/web/mysite.com/kannel/kannel.conf
2019-05-29 08:17:42 [2871] [0] PANIC: Failed to open HTTP socket
2019-05-29 08:17:42 [2871] [0] PANIC: /usr/local/sbin/smsbox(gw_panic+0x145) [0x438f05]
2019-05-29 08:17:42 [2871] [0] PANIC: /usr/local/sbin/smsbox(main+0x128d) [0x40e7cd]
2019-05-29 08:17:42 [2871] [0] PANIC: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f7d88b9e445]
2019-05-29 08:17:42 [2871] [0] PANIC: /usr/local/sbin/smsbox() [0x40ead2]
[root@sms-api ~]# /usr/local/sbin/bearerbox -v 4 /home/admin/web/mysite.com/kannel/kannel.conf
2019-05-29 08:18:03 [2872] [3] PANIC: Could not open smsbox port 13001
2019-05-29 08:18:03 [2872] [3] PANIC: /usr/local/sbin/bearerbox(gw_panic+0x145) [0x47d8c5]
2019-05-29 08:18:03 [2872] [3] PANIC: /usr/local/sbin/bearerbox() [0x41b638]
2019-05-29 08:18:03 [2872] [3] PANIC: /usr/local/sbin/bearerbox() [0x47b41f]
2019-05-29 08:18:03 [2872] [3] PANIC: /lib64/libpthread.so.0(+0x7e25) [0x7fe3f9e54e25]
2019-05-29 08:18:03 [2872] [3] PANIC: /lib64/libc.so.6(clone+0x6d) [0x7fe3f8f99bad]
I ran a command to see the active processes and to find out why this happened.

tcp        0      0 0.0.0.0:2525            0.0.0.0:*               LISTEN      779/exim
tcp        0      0 0.0.0.0:13000           0.0.0.0:*               LISTEN      2182/bearerbox
tcp        0      0 0.0.0.0:13001           0.0.0.0:*               LISTEN      2182/bearerbox
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN   …

Hi Experts

I am new to R scripting and coding, and have just inherited some R code which plots perfectly fine and gives the correct results. I want to show the x and y values when a user hovers the mouse cursor over the line.

I have found an article that describes the necessary steps, but I am having no luck at all:
https://www.r-graph-gallery.com/124-change-hover-text-in-plotly/

My R code:
library(dplyr)
library(ggplot2)
library(survival)
library(survminer)
library(grid)
library(gridExtra)
library(plotly)
pc <- dataset
fstat <- pc %>% mutate(fstatus = case_when(OUTCOMETYPE=="Revised" ~ 1,TRUE ~ 0))
pmpa <- fstat %>% select(PRIMARYPROCEDUREID,PRIMARYTOOUTCOMEYEARS,fstatus,OUTCOMETYPE)
if(nrow(pmpa) < 4){
d <- pmpa %>% select(PRIMARYPROCEDUREID,PRIMARYTOOUTCOMEYEARS,OUTCOMETYPE) %>% mutate(INSUFFICIENTDATA = "Summary Results")
h = head(d[,2:4])
grid.table(h)
}else{
fit <- survfit(Surv(PRIMARYTOOUTCOMEYEARS,fstatus)~1,data = pmpa) 
ggsurv <- ggsurvplot(fit,
           ylab="Patient Analysis",
           xlab="Time (Years)",
           break.time.by = 1,
           xlim = c(0,max(fit$time)),
           surv.scale = "percent",
           legend.title = "Kaplan-Meier",
           legend.labs = "",
           risk.table = TRUE,
           fontsize = 3,
           font.tickslab = c(10, "plain"),
           risk.table.y.text = FALSE,
           fun = "event"
           )
ggsurv$plot <- ggsurv$plot + theme(plot.title = 

If you had to consider the myriad stock trading indicators that novice and advanced traders alike base their trading on, and group that huge number of indicators into the broadest possible major categories, what would those categories be?

I'm thinking there may only be two (2) categories off the top of my head:  Price and Volume.

What do you think?

Guys,
I would like to know how, and in what scenarios, the Python and R support embedded in SQL Server 2017 can be put to meaningful use. I'm still using SQL Server 2014 with SQL Server Reporting Services to produce reports for customers, and 100% of the data analysis we have performed was generated with T-SQL.
I wonder how R and Python in SQL Server 2017 would help me speed things up or create more meaningful data for customers, since in my experience T-SQL does more than enough to provide even complex reports to end users.
If anyone here can shed some light, I may have more explanation and reason to upgrade to SQL Server 2017.

I've installed the rattle package and run this code.
library(rattle)
test <- c(1,2,3,4,5,6)
test
test2 <- binning(test,4,method = "quantile",ordered = FALSE)
test2

This is the output I get.

[1] 1.000000 1.916667 3.500000 5.083333 6.000000
Levels: [1,1.92] (1.92,3.5] (3.5,5.08] (5.08,6]

I understand that 3.5 is the median.  Where do 1.92 and 5.08 come from?
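
The cut points can be reproduced by hand. The sketch below is a minimal pure-Python illustration, assuming the bin edges come from the Hyndman-Fan "type 8" sample quantile definition (the one R's quantile() exposes as type = 8). That assumption is inferred only from the fact that it reproduces the numbers above, not from rattle's documentation, and the helper name quantile_type8 is made up for this example.

```python
import math

def quantile_type8(values, p):
    """Hyndman-Fan type 8 sample quantile (assumed definition, see the note above)."""
    x = sorted(values)
    n = len(x)
    m = (p + 1.0) / 3.0              # offset used by the type 8 definition
    h = n * p + m                    # fractional 1-based position in the sorted data
    j = math.floor(h)                # index of the lower order statistic (1-based)
    g = h - j                        # interpolation weight between neighbours
    lo = x[min(max(j, 1), n) - 1]    # clamp both neighbours to the sample range
    hi = x[min(max(j + 1, 1), n) - 1]
    return lo + g * (hi - lo)

test = [1, 2, 3, 4, 5, 6]
# Four bins need five edges, at probabilities 0, 0.25, 0.5, 0.75 and 1.
edges = [round(quantile_type8(test, k / 4), 6) for k in range(5)]
print(edges)
```

Under that assumption the edges come out as 1, 1.916667, 3.5, 5.083333 and 6, which matches the output shown; the more familiar default definition (type 7) would give 2.25 and 4.75 for the quartiles instead.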

Hi,

I have a file with the extension .dta.

I need to analyse the data in Excel.

I've downloaded Stata and would like to know how to convert the .dta file to .xls or .csv.

Thanks
Seamus

Microsoft R and SQL Server 2016

I installed Microsoft R on an existing instance of SQL Server 2016 and I cannot get the SQL Server Launchpad service to start.  Each time I try, I get "The request failed or the service did not respond in a timely fashion. Consult the event log or other applicable error logs for details."  In the application error log it says "A timeout was reached (30000 milliseconds) while waiting for the SQL Server Launchpad (MSSQLSERVER) service to connect."

I have tried everything I can find and no joy.  I read that if the R library is out of sync with SQL Server this can happen.  How can I tell if this is the case and if so how can I fix it?

The version number of the Launchpad.exe file is 13.0.1601.5.
The version number of SQLSVC.dll is 13.0.5216.0.

The @@VERSION for SQL Server is Microsoft SQL Server 2016 (SP2-CU3) (KB4458871) - 13.0.5216.0 (X64)   Sep 13 2018 22:16:01  

Any help would be greatly appreciated.

Jim

I ran into a problem in RStudio. I am trying to run a t-test and it says "grouping data must have exactly 2 levels". Does anyone know how to run this t-test with our values? We want to present the results in Excel as bar charts showing means with error bars. We also need to know how to segment results using ANOVA.
rstudio.png

Difficulty with one DC in a multi-site AD setup - Naming Context is in the process of being removed or is not replicated from the specified server
It appears that syncing FROM the master DC (schema, FSMO roles holder) TO the out-of-sync DC works without error, however the receiving DC cannot initiate a sync via GUI in AD Sites and Services nor can it via repadmin /replicate.

Promoted another server in the remote site to DC and was able to successfully get it working, so WAN / VPN / DNS appears to be working as expected.

Is there a way I can force the sync from the main DC to the out-of-sync DC and get it to pick back up again?

Hi there.

I am having trouble working out how to do this in Stata. I have a dataset with the response (BOR) for the patients. The variable BOR can be CR, PR, SD, PD or NE. Each patient is in either arm A or arm B. I need to make a table in Stata with the distribution of BOR in arms A and B, and I also need to calculate a p-value (log-rank) and hazard ratio for each value of BOR.
Can anyone tell me how to do this in Stata? If not Stata, then perhaps the code for an alternative statistics package? It is probably very similar.
Thank you in advance.

Best regards

Ulrich

Are there any good machine learning libraries for .NET that may make it an alternative to Python and R?  I've heard of ML.NET, but am not sure how good it is and how committed Microsoft are to it.

I am trying to find the task that automatically starts an application when a user logs in to a computer running Windows Server 2012 R.
But I do not know where it is located. Will the Event Viewer let us find out how and where the task is started from?

Hi Experts!

I am running into an error when using CPYFRMIMPF with numeric fields. When I run the CPYFRMIMPF command I get this error: "The copy did not complete for reason code 9." When all the DDS fields are alphanumeric, it works.

This is my CPYFRMIMPF code:

CPYFRMIMPF FROMSTMF('/ZIP/ZIPFILE.CSV') +
             TOFILE(*CURLIB/&EDTF) MBROPT(*REPLACE) +
             RCDDLM(*LF) STRDLM(*NONE) +
             RMVBLANK(*TRAILING) RPLNULLVAL(*FLDDFT) +
             RMVCOLNAM(*YES)
 
This is a sample of the data from the ZIPFILE.CSV

zip_code,distance,city,state
15090,99.162,"Wexford","PA"
15084,99.649,"Tarentum","PA"
15006,98.913,"Bairdford","PA"
15015,98.329,"Bradfordwoods","PA"

This is my DDS:

A          R ZIPCREC                   TEXT('ZIP RADIUS')
A            ZIP            5A         COLHDG('ZIP CODE')
A            ZDIST          4P 3       COLHDG('DISTANCE')
A            ZCITY         30A         COLHDG('CITY')
A            ZSTATE         4A         COLHDG('STATE')

Thanks for your help!!
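
One thing that may be worth checking: in DDS, 4P 3 declares a packed field with 4 digits in total, 3 of them after the decimal point, so the largest value it can hold is 9.999, while the sample distances are around 99. Whether that is what reason code 9 refers to in this case is an assumption. The sketch below only checks which of the sample values fit a given packed definition; fits_packed is a hypothetical helper, not part of any IBM i tooling.

```python
from decimal import Decimal

def fits_packed(value, digits, decimals):
    """True if value fits a packed field with `digits` total digits, `decimals` of them fractional."""
    d = Decimal(value)
    step = Decimal(1).scaleb(-decimals)      # e.g. 0.001 for 3 decimal positions
    if d != d.quantize(step):
        return False                         # more fractional digits than the field holds
    limit = Decimal(10) ** (digits - decimals)
    return abs(d) < limit                    # integer part must fit in the remaining digits

# Distance values taken from the sample rows of ZIPFILE.CSV.
distances = ["99.162", "99.649", "98.913", "98.329"]
for v in distances:
    # Compare the current definition (4P 3) with a wider one (5P 3).
    print(v, fits_packed(v, 4, 3), fits_packed(v, 5, 3))
```

None of the sample distances fit 4 digits with 3 decimals, while all of them fit 5 digits with 3 decimals, which would be consistent with the copy succeeding when the fields are all alphanumeric.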

Hi,

I'm currently working in AWS and trying to use a Lambda function to automate the creation of my AMIs. I'm doing this via the use of the Python script below, but when I test it it returns an error. Can anyone shed any light on what I should be looking at please?

Script:

import boto3
import collections
import datetime
import sys
import pprint

ec = boto3.client('ec2')
#image = ec.Image('id')

def lambda_handler(event, context):
   
    reservations = ec.describe_instances(
        Filters=[
            {'Name': 'tag-key', 'Values': ['backup', 'Backup']},
        ]
    ).get(
        'Reservations', []
    )

    instances = sum(
        [
            [i for i in r['Instances']]
            for r in reservations
        ], [])

    print "Found %d instances that need backing up" % len(instances)

    to_tag = collections.defaultdict(list)

for instance in instances:
    try:
        retention_days = [
            int(t.get('Value')) for t in instance['Tags']
            if t['Key'] == 'Retention'][0]
    except IndexError:
        retention_days = 7

    finally:

        #for dev in instance['BlockDeviceMappings']:
        #    if dev.get('Ebs', None) is None:
        #        continue
        #    vol_id = dev['Ebs']['VolumeId']
        #    print "Found EBS volume %s on instance %s" % (
        #        vol_id, instance['InstanceId'])

            #snap = ec.create_snapshot(
            #    VolumeId=vol_id,
      …
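
Two things stand out in the script as pasted, although the question does not say which error is returned or which runtime is used, so both are assumptions about the cause. First, print "..." is a Python 2 statement and is a syntax error on a Python 3 runtime. Second, the for instance in instances: loop is not indented, so it sits outside lambda_handler. The sketch below shows the same flattening and retention lookup written for Python 3 without any AWS calls; flatten_instances, retention_days and the sample data are hypothetical names and values made up for illustration.

```python
def flatten_instances(reservations):
    """Collect every instance dict from a describe_instances-style 'Reservations' list."""
    return [inst for r in reservations for inst in r.get("Instances", [])]

def retention_days(instance, default=7):
    """Return the integer value of the 'Retention' tag, or the default when it is absent."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == "Retention":
            return int(tag["Value"])
    return default

# Made-up sample shaped like the 'Reservations' part of a describe_instances response.
sample = [
    {"Instances": [
        {"InstanceId": "i-a", "Tags": [{"Key": "Retention", "Value": "14"}]},
        {"InstanceId": "i-b"},
    ]}
]

instances = flatten_instances(sample)
# Python 3 requires the function form of print.
print("Found %d instances that need backing up" % len(instances))
for instance in instances:
    print(instance["InstanceId"], retention_days(instance))
```

Using instance.get("Tags", []) also avoids a KeyError for instances that have no tags at all, which the original instance['Tags'] lookup would raise.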

How can I calculate linear regression in Oracle PL/SQL?
Please see the attached file.
C--Tanuja-Lake_IL-BRDs-linear-regre.docx

How do I get the attached data "normalised", i.e. made into a normal distribution?

Worksheet "Data" = full data
Worksheet "Filtered 25-65" = the 25-65 s data, filtered from "Data" and copied into this worksheet.

If I use 25-65, I get the best result, i.e. less skewness, but it is still not a normal distribution.

Any idea how to use that range so that the plotted data follows a normal distribution?
Data-25-65--Check-Normal--EE.xlsx

Hi,
I would like to prepare data for regression analysis.
I can prepare data in two forms.
a) values
b) rankings.
example:
values  60,30,25,90
rankings 2,3,4,1
Which format would be most suitable or does it not matter ?
many thanks
Ian
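
The two formats are not interchangeable: rankings can always be derived from the values, but not the other way round, because ranking discards the spacing between observations. The sketch below illustrates that with the example data. rank_descending is a hypothetical helper, and which format suits a given regression depends on the model, which this sketch does not settle.

```python
def rank_descending(values):
    """Rank 1 = largest value; tied values share the smallest applicable rank (a choice made for this sketch)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

values = [60, 30, 25, 90]
rankings = rank_descending(values)
print(rankings)

# Gaps between neighbouring values once sorted: information that the rankings no longer carry.
ordered = sorted(values, reverse=True)
value_gaps = [a - b for a, b in zip(ordered, ordered[1:])]
print(value_gaps)
```

The rankings reproduce 2, 3, 4, 1 from the example, while the value gaps (30, 30, 5) show that neighbouring ranks are always one apart even when the underlying values are not evenly spaced.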

Hello,

I have a list of lists. Each inner list contains file names. I use lapply to read and merge the contents of the files in each inner list (three merged results in this case, which will become the contents of three files). Then I have to change the names of the three resulting files, and finally I have to write the merged contents to each file.

 lc <- list("test.txt", "test.txt", "test.txt", "test.txt")
 lc1 <- list("test.txt", "test.txt", "test.txt")
 lc2 <- list("test.txt", "test.txt")
#list of lists.  The lists contain file names
 lc <- list(lc, lc1, lc2)
#new names for the three lists in the list of lists
 new_dataFns <- list("name1", "name2", "name3")
 file_paths <- NULL
 new_path <- NULL
#add the file names to the path and read and merge the contents of each list in the list of lists
 lapply(
    lc,
    function(lc) {
     filenames <- file.path(dataFnsDir, lc)
     dataList= lapply(filenames, function (x) read.table(file=x, header=TRUE))
     Reduce(function(x,y) merge(x,y), dataList)
     #   print(dataList)

    }
  )  

#add the new name of the file to the path total will be 3 paths/fille_newname.tsv.  
 lapply(new_path, function(new_path){new_path <- file.path(getwd(), new_dataFns)

The statements above work because lc and  new_dataFns are global and I can pass them to the lapply function

#Finally, I need to write the merged contents to the corresponding file (path/name.tsv).  I tried the following statement, but this …

Hi
I have one PDC running Server 2008 R2 (D2R03Q02) holding all FSMO roles, and a second PDC running Server 2012 R2 (PowerT130) that has not replicated for more than a month.

On PDC1 the command repadmin /showrepl shows no errors.
On PDC2 the command repadmin /showrepl shows several errors.

the netdom query FSMO shows all roles on PDC D2R03Q02

Connectivity: I can ping both servers

If I try to transfer the FSMO roles to the second PDC (PowerT130), I get the error "The current Operations master is offline. The role cannot be transferred."
But the PDC D2R03Q02 is up and running, and I can ping it from the second PDC.

Dcdiag shows many errors and warnings on both PDCs,

for example errors related to LDAP,

or warnings like:

Warning: DcGetDcName(PDC_REQUIRED) call failed, error 1355

Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
A Time Server could not be located.
The server holding the PDC role is down.
Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed,
error 1355
A Good Time Server could not be located.

I attached the complete Dcdiag report, dcdia.txt.


The DNSLint output looks good:

DNSLint Report

System Date: Fri Jul 20 23:42:26 2018

Command run:

dnslint /ad 192.168.1.6 /s 192.168.1.7 /v

 Root of Active Directory Forest:

    Ecole.Schulz.Local
Active Directory Forest Replication GUIDs Found:
 
DC: D2R03Q02
GUID: 1a3677e0-7a77-413b-b70d-f0ede03ff7af

DC: POWERT130
GUID: …

All, I am preparing for an MFE (financial engineering). I could not think of any other forum where I could seek help in clearing my doubts about financial products.

Dear experts, is my understanding below correct?

Is r > -1 coming from this statement?

1 + r

If 1 + r > 0, then r > -1.
*****

Extract from Stochastic Calculus for Finance 1

We introduce also an interest rate r. One dollar invested in the money market at time zero will yield 1 + r dollars at time one. Conversely, one dollar borrowed from the money market at time zero will result in a debt of 1 + r at time one. In particular, the interest rate for borrowing is the same as the interest rate for investing. It is almost always true that r >= 0, and this is the case to keep in mind. However, the mathematics we develop requires only that r > -1.

Kindly guide
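
The step asked about can be written out explicitly. This is a short worked derivation, under the assumption (not stated in the extract itself) that the requirement is for one dollar invested at time zero to keep a strictly positive value at time one:

```latex
\begin{aligned}
\text{value at time one of one dollar invested} &= 1 + r, \\
1 + r > 0 &\iff r > -1 .
\end{aligned}
```

For example, r = -0.5 still satisfies the condition, since the dollar is worth 0.50 at time one, whereas r = -1 would leave a value of zero and is excluded.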