• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 626
  • Last Modified:

Random number based on percentage

Hi all,

I have a database full of orders from different countries. Based on the performance (inTime in percent) of the logistical provider, I want to set an attribute "inTime" to each of the orders. InTime yes/no should be randomized, with the percentage of yes being the same as the inTime percent of the transporter. Is there an easy way of doing this in T-SQL?

best regards,

henrik
0
henrikatwork
Asked:
henrikatwork
  • 8
  • 8
  • 2
1 Solution
 
HilaireCommented:
Do you need to do it for all the transporters ?
If so

-- 1) get Transporter ID + inTime percent in a cursor
declare C1 cursor for select  TransporterID,inTimePercent from ....
declare @TransporterID int, @inTimePercent int  -- eg 75 if pct is 75%
fetch C1 into @TransporterID, @inTimePercent
while @@fetch_status = 0
begin
   -- 2) set default value 0 for this transporter
   update orders set inTime = 0 where TransporterID = @TransporterID
   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

   fetch C1 into @TransporterID, @inTimePercent
end
close C1
deallocate C1


There might be better solutions but it's hard to be more specific without knowing your datat model

The idea is to use newid() to update records randomly,
along with " top N percent " to update the only the relevant percentage of the records

I guess you'll be able to apply this idea to a single transporter if need be

HTH
Hilaire
0
 
ispalenyCommented:
Beauty of Hilaire's algorithm is in an exact result, you also get numbers very near inTimePct for a small number of records also.

The following query does a random distibution based on inTimePct of IDTransporter. So it is expected to match better with large number of records.

create table #Orders(IDOrder int primary key,IDTransporter int,inTime bit)
create table #Transporters(IDTransporter int primary key,inTimePct numeric(7,6))
insert #Orders(IDOrder,IDTransporter) values (1,1)
insert #Orders(IDOrder,IDTransporter) values (2,1)
insert #Orders(IDOrder,IDTransporter) values (3,2)
insert #Orders(IDOrder,IDTransporter) values (4,2)
insert #Orders(IDOrder,IDTransporter) values (5,2)
insert #Orders(IDOrder,IDTransporter) values (6,2)
insert #Transporters(IDTransporter,inTimePct) values (1,.5)
insert #Transporters(IDTransporter,inTimePct) values (2,.2)

-- inTimePct = 0..1 = 0..100%
update A
set inTime=case when B.inTimePct>=
((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1 else 0 end
from #Orders A
join #Transporters B on A.IDTransporter=B.IDTransporter

select * from #Orders
0
 
HilaireCommented:
Oops missing join

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

should be

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10))
       + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10))
       + ' order by newid()) b on a.OrderID = b.OrderID' )
0
Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
henrikatworkAuthor Commented:
thanks, I'll try them out immediately
0
 
henrikatworkAuthor Commented:
I sadly have no influence on the data structure. There's no transporter ID (only transporter & country forming a unique identification in the table: tTransporter (transporter, country, intimepercent)

The orderline table doesn't have transporter (yeah, this data is created from flatfiles from some Dinosaur system from the 70 ies), which is why I have to randomize (based on intimepercent) wheter each orderline arrived in time or not. The connection between tTransporter and tOrder exists through district attribute. Table tOrder( product, ... ... district, inTime) (where inTime is right now null everywhere). A third table DistrictCountry holds the countries and the districts tDistrictCountry (District, Country). Any ideas of how to solve this? It's too complicated for me :(

cheers,

henrik
0
 
ispalenyCommented:
Run these test queries and post results:

select count(*) from tOrder where district is not null

select count(*) from tOrder A join tDistrictCountry B on A.district=B.District join tTransporter C on B.Country=C.country
0
 
henrikatworkAuthor Commented:
1. 1271369
2. 1354044

0
 
ispalenyCommented:
Some district caused 6% of duplicities, find it first.

select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

You need to create a table mapping 1:1 transporter-district.
0
 
henrikatworkAuthor Commented:
Okey, so I cleaned the data and now those 6% missing countries are in. A mapping 1:1? How could I do that? Now order.fk_district is related to districtcountry.district and districtcountry.fk_countryid is related to country.countryID. Any hints on how to procede?
0
 
ispalenyCommented:
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

Now returns nothing?
0
 
henrikatworkAuthor Commented:
now returns nothing
0
 
ispalenyCommented:
Great, create the 1:1 mapping table.

select B.District,C.transporter
into DistrictTransporter
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District,C.transporter
0
 
henrikatworkAuthor Commented:
ok, what do I do with this table?
0
 
ispalenyCommented:
-- ok, what do I do with this table?
-- inTimePct = 0..1 = 0..100%
update A
set inTime=
case
  when C.inTimePct >= ((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
  then 1
  else 0
end
from tOrder A
join DistrictTransporter B on A.District=B.District
join tTransporter C on B.transporter=C.transporter
0
 
henrikatworkAuthor Commented:
Almost there!

The inTime column gets nulls and 1:s only :(

The inTime column (in transporter table) contains values between 62 and 100.

Any ideas? Please don't give up now, when only so little more help is what I need to finalize this!

cheers,

henrik
0
 
ispalenyCommented:
/*
>>The inTime column gets nulls and 1:s only :(
NULLs = unknown transporter
1 = known transporter (and unexpected inTimePct values above 1 (100%) )

When execution plan uses SORT step, function newid() can provide unpredictable results. This can avoided by rewriting UPDATE to SELECT INTO with primary key and using the result table in next UPDATE step with join on pk. At first try the following query.
*/

-- inTimePct 0..100
-- when transporter is unknown use default inTimePct 75%
update A
set inTime=
case
  when C.inTimePct is not null then
    case
      when C.inTimePct/100. >= ((abs(cast(cast(newid() as binary(16)) as int)) % 20000000) / 20000000.)
      then 1
      else 0
    end
  when 0.75                        >= ((abs(cast(cast(newid() as binary(16)) as int)) % 20000000) / 20000000.)
  then 1
  else 0
end
from tOrder A
left join DistrictTransporter B on A.District = B.District
left join tTransporter C on B.transporter = C.transporter
0
 
henrikatworkAuthor Commented:
Ispaleny, thank you VERY, VERY much!!!!

I'm sorry all I can give you is 500pts ( I sadly can't find the button to give you 5000 points, lol)

I wouldn't figure this out in a year...

If you get some time maybe you can answer: How can newid randomize based on the intime column? And why the 20000000?

Cheers,

henrik
0
 
ispalenyCommented:
Sorry, I must leave office now (CET 9:00PM). Post a note here tomorrow. Or post a new question for someone else.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

  • 8
  • 8
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now