Link to home
Start Free TrialLog in
Avatar of henrikatwork
henrikatworkFlag for Sweden

asked on

Random number based on percentage

Hi all,

I have a database full of orders from different countries. Based on the performance (inTime in percent) of the logistical provider, I want to set an attribute "inTime" to each of the orders. InTime yes/no should be randomized, with the percentage of yes being the same as the inTime percent of the transporter. Is there an easy way of doing this in T-SQL?

best regards,

henrik
Avatar of Hilaire
Hilaire
Flag of France image

Do you need to do it for all the transporters ?
If so

-- 1) get Transporter ID + inTime percent in a cursor
declare C1 cursor for select  TransporterID,inTimePercent from ....
declare @TransporterID int, @inTimePercent int  -- eg 75 if pct is 75%
fetch C1 into @TransporterID, @inTimePercent
while @@fetch_status = 0
begin
   -- 2) set default value 0 for this transporter
   update orders set inTime = 0 where TransporterID = @TransporterID
   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

   fetch C1 into @TransporterID, @inTimePercent
end
close C1
deallocate C1


There might be better solutions but it's hard to be more specific without knowing your datat model

The idea is to use newid() to update records randomly,
along with " top N percent " to update the only the relevant percentage of the records

I guess you'll be able to apply this idea to a single transporter if need be

HTH
Hilaire
Avatar of ispaleny
Beauty of Hilaire's algorithm is in an exact result, you also get numbers very near inTimePct for a small number of records also.

The following query does a random distibution based on inTimePct of IDTransporter. So it is expected to match better with large number of records.

create table #Orders(IDOrder int primary key,IDTransporter int,inTime bit)
create table #Transporters(IDTransporter int primary key,inTimePct numeric(7,6))
insert #Orders(IDOrder,IDTransporter) values (1,1)
insert #Orders(IDOrder,IDTransporter) values (2,1)
insert #Orders(IDOrder,IDTransporter) values (3,2)
insert #Orders(IDOrder,IDTransporter) values (4,2)
insert #Orders(IDOrder,IDTransporter) values (5,2)
insert #Orders(IDOrder,IDTransporter) values (6,2)
insert #Transporters(IDTransporter,inTimePct) values (1,.5)
insert #Transporters(IDTransporter,inTimePct) values (2,.2)

-- inTimePct = 0..1 = 0..100%
update A
set inTime=case when B.inTimePct>=
((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1 else 0 end
from #Orders A
join #Transporters B on A.IDTransporter=B.IDTransporter

select * from #Orders
Oops missing join

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

should be

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10))
       + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10))
       + ' order by newid()) b on a.OrderID = b.OrderID' )
Avatar of henrikatwork

ASKER

thanks, I'll try them out immediately
I sadly have no influence on the data structure. There's no transporter ID (only transporter & country forming a unique identification in the table: tTransporter (transporter, country, intimepercent)

The orderline table doesn't have transporter (yeah, this data is created from flatfiles from some Dinosaur system from the 70 ies), which is why I have to randomize (based on intimepercent) wheter each orderline arrived in time or not. The connection between tTransporter and tOrder exists through district attribute. Table tOrder( product, ... ... district, inTime) (where inTime is right now null everywhere). A third table DistrictCountry holds the countries and the districts tDistrictCountry (District, Country). Any ideas of how to solve this? It's too complicated for me :(

cheers,

henrik
Run these test queries and post results:

select count(*) from tOrder where district is not null

select count(*) from tOrder A join tDistrictCountry B on A.district=B.District join tTransporter C on B.Country=C.country
1. 1271369
2. 1354044

Some district caused 6% of duplicities, find it first.

select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

You need to create a table mapping 1:1 transporter-district.
Okey, so I cleaned the data and now those 6% missing countries are in. A mapping 1:1? How could I do that? Now order.fk_district is related to districtcountry.district and districtcountry.fk_countryid is related to country.countryID. Any hints on how to procede?
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

Now returns nothing?
now returns nothing
Great, create the 1:1 mapping table.

select B.District,C.transporter
into DistrictTransporter
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District,C.transporter
ok, what do I do with this table?
-- ok, what do I do with this table?
-- inTimePct = 0..1 = 0..100%
update A
set inTime=
case
  when C.inTimePct >= ((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
  then 1
  else 0
end
from tOrder A
join DistrictTransporter B on A.District=B.District
join tTransporter C on B.transporter=C.transporter
Almost there!

The inTime column gets nulls and 1:s only :(

The inTime column (in transporter table) contains values between 62 and 100.

Any ideas? Please don't give up now, when only so little more help is what I need to finalize this!

cheers,

henrik
ASKER CERTIFIED SOLUTION
Avatar of ispaleny
ispaleny
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Ispaleny, thank you VERY, VERY much!!!!

I'm sorry all I can give you is 500pts ( I sadly can't find the button to give you 5000 points, lol)

I wouldn't figure this out in a year...

If you get some time maybe you can answer: How can newid randomize based on the intime column? And why the 20000000?

Cheers,

henrik
Sorry, I must leave office now (CET 9:00PM). Post a note here tomorrow. Or post a new question for someone else.