henrikatwork
asked on
Random number based on percentage
Hi all,
I have a database full of orders from different countries. Based on the performance (inTime in percent) of the logistical provider, I want to set an attribute "inTime" to each of the orders. InTime yes/no should be randomized, with the percentage of yes being the same as the inTime percent of the transporter. Is there an easy way of doing this in T-SQL?
best regards,
henrik
I have a database full of orders from different countries. Based on the performance (inTime in percent) of the logistical provider, I want to set an attribute "inTime" to each of the orders. InTime yes/no should be randomized, with the percentage of yes being the same as the inTime percent of the transporter. Is there an easy way of doing this in T-SQL?
best regards,
henrik
Beauty of Hilaire's algorithm is in an exact result, you also get numbers very near inTimePct for a small number of records also.
The following query does a random distibution based on inTimePct of IDTransporter. So it is expected to match better with large number of records.
create table #Orders(IDOrder int primary key,IDTransporter int,inTime bit)
create table #Transporters(IDTransporte r int primary key,inTimePct numeric(7,6))
insert #Orders(IDOrder,IDTranspor ter) values (1,1)
insert #Orders(IDOrder,IDTranspor ter) values (2,1)
insert #Orders(IDOrder,IDTranspor ter) values (3,2)
insert #Orders(IDOrder,IDTranspor ter) values (4,2)
insert #Orders(IDOrder,IDTranspor ter) values (5,2)
insert #Orders(IDOrder,IDTranspor ter) values (6,2)
insert #Transporters(IDTransporte r,inTimePc t) values (1,.5)
insert #Transporters(IDTransporte r,inTimePc t) values (2,.2)
-- inTimePct = 0..1 = 0..100%
update A
set inTime=case when B.inTimePct>=
((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1 else 0 end
from #Orders A
join #Transporters B on A.IDTransporter=B.IDTransp orter
select * from #Orders
The following query does a random distibution based on inTimePct of IDTransporter. So it is expected to match better with large number of records.
create table #Orders(IDOrder int primary key,IDTransporter int,inTime bit)
create table #Transporters(IDTransporte
insert #Orders(IDOrder,IDTranspor
insert #Orders(IDOrder,IDTranspor
insert #Orders(IDOrder,IDTranspor
insert #Orders(IDOrder,IDTranspor
insert #Orders(IDOrder,IDTranspor
insert #Orders(IDOrder,IDTranspor
insert #Transporters(IDTransporte
insert #Transporters(IDTransporte
-- inTimePct = 0..1 = 0..100%
update A
set inTime=case when B.inTimePct>=
((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1 else 0 end
from #Orders A
join #Transporters B on A.IDTransporter=B.IDTransp
select * from #Orders
Oops missing join
-- 3) update flag for a random set of records up to expected PCT
exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = ' + cast(@TransporterID as varchar(10)) + ' order by newid())' )
should be
-- 3) update flag for a random set of records up to expected PCT
exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10))
+ ' percent OrderID from Orders where TransporterID = ' + cast(@TransporterID as varchar(10))
+ ' order by newid()) b on a.OrderID = b.OrderID' )
-- 3) update flag for a random set of records up to expected PCT
exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = ' + cast(@TransporterID as varchar(10)) + ' order by newid())' )
should be
-- 3) update flag for a random set of records up to expected PCT
exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10))
+ ' percent OrderID from Orders where TransporterID = ' + cast(@TransporterID as varchar(10))
+ ' order by newid()) b on a.OrderID = b.OrderID' )
ASKER
thanks, I'll try them out immediately
ASKER
I sadly have no influence on the data structure. There's no transporter ID (only transporter & country forming a unique identification in the table: tTransporter (transporter, country, intimepercent)
The orderline table doesn't have transporter (yeah, this data is created from flatfiles from some Dinosaur system from the 70 ies), which is why I have to randomize (based on intimepercent) wheter each orderline arrived in time or not. The connection between tTransporter and tOrder exists through district attribute. Table tOrder( product, ... ... district, inTime) (where inTime is right now null everywhere). A third table DistrictCountry holds the countries and the districts tDistrictCountry (District, Country). Any ideas of how to solve this? It's too complicated for me :(
cheers,
henrik
The orderline table doesn't have transporter (yeah, this data is created from flatfiles from some Dinosaur system from the 70 ies), which is why I have to randomize (based on intimepercent) wheter each orderline arrived in time or not. The connection between tTransporter and tOrder exists through district attribute. Table tOrder( product, ... ... district, inTime) (where inTime is right now null everywhere). A third table DistrictCountry holds the countries and the districts tDistrictCountry (District, Country). Any ideas of how to solve this? It's too complicated for me :(
cheers,
henrik
Run these test queries and post results:
select count(*) from tOrder where district is not null
select count(*) from tOrder A join tDistrictCountry B on A.district=B.District join tTransporter C on B.Country=C.country
select count(*) from tOrder where district is not null
select count(*) from tOrder A join tDistrictCountry B on A.district=B.District join tTransporter C on B.Country=C.country
ASKER
1. 1271369
2. 1354044
2. 1354044
Some district caused 6% of duplicities, find it first.
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District
having count(*)>1
You need to create a table mapping 1:1 transporter-district.
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District
having count(*)>1
You need to create a table mapping 1:1 transporter-district.
ASKER
Okey, so I cleaned the data and now those 6% missing countries are in. A mapping 1:1? How could I do that? Now order.fk_district is related to districtcountry.district and districtcountry.fk_country id is related to country.countryID. Any hints on how to procede?
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District
having count(*)>1
Now returns nothing?
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District
having count(*)>1
Now returns nothing?
ASKER
now returns nothing
Great, create the 1:1 mapping table.
select B.District,C.transporter
into DistrictTransporter
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District,C.transporter
select B.District,C.transporter
into DistrictTransporter
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
select * from tOrder A
where A.district=B.District
)
group by B.District,C.transporter
ASKER
ok, what do I do with this table?
-- ok, what do I do with this table?
-- inTimePct = 0..1 = 0..100%
update A
set inTime=
case
when C.inTimePct >= ((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1
else 0
end
from tOrder A
join DistrictTransporter B on A.District=B.District
join tTransporter C on B.transporter=C.transporte r
-- inTimePct = 0..1 = 0..100%
update A
set inTime=
case
when C.inTimePct >= ((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1
else 0
end
from tOrder A
join DistrictTransporter B on A.District=B.District
join tTransporter C on B.transporter=C.transporte
ASKER
Almost there!
The inTime column gets nulls and 1:s only :(
The inTime column (in transporter table) contains values between 62 and 100.
Any ideas? Please don't give up now, when only so little more help is what I need to finalize this!
cheers,
henrik
The inTime column gets nulls and 1:s only :(
The inTime column (in transporter table) contains values between 62 and 100.
Any ideas? Please don't give up now, when only so little more help is what I need to finalize this!
cheers,
henrik
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ispaleny, thank you VERY, VERY much!!!!
I'm sorry all I can give you is 500pts ( I sadly can't find the button to give you 5000 points, lol)
I wouldn't figure this out in a year...
If you get some time maybe you can answer: How can newid randomize based on the intime column? And why the 20000000?
Cheers,
henrik
I'm sorry all I can give you is 500pts ( I sadly can't find the button to give you 5000 points, lol)
I wouldn't figure this out in a year...
If you get some time maybe you can answer: How can newid randomize based on the intime column? And why the 20000000?
Cheers,
henrik
Sorry, I must leave office now (CET 9:00PM). Post a note here tomorrow. Or post a new question for someone else.
If so
-- 1) get Transporter ID + inTime percent in a cursor
declare C1 cursor for select TransporterID,inTimePercen
declare @TransporterID int, @inTimePercent int -- eg 75 if pct is 75%
fetch C1 into @TransporterID, @inTimePercent
while @@fetch_status = 0
begin
-- 2) set default value 0 for this transporter
update orders set inTime = 0 where TransporterID = @TransporterID
-- 3) update flag for a random set of records up to expected PCT
exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = ' + cast(@TransporterID as varchar(10)) + ' order by newid())' )
fetch C1 into @TransporterID, @inTimePercent
end
close C1
deallocate C1
There might be better solutions but it's hard to be more specific without knowing your datat model
The idea is to use newid() to update records randomly,
along with " top N percent " to update the only the relevant percentage of the records
I guess you'll be able to apply this idea to a single transporter if need be
HTH
Hilaire