?
Solved

Random number based on percentage

Posted on 2005-05-09
18
Medium Priority
?
620 Views
Last Modified: 2012-06-27
Hi all,

I have a database full of orders from different countries. Based on the performance (inTime in percent) of the logistical provider, I want to set an attribute "inTime" to each of the orders. InTime yes/no should be randomized, with the percentage of yes being the same as the inTime percent of the transporter. Is there an easy way of doing this in T-SQL?

best regards,

henrik
0
Comment
Question by:henrikatwork
  • 8
  • 8
  • 2
18 Comments
 
LVL 26

Expert Comment

by:Hilaire
ID: 13966565
Do you need to do it for all the transporters ?
If so

-- 1) get Transporter ID + inTime percent in a cursor
declare C1 cursor for select  TransporterID,inTimePercent from ....
declare @TransporterID int, @inTimePercent int  -- eg 75 if pct is 75%
fetch C1 into @TransporterID, @inTimePercent
while @@fetch_status = 0
begin
   -- 2) set default value 0 for this transporter
   update orders set inTime = 0 where TransporterID = @TransporterID
   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

   fetch C1 into @TransporterID, @inTimePercent
end
close C1
deallocate C1


There might be better solutions but it's hard to be more specific without knowing your datat model

The idea is to use newid() to update records randomly,
along with " top N percent " to update the only the relevant percentage of the records

I guess you'll be able to apply this idea to a single transporter if need be

HTH
Hilaire
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 13966843
Beauty of Hilaire's algorithm is in an exact result, you also get numbers very near inTimePct for a small number of records also.

The following query does a random distibution based on inTimePct of IDTransporter. So it is expected to match better with large number of records.

create table #Orders(IDOrder int primary key,IDTransporter int,inTime bit)
create table #Transporters(IDTransporter int primary key,inTimePct numeric(7,6))
insert #Orders(IDOrder,IDTransporter) values (1,1)
insert #Orders(IDOrder,IDTransporter) values (2,1)
insert #Orders(IDOrder,IDTransporter) values (3,2)
insert #Orders(IDOrder,IDTransporter) values (4,2)
insert #Orders(IDOrder,IDTransporter) values (5,2)
insert #Orders(IDOrder,IDTransporter) values (6,2)
insert #Transporters(IDTransporter,inTimePct) values (1,.5)
insert #Transporters(IDTransporter,inTimePct) values (2,.2)

-- inTimePct = 0..1 = 0..100%
update A
set inTime=case when B.inTimePct>=
((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
then 1 else 0 end
from #Orders A
join #Transporters B on A.IDTransporter=B.IDTransporter

select * from #Orders
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 13967192
Oops missing join

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10)) + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10)) + ' order by newid())' )

should be

   -- 3) update flag for a random set of records up to expected PCT
   exec ('update a set inTime = 1 from orders a inner join (select top ' + cast(@inTimePercent as varchar(10))
       + ' percent OrderID from Orders where TransporterID = '  + cast(@TransporterID as varchar(10))
       + ' order by newid()) b on a.OrderID = b.OrderID' )
0
Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

 
LVL 1

Author Comment

by:henrikatwork
ID: 13967229
thanks, I'll try them out immediately
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 13982159
I sadly have no influence on the data structure. There's no transporter ID (only transporter & country forming a unique identification in the table: tTransporter (transporter, country, intimepercent)

The orderline table doesn't have transporter (yeah, this data is created from flatfiles from some Dinosaur system from the 70 ies), which is why I have to randomize (based on intimepercent) wheter each orderline arrived in time or not. The connection between tTransporter and tOrder exists through district attribute. Table tOrder( product, ... ... district, inTime) (where inTime is right now null everywhere). A third table DistrictCountry holds the countries and the districts tDistrictCountry (District, Country). Any ideas of how to solve this? It's too complicated for me :(

cheers,

henrik
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 13984827
Run these test queries and post results:

select count(*) from tOrder where district is not null

select count(*) from tOrder A join tDistrictCountry B on A.district=B.District join tTransporter C on B.Country=C.country
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 13985216
1. 1271369
2. 1354044

0
 
LVL 13

Expert Comment

by:ispaleny
ID: 13986377
Some district caused 6% of duplicities, find it first.

select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

You need to create a table mapping 1:1 transporter-district.
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 13994847
Okey, so I cleaned the data and now those 6% missing countries are in. A mapping 1:1? How could I do that? Now order.fk_district is related to districtcountry.district and districtcountry.fk_countryid is related to country.countryID. Any hints on how to procede?
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 13995070
select B.District,count(*)
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District
having count(*)>1

Now returns nothing?
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 14006695
now returns nothing
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 14008908
Great, create the 1:1 mapping table.

select B.District,C.transporter
into DistrictTransporter
from tDistrictCountry B
join tTransporter C on B.Country=C.country
where exists(
 select * from tOrder A
 where  A.district=B.District
)
group by B.District,C.transporter
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 14044064
ok, what do I do with this table?
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 14046268
-- ok, what do I do with this table?
-- inTimePct = 0..1 = 0..100%
update A
set inTime=
case
  when C.inTimePct >= ((abs(cast(cast(newid() as binary(16)) as int))%20000000)/20000000.)
  then 1
  else 0
end
from tOrder A
join DistrictTransporter B on A.District=B.District
join tTransporter C on B.transporter=C.transporter
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 14061561
Almost there!

The inTime column gets nulls and 1:s only :(

The inTime column (in transporter table) contains values between 62 and 100.

Any ideas? Please don't give up now, when only so little more help is what I need to finalize this!

cheers,

henrik
0
 
LVL 13

Accepted Solution

by:
ispaleny earned 2000 total points
ID: 14066536
/*
>>The inTime column gets nulls and 1:s only :(
NULLs = unknown transporter
1 = known transporter (and unexpected inTimePct values above 1 (100%) )

When execution plan uses SORT step, function newid() can provide unpredictable results. This can avoided by rewriting UPDATE to SELECT INTO with primary key and using the result table in next UPDATE step with join on pk. At first try the following query.
*/

-- inTimePct 0..100
-- when transporter is unknown use default inTimePct 75%
update A
set inTime=
case
  when C.inTimePct is not null then
    case
      when C.inTimePct/100. >= ((abs(cast(cast(newid() as binary(16)) as int)) % 20000000) / 20000000.)
      then 1
      else 0
    end
  when 0.75                        >= ((abs(cast(cast(newid() as binary(16)) as int)) % 20000000) / 20000000.)
  then 1
  else 0
end
from tOrder A
left join DistrictTransporter B on A.District = B.District
left join tTransporter C on B.transporter = C.transporter
0
 
LVL 1

Author Comment

by:henrikatwork
ID: 14071250
Ispaleny, thank you VERY, VERY much!!!!

I'm sorry all I can give you is 500pts ( I sadly can't find the button to give you 5000 points, lol)

I wouldn't figure this out in a year...

If you get some time maybe you can answer: How can newid randomize based on the intime column? And why the 20000000?

Cheers,

henrik
0
 
LVL 13

Expert Comment

by:ispaleny
ID: 14071382
Sorry, I must leave office now (CET 9:00PM). Post a note here tomorrow. Or post a new question for someone else.
0

Featured Post

 [eBook] Windows Nano Server

Download this FREE eBook and learn all you need to get started with Windows Nano Server, including deployment options, remote management
and troubleshooting tips and tricks

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Access has a limit of 255 columns in a single table; SQL Server allows tables with over 255 columns, but reading that data is not necessarily simple.  The final solution for this task involved creating a custom text parser and then reading…
This month, Experts Exchange sat down with resident SQL expert, Jim Horn, for an in-depth look into the makings of a successful career in SQL.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Suggested Courses

807 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question