cool hit counter [T-SQL] Distributed Extraction of Partial Data_Intefrankly

[T-SQL] Distributed Extraction of Partial Data


Well, I really don't know how to title this, so I made a "distribution", which feels a bit upscale, but it's not really technical, so you'll see. The situation is this, just received a temporary task, need to make a few business points of sales data [become] a little less, that is, in the ERP reports related to the query out of the data to be in the specified interval, to put it bluntly is that what ~ you know, certain peers should be familiar with this kind of task, while some peers may be facing or about to face such a task, I hope this article can provide a little thought.

I was given this kind of task for the first time, feeling quite new, thinking about it, decided to do it in the test library, finished to let his ERP access to the test library, after the thing and then cut back to the official library, the details will not say, focus on the data processing methods. The underlying principle is to delete some of the documents, because the report data from the documents, less documents, the natural number is small (as for the document data structure, of course, different ERP programs have different designs, delete a single involving data modification is also different, my case is to directly delete the main single on the line, the detailed table will automatically cascade to delete, of course, there are other associated data, that do not care, anyway, is a test library, how to toss it, as long as the report is presented in line with expectations). In order to make the detailed data look more natural, of course, you can not simply and roughly delete all the documents in a continuous period, or change the discount of a large number of documents to lower to meet the target, that is too rough, botched is not good.

What I envision is to divide the documents by a certain time granularity (e.g. daily, weekly, monthly, quarterly) and then check a certain percentage of documents from each interval and then compare it to the target and gradually adjust the granularity and percentage based on the comparison until it meets the target. This gives you the documents that need to be retained, and all that remains is to delete the rest of the documents in the query interval. Of course, you can also subtract the current data from the target data to get the data that needs to be chopped, and when you're done, use that data as the target to query the documents, so you can directly get the documents that need to be deleted. I use the former, i.e. get the documents that need to be kept, and when I'm done it's a matter of where not in, no effort. Upcodes.

WITH cte AS ( - grouped at a certain time granularity (daily, weekly, monthly, quarterly, etc.)
SELECT ROW_NUMBER() OVER(PARTITION BY DATEPART(week, Date of the document) ORDER BY  Document Number) AS 'RowNo'
,DATEPART(week, Date of the document) AS 'Rang', Date of the document,  Document Number,  Amount Sold
 FROM main list table
WHERE  business location='xxx'
 AND YEAR(date of document)=2015
 ),cte2 AS ( - to get the number of sheets in each group as the denominator
SELECT Rang,COUNT(1) AS 'BillCount' FROM cte GROUP BY Rang
)
SELECT a.*,BillCount
FROM cte a
JOIN cte2 b ON b.Rang = a.Rang
WHERE RowNo/CAST(BillCount AS DECIMAL(16,4)) < 0.81 -- Proportionally take some of the rows of each group,billcount beint, Need to convert to decimals before dividing

Time granularity on modifying the 1st parameter of the datepart function (day/week/month/quarter, etc.), note that there are 2 places, and the ratio on modifying that last 0.81, it's so simple that it's redundant to talk about it.

My environment is SQL08R2, so if there is a more correct pose, I would appreciate any advice from the greats.


Recommended>>
1、Rattling Pink Panther and Lord God Child
2、2018 Digital Expo Calls for Leading Big Data Technology Results from Around the World
3、Instead of being anxious about being replaced by artificial intelligence polish the basics of life
4、AMD Thread Ripper 2950X now in SiSoftware database
5、The Great Cyber Security Knowledge

    已推荐到看一看 和朋友分享想法
    最多200字,当前共 发送

    已发送

    朋友将在看一看看到

    确定
    分享你的想法...
    取消

    分享想法到看一看

    确定
    最多200字,当前共

    发送中

    网络异常,请稍后重试

    微信扫一扫
    关注该公众号