cool hit counter [SQL] What the hell is a CLR aggregation function?_Intefrankly

[SQL] What the hell is a CLR aggregation function?


I've written a CLR aggregation function that merges strings before, basically copying MS example Recently, I wanted to transform it and encountered some problems, so I tinkered with it, and some insights are recorded below .

I . Miscellaneous

  • Although CLR aggregate functions are functions in SQL, they actually exist as a class or structure in C# (yes, the entity implementing the aggregate function can be a structure or a class), which is different from a CLR ordinary function, which is a method in a class, and in view of this, the class or structure implementing the aggregate function is collectively referred to as the aggregate class below to avoid confusion in the reader's mind
  • Aggregate classes must be characterized with[SqlUserDefinedAggregate] perform a description。MS example Also used in[Serializable] characteristics, This feature has been tested and is not required
  • Aggregate classes are always serialized and deserialized during execution, which requires them to be serializable, as described in detail below
  • The IsInvariantToNulls, IsInvariantToDuplicates, IsInvariantToOrder, IsNullIfEmpty properties of the SqlUserDefinedAggregate feature are used by the SQL engine's query processor, presumably as a basis for optimizing execution plans and the like, and have nothing to do with the implementation of aggregation classes . What does it mean, for example, IsInvariantToNulls=true, is that it tells the SQL query processor that this aggregate function of mine returns consistent results regardless of whether you have null in the field (meaning null in SQL, same below), and then the query processor may determine whether to reuse the execution plan that already exists based on that; it does not mean that it will automatically filter out the null values passed into the Accumulate method for you . In other words, after setting IsInvariantToNulls=true, what should come in null in Accumulate will come in as usual, and you'll still have to deal with what you have to deal with . The same goes for several other properties, they won't take care of anything for you, you'll still have to worry about what's your turn to worry about . So given that, your implementation had better be consistent with what the above property describes, or else you'll have good luck tricking the query processor, presumably . It's like telling a MM you're treating her to a $6 hot and spicy meal, when in reality you only bought a $1 one, with consequences of your own making up your mind

ii . order of implementation

Upon interruption point debugging, the aggregation classes are executed in the following order .

  1. Enter the Init method . This is the first step after the aggregation has started, tried adding unrefined constructors to the class, but it didn't go in, and here the aggregated class is already implicitly warning you not to treat it like a normal class . As for why, I don't know, I hope someone high up will answer
  2. Enter the Accumulate method (this step does not necessarily happen, as will be explained later) . The parameters of this method are the parameters that can be passed in when the aggregation is finally deployed to SQL (SQL 2005 only supports 1 parameter), which is equivalent to an entry point for SQL, and the elements to be aggregated will be thrown in one by one, and one will be executed once, and then one will be executed once more, so the method will be executed many times in a loop according to the number of elements to be aggregated, but of course, if the elements are 0, the method will not be executed and will jump directly to the next step, which is why it is said that this step does not necessarily happen . It should be noted that the Aggregate function works in groups as a cycle, that is, the GROUP BY comes out with several groups, the Aggregate function will be called several times, here we are talking about the order of execution in each one, so the number of cycles of the Accumulate method is the number of rows of a single group, not the number of rows of all groups . As a chestnut, GROUP BY comes out 2 groups, group 1 has 2 rows and group 2 has 3 rows, then the whole aggregation function will be called 2 times, in the first time Accumulate will loop 2 times, finished for the next step, after the whole cycle is completed, then start aggregating the next group, obviously, in the second round Accumulate will loop 3 times
  3. Perform serialization. Why do you need to serialize, I'd like to know, all I know is that after this step, all the values of the fields of the aggregated class are cleared (reset to type defaults to be exact), so if you don't take the opportunity to save the data in a hurry while serializing, it will undo all the work done in Init and Accumulate before, because you won't get any data in the next step of the deserialization process, which in turn will lead to no data to return in the final Terminate method! So the serialization step must be focused on understanding. This step can only be seen to have happened if you implement serialization yourself. Simply put, the serialization behavior of the aggregate class is divided into two types, specified by the Format property of the SqlUserDefinedAggregate feature, which (being an enumeration type) has three values: Unknown, Native, and UserDefined, where Unknown exists as a default value, similar to None in other enumerations or the like, representing that it has not yet been set. So the serialization behavior of the aggregated class would be only Native, UserDefined.
    • Native。 Representing aggregated classes given toCLR Deserialization and deserialization, No need to implement it yourself, It looks beautiful., neverthelessNative The way some prerequisites, It is the aggregated class that can only have members of the value type, Cannot have members of reference types, includestring, And if the aggregated class isclass rather thanstruct, That must be used[StructLayout(LayoutKind .Sequential)] Marking of characteristics, If the above conditions are not met, Deployment must fail。 So sometimes you can't be lazy, Must be usedUserDefined way
    • UserDefined。 means that the serialization and deserialization behavior must be implemented by itself, Specifically by having the aggregation class implementIBinarySerialize interface, just asMS example that kind。 The interface has two methods,public void Write(BinaryWriter w) harmonypublic void Read(BinaryReader r), represent the serialization and deserialization processes, respectively。 Finally, we're getting to the point., When the aggregated class reaches this step of serialization, You are responsible for writing all the data you want to save to thew( anBinaryWriter an actual example) in the base stream of the, This can be done specifically throughw ofWrite conduct, You can also visit directly w .BaseStream Operating the base stream, Or like new BinaryFormatter() . Serialize(w .BaseStream, obj) thus, Take the entire object withBinaryFormatter Write to the base stream, Anyway, there are various ways, Knowledge that is part of the stream, This article does not go into detail, Anyway, the ultimate goal is to write data intow .BaseStream, for insurance purposes, After you write it, you canw .Flush() do (sth. for a bit to give it a try)。 Here comes the second point, It's about the amount of data that can be written,SqlUserDefinedAggregate The characteristics have aMaxByteSize properties, When usingUserDefined modality, This property must be specified, Indicates the maximum number of bytes of data that can be written in during serialization。 No designation is0, It just doesn't save any data.~ play with hairs。MaxByteSize The maximum value that can be set is determined by theSqlUserDefinedAggregateAttribute .MaxByteSizeValue constant, And this constant .net2 .0-3 .5 all8000, I don't know if there are any changes in subsequent versions。 that is say, When serializing, Can write up to8000 Bytes of data, Can be saved4000 Chinese character? Ouch, that's good.~NONONO, According to my debugging,w The coding method ofUTF8( I'm not sure it has anything to do with the environment., owing tow beCLR Responsible for incoming, What is the case of passing in what code ofw, no way of finding out。 If it's a fixed passUTF8, Well, that's a bit of a pity.ASCII The people of the district.), and cannot be changed, that is1 A Chinese character may occupy3~4 byte, press3 Bytes are also just2600 About one Chinese character, I should say it's not rich., I can only pray to God that this limit is not touched in the application。 So my opinion., Be sure to save this capacity, Keep only the necessary data, Don't save yourself the trouble of serializing the whole entire object in。 for exampleMS of example Just putStringBuilder hit the targetstring stuff in, Without putting the wholeStringBuilder The object is tucked in.
  4. Performing deserialization。 as we said last time, Serialize the data you want to save, Up to this point, Naturally, that means getting the data out。 equivalent, This can be done byr( anBinaryReader) variousReadXXX methodological, You can also visitr .BaseStream Operate the base stream to retrieve data。 Keep your head above water here., That is, after the data is taken out it is meant to be used in theTerminate method to process and return the result, Instead of having to restore the member object, Then go back toTerminate operating object in。 What do you mean?, also tookMS of example discuss, But this time it's the opposite, (located) atRead to get the previously savedstring after, There's no need to revert toStringBuilder, It is entirely possible to use astring field to catch the, Then inTerminate Just process the field and return it
  5. Enter the Terminate method. As mentioned above Accumulate is the SQL-oriented entry point, and Terminate is the exit. The result of the aggregate calculation is returned to the caller via Terminate, so the return type of the method is the type obtained in SQL. By the above, know that there is a serialization and deserialization link between Accumulate and Terminate, and know that after serialization, the values of the class fields have been cleared and are no longer the same class fields they were (unless you restore them during deserialization). Being clear about this, you should know that seemingly natural practices like this: in aggregate classes are hugely wrong unless s are preserved during serialization and restored during deserialization.
public void Accumulate(SqlString str)
{
    s = str;
}

public SqlString Terminate()
{
    return new SqlString(s);
}
  1. Start the next set of Init→Accumulate→Serialize/Deserialize→Terminate . Of course if there's no next group, the whole thing ends

It's over? Isn't there a Merge method, I'm sorry, I don't know when this one will be used . In my many debugging sessions, I never encountered a case where Merge was executed . depending on MSDN Documentation As stated, my guess is that the CLR doesn't guarantee to use the same aggregation class instance all at once, it could always open another instance to work with and use the Merge method of the newly opened instance to merge the data from the old instance into the new one, releasing the old one when it's done . I don't know if this is a good guess or not, but if the jerk-off masters are clear, I'd like to ask for guidance, thanks in advance . If this guess is correct, it is clear that all the Merge method has to do is to merge the data from the old instance (other) into the current instance, and the reader should already have an idea of exactly how this should be written . Note that if the aggregation class is designed to handle only non-repeating elements, then it is guaranteed that the elements stored in each instance are unique, but it is possible for elements in two instances to be identical; keep an eye on this when implementing Merge to ensure that the merged data is still unique .

III. FINAL

At the moment it seems to me that the aggregate class, although it is a class/structure in C#, is full of quirks, such as not executing the constructor and clearing the class fields during runtime and switching to serialization and deserialization to pass state, making it less of a normal class, so I would recommend not using some OOP techniques to implement it, such as inheritance overrides or whatever, until you've fully figured it out, don't even think about it, just fill in the blanks honestly . Also, I hope to get some expert guidance on the doubts raised in the article, so thanks again .

Finally, a modified string aggregation (ignoring null, whitespace, duplicate strings, removing first and last whitespace) is attached.

using Microsoft .SqlServer .Server;
using System;
using System .Collections .Generic;
using System .Data .SqlTypes;
using System .IO;

namespace AhDung .SqlClr
{
    [SqlUserDefinedAggregate(
        Format .UserDefined,
        IsInvariantToNulls = true,
        IsInvariantToDuplicates = true,
        IsInvariantToOrder = true,
        MaxByteSize = SqlUserDefinedAggregateAttribute .MaxByteSizeValue)
    ]
    public class JoinString : IBinarySerialize
    {
        string sptr, result;
        Dictionary<string, object> dic;

        public void Init()
        {
            sptr = string .Empty;
            dic = new Dictionary<string, object>(StringComparer .CurrentCultureIgnoreCase);// Ignore case
        }

        public void Accumulate(SqlString str, SqlString separater)
        {
            if (sptr .Length == 0 && !separater .IsNull && !string .IsNullOrEmpty(separater .Value)) { sptr = separater .Value; }
            string s;
            if (str .IsNull || (s = str .Value .Trim()) .Length == 0 || dic .ContainsKey(s)) { return; }
            dic .Add(s, null);
        }

        public void Merge(JoinString other)
        {
            foreach (string s in other .dic .Keys)
            {
                if (dic .ContainsKey(s)) { continue; }
                dic .Add(s, null);
            }
        }

        public SqlString Terminate()
        {
            return new SqlString(result);
        }

        public void Read(BinaryReader r)
        {
            result = r .ReadString();
        }

        public void Write(BinaryWriter w)
        {
            string[] ss = new string[dic .Count];
            dic .Keys .CopyTo(ss, 0);
            w .Write(string .Join(sptr, ss));
        }
    }
}

- End -


Recommended>>
1、Dive Party Back to the Bulls Algorithm Engineer 2017 Study Internship Fall Recruiting Summary
2、Setting web directory permissions under linux
3、Go Language Community Initializing Memory Data Game List Data
4、ASPNETMVC5EF6EasyUI Backend Management System 31 MVC Using RDL Reports
5、Java calls to Python crawlers

    已推荐到看一看 和朋友分享想法
    最多200字,当前共 发送

    已发送

    朋友将在看一看看到

    确定
    分享你的想法...
    取消

    分享想法到看一看

    确定
    最多200字,当前共

    发送中

    网络异常,请稍后重试

    微信扫一扫
    关注该公众号