
Saturday, December 3, 2016

Performance of different lock methods in .net


Something that pops up from time to time is the question of which thread synchronization technique to use when dealing with shared resources in a multi-threaded environment. And yes, I know that the best option would be to not share resources at all, but that is not always possible, especially when working with legacy code-bases and applications that have grown organically over time.

Shared resource access types

I've included the following operations in the test.
public interface ICacheTest
{
    void Add(CacheItem item);
    CacheItem Get(Guid id);
    CacheItem Search(Guid id);
    void Clear();
}


Add
Adds an item to the dictionary.

Get
Gets an item with key lookup.

Search
There will always be scenarios where a key lookup isn't enough. Most code-bases I've seen tend to go the LINQ way; some have extra lookup dictionaries. For this test I wanted to include something that takes a little more time than a lookup, so a LINQ query is used.

Techniques in the test

Unsafe

No thread synchronization; the test class basically just wraps a Dictionary<Guid, CacheItem>.
public class CacheUnsafe : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        _cache.Add(item.Id, item);
    }
    public CacheItem Get(Guid id)
    {
        if (_cache.ContainsKey(id))
            return _cache[id];
        return null;
    }
    public CacheItem Search(Guid id)
    {
        var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
        return item;
    }
    public void Clear()
    {
        _cache.Clear();
    }
}


Full lock

Using the lock keyword to allow only one thread at a time to execute within the block of code. This is usually my go-to method, as it is easy to implement and usually performs well enough.
public class CacheFullLock : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        lock (_cache)
        {
            _cache.Add(item.Id, item);
        }
    }
    public CacheItem Get(Guid id)
    {
        lock (_cache)
        {
            if(_cache.ContainsKey(id))  
                return _cache[id];
        }
        return null;
    }
    public CacheItem Search(Guid id)
    {
        lock (_cache)
        {
            var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
            return item;
        }
    }
    public void Clear()
    {
        lock (_cache)
        {
            _cache.Clear();
        }
    }
}

Full lock with indexed search

Same as the full lock, but also tries to optimize the search into a lookup. In this simplified test the lookup looks rather unnecessary, but in the real world it could be looking items up by multiple parameters instead of running the LINQ query.
public CacheItem Search(Guid id)
{
    lock (_cache)
    {
        if (_index.ContainsKey(id))
            return _cache[_index[id]];
        return null;
    }
}
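The post only shows the optimized Search; for completeness, here is a sketch of what the full variant could look like, with a hypothetical _index dictionary kept in sync in Add and Clear. In this simplified test the search key and the cache key are the same Guid, so the index looks redundant; in a real scenario the index key would be derived from other item properties.

```csharp
// Sketch only: the original post does not show how _index is maintained.
public class CacheFullLockIndexed : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    // Maps the search key to the cache key. Here both are the item Id,
    // so the index is redundant; with real data the keys would differ.
    private readonly Dictionary<Guid, Guid> _index = new Dictionary<Guid, Guid>();

    public void Add(CacheItem item)
    {
        lock (_cache)
        {
            _cache.Add(item.Id, item);
            _index[item.Id] = item.Id; // keep the lookup index in sync
        }
    }

    public CacheItem Get(Guid id)
    {
        lock (_cache)
        {
            if (_cache.ContainsKey(id))
                return _cache[id];
        }
        return null;
    }

    public CacheItem Search(Guid id)
    {
        lock (_cache)
        {
            if (_index.ContainsKey(id))
                return _cache[_index[id]];
            return null;
        }
    }

    public void Clear()
    {
        lock (_cache)
        {
            _cache.Clear();
            _index.Clear();
        }
    }
}
```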

ReaderWriterLock

Using the ReaderWriterLock class, which allows multiple readers to access the resource at the same time, but only one writer at a time.
public class CacheWriteLock : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    private readonly ReaderWriterLock _lock = new ReaderWriterLock();
    public void Add(CacheItem item)
    {
        _lock.AcquireWriterLock(1000);
        try
        {
            _cache.Add(item.Id, item);
        }
        finally
        {
            _lock.ReleaseWriterLock();
        }
    }
    public CacheItem Get(Guid id)
    {
        _lock.AcquireReaderLock(1000);
        try
        {
            if (_cache.ContainsKey(id))
                return _cache[id];
            return null;
        }
        finally
        {
            _lock.ReleaseReaderLock();
        }
    }
    public CacheItem Search(Guid id)
    {
        _lock.AcquireReaderLock(1000);
        try
        {
            var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
            return item;
        }
        finally
        {
            _lock.ReleaseReaderLock();
        }
    }
    public void Clear()
    {
        _lock.AcquireWriterLock(1000);
        try
        {
            _cache.Clear();
        }
        finally
        {
            _lock.ReleaseWriterLock();
        }
    }
}
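For reference, ReaderWriterLockSlim is the newer, lighter-weight replacement that Microsoft recommends over ReaderWriterLock for new code. It was not part of the measured test, but the read path would look roughly like this:

```csharp
// Sketch of the same Get using ReaderWriterLockSlim instead. Not part of
// the measured test; shown only as the modern alternative.
private readonly ReaderWriterLockSlim _slimLock = new ReaderWriterLockSlim();

public CacheItem GetSlim(Guid id)
{
    _slimLock.EnterReadLock(); // blocks while a writer holds the lock
    try
    {
        CacheItem item;
        return _cache.TryGetValue(id, out item) ? item : null;
    }
    finally
    {
        _slimLock.ExitReadLock();
    }
}
```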

ConcurrentDictionary

The built-in concurrent dictionary that is thread-safe. I tried to use this a long time ago when it was first introduced, but found it under-performing compared to a simple full-lock strategy. So it's up for the test again. At least it is easy to use, and you do not have to handle the thread safety yourself as it is handled internally.
public class CacheConcurrent : ICacheTest
{
    private readonly ConcurrentDictionary<Guid, CacheItem> _cache = new ConcurrentDictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        _cache.AddOrUpdate(item.Id, item, (key, oldValue) => item);
    }
    public CacheItem Get(Guid id)
    {
        if (_cache.ContainsKey(id))
            return _cache[id];
        return null;
    }
    public CacheItem Search(Guid id)
    {
        var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
        return item;
    }
    public void Clear()
    {
        _cache.Clear();
    }
}
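One side note on the Get above: the ContainsKey-then-indexer pattern performs two lookups, and if another thread removed the item between the two calls it would throw a KeyNotFoundException (this test never removes items concurrently, so it cannot trigger here). The single-lookup, race-free alternatives are TryGetValue and GetOrAdd:

```csharp
// Idiomatic single-lookup access to a ConcurrentDictionary. A sketch,
// not part of the measured test.
public CacheItem GetSafe(Guid id)
{
    CacheItem item;
    // One atomic lookup: no window between the existence check and the read.
    return _cache.TryGetValue(id, out item) ? item : null;
}

public CacheItem GetOrCreate(Guid id, Func<Guid, CacheItem> factory)
{
    // Creates and inserts the value only when the key is missing.
    return _cache.GetOrAdd(id, factory);
}
```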

Single-threaded test

Measuring ticks elapsed for the Add, Get, Search and Clear methods. The number of items is increased with each test epoch, and each epoch executes the test for 1000 iterations.
So no multi-threading here, just a check of the baseline differences between the solutions, i.e. the time to acquire the lock, the ability to cope with different numbers of elements, etc.

The test rig:

Uses a RunningAverage helper to calculate the average without storing each individual number.
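RunningAverage itself is not shown in the post; what is assumed here is an incremental mean that never stores the individual samples. A minimal sketch:

```csharp
// Minimal sketch of the RunningAverage helper used by the test rig:
// a streaming mean that keeps only the count and the current average.
public class RunningAverage
{
    private long _count;

    public float Average { get; private set; }

    public void Add(float value)
    {
        _count++;
        // Incremental mean update: avg += (x - avg) / n
        Average += (value - Average) / _count;
    }
}
```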

private void ExecuteTestSingle(ICacheTest cache, int dataCount, int iterations)
{
    var type = cache.GetType().Name;
    try
    {
        var testData = GenerateTestData(dataCount);
        var addTimings = new RunningAverage();
        var clearTimings = new RunningAverage();
        var getTimings = new RunningAverage();
        var searchTimings = new RunningAverage();
        for (int i = 0; i < iterations; i++)
        {
            for (int dataIndex = 0; dataIndex < testData.Count; dataIndex++)
            {
                var item = testData[dataIndex];
                addTimings.Add(TimeAction(() =>
                {
                    cache.Add(item);
                }));
                getTimings.Add(TimeAction(() =>
                {
                    cache.Get(item.Id);
                }));
                searchTimings.Add(TimeAction(() =>
                {
                    cache.Search(item.Id);
                }));
            }
            clearTimings.Add(TimeAction(cache.Clear));
        }
        Log($"{type}\tAdd\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{addTimings.Average:0.000}\tticks");
        Log($"{type}\tGet\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{getTimings.Average:0.000}\tticks");
        Log($"{type}\tSearch\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{searchTimings.Average:0.000}\tticks");
        Log($"{type}\tClear\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{clearTimings.Average:0.000}\tticks");
    }
    catch (Exception ex)
    {
        Log($"{type}\tFail\tDataCount\t{dataCount}\tIterations:\t{iterations}\tFailed\t{ex.Message}");
    }
}

private float TimeAction(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.ElapsedTicks;
}

Multi-threaded test

3 threads executing at the same time.

  • 1 thread Adds
  • 1 thread Gets
  • 1 thread Searches
The number of successful operations within the allowed timespan (50 milliseconds) is counted.
The test is executed 1000 times per type and an average is calculated.

The test rig:

private void ExecuteTestMulti(ICacheTest cache, int dataCount, int iterations)
{
    var type = cache.GetType().Name;
    try
    {
        var testData = GenerateTestData(dataCount);
        var getAverage = new RunningAverage();
        var searchAverage = new RunningAverage();
        var writeAverage = new RunningAverage();
        for (int i = 0; i < iterations; i++)
        {
            foreach (var data in testData)
                cache.Add(data);

            var d = testData.GetRandom();
            var startTime = DateTime.UtcNow.AddMilliseconds(8);
            var getter = CreateThread(startTime, () => { cache.Get(d.Id); });
            var searcher = CreateThread(startTime, () => { cache.Search(d.Id); });
            var writer = CreateThread(startTime, () => { cache.Add(CacheItem.Create()); });

            getter.Thread.Start(startTime);
            writer.Thread.Start(startTime);
            searcher.Thread.Start(startTime);
            getter.Thread.Join();
            searcher.Thread.Join();
            writer.Thread.Join();
            if (!getter.Verify(type, Log)
                || !searcher.Verify(type, Log)
                || !writer.Verify(type, Log))
            {
                return;
            }
            getAverage.Add(getter.WorkIterations);
            searchAverage.Add(searcher.WorkIterations);
            writeAverage.Add(writer.WorkIterations);
            cache.Clear();
        }
        Log($"{type}\tGets\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{getAverage.Average:0}\tgets");
        Log($"{type}\tSearch\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{searchAverage.Average:0}\tsearches");
        Log($"{type}\tWrites\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{writeAverage.Average:0}\twrites");
    }
    catch (Exception ex)
    {
        Log($"{type}\tFail\tDataCount\t{dataCount}\tIterations:\t{iterations}\tFailed\t{ex.Message}");
    }
}

private WorkerResult CreateThread(DateTime start, Action action)
{
    var work = new WorkerResult();
    work.Thread = new Thread(context =>
    {
        try
        {
            // Spin until the shared start time so all worker threads begin together
            while (DateTime.UtcNow < start)
                Thread.Sleep(0);
            // Run the operation as many times as possible within 50 ms
            var sw = Stopwatch.StartNew();
            while (sw.ElapsedMilliseconds < 50)
            {
                action();
                work.Increment();
            }
        }
        catch (Exception ex)
        {
            work.Exception = ex;
        }
    })
    {
        IsBackground = true
    };
    return work;
}

private class WorkerResult
{
    public Thread Thread { get; set; }
    public Exception Exception { get; set; }
    public float WorkIterations => _workIterations;
    private float _workIterations;

    public void Increment()
    {
        _workIterations++;
    }

    public bool Verify(string type, Action<string> logger)
    {
        if (Exception != null)
        {
            logger($"{type}\tFail\tDataCount\t-\tIterations:\t-\tFailed\t{Exception.Message}");
            return false;
        }
        return true;
    }
}

Results

This post only contains the summary.
If you are interested in all the timings, I've posted the full table in a previous post.

Starting with the single threaded results





For the 16-item test, the ConcurrentDictionary is clearly the loser. The other strategies are at the same level, and for this small number of items it does not seem to matter much which strategy you choose.






Not much surprising here: the ConcurrentDictionary lost the search category, and unsurprisingly the purpose-built indexed lookup was the fastest, which became clear with a little more data.
The Add and Get categories were won by the unsafe implementation; not really interesting for this test, but good as a baseline. The ConcurrentDictionary seems to have lost most categories.

Multi-threaded results

The unsafe version did not make it here, due to the 'Collection was modified; enumeration operation may not execute' exception. That is the main reason we use locks in a multi-threaded environment ;)
So here we only look at the 4 remaining contestants.



The full lock strategy falls behind here, probably because all operations lock each other out, i.e. a long-running search locks out the other readers. The switch to a full lock with a lookup dictionary instead of a LINQ query won the search count by far, but that tends to be the case whenever you write a non-generic bit of code to optimize for your exact scenario. So no surprise there.
The ConcurrentDictionary actually holds its own quite well, as long as you use it as a dictionary only; doing searches in it seems to be very slow.

Conclusions

The numbers jump around a little, and the main conclusion I can draw is that if performance is key to your success, you should do your own performance tests for your exact scenario. For the general case, I think I'll stick with the full-lock strategy, with extra indexes if needed, for now. The extra complexity of a ReaderWriterLock is in my mind not worth the risk of me or someone else screwing it up, at least not for a one-writer, two-readers scenario.


All code is provided as-is. It is copied from my own code-base and may need some additional programming to work. Hope this helps someone out there :)



Thursday, September 15, 2016

Knowledge based reasoning in .net c#


I've lately been working on a side project in AI. One of its key parts is to figure out how to include a knowledge base and let the program reason about things itself. So, how do you model knowledge?


Other articles in the AI Knowledge Based Reasoning series on this site:
Knowledge based reasoning in .net c#
Reasoning in Open or Closed models
Logic Expression evaluation with open-world assumption
Expression evaluation on object based models
Expression evaluation over time
Expression evaluation over time - Was?

Prolog

Somewhere in the back of my head I have a class that I took back in university that scratched the surface of the Prolog language.
In a very simplified way the Prolog language uses facts and rules.

Facts in Prolog are in the form:

cat(pixel)

In other words, it is a known fact that pixel is a cat.
You can then go ahead and ask for all the cats and get back pixel.
The fun starts when you define the rule that a cat is an animal:

animal(X) :- cat(X)

and start asking about animals and get back pixel.
As a special case, you can add a tuple and thus basically get a relation:

brotherOf(prime, pixel)
brotherOf(tiger, pixel)

meaning that prime and tiger are brothers of pixel.

As said, this is a very vague memory from the back of my head. If you are looking for a Prolog tutorial, then this is not the post for you.

Modelling knowledge

So, my approach to modelling knowledge is influenced by Prolog, but that's about it. As I want the application to create the knowledge by itself, there is no need for a complicated textual representation, just a model that can be added to and queried. The textual representation used in this article is just there to make it easier to write about.
The model will be:

attribute(variable)
relation(variable1, variable2)
variable1 => variable2

In this first version it is also assumed that all variables are strings. The attribute, relation and implication names are also treated as strings.

Some code for the knowledge model:

public class KnowledgeAttribute
{
 public string Attribute { get; set; }
 public string Subject { get; set; }
 public override string ToString()
 {
  return $"'{Attribute}'('{Subject}')";
 }
}
public class KnowledgeImplication
{
 public string Implicator { get; set; }
 public string Implied { get; set; }
 public override string ToString()
 {
  return $"'{Implicator}' => '{Implied}'";
 }
}
public class KnowledgeRelation
{
 public string Relation { get; set; }
 public string Subject { get; set; }
 public string Target { get; set; }
 public override string ToString()
 {
  return $"'{Relation}'('{Subject}', '{Target}')";
 }
}

So nothing too complicated there. Let's, for simplicity, store them in a holding object:
public class KnowledgeModel
{
 public Dictionary<string, KnowledgeAttribute> Attributes { get; set; }
 public Dictionary<string, KnowledgeRelation> Relations { get; set; }
 public Dictionary<string, KnowledgeImplication> Implications { get; set; }

 public KnowledgeModel()
 {
  Attributes = new Dictionary<string, KnowledgeAttribute>();
  Relations = new Dictionary<string, KnowledgeRelation>();
  Implications = new Dictionary<string, KnowledgeImplication>();
 }
}

Querying

Now we can start querying the model. To get all variables with the same attribute:
public HashSet<string> ListAllWith(string attribute)
{
 return new HashSet<string>(from x in _model.Attributes.Values
  where
   x.Attribute.Equals(attribute, StringComparison.OrdinalIgnoreCase)
  select x.Subject
  );
}
Get variables matching a list of attributes
public HashSet<string> ListAllWith(IEnumerable<string> attributes)
{
 HashSet<string> result = null;
 foreach (var attribute in attributes)
 {
  var found = ListAllWith(attribute);
  if (found.Count == 0)
   return new HashSet<string>(); // one attribute matched nothing, so nothing matches all

  if (result == null)
   result = found;
  else
   result.IntersectWith(found);

  if (result.Count == 0)
   break; // nothing matches all attributes
 }
 return result ?? new HashSet<string>();
}
Get all implications
private HashSet<string> GetAllImplications(string implied)
{
 // The visited set guards against implication cycles (e.g. a => b and
 // b => a) that would otherwise recurse forever.
 return GetAllImplications(implied, new HashSet<string>(StringComparer.OrdinalIgnoreCase));
}
private HashSet<string> GetAllImplications(string implied, HashSet<string> visited)
{
 var result = new HashSet<string>();
 if (!visited.Add(implied))
  return result; // already expanded
 result.UnionWith(from x in _model.Implications.Values
  where x.Implied.Equals(implied, StringComparison.OrdinalIgnoreCase)
  select x.Implicator);
 var chained = new HashSet<string>();
 foreach (var item in result)
 {
  chained.UnionWith(GetAllImplications(item, visited));
 }
 result.UnionWith(chained);
 result.Add(implied);
 return result;
}
Or all related
public HashSet<string> ListAllRelated(string relationType, string variable)
{
 var relationTypes = GetAllImplications(relationType);
 return new HashSet<string>(from x in _model.Relations.Values
  where
  relationTypes.Contains(x.Relation)
  && x.Target.Equals(variable, StringComparison.OrdinalIgnoreCase)
  select x.Subject
  );
}
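Putting it together: the query methods above reference a _model field, so assume they live in a small wrapper class (called KnowledgeBase here; the post does not show it) that takes the KnowledgeModel in its constructor. Also assumed: the dictionaries are keyed on the ToString() representation, though any unique key would work. Usage could then look like this:

```csharp
// Hypothetical usage sketch; KnowledgeBase and the ToString() keys are
// assumptions, not shown in the post.
var model = new KnowledgeModel();

// Fact: cat(pixel)
var fact = new KnowledgeAttribute { Attribute = "cat", Subject = "pixel" };
model.Attributes.Add(fact.ToString(), fact);

// Rule: animal(X) :- cat(X), expressed as the implication cat => animal
var rule = new KnowledgeImplication { Implicator = "cat", Implied = "animal" };
model.Implications.Add(rule.ToString(), rule);

// Relations: brotherOf(prime, pixel) and brotherOf(tiger, pixel)
foreach (var brother in new[] { "prime", "tiger" })
{
    var rel = new KnowledgeRelation { Relation = "brotherOf", Subject = brother, Target = "pixel" };
    model.Relations.Add(rel.ToString(), rel);
}

var kb = new KnowledgeBase(model);
var cats = kb.ListAllWith("cat");                       // { "pixel" }
var brothers = kb.ListAllRelated("brotherOf", "pixel"); // { "prime", "tiger" }
```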

So, now just fill it with data and start querying.
Hope this helps someone out there.

All code is provided as-is. It is copied from my own code-base and may need some additional programming to work.

For example source code, head over to my github repository and play around for yourself:

Good luck :)

This is part of a series of posts regarding Knowledge modelling and expression evaluation.
The next part is
Logic Expression evaluation with open-world assumption