Reading time ~4 minutes. This post is an analysis of a very interesting optimization proposed by Nicholas Frechette in the comments under the previous post. He proposed using one of the oldest tricks in the performance cookbook: divide and conquer. Well, it did not turn out as I expected. Saga: Before I go further, here are some links to the previous posts on the problem of calculating similarities and then optimizing. This thread grew to a few posts. Here are all of them: How I calculate similariti...
maklipsa | user
The Art of Programming 2872 days, 21 hours, 18 minutes ago 82
Reading time ~2 minutes. This post was inspired by a discussion on Reddit that followed my previous post. In this post, I will cover a suggestion by BelowAverageITGuy that cut down the total execution time by almost one hour. Saga: Before I go further, here are some links to the previous posts on the problem of calculating similarities and then optimizing; it grew to a few posts. Here are all of them: How I calculate similarities in cookit?; How to calculate 17 billion similarities; Independent code in ...
The Art of Programming 2884 days, 5 hours, 34 minutes ago 96
Reading time ~1 minute. This will be a quick erratum to the previous post. This time I will expand the oldest performance mantra: The fastest code is the one that doesn’t execute. Second to that is the one that executes once. Last time I forgot to mention one very important optimization. It was one of two steps that allowed me to go from 1530 to 484 seconds in the sample run. Saga: Before I go further, here are some links to the previous posts on the problem of calculating similarities and then...
Distributed Programming 2897 days, 7 hours, 3 minutes ago 49
Reading time ~6 minutes. Last time I showed how I went from 34 hours to 11. This time we go faster. To go faster I have to do less. The current implementation of Similarity iterates over one vector and checks whether each ingredient exists in the second one. Since those vectors are sparse, the chance of a miss is high. This means that I am wasting computational power on iterating and calling TryGetValue. How to iterate only over the mutually owned ones and do it fast? Saga: Before I go furth...
The Art of Programming 2894 days, 6 hours, 43 minutes ago 59
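The teaser above is cut off before the author's answer, so the following is only a sketch of one common way to visit just the shared entries of two sparse vectors; it is not necessarily what the post does. It assumes each vector is stored as two parallel lists, ingredient ids sorted in ascending order and their weights, and walks both with a two-pointer merge. The name `sparse_dot` and the storage layout are hypothetical.

```python
def sparse_dot(ids_a, vals_a, ids_b, vals_b):
    """Dot product of two sparse vectors stored as sorted id lists plus weights.

    Assumption: ids_a and ids_b are sorted ascending and contain no duplicates.
    """
    i = j = 0
    total = 0.0
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            # Shared ingredient: only these pairs contribute to the result.
            total += vals_a[i] * vals_b[j]
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            i += 1  # id present only in a, skip it
        else:
            j += 1  # id present only in b, skip it
    return total
```

Each step advances at least one pointer, so the loop costs at most `len(ids_a) + len(ids_b)` comparisons and needs no hash lookups.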
Reading time ~5 minutes. The previous post described the methodology I’ve used to calculate similarities between recipes in cookit. If you haven’t read it, give it 4 minutes, because it will make understanding this post easier. Go on, I’ll wait. It ended on a happy note and everything seemed to be downhill from there on. It was, until I tried to run it. It took long. Very long. How long? I don’t know, because I canceled it after about one hour. Going with a famous quote (probably from E...
The Art of Programming 2900 days, 6 hours, 38 minutes ago 141
Reading time ~5 minutes. Warning: this post contains some math. Even more, it shows how to use it for solving real-life problems. This post describes how I calculate similarity between recipes in my pet project cookit.pl. For those of you who don’t know, cookit is a search engine for recipes. It crawls websites extracting recipe texts, then parses them and tries to create a precise ingredient list with amounts and units. 182 184 recipes, 2936 ingredients. This scale may not seem huge, but tr...
The Art of Programming 2908 days, 5 hours, 30 minutes ago 56
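The teaser above stops before the math, but another entry on this page says the process uses cosine similarity. As a rough illustration only, here is a minimal sketch of cosine similarity between two recipes, assuming each recipe is a sparse vector stored as a dictionary from ingredient id to weight. The function name and the weighting are hypothetical and are not taken from the post.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {ingredient_id: weight}."""
    # Iterate over the smaller vector so fewer lookups are needed.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty recipe is treated as similar to nothing.
    return dot / (norm_a * norm_b)
```

Recipes with no ingredients in common score 0, and recipes with identical ingredient weights score 1.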
Reading time ~6 minutes. This post covers a subset of what I discuss in my talk How I stopped worrying and learned to love parallel processing (currently only in Polish). It covers how, in terms of performance, AsParallel can kick you in a place where it hurts a lot, while simultaneously being a blessing in terms of… performance. How is that? Let’s look at some history. AsParallel was introduced as an extension to LINQ with TPL in .NET 4.0. In theory, it’s a godsend. The promise w...
Architecture 2910 days, 3 hours, 37 minutes ago 117
Reading time ~4 minutes. Diagnosing high memory usage can be tricky; here is the second part of how I found what was hogging too much memory in our system. In the previous post I wrote about how to create a memory dump and how many options ProcDump offers for catching just the right moment for it. When trying to analyze memory leaks, or high memory usage (not necessarily meaning a leak), we have a few ways to approach it: Attach a debugger. There are many problems with this approach, to name a fe...
Reading time ~2 minutes. I’m taking a short break from the Hangfire series, but I will get back to it. This time: Where did my memory go? Or to be more exact: Why is this using so much memory? The story starts with one IIS application pool using around 6 gigabytes of memory on one of our test environments. It was several times above the values that we expected it to use, so we decided to investigate. Without much thinking we fired up Visual Studio installed on the test server, and attached to the proce...
Architecture 2969 days, 7 hours, 15 minutes ago 64
Reading time ~6 minutes. This is the sixth part of a series: part 1 - Why schedule and procrastinate jobs?; part 2 - Overview of Hangfire; part 3 - Scheduling and Queuing jobs in Hangfire; part 4 - Dashboard, retries and job cancellation; part 5 - Job continuation with ContinueWith; part 6 - Recurring jobs and cron expressions. Parts 3, 4, and 5 covered the BackgroundJob class responsible for enqueuing single jobs (fire and forget). This post will cover the RecurringJob class exposing the API for recurring jobs (as the name ...
Architecture 2973 days, 36 minutes ago 45
[EN] Don't do it now! Part 5. Hangfire details - job continuation with ContinueWith – IndexOutOfRange
Reading time ~3 minutes. This is the fifth part of a series: part 1 - Why schedule and procrastinate jobs?; part 2 - Overview of Hangfire; part 3 - Scheduling and Queuing jobs in Hangfire; part 4 - Dashboard, retries and job cancellation; part 5 - Job continuation with ContinueWith; part 6 - Recurring jobs and cron expressions. Part 3 covered almost all functions in the BackgroundJob class except for the ContinueWith family of functions. So here we go :) The fact that it has the same name as a System.Threading.Tasks.Task funct...
Architecture 2980 days, 3 hours, 49 minutes ago 77
Reading time ~3 minutes. This is the fourth part of a series discussing job scheduling and Hangfire details: part 1 - Why schedule and procrastinate jobs?; part 2 - Overview of Hangfire; part 3 - Scheduling and Queuing jobs in Hangfire; part 4 - Dashboard, retries and job cancellation. This part will cover a few small topics: the dashboard, retries, the more technical part of the Hangfire.BackgroundJob class API, and job cancellation. Dashboard: Let’s start with the administrative dashboard because it gives a good background for the ...
Architecture 2987 days, 9 hours, 2 minutes ago 55
Reading time ~2 minutes. This is the third part of a series discussing job scheduling and Hangfire details: part 1 - Why schedule and procrastinate jobs?; part 2 - Overview of Hangfire; part 3 - Scheduling and Queuing jobs in Hangfire; part 4 - Dashboard, retries and job cancellation. This part will focus on the basic scheduling API of Hangfire. The easiest way to create a fire-and-forget job is by using the class Hangfire.BackgroundJob and its minimalistic (and this is a compliment) API of static functions: Enqu...
Architecture 3004 days, 19 hours, 58 minutes ago 99
Reading time ~2 minutes. In the previous post I wrote about why I think the ability to schedule tasks for later execution is a fundamental technical feature, but also a must-have from a business point of view. We are past the whys, so let’s get to the hows. The answer is simple: Hangfire. I’ve written about it here, here and here, so yeah, I like it. Hangfire is an amazing library. It has proved itself in my pet project (cookit.pl) and in a huge ERP system that we are building at work, where we repla...
Architecture 3016 days, 7 hours, 18 minutes ago 153
Just how long does garbage collection take in .NET? Which generation takes longer?
The Art of Programming 3028 days, 2 hours, 58 minutes ago 124
One of the steps in cookit is calculating similar recipes. This is what you can see on the left on the recipe page like this. For the sake of clarity and manageability it’s scheduled as separate Hangfire jobs. Because cookit runs 5 workers, similarities are calculated for 5 websites concurrently. The process uses cosine similarity, so it allocates a huge list at the start and calculates similarities. A very CPU-heavy operation. So some time after triggering all recipes recalculation I saw this in...
Anyone who has done any HackerRank problems concerning performance has seen this phrase in the assignment: “watch out for slow IO”. We are used to thinking about files, databases and such as potentially slow IO, but the Console? Yes, and you will be amazed how much. A couple of words about the setup. I am using NLog with a file target (for normal logging) and a mail target (for total failure, and aggregated reports). When debugging or profiling I run the process as a con...
One of the main processes in cookit is extracting recipe information from raw HTML. I know it isn’t the most elegant solution, but it is the only universal one. But to the point. Every web page goes through a process involving HTML parsing, stemming, parsing, and n-gram token matching. Then it’s saved to SQL Server and, after transformation, to Solr. So a lot of string manipulation, math calculations and from time to time mostly 0-gen GC. In the most pessimistic case this process has to be r...
The Art of Programming 3098 days, 1 hour, 55 minutes ago 60
In the previous post I’ve written about new features in Neo4j. One of the new game-changing features was stored procedures. But, as I experienced, getting them to run in a Windows / .NET environment wasn’t that easy, and I was seeing “There is no procedure with the name …” more often than I wished for. So here is a short how-to. Hope to save you some googling.
Databases and XML 3115 days, 6 hours, 7 minutes ago 25
Last week I had the opportunity to attend Graph Connect Europe. Many great sessions, but one thing topped them all: Neo4j 3.0 is out! And as with the previous major release (it introduced Cypher), there are many bug fixes, tweaks, and speed improvements, but here are my personal favorites.
Databases and XML 3119 days, 53 minutes ago 28