Reading time ~5 minutes Warning this post contains some math. Even more, it shows how to use it for solving real life problems. This post describes how I calculate similarity between recipes in my pet project cookit.pl. For those of you that don’t know, cookit is a search engine for recipes. It crawls websites extracting recipe texts, then it parses it and tries to create a precise ingredient list with amounts and units. 182 184 recipes2936 ingredients This scale may not seem huge, but tr...
maklipsa | użytkownik
Reading time ~6 minutes This post is covering a subset of what I am talking in my talk How I stopped worrying and learned to love parallel processing (currently only in polish). This will cover on how, in terms of performance, AsParallel can kick you in a place where it hurts a lot, simultaneously being a blessing in terms of… performance. How is that? Let’s look at someHistory AsParallel was introduced as an extension to LINQ with TPL in .NET 4.0. In theory, it’s God’s sent. The promise w...
Reading time ~4 minutes Diagnosing high memory usage can be tricky, here is the second part of how I found what was hogging to much memory in our system. In the previous post I’ve wrote how to create a memory dump and how many possibilities of catching just the right moment for it ProcDump has. When trying to analyze memory leaks, or high memory usage (not necessary meaning a leak) we have a few ways to approach it: Attach a debugger There are many problems with this approach, to name a fe...
Reading time ~2 minutes I’m taking a short break from Hangfire series, but I will get back to it. This time - Where did my memory go ? Or to be more exact: Why is this using so much memory? The story starts with one IIS application pool using around 6 Gigabytes of memory on one of our test environments. It was several times above the values that we expected it to use, so we decided to investigate. Without much thinking we fired up Visual Studio installed on the test server, and attached to the proce...
Reading time ~6 minutes This is a sixth part of a series:part 1 - Why schedule and procrastinate jobs?part 2 - Overview of Hangfiepart 3 - Scheduling and Queuing jobs in Hangfirepart 4 - Dashboard, retries and job cancellationpart 5 - Job continuation with ContinueWithpart 6 - Recurring jobs and cron expressions Parts 3, 4, and 5 covered the BackgroundJob class responsible for enqueuing single jobs (fire and forget). This post will cover RecurringJob class exposing API for recurring jobs (as the name ...
Reading time ~3 minutes This is a fifth part of a series:part 1 - Why schedule and procrastinate jobs?part 2 - Overview of Hangfiepart 3 - Scheduling and Queuing jobs in Hangfirepart 4 - Dashboard, retries and job cancellationpart 5 - Job continuation with ContinueWithpart 6 - Recurring jobs and cron expressions Part 3 covered almost all functions in BackgroundJob class except for ContinueWith functions family. So here we go :) The fact that it has the same name as a System.Threading.Tasks.Task funct...
Reading time ~3 minutes This is the fourth part of a series discussing job scheduling and Hangfire details:part 1 - Why schedule and procrastinate jobs?part 2 - Overview of Hangfiepart 3 - Scheduling and Queuing jobs in Hangfirepart 4 - Dashboard, retries and job cancellation This part will cover few small topics:dashboardretriesmore technical part of the Hangfire.BackgroundJob class APIjob cancellationDashboard Let’s start with the administrative dashboard because it gives a good background for the ...
Reading time ~2 minutes This is the third part of a series discussing job scheduling and Hangfire details:part 1 - Why schedule and procrastinate jobs?part 2 - Overview of Hangfiepart 3 - Scheduling and Queuing jobs in Hangfirepart 4 - Dashboard, retries and job cancellation This part will focus on the basic scheduling API of Hangfire. The easiest way to create a fire and forget job is by using the classHangfire.BackgroundJob and its minimalistic (and this is a complement) API of static functions:Enqu...
Reading time ~2 minutes In the previous post I’ve wrote about why I think the ability to schedule tasks for later execution is a fundamental technical feature, but also a must have from a business point of view. We are passed the whys, so lets get to the hows. The answer is simple - Hangfire. I’ve wrote about it here, here and here, so yeah, I like it. Hangfire is an amazing library. It has proved itself in my pet project (cookit.pl) and in a huge ERP system that we are building at work, where we repla...
One of the steps in cookit is calculating similar recipes. This is what you can see on the left on the recipe page like this For the sake of clarity and manageability it’s scheduled as separate Hangfire jobs. Because cookit is running 5 workers, so similarities are calculated for 5 websites concurrently. The process uses cosine similarity, so it allocates a huge list at start and calculates similarities. A very CPU heavy operation. So some time after triggering all recipes recalculation I saw this in...
Anyone who made any HackerRank problems considering performance has seen this phrase in the assignment: “watch out for slow IO”. We are used to thing about files, databases and such as potentially slow IO, but the Console? Yes, and you will be amazed how much. Couple words about the setup. I am using NLog with file target (for normal logging) and mail target (for total failure, and aggregated reports). When debugging or profiling I run the process as a con...
One of the main processes in cookit is dealing with extracting recipe information from raw html. I know it isn’t the most elegant solution but it is the only universal one. But to the point. Every web page goes through a process involving html parsing, stemming, parsing, and n-gram token matching. Then it’s saved to Sql Server and after transformation to Solr. So a lot of string manipulation, math calculations and from time to time mostly 0-gen GC. In the most pessimistic case this process has to be r...
In the previous post I’ve written about new features in Neo4j. One of the new game changing functions were stored procedures. But, as I experienced, getting them to run on a Windows / .NET environment wasn’t that easy, and I was seeing “There is no procedure with the name …” more often then I wished for. So here is a short how to. Hope to save you some googling.
Last week I had the opportunity to attend Graph Connect Europe. Many great sessions, but one thing topped them all - Neo4j 3.0 is out! And as with previous major release (it introduced Cypher) there are many bug fixes, tweaks, speed improvements, but here are my personal favorites.
Ostatnio na blogu
macko (32 816,53)
http://pawlos.blo... (31 383,4)
pzielinski (27 178,29)
gordon_shumway (21 178,87)
paduda (20 336,33)
psz750 (13 018,14)
rroszczyk (10 381,46)
Damian (9 011,08)
danielplawgo (7 235,99)
arek (6 642,75)
burczu (6 214,22)
PaSkol (5 393,84)
lukaszgasior (4 097,38)
jj09 (3 388,06)
http://jakub-flor... (3 224,66)
CaMeL (2 954,87)
jedmac (2 639,34)
mnikolajuk (2 596,93)
lkurzyniec (2 466,02)
FutureProcessing (2 450,11)