riley's blog

Thoughts, baseball, engineering

Is Bi-Directional Replication (BDR) in Postgres Transactional?

Recently in the #general channel of the Postgres Slack I asked whether BDR applies changes at the transaction (TX) level or the table level. For those of you unfamiliar with BDR, it’s short for bi-directional replication, and it enables multi-master setups using Postgres. It is particularly suited for geographically distributed applications that have to alter data but don’t want to pay the speed-of-light penalty in latency.

An_message: Format Agnostic Data Transfer

This post originally appeared on the AppNexus techblog; I am reproducing it here to keep a log of the work.

Every distributed RESTful system has a communication problem. How does Service A communicate with Service B? Does it pass data via multipart/form-data? Does it pass individual fields on the query string? Does it POST a blob of JSON?

With the proliferation of “RESTful” services, the trend is decidedly towards JSON and away from XML. JSON is relatively compact and fast to parse (at least for most services the bottleneck is not parsing the JSON). This works well for most “wait based” services (database lookups, file reads, etc.). However, there is a class of services in the ad-tech space (and elsewhere) with more stringent SLAs, for which JSON parsing is actually a significant portion of the runtime of a single request. For these services we can do better while still keeping the schematic safety of JSON in place.
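To make the idea concrete, here is a minimal sketch of a fixed-layout binary record in C. It is not the an_message format: the struct bid, its fields, the one-byte version tag, and the function names are all hypothetical, chosen only to show how a binary encoding can skip text parsing while still rejecting input that does not match the expected layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, for illustration only (not the an_message schema). */
struct bid {
    uint32_t user_id;
    uint32_t price_micros;
};

#define BID_SCHEMA_VERSION 1u
#define BID_WIRE_SIZE ((size_t)9) /* 1 version byte + two 4-byte fields */

/* Store a 32-bit value as little-endian bytes, independent of host order. */
static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
size_t bid_encode(const struct bid *b, uint8_t *buf, size_t cap)
{
    assert(b != NULL && buf != NULL);
    if (cap < BID_WIRE_SIZE)
        return 0;
    buf[0] = (uint8_t)BID_SCHEMA_VERSION;
    put_u32le(buf + 1, b->user_id);
    put_u32le(buf + 5, b->price_micros);
    return BID_WIRE_SIZE;
}

/* Returns 0 on success, -1 if the length or version tag does not match. */
int bid_decode(const uint8_t *buf, size_t len, struct bid *out)
{
    assert(buf != NULL && out != NULL);
    if (len != BID_WIRE_SIZE || buf[0] != BID_SCHEMA_VERSION)
        return -1;
    out->user_id = get_u32le(buf + 1);
    out->price_micros = get_u32le(buf + 5);
    return 0;
}
```

Decoding here is a length check, a version check, and two fixed-offset loads, with no tokenizing or number parsing. The version byte is a deliberately crude stand-in for schema checking; a real format would need a story for optional fields and schema evolution.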

Color-theme

I have been evaluating CLion from JetBrains. Lack of Makefile support makes it a little unwieldy in our environment. One thing they do get very right is the default color theme. Easy on the eyes, yet very clear.

I have attempted to duplicate this theme in Emacs (Aquamacs).

CLion defaults look like this:

My emacs copy (so far):

Emacs treats types differently than CLion does, so I can’t match it exactly. I also prefer that the types are strongly colored rather than off-white.

This relies on color-theme and here is the theme so far:

(defun color-theme-clion ()
  "Mimic CLion defaults"
  (interactive)
  (let ((color-theme-is-cumulative t))
    (color-theme-midnight)
    (color-theme-install
     '(color-theme-clion
       ((foreground-color . "#B7C4C8")
        (background-color . "#393939")
        (background-mode . dark))
       (default ((t (nil))))
       (region ((t (:foreground "cyan" :background "dark cyan"))))
       (underline ((t (:foreground "yellow" :underline t))))
       (modeline ((t (:foreground "dark cyan" :background "wheat"))))
       (modeline-buffer-id ((t (:foreground "dark cyan" :background "wheat"))))
       (modeline-mousable ((t (:foreground "dark cyan" :background "wheat"))))
       (modeline-mousable-minor-mode ((t (:foreground "dark cyan" :background "wheat"))))
       (italic ((t (:foreground "dark red" :italic t))))
       (bold-italic ((t (:foreground "dark red" :bold t :italic t))))
       (font-lock-warning-face ((t (:foreground "Firebrick"))))
       (font-lock-keyword-face ((t (:foreground "#D78B40"))))
       (font-lock-constant-face ((t (:foreground "#9C9B30"))))
       (font-lock-comment-face ((t (:foreground "#929292"))))
       (font-lock-doc-face ((t (:foreground "#929292"))))
       (font-lock-string-face ((t (:foreground "#7C9769"))))
       (font-lock-builtin-face ((t (:foreground "#9C9B30"))))
       (font-lock-variable-name-face ((t (:foreground "#B7C4C8"))))
       (font-lock-function-name-face ((t (:foreground "#B7C4C8"))))
       (bold ((t (:bold t))))))))

-march -mtune, What's the Difference?

Recently David Iserovich and I at AppNexus ran into an issue with build scripts while porting our builds to use OBS. Prior to building via OBS, we had dedicated, older, build machines which would build our releases.

At the same time I was upgrading Coverity to the latest release for better static analysis of our stack. During this Coverity upgrade I was wrestling with getting our various apps to compile from a single trigger script. Turns out that some of our apps did:

gcc ... -march=native -mtune=native

Using -march=native tells the compiler to generate machine code for the processor it is currently running on, using any instruction set extensions that chip supports. This will generate the best possible code for that chipset, but the compiled object will likely break on older chipsets that lack those instructions. -mtune=native will “tune” the optimized code to run best on the current chipset but keeps it compatible with older chipsets. It is important to note here that -march trumps -mtune. If you specify them both (like we were), you will get optimized code that can only run on that chip or newer chips.
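One way to see the difference is to ask the compiler which instruction set extensions it was allowed to assume. The sketch below is only an illustration and is not taken from our build scripts: the function name compiled_isa is hypothetical, and it relies on the predefined macros (__SSE2__, __SSE4_2__, __AVX__, __AVX2__) that GCC-compatible compilers define when the corresponding extension is enabled for the build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Writes a space-separated list of a few x86 extensions that the compiler
 * was permitted to use for this translation unit. This reflects the build
 * flags (-march and friends), not the CPU the binary ends up running on.
 * Returns the length of the resulting string.
 */
size_t compiled_isa(char *buf, size_t cap)
{
    if (cap == 0)
        return 0;
    assert(buf != NULL);
    buf[0] = '\0';
#ifdef __SSE2__
    strncat(buf, "sse2 ", cap - strlen(buf) - 1);
#endif
#ifdef __SSE4_2__
    strncat(buf, "sse4.2 ", cap - strlen(buf) - 1);
#endif
#ifdef __AVX__
    strncat(buf, "avx ", cap - strlen(buf) - 1);
#endif
#ifdef __AVX2__
    strncat(buf, "avx2 ", cap - strlen(buf) - 1);
#endif
    return strlen(buf);
}
```

Printing the result from a build with -march=native and from a build with only -mtune=native should show the gap: on an AVX-capable build machine one would expect the first to list avx and the second not to, since -mtune on its own leaves the baseline instruction set unchanged.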

In practice this wasn’t an issue for us because our sad build machine was old. Until OBS, that is. The OBS build server had a shiny new Xeon E5 Sandy Bridge chipset while the old, sad, outcast build machine had:

model name   : Intel(R) Xeon(R) CPU           L5630  @ 2.13GHz

So -march used on the L5630 machine would produce optimized code for that chip. The newer E5 Sandy Bridge chips in production would run fine because they contained all the instructions already included in the L5630 chips on the build machine. However, when we started building on OBS and an E5 chip, we decided to test in a sandbox environment (which had the older L5630s) and you can imagine what happened:

Why Are My Page_faults So High in Perf?

Interesting behavior I have been dealing with lately.

Have you seen this perf profile before?

  PerfTop:   12025 irqs/sec  kernel:21.0%  exact:  0.0% [1000Hz cycles],  (all, 12 CPUs)
------------------------------------------------------------------------------------------------

             samples  pcnt function                                 DSO
             _______ _____ ________________________________________ ____________________________

            19739.00 22.3% page_fault                               [kernel.kallsyms]
             9475.00 10.7% __GI_time                                /lib64/libc-2.5.so
             4934.00  5.6% free                                     /usr/lib64/libjemalloc.so.1
             4816.00  5.4% arena_dalloc_bin_locked                  /usr/lib64/libjemalloc.so.1
             3111.00  3.5% calloc                                   /usr/lib64/libjemalloc.so.1
             2399.00  2.7% __GI_vfprintf                            /lib64/libc-2.5.so
             2258.00  2.6% malloc                                   /usr/lib64/libjemalloc.so.1
             2176.00  2.5% arena_tcache_fill_small                  /usr/lib64/libjemalloc.so.1

I had not seen it. Probably I had not seen it because no one does this many syscalls in high performance code (dur). I googled it and nothing useful came up aside from the normal major/minor page_fault stuff explained by the paging model in the Linux kernel.

I then asked the smartest guy I know, and he pointed me in the right direction. What follows is the explanation in case you run into this. I suspect no one else will run into this because it has to do with libc/kernel mismatches.

SDF Public Access UNIX System

I am hosting this blog on sdf.org. I realized recently that I have been a member since 2004!

[0:ns/r/riley> uinfo
Created:      Thu Dec 16 22:04 2004  on ttyp3
Validated:    Thu Dec 16 22:24 2004
Joined ARPA:  Thu Dec 16 22:24 2004

If you don’t know what it is, please visit the link above and join if you want; it’s free!

Extremely useful when you need a shell from anywhere in the world, backups of small things, simple hosting, general awesomeness.

Consider donating, or buying a membership at a higher level.

Support public access.

Welcome

Inspired by a colleague, I am going to blog about stuff. Welcome here.