thedeemon | Special Olympics stuff

Inspired by Nikita's post about the performance difference between "high-level/functional" and "low-level/imperative" code. I took his example literally: building an array of number pairs. His original in Clojure:

(concat
  (mapv
    (fn [y] [from-x y])
    (range from-y (quot to-y 2)))
  (mapv
    (fn [y] [to-x y])
    (range (quot to-y 2) to-y)))

I have no idea how fast it runs.

As an experiment, I wrote both versions, the "functional" one and the "imperative" one, in D to see how well the compiler manages to unfold all this.

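import std.algorithm : map;
import std.array : array;
import std.range : chain, iota;

struct Point { int x, y; }

// "Functional": two lazy ranges mapped to points, chained, then materialized into an array
auto f(int from_y, int to_y, int from_x, int to_x) {
    return chain( iota(from_y, to_y/2).map!(y => Point(from_x, y)),
                  iota(to_y/2, to_y)  .map!(y => Point(to_x, y)) ).array;
}

// "Imperative": preallocate the array and fill it in a single loop
auto g(int from_y, int to_y, int from_x, int to_x) {
    auto res = new Point[to_y - from_y];
    foreach(y; from_y .. to_y)
        res[y - from_y] = Point(y < to_y / 2 ? from_x : to_x, y);
    return res;
}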

On 60 million points ( f(20_000_000, 80_000_000, 1, 2) ) the "functional" version takes 167 ms on my machine and the "imperative" one 146 ms, a difference of under 15%. Tolerable.
(full source here; compiler: LDC)
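For comparison on the JVM, here is a rough Java analogue of f and g. It is a sketch, not a benchmark: the Point class and the names Pairs, f and g are assumptions that mirror the D code, and Java streams stand in for D's chain/iota/map. One important difference: D's struct Point is stored inline in the array, while each Java Point here is a separate heap object referenced from the array.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical Java counterpart of the D struct Point; unlike a D struct,
// every instance is a separate object on the heap.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class Pairs {
    // "Functional" style: two ranges mapped to points, concatenated, collected into an array.
    static Point[] f(int fromY, int toY, int fromX, int toX) {
        return Stream.concat(
                IntStream.range(fromY, toY / 2).mapToObj(y -> new Point(fromX, y)),
                IntStream.range(toY / 2, toY).mapToObj(y -> new Point(toX, y)))
            .toArray(Point[]::new);
    }

    // "Imperative" style: preallocate the array and fill it in one loop.
    static Point[] g(int fromY, int toY, int fromX, int toX) {
        Point[] res = new Point[toY - fromY];
        for (int y = fromY; y < toY; y++) {
            res[y - fromY] = new Point(y < toY / 2 ? fromX : toX, y);
        }
        return res;
    }
}
```

Both versions assume fromY <= toY / 2, just like the D and Clojure originals.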

I tried writing this in Haskell too, but I'm a noob there and don't know how to cook it properly. Here's one variant:

import qualified Data.Vector.Unboxed as V
type Point = (Int, Int)

f :: Int -> Int -> Int -> Int -> V.Vector Point
f from_y to_y from_x to_x =
    let a = V.generate (to_y `div` 2 - from_y) (\y -> (from_x, y+from_y))
        b = V.generate (to_y - to_y `div` 2)   (\y -> (to_x, y + to_y `div` 2))
    in V.concat [a, b]

On the same input it runs in 1.33 s, i.e. roughly an order of magnitude slower. The other variants I tried were even slower. For example, with data Point = P {-# UNPACK #-} !Int !Int the type is no longer Unboxed, and with plain Data.Vector for such points the time goes above three seconds.
A question for Haskellers: how do you make an unboxed array of simple structs?

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MassiveLoop {
    @Param({"80000000"}) public int to_y;
    @Param({"20000000"}) public int from_y;
    @Param({"2"})        public int to_x;
    @Param({"1"})        public int from_x;

    // "List of pairs": one small int[2] array per point
    @Benchmark
    public int[][] testMassiveLoop() {
        int[][] ys = new int[to_y - from_y][2];
        for (int y = from_y; y < to_y; y++) {
            ys[y - from_y][0] = y < to_y / 2 ? from_x : to_x;
            ys[y - from_y][1] = y;
        }
        return ys;
    }

    // "Pair of lists": x coordinates in ys[0], y coordinates in ys[1]
    @Benchmark
    public int[][] testMassiveLoopTransposed() {
        int[][] ys = new int[2][to_y - from_y];
        for (int y = from_y; y < to_y / 2; y++) {
            ys[0][y - from_y] = from_x;
        }
        for (int y = to_y / 2; y < to_y; y++) {
            ys[0][y - from_y] = to_x;
        }
        for (int y = from_y; y < to_y; y++) {
            ys[1][y - from_y] = y;
        }
        return ys;
    }
    ...
}

# Run complete. Total time: 00:38:59

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell.

Benchmark                              (from_x)  (from_y)  (to_x)    (to_y)  Mode  Cnt     Score     Error  Units
MassiveLoop.testMassiveLoop                   1  20000000       2  80000000  avgt   25  4585.517 ± 360.835  ms/op
MassiveLoop.testMassiveLoopTransposed         1  20000000       2  80000000  avgt   25    78.890 ±   0.532  ms/op

# Run complete. Total time: 00:16:49

Benchmark                      (from_x)  (from_y)  (to_x)    (to_y)  Mode  Cnt   Score   Error  Units
MassiveLoop.testFlatArray             1  20000000       2  80000000  avgt   25  68.758 ± 2.180  ms/op
MassiveLoop.testFlatArrayLong         1  20000000       2  80000000  avgt   25  64.882 ± 1.768  ms/op
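The bodies of testFlatArray and testFlatArrayLong are not shown, so the following is only a guess at what such "flat" layouts might look like: a single int[] with x and y interleaved, and a long[] with each point packed into 64 bits. FlatLayouts, flatInterleaved, flatLong, pack, unpackX and unpackY are hypothetical names, not the benchmark's actual code.

```java
// Hypothetical sketches of two flat layouts; not the benchmark code from the thread.
final class FlatLayouts {
    // Interleaved flat array: point i occupies slots 2*i (x) and 2*i + 1 (y).
    // Assumes 2 * (toY - fromY) still fits in an int.
    static int[] flatInterleaved(int fromY, int toY, int fromX, int toX) {
        int n = toY - fromY;
        int[] a = new int[2 * n];
        for (int y = fromY; y < toY; y++) {
            int i = y - fromY;
            a[2 * i] = y < toY / 2 ? fromX : toX;
            a[2 * i + 1] = y;
        }
        return a;
    }

    // One long per point: x in the high 32 bits, y in the low 32 bits.
    // Masking y prevents sign extension from clobbering the x half.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long p) { return (int) (p >>> 32); }

    static int unpackY(long p) { return (int) p; }

    static long[] flatLong(int fromY, int toY, int fromX, int toX) {
        long[] a = new long[toY - fromY];
        for (int y = fromY; y < toY; y++) {
            a[y - fromY] = pack(y < toY / 2 ? fromX : toX, y);
        }
        return a;
    }
}
```

Both variants allocate exactly one array for all points, with no per-point objects.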

juan_gandhi

In Scala, all the lost time would also come from boxing/unboxing.
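A minimal Java sketch of where boxing shows up on the JVM (generic collections in Scala run into the same thing). BoxingDemo and its methods are hypothetical names; the only point is the contrast between a List<Integer> of boxed objects and a primitive int[].

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class BoxingDemo {
    // Boxed: boxed() goes through Integer.valueOf, so every element outside the small
    // cached range is a separate Integer object, and the list stores references to them.
    static List<Integer> boxedYs(int fromY, int toY) {
        return IntStream.range(fromY, toY).boxed().collect(Collectors.toList());
    }

    // Primitive: one int[] holding the values inline, no per-element objects.
    static int[] primitiveYs(int fromY, int toY) {
        return IntStream.range(fromY, toY).toArray();
    }

    // Reading the boxed list unboxes every element again.
    static long sumBoxed(List<Integer> ys) {
        long s = 0;
        for (Integer y : ys) s += y;
        return s;
    }

    static long sumPrimitive(int[] ys) {
        long s = 0;
        for (int y : ys) s += y;
        return s;
    }
}
```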

sassa_nf

The other expensive operation in all safe languages is going to be bounds checking.

Oh, absolutely. All the time.

Data.Vector.Unboxed.Mutable, probably

thedeemon

Same as Data.Vector.Unboxed here. It's for primitive types and instances of the `Unbox` type class. It's unclear to me how to fit my own struct in there.

Edited Date: 2019-06-26 08:50 pm (UTC)

If you just replaced the type of the vector, then it won't show the difference.

I think they usually do declare unpacked types like you did for Point. But the trick is also in knowing when it will create thunks and when it won't, when it will compute the full result and when it will only evaluate to WHNF. So (\y -> (to_x, y + to_y `div` 2)) may actually just produce a thunk that computes a tuple of to_x and a + with two arguments, one of which is a div with two arguments, and so on.

Those bangs in !Int tell it to store strict values, not thunks. So I can define that unpacked strict pair, but I don't know how to write an Unbox instance for it, and without one I cannot put my data type into Data.Vector.Unboxed.*. I could probably use some other kind of array, but it would just store references to my pairs allocated on the heap. I tried that with Data.Vector, and it was too slow.

By the way, if I understand correctly, with just tuples (Int,Int), Data.Vector.Unboxed stores the data as two linear arrays of Ints, that's kinda clever.
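The same "two linear arrays" idea can be sketched in Java as a struct-of-arrays container. PointsSoA and build are hypothetical names, and the fill logic is copied from the imperative g in the post; this only illustrates the layout and says nothing more about how Data.Vector.Unboxed is implemented.

```java
// Hypothetical struct-of-arrays container: all x coordinates in one array,
// all y coordinates in another, so no per-point object is ever allocated.
final class PointsSoA {
    final int[] xs;
    final int[] ys;

    PointsSoA(int n) {
        xs = new int[n];
        ys = new int[n];
    }

    int size() { return xs.length; }

    // Same logic as the imperative g: x switches from fromX to toX at toY / 2.
    static PointsSoA build(int fromY, int toY, int fromX, int toX) {
        PointsSoA p = new PointsSoA(toY - fromY);
        for (int y = fromY; y < toY; y++) {
            p.xs[y - fromY] = y < toY / 2 ? fromX : toX;
            p.ys[y - fromY] = y;
        }
        return p;
    }
}
```

Reading point i means reading xs[i] and ys[i], which is essentially what the "transposed" JMH benchmark does with its int[2][n] array.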

Yes, but won't it figure out that it needs unboxing only after the lambda has already returned a thunk for the tuple?

Ah, you mean the strict struct hasn't been created yet and the data is still floating around in thunks... I guess you're right, that could be the problem. I'll play with it more and try to force evaluation of the array elements.

FWIW, JMH (benchmarking harness for java):

JMH runs this for many iterations: it warms up first, then measures, then redoes the whole thing several times.

This results in massively different performance (I think there's lots of GC in the first test - list of pairs vs pair of lists):

Edited Date: 2019-06-28 06:26 am (UTC)

Hm, allocations + bounds checks make a lot of difference. With a class instead of int[2] it seems to be much faster, but of course still an order of magnitude slower than the version without that many allocations.
Here's another Java test with timings:
https://tonsky.livejournal.com/322036.html?thread=5438964#t5438964
https://tonsky.livejournal.com/322036.html?thread=5452020#t5452020

Invalid testing methodology.

Dmitry Popov
