thedeemon: (Default)
[personal profile] thedeemon
Prompted by Nikita's post on the performance gap between "high-level/functional" and "low-level/imperative" code. I took his example of building an array of number pairs literally as is. His original in Clojure:
(concat
  (mapv
    (fn [y] [from-x y])
    (range from-y (quot to-y 2)))
  (mapv
    (fn [y] [to-x y])
    (range (quot to-y 2) to-y)))

How fast it runs, I have no idea.

As an experiment, I tried writing both versions, the "functional" one and the "imperative" one, in D, to see how well the compiler manages to flatten the whole thing out.
import std.range, std.algorithm, std.array;

struct Point { int x, y; }

auto f(int from_y, int to_y, int from_x, int to_x) {
    return chain( iota(from_y, to_y/2).map!(y => Point(from_x, y)),
                  iota(to_y/2, to_y)  .map!(y => Point(to_x, y)) ).array;   
}

auto g(int from_y, int to_y, int from_x, int to_x) {
    auto res = new Point[to_y - from_y];
    foreach(y; from_y .. to_y)
        res[y - from_y] = Point(y < to_y / 2 ? from_x : to_x, y); 
    return res;
}

On 60 million points ( f(20_000_000, 80_000_000, 1, 2) ) the "functional" version takes 167 ms on my machine and the "imperative" one 146 ms, a difference within 15%. Tolerable.
(full source here, compiler: LDC)

I also tried writing this in Haskell, but I'm a noob there and don't know how to cook it properly. This variant
import qualified Data.Vector.Unboxed as V
type Point = (Int, Int)

f :: Int -> Int -> Int -> Int -> V.Vector Point
f from_y to_y from_x to_x =
    let a = V.generate (to_y `div` 2 - from_y) (\y -> (from_x, y+from_y))
        b = V.generate (to_y - to_y `div` 2)   (\y -> (to_x, y + to_y `div` 2))
    in V.concat [a, b]

runs in 1.33 s on the same input, i.e. roughly an order of magnitude slower. The other variants I tried are even slower. For example, with data Point = P {-# UNPACK #-} !Int !Int it is no longer Unboxed, and if I just use plain Data.Vector for it, the time is already over three seconds.
Question for the Haskellers: how do you make an unboxed array of simple structs?

Date: 2019-06-26 01:44 pm (UTC)
juan_gandhi: (Default)
From: [personal profile] juan_gandhi
In Scala all the time would likewise be lost to boxing/unboxing.

Date: 2019-06-27 07:57 pm (UTC)
From: [personal profile] sassa_nf
The other expensive operation in all safe languages is going to be bounds check.

Date: 2019-06-27 08:54 pm (UTC)
juan_gandhi: (Default)
From: [personal profile] juan_gandhi
Oh, absolutely. All the time.

Date: 2019-06-26 04:24 pm (UTC)
From: [personal profile] sassa_nf
Data.Vector.Unboxed.Mutable, probably
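
For example, a minimal sketch along those lines, assuming the same (Int, Int) alias for Point as in the post (the fMut name and the helper loop are made up here): the loop fills a mutable unboxed vector, and V.create freezes it in place without an extra copy.

{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

type Point = (Int, Int)

-- Fill a mutable unboxed vector imperatively, then freeze it.
fMut :: Int -> Int -> Int -> Int -> V.Vector Point
fMut from_y to_y from_x to_x = V.create $ do
    mv <- MV.new (to_y - from_y)
    let half = to_y `div` 2
        go !y
          | y >= to_y = return mv
          | otherwise = do
              let x = if y < half then from_x else to_x
              MV.write mv (y - from_y) (x, y)
              go (y + 1)
    go from_y

If the bounds check mentioned above shows up in profiles, MV.unsafeWrite is the unchecked variant of MV.write.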

Date: 2019-06-26 09:05 pm (UTC)
From: [personal profile] sassa_nf
If you just replaced the type of the vector, then it won't show the difference.

I think usually they actually declare the unpacked types like you did for Point. But the trick is also in knowing when it will create thunks, and when it won't, and when it will compute the result, and when it will only work out WHNF. So (\y -> (to_x, y + to_y `div` 2)) may actually just work out a thunk computing a tuple of to_x and + with two args, one of which is a div with two args, etc.
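
For a declared type like that Point, one route (an assumption on my part, not something tried in the post) is the vector-th-unbox package: its derivingUnbox Template Haskell derives an Unbox instance from a conversion to an already-unboxable tuple, so V.Vector Point stores the two Ints flat.

{-# LANGUAGE TemplateHaskell, TypeFamilies, MultiParamTypeClasses #-}
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed.Deriving (derivingUnbox)

data Point = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Generate the Unbox instance via an isomorphism with (Int, Int).
derivingUnbox "Point"
    [t| Point -> (Int, Int) |]
    [| \(P x y) -> (x, y) |]
    [| \(x, y) -> P x y |]

fPoint :: Int -> Int -> Int -> Int -> V.Vector Point
fPoint from_y to_y from_x to_x =
    let half = to_y `div` 2
        a = V.generate (half - from_y) (\i -> P from_x (i + from_y))
        b = V.generate (to_y - half)   (\i -> P to_x   (i + half))
    in a V.++ b

As far as I know, under the hood this is stored as two flat Int vectors, i.e. a structure-of-arrays layout.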

Date: 2019-06-27 07:24 am (UTC)
From: [personal profile] sassa_nf
Yes, but won't it figure out it needs unboxing only after returning a thunk for the tuple from lambda?
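
One way to take that question out of the equation (a sketch, not something measured here; fStrict and half are made-up names) is to hoist the div out of the lambda and force both components with bang patterns, so the lambda hands V.generate an already-evaluated pair:

{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V

type Point = (Int, Int)

-- Same shape as f from the post, with the division hoisted out and the
-- pair components forced before they reach the vector.
fStrict :: Int -> Int -> Int -> Int -> V.Vector Point
fStrict from_y to_y from_x to_x =
    let !half = to_y `div` 2
        a = V.generate (half - from_y) (\i -> let !y = i + from_y in (from_x, y))
        b = V.generate (to_y - half)   (\i -> let !y = i + half   in (to_x,  y))
    in a V.++ b

That said, as far as I understand, the Unbox instance for (Int, Int) forces both Ints the moment an element is written, so thunks should not survive into the vector in the post's version either.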

Date: 2019-06-28 06:25 am (UTC)
From: [personal profile] sassa_nf
FWIW, JMH (benchmarking harness for java):

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MassiveLoop {
   @Param({"80000000"})
   public int to_y;

   @Param({"20000000"})
   public int from_y;

   @Param({"2"})
   public int to_x;

   @Param({"1"})
   public int from_x;

   @Benchmark
   public int[][] testMassiveLoop() {
      int[][] ys = new int[to_y - from_y][2];
      for(int y = from_y; y < to_y; y++) {
         ys[y - from_y][0] = y < to_y / 2 ? from_x : to_x;
         ys[y - from_y][1] = y;
      }
      return ys;
   }

   @Benchmark
   public int[][] testMassiveLoopTransposed() {
      int[][] ys = new int[2][to_y - from_y];
      for(int y = from_y; y < to_y / 2; y++) {
         ys[0][y - from_y] = from_x;
      }

      for(int y = to_y / 2; y < to_y; y++) {
         ys[0][y - from_y] = to_x;
      }

      for(int y = from_y; y < to_y; y++) {
         ys[1][y - from_y] = y;
      }
      return ys;
   }
...}

JMH runs this for many iterations, warm up first, then measure, then redo the whole thing several times.

This results in massively different performance (I think there's lots of GC in the first test - list of pairs vs pair of lists):
# Run complete. Total time: 00:38:59

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                  (from_x)  (from_y)  (to_x)    (to_y)  Mode  Cnt     Score     Error  Units
MassiveLoop.testMassiveLoop                       1  20000000       2  80000000  avgt   25  4585.517 ± 360.835  ms/op
MassiveLoop.testMassiveLoopTransposed             1  20000000       2  80000000  avgt   25    78.890 ±   0.532  ms/op
Edited Date: 2019-06-28 06:26 am (UTC)

Date: 2019-06-28 10:59 am (UTC)
From: [personal profile] sassa_nf
Invalid testing methodology.

# Run complete. Total time: 00:16:49

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                      (from_x)  (from_y)  (to_x)    (to_y)  Mode  Cnt   Score   Error  Units
MassiveLoop.testFlatArray             1  20000000       2  80000000  avgt   25  68.758 ± 2.180  ms/op
MassiveLoop.testFlatArrayLong         1  20000000       2  80000000  avgt   25  64.882 ± 1.768  ms/op
