Sierpiński Carpet in OpenCL 2.0

We demonstrate how to create a Sierpiński carpet in OpenCL 2.0.

Prerequisites:

      A laptop or a workstation with the 5th Generation Intel® Core™ Processor

What is Nested Parallelism?

Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks (see Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing). 

What are Blocks?

Blocks simplify nested parallelism (also known as device-side enqueue). For more information, see Blocks in OpenCL 2.0.

What is Sierpiński Carpet?

The Sierpiński carpet is a plane fractal first described by Wacław Sierpiński in 1916. Start with a white square. Divide the square into 9 sub-squares in a 3-by-3 grid. Paint the central sub-square black. Then apply the same procedure recursively to each of the remaining 8 sub-squares, and so on.

See http://en.wikipedia.org/wiki/Sierpinski_carpet for more info.
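
As an illustration of the construction described above, here is a minimal CPU-side sketch in plain C. It is not part of the original sample, and the function names carpet_is_black and carpet_count_black are hypothetical. It relies on the observation that a cell ends up black exactly when, at some level of the subdivision, it falls in the central sub-square, which is the same as x and y both having the digit 1 at the same base-3 position.

```c
#include <assert.h>

/* Returns 1 if cell (x, y) is painted black, 0 if it stays white.
   Each loop iteration looks at one base-3 digit of x and y, i.e. one
   level of the 3-by-3 subdivision. Digit 1 in both coordinates means
   the cell lies in the central sub-square at that level. */
int carpet_is_black(unsigned x, unsigned y)
{
    while (x > 0 && y > 0) {
        if (x % 3 == 1 && y % 3 == 1)
            return 1;
        x /= 3;
        y /= 3;
    }
    return 0;
}

/* Counts the black cells of an n-by-n carpet.
   Assumption: n is a power of 3, so the grid subdivides evenly. */
unsigned carpet_count_black(unsigned n)
{
    assert(n > 0);
    unsigned count = 0;
    for (unsigned y = 0; y < n; ++y)
        for (unsigned x = 0; x < n; ++x)
            count += (unsigned)carpet_is_black(x, y);
    return count;
}
```

For a 9-by-9 grid this counts 17 black cells: 81 cells in total, minus the 8 × 8 = 64 cells that stay white after two levels of subdivision.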

Figure: Sierpiński carpet

enqueue_kernel API

int enqueue_kernel ( queue_t queue,
                     kernel_enqueue_flags_t flags,
                     const ndrange_t ndrange,
                     void (^block)(void) );
enqueue_kernel is similar to the clEnqueueNDRangeKernel API, but it is called from the OpenCL C kernel language rather than from the host. Three more variations are available that handle event dependencies and pass local memory. For more info, see the enqueue_kernel functions online documentation.
 

Sierpiński Carpet – Host Side

Build your code with "-cl-std=CL2.0" to enable OpenCL 2.0 compilation. Don’t forget to create a device-side queue on the host (note that it is an out-of-order queue):
// You need to create a device-side queue for enqueue_kernel to work.
// We set the device-side queue size to 16 MB,
// since we are going to have a large number of enqueues.
cl_queue_properties qprop[] = {
    CL_QUEUE_SIZE, 16*1024*1024,
    CL_QUEUE_PROPERTIES,
    (cl_command_queue_properties)(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                                  CL_QUEUE_ON_DEVICE |
                                  CL_QUEUE_ON_DEVICE_DEFAULT),
    0};

cl_command_queue my_device_q = clCreateCommandQueueWithProperties(CLU_CONTEXT, cluGetDevice(CL_DEVICE_TYPE_GPU), qprop, &status);
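
To get a feel for what "a large number of enqueues" means, the sketch below counts the device-side enqueue_kernel calls that the recursive kernel in this article would make for an n-by-n launch, assuming n is a power of 3. The helper count_device_enqueues is hypothetical and is not part of the sample. It only counts calls: how much queue memory each enqueue consumes is implementation-specific and is not modeled here, and not all enqueues are necessarily pending at the same time.

```c
#include <assert.h>

/* Counts the device-side enqueues for an n-by-n launch of the
   recursive kernel. A kernel instance covering a size-by-size range
   enqueues 8 children (one per non-central sub-square), but only
   while a sub-square is more than one pixel wide (size / 3 > 1).
   Assumption: n is a power of 3. */
unsigned long long count_device_enqueues(unsigned n)
{
    assert(n >= 3);
    unsigned long long kernels_at_level = 1; /* the host-launched root */
    unsigned long long enqueues = 0;

    while (n / 3 > 1) {
        kernels_at_level *= 8;        /* every kernel spawns 8 children */
        enqueues += kernels_at_level; /* each child costs one enqueue   */
        n /= 3;                       /* children cover one third       */
    }
    return enqueues;
}
```

For example, a 243-by-243 image leads to 8 + 64 + 512 + 4096 = 4680 device-side enqueues, and a 729-by-729 image to 37448, which is why a generously sized device queue is a reasonable precaution.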

Sierpiński Carpet in OpenCL 2.0

__kernel void sierpinski(__global char* src, int width, int offsetx, int offsety)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    queue_t q = get_default_queue();

    int one_third = get_global_size(0) / 3;
    int two_thirds = 2 * one_third;

    if (x >= one_third && x < two_thirds &&
        y >= one_third && y < two_thirds)
    {
        src[(y+offsety)*width+(x+offsetx)] = BLACK;
    }
    else
    {
        src[(y+offsety)*width+(x+offsetx)] = WHITE;

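        // Only the top-left work-item of each outer sub-square enqueues a child,
        // and the recursion stops once a sub-square is a single pixel wide.
        if (one_third > 1 && x % one_third == 0 && y % one_third == 0)
        {
            const size_t grid[2] = {one_third, one_third};
            // Wait for the parent kernel to finish, so that its WHITE writes
            // cannot overwrite pixels the child has already painted.
            enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_2D(grid), ^{ sierpinski(src, width, x+offsetx, y+offsety); });
        }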
    }
}
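
To make the recursion easier to follow without a GPU, here is a rough CPU emulation of the kernel in plain C. It is a sketch under stated assumptions, not the sample's code: each NDRange is replaced by a pair of nested loops, each device-side enqueue by a recursive call, and child kernels are assumed to start only after the parent range has finished (the behavior of CLK_ENQUEUE_FLAGS_WAIT_KERNEL). The names sierpinski_cpu and count_cells are hypothetical, and the WHITE and BLACK values are placeholders because the article does not show their definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder pixel values; the sample's real definitions are not shown. */
enum { WHITE = 0, BLACK = 1 };

/* Emulates one size-by-size launch of the kernel at the given offset. */
void sierpinski_cpu(char *src, int width, int size, int offsetx, int offsety)
{
    assert(src != NULL && size >= 3 && size % 3 == 0);
    const int one_third = size / 3;
    const int two_thirds = 2 * one_third;

    /* Pass 1: every "work-item" paints its own pixel. */
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            int central = x >= one_third && x < two_thirds &&
                          y >= one_third && y < two_thirds;
            src[(y + offsety) * width + (x + offsetx)] = central ? BLACK : WHITE;
        }
    }

    /* Pass 2: the top-left work-item of each of the 8 outer sub-squares
       "enqueues" a child over that sub-square. */
    if (one_third > 1) {
        for (int y = 0; y < size; y += one_third) {
            for (int x = 0; x < size; x += one_third) {
                if (x == one_third && y == one_third)
                    continue; /* central sub-square stays black */
                sierpinski_cpu(src, width, one_third, x + offsetx, y + offsety);
            }
        }
    }
}

/* Counts how many of the first n pixels have the given value. */
int count_cells(const char *src, int n, char value)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        count += (src[i] == value);
    return count;
}
```

Running sierpinski_cpu(img, 27, 27, 0, 0) on a 27 × 27 buffer paints 217 pixels black (729 minus 8³ = 512 white ones), which is a convenient reference for checking the GPU output on small images.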

Download the full source code of the sample below.

About the Author

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics, with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they run well on Intel architecture. Most recently he has been prototyping the Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and has written a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and is in the process of recording a third video on Nested Parallelism.

You might also be interested in the following:

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

Optimizing Simple OpenCL Kernels: Modulate Kernel Optimization

Optimizing Simple OpenCL Kernels: Sobel Kernel Optimization

 
