Merge upstream #28

Merged 70 commits on Nov 7, 2024
Commits
d50b720
Add typos CI, fix typos
inducer Jul 18, 2024
da5c786
InstructionBase: implement _with_new_tags
matthiasdiener Jul 18, 2024
07cee43
TypeInferenceMapper: allow np.bool in map_type_case
matthiasdiener Jul 18, 2024
538b542
Emit better docs for add_prefetch
kaushikcfd Jul 19, 2024
6f257c2
spacing in error msg
kaushikcfd Jul 19, 2024
dd2da55
Improve error message for unrecognized base storage
inducer Jul 23, 2024
88d876c
[kernel.creation] duplicate inames only after adding all relevant inames
kaushikcfd Jul 29, 2024
98d5b62
Fix _InameDuplicator.within
kaushikcfd Jul 29, 2024
f01a86b
Test iname duplication for cases with only loop-nest based iname-depe…
kaushikcfd Jul 29, 2024
d77b54f
Ruff 0.5.6 fixes
kaushikcfd Aug 2, 2024
c73db67
Add helpful error msg for type uninferred temps
kaushikcfd Aug 2, 2024
ad3618f
Fix CUDA local temp var allocation with base storage
kaushikcfd Aug 2, 2024
421ee2a
Migrate package info to pyproject.toml
inducer Aug 2, 2024
4009eee
Add py.typed marker
inducer Aug 2, 2024
fef5734
Bump mypy python_version to 3.10 for X | Y instead of Union
inducer Aug 6, 2024
25a6acf
Add a type annotation in loopy.target.pyopencl for numpy 2.1
inducer Aug 19, 2024
1946cae
Add Tree as helper for scheduling
kaushikcfd Oct 22, 2022
2ed95fd
Add HappensAfter, type more of loopy.kernel.isntruction
a-alveyblanc Aug 24, 2024
2257b5e
Typing fixes regarding None-ness of insn.id
a-alveyblanc Aug 24, 2024
206961f
Add typing_extensions to dependencies
inducer Aug 6, 2024
135b319
Make an ArrayArgDescriptor base class
inducer Aug 6, 2024
cb7d4a6
Restrict TranslationUnit.{__getitem__, default_entrypoint} to returni…
inducer Aug 6, 2024
a3f4ef3
Add TUnitOrKernelT
inducer Aug 6, 2024
ade9c73
Type TranslationUnit.{copy,with_kernel}
inducer Aug 6, 2024
0832f38
Type infer_unknown_types
inducer Aug 6, 2024
490d365
Type Reduction and TypeCast
inducer Aug 6, 2024
48e0874
Type callable transforms
inducer Aug 6, 2024
864ca06
Type add_dtypes
inducer Aug 6, 2024
c77ef9d
Misc smaller typing improvements
inducer Aug 6, 2024
ac7df5f
Reformat ArrayBase docs to use autoattribute
inducer Aug 6, 2024
d0633b6
Drop a spurious mid-file docstring
inducer Aug 6, 2024
4489a7b
Type KernelArgument and subclass's constructors
inducer Aug 6, 2024
93974d0
Make SubstitutionRule a dataclass
inducer Aug 6, 2024
8eaec93
Type make_assignment
inducer Aug 6, 2024
75bea1e
Reformat TemporaryVariable docs to use autoattribute
inducer Aug 6, 2024
d858cad
Type TemporaryVariable methods
inducer Aug 6, 2024
ea29c69
Type infer_arg_descr
inducer Aug 6, 2024
b22c45e
Fix up references in documentation
inducer Aug 12, 2024
8b73fed
Fix Github CI README badge
inducer Aug 18, 2024
5c025de
Let ruff refactor some comprehensions
inducer Aug 18, 2024
2410e6c
Type for_each_kernel, add check_each kernel, mostly type loopy.check
inducer Aug 18, 2024
b700bc6
Type stringify_instruction_list
inducer Aug 18, 2024
e9f2b96
Make a type alias for iname strings
inducer Aug 24, 2024
736ccba
Convert LoopKernel to in-line attribute docs
inducer Aug 24, 2024
9d9b08f
Add helpers to figure out loop nestings from a kernel
kaushikcfd Aug 24, 2024
6ef3ac0
Type a few more bits of loopy.schedule.tools
inducer Aug 24, 2024
adca923
Document tree and loopy.schedule.tools (and a few typing fixes)
inducer Aug 24, 2024
85ab47a
adds loopy scheduler v2
kaushikcfd Aug 16, 2021
a933682
changes in docs to account for equivalent generated codes from the sa…
kaushikcfd Aug 16, 2021
a5b1452
Fix missing dependencies in test_duplicate_iname_not_read_only_nested
inducer Aug 25, 2024
bb46dce
Add test_long_kernel
inducer Feb 16, 2024
d5ee690
Work around setuptools 64's breakage of static analysis tools
inducer Sep 1, 2024
0f78426
Update tutorial for islpy 2024.2
inducer Sep 4, 2024
070df9f
Require that happens_after is not mutable (#866)
kaushikcfd Sep 9, 2024
66389cd
Use warn_with_kernel for V1-scheduler fallback
kaushikcfd Sep 22, 2024
7b5d73d
Avoid setting loop priority for disjoint loops
kaushikcfd Sep 22, 2024
6cc60f0
Unconditionally depend on typing-extensions
inducer Oct 8, 2024
db13612
Documentation defined the unpack argument twice. Now it only defines …
nkoskelo Oct 11, 2024
e390b53
RecursiveMapper -> Mapper
inducer Oct 22, 2024
94d64dd
Limit Github PR CI concurrency
inducer Oct 3, 2024
3009c04
Call flatten() on expressionss that are assumed to be simplified
inducer Oct 22, 2024
123f534
Add a few more calls to flatten for compat with pymbolic 2024.1
inducer Oct 30, 2024
442d3ef
Change deprecated calls to Expression.index
inducer Oct 30, 2024
510ad10
Silence spurious pylint warning in kernel creation
inducer Oct 30, 2024
e54799b
Add calls to flatten() in precompute and privatize
inducer Nov 1, 2024
da8537c
Swap out a deprecated .index
inducer Nov 1, 2024
7ac9fa6
Fix some typos
inducer Nov 1, 2024
b22e634
Merge remote-tracking branch 'upstream/main' into connorjward/merge-u…
connorjward Nov 4, 2024
da84302
Un-type symbolic.flatten
inducer Nov 6, 2024
8993e31
Merge remote-tracking branch 'upstream/main' into connorjward/merge-u…
connorjward Nov 6, 2024
11 changes: 11 additions & 0 deletions .github/workflows/ci.yml
@@ -7,6 +7,10 @@ on:
schedule:
- cron: '17 3 * * 0'

concurrency:
group: ${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

jobs:
ruff:
name: Ruff
@@ -20,6 +24,13 @@ jobs:
pipx install ruff
ruff check

typos:
name: Typos
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: crate-ci/typos@master

pylint:
name: Pylint
runs-on: ubuntu-latest
2 changes: 0 additions & 2 deletions .gitignore
@@ -21,8 +21,6 @@ lextab.py
yacctab.py
.pytest_cache/*

loopy/_git_rev.py

.cache
.env
virtualenv-[0-9]*[0-9]
2 changes: 1 addition & 1 deletion MEMO
@@ -7,7 +7,7 @@ Documentation Notes
Things to consider
^^^^^^^^^^^^^^^^^^

- Depedencies are pointwise for shared loop dimensions
- Dependencies are pointwise for shared loop dimensions
and global over non-shared ones (between dependent and ancestor)

- multiple insns could fight over which iname gets local axis 0
4 changes: 2 additions & 2 deletions README.rst
@@ -4,9 +4,9 @@ Loopy: Transformation-Based Generation of High-Performance CPU/GPU Code
.. image:: https://gitlab.tiker.net/inducer/loopy/badges/main/pipeline.svg
:alt: Gitlab Build Status
:target: https://gitlab.tiker.net/inducer/loopy/commits/main
.. image:: https://github.com/inducer/loopy/workflows/CI/badge.svg?branch=main&event=push
.. image:: https://github.com/inducer/loopy/workflows/CI/badge.svg?branch=main
:alt: Github Build Status
:target: https://github.com/inducer/loopy/actions?query=branch%3Amain+workflow%3ACI+event%3Apush
:target: https://github.com/inducer/loopy/actions?query=branch%3Amain+workflow%3ACI
.. image:: https://badge.fury.io/py/loopy.png
:alt: Python Package Index Release Page
:target: https://pypi.org/project/loopy/
2 changes: 1 addition & 1 deletion contrib/mem-pattern-explorer/pattern_vis.py
@@ -76,7 +76,7 @@ def tick(self):
class Array:
def __init__(self, ctx, name, shape, strides, elements_per_row=None):
# Each array element stores a tuple:
# (timestamp, subgroup, g0, g1, g2, ) of last acccess
# (timestamp, subgroup, g0, g1, g2, ) of last access

assert len(shape) == len(strides)

8 changes: 4 additions & 4 deletions doc/misc.rst
@@ -158,7 +158,7 @@ In the meantime, you can generate code simply by saying::
print(cg_result.host_code())
print(cg_result.device_code())

Additionally, for C-based languages, header defintions are available via::
Additionally, for C-based languages, header definitions are available via::

loopy.generate_header(knl)

@@ -338,8 +338,8 @@ This list is always growing, but here are a few pointers:

Use :func:`loopy.join_inames`.

In what sense does Loopy suport vectorization?
----------------------------------------------
In what sense does Loopy support vectorization?
-----------------------------------------------

There are really two ways in which the OpenCL/CUDA model of computation exposes
vectorization:
@@ -352,7 +352,7 @@ vectorization:
e.g. ``float4``, which support arithmetic with implicit vector semantics
as well as a number of 'intrinsic' functions.

Loopy suports both. The first one, SIMT, is accessible by tagging inames with,
Loopy supports both. The first one, SIMT, is accessible by tagging inames with,
e.g., ``l.0```. Accessing the second one requires using both execution- and
data-reshaping capabilities in loopy. To start with, you need an array that
has an axis with the length of the desired vector. If that's not yet available,
4 changes: 4 additions & 0 deletions doc/ref_internals.rst
@@ -53,3 +53,7 @@ Schedule
--------

.. automodule:: loopy.schedule
.. automodule:: loopy.schedule.tools
.. automodule:: loopy.schedule.tree


1 change: 1 addition & 0 deletions doc/ref_kernel.rst
@@ -262,6 +262,7 @@ Instructions

.. {{{

.. autoclass:: HappensAfter
.. autoclass:: InstructionBase

.. _assignments:
5 changes: 5 additions & 0 deletions doc/ref_other.rst
@@ -1,6 +1,11 @@
Reference: Other Functionality
==============================

Auxiliary Data Types
--------------------

.. automodule:: loopy.typing

Obtaining Kernel Performance Statistics
---------------------------------------

42 changes: 23 additions & 19 deletions doc/tutorial.rst
@@ -438,7 +438,8 @@ with identical bounds, for the use of the transpose:
... out[ii,jj] = 2*out[ii,jj] {dep=transpose}
... """,
... [lp.GlobalArg("out", shape=lp.auto, is_input=False), ...])
>>> knl = lp.prioritize_loops(knl, "i,j,ii,jj")
>>> knl = lp.prioritize_loops(knl, "i,j")
>>> knl = lp.prioritize_loops(knl, "ii,jj")

:func:`loopy.duplicate_inames` can be used to achieve the same goal.
Now the intended code is generated and our test passes.
@@ -613,7 +614,7 @@ commonly called 'loop tiling':
... assumptions="n mod 16 = 0 and n >= 1")
>>> knl = lp.split_iname(knl, "i", 16)
>>> knl = lp.split_iname(knl, "j", 16)
>>> knl = lp.prioritize_loops(knl, "i_outer,j_outer,i_inner")
>>> knl = lp.prioritize_loops(knl, "i_outer,j_outer,i_inner,j_inner")
>>> knl = lp.set_options(knl, write_code=True)
>>> evt, (out,) = knl(queue, a=a_mat_dev)
#define lid(N) ((int) get_local_id(N))
@@ -822,7 +823,7 @@ enabling some cost savings:
{
int const i_outer = -1 + n + -1 * ((3 * n) / 4);
<BLANKLINE>
if (-1 + n >= 0)
if (i_outer >= 0)
{
a[4 * i_outer] = (float) (0.0f);
if (-2 + -4 * i_outer + n >= 0)
@@ -957,7 +958,7 @@ Consider the following example:
... "{ [i_outer,i_inner, k]: "
... "0<= 16*i_outer + i_inner <n and 0<= i_inner,k <16}",
... """
... <> a_temp[i_inner] = a[16*i_outer + i_inner] {priority=10}
... <> a_temp[i_inner] = a[16*i_outer + i_inner]
... out[16*i_outer + i_inner] = sum(k, a_temp[k])
... """)
>>> knl = lp.tag_inames(knl, dict(i_outer="g.0", i_inner="l.0"))
@@ -1032,8 +1033,8 @@ transformation exists in :func:`loopy.add_prefetch`:
>>> evt, (out,) = knl_pf(queue, a=x_vec_dev)
#define lid(N) ((int) get_local_id(N))
...
acc_k = 0.0f;
a_fetch = a[16 * gid(0) + lid(0)];
acc_k = 0.0f;
for (int k = 0; k <= 15; ++k)
acc_k = acc_k + a_fetch;
out[16 * gid(0) + lid(0)] = acc_k;
@@ -1056,12 +1057,11 @@ earlier:
>>> evt, (out,) = knl_pf(queue, a=x_vec_dev)
#define lid(N) ((int) get_local_id(N))
...
if (-1 + -16 * gid(0) + -1 * lid(0) + n >= 0)
acc_k = 0.0f;
if (-1 + -16 * gid(0) + -1 * lid(0) + n >= 0)
a_fetch[lid(0)] = a[16 * gid(0) + lid(0)];
if (-1 + -16 * gid(0) + -1 * lid(0) + n >= 0)
{
acc_k = 0.0f;
for (int k = 0; k <= 15; ++k)
acc_k = acc_k + a_fetch[lid(0)];
out[16 * gid(0) + lid(0)] = acc_k;
@@ -1209,6 +1209,12 @@ Let us start with an example. Consider the kernel from above with a
... assumptions="n mod 16 = 0")
>>> prog = lp.split_iname(prog, "i", 16, inner_tag="l.0", outer_tag="g.0")

.. testsetup::

>>> prog = prog.with_kernel(
... prog.default_entrypoint.copy(
... silenced_warnings=["v1_scheduler_fallback"]))

Here is what happens when we try to generate code for the kernel:

>>> cgr = lp.generate_code_v2(prog)
@@ -1312,7 +1318,7 @@ The kernel translates into two OpenCL kernels.
int tmp;
<BLANKLINE>
tmp = tmp_save_slot[16 * gid(0) + lid(0)];
arr[(lid(0) + gid(0) * 16 + 1) % n] = tmp;
arr[(1 + lid(0) + gid(0) * 16) % n] = tmp;
}

Now we can execute the kernel.
@@ -1903,18 +1909,16 @@ Now to make things more interesting, we'll create a kernel with barriers:
{
__local int c[50 * 10 * 99];
<BLANKLINE>
{
int const k_outer = 0;
<BLANKLINE>
for (int i = 0; i <= 49; ++i)
for (int j = 0; j <= 9; ++j)
for (int i = 0; i <= 49; ++i)
{
barrier(CLK_LOCAL_MEM_FENCE) /* for c (insn rev-depends on insn_0) */;
c[990 * i + 99 * j + lid(0) + 1] = 2 * a[980 * i + 98 * j + lid(0) + 1];
barrier(CLK_LOCAL_MEM_FENCE) /* for c (insn_0 depends on insn) */;
e[980 * i + 98 * j + lid(0) + 1] = c[990 * i + 99 * j + 1 + lid(0) + 1] + c[990 * i + 99 * j + -1 + lid(0) + 1];
}
}
{
int const k_outer = 0;
<BLANKLINE>
barrier(CLK_LOCAL_MEM_FENCE) /* for c (insn rev-depends on insn_0) */;
c[990 * i + 99 * j + lid(0) + 1] = 2 * a[980 * i + 98 * j + lid(0) + 1];
barrier(CLK_LOCAL_MEM_FENCE) /* for c (insn_0 depends on insn) */;
e[980 * i + 98 * j + lid(0) + 1] = c[990 * i + 99 * j + 1 + lid(0) + 1] + c[990 * i + 99 * j + -1 + lid(0) + 1];
}
}

In this kernel, when a work-item performs the second instruction it uses data
13 changes: 2 additions & 11 deletions examples/fortran/ipython-integration-demo.ipynb
@@ -49,7 +49,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(prog)"
"print(prog) # noqa: F821"
]
},
{
@@ -105,17 +105,8 @@
"metadata": {},
"outputs": [],
"source": [
"print(prog)"
"print(prog) # noqa: F821"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
5 changes: 1 addition & 4 deletions examples/python/ispc-stream-harness.py
@@ -24,10 +24,7 @@ def transform(knl, vars, stream_dtype):
knl, "i", 2**18, outer_tag="g.0", slabs=(0, 1))
knl = lp.split_iname(knl, "i_inner", 8, inner_tag="l.0")

knl = lp.add_and_infer_dtypes(knl, {
var: stream_dtype
for var in vars
})
knl = lp.add_and_infer_dtypes(knl, dict.fromkeys(vars, stream_dtype))

knl = lp.set_argument_order(knl, vars + ["n"])
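
Note on the hunk above: the `add_and_infer_dtypes` call now uses `dict.fromkeys` instead of a dict comprehension. A minimal sketch of why the two forms are interchangeable here follows. The names `var_names` and the string `"float64"` are hypothetical stand-ins for the harness's `vars` and `stream_dtype`; they are not taken from the PR.

```python
# Hypothetical stand-ins for the harness arguments (assumptions, not PR code).
var_names = ["a", "b", "c"]
stream_dtype = "float64"

# Old form: a dict comprehension mapping every name to the same dtype.
old_style = {var: stream_dtype for var in var_names}

# New form: dict.fromkeys builds the same mapping in a single call.
new_style = dict.fromkeys(var_names, stream_dtype)

# Both forms yield equal dictionaries with the same key order.
assert old_style == new_style
assert list(old_style) == list(new_style)
```

One caveat worth keeping in mind: `dict.fromkeys` stores the same value object under every key, which is harmless for an immutable dtype but would matter for a mutable default such as a list.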

14 changes: 5 additions & 9 deletions loopy/__init__.py
@@ -56,6 +56,7 @@
BarrierInstruction,
CallInstruction,
CInstruction,
HappensAfter,
InstructionBase,
LegacyStringInstructionTag,
MemoryOrdering,
@@ -203,15 +204,9 @@
find_rules_matching,
)
from loopy.translation_unit import TranslationUnit, for_each_kernel, make_program

# }}}
from loopy.type_inference import infer_unknown_types
from loopy.types import to_loopy_type

# {{{ imported user interface
from loopy.typing import auto

# {{{ import transforms
from loopy.version import MOST_RECENT_LANGUAGE_VERSION, VERSION


@@ -242,6 +237,7 @@
"ExecutorBase",
"GeneratedProgram",
"GlobalArg",
"HappensAfter",
"ISPCTarget",
"ImageArg",
"InKernelCallable",
@@ -563,18 +559,18 @@ def make_copy_kernel(new_dim_tags, old_dim_tags=None):

indices = ["i%d" % i for i in range(rank)]
shape = ["n%d" % i for i in range(rank)]
commad_indices = ", ".join(indices)
command_indices = ", ".join(indices)
bounds = " and ".join(
f"0<={ind}<{shape_i}"
for ind, shape_i in zip(indices, shape))

set_str = "{{[{}]: {} }}".format(
commad_indices,
command_indices,
bounds
)
result = make_kernel(set_str,
"output[%s] = input[%s]"
% (commad_indices, commad_indices),
% (command_indices, command_indices),
lang_version=MOST_RECENT_LANGUAGE_VERSION,
default_offset=auto)
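
Note on the hunk above: the rename only touches a local variable, so the strings handed to `make_kernel` should be unchanged. The sketch below reproduces just the string-building steps shown in the hunk in plain Python, without importing loopy. The helper name `copy_kernel_strings` and its `rank` parameter are hypothetical and introduced only for illustration.

```python
def copy_kernel_strings(rank):
    """Hypothetical helper: build the domain and instruction strings
    the same way the visible lines of make_copy_kernel do."""
    indices = ["i%d" % i for i in range(rank)]
    shape = ["n%d" % i for i in range(rank)]
    command_indices = ", ".join(indices)
    bounds = " and ".join(
        f"0<={ind}<{shape_i}"
        for ind, shape_i in zip(indices, shape))

    # Doubled braces in str.format produce literal braces in the result.
    set_str = "{{[{}]: {} }}".format(command_indices, bounds)
    insn = "output[%s] = input[%s]" % (command_indices, command_indices)
    return set_str, insn


set_str, insn = copy_kernel_strings(2)
print(set_str)  # {[i0, i1]: 0<=i0<n0 and 0<=i1<n1 }
print(insn)     # output[i0, i1] = input[i0, i1]
```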
