From 040015cf2bdb3724e7d5605b9445c69e1d3f6fb8 Mon Sep 17 00:00:00 2001 From: Mengwei Liu Date: Thu, 10 Oct 2024 23:37:36 -0700 Subject: [PATCH] Pick pybind (#6151) * Add MethodMeta object for python visibility (#5571) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/5571 Some clients and consumers of the Executorch program files (.pte) were requesting ways to access metadata like the sizes of tensors and the number of bytes they needed. When I told them how to access them in C++, they requested Python wrappers since they had processing scripts written in Python. Add some implementations of MethodMeta and TensorInfo methods. Note that these become more expensive than in C++ because they need to allocate python objects, but I doubt these are used in performance-sensitive applications anyway. And dealing with lifetimes of mixed C++/Python objects is complex, so I favored simple lifetimes. Reviewed By: dbort Differential Revision: D63288433 fbshipit-source-id: af775120a8ebd9bf455671a8ce1f158259aa50e6 * Add mapping from C++ program::verification to Python (#5915) Summary: As titled. This enables `portable_lib._load_for_executorch[_from_buffer]` to accept `Program::Verification` argument. See added test, now we can do something like: ``` from executorch.extension.pybindings.portable_lib import Verification module = load_fn( exported_program.buffer, enable_etdump=False, debug_buffer_size=0, program_verification=Verification.Minimal, ) ``` Pull Request resolved: https://github.com/pytorch/executorch/pull/5915 Test Plan: See unit test Reviewed By: dbort Differential Revision: D63987538 Pulled By: larryliu0820 fbshipit-source-id: b68d8d1149e2d46b90544679707f420179e72b19 * Find portable_lib.so in pip package during cmake build (#5961) Summary: * Rename `_portable_lib.cpython-3..so` to `_portable_lib.so` so it can be found by CMake `find_library()`. This can be achieved by setting `SETUPTOOLS_EXT_SUFFIX`. 
* Since `executorch-config.cmake` is also being used to find installed libraries such as `executorch.a`, `xnnpack_backend.a`, add a condition to tell if `executorch-config.cmake` is being used in cmake-out or site-packages. Pull Request resolved: https://github.com/pytorch/executorch/pull/5961 Reviewed By: metascroy Differential Revision: D64014291 Pulled By: larryliu0820 fbshipit-source-id: 2757f2883d3f836e9efd45676f792c12f742e63d * Improve pip package build (#5965) Summary: Addressing comments in https://github.com/pytorch/executorch/issues/5961. * Separate out `executorch-wheel-config.cmake` from `executorch-config.cmake`. * Hardcode the envrionment variable `SETUPTOOLS_EXT_SUFFIX` in `setup.py`. Pull Request resolved: https://github.com/pytorch/executorch/pull/5965 Reviewed By: dbort Differential Revision: D64017947 Pulled By: larryliu0820 fbshipit-source-id: 0bdff5e2d2ec5873540d1b701595c7a316e84e80 * Let find_package(executorch) find the correct include directory (#6102) Summary: There's a typo in `executorch-wheel-config.cmake` that points to the wrong `include` path: ``` /executorch/share/cmake/include ``` Where it actually should be ``` /executorch/include ``` Fixing this issue. Verified it on [build_torchao_ops.sh](https://github.com/pytorch/ao/blob/main/torchao/experimental/build_torchao_ops.sh) Pull Request resolved: https://github.com/pytorch/executorch/pull/6102 Reviewed By: lucylq Differential Revision: D64189337 Pulled By: larryliu0820 fbshipit-source-id: 13033587f5499537623995b8f9457fb47d780340 * New Runtime pybind API (#6063) Summary: Based on this proposal: https://docs.google.com/document/d/10Q4-pt97inQQtFf-FjjwhMaDXXCfk1zGy6V6EkygNUY/edit#heading=h.fcrpnrtb6cud Historically our pybinding APIs are not following the same C++ modeling (Program, Method etc) and hence it's hard to use and easy to hit footguns - for example, if we load the program and return it from a python method, it goes out of the scope and releases the memory. 
This effort is to create Pybind APIs that resembles C++ objects so it's less confusing to the users. Add the following python classes: * `Runtime`: a singleton object hosting methods like `load_program`. Returns a `Program` object when calling `load_program`. Also exposes the operator registry * `Program`: each pte file should have one `Program` object. Most important method is `load_method` which returns a `Method` object. It has a property `method_names` where we can inspect what methods are inside this .pte file. * `Method`: one object per method name in a given `Program`. Exposes `execute` which takes in pytree flattened torch tensors as input and return pytree flattened output. It also exposes `MethodMeta` for users to inspect more information regarding input/output of this method. Pull Request resolved: https://github.com/pytorch/executorch/pull/6063 Reviewed By: dbort Differential Revision: D64132360 Pulled By: larryliu0820 fbshipit-source-id: a2f35edc5fd8c200df0812a693e454d66d6a907e * Lint * Fix test_pybindings.py --------- Co-authored-by: Riley Dulin --- build/executorch-wheel-config.cmake | 40 +++ extension/pybindings/TARGETS | 1 + extension/pybindings/portable_lib.py | 2 + extension/pybindings/pybindings.cpp | 252 +++++++++++++++++-- extension/pybindings/pybindings.pyi | 94 ++++++- extension/pybindings/test/TARGETS | 5 +- extension/pybindings/test/make_test.py | 252 +++++++++++++------ extension/pybindings/test/test_pybindings.py | 21 +- pytest.ini | 2 + runtime/TARGETS | 14 ++ runtime/__init__.py | 198 +++++++++++++++ runtime/test/TARGETS | 12 + runtime/test/test_runtime.py | 78 ++++++ setup.py | 10 + 14 files changed, 858 insertions(+), 123 deletions(-) create mode 100644 build/executorch-wheel-config.cmake create mode 100644 runtime/TARGETS create mode 100644 runtime/__init__.py create mode 100644 runtime/test/TARGETS create mode 100644 runtime/test/test_runtime.py diff --git a/build/executorch-wheel-config.cmake b/build/executorch-wheel-config.cmake 
new file mode 100644 index 0000000000..239fff67c1 --- /dev/null +++ b/build/executorch-wheel-config.cmake @@ -0,0 +1,40 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +# Config defining how CMake should find ExecuTorch package. CMake will search +# for this file and find ExecuTorch package if it is installed. Typical usage +# is: +# +# find_package(executorch REQUIRED) +# ------- +# +# Finds the ExecuTorch library +# +# This will define the following variables: +# +# EXECUTORCH_FOUND -- True if the system has the ExecuTorch library +# EXECUTORCH_INCLUDE_DIRS -- The include directories for ExecuTorch +# EXECUTORCH_LIBRARIES -- Libraries to link against +# +cmake_minimum_required(VERSION 3.19) + +# Find prebuilt _portable_lib.so. This file should be installed under +# /executorch/share/cmake +find_library(_portable_lib_LIBRARY _portable_lib.so PATHS "${CMAKE_CURRENT_LIST_DIR}/../../extension/pybindings/") +set(EXECUTORCH_LIBRARIES) +set(EXECUTORCH_FOUND OFF) +if(_portable_lib_LIBRARY) + set(EXECUTORCH_FOUND ON) + message(STATUS "ExecuTorch portable library is found at ${_portable_lib_LIBRARY}") + list(APPEND EXECUTORCH_LIBRARIES _portable_lib) + add_library(_portable_lib STATIC IMPORTED) + set(EXECUTORCH_INCLUDE_DIRS ${CMAKE_CURRENT_LIST_DIR}/../../include) + set_target_properties(_portable_lib PROPERTIES + IMPORTED_LOCATION "${_portable_lib_LIBRARY}" + INTERFACE_INCLUDE_DIRECTORIES "${EXECUTORCH_INCLUDE_DIRS}" + CXX_STANDARD 17 + ) +endif() diff --git a/extension/pybindings/TARGETS b/extension/pybindings/TARGETS index ecf23e4658..17ccbb2477 100644 --- a/extension/pybindings/TARGETS +++ b/extension/pybindings/TARGETS @@ -67,6 +67,7 @@ runtime.python_library( srcs = ["portable_lib.py"], visibility = [ "//executorch/exir/...", + "//executorch/runtime/...", "@EXECUTORCH_CLIENTS", ], deps = 
[":_portable_lib"], diff --git a/extension/pybindings/portable_lib.py b/extension/pybindings/portable_lib.py index d094710e67..25624ad60c 100644 --- a/extension/pybindings/portable_lib.py +++ b/extension/pybindings/portable_lib.py @@ -45,6 +45,8 @@ _reset_profile_results, # noqa: F401 BundledModule, # noqa: F401 ExecuTorchModule, # noqa: F401 + MethodMeta, # noqa: F401 + Verification, # noqa: F401 ) # Clean up so that `dir(portable_lib)` is the same as `dir(_portable_lib)` diff --git a/extension/pybindings/pybindings.cpp b/extension/pybindings/pybindings.cpp index d674f2fe58..19d4d09596 100644 --- a/extension/pybindings/pybindings.cpp +++ b/extension/pybindings/pybindings.cpp @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -55,6 +56,16 @@ } \ }) +#define THROW_INDEX_IF_ERROR(error, message, ...) \ + ({ \ + if ((error) != Error::Ok) { \ + char msg_buf[128]; \ + snprintf(msg_buf, sizeof(msg_buf), message, ##__VA_ARGS__); \ + /* pybind will convert this to a python exception. */ \ + throw std::out_of_range(msg_buf); \ + } \ + }) + // Our logs work by writing to stderr. By default this is done through fprintf // (as defined in posix.cpp) which then does not show up in python environments. 
// Here we override the pal to use std::cerr which can be properly redirected by @@ -157,13 +168,15 @@ class Module final { explicit Module( std::unique_ptr loader, std::unique_ptr tracer = nullptr, - size_t debug_buffer_size = 0) + size_t debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) : loader_(std::move(loader)), event_tracer_(std::move(tracer)), debug_buffer_size_(debug_buffer_size) { ::executorch::runtime::runtime_init(); - Result program = Program::load( - loader_.get(), Program::Verification::InternalConsistency); + Result program = + Program::load(loader_.get(), program_verification); THROW_IF_ERROR( program.error(), "loading program failed with error: 0x%" PRIx32, @@ -238,10 +251,10 @@ class Module final { const std::vector& args, const std::optional>>& output_storages = std::nullopt) { - auto& method = methods_[method_name]; + auto& method = get_method(method_name); exec_aten::ArrayRef input_evalue_list(args.data(), args.size()); - Error set_inputs_status = method->set_inputs(input_evalue_list); + Error set_inputs_status = method.set_inputs(input_evalue_list); THROW_IF_ERROR( set_inputs_status, "method->set_inputs() for method '%s' failed with error 0x%" PRIx32, @@ -262,9 +275,9 @@ class Module final { c10::autograd_dispatch_keyset); #endif if (output_storages) { - setup_output_storage(*method, *output_storages); + setup_output_storage(method, *output_storages); } - Error execute_status = method->execute(); + Error execute_status = method.execute(); THROW_IF_ERROR( execute_status, "method->execute() failed with error 0x%" PRIx32, @@ -291,11 +304,22 @@ class Module final { Method& get_method(const std::string& method_name) { if (methods_.count(method_name) == 0) { THROW_IF_ERROR( - Error(), "no such method in program: %s", method_name.c_str()); + Error::InvalidArgument, + "no such method in program: %s", + method_name.c_str()); } return *methods_[method_name].get(); } + /// Returns the 
names of all methods in the program. + std::vector method_names() const { + std::vector names; + for (const auto& method : methods_) { + names.push_back(method.first); + } + return names; + } + bool has_etdump() { return static_cast(event_tracer_); } @@ -375,19 +399,22 @@ inline std::unique_ptr load_module_from_buffer( const void* ptr, size_t ptr_len, bool enable_etdump, - size_t debug_buffer_size) { + size_t debug_buffer_size, + Program::Verification program_verification) { EXECUTORCH_SCOPE_PROF("load_module_from_buffer"); auto loader = std::make_unique(ptr, ptr_len); return std::make_unique( std::move(loader), enable_etdump ? std::make_unique() : nullptr, - debug_buffer_size); + debug_buffer_size, + program_verification); } inline std::unique_ptr load_module_from_file( const std::string& path, bool enable_etdump, - size_t debug_buffer_size) { + size_t debug_buffer_size, + Program::Verification program_verification) { EXECUTORCH_SCOPE_PROF("load_module_from_file"); Result res = MmapDataLoader::from( @@ -402,7 +429,8 @@ inline std::unique_ptr load_module_from_file( return std::make_unique( std::move(loader), enable_etdump ? std::make_unique() : nullptr, - debug_buffer_size); + debug_buffer_size, + program_verification); } static constexpr size_t kDEFAULT_BUNDLED_INPUT_POOL_SIZE = 16 * 1024U; @@ -448,34 +476,158 @@ struct PyBundledModule final { size_t program_len_; }; +/// Expose a subset of TensorInfo information to python. 
+struct PyTensorInfo final { + explicit PyTensorInfo( + std::shared_ptr module, + torch::executor::TensorInfo info) + : module_(std::move(module)), info_(info) {} + + py::tuple sizes() const { + const auto shape = info_.sizes(); + py::tuple tup(shape.size()); + for (size_t i = 0; i < shape.size(); ++i) { + tup[i] = py::cast(shape[i]); + } + return tup; + } + + int8_t dtype() const { + return static_cast::type>( + info_.scalar_type()); + } + + bool is_memory_planned() const { + return info_.is_memory_planned(); + } + + size_t nbytes() const { + return info_.nbytes(); + } + + std::string repr() const { + std::string size_str = "["; + for (const auto& d : info_.sizes()) { + size_str.append(std::to_string(d)); + size_str.append(", "); + } + if (size_str.length() >= 2) { + // Pop the last two characters (command and space) and add close bracket. + size_str.pop_back(); + size_str.pop_back(); + } + size_str.append("]"); + return "TensorInfo(sizes=" + size_str + ", dtype=" + + std::string(executorch::runtime::toString(info_.scalar_type())) + + ", is_memory_planned=" + + (info_.is_memory_planned() ? "True" : "False") + + ", nbytes=" + std::to_string(info_.nbytes()) + ")"; + } + + private: + // TensorInfo relies on module to be alive. + std::shared_ptr module_; + torch::executor::TensorInfo info_; +}; + +/// Expose a subset of MethodMeta information to python. 
+struct PyMethodMeta final { + explicit PyMethodMeta( + std::shared_ptr module, + torch::executor::MethodMeta meta) + : module_(std::move(module)), meta_(meta) {} + + const char* name() const { + return meta_.name(); + } + + size_t num_inputs() const { + return meta_.num_inputs(); + } + + std::unique_ptr input_tensor_meta(size_t index) const { + const auto result = meta_.input_tensor_meta(index); + THROW_INDEX_IF_ERROR( + result.error(), "Cannot get input tensor meta at %zu", index); + return std::make_unique(module_, result.get()); + } + + size_t num_outputs() const { + return meta_.num_outputs(); + } + + std::unique_ptr output_tensor_meta(size_t index) const { + const auto result = meta_.output_tensor_meta(index); + THROW_INDEX_IF_ERROR( + result.error(), "Cannot get output tensor meta at %zu", index); + return std::make_unique(module_, result.get()); + } + + py::str repr() const { + py::list input_meta_strs; + for (size_t i = 0; i < meta_.num_inputs(); ++i) { + input_meta_strs.append(py::str(input_tensor_meta(i)->repr())); + } + py::list output_meta_strs; + for (size_t i = 0; i < meta_.num_outputs(); ++i) { + output_meta_strs.append(py::str(output_tensor_meta(i)->repr())); + } + // Add quotes to be more similar to Python's repr for strings. + py::str format = + "MethodMeta(name='{}', num_inputs={}, input_tensor_meta={}, num_outputs={}, output_tensor_meta={})"; + return format.format( + std::string(meta_.name()), + std::to_string(meta_.num_inputs()), + input_meta_strs, + std::to_string(meta_.num_outputs()), + output_meta_strs); + } + + private: + // Must keep the Module object alive or else the meta object is invalidated. 
+ std::shared_ptr module_; + torch::executor::MethodMeta meta_; +}; + struct PyModule final { explicit PyModule( const py::bytes& buffer, bool enable_etdump, - size_t debug_buffer_size = 0) + size_t debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) : module_(load_module_from_buffer( buffer.cast().data(), py::len(buffer), enable_etdump, - debug_buffer_size)) {} + debug_buffer_size, + program_verification)) {} explicit PyModule( const void* ptr, size_t ptr_len, bool enable_etdump, - size_t debug_buffer_size = 0) + size_t debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) : module_(load_module_from_buffer( ptr, ptr_len, enable_etdump, - debug_buffer_size)) {} + debug_buffer_size, + program_verification)) {} explicit PyModule( const std::string& path, bool enable_etdump, - size_t debug_buffer_size = 0) - : module_(load_module_from_file(path, enable_etdump, debug_buffer_size)) { - } + size_t debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) + : module_(load_module_from_file( + path, + enable_etdump, + debug_buffer_size, + program_verification)) {} PyModule(const PyModule&) = delete; PyModule& operator=(const PyModule&) = delete; @@ -486,14 +638,20 @@ struct PyModule final { static std::unique_ptr load_from_buffer( const py::bytes& buffer, bool enable_etdump, - size_t debug_buffer_size = 0) { - return std::make_unique(buffer, enable_etdump, debug_buffer_size); + size_t debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) { + return std::make_unique( + buffer, enable_etdump, debug_buffer_size, program_verification); } static std::unique_ptr load_from_file( const std::string& path, bool enable_etdump, - size_t debug_buffer_size = 0) { - return std::make_unique(path, enable_etdump, debug_buffer_size); + size_t 
debug_buffer_size = 0, + Program::Verification program_verification = + Program::Verification::InternalConsistency) { + return std::make_unique( + path, enable_etdump, debug_buffer_size, program_verification); } static std::unique_ptr load_from_bundled_program( @@ -751,8 +909,17 @@ struct PyModule final { return list; } + std::unique_ptr method_meta(const std::string method_name) { + auto& method = module_->get_method(method_name); + return std::make_unique(module_, method.method_meta()); + } + + std::vector method_names() { + return module_->method_names(); + } + private: - std::unique_ptr module_; + std::shared_ptr module_; // Need to keep-alive output storages until they can be compared in case of // bundled programs. std::vector> output_storages_; @@ -805,12 +972,20 @@ PYBIND11_MODULE(EXECUTORCH_PYTHON_MODULE_NAME, m) { // Redirects cout and cerr for function calls this guards to the python env. auto call_guard = py:: call_guard(); + + // Bind the verification enum to python. + py::enum_(m, "Verification") + .value("Minimal", Program::Verification::Minimal) + .value("InternalConsistency", Program::Verification::InternalConsistency); + m.def( "_load_for_executorch", PyModule::load_from_file, py::arg("path"), py::arg("enable_etdump") = false, py::arg("debug_buffer_size") = 0, + py::arg("program_verification") = + Program::Verification::InternalConsistency, call_guard); m.def( "_load_for_executorch_from_buffer", @@ -818,6 +993,8 @@ PYBIND11_MODULE(EXECUTORCH_PYTHON_MODULE_NAME, m) { py::arg("buffer"), py::arg("enable_etdump") = false, py::arg("debug_buffer_size") = 0, + py::arg("program_verification") = + Program::Verification::InternalConsistency, call_guard); m.def( "_load_for_executorch_from_bundled_program", @@ -866,6 +1043,12 @@ PYBIND11_MODULE(EXECUTORCH_PYTHON_MODULE_NAME, m) { py::arg("method_name"), py::arg("clone_outputs") = true, call_guard) + .def( + "method_meta", + &PyModule::method_meta, + py::arg("method_name"), + call_guard) + .def("method_names", 
&PyModule::method_names, call_guard) .def( "run_method", &PyModule::run_method, @@ -900,6 +1083,27 @@ PYBIND11_MODULE(EXECUTORCH_PYTHON_MODULE_NAME, m) { call_guard); py::class_(m, "BundledModule"); + py::class_(m, "TensorInfo") + .def("sizes", &PyTensorInfo::sizes, call_guard) + .def("dtype", &PyTensorInfo::dtype, call_guard) + .def("is_memory_planned", &PyTensorInfo::is_memory_planned, call_guard) + .def("nbytes", &PyTensorInfo::nbytes, call_guard) + .def("__repr__", &PyTensorInfo::repr, call_guard); + py::class_(m, "MethodMeta") + .def("name", &PyMethodMeta::name, call_guard) + .def("num_inputs", &PyMethodMeta::num_inputs, call_guard) + .def("num_outputs", &PyMethodMeta::num_outputs, call_guard) + .def( + "input_tensor_meta", + &PyMethodMeta::input_tensor_meta, + py::arg("index"), + call_guard) + .def( + "output_tensor_meta", + &PyMethodMeta::output_tensor_meta, + py::arg("index"), + call_guard) + .def("__repr__", &PyMethodMeta::repr, call_guard); } } // namespace pybindings diff --git a/extension/pybindings/pybindings.pyi b/extension/pybindings/pybindings.pyi index 14e8ec13e1..818df1f760 100644 --- a/extension/pybindings/pybindings.pyi +++ b/extension/pybindings/pybindings.pyi @@ -5,10 +5,24 @@ # LICENSE file in the root directory of this source tree. # pyre-strict -from typing import Any, Dict, List, Optional, Sequence, Tuple +from __future__ import annotations + +from typing import Any, Dict, Enum, List, Optional, Sequence, Tuple from executorch.exir._warnings import experimental +@experimental("This API is experimental and subject to change without notice.") +class Verification(Enum): + """Verification maps C++ Program::Verification to Python. + + .. warning:: + + This API is experimental and subject to change without notice. + """ + + Minimal: ... + InternalConsistency: ... + @experimental("This API is experimental and subject to change without notice.") class ExecuTorchModule: """ExecuTorchModule is a Python wrapper around a C++ ExecuTorch program. 
@@ -43,6 +57,8 @@ class ExecuTorchModule: def write_etdump_result_to_file( self, path: str, debug_buffer_path: Optional[str] = None ) -> None: ... + def method_meta(self, method_name: str) -> MethodMeta: ... + def method_names(self) -> List[str]: ... @experimental("This API is experimental and subject to change without notice.") class BundledModule: @@ -54,9 +70,78 @@ class BundledModule: ... +@experimental("This API is experimental and subject to change without notice.") +class TensorInfo: + """Metadata about a tensor such as the shape and dtype. + + .. warning:: + + This API is experimental and subject to change without notice. + """ + + def sizes(self) -> Tuple[int, ...]: + """Shape of the tensor as a tuple""" + ... + + def dtype(self) -> int: + """The data type of the elements inside the tensor. + See documentation for ScalarType in executorch/runtime/core/portable_type/scalar_type.h + for the values these integers can take.""" + ... + + def is_memory_planned(self) -> bool: + """True if the tensor is already memory planned, meaning no allocation + needs to be provided. False otherwise""" + ... + + def nbytes(self) -> int: + """Number of bytes in the tensor. Not the same as numel if the dtype is + larger than 1 byte wide""" + ... + + def __repr__(self) -> str: ... + +@experimental("This API is experimental and subject to change without notice.") +class MethodMeta: + """Metadata about a method such as the number of inputs and outputs. + + .. warning:: + + This API is experimental and subject to change without notice. + """ + + def name(self) -> str: + """The name of the method, such as 'forward'""" + ... + + def num_inputs(self) -> int: + """The number of user inputs to the method. This does not include any + internal buffers or weights, which don't need to be provided by the user""" + ... + + def num_outputs(self) -> int: + """The number of outputs from the method. This does not include any mutated + internal buffers""" + ... 
+ + def input_tensor_meta(self, index: int) -> TensorInfo: + """The tensor info for the 'index'th input. Index must be in the interval + [0, num_inputs()). Raises an IndexError if the index is out of bounds""" + ... + + def output_tensor_meta(self, index: int) -> TensorInfo: + """The tensor info for the 'index'th output. Index must be in the interval + [0, num_outputs()). Raises an IndexError if the index is out of bounds""" + ... + + def __repr__(self) -> str: ... + @experimental("This API is experimental and subject to change without notice.") def _load_for_executorch( - path: str, enable_etdump: bool = False, debug_buffer_size: int = 0 + path: str, + enable_etdump: bool = False, + debug_buffer_size: int = 0, + program_verification: Verification = Verification.InternalConsistency, ) -> ExecuTorchModule: """Load an ExecuTorch Program from a file. @@ -79,7 +164,10 @@ def _load_for_executorch( @experimental("This API is experimental and subject to change without notice.") def _load_for_executorch_from_buffer( - buffer: bytes, enable_etdump: bool = False, debug_buffer_size: int = 0 + buffer: bytes, + enable_etdump: bool = False, + debug_buffer_size: int = 0, + program_verification: Verification = Verification.InternalConsistency, ) -> ExecuTorchModule: """Same as _load_for_executorch, but takes a byte buffer instead of a file path. 
diff --git a/extension/pybindings/test/TARGETS b/extension/pybindings/test/TARGETS index feb4779a05..41f2c84dcc 100644 --- a/extension/pybindings/test/TARGETS +++ b/extension/pybindings/test/TARGETS @@ -11,7 +11,10 @@ runtime.python_library( srcs = [ "make_test.py", ], - visibility = ["//executorch/extension/pybindings/..."], + visibility = [ + "//executorch/extension/pybindings/...", + "//executorch/runtime/...", + ], deps = [ "//caffe2:torch", "//caffe2:torch_fx", diff --git a/extension/pybindings/test/make_test.py b/extension/pybindings/test/make_test.py index 44e41ed443..e8d23fd44e 100644 --- a/extension/pybindings/test/make_test.py +++ b/extension/pybindings/test/make_test.py @@ -7,6 +7,7 @@ # pyre-unsafe import unittest +from types import ModuleType from typing import Any, Callable, Optional, Tuple import torch @@ -15,117 +16,122 @@ from torch.export import export -def make_test( # noqa: C901 - tester: unittest.TestCase, - load_fn: Callable, -) -> Callable[[unittest.TestCase], None]: - """ - Returns a function that operates as a test case within a unittest.TestCase class. +class ModuleAdd(torch.nn.Module): + """The module to serialize and execute.""" - Used to allow the test code for pybindings to be shared across different pybinding libs - which will all have different load functions. In this case each individual test case is a - subfunction of wrapper. 
- """ + def __init__(self): + super(ModuleAdd, self).__init__() - def wrapper(tester: unittest.TestCase) -> None: - class ModuleAdd(torch.nn.Module): - """The module to serialize and execute.""" + def forward(self, x, y): + return x + y - def __init__(self): - super(ModuleAdd, self).__init__() + def get_methods_to_export(self): + return ("forward",) - def forward(self, x, y): - return x + y + def get_inputs(self): + return (torch.ones(2, 2), torch.ones(2, 2)) - def get_methods_to_export(self): - return ("forward",) - def get_inputs(self): - return (torch.ones(2, 2), torch.ones(2, 2)) +class ModuleMulti(torch.nn.Module): + """The module to serialize and execute.""" - class ModuleMulti(torch.nn.Module): - """The module to serialize and execute.""" + def __init__(self): + super(ModuleMulti, self).__init__() - def __init__(self): - super(ModuleMulti, self).__init__() + def forward(self, x, y): + return x + y - def forward(self, x, y): - return x + y + def forward2(self, x, y): + return x + y + 1 - def forward2(self, x, y): - return x + y + 1 + def get_methods_to_export(self): + return ("forward", "forward2") - def get_methods_to_export(self): - return ("forward", "forward2") + def get_inputs(self): + return (torch.ones(2, 2), torch.ones(2, 2)) - def get_inputs(self): - return (torch.ones(2, 2), torch.ones(2, 2)) - class ModuleAddSingleInput(torch.nn.Module): - """The module to serialize and execute.""" +class ModuleAddSingleInput(torch.nn.Module): + """The module to serialize and execute.""" - def __init__(self): - super(ModuleAddSingleInput, self).__init__() + def __init__(self): + super(ModuleAddSingleInput, self).__init__() - def forward(self, x): - return x + x + def forward(self, x): + return x + x - def get_methods_to_export(self): - return ("forward",) + def get_methods_to_export(self): + return ("forward",) - def get_inputs(self): - return (torch.ones(2, 2),) + def get_inputs(self): + return (torch.ones(2, 2),) - class ModuleAddConstReturn(torch.nn.Module): - 
"""The module to serialize and execute.""" - def __init__(self): - super(ModuleAddConstReturn, self).__init__() - self.state = torch.ones(2, 2) +class ModuleAddConstReturn(torch.nn.Module): + """The module to serialize and execute.""" - def forward(self, x): - return x + self.state, self.state + def __init__(self): + super(ModuleAddConstReturn, self).__init__() + self.state = torch.ones(2, 2) - def get_methods_to_export(self): - return ("forward",) + def forward(self, x): + return x + self.state, self.state - def get_inputs(self): - return (torch.ones(2, 2),) + def get_methods_to_export(self): + return ("forward",) - def create_program( - eager_module: torch.nn.Module, - et_config: Optional[ExecutorchBackendConfig] = None, - ) -> Tuple[ExecutorchProgramManager, Tuple[Any, ...]]: - """Returns an executorch program based on ModuleAdd, along with inputs.""" + def get_inputs(self): + return (torch.ones(2, 2),) - # Trace the test module and create a serialized ExecuTorch program. - inputs = eager_module.get_inputs() - input_map = {} - for method in eager_module.get_methods_to_export(): - input_map[method] = inputs - class WrapperModule(torch.nn.Module): - def __init__(self, fn): - super().__init__() - self.fn = fn +def create_program( + eager_module: torch.nn.Module, + et_config: Optional[ExecutorchBackendConfig] = None, +) -> Tuple[ExecutorchProgramManager, Tuple[Any, ...]]: + """Returns an executorch program based on ModuleAdd, along with inputs.""" - def forward(self, *args, **kwargs): - return self.fn(*args, **kwargs) + # Trace the test module and create a serialized ExecuTorch program. + inputs = eager_module.get_inputs() + input_map = {} + for method in eager_module.get_methods_to_export(): + input_map[method] = inputs - exported_methods = {} - # These cleanup passes are required to convert the `add` op to its out - # variant, along with some other transformations. 
- for method_name, method_input in input_map.items(): - wrapped_mod = WrapperModule( # pyre-ignore[16] - getattr(eager_module, method_name) - ) - exported_methods[method_name] = export(wrapped_mod, method_input) + class WrapperModule(torch.nn.Module): + def __init__(self, fn): + super().__init__() + self.fn = fn + + def forward(self, *args, **kwargs): + return self.fn(*args, **kwargs) + + exported_methods = {} + # These cleanup passes are required to convert the `add` op to its out + # variant, along with some other transformations. + for method_name, method_input in input_map.items(): + wrapped_mod = WrapperModule(getattr(eager_module, method_name)) + exported_methods[method_name] = export(wrapped_mod, method_input) + + exec_prog = to_edge(exported_methods).to_executorch(config=et_config) - exec_prog = to_edge(exported_methods).to_executorch(config=et_config) + # Create the ExecuTorch program from the graph. + exec_prog.dump_executorch_program(verbose=True) + return (exec_prog, inputs) - # Create the ExecuTorch program from the graph. - exec_prog.dump_executorch_program(verbose=True) - return (exec_prog, inputs) + +def make_test( # noqa: C901 + tester: unittest.TestCase, + runtime: ModuleType, +) -> Callable[[unittest.TestCase], None]: + """ + Returns a function that operates as a test case within a unittest.TestCase class. + + Used to allow the test code for pybindings to be shared across different pybinding libs + which will all have different load functions. In this case each individual test case is a + subfunction of wrapper. 
+ """ + load_fn: Callable = runtime._load_for_executorch_from_buffer + + def wrapper(tester: unittest.TestCase) -> None: ######### TEST CASES ######### @@ -236,6 +242,7 @@ def test_quantized_ops(tester): from executorch.exir import EdgeCompileConfig from executorch.exir.passes.quant_fusion_pass import QuantFusionPass + from executorch.kernels import quantized # noqa: F401 from torch.ao.quantization import get_default_qconfig_mapping from torch.ao.quantization.backend_config.executorch import ( get_executorch_backend_config, @@ -297,6 +304,84 @@ def test_constant_output_not_memory_planned(tester): ######### RUN TEST CASES ######### + def test_method_meta(tester) -> None: + exported_program, inputs = create_program(ModuleAdd()) + + # Use pybindings to load the program and query its metadata. + executorch_module = load_fn(exported_program.buffer) + meta = executorch_module.method_meta("forward") + + # Ensure that all these APIs work even if the module object is destroyed. + del executorch_module + tester.assertEqual(meta.name(), "forward") + tester.assertEqual(meta.num_inputs(), 2) + tester.assertEqual(meta.num_outputs(), 1) + # Common string for all these tensors. + tensor_info = "TensorInfo(sizes=[2, 2], dtype=Float, is_memory_planned=True, nbytes=16)" + float_dtype = 6 + tester.assertEqual( + str(meta), + "MethodMeta(name='forward', num_inputs=2, " + f"input_tensor_meta=['{tensor_info}', '{tensor_info}'], " + f"num_outputs=1, output_tensor_meta=['{tensor_info}'])", + ) + + input_tensors = [meta.input_tensor_meta(i) for i in range(2)] + output_tensor = meta.output_tensor_meta(0) + # Check that accessing out of bounds raises IndexError. + with tester.assertRaises(IndexError): + meta.input_tensor_meta(2) + # Test that tensor metadata can outlive method metadata. 
+ del meta + tester.assertEqual([t.sizes() for t in input_tensors], [(2, 2), (2, 2)]) + tester.assertEqual( + [t.dtype() for t in input_tensors], [float_dtype, float_dtype] + ) + tester.assertEqual( + [t.is_memory_planned() for t in input_tensors], [True, True] + ) + tester.assertEqual([t.nbytes() for t in input_tensors], [16, 16]) + tester.assertEqual(str(input_tensors), f"[{tensor_info}, {tensor_info}]") + + tester.assertEqual(output_tensor.sizes(), (2, 2)) + tester.assertEqual(output_tensor.dtype(), float_dtype) + tester.assertEqual(output_tensor.is_memory_planned(), True) + tester.assertEqual(output_tensor.nbytes(), 16) + tester.assertEqual(str(output_tensor), tensor_info) + + def test_bad_name(tester) -> None: + # Create an ExecuTorch program from ModuleAdd. + exported_program, inputs = create_program(ModuleAdd()) + + # Use pybindings to load and execute the program. + executorch_module = load_fn(exported_program.buffer) + # Invoke the callable on executorch_module instead of calling module.forward. + with tester.assertRaises(RuntimeError): + executorch_module.run_method("not_a_real_method", inputs) + + def test_verification_config(tester) -> None: + # Create an ExecuTorch program from ModuleAdd. + exported_program, inputs = create_program(ModuleAdd()) + Verification = runtime.Verification + + # Use pybindings to load and execute the program. + for config in [Verification.Minimal, Verification.InternalConsistency]: + executorch_module = load_fn( + exported_program.buffer, + enable_etdump=False, + debug_buffer_size=0, + program_verification=config, + ) + + executorch_output = executorch_module.forward(inputs)[0] + + # The test module adds the two inputs, so its output should be the same + # as adding them directly. 
+ expected = inputs[0] + inputs[1] + + tester.assertEqual(str(expected), str(executorch_output)) + + ######### RUN TEST CASES ######### test_e2e(tester) test_multiple_entry(tester) test_output_lifespan(tester) @@ -305,5 +390,8 @@ def test_constant_output_not_memory_planned(tester): test_stderr_redirect(tester) test_quantized_ops(tester) test_constant_output_not_memory_planned(tester) + test_method_meta(tester) + test_bad_name(tester) + test_verification_config(tester) return wrapper diff --git a/extension/pybindings/test/test_pybindings.py b/extension/pybindings/test/test_pybindings.py index d4ce2af039..d7a1cf4ca0 100644 --- a/extension/pybindings/test/test_pybindings.py +++ b/extension/pybindings/test/test_pybindings.py @@ -10,24 +10,19 @@ kernel_mode = None # either aten mode or portable mode try: - from executorch.extension.pybindings.portable_lib import ( - _load_for_executorch_from_buffer, - ) + from executorch.extension.pybindings import portable_lib as runtime kernel_mode = "portable" except Exception: print("can't load portable lib") -try: - from executorch.extension.pybindings.aten_lib import ( # noqa: F811 - _load_for_executorch_from_buffer, - ) - - assert kernel_mode is None +if kernel_mode is None: + try: + from executorch.extension.pybindings import aten_lib as runtime # noqa: F811 - kernel_mode = "aten" -except Exception: - print("can't load aten lib") + kernel_mode = "aten" + except Exception: + print("can't load aten lib") assert kernel_mode is not None @@ -37,4 +32,4 @@ class PybindingsTest(unittest.TestCase): def test(self): - make_test(self, _load_for_executorch_from_buffer)(self) + make_test(self, runtime)(self) diff --git a/pytest.ini b/pytest.ini index 701c0187ec..49f46ff6e3 100644 --- a/pytest.ini +++ b/pytest.ini @@ -34,6 +34,8 @@ addopts = backends/xnnpack/test # extension/ extension/pybindings/test + # Runtime + runtime # test test/end2end/test_end2end.py --ignore=backends/xnnpack/test/ops/linear.py diff --git a/runtime/TARGETS 
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Interface to the native C++ ExecuTorch runtime.

Example usage:
.. code-block:: text

    from pathlib import Path

    import torch
    from executorch.runtime import Verification, Runtime

    et_runtime: Runtime = Runtime.get()
    program: Program = et_runtime.load_program(
        Path("/tmp/program.pte"),
        verification=Verification.Minimal,
    )
    print("Program methods:", program.method_names)
    forward: Method = program.load_method("forward")

    inputs = (torch.ones(2, 2), torch.ones(2, 2))
    outputs = forward.execute(inputs)
    print(f"Ran forward({inputs})")
    print(f"  outputs: {outputs}")

Example output:
.. code-block:: text

    Program methods: ('forward', 'forward2')
    Ran forward((tensor([[1., 1.],
            [1., 1.]]), tensor([[1., 1.],
            [1., 1.]])))
      outputs: [tensor([[1., 1.],
            [1., 1.]])]
"""

import functools
from pathlib import Path
from types import ModuleType
from typing import Any, BinaryIO, Dict, Optional, Sequence, Set, Union

try:
    from executorch.extension.pybindings.portable_lib import (
        ExecuTorchModule,
        MethodMeta,
        Verification,
    )
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Prebuilt /extension/pybindings/_portable_lib.so "
        "is not found. Please reinstall ExecuTorch from pip."
    ) from e


class Method:
    """An ExecuTorch method, loaded from a Program.
    This can be used to execute the method with inputs.
    """

    def __init__(self, method_name: str, module: ExecuTorchModule) -> None:
        # TODO: This class should be pybind to the C++ counterpart instead of hosting ExecuTorchModule.
        self._method_name = method_name
        self._module = module

    def execute(self, inputs: Sequence[Any]) -> Sequence[Any]:
        """Executes the method with the given inputs.

        Args:
            inputs: The inputs to the method.

        Returns:
            The outputs of the method.
        """
        return self._module.run_method(self._method_name, inputs)

    @property
    def metadata(self) -> MethodMeta:
        """Gets the metadata for the method.

        Returns:
            The metadata for the method.
        """
        return self._module.method_meta(self._method_name)


class Program:
    """An ExecuTorch program, loaded from binary PTE data.

    This can be used to load the methods/models defined by the program.
    """

    def __init__(self, module: ExecuTorchModule, data: Optional[bytes]) -> None:
        # Hold the data so the program is not freed. When the program was
        # loaded from a file path, `data` is None and the native module owns
        # the backing storage.
        self._data = data
        self._module = module
        self._methods: Dict[str, Method] = {}
        # ExecuTorchModule already pre-loads all Methods when created, so this
        # doesn't do any extra work. TODO: Don't load a given Method until
        # load_method() is called. Create a separate Method instance each time,
        # to allow multiple independent instances of the same model.
        for method_name in self._module.method_names():
            self._methods[method_name] = Method(method_name, self._module)

    @property
    def method_names(self) -> Set[str]:
        """The names of all methods defined by this program."""
        return set(self._methods.keys())

    def load_method(self, name: str) -> Optional[Method]:
        """Loads a method from the program.

        Args:
            name: The name of the method to load.

        Returns:
            The loaded method, or None if no method with that name exists.
        """
        return self._methods.get(name, None)


class OperatorRegistry:
    """The registry of operators that are available to the runtime."""

    def __init__(self, legacy_module: ModuleType) -> None:
        # TODO: Expose the kernel callables to Python.
        self._legacy_module = legacy_module

    @property
    def operator_names(self) -> Set[str]:
        """The names of all registered operators."""
        return set(self._legacy_module._get_operator_names())


class Runtime:
    """An instance of the ExecuTorch runtime environment.

    This can be used to concurrently load and execute any number of ExecuTorch
    programs and methods.
    """

    @staticmethod
    @functools.lru_cache(maxsize=1)
    def get() -> "Runtime":
        """Gets the Runtime singleton."""
        import executorch.extension.pybindings.portable_lib as legacy_module

        return Runtime(legacy_module=legacy_module)

    def __init__(self, *, legacy_module: ModuleType) -> None:
        # Public attributes.
        self.operator_registry = OperatorRegistry(legacy_module)
        # Private attributes.
        self._legacy_module = legacy_module

    def load_program(
        self,
        data: Union[bytes, bytearray, BinaryIO, Path, str],
        *,
        verification: Verification = Verification.InternalConsistency,
    ) -> Program:
        """Loads an ExecuTorch program from a PTE binary.

        Args:
            data: The binary program data to load; typically PTE data. May be
                raw bytes, a bytearray, an open binary file-like object, or a
                path (str or Path) to a .pte file.
            verification: level of program verification to perform.

        Returns:
            The loaded program.

        Raises:
            TypeError: If `data` is not one of the supported types.
        """
        if isinstance(data, (Path, str)):
            # Load directly from the file; the native module owns the data,
            # so Program doesn't need to hold a copy.
            m = self._legacy_module._load_for_executorch(
                str(data),
                enable_etdump=False,
                debug_buffer_size=0,
                program_verification=verification,
            )
            return Program(m, data=None)
        elif isinstance(data, bytearray):
            data_bytes = bytes(data)
        elif isinstance(data, bytes):
            data_bytes = data
        elif hasattr(data, "read"):
            # File-like object (the BinaryIO case). NOTE: this is deliberately
            # duck-typed: `isinstance(data, typing.BinaryIO)` is always False
            # for real file objects and io.BytesIO, because typing.BinaryIO is
            # not an ABC that concrete stream types register with — using it
            # would send every file-like input into the TypeError branch.
            data_bytes = data.read()
        else:
            raise TypeError(
                f"Expected data to be bytes, bytearray, a path to a .pte file, or a file-like object, but got {type(data).__name__}."
            )
        m = self._legacy_module._load_for_executorch_from_buffer(
            data_bytes,
            enable_etdump=False,
            debug_buffer_size=0,
            program_verification=verification,
        )

        return Program(m, data=data_bytes)

import tempfile
import unittest
from pathlib import Path

import torch

from executorch.extension.pybindings.test.make_test import (
    create_program,
    ModuleAdd,
    ModuleMulti,
)
from executorch.runtime import Runtime, Verification


class RuntimeTest(unittest.TestCase):
    """Tests for the executorch.runtime Python API (Runtime/Program/Method)."""

    def test_smoke(self):
        """Loads and runs a single-method program end to end."""
        ep, inputs = create_program(ModuleAdd())
        runtime = Runtime.get()
        # Demonstrate that get() returns a singleton.
        runtime2 = Runtime.get()
        self.assertTrue(runtime is runtime2)
        program = runtime.load_program(ep.buffer, verification=Verification.Minimal)
        method = program.load_method("forward")
        outputs = method.execute(inputs)
        self.assertTrue(torch.allclose(outputs[0], inputs[0] + inputs[1]))

    def test_module_with_multiple_method_names(self):
        """Both methods of a multi-entry-point program are visible and runnable."""
        ep, inputs = create_program(ModuleMulti())
        runtime = Runtime.get()

        program = runtime.load_program(ep.buffer, verification=Verification.Minimal)
        # Use a set literal; wrapping one in set() is redundant.
        self.assertEqual(program.method_names, {"forward", "forward2"})
        method = program.load_method("forward")
        outputs = method.execute(inputs)
        self.assertTrue(torch.allclose(outputs[0], inputs[0] + inputs[1]))

        method = program.load_method("forward2")
        outputs = method.execute(inputs)
        self.assertTrue(torch.allclose(outputs[0], inputs[0] + inputs[1] + 1))

    def test_print_operator_names(self):
        """The operator registry is populated and contains a known kernel."""
        # The registry is queried directly from the runtime; no program needs
        # to be created or loaded for it to be populated.
        runtime = Runtime.get()

        operator_names = runtime.operator_registry.operator_names
        self.assertGreater(len(operator_names), 0)

        self.assertIn("aten::add.out", operator_names)

    def test_load_program_with_path(self):
        """load_program accepts a str path, a pathlib.Path, and raw bytes."""
        ep, inputs = create_program(ModuleAdd())
        runtime = Runtime.get()

        def test_add(program):
            # Shared check: the loaded program still computes a + b.
            method = program.load_method("forward")
            outputs = method.execute(inputs)
            self.assertTrue(torch.allclose(outputs[0], inputs[0] + inputs[1]))

        with tempfile.NamedTemporaryFile() as f:
            f.write(ep.buffer)
            f.flush()
            # filename (str)
            program = runtime.load_program(f.name)
            test_add(program)
            # pathlib.Path
            path = Path(f.name)
            program = runtime.load_program(path)
            test_add(program)
            # raw bytes read back from the file. Use a distinct handle name so
            # the NamedTemporaryFile handle `f` is not shadowed.
            with open(f.name, "rb") as pte_file:
                program = runtime.load_program(pte_file.read())
            test_add(program)