Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeing the dreaded ⨯ TypeError: Cannot read properties of undefined (reading 'call') error again with NextJS #2879

Open
5 tasks done
benmarch opened this issue Aug 22, 2024 · 16 comments

Comments

@benmarch
Copy link
Contributor

benmarch commented Aug 22, 2024

Describe the bug

Note: I'm not able to reproduce this in the example repos, and I can't expose my project publicly. I've spent a while trying to track down the problem, but I can't seem to find where these specific modules are removed from the module cache. I'm just hoping for some advice on how to track this down.

It appears only to happen with modules in the share scope (at least the ones that trigger the error are specifically shared modules, but I'm not sure if other modules would also have the same issue if the app didn't crash).

This was an issue in a much older version of the MF plugins, but I hadn't seen it in quite some time. I'm upgrading from [email protected] and [email protected] to [email protected] and [email protected].

Any advice would be appreciated. Thanks!

More info:

This appears to happen only after a remote updates and the host detects the update. I see this message in the console:

<remote name> hash is different - must hot reload server

Then shortly afterward I see this:

⨯ TypeError: Cannot read properties of undefined (reading 'call')

which points to webpack-api-runtime.js in the __webpack_require__ function:

/******/ 		var threw = true;
/******/ 		try {
/******/ 			var execOptions = { id: moduleId, module: module, factory: __webpack_modules__[moduleId], require: __webpack_require__ };
/******/ 			__webpack_require__.i.forEach(function(handler) { handler(execOptions); });
/******/ 			module = execOptions.module;
/>>>>>>/ 			execOptions.factory.call(module.exports, module, module.exports, execOptions.require);
/******/ 			threw = false;
/******/ 		} finally {
/******/ 			if(threw) delete __webpack_module_cache__[moduleId];
/******/ 		}

I've tried commenting out some of the new code in hot-reload.ts but it still occurs.

Reproduction

(sorry, i can't reproduce outside my project)

Used Package Manager

npm

System Info

System:
    OS: macOS 13.6.9
    CPU: (12) x64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Memory: 76.14 MB / 16.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 18.19.1 - /usr/local/bin/node
    npm: 10.2.4 - /usr/local/bin/npm
    pnpm: 8.15.2 - /usr/local/bin/pnpm
  Browsers:
    Chrome: 127.0.6533.120
    Safari: 16.6

Validations

@ScriptedAlchemy
Copy link
Member

I have major PR which changes how the runtime is embedded. This may fix it. Do you have any custom chunk split?

@benmarch
Copy link
Contributor Author

Awesome, I can try to upgrade again when that's ready. No, we don't have any custom chunk splitting. Our Next config just includes the MF webpack plugin, webpack resolve fallbacks, and transpilePackages. Maybe it's worth noting that we're producing a "standalone" bundle and deploying via Docker; not sure if that affects anything.

For full context, we're running into an issue (and have been throughout this project over the past year) where anytime we deploy a producer without the consumer updating, we get an error from the MF runtime, and then the app stops working properly. Because it's NextJS, it manifests as hanging requests that eventually 504.

I think this is due to our infrastructure: we use AWS Fargate on ECS to deploy 2 tasks for each MFE. What i believe is happening is that when a producer redeploys, both the 2 old tasks and the 2 new tasks are available concurrently for a short period of time. During this window, the consumers detect the change and refetch and recache the entry point. However, because we don't have sticky sessions, the consumer could first hit a new task to get the entry, then hit an old task when pulling a chunk/container. We see this in the logs as a 404 on the chunk because the filename is different on the old task versus what is in the new entry file.

I've been trying to force clear the cache when this happens so that the consumer can refetch after the old tasks are removed, but there isn't a good way to do this. I saw in a recent release that the hashmap used by the hot-reload script was made available globally, so i tried to clear that from an API route. Technically, i think that worked, but i was running into the (reading 'call') error (even without clearing the hashmap).

I have rolled back to the versions that were working, and I've found a workaround where I "force" the revalidate function to invalidate, but for some reason, i need to call it multiple times for it to work (I always thought it was not working at all, but hitting it in rapid succession seems to work). Ideally, I'd like to be able to do this from the errorLoadRemote hook, but because it's an AsyncHook, the one from the nextjs-mf plugin prevents mine from being called. Anyway, I'm deploying this to our QA environment now for testing.

If you're working on a runtime update, maybe you could expose a really simple and safe way to force clear all caches (not sure why revalidate(undefined, true) isn't working every time)? It would be great if we automatically trigger that when an error occurs, but even via a secure API route would be helpful for us.

I very much appreciate your work on this, thanks!

@ScriptedAlchemy
Copy link
Member

@benmarch try 0.0.0-next-20240903212627 and add embedRuntime:true to module federation plugin options.

In this release i have done a few things:

  1. i "unpatch" entrypoints that are /api so fed runtime is not added.
  2. i redesigned module hoisting in chunks so we now have our own tree shake and module optimization, letting me ensure all referenced modules are hoisted into the runtime at boot, so no eager call of undefined should happen.

@benmarch
Copy link
Contributor Author

benmarch commented Sep 9, 2024

Thanks @ScriptedAlchemy! Sorry for the delayed response, i caught the plague last week. Let me try that and i'll get back as soon as i can.

@ScriptedAlchemy
Copy link
Member

yeah no worries, feel better!

I will be releaseing another "half" of the update in the near future. the current release opts into the experiment under the hood and the next release should implement the other half the the experiment so i no longer rely on entrypoint patching to inject federation runtime

@benmarch
Copy link
Contributor Author

benmarch commented Sep 11, 2024

Ok sorry again for the delay, and thank you for the well wishes! I'm getting an error in the server console when accessing the home page with this version and config:

⨯ ../packages/module-federation/node_modules/@module-federation/webpack-bundler-runtime/dist/index.esm.js (125:26) @ eval
⨯ Error: Federation instance not found!
   at eval (webpack-internal:///../../packages/module-federation/node_modules/@module-federation/webpack-bundler-runtime/dist/index.esm.js:129:27)
   at Array.forEach (<anonymous>)
   at Object.consumes (webpack-internal:///../../packages/module-federation/node_modules/@module-federation/webpack-bundler-runtime/dist/index.esm.js:108:31)
   at __webpack_require__.f.consumes (<my app>/apps/home/.next/server/webpack-api-runtime.js:1441:59)
   at <my app>/apps/home/.next/server/webpack-api-runtime.js:817:40
   at Array.reduce (<anonymous>)
   at __webpack_require__.e (<my app>/apps/home/.next/server/webpack-api-runtime.js:816:67)
   at __WEBPACK_DEFAULT_EXPORT__ (webpack-internal:///(api)/./src/pages/api/auth/[...nextauth].ts:6:442)
   at <my app>/node_modules/next/dist/compiled/next-server/pages-api.runtime.dev.js:21:3039
   at <my app>/node_modules/next/dist/server/lib/trace/tracer.js:133:36
   at NoopContextManager.with (<my app>/node_modules/next/dist/compiled/@opentelemetry/api/index.js:1:7062)
   at ContextAPI.with (<my app>/node_modules/next/dist/compiled/@opentelemetry/api/index.js:1:518)
   at NoopTracer.startActiveSpan (<my app>/node_modules/next/dist/compiled/@opentelemetry/api/index.js:1:18093)
   at ProxyTracer.startActiveSpan (<my app>/node_modules/next/dist/compiled/@opentelemetry/api/index.js:1:18854)
   at <my app>/node_modules/next/dist/server/lib/trace/tracer.js:122:103 {
 page: '/api/auth/[...nextauth]'
}
null

And the debugger in the browser shows this (which is not pointing to a federated import, nor are there any transitive federated imports from here):
image

@ScriptedAlchemy
Copy link
Member

Can you create a repo I can debug?

@benmarch
Copy link
Contributor Author

Let me see what i can do. I wasnt able to reproduce with the example repo earlier, but i'll try to set one up this weekend.

@ScriptedAlchemy
Copy link
Member

Or can you do a zoom call? Dm me on twitter and we can do one whenever, just need to see a handful of thing for a few moments to understand usage

@benmarch
Copy link
Contributor Author

Thanks @ScriptedAlchemy, I pinged you on twitter. I have all afternoon tomorrow and pretty much all day friday free if that works. Next week works for me as well.

@benmarch
Copy link
Contributor Author

benmarch commented Oct 4, 2024

@ScriptedAlchemy I was able to partially reproduce this locally with the prerelease from this release (the one from last Friday): https://github.com/module-federation/core/actions/runs/11076163958/job/30778657753

I was able to get the ID of the module that wasn't loading (35414) and the only place it maps to is this line in remoteEntry.js of one of the producers ("search"):

"./pages/SearchResults":()=>Promise.all([r.e(8056),r.e(2072),r.e(7600),r.e(352),r.e(372),r.e(7276),r.e(272),r.e(4004),r.e(1492),r.e(1188),r.e(7152),r.e(1944),r.e(7584),r.e(9808),r.e(5414),r.e(4820)]).then(()=>()=>r(35414))

This is part of what appears to be a map of exposed modules ("./pages/SearchResults" being one of the exposed modules).

There is no other reference to 35414 anywhere, but that seems consistent with the other exposed modules so idk.

To reproduce, i built the standalone apps and booted them up. Most pages are working as expected, but as soon as I go to the SearchResultsPage it throws this error with a 500. Looking at the code for the SearchResultsPage ssr chunk, it seems to be missing a lot of code that I would expect to be in there, so I'm wondering if it just didn't build correctly. I'm still digging into it; i want to try to comment some code out in that page component to see if it's contributing...

Ok, the main difference on the SearchResultsPage compared to others is that we're using next/dynamic with { ssr: false }. If i remove { ssr: false } but keep dynamic it works as expected.


Next day... I was able to fully reproduce exactly the error we're seeing in QA (stack trace redacted):
Screenshot 2024-10-04 at 10 31 23 AM

The syntax error is caused by a producer sending the 404 page to the consumer and Webpack tries to load it as a JS file. Once this happens, I haven't been able to find a way to recover, even by revalidating.

Immediately after this happens, then next page load starts showing the "property of undefined" error, and it appears always to point to the page module. After i triggered this again i got this module ID: 40096. This represents the "AssetPage" which was the page i attempted to navigate to after the syntax error.

It's still possible that this is related to some code i have in my runtime plugin to remove things from the FederationHost module cache. Let me remove that and try again...

Ok, I'm still getting the syntax error, but I'm no longer getting the "property of undefined" error 🎉 . This was the code i had in errorLoadRemote that I removed:

  if (origin?.moduleCache?.has?.(remoteID)) {
      log(remoteID, `Removing remote ${remoteID} from the module cache.`)
      origin.moduleCache.delete(remoteID)
    }

So this immediate issue might be resolved, but i have more information about the effects of the syntax error:

Even without the "property of undefined" error, the app does not recover from the syntax error (when the remote chunk 404s). Interestingly, this only happens on the page that the error occurred; the other pages continue to work correctly. Even after "fixing" the producers and revalidating the consumer, the app does not recover, it just keeps replaying the syntax error. I wonder if Webpack is still caching this due to the error?

I think I'll stop here and get your feedback. I'll try updating to the latest prerelease as well to see if the latest changes make a difference. Thanks!


Quick update: this is where the syntax error is being thrown, and I think it's part of the MF code, so maybe it just comes down to cache-busting these errors:
Screenshot 2024-10-04 at 11 23 47 AM

@ScriptedAlchemy
Copy link
Member

You cant use next/dynamic with federated imports - it converts them into require calls, not async imports. The ssr flag doesnt do anything as it still requires the code in both builds. Use react lazy instead.

If there is syntax error then yes it might be getting a http response like 404 page from the url and cannot evaluate it. If the url was a query string, would that get around the cache? or is it that the whole url it gets just doesnt exist anymore.

@benmarch
Copy link
Contributor Author

benmarch commented Oct 10, 2024

@ScriptedAlchemy I've spent a few days trying to replace all my next/dynamics with react lazy, but I can't seem to avoid hydration errors when i use it. Even in the simplest cases I'll either get "initial UI does not match what was rendered on the server" or "Suspense boundary received an update before it finished hydrating". I see in the docs it says using next/dynamic will cause hydration errors, but I'm seeing the opposite.

Is this a complete no-go if I'm using dynamic, or is it ok as long as it's not causing immediate errors?


If there is syntax error then yes it might be getting a http response like 404 page from the url and cannot evaluate it. If the url was a query string, would that get around the cache? or is it that the whole url it gets just doesnt exist anymore.

The whole URL doesn't exist because it's named based on a hash. So an example would be "/_next/static/ssr/__federation_expose_AssetCard-8e384a5185d76b62.js". When it changes, the hash would be different and return a 404, which does contain the HTML for the generic 404 page and fails to evaluate.

@ScriptedAlchemy
Copy link
Member

ScriptedAlchemy commented Oct 11, 2024

Next dynamic can only be used on federated imports. If you have other imports for non-module federation code, Next will want dynamic for inter-component dynamic import or local imports.

Regarding the 404, yes, that looks right. Or at least like there's an old remote entry in the server but the JavaScript files and unloaded chunks have changed? Does the 404 evaluation "hard crash" execution or does it just throw to a runtime plugin hook?

@benmarch
Copy link
Contributor Author

@ScriptedAlchemy when the 404 fails to evaluate it does throw to the runtime plugin, but it never recovers like it does when a remote is offline initially and then comes online later. Meaning, even when the remotes are invalidated and the caches are cleared, I'll keep getting the same error replayed until the consumer is rebooted.

I unfortunately need to work on some other things, but I'll come back to this the week after next. Thank you for your help on this!

@ScriptedAlchemy
Copy link
Member

No worries. Thanks ill try looking around with this info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@benmarch @ScriptedAlchemy and others