Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulate BLE connection between central and peripheral zephyr samples #56

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

grifcj
Copy link
Contributor

@grifcj grifcj commented Oct 20, 2021

Changes to enable simulation of bluetooth connection between zephyr samples central_hr and periperhal_hr. I've been testing against main, most recently this commit. Simulation should work without any modifications to project configs. I did most development with heart-rate samples, but also did a quick check with central_ht/peripheral_ht and those seemed to connect.

west commands to build zephyr images

west build -d ../build-central-hr -b nrf52840dk_nrf52840 -p always samples/bluetooth/central_hr
west build -d ../build-peripheral-hr -b nrf52840dk_nrf52840 -p always samples/bluetooth/peripheral_hr

Changes here are paired with updates to parent renode repository, which contains an example script for executing the demo. See bluetooth-work branch in my fork.

The changes primarily deal with updates for coordinating the radio with timers and PPI. The zephyr stack makes heavy use of PPI and rtc0, timer0, & timer1 for scheduling actions in bluetooth controller firmware. The firmware is fairly time aware, and expects radio actions to consume some amount of time and that simulation has enough resolution to adequately order events below 100 us. I found that a simulation quantum of 100 us would not function, but a 10 us one does. Perhaps there are ways around this, as it unfortunately affects overall simulation performance.

I've only tested with stock zephyr configuration of samples and with privacy and security disabled. Things that rely on additional hardware features beyond default config, e.g. TIFS hardware support, isn't included in these changes.

image

image

@PiotrZierhoffer
Copy link
Member

Hey @grifcj thanks for your contribution!

It's very very interesting. We confirmed it to work on our side and we're processing it internally.

There are some parts that are a little controversial and we need to spend some time on them (especially in the time handling areas), but we hope to move swiftly with this.

As the PR is quite large, we'll let you know what can we just merge in and which parts we'd like to discuss more.

@grifcj
Copy link
Contributor Author

grifcj commented Oct 24, 2021

@PiotrZierhoffer Thanks for taking a look.

I suspected I might be trampling on an important optimization with the timer change or breaking something. Hoping the test cases I added will cover what's needed for the bluetooth simulation, so that impl may be reverted to something more similar to what it was while keeping tests going if need be.

I'm sure you've seen some of the other things too that we're added mainly to support debugging, but perhaps shouldn't make it to master, least not in current form.

  • Logging virtual time to console and file. Hard-coded, not flexible, not convinced it being captured in right place.
  • Bluetooth channels for radio model. Specific to BLE.
  • Gratuitous logging

I did a little more characterization. Out of the zephyr samples, the basic central and peripheral ones seem to work. The central-multilink also worked with 3 peripherals, provided I commented out RSSI checks and programmed peripherals with different addresses.

multi-link-sim

Some of the notable things that don't currently work which might be worth checking into next:

  • RSSI (think there might be references for this in other radio examples?)
  • Privacy and security features, i.e. CONFIG_BT_PRIVACY and CONFIG_BT_SMP

Also, no automated integration testing yet. Wonder if there was a preference to some simple robot tests or perhaps checking into using the existing zephyr test suite in some way. I believe I remember reading twister can deploy to boards and simulators. I'm not aware of any other demos using this radio model (NRF52840_Radio.cs) for other protocols, but if so, I haven't regression tested in that regard either.

@mateusz-holenko
Copy link
Member

I suspected I might be trampling on an important optimization with the timer change or breaking something. Hoping the test cases I added will cover what's needed for the bluetooth simulation, so that impl may be reverted to something more similar to what it was while keeping tests going if need be.

Thanks for those tests. I'm currently looking into changes in the timers infrastructure.

I'm sure you've seen some of the other things too that we're added mainly to support debugging, but perhaps shouldn't make it to master, least not in current form.

I agree that part of the proposed changes could be extracted to separate PRs (or dropped).

Also, no automated integration testing yet. Wonder if there was a preference to some simple robot tests or perhaps checking into using the existing zephyr test suite in some way.

I've already created a simple robot test for central_hr/peripheral_hr samples and I'm planning to add it when merging changes from this PR.

@grifcj
Copy link
Contributor Author

grifcj commented Nov 7, 2021

Latest work here does a couple things.

It enables some simple soft device execution as described here

Adding a basic ECB peripheral enables execution with CONFIG_BT_SMP. CONFIG_BT_PRIVACY also works. Work here being defined as heart-rate example between central/peripheral can connect and send notifications. This was against latest zephyr/main:d435340f1b.

@mateusz-holenko
Copy link
Member

@grifcj I merged fixes to BasicClockSource and ComparingTimer together with your (slightly refactored) tests upstream: 62fa1ef0110d5004f64cb01dae1c684b6426f25e.

@grifcj
Copy link
Contributor Author

grifcj commented Nov 10, 2021

Cool, so it be ok if I rebase on latest master, drop the related commits from PR, and force push the PR branch? Or is another process preferred?

@mateusz-holenko
Copy link
Member

Cool, so it be ok if I rebase on latest master, drop the related commits from PR, and force push the PR branch?

That would be great, thanks!

@grifcj
Copy link
Contributor Author

grifcj commented Nov 11, 2021

@mateusz-holenko As you can see, pushed changes up. Added a couple, the interrupt fix to RTC being the important one.

Spot checked against 3 sims and all appeared to work as intended / or at least equivalent to where I last left them. The last two configurations are less functional than the zephyr one.

  • Zephyr central-hr w/ peripheral-hr
    Connection functional and logging for HR notifications observed
  • Nordic softdevice peripher-hr
    Wireshark shows tracing for advertising
  • Zephyr central-hr w/ riot+nimble peripheral-hr
    Peripheral is advertising but not responding correctly to connect requests.

@grifcj
Copy link
Contributor Author

grifcj commented Nov 15, 2021

Latest commits enhance radio model and enable simulation of RIOT/NimBLE peripheral w/ Zephyr central. The existing Zephyr central / peripheral functionality remains. Nordic softdevice peripheral with zephyr central seemed more functional, but no functional connection.

Timing accuracy seems to require 1 us quantum, which slows down simulation considerably.

Screenshot from 2021-11-14 23-30-20

Couple of issues.

  • The direct timer0 captures are a hack, but it seems to be one way to get timer capture without delay. Capture must coincide with the end event and even a 1 us delay through next sync point is too much for NimBLE logic. An ISR fires and reads timer value immediately and expects it to be the captured packet end time.
  • I'm using a nrf52840.repl that tags DEVICEADDR like so. This works for RIOT but not Zephyr. There seems to be an issue with word vs byte access. RIOT build was triggering byte access and I think Zephyr the opposite.
        Tag <0x100000a0, 0x100000a3> "DEVICEADDRTYPE" 0x1 # random address
        Tag <0x100000a4, 0x100000a4> "DEVICEADDR[0][0]" 0xDD
        Tag <0x100000a5, 0x100000a5> "DEVICEADDR[0][1]" 0xCC
        Tag <0x100000a6, 0x100000a6> "DEVICEADDR[0][2]" 0xBB
        Tag <0x100000a7, 0x100000a7> "DEVICEADDR[0][3]" 0xAA

@mateusz-holenko
Copy link
Member

I checked and merged upstream (with cosmetic formatting changes) commits fixing things in CLOCK, PPI and Timer.

I also took a look at the ECB model and it seems fine. I believe it should be fairly simple to add actual encryption logic after extracting it from CC2538_Cryptoprocessor - I'll do that as a next step.

I'm not sure whether logger-related changes should be merged upstream. If you find them useful, could you extract them from this PR so that we can discuss them separately?

@grifcj
Copy link
Contributor Author

grifcj commented Nov 25, 2021

Rebased on latest master, dropping merged commits. Extracted logging to separate branch and will create ticket for further discussion / development.

@mateusz-holenko
Copy link
Member

I extended the ECB model with actual AES calculations and merged this together with RTC changes upstream.

What's left for review now are changes to the Radio controller model.

Code in interrupt handlers assume some time delay before packet goes
out, hence the resulting actions as triggered to PPI need some delay.
This change was found to ok for 10 us quantum.
Remove while loop, just pull one frame off queue. Loop causes simulation
to deadlock because after receiving first frame, then radio is disabled
and the receive just enqueues again.
Between RxEnable and PacketEnd and RadioDisable, embedded code may be
written in way that assumes there is some time passing. Have not
verified this like with transmit side, but seems a reasonable assumption
that it should be similar.
When sending to wireless medium, use in-air representation. When
reading/writing to memory use RAM representation.

Radio allows s1 field to be optionally included in on-air packet. This
becomes relevant for BLE data PDUs which may have 16 or 24 bit header.
Calculation of timing for transmit and receive side of radio events is
similar enough to share implementation.
Bit counter can used for generating successive events. This
implementation just fires a single match event configured prior to
receiving radio frame.
@grifcj
Copy link
Contributor Author

grifcj commented Dec 4, 2021

Rebased. Tested again zephyr and riot simulations. Everything still works.

My platform file has some small mods to make RIOT simulation work. It needs the interrupt hookup for ECB peripheral. The DEVICEADDR stuff was described above

@@ -87,11 +87,15 @@ rng: Miscellaneous.NRF52840_RNG @ sysbus 0x4000D000
     -> nvic@13
 
 ecb: Miscellaneous.NRF52840_ECB @ sysbus 0x4000E000
+    -> nvic@14
 
 sysbus:
     init:
         ApplySVD @https://dl.antmicro.com/projects/renode/svd/NRF52840.svd.gz
         Tag <0x10000000, 0x10000FFF> "FICR"
         Tag <0x100000a0, 0x100000a3> "DEVICEADDRTYPE" 0x1 # random address
-        Tag <0x100000a4, 0x100000a7> "DEVICEADDR[0]" 0xAABBCCDD
+        Tag <0x100000a4, 0x100000a4> "DEVICEADDR[0][0]" 0xDD
+        Tag <0x100000a5, 0x100000a5> "DEVICEADDR[0][1]" 0xCC
+        Tag <0x100000a6, 0x100000a6> "DEVICEADDR[0][2]" 0xBB
+        Tag <0x100000a7, 0x100000a7> "DEVICEADDR[0][3]" 0xAA
         Tag <0x10001200, 0x10001203> "PSELRESET"

@mateusz-holenko
Copy link
Member

Most of the changes to NRF52840_Radio are now merged upstream - thanks again for your contribution @grifcj!

We added an automated test running the Zephyr peripheral/central HR sample.

For now we skipped the direct PPI events triggering, as those were quite hacky and seemed not to be necessary for the Zephyr demo. In the comment you mentioned that those were necessary to support RIOT stack correctly, is that right? Could you elaborate on this a bit more - perhaps we can come up with an alternative solution to the problem.

@grifcj
Copy link
Contributor Author

grifcj commented Jan 27, 2022

Some of the RIOT ISRs have code like below, which rely on timer captures triggered by radio events. This particular one is in nimble/drivers/nrf52/src/ble_phy.c (I believe this was one of the problem ones, but didn't verify it again). Summary is that the simulated day from radio event to PPI task is 1 or 2 quantums, which allows too much ISR code to execute and results in incorrect simulated behavior.

image

Just a bit more specificity on this particular scenario, the preprogrammed channels 26 and 27 tie radio events Address and End to captures on Timer0 CC1 and CC2. On hardware, PPI propagation is synced to 16 MHz clock, so a max latency of 62.5 ns. When run in simulation, the PPI model uses the "ExecuteInNearestSyncState" for executing the tasks triggered by events and hence the simulated latency will be dependent on chosen time quantum. If I recall correctly, it can be up to 2 quantum delay. So 2 us >> 62 ns (assuming 1 us quantum, the min), and the simulated CPU executes too much ISR code (which is fired by radio END event), passing the statements where the timer captures are evaluated. The direct writes to the timer tasks in the radio model bypass the sync delays, ensuring that that captures take place in time.

Zephyr on the other hand doesn't have any code that stresses the simulation models in this fashion, so it executes fine without these statements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants