Arrays, Slices, and Strings

Arrays and strings are the data structures that break the most programs. In C, they are raw memory with no guardrails. In Rust, they carry their length and check bounds. This chapter covers both approaches and shows why buffer overflows keep making headlines.

C Arrays: Fixed Size on the Stack

A C array is a contiguous block of elements. In C89 the size must be a compile-time constant; C99 relaxed this with variable-length arrays, covered below.

/* c_array.c */
#include <stdio.h>

int main(void)
{
    int arr[5] = {10, 20, 30, 40, 50};

    printf("sizeof(arr) = %zu bytes\n", sizeof(arr));   /* 20 if int is 4 bytes */
    printf("elements    = %zu\n", sizeof(arr) / sizeof(arr[0])); /* 5 */

    for (int i = 0; i < 5; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }

    return 0;
}
  Stack layout:
  +----+----+----+----+----+
  | 10 | 20 | 30 | 40 | 50 |
  +----+----+----+----+----+
  arr[0]              arr[4]

  Total: 5 * sizeof(int) = 20 bytes (assuming a 4-byte int)

No length is stored anywhere. You, the programmer, must track it.

Variable-Length Arrays (VLAs)

C99 added VLAs, whose size comes from a runtime value. In practice they are allocated on the stack, so a large size can overflow it.

/* vla.c */
#include <stdio.h>

void fill(int n)
{
    int arr[n];   /* VLA: size determined at runtime */
    for (int i = 0; i < n; i++) {
        arr[i] = i * i;
    }
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main(void)
{
    fill(5);
    fill(10);
    return 0;
}

Caution: The Linux kernel bans VLAs and builds with -Wvla to catch them. A large n overflows the kernel stack (typically 8 KB or 16 KB). Use kmalloc or fixed-size arrays instead.

Heap Arrays in C

For dynamic sizes, allocate on the heap with malloc.

/* heap_array.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 5;
    int *arr = malloc(n * sizeof(int));
    if (arr == NULL) {
        perror("malloc");
        return 1;
    }

    for (int i = 0; i < n; i++) {
        arr[i] = (i + 1) * 100;
    }

    for (int i = 0; i < n; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }

    free(arr);
    return 0;
}
  Stack                        Heap
  +-----+----------+          +-----+-----+-----+-----+-----+
  | arr | 0x8000  -|--------->| 100 | 200 | 300 | 400 | 500 |
  +-----+----------+          +-----+-----+-----+-----+-----+
  | n   |    5     |
  +-----+----------+

Rust Arrays: [T; N]

Rust arrays have their size baked into the type. [i32; 5] is a different type from [i32; 3].

// rust_array.rs
fn main() {
    let arr: [i32; 5] = [10, 20, 30, 40, 50];

    println!("length = {}", arr.len());

    for (i, val) in arr.iter().enumerate() {
        println!("arr[{}] = {}", i, val);
    }

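    // Bounds are always checked. A constant out-of-range index is
    // rejected at compile time; one computed at runtime panics.
    // let bad = arr[10];  // error: this operation will panic at runtime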
}
$ rustc rust_array.rs && ./rust_array
length = 5
arr[0] = 10
arr[1] = 20
arr[2] = 30
arr[3] = 40
arr[4] = 50

The length is part of the type. No separate variable needed.
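
Because the length lives in the type, a function can be made generic over it. The following is a minimal sketch, assuming a hypothetical helper named total that is not part of the example above. It uses const generics so one function accepts arrays of any length while the compiler still knows each length exactly.

```rust
// array_len_type.rs
// `total` is a hypothetical helper for illustration.
// N is a compile-time constant inferred from the argument's type.
fn total<const N: usize>(arr: [i32; N]) -> i32 {
    arr.iter().sum()
}

fn main() {
    let five: [i32; 5] = [10, 20, 30, 40, 50];
    let three: [i32; 3] = [1, 2, 3];

    println!("total(five)  = {}", total(five));  // N = 5
    println!("total(three) = {}", total(three)); // N = 3

    // The two arrays are different types, so this would not compile:
    // let wrong: [i32; 5] = three;
}
```

In practice most functions take a slice (&[i32]) instead, which accepts any length at runtime. The generic form is mainly useful when the length itself matters to the caller.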

Rust Vec<T>: The Growable Array

Vec<T> is Rust's heap-allocated, growable array. It replaces C's malloc/realloc pattern.

// vec_demo.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();

    v.push(10);
    v.push(20);
    v.push(30);

    println!("length   = {}", v.len());
    println!("capacity = {}", v.capacity());

    for val in &v {
        println!("{}", val);
    }

    v.pop();  // removes last element
    println!("after pop: {:?}", v);
}
$ rustc vec_demo.rs && ./vec_demo
length   = 3
capacity = 4
10
20
30
after pop: [10, 20]
  Vec<T> layout:

  Stack (Vec struct)         Heap (buffer)
  +----------+---------+    +----+----+----+----+
  | pointer  | 0x5000 -|--->| 10 | 20 | 30 |    |
  +----------+---------+    +----+----+----+----+
  | length   |    3    |    [0]  [1]  [2]  unused
  +----------+---------+
  | capacity |    4    |
  +----------+---------+

When you push beyond capacity, Vec allocates a new, larger buffer, copies the data, and frees the old one. This is automatic realloc.
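
The growth behavior can be observed by watching capacity() while pushing. This is a rough sketch; count_capacity_changes is a hypothetical helper, and the exact growth strategy is an implementation detail of the standard library, so the sketch prints the count rather than assuming a particular value.

```rust
// vec_growth.rs
// Counts how many times the capacity changes while pushing n elements.
// `count_capacity_changes` is a hypothetical helper for illustration.
fn count_capacity_changes(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut last_cap = v.capacity();
    let mut changes = 0;

    for i in 0..n {
        v.push(i);
        if v.capacity() != last_cap {
            changes += 1;
            last_cap = v.capacity();
        }
    }
    changes
}

fn main() {
    println!(
        "capacity changes for 1000 pushes: {}",
        count_capacity_changes(1000)
    );

    // Reserving up front avoids the intermediate reallocations.
    let mut v: Vec<usize> = Vec::with_capacity(1000);
    let cap_before = v.capacity();
    for i in 0..1000 {
        v.push(i);
    }
    println!("capacity unchanged: {}", v.capacity() == cap_before);
}
```

When the final size is known in advance, Vec::with_capacity plays the role of a single correctly sized malloc.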

Slices: &[T]

C has no concept of a slice. When you pass an array to a function in C, you pass a pointer and pray the caller also passed the correct length.

Rust slices bundle pointer and length together.

// slices.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for &val in data {
        total += val;
    }
    total
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];

    // Slice from array
    println!("sum(arr)       = {}", sum(&arr));
    println!("sum(arr[1..4]) = {}", sum(&arr[1..4]));  // 2+3+4

    // Slice from Vec
    println!("sum(v)         = {}", sum(&v));
    println!("sum(v[..2])    = {}", sum(&v[..2]));     // 10+20
}
$ rustc slices.rs && ./slices
sum(arr)       = 15
sum(arr[1..4]) = 9
sum(v)         = 60
sum(v[..2])    = 30

The C equivalent requires explicit length passing:

/* sum.c */
#include <stdio.h>

int sum(const int *data, int len)
{
    int total = 0;
    for (int i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};
    printf("sum = %d\n", sum(arr, 5));
    printf("sum[1..4] = %d\n", sum(arr + 1, 3));
    return 0;
}

Rust Note: Slices perform bounds checking on every index access. This costs a branch instruction but prevents buffer overflows. In hot loops, the optimizer often eliminates the check.
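
When an out-of-range index is an expected case and not a bug, a panic is usually not what you want. The sketch below, using two hypothetical helpers (third and count_positive), shows the checked accessor get, which returns an Option, and iteration, which needs no index at all.

```rust
// checked_access.rs
// `third` is a hypothetical helper: returns the element at index 2, if any.
fn third(data: &[i32]) -> Option<i32> {
    data.get(2).copied()
}

// Iterating avoids indexing entirely, so there is no index to check.
fn count_positive(data: &[i32]) -> usize {
    data.iter().filter(|&&x| x > 0).count()
}

fn main() {
    let v = vec![10, -20, 30];
    let short = [1];

    println!("{:?}", third(&v));        // Some(30)
    println!("{:?}", third(&short));    // None
    println!("{}", count_positive(&v)); // 2
}
```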

Try It: Write a Rust function fn max_value(data: &[i32]) -> Option<i32> that returns None for an empty slice and Some(max) otherwise. Compare how much simpler it is than the C equivalent.

C Strings: Null-Terminated char *

C strings are arrays of char terminated by a zero byte ('\0'). There is no stored length.

/* cstring.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char greeting[] = "Hello";

    printf("string:  %s\n", greeting);
    printf("strlen:  %zu\n", strlen(greeting));   /* 5 */
    printf("sizeof:  %zu\n", sizeof(greeting));   /* 6 (includes \0) */

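    /* Print each byte, including the terminator at index len */
    size_t len = strlen(greeting);   /* cache: strlen walks the string on every call */
    for (size_t i = 0; i <= len; i++) {
        printf("  [%zu] = '%c' (%d)\n", i, greeting[i], greeting[i]);
    }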

    return 0;
}
string:  Hello
strlen:  5
sizeof:  6
  [0] = 'H' (72)
  [1] = 'e' (101)
  [2] = 'l' (108)
  [3] = 'l' (108)
  [4] = 'o' (111)
  [5] = '' (0)       <-- null terminator
  C string in memory:
  +---+---+---+---+---+----+
  | H | e | l | l | o | \0 |
  +---+---+---+---+---+----+
  greeting[0]        greeting[5]

The Dangerous String Functions

strcpy -- No Bounds Checking

/* strcpy_bad.c -- DO NOT DO THIS in production */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    const char *input = "This string is way too long for buf";

    strcpy(buf, input);  /* BUFFER OVERFLOW */
    printf("%s\n", buf);
    return 0;
}

Caution: strcpy writes until it hits \0 in the source. It has no idea how big the destination is. This is the cause of thousands of CVEs.

strncpy -- Better, But Tricky

/* strncpy_demo.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    strncpy(buf, "Hello, World!", sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';  /* strncpy may not null-terminate! */

    printf("buf = '%s'\n", buf);  /* "Hello, " (truncated) */
    return 0;
}

Caution: strncpy does NOT null-terminate the destination when the source has n or more characters, where n is the count argument. Always set the last byte to \0 manually.

snprintf -- The Safe Choice

/* snprintf_demo.c */
#include <stdio.h>

int main(void)
{
    char buf[16];
    int written = snprintf(buf, sizeof(buf), "Count: %d", 42);

    printf("buf = '%s'\n", buf);
    printf("would have written %d chars\n", written);

    /* If written >= sizeof(buf), truncation occurred */
    if (written >= (int)sizeof(buf)) {
        printf("WARNING: output truncated\n");
    }

    return 0;
}

snprintf always null-terminates (if size > 0) and returns the number of characters it would have written given enough space, not counting the terminator. A negative return value signals an error. Use it for all string formatting in C.

Driver Prep: The Linux kernel uses scnprintf, a variant that returns the number of characters actually written (not the would-have-been count). Never use sprintf in kernel code.

Rust Strings: String and &str

Rust has two main string types:

  • String -- owned, heap-allocated, growable (like Vec<u8> with UTF-8 guarantee)
  • &str -- borrowed string slice (like &[u8] but guaranteed UTF-8)
// rust_strings.rs
fn greet(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    // String literal -> &str (stored in binary, read-only)
    let s1: &str = "world";
    greet(s1);

    // Owned String on the heap
    let s2: String = String::from("Rust");
    greet(&s2);  // &String auto-coerces to &str

    // Building strings
    let mut s3 = String::new();
    s3.push_str("Hello");
    s3.push(' ');
    s3.push_str("World");
    println!("{}", s3);

    // Length is always known
    println!("len = {}", s3.len());      // bytes
    println!("chars = {}", s3.chars().count()); // unicode scalar values
}
$ rustc rust_strings.rs && ./rust_strings
Hello, world!
Hello, Rust!
Hello World
len = 11
chars = 11
  String layout:

  Stack (String struct)       Heap
  +----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
  | pointer  | 0x7000 -|--->| H | e | l | l | o |   | W | o | r | l | d |
  +----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
  | length   |   11    |    UTF-8 bytes, NO null terminator
  +----------+---------+
  | capacity |   16    |
  +----------+---------+

  &str layout:
  +----------+---------+
  | pointer  | 0x7000  |    Points into String or binary data
  +----------+---------+
  | length   |   11    |    Fat pointer, always knows its length
  +----------+---------+

Rust Note: Rust strings are always valid UTF-8. You cannot put arbitrary bytes in a String. For raw bytes, use Vec<u8> or &[u8]. For OS-interface strings, use OsString and OsStr.
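
Two consequences of the UTF-8 guarantee are easy to check. This is a small sketch, with byte_and_char_len as a hypothetical helper: byte length and character count diverge as soon as a non-ASCII character appears, and converting invalid bytes into a String fails instead of producing a corrupt string.

```rust
// utf8_check.rs
// `byte_and_char_len` is a hypothetical helper for illustration.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes in UTF-8,
    // so the byte length and the character count differ.
    let (bytes, chars) = byte_and_char_len("h\u{e9}llo");
    println!("bytes = {}, chars = {}", bytes, chars); // bytes = 6, chars = 5

    // The byte 0xFF never appears in valid UTF-8, so this conversion fails.
    let raw: Vec<u8> = vec![0x48, 0x69, 0xFF];
    match String::from_utf8(raw) {
        Ok(s) => println!("valid: {}", s),
        Err(e) => println!("rejected: {}", e),
    }
}
```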

Buffer Overflows: Why They Happen

Buffer overflows happen when code writes past the end of a buffer. In C, this is trivially easy:

/* overflow.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char password[8] = "secret";
    char buffer[8];

    printf("Enter name: ");
    /* gets() has been removed from the C standard.
       scanf without width limit is equally dangerous: */
    scanf("%s", buffer);  /* no length limit! */

    printf("buffer = '%s'\n", buffer);
    printf("password = '%s'\n", password);

    return 0;
}

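If the user types more than 7 characters, scanf writes past the end of buffer into whatever is adjacent on the stack, possibly password. This is undefined behavior, and it is how stack-smashing attacks work. In C, a width limit such as scanf("%7s", buffer), or fgets(buffer, sizeof(buffer), stdin), bounds the write. The Rust equivalent simply cannot overflow: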

// no_overflow.rs
use std::io;

fn main() {
    let mut buffer = String::new();
    println!("Enter name:");
    io::stdin().read_line(&mut buffer).unwrap();

    // String grows as needed -- cannot overflow
    println!("buffer = '{}'", buffer.trim());
}
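
To see the growth without typing a huge line by hand, the same read can be pointed at an in-memory buffer. This is a sketch under one assumption: read_name is a hypothetical helper, written against the BufRead trait so it works with both stdin and std::io::Cursor.

```rust
// read_any_length.rs
use std::io::{self, BufRead};

// `read_name` is a hypothetical helper. Accepting any BufRead lets the
// same code read from stdin or from an in-memory buffer.
fn read_name<R: BufRead>(mut input: R) -> io::Result<String> {
    let mut buffer = String::new();
    input.read_line(&mut buffer)?; // grows `buffer` as needed
    Ok(buffer.trim().to_string())
}

fn main() -> io::Result<()> {
    // A 10,000-character "name", far larger than the 8-byte C buffer.
    let long_line = format!("{}\n", "x".repeat(10_000));
    let name = read_name(io::Cursor::new(long_line))?;
    println!("read {} bytes without overflow", name.len());
    Ok(())
}
```

Unbounded growth is safe from memory corruption, but a program reading untrusted input may still want to reject lines above some size to limit memory use.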

Side by Side: Processing CSV Lines

A practical example showing the difference in safety.

C Version

/* csv_c.c */
#include <stdio.h>
#include <string.h>

void parse_line(const char *line)
{
    char buf[256];
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    char *token = strtok(buf, ",");
    int col = 0;
    while (token != NULL) {
        printf("  col %d: '%s'\n", col, token);
        token = strtok(NULL, ",");
        col++;
    }
}

int main(void)
{
    const char *lines[] = {
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    };

    for (int i = 0; i < 3; i++) {
        printf("Line %d:\n", i);
        parse_line(lines[i]);
    }

    return 0;
}

Rust Version

// csv_rust.rs
fn parse_line(line: &str) {
    for (col, token) in line.split(',').enumerate() {
        println!("  col {}: '{}'", col, token);
    }
}

fn main() {
    let lines = [
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    ];

    for (i, line) in lines.iter().enumerate() {
        println!("Line {}:", i);
        parse_line(line);
    }
}

The Rust version has no fixed-size buffer, no null terminator management, no strtok with its hidden static state, and no possible overflow.
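
The two versions also differ in how they treat empty columns. strtok treats a run of consecutive delimiters as a single delimiter, so an empty field disappears, while split reports it as an empty string. The sketch below uses a hypothetical helper named fields to show the Rust side; neither version handles quoted fields.

```rust
// csv_fields.rs
// `fields` is a hypothetical helper that collects the columns of one line.
fn fields(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

fn main() {
    // The middle column is empty; split still reports three columns.
    let cols = fields("Alice,,Engineer");
    println!("{} columns: {:?}", cols.len(), cols);
    // 3 columns: ["Alice", "", "Engineer"]
}
```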

Try It: Extend the C version to handle lines longer than 255 characters, the most that buf[256] can hold alongside its terminator. Notice how much code you need. Then notice that the Rust version already handles any length.

Knowledge Check

  1. What is the difference between strlen(s) and sizeof(s) for a char array?
  2. Why is strcpy dangerous? What should you use instead?
  3. How does a Rust &str differ from a C const char *?

Common Pitfalls

  • Forgetting the null terminator -- C strings need +1 byte. char buf[5] holds at most 4 characters.
  • Using strlen in a loop condition -- it traverses the string every call. Cache the length.
  • strncpy may not null-terminate -- if the source has n or more characters, the destination gets no \0.
  • Mixing up bytes and characters -- UTF-8 characters can be 1-4 bytes. strlen counts bytes.
  • Array decay in sizeof -- sizeof(arr) gives the array size only where arr is declared as an array; in a function that receives it as a parameter, arr is a pointer and sizeof gives the pointer size.
  • Off-by-one in loop bounds -- i <= n when you mean i < n.
  • Not checking snprintf return -- it tells you if truncation occurred; ignoring it means silent data loss.