Arrays, Slices, and Strings
Arrays and strings are the data structures that break the most programs. In C, they are raw memory with no guardrails. In Rust, they carry their length and check bounds. This chapter covers both approaches and shows why buffer overflows keep making headlines.
C Arrays: Fixed Size on the Stack
A C array is a contiguous block of elements. In C89 the size must be a compile-time constant; C99 relaxes this with variable-length arrays, covered below.
/* c_array.c */
#include <stdio.h>

int main(void)
{
    int arr[5] = {10, 20, 30, 40, 50};

    printf("sizeof(arr) = %zu bytes\n", sizeof(arr));         /* 20 */
    printf("elements = %zu\n", sizeof(arr) / sizeof(arr[0])); /* 5 */

    for (int i = 0; i < 5; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    return 0;
}
Stack layout:
+----+----+----+----+----+
| 10 | 20 | 30 | 40 | 50 |
+----+----+----+----+----+
arr[0]              arr[4]
Total: 5 * sizeof(int) = 20 bytes
No length is stored anywhere. You, the programmer, must track it.
Variable-Length Arrays (VLAs)
C99 added VLAs, whose size comes from a runtime value (C11 later made them an optional feature). They live on the stack, so a large size can overflow it.
/* vla.c */
#include <stdio.h>

void fill(int n)
{
    int arr[n]; /* VLA: size determined at runtime */

    for (int i = 0; i < n; i++) {
        arr[i] = i * i;
    }
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main(void)
{
    fill(5);
    fill(10);
    return 0;
}
Caution: VLAs are banned in the Linux kernel (-Wvla flag). A large n overflows the kernel stack (typically 8 KB or 16 KB). Use kmalloc or fixed-size arrays instead.
Heap Arrays in C
For dynamic sizes, allocate on the heap with malloc.
/* heap_array.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 5;
    int *arr = malloc(n * sizeof(int));

    if (arr == NULL) {
        perror("malloc");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        arr[i] = (i + 1) * 100;
    }
    for (int i = 0; i < n; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    free(arr);
    return 0;
}
Stack                       Heap
+-----+----------+          +-----+-----+-----+-----+-----+
| arr | 0x8000  -|--------->| 100 | 200 | 300 | 400 | 500 |
+-----+----------+          +-----+-----+-----+-----+-----+
| n   | 5        |
+-----+----------+
Rust Arrays: [T; N]
Rust arrays have their size baked into the type. [i32; 5] is a different type
from [i32; 3].
// rust_array.rs
fn main() {
    let arr: [i32; 5] = [10, 20, 30, 40, 50];
    println!("length = {}", arr.len());
    for (i, val) in arr.iter().enumerate() {
        println!("arr[{}] = {}", i, val);
    }
    // Indexing is bounds-checked:
    // let bad = arr[10]; // a constant index this far out is rejected at compile time;
    //                    // an out-of-range index computed at runtime panics
}
$ rustc rust_array.rs && ./rust_array
length = 5
arr[0] = 10
arr[1] = 20
arr[2] = 30
arr[3] = 40
arr[4] = 50
The length is part of the type. No separate variable needed.
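Because the length lives in the type, a function can be generic over it. The sketch below uses a const generic parameter to read the length back from the type alone; element_count is a hypothetical helper name chosen for illustration, not a standard library function.

```rust
use std::mem::size_of;

// Hypothetical helper: N is inferred from the array's type at compile time.
fn element_count<const N: usize>(_arr: &[i32; N]) -> usize {
    N
}

fn main() {
    let five = [10, 20, 30, 40, 50];
    let three = [1, 2, 3];

    println!("five has {} elements", element_count(&five));
    println!("three has {} elements", element_count(&three));

    // Like the C array, the Rust array stores only its elements:
    // the length costs no memory at runtime.
    println!("size_of [i32; 5] = {} bytes", size_of::<[i32; 5]>());
}
```

The same function accepts [i32; 5] and [i32; 3] without any length argument, which is the C sizeof(arr) / sizeof(arr[0]) idiom done by the type checker.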
Rust Vec<T>: The Growable Array
Vec<T> is Rust's heap-allocated, growable array. It replaces C's
malloc/realloc pattern.
// vec_demo.rs
fn main() {
    let mut v: Vec<i32> = Vec::new();
    v.push(10);
    v.push(20);
    v.push(30);
    println!("length = {}", v.len());
    println!("capacity = {}", v.capacity());
    for val in &v {
        println!("{}", val);
    }
    v.pop(); // removes last element
    println!("after pop: {:?}", v);
}
$ rustc vec_demo.rs && ./vec_demo
length = 3
capacity = 4
10
20
30
after pop: [10, 20]
Vec<T> layout:
Stack (Vec struct)        Heap (buffer)
+----------+---------+    +----+----+----+----+
| pointer  | 0x5000 -|--->| 10 | 20 | 30 |    |
+----------+---------+    +----+----+----+----+
| length   | 3       |     [0]  [1]  [2]  unused
+----------+---------+
| capacity | 4       |
+----------+---------+
When you push beyond capacity, Vec allocates a new, larger buffer, copies the
data, and frees the old one. This is automatic realloc.
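One way to watch this happen is to count how often the capacity changes while pushing. The sketch below does that for a Vec that starts empty and for one created with Vec::with_capacity; count_reallocations is a hypothetical helper, and the exact growth policy is an implementation detail of the standard library, so the first count may differ between toolchains.

```rust
// Hypothetical helper: push `n` values and count how often the capacity
// changed, as a proxy for how often the buffer was reallocated.
fn count_reallocations(mut v: Vec<i32>, n: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i as i32);
        if v.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = v.capacity();
        }
    }
    reallocations
}

fn main() {
    let grown = count_reallocations(Vec::new(), 1000);
    let reserved = count_reallocations(Vec::with_capacity(1000), 1000);

    println!("Vec::new():               {} reallocations", grown);
    println!("Vec::with_capacity(1000): {} reallocations", reserved);
}
```

When the final size is known up front, reserving it once avoids every intermediate copy, much like calling malloc with the right size instead of a chain of realloc calls.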
Slices: &[T]
C has no concept of a slice. When you pass an array to a function in C, you pass a pointer and pray the caller also passed the correct length.
Rust slices bundle pointer and length together.
// slices.rs
fn sum(data: &[i32]) -> i32 {
    let mut total = 0;
    for &val in data {
        total += val;
    }
    total
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];

    // Slice from array
    println!("sum(arr) = {}", sum(&arr));
    println!("sum(arr[1..4]) = {}", sum(&arr[1..4])); // 2+3+4

    // Slice from Vec
    println!("sum(v) = {}", sum(&v));
    println!("sum(v[..2]) = {}", sum(&v[..2])); // 10+20
}
$ rustc slices.rs && ./slices
sum(arr) = 15
sum(arr[1..4]) = 9
sum(v) = 60
sum(v[..2]) = 30
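The claim that a slice bundles pointer and length can be checked by measuring the reference itself. This is a small sketch, with slice_ref_words as a hypothetical helper name; it assumes the usual representation where a slice reference is one pointer plus one usize length.

```rust
use std::mem::size_of;

// Hypothetical helper: size of a slice reference, measured in machine words.
fn slice_ref_words() -> usize {
    size_of::<&[i32]>() / size_of::<usize>()
}

fn main() {
    println!("&i32   (pointer only):     {} bytes", size_of::<&i32>());
    println!("&[i32] (pointer + length): {} bytes", size_of::<&[i32]>());
    println!("&[i32] in machine words:   {}", slice_ref_words());
}
```

A reference to a single i32 is one word; a slice reference is two, and the second word is the length that C makes you pass by hand.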
The C equivalent requires explicit length passing:
/* sum.c */
#include <stdio.h>

int sum(const int *data, int len)
{
    int total = 0;

    for (int i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};

    printf("sum = %d\n", sum(arr, 5));
    printf("sum[1..4] = %d\n", sum(arr + 1, 3));
    return 0;
}
Rust Note: Slices perform bounds checking on every index access. This costs a branch instruction but prevents buffer overflows. In hot loops, the optimizer often eliminates the check.
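When a panic is not the behavior you want, the slice method get performs the same bounds check but reports failure as None. Iterating instead of indexing sidesteps the question, because there is no index to check. The sketch below shows both; element_at and sum_iter are hypothetical helper names.

```rust
// Hypothetical helper: checked lookup that returns None instead of panicking.
fn element_at(data: &[i32], index: usize) -> Option<i32> {
    data.get(index).copied()
}

// Iterating instead of indexing: the loop never forms an out-of-range index.
fn sum_iter(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let arr = [1, 2, 3, 4, 5];

    println!("{:?}", element_at(&arr, 2));  // in range
    println!("{:?}", element_at(&arr, 10)); // out of range, no panic
    println!("sum = {}", sum_iter(&arr));
}
```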
Try It: Write a Rust function fn max_value(data: &[i32]) -> Option<i32> that returns None for an empty slice and Some(max) otherwise. Compare how much simpler it is than the C equivalent.
C Strings: Null-Terminated char *
C strings are arrays of char terminated by a zero byte ('\0'). There is no
stored length.
/* cstring.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char greeting[] = "Hello";
    size_t len = strlen(greeting); /* computed once, not on every loop test */

    printf("string: %s\n", greeting);
    printf("strlen: %zu\n", len);              /* 5 */
    printf("sizeof: %zu\n", sizeof(greeting)); /* 6 (includes \0) */

    /* Print each byte, including the terminator at index len */
    for (size_t i = 0; i <= len; i++) {
        printf(" [%zu] = '%c' (%d)\n", i, greeting[i], greeting[i]);
    }
    return 0;
}
string: Hello
strlen: 5
sizeof: 6
[0] = 'H' (72)
[1] = 'e' (101)
[2] = 'l' (108)
[3] = 'l' (108)
[4] = 'o' (111)
[5] = '' (0) <-- null terminator
C string in memory:
+---+---+---+---+---+----+
| H | e | l | l | o | \0 |
+---+---+---+---+---+----+
greeting[0]    greeting[5]
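For comparison, Rust's standard library models this exact layout with std::ffi::CStr, which is what you use when talking to C code. The sketch below reproduces the strlen versus sizeof distinction from the Rust side; c_lengths is a hypothetical helper name.

```rust
use std::ffi::CStr;

// Hypothetical helper: returns (length without the terminator, length with it),
// or None if the bytes are not a valid C string (no trailing NUL, or a NUL in
// the middle).
fn c_lengths(bytes: &[u8]) -> Option<(usize, usize)> {
    let c = CStr::from_bytes_with_nul(bytes).ok()?;
    Some((c.to_bytes().len(), c.to_bytes_with_nul().len()))
}

fn main() {
    println!("{:?}", c_lengths(b"Hello\0")); // terminated: both lengths available
    println!("{:?}", c_lengths(b"Hello"));   // missing terminator is detected
}
```

Unlike a raw char *, the conversion checks for the terminator up front, so a missing \0 becomes an error value rather than a read past the end of the buffer.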
The Dangerous String Functions
strcpy -- No Bounds Checking
/* strcpy_bad.c -- DO NOT DO THIS in production */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    char *input = "This string is way too long for buf";

    strcpy(buf, input); /* BUFFER OVERFLOW */
    printf("%s\n", buf);
    return 0;
}
Caution: strcpy writes until it hits \0 in the source. It has no idea how big the destination is. This is the cause of thousands of CVEs.
strncpy -- Better, But Tricky
/* strncpy_demo.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];

    strncpy(buf, "Hello, World!", sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0'; /* strncpy may not null-terminate! */
    printf("buf = '%s'\n", buf); /* "Hello, " (truncated) */
    return 0;
}
Caution: strncpy does NOT guarantee null-termination. If the first n bytes of the source contain no \0, it copies exactly n bytes and writes no terminator. Always set the last byte to \0 manually.
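For contrast, here is a rough Rust counterpart of a truncating copy into a fixed buffer. It is a sketch, not a standard library routine: copy_truncated is a hypothetical helper, and it additionally backs up to a character boundary so that a multi-byte UTF-8 character is never cut in half.

```rust
// Hypothetical helper: copy as much of `src` as fits into `dst` without
// splitting a multi-byte UTF-8 character. Returns the number of bytes copied.
fn copy_truncated(dst: &mut [u8], src: &str) -> usize {
    let mut n = src.len().min(dst.len());
    // Back up until `n` falls on a character boundary (0 always is one).
    while !src.is_char_boundary(n) {
        n -= 1;
    }
    dst[..n].copy_from_slice(&src.as_bytes()[..n]);
    n
}

fn main() {
    let mut buf = [0u8; 7];
    let n = copy_truncated(&mut buf, "Hello, World!");

    // The length travels with the slice, so no terminator is needed.
    let text = std::str::from_utf8(&buf[..n]).unwrap();
    println!("copied {} bytes: '{}'", n, text);
}
```

There is no terminator to forget: the returned length says exactly how much of the buffer is valid, and slicing past the buffer would panic instead of overflowing.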
snprintf -- The Safe Choice
/* snprintf_demo.c */
#include <stdio.h>

int main(void)
{
    char buf[16];
    int written = snprintf(buf, sizeof(buf), "Count: %d", 42);

    printf("buf = '%s'\n", buf);
    printf("would have written %d chars\n", written);

    /* If written >= sizeof(buf), truncation occurred */
    if (written >= (int)sizeof(buf)) {
        printf("WARNING: output truncated\n");
    }
    return 0;
}
snprintf always null-terminates (if size > 0) and tells you how many
characters it wanted to write. Use it for all string formatting in C.
Driver Prep: The Linux kernel uses scnprintf, a variant that returns the number of characters actually written (not the would-have-been count). Never use sprintf in kernel code.
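Rust code normally formats into a growable String, but bounded formatting into a caller-supplied buffer is also possible. The sketch below relies on the standard library's io::Write implementation for &mut [u8]; format_count is a hypothetical helper, and unlike snprintf it reports only success or failure, not the would-have-been length.

```rust
use std::io::Write;

// Hypothetical helper: format into a fixed buffer.
// Returns Some(bytes written) on success, None if the text did not fit.
fn format_count(buf: &mut [u8], count: i32) -> Option<usize> {
    let capacity = buf.len();
    let mut remaining: &mut [u8] = buf;
    // Writing to a `&mut [u8]` advances the slice past the bytes written and
    // fails once the slice is full, so it can never write past the end.
    write!(remaining, "Count: {}", count).ok()?;
    Some(capacity - remaining.len())
}

fn main() {
    let mut big = [0u8; 16];
    match format_count(&mut big, 42) {
        Some(n) => println!("wrote {} bytes: '{}'", n, String::from_utf8_lossy(&big[..n])),
        None => println!("WARNING: output truncated"),
    }

    let mut small = [0u8; 4];
    if format_count(&mut small, 42).is_none() {
        println!("4-byte buffer: output truncated");
    }
}
```

Truncation shows up as a value the caller has to handle, which plays the same role as checking the snprintf return value.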
Rust Strings: String and &str
Rust has two main string types:
- String -- owned, heap-allocated, growable (like Vec<u8> with UTF-8 guarantee)
- &str -- borrowed string slice (like &[u8] but guaranteed UTF-8)
// rust_strings.rs
fn greet(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    // String literal -> &str (stored in binary, read-only)
    let s1: &str = "world";
    greet(s1);

    // Owned String on the heap
    let s2: String = String::from("Rust");
    greet(&s2); // &String auto-coerces to &str

    // Building strings
    let mut s3 = String::new();
    s3.push_str("Hello");
    s3.push(' ');
    s3.push_str("World");
    println!("{}", s3);

    // Length is always known
    println!("len = {}", s3.len()); // bytes
    println!("chars = {}", s3.chars().count()); // unicode scalar values
}
$ rustc rust_strings.rs && ./rust_strings
Hello, world!
Hello, Rust!
Hello World
len = 11
chars = 11
String layout:
Stack (String struct)     Heap
+----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
| pointer  | 0x7000 -|--->| H | e | l | l | o |   | W | o | r | l | d |
+----------+---------+    +---+---+---+---+---+---+---+---+---+---+---+
| length   | 11      |    UTF-8 bytes, NO null terminator
+----------+---------+
| capacity | 16      |
+----------+---------+
&str layout:
+----------+---------+
| pointer  | 0x7000  |  Points into String or binary data
+----------+---------+
| length   | 11      |  Fat pointer, always knows its length
+----------+---------+
Rust Note: Rust strings are always valid UTF-8. You cannot put arbitrary bytes in a String. For raw bytes, use Vec<u8> or &[u8]. For OS-interface strings, use OsString and OsStr.
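The earlier example used only ASCII, so len and chars().count() happened to agree. The sketch below uses a non-ASCII character to show them diverging, and shows what happens when bytes that are not valid UTF-8 are offered to String; byte_and_char_len is a hypothetical helper name.

```rust
// Hypothetical helper: returns (byte length, number of Unicode scalar values).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'é' takes two bytes in UTF-8, so the two counts differ.
    println!("{:?}", byte_and_char_len("héllo"));

    // Arbitrary bytes are rejected when converting to String.
    let invalid: Vec<u8> = vec![0x48, 0x69, 0xFF]; // 0xFF never occurs in valid UTF-8
    println!("from_utf8 ok? {}", String::from_utf8(invalid.clone()).is_ok());

    // The lossy conversion substitutes U+FFFD for the bad byte instead.
    println!("{}", String::from_utf8_lossy(&invalid));
}
```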
Buffer Overflows: Why They Happen
Buffer overflows happen when code writes past the end of a buffer. In C, this is trivially easy:
/* overflow.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char password[8] = "secret";
    char buffer[8];

    printf("Enter name: ");
    /* gets() has been removed from the C standard.
       scanf without width limit is equally dangerous: */
    scanf("%s", buffer); /* no length limit! */
    printf("buffer = '%s'\n", buffer);
    printf("password = '%s'\n", password);
    return 0;
}
If the user types more than 7 characters, buffer overflows into password
(or whatever is adjacent on the stack). This is how stack-smashing attacks work.
The Rust equivalent simply cannot overflow:
// no_overflow.rs
use std::io;

fn main() {
    let mut buffer = String::new();
    println!("Enter name:");
    io::stdin().read_line(&mut buffer).unwrap();
    // String grows as needed -- cannot overflow
    println!("buffer = '{}'", buffer.trim());
}
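To see the growth without typing at a terminal, the same read_line call can be pointed at in-memory bytes, since a byte slice also implements BufRead. This is a sketch for experimentation; read_name is a hypothetical helper, and the 1000-character input is an arbitrary size chosen to dwarf the 8-byte C buffer.

```rust
use std::io::BufRead;

// Hypothetical helper: read one line from any buffered source and strip the
// trailing newline.
fn read_name(mut input: impl BufRead) -> std::io::Result<String> {
    let mut buffer = String::new(); // starts empty, grows on demand
    input.read_line(&mut buffer)?;
    Ok(buffer.trim_end().to_string())
}

fn main() {
    // Far more input than the 8-byte C buffer could hold.
    let long_input = format!("{}\n", "A".repeat(1000));
    let name = read_name(long_input.as_bytes()).unwrap();
    println!("read {} bytes without overflow", name.len());
}
```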
Side by Side: Processing CSV Lines
A practical example showing the difference in safety.
C Version
/* csv_c.c */
#include <stdio.h>
#include <string.h>

void parse_line(const char *line)
{
    char buf[256];

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    char *token = strtok(buf, ",");
    int col = 0;
    while (token != NULL) {
        printf(" col %d: '%s'\n", col, token);
        token = strtok(NULL, ",");
        col++;
    }
}

int main(void)
{
    const char *lines[] = {
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    };

    for (int i = 0; i < 3; i++) {
        printf("Line %d:\n", i);
        parse_line(lines[i]);
    }
    return 0;
}
Rust Version
// csv_rust.rs
fn parse_line(line: &str) {
    for (col, token) in line.split(',').enumerate() {
        println!(" col {}: '{}'", col, token);
    }
}

fn main() {
    let lines = [
        "Alice,30,Engineer",
        "Bob,25,Designer",
        "Carol,35,Manager",
    ];
    for (i, line) in lines.iter().enumerate() {
        println!("Line {}:", i);
        parse_line(line);
    }
}
The Rust version has no fixed-size buffer, no null terminator management, no
strtok with its hidden static state, and no possible overflow.
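Going one step further than printing, the same split call can feed a typed record with explicit failure handling. The sketch below is one possible shape, not part of the original example: the Person struct and parse_person function are hypothetical names, and the three-column layout is assumed from the sample lines.

```rust
// Hypothetical record type matching the three sample columns.
#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    age: u32,
    role: String,
}

// Hypothetical helper: returns None when a column is missing, an extra
// column is present, or the age is not a number.
fn parse_person(line: &str) -> Option<Person> {
    let mut fields = line.split(',');
    let name = fields.next()?.to_string();
    let age = fields.next()?.trim().parse::<u32>().ok()?;
    let role = fields.next()?.to_string();
    if fields.next().is_some() {
        return None; // too many columns
    }
    Some(Person { name, age, role })
}

fn main() {
    for line in ["Alice,30,Engineer", "Bob,abc,Designer", "Carol"] {
        match parse_person(line) {
            Some(p) => println!("{:?}", p),
            None => println!("could not parse '{}'", line),
        }
    }
}
```

Every way the input can be malformed ends in None, so there is no code path that reads a missing field or an unparsed number.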
Try It: Extend the C version to handle lines longer than 255 characters (the most that buf can hold alongside its terminator). Notice how much code you need. Then notice that the Rust version already handles any length.
Knowledge Check
- What is the difference between strlen(s) and sizeof(s) for a char array?
- Why is strcpy dangerous? What should you use instead?
- How does a Rust &str differ from a C const char *?
Common Pitfalls
- Forgetting the null terminator -- C strings need +1 byte. char buf[5] holds at most 4 characters.
- Using strlen in a loop condition -- it traverses the string every call. Cache the length.
- strncpy may not null-terminate -- if the source is n bytes or longer, the destination gets no \0.
- Mixing up bytes and characters -- UTF-8 characters can be 1-4 bytes. strlen counts bytes.
- Array decay in sizeof -- sizeof(arr) inside the declaring function gives the array size; inside a function that received the array as a parameter, it gives the pointer size.
- Off-by-one in loop bounds -- i <= n when you mean i < n.
- Not checking snprintf return -- it tells you if truncation occurred; ignoring it means silent data loss.