Lesson Details

To turn CSV rows into our own structs, we can add serde to our package.

cargo add -p upload-pokemon-data serde

Serde is a library that is widely used in the Rust ecosystem for serializing and deserializing Rust data types to and from various formats. Some of those formats include JSON, TOML, YAML, and MessagePack.

We will be using serde's Deserialize derive macro to generate a deserializer for our PokemonCsv type, so be sure to add derive as a feature to the serde dependency in upload-pokemon-data's Cargo.toml. If you do not do this, you will see compile errors because the code that enables the derive won't be included in the serde library. You will need to add the version and features keys, although the version will already be specified as a string.

serde = { version = "1.0.129", features = ["derive"] }

Cargo features are powerful ways to turn different pieces of code on and off depending on what a consumer needs to use, ensuring that our projects don't include code they don't need in the compilation process.
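As a rough sketch of what a feature looks like from inside a crate, here is conditional compilation keyed on a made-up feature called fancy-output. That name is an assumption for illustration only; it is not a serde feature, and in a real crate it would be declared under [features] in Cargo.toml.

```rust
// Hypothetical feature name `fancy-output`, used only for illustration.
// When the feature is enabled, this version of `banner` is compiled.
#[cfg(feature = "fancy-output")]
fn banner() -> &'static str {
    "*** pokemon ***"
}

// When the feature is disabled, this version is compiled instead and the
// one above is left out of the build entirely.
#[cfg(not(feature = "fancy-output"))]
fn banner() -> &'static str {
    "pokemon"
}

fn main() {
    println!("{}", banner());
}
```

Only one of the two functions exists in any given build, which is the same mechanism that keeps serde's derive code out of our build until we ask for it.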

Next we'll create a new sub-module to hold the PokemonCsv struct that represents what's in each CSV row. In src/main.rs:

mod pokemon_csv;
use pokemon_csv::*;

Rust modules don't necessarily reflect the filesystem, but in this case they will. We'll use mod pokemon_csv to define a new submodule that will exist in src/pokemon_csv.rs, and we'll use pokemon_csv::* to pull all of the public items into scope in main.rs.
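To see that a module doesn't have to be a file, here is a small sketch where the module body is written inline. The describe helper and the single-field struct are assumptions made up for this example; the lesson's real module lives in src/pokemon_csv.rs.

```rust
// An inline module: declared with a body instead of living in its own file.
mod pokemon_csv {
    // `pub` items are visible outside the module.
    pub struct PokemonCsv {
        pub name: String,
    }

    // Hypothetical helper, public so the glob import below picks it up.
    pub fn describe(p: &PokemonCsv) -> String {
        format!("pokemon named {}", p.name)
    }

    // Private item: the glob import does not bring this into scope.
    #[allow(dead_code)]
    fn internal_only() {}
}

// Pull every public item from the module into scope.
use pokemon_csv::*;

fn main() {
    let p = PokemonCsv {
        name: String::from("Bulbasaur"),
    };
    println!("{}", describe(&p));
}
```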

In src/pokemon_csv.rs we'll define a new public struct PokemonCsv. We also label all of the fields as pub so that they're accessible if we want them wherever we create a PokemonCsv.

At the top we've included two derive macros for Debug and Deserialize. Debug we've already talked about. We're deriving it for convenience if we want to use the Debug formatter ("{:?}") or the dbg! macro with a value of type PokemonCsv. Deserialize is from serde, and does a pretty good job of automatically handling all of the fields with the types we've given. If we didn't derive Deserialize we'd have to implement it by hand for PokemonCsv, but all of the field types already have implementations, so we let serde write that code for us.
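Here is a minimal sketch of what deriving Debug buys us, using a cut-down MiniPokemon struct that is made up for this example and only uses the standard library.

```rust
// A cut-down stand-in for PokemonCsv, just to show the Debug derive.
#[derive(Debug)]
struct MiniPokemon {
    name: String,
    hp: u8,
}

fn main() {
    let p = MiniPokemon {
        name: String::from("Bulbasaur"),
        hp: 45,
    };

    // Compact, single-line Debug output.
    println!("{:?}", p);
    // Pretty-printed Debug output, one field per line.
    println!("{:#?}", p);

    // dbg! prints the value to stderr with file and line, then returns it.
    let p = dbg!(p);
    println!("{} has {} hp", p.name, p.hp);
}
```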

We use a few different number types for the data in our csv: u8, u16, and f32.

  • A u8 is an unsigned (unsigned means: not negative) integer from 0-255, just like a color channel in CSS.
  • A u16 is bigger than a u8 and can hold values from 0 to 65535.

Why are there different integer types? Because they're different sizes in memory. Storing a u8 takes 8 bits (one byte), while storing a u64 takes 64 bits (8 bytes). So if we can appropriately size the number we use, we can store more numbers in less memory.
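We can check those sizes and ranges with the standard library. This is a small sketch; the fits_in_u8 helper is a made-up name for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: would this value fit in a u8 field?
fn fits_in_u8(value: u32) -> bool {
    value <= u8::MAX as u32
}

fn main() {
    // size_of reports the size of a type in bytes.
    println!("u8:  {} byte,  max {}", size_of::<u8>(), u8::MAX);
    println!("u16: {} bytes, max {}", size_of::<u16>(), u16::MAX);
    println!("u64: {} bytes, max {}", size_of::<u64>(), u64::MAX);

    // checked_add returns None instead of overflowing past the maximum.
    let hp: u8 = 250;
    println!("{:?}", hp.checked_add(5)); // Some(255)
    println!("{:?}", hp.checked_add(10)); // None

    println!("{}", fits_in_u8(300)); // false
}
```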

An f32 is a 32-bit float. Floats store numbers with a fractional part, unlike the integer types.

The other types we're using are Option, bool, and String.

  • bool is true or false
  • Option is an enum that represents a value that can be there or not. The values are Some(value) if there is a value or None if there isn't.
  • String is an owned string. That is, it's mostly what you think of when you think of strings in languages like JavaScript. We can add more to it and otherwise do whatever we want with it.
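A short sketch of those three types in use. The describe_female_rate helper is an assumption made up for this example; it isn't part of the lesson's code.

```rust
// Hypothetical helper: turn an optional female rate into display text.
fn describe_female_rate(rate: Option<f32>) -> String {
    match rate {
        // There is a value, so we can use it.
        Some(r) => format!("{}% female", r * 100.0),
        // There is no value, which is what an empty CSV field becomes.
        None => String::from("no gender ratio"),
    }
}

fn main() {
    // An owned String can grow after it is created.
    let mut name = String::from("Bulba");
    name.push_str("saur");

    let genderless: bool = false;

    println!("{}: {}", name, describe_female_rate(Some(0.125)));
    println!("genderless: {}", genderless);
    println!("{}", describe_female_rate(None));
}
```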

We match all of these types to the types in the CSV. I've chosen to match the integer types as tightly as possible, even though I don't know if more pokemon with bigger values will be added in future generations, because I'm trying to match what's in the CSV now, not what could be added to the database in the future.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct PokemonCsv {
    pub name: String,
    pub pokedex_id: u16,
    pub abilities: String,
    pub typing: String,
    pub hp: u8,
    pub attack: u8,
    pub defense: u8,
    pub special_attack: u8,
    pub special_defense: u8,
    pub speed: u8,
    pub height: u16,
    pub weight: u16,
    pub generation: u8,
    pub female_rate: Option<f32>,
    pub genderless: bool,
    #[serde(rename(deserialize = "legendary/mythical"))]
    pub is_legendary_or_mythical: bool,
    pub is_default: bool,
    pub forms_switchable: bool,
    pub base_experience: u16,
    pub capture_rate: u8,
    pub egg_groups: String,
    pub base_happiness: u8,
    pub evolves_from: Option<String>,
    pub primary_color: String,
    pub number_pokemon_with_typing: f32,
    pub normal_attack_effectiveness: f32,
    pub fire_attack_effectiveness: f32,
    pub water_attack_effectiveness: f32,
    pub electric_attack_effectiveness: f32,
    pub grass_attack_effectiveness: f32,
    pub ice_attack_effectiveness: f32,
    pub fighting_attack_effectiveness: f32,
    pub poison_attack_effectiveness: f32,
    pub ground_attack_effectiveness: f32,
    pub fly_attack_effectiveness: f32,
    pub psychic_attack_effectiveness: f32,
    pub bug_attack_effectiveness: f32,
    pub rock_attack_effectiveness: f32,
    pub ghost_attack_effectiveness: f32,
    pub dragon_attack_effectiveness: f32,
    pub dark_attack_effectiveness: f32,
    pub steel_attack_effectiveness: f32,
    pub fairy_attack_effectiveness: f32,
}

The only thing left to talk about is the use of a field-level attribute. Serde offers us the power to rename fields when we're deserializing, so we'll take advantage of that to map the CSV column legendary/mythical, whose / can't appear in a Rust identifier, onto the field is_legendary_or_mythical.

#[serde(rename(deserialize = "legendary/mythical"))]
pub is_legendary_or_mythical: bool,

In our for loop that iterates over the CSV reader, we can change the function used from records to deserialize. The deserialize function needs a type parameter to tell it what type to deserialize into. We can get Rust to infer that type if we label the record as a PokemonCsv type, because Rust is capable of knowing that this type will propagate back up to the deserialize function, where it is the only type that fits that type parameter.

for result in rdr.deserialize() {
    let record: PokemonCsv = result?;
    println!("{:?}", record);
}

If you have Rust Analyzer with the type inlays on, you will see that Rust Analyzer correctly shows the type of result as Result<PokemonCsv, csv::Error>.
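The same kind of inference shows up in the standard library, so we can try it without the csv crate. In this sketch, parse_hp is a made-up helper; str::parse is generic over its output type, and the annotation on hp is what picks u8.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse a raw CSV field into a u8.
fn parse_hp(raw: &str) -> Result<u8, ParseIntError> {
    // No turbofish on `parse`: the `u8` annotation on `hp` flows back
    // into the call and fixes its type parameter.
    let hp: u8 = raw.parse()?;
    Ok(hp)
}

fn main() {
    println!("{:?}", parse_hp("45"));
    // 300 does not fit in a u8, so this is an Err.
    println!("{:?}", parse_hp("300"));
}
```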

Running the program results in a Deserialize error that reports a ParseBool failure at a specific byte on a specific line of the csv.

❯ cargo run --bin upload-pokemon-data
Error: Error(Deserialize { pos: Some(Position { byte: 781, line: 1, record: 1 }), err: DeserializeError { field: Some(14), kind: ParseBool(ParseBoolError { _priv: () }) } })

If we look at the csv values, we can see that this is because the true/false values are capital T True and capital F False, which don't parse into Rust's true and false.
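ParseBoolError is the standard library's error type for parsing a bool from a string, so we can reproduce the failure with str::parse alone, no csv or serde involved.

```rust
fn main() {
    // Lowercase spellings parse fine.
    println!("{:?}", "true".parse::<bool>());
    println!("{:?}", "false".parse::<bool>());

    // The capitalized spellings from the CSV are rejected.
    println!("{:?}", "True".parse::<bool>().is_err());
    println!("{:?}", "False".parse::<bool>().is_err());
}
```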

We can create a new function, just for these fields, to deserialize True and False into bools.

We first need to use serde's field-level attribute macro to tell it that when we deserialize, we're going to use a function called from_capital_bool. You can see that we can also add it alongside other usage, such as the rename.

#[serde(deserialize_with = "from_capital_bool")]
pub genderless: bool,
#[serde(
    rename(deserialize = "legendary/mythical"),
    deserialize_with = "from_capital_bool"
)]
pub is_legendary_or_mythical: bool,
#[serde(deserialize_with = "from_capital_bool")]
pub is_default: bool,
#[serde(deserialize_with = "from_capital_bool")]
pub forms_switchable: bool,

The from_capital_bool function signature is already defined for us by serde and is shown in the docs. The only part we get to choose is the T in the return type, which is bool here because that's the value we'll be parsing out.

The function signature from the docs is

fn<'de, D>(D) -> Result<T, D::Error> where D: Deserializer<'de>

The function signature we use reads as:

The function from_capital_bool, which makes use of a lifetime named 'de and some type D, accepts an argument named deserializer that is of type D, and returns a Result where a successful deserialization ends up being a bool type and a failure is the associated Error type that D defines.

Additionally, D must implement the Deserializer trait, which makes use of the same 'de lifetime we talked about earlier.
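If the D::Error part looks unfamiliar, the standard library has a similar shape we can play with. This sketch is an analogy, not serde's API: parse_field is a made-up function, and FromStr with its associated Err type stands in for Deserializer with its associated Error type.

```rust
use std::str::FromStr;

// Generic over some type T. The error type isn't chosen by us: it is
// whatever associated `Err` type T's FromStr implementation defines.
fn parse_field<T>(raw: &str) -> Result<T, T::Err>
where
    T: FromStr,
{
    raw.trim().parse::<T>()
}

fn main() {
    // T = u8, so the error type is std::num::ParseIntError.
    println!("{:?}", parse_field::<u8>(" 45 "));
    // T = f32, so the error type is std::num::ParseFloatError.
    println!("{:?}", parse_field::<f32>("0.5"));
    println!("{:?}", parse_field::<u8>("forty").is_err());
}
```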

As it happens, serde has an entire page explaining why the 'de lifetime is like this, and what the D: de::Deserializer<'de> trait bound is useful for.

The short version is that this new (to us) usage of lifetimes and generics is responsible for safely ensuring the ability to create zero-copy deserializations, which is some advanced Rust. We haven't done that in our PokemonCsv struct, but we could.

The usage of the 'de lifetime means that, for a struct that borrows from the input, the input string that we're deserializing from needs to live at least as long as the struct that we're creating from it.
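Here is a rough, std-only sketch of the idea behind zero-copy. NameView and first_field are made up for this example and have nothing to do with serde's internals; the point is only that the struct holds a view into the input instead of a copy, so the compiler ties its lifetime to the input's.

```rust
// Hypothetical borrowed record: `name` points into the input line.
struct NameView<'a> {
    name: &'a str,
}

// The returned view shares the lifetime 'a with the input it borrows from.
fn first_field<'a>(line: &'a str) -> NameView<'a> {
    // Naive split for illustration; real CSV parsing also handles quoting.
    let name = line.split(',').next().unwrap_or("");
    NameView { name }
}

fn main() {
    let line = String::from("Bulbasaur,1");
    let view = first_field(&line);

    // No copy was made: the view starts at the same address as the input.
    println!("{}", view.name);
    println!("{}", view.name.as_ptr() == line.as_ptr());

    // Dropping `line` while `view` is still in use would not compile.
}
```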

Overall, as it turns out, we're doing this so that we can take advantage of the csv crate's implementation of Deserializer to deserialize the string "True" or "False" from the csv's values.

Then we can directly match on that string value and turn "True" into true and "False" into false. If for some reason we've annotated the wrong field with this function and we get something that isn't one of those two strings, we fail with a custom error message.

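use serde::de;

// The signature serde expects from a deserialize_with function.
fn from_capital_bool<'de, D>(
    deserializer: D,
) -> Result<bool, D::Error>
where
    D: de::Deserializer<'de>,
{
    // Deserialize the raw field as a string slice first.
    let s: &str =
        de::Deserialize::deserialize(deserializer)?;

    // Only the two capitalized spellings from the CSV are accepted.
    match s {
        "True" => Ok(true),
        "False" => Ok(false),
        _ => Err(de::Error::custom("not a boolean!")),
    }
}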

Keep in mind that the previous explanation of lifetimes and generics is something we could have avoided entirely if we wanted to. We could have mapped over the StringRecords and manually constructed the PokemonCsvs ourselves, never having touched serde.
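For comparison, a manual approach could look something like this sketch. It works on a slice of already-split fields rather than a real StringRecord, and MiniPokemon and from_fields are made-up names with only three of the columns, so treat it as an outline of the idea rather than the code for this project.

```rust
// A cut-down stand-in for PokemonCsv with three fields.
#[derive(Debug, PartialEq)]
struct MiniPokemon {
    name: String,
    hp: u8,
    genderless: bool,
}

// Hypothetical manual conversion: name, hp, genderless, in that order.
fn from_fields(fields: &[&str]) -> Result<MiniPokemon, String> {
    if fields.len() != 3 {
        return Err(format!("expected 3 fields, got {}", fields.len()));
    }

    let hp = fields[1].parse::<u8>().map_err(|e| e.to_string())?;

    // Handle the capitalized booleans by hand.
    let genderless = match fields[2] {
        "True" => true,
        "False" => false,
        other => return Err(format!("not a boolean: {}", other)),
    };

    Ok(MiniPokemon {
        name: fields[0].to_string(),
        hp,
        genderless,
    })
}

fn main() {
    println!("{:?}", from_fields(&["Bulbasaur", "45", "False"]));
    println!("{:?}", from_fields(&["Bulbasaur", "45", "false"]));
}
```

With more than forty columns, writing this by hand for every field gets long quickly, which is part of why letting serde derive it is attractive.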

We could have also cleaned up the csv data before attempting to parse it at all, manually switching out True for true and False for false. I've chosen to present you this deserialize_with approach specifically because it brings up new concepts and that's what this course is all about: learning more about Rust little by little.

The output for each row is now a PokemonCsv.

PokemonCsv {
	name: "Bulbasaur",
	pokedex_id: 1,
	abilities: "Overgrow, Chlorophyll",
	typing: "Grass, Poison",
	hp: 45,
	attack: 49,
	defense: 49,
	special_attack: 65,
	special_defense: 65,
	speed: 45,
	height: 7,
	weight: 69,
	generation: 1,
	female_rate: Some(0.125),
	genderless: false,
	is_legendary_or_mythical: false,
	is_default: true,
	forms_switchable: false,
	base_experience: 64,
	capture_rate: 45,
	egg_groups: "Monster, Plant",
	base_happiness: 70,
	evolves_from: None,
	primary_color: "green",
	number_pokemon_with_typing: 15.0,
	normal_attack_effectiveness: 1.0,
	fire_attack_effectiveness: 2.0,
	water_attack_effectiveness: 0.5,
	electric_attack_effectiveness: 0.5,
	grass_attack_effectiveness: 0.25,
	ice_attack_effectiveness: 2.0,
	fighting_attack_effectiveness: 0.5,
	poison_attack_effectiveness: 1.0,
	ground_attack_effectiveness: 1.0,
	fly_attack_effectiveness: 2.0,
	psychic_attack_effectiveness: 2.0,
	bug_attack_effectiveness: 1.0,
	rock_attack_effectiveness: 1.0,
	ghost_attack_effectiveness: 1.0,
	dragon_attack_effectiveness: 1.0,
	dark_attack_effectiveness: 1.0,
	steel_attack_effectiveness: 1.0,
	fairy_attack_effectiveness: 0.5,
}

Finally, we can see that a few of these fields actually hold multiple values. For example, abilities is "Overgrow, Chlorophyll", which is two abilities.

We can take the same approach we just did for the capital booleans to turn these array-strings into Vecs. Instead of returning bool we'll return a Vec<String> from our new from_comma_separated function.

We can use split to turn the string values into an Iterator over string slices (&str), which are views into the original string. Then we can filter out any potentially empty strings using filter and .is_empty() and finally map over those views to turn them into owned Strings, and collect into a Vec.

.collect() infers that its type should be Vec<String> from the function signature, so we don't need to additionally specify it.

fn from_comma_separated<'de, D>(
    deserializer: D,
) -> Result<Vec<String>, D::Error>
where
    D: de::Deserializer<'de>,
{
    let s: &str =
        de::Deserialize::deserialize(deserializer)?;

    Ok(s.split(", ")
        .filter(|v| !v.is_empty())
        .map(|v| v.to_string())
        .collect())
}
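The iterator chain inside from_comma_separated can be tried on its own, without serde. In this sketch split_comma_separated is a made-up name that takes the string directly instead of a deserializer.

```rust
// The same split / filter / map / collect chain, on a plain &str.
fn split_comma_separated(s: &str) -> Vec<String> {
    s.split(", ")
        // An empty field splits into one empty string; drop it.
        .filter(|v| !v.is_empty())
        // Turn each borrowed view into an owned String.
        .map(|v| v.to_string())
        // The return type tells collect to build a Vec<String>.
        .collect()
}

fn main() {
    println!("{:?}", split_comma_separated("Overgrow, Chlorophyll"));
    println!("{:?}", split_comma_separated("Monster"));
    println!("{:?}", split_comma_separated(""));
}
```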

With our new from_comma_separated function set up, we can put the deserialize_with on any fields we want to deserialize into a Vec<String>.

#[serde(deserialize_with = "from_comma_separated")]
pub abilities: Vec<String>,
#[serde(deserialize_with = "from_comma_separated")]
pub typing: Vec<String>,
#[serde(deserialize_with = "from_comma_separated")]
pub egg_groups: Vec<String>,

And now we have a fully deserialized PokemonCsv struct for every row in the csv.