class CIMNormalize extends CIMRDD with Serializable
Handle normalization.
For each element with a 1:N relation, ensure that the N elements reference the 1 element, and not vice versa.
Instance Constructors
-
new
CIMNormalize(spark: SparkSession, storage: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER)
- spark
The Spark session this class is running in.
- storage
The storage level to cache the resultant RDD.
Type Members
-
case class
Relation(parent: String, parent_class: String, relationship: CIMRelationship, referred: List[String]) extends Product with Serializable
A denormalized relation to be fixed.
- parent
the id of the object containing the denormalized list
- parent_class
the class of the denormalized object
- relationship
the relationship (from the static list in the companion object of the class or superclass with the 1:N relation)
- referred
the list of N element ids that need to be fixed
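The idea behind a denormalized relation can be sketched with plain Scala collections. This is a minimal illustration and not the class's actual implementation: Node and Rel are hypothetical stand-ins for Element and Relation, and the single parent/children pair stands in for whatever 1:N relationship is being fixed.

```scala
object NormalizeSketch {
  // Hypothetical stand-in for a CIM element: an id, an optional reference to
  // its "1" side, and a (denormalized) list of ids on the "N" side.
  final case class Node(id: String, parent: Option[String] = None, children: List[String] = Nil)

  // Hypothetical stand-in for Relation: the parent id and the N ids to fix.
  final case class Rel(parent: String, referred: List[String])

  // Report an element that carries the N-side list itself.
  def denormalized(n: Node): List[Rel] =
    if (n.children.nonEmpty) List(Rel(n.id, n.children)) else Nil

  def normalize(nodes: List[Node]): List[Node] = {
    val relations = nodes.flatMap(denormalized)
    // Map each N-side id to the parent it should reference.
    val parentOf: Map[String, String] =
      relations.flatMap(r => r.referred.map(child => child -> r.parent)).toMap
    nodes.map { n =>
      // Add the normalized reference on the N side ...
      val withParent = parentOf.get(n.id).fold(n)(p => n.copy(parent = Some(p)))
      // ... and remove the denormalized list from the 1 side.
      withParent.copy(children = Nil)
    }
  }
}
```

After normalize, every N-side node points at its parent and no node carries a list of children.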
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
add(arg: (String, (Element, Iterable[Relation]))): (String, Element)
Add the normalized fields given by the relation.
- arg
the id and a tuple of the original element and the relations to add
- returns
a tuple of the id and a new (fixed) element
-
def
asDataSet[T <: Product](implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession): Dataset[T]
Get a typed DataSet for the given class.
- T
The type of RDD.
- spark
The Spark session.
- returns
A typed dataset of the RDD.
- Definition Classes
- CIMRDD
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
biggest(name: String): (Int, (Int, RDD[_])) ⇒ Int
Find the largest integer after the name| pattern in a foldLeft operation.
- name
the name to check
- Definition Classes
- CIMRDD
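A fold step of this shape might look like the following sketch. It is an assumption-laden illustration: plain name strings replace the (Int, RDD[_]) entries of the real signature, and the helper names suffix and largest are hypothetical.

```scala
import scala.util.Try

object BiggestSketch {
  // Parse the integer after "name|"; None if the candidate does not match.
  def suffix(name: String, candidate: String): Option[Int] = {
    val prefix = name + "|"
    if (candidate.startsWith(prefix)) Try(candidate.drop(prefix.length).toInt).toOption
    else None
  }

  // Fold step: keep the larger of the running maximum and the parsed suffix.
  def biggest(name: String): (Int, String) => Int =
    (best, candidate) => suffix(name, candidate).fold(best)(n => math.max(best, n))

  // Hypothetical driver: fold over all known names, starting from zero.
  def largest(name: String, names: Seq[String]): Int =
    names.foldLeft(0)(biggest(name))
}
```

Names that do not match the pattern, or whose suffix is not an integer, leave the running maximum unchanged.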
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
companion[T](name: String)(implicit man: Manifest[T]): T
Retrieve a (static) companion object.
- T
the trait of the companion object (what to cast it to)
- name
the class name of the object with the companion to get
- man
the type information for the object
- returns
the companion object as type T
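One common way to look up a companion object by class name on the JVM is plain Java reflection. The sketch below is not necessarily how this method is implemented: it omits the Manifest parameter and relies on the fact that a top-level Scala object Foo is compiled to a class Foo$ whose singleton instance is held in the static field MODULE$.

```scala
object CompanionSketch {
  // Load the class "name$" and read its static MODULE$ field, which holds
  // the singleton instance of the object; then cast it to the requested type.
  def companion[T](name: String): T =
    Class.forName(name + "$").getField("MODULE$").get(null).asInstanceOf[T]
}
```

For example, CompanionSketch.companion[Option.type]("scala.Option") yields the standard library's Option companion object.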
- def do_normalization(): RDD[Element]
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
get[T](implicit arg0: ClassTag[T], spark: SparkSession, log: Logger): RDD[T]
Get the typed RDD.
Convenience method where the name of the RDD is the same as the contained class type (the usual case).
- T
The type of the RDD, e.g. RDD[T].
- spark
The Spark session which persisted the typed RDD.
- log
A logger for error messages.
- returns
The RDD with the given type of objects, e.g. RDD[ACLineSegment].
- Definition Classes
- CIMRDD
-
def
get[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession, log: Logger): RDD[T]
Get the named RDD.
- T
The type of objects contained in the named RDD.
- name
The name of the RDD, usually the same as the CIM class.
- spark
The Spark session which persisted the named RDD.
- log
A logger for error messages.
- returns
The typed RDD, e.g. RDD[T].
- Definition Classes
- CIMRDD
Example: the RDD of all elements is somewhat special. It is currently named Elements (plural), so this method must be used:
val elements: RDD[Element] = get[Element]("Elements")
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
getOrElse[T](implicit arg0: ClassTag[T], spark: SparkSession): RDD[T]
Get the typed RDD or an empty RDD if none was registered.
Convenience method where the name of the RDD is the same as the contained class type (the usual case).
- T
The type of the RDD, e.g. RDD[T].
- spark
The Spark session which persisted the typed RDD.
- returns
The RDD with the given type of objects, e.g. RDD[ACLineSegment], or an empty RDD of the requested type.
- Definition Classes
- CIMRDD
-
def
getOrElse[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession): RDD[T]
Get the named RDD or else an empty RDD of the requested type.
- T
The type of objects contained in the named RDD.
- name
The name of the RDD, usually the same as the CIM class.
- spark
The Spark session which persisted the named RDD.
- returns
The typed RDD, e.g. RDD[T], as either the persisted RDD or an empty one if none was found.
- Definition Classes
- CIMRDD
-
def
get_denormalized(arg: (String, Element)): List[Relation]
Identify objects that are broken, i.e. that carry a denormalized 1:N relation.
- arg
the id and element to check for denormalized relation
- returns
a list of denormalized 1:N relations
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- implicit val level: StorageLevel
-
def
like(name: String): ((Int, RDD[_])) ⇒ Boolean
Match names with pattern "name|xxx".
- name
the name to match
- returns
true if the RDD name matches the pattern
- Definition Classes
- CIMRDD
- implicit val log: Logger
-
def
nameOf[T](implicit arg0: ClassTag[T]): String
Get the base name of the CIM class.
- T
The type of the class.
- returns
The base name of the class.
- Definition Classes
- CIMRDD
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
put[T <: Product](rdd: RDD[T], keep: Boolean = false)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession, storage: StorageLevel): Unit
Persist the typed RDD using the class name, checkpoint it if that is enabled, and create the SQL view for it.
- T
The type of RDD.
- rdd
The RDD to persist
- keep
If true then don't unpersist any existing RDD with the given name.
- spark
The Spark session.
- storage
The storage level for persistence.
- Definition Classes
- CIMRDD
-
def
put[T <: Product](rdd: RDD[T], name: String, keep: Boolean)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession, storage: StorageLevel): Unit
Persist the typed RDD using the given name, checkpoint it if that is enabled, and create the SQL view for it.
Since the use of RDD persistence can be problematic for memory reuse, the keep parameter has special processing. When keep is true, any existing RDD of the same name is renamed to name|n, where n is sequentially increasing to the next available integer. When keep is false, all existing RDDs of the above form are unpersisted.
This facilitates the use-case where the new RDD depends on the existing one. For example, the Elements RDD is rewritten by about, dedup, join, normalize and topological processing, where the new RDD is derived from the original. But when a completely new CIM RDF file is read in, keep=false will unpersist all the derived RDDs.
- T
The type of RDD.
- rdd
The RDD to persist
- name
The name under which to persist it.
- keep
If true then don't unpersist the existing RDD with the given name.
- spark
The Spark session.
- storage
The storage level for persistence.
- returns
The named, viewed and possibly checkpointed original RDD.
- Definition Classes
- CIMRDD
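The keep behaviour described for this method can be modelled with an immutable map in place of Spark's set of persisted RDDs. This is only a sketch under that assumption: the Registry type, the string payloads and the helper suffix are hypothetical, and no persisting, checkpointing or view creation is modelled.

```scala
import scala.util.Try

object KeepSketch {
  // Hypothetical registry: persisted name -> payload (an RDD in practice).
  type Registry = Map[String, String]

  // The integer n of a key of the form "name|n", if the key has that form.
  private def suffix(name: String, key: String): Option[Int] = {
    val prefix = name + "|"
    if (key.startsWith(prefix)) Try(key.drop(prefix.length).toInt).toOption else None
  }

  def put(registry: Registry, name: String, value: String, keep: Boolean): Registry = {
    val prepared =
      if (keep)
        registry.get(name) match {
          case Some(old) =>
            // Rename the existing entry to name|n with the next free n.
            val next = registry.keys.flatMap(suffix(name, _)).foldLeft(0)((a, b) => math.max(a, b)) + 1
            registry - name + (s"$name|$next" -> old)
          case None => registry
        }
      else
        // Drop the existing entry and every entry of the form name|n.
        registry.filter { case (key, _) => key != name && suffix(name, key).isEmpty }
    prepared + (name -> value)
  }
}
```

Repeated calls with keep = true accumulate name|1, name|2 and so on, while a single call with keep = false leaves only the new entry under name.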
-
def
remove(arg: (String, (Element, Iterable[Relation]))): (String, Element)
Remove the denormalized fields given by the relations.
- arg
the id and a tuple of the original element and the relations to remove
- returns
a tuple of the id and a new (fixed) element
- implicit val session: SparkSession
-
def
set(element: Element, child: String, field: String, value: String, setbit: Boolean = true): Element
Create a new Element with the (possibly child) class's field set to value.
Recursively hunts for child class in the hierarchical element and sets its field to value.
- element
the element to copy with the changed field
- child
the name of the class with the field
- field
the field to set
- value
the value to set the field to
- setbit
if true set the bitfield of the field, else clear it
- returns
the new (cloned) element with the field set
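The recursive copy-with-change can be illustrated on a simplified model. The sketch assumes a hypothetical Level type in which each class level wraps its superclass level. It is not the real Element, and the bitfield handling controlled by setbit is left out.

```scala
object SetSketch {
  // Hypothetical stand-in for a hierarchical element: each level names its
  // class, holds its own fields, and optionally wraps its superclass level.
  final case class Level(cls: String, fields: Map[String, String], sup: Option[Level] = None)

  // Return a copy in which `field` of the level named `child` is `value`;
  // levels above the match are copied, levels below it are reused as-is.
  def set(element: Level, child: String, field: String, value: String): Level =
    if (element.cls == child) element.copy(fields = element.fields + (field -> value))
    else element.copy(sup = element.sup.map(set(_, child, field, value)))
}
```

The original element is left untouched, which matches the documented behaviour of returning a new (cloned) element.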
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
test[T](implicit arg0: ClassTag[T], spark: SparkSession): Boolean
Check for the typed RDD.
Convenience method where the name of the RDD is the same as the contained class type (the usual case).
- T
The type of the RDD, e.g. RDD[T].
- spark
The Spark session which persisted the typed RDD.
- returns
true if the named RDD exists, false otherwise.
- Definition Classes
- CIMRDD
-
def
test[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession): Boolean
Check for the named RDD.
- T
The type of objects contained in the named RDD.
- name
The name of the RDD, usually the same as the CIM class.
- spark
The Spark session which persisted the named RDD.
- returns
true if the named RDD exists, false otherwise.
- Definition Classes
- CIMRDD
-
def
toInt(s: String): Option[Int]
- Definition Classes
- CIMRDD
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
- Deprecated
(Since version ) see corresponding Javadoc for more information.